SSML 1.1 Implementation Report

Version: 15 December 2009

Contributors:

Daniel C. Burnett, Voxeo (Chief Editor)
双志伟 (Zhi Wei Shuang), IBM
Paolo Baggia, Loquendo
Paul Bagshaw, France Telecom
黄德智 (De Zhi Huang), France Telecom
康永国 (Yongguo Kang), Panasonic
楼晓雁 (Lou Xiaoyan), Toshiba
Scott McGlashan, HP
Laura Ricotti, Loquendo
严峻 (Yan Jun), iFLYTEK

1. Introduction
- 1.1 Implementation Report Objectives
- 1.2 Implementation Report Non-objectives
2. Work During the Candidate Recommendation Period
3. Participating in the Implementation Report
4. Entrance Criteria for the Proposed Recommendation phase
5. Implementation Report Requirements
- 5.1 Detailed requirements for Implementation Report
- 5.2 Notes on Testing
- 5.3 Out of Scope
6. Systems
7. Test Classification
- 7.1 Introduction
- 7.2 Method
- 7.3 Test assertions labels
8. Test Results
9. References
Appendices
- Appendix A. Test assertion XML API definition
  - A.1 Instruction
  - A.2 Reference Markup
  - A.3 Test Markup
  - A.4 Document Type Definition
  - A.5 Test examples
- Appendix B. Downloading tests
  - B.1 The Manifest
  - B.2 The Report Submission Template
  - B.3 The Stylesheet
- Appendix C. Acknowledgements

1. Introduction

The SSML 1.1 Specification entered the Candidate Recommendation period on 13 October 2008.

Preparation of an Implementation Report is a key criterion for moving beyond the Candidate Recommendation phase. This document describes the requirements for the Implementation Report, the process that the Voice Browser Working Group followed in preparing the report, and the results of the Implementation Report process.

1.1 Implementation Report Objectives

Must verify that the specification is implementable.
Must demonstrate interoperability of implementations of the specification.

1.2 Implementation Report Non-objectives

The IR does not attempt conformance testing of implementations.

2. Work During the Candidate Recommendation Period

During the CR period, the Working Group will carry out the following activities:

Clarification and improvement of the exposition of the specification.
Disposing of Comments that are communicated to the WG during the CR period.
Preparation of an Implementation Report meeting the criteria outlined in this document.

3. Participating in the Implementation Report

You are invited to contribute to the assessment of the W3C SSML 1.1 Specification by participating in the Implementation Report process.

The deadline for submission of an SSML Implementation Report is 27 October 2009.
All the tests reported in the table below and the report submission format are provided with this document. The whole test suite including the report submission format is also contained in ssml11-ir-20091215.zip.
Comments on this document and the test suite or requests for further information should be sent to the Working Group's public mailing list www-voice@w3.org (archive). If an updated version of this document or the test set is published, a notification will be sent to this mailing list.

4. Entrance Criteria for the Proposed Recommendation phase

The Voice Browser Working Group established the following entrance criteria for the Proposed Recommendation phase in the Request for CR:

Sufficient reports of implementation experience have been gathered to demonstrate that synthesis processors based on the specification are implementable and have compatible behavior.
Specific Implementation Report Requirements (outlined below) have been met.
The Working Group has formally addressed and responded to all public comments received by the Working Group.

5. Implementation Report Requirements

5.1 Detailed requirements for Implementation Report

Testimonials from implementers will be included in the IR when provided to document the utility and implementability of the specification.
IR must cover all specified features in the specification. For each feature the IR should indicate:
- Feature status: required, optional, other
- Feature utility/usefulness based on feedback from implementers
- Implementability of the feature specification
- Interoperability of multiple implementations of the feature
Feature status is a factor in test coverage in the report:
- Required specification features must have at least two implementations. Implementations that do not implement a required specification feature must document the reason for not implementing the feature.
- Optional specification features must have either at least one or at least two implementations, depending on whether the feature's conformance requirements have an impact on interoperability. Implementations that do not implement an optional specification feature should document the reason for not implementing the feature. The following criteria were used to decide whether an optional test assertion required one or two tests:
  1. If the specification text contains the word "should", then the associated test assertion will be optional and will be characterized by 1 test;
  2. If the specification text is of the form "if an implementation claims to support X, it must do Y", then the associated test assertion (for feature "Y") will be optional and will be characterized by 2 tests;
  This categorization provides a distinction between optional test assertions that demonstrate portability among vendors who support the feature (2 test cases) and optional test assertions for a feature for which no portability is truly to be expected (1 test case).
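
The coverage rules above can be sketched as a small lookup. This is only an illustration under stated assumptions: the function name required_implementations and the wording labels SHOULD and CONDITIONAL_MUST are hypothetical and do not come from the test suite.

```python
# Hypothetical labels for the form of the specification text behind an assertion.
SHOULD = "should"                      # spec text contains the word "should"
CONDITIONAL_MUST = "conditional-must"  # "if an implementation claims to support X, it must do Y"


def required_implementations(conformance: str, wording: str = "") -> int:
    """Return the minimum number of implementations expected for an assertion.

    Required features need at least two implementations. Optional features
    need one or two, depending on the form of the specification text.
    """
    if conformance == "Required":
        return 2
    if conformance == "Optional":
        if wording == SHOULD:
            return 1
        if wording == CONDITIONAL_MUST:
            return 2
        raise ValueError(f"unknown wording for optional assertion: {wording!r}")
    raise ValueError(f"unknown conformance level: {conformance!r}")


if __name__ == "__main__":
    print(required_implementations("Required"))
    print(required_implementations("Optional", SHOULD))
    print(required_implementations("Optional", CONDITIONAL_MUST))
```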

5.2 Notes on Testing

A test report must indicate the outcome of each test. Possible outcomes are "pass", "fail" or "not-impl". "pass" requires output of the synthesis processor that has been judged valid by one or more test validators (see below). Note that the evaluation criteria for some tests are subjective. A report must document the way test output was verified. "not-impl" means the synthesis processor has not implemented the specific feature required by a test.
A test report may contain an additional comment for each test. If a test fails, a comment should be added (see also Detailed requirements for Implementation Report).
Every attempt has been made to keep the tests language-neutral through the use of the Test API described in Appendix A. Tests are written in US English, with the exception of some tests, which are language-dependent.
Some tests contain notes that should be read before executing them. These notes are contained in the instructions inside the tests. See Appendix A for a detailed description of the coding rules.
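
As an illustration of the reporting rules above, the following sketch checks a single report entry. The ReportEntry class and check_entry function are hypothetical names chosen for this example; the actual submission format is the one described in Appendix B.2.

```python
from dataclasses import dataclass
from typing import List

# The three outcomes a test report may record for a test.
VALID_OUTCOMES = {"pass", "fail", "not-impl"}


@dataclass
class ReportEntry:
    """One test outcome in a report (hypothetical structure for illustration)."""
    assert_id: str
    outcome: str
    comment: str = ""


def check_entry(entry: ReportEntry) -> List[str]:
    """Return a list of problems found in a report entry (empty if none)."""
    problems = []
    if entry.outcome not in VALID_OUTCOMES:
        problems.append(f"{entry.assert_id}: unknown outcome {entry.outcome!r}")
    # A comment is optional in general, but should be added when a test fails.
    if entry.outcome == "fail" and not entry.comment.strip():
        problems.append(f"{entry.assert_id}: failed test should carry a comment")
    return problems


if __name__ == "__main__":
    for entry in (ReportEntry("343", "pass"), ReportEntry("26", "fail")):
        print(entry.assert_id, check_entry(entry))
```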

5.3 Out of Scope

The SSML Implementation Report will not cover:

Integration with other markup languages (SMIL/ACSS/VoiceXML)

6. Systems

France Telecom, Orange Labs - SSML 1.1 in Baratinoo text-to-speech synthesis engine

Exec Summary

France Telecom, Orange Labs, is happy to contribute to the SSML 1.1 Recommendation and to support the activities of the W3C Voice Browser working group by submitting the following SSML 1.1 Implementation Report.

SSML 1.1 extends the functionality of SSML 1.0 with essential features that were formerly fulfilled by vendor-specific extensions or were required by multilingual applications of voice technologies. These new aspects of SSML 1.1 will bring many benefits to both authors of SSML and vendors of SSML-compliant engines.

iFLYTEK Implementation

Exec Summary

As a leading provider of Chinese speech and language technology, iFLYTEK is very pleased to contribute to this Recommendation by submitting this SSML 1.1 Implementation Report and to support the activities of the W3C Voice Browser working group. iFLYTEK has implemented all of the SSML 1.1 assertions in its InterPhonic 6.0 platform. This platform is the most widely used Text-to-Speech (TTS) solution in China.

7. Test Classification

7.1 Introduction

The aim of this section is to describe the taxonomy of tests developed for the SSML 1.1 Specification. Some basic assumptions (described in the sections below) led to the development of criteria that were used to categorize different tests according to what would be needed to structure, run, and pass the test. This categorization approach was then verified against each element and attribute described in the SSML 1.1 Specification. The overall work has resulted in a classification of the tests outlined below, where each test assertion has to be labelled.

7.2 Method

It is assumed that most effects generated by SSML testing are only subjectively verifiable via human perception of a given quality or characteristic of the TTS output. Consequently, the evaluation approach is based on subjective assessment methods present in the literature; each test is to be performed by presenting to testers the outputs of one or more SSML documents fed into a synthesis processor. The outputs are to be judged with respect to the perceived (or just detected) subjective effect. This approach is an adaptation of the classical "Paired Comparisons" (PC) method (see [PM], [PC]). Normally, in PC the respondents are presented with two objects at a time and asked to pick the one they prefer or the one that has the higher level of a given attribute. In this case, the task of ranking objects with respect to one feature is not difficult, since only a few characteristics are compared at a time. The simplifications applied were in line with the general recommendation to keep the features to be compared as isolated as possible, since as the number of items n increases, the number of comparisons increases quadratically (n*(n-1)/2), and if the number of comparisons is too great, testers may become fatigued and no longer discriminate carefully among them.
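
To make the growth in the number of comparisons concrete, the short sketch below evaluates n*(n-1)/2 for a few item counts and cross-checks the formula by enumerating the pairs. The function name paired_comparisons is an assumption made for this example. Doubling the number of items roughly quadruples the number of comparisons.

```python
from itertools import combinations


def paired_comparisons(n_items: int) -> int:
    """Number of distinct pairs among n_items objects: n*(n-1)/2."""
    return n_items * (n_items - 1) // 2


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        # Cross-check the closed form against explicit enumeration of the pairs.
        assert paired_comparisons(n) == len(list(combinations(range(n), 2)))
        print(n, "items ->", paired_comparisons(n), "comparisons")
```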

There are four separate assumptions:

The tests fall into three categories based on how they are to be executed: by giving an "absolute" answer to a single test; by comparing a reference test (raw text or not marked-up text rendered by a synthesis processor) and a marked test (marked-up text with SSML elements); and by comparing two or more marked tests with different features to be assessed. In the test labels below the second and third categories are combined together to be called "paired" comparisons.
In the interest of simplicity, the evaluation results are expressed as either "pass" or "fail".
The tests are classified according to the difficulty of evaluation: a) The ones that generate an effect that is easily detected by a single tester and b) The ones that measure the effect of features which are subject to individual differences. This reduced perception reliability may also result in a single tester generating a less precise answer. In these cases, repeated test sessions from a single user or the use of a panel of testers is suggested. For some of these tests, the effects on processor output can only be assessed properly by an expert: thus the use of expert or well trained testers is recommended.
Almost all the test cases simply result in a series of comparisons, including tests which are intended to distinguish among an ordinal scale of values (i.e. monotonically non-decreasing scale).

These assumptions are reflected in the implementation test table (see below) in the form of additional labels for each test assertion.

7.3 Test categories

Based on the method described above, a two-dimensional test classification was developed. Sections 7.3.1 and 7.3.2 describe the two dimensions, and section 7.3.3 describes the test categories that resulted.

7.3.1 Test Class

Test class identifies a classification of the test assertion based on its testing complexity. A test assertion may belong to one of two possible cases:

Absolute Rating : A single SSML document is sufficient to test the implementation of a particular feature (to highlight a marked-up behavior of the Specification);
Paired comparison : A comparison between a raw text and a marked-up SSML document (to highlight the marked-up behavior) is needed; or, a binary comparison between two marked-up SSML documents. Note that a scale of values (for example, the label values for several of the <prosody> attributes) would be tested via a set of "paired comparison" (binary) tests of increasing values.

7.3.2 Test Level

Test level characterizes a test assertion according to how easily the audio feature under test can be discriminated.

Simple: The feature under test generates an effect that is easily detected by a single tester;
Complex: The feature under test requires expert assessment, a panel of evaluators, or repeat test sessions from a single tester to obtain an accurate result;

7.3.3 Test Category

The combination of the two dimensions above results in four different test categories:

Abs_rating_simple: A single SSML document is sufficient to test the implementation of a particular feature, and the feature under test generates an effect that is easily detected by a single tester.
Abs_rating_complex: A single SSML document is sufficient to test the implementation of a particular feature, and the feature under test requires expert assessment or, being subjective, more than one person to repeat the test.
Paired_simple: A raw text and a marked-up SSML document, or two marked-up SSML documents, need to be compared, and the feature under test generates an effect that is easily detected by a single tester.
Paired_complex: A raw text and a marked-up SSML document, or two marked-up SSML documents, need to be compared, and the feature under test requires expert assessment or, being subjective, more than one person to repeat the test.
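
The four categories are simply the cross product of the two dimensions, which can be sketched as follows. The names all_categories and parse_category are hypothetical helpers for this example. The level is matched case-insensitively on the assumption that labels such as "Paired_Simple" in the results table denote the same category as "Paired_simple".

```python
from itertools import product
from typing import List, Tuple

TEST_CLASSES = ("Abs_rating", "Paired")  # section 7.3.1
TEST_LEVELS = ("simple", "complex")      # section 7.3.2


def all_categories() -> List[str]:
    """Combine each test class with each test level."""
    return [f"{cls}_{level}" for cls, level in product(TEST_CLASSES, TEST_LEVELS)]


def parse_category(label: str) -> Tuple[str, str]:
    """Split a category label into (test class, test level)."""
    test_class, _, level = label.rpartition("_")
    level = level.lower()  # assumption: "Paired_Simple" means "Paired_simple"
    if test_class not in TEST_CLASSES or level not in TEST_LEVELS:
        raise ValueError(f"unknown test category: {label!r}")
    return test_class, level


if __name__ == "__main__":
    print(all_categories())
    print(parse_category("Paired_Complex"))
```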

8. Test Results

The following table lists all the assertions that were derived from the SSML 1.1 Specification.

The Assert ID column uniquely identifies the assertion and is linked to the corresponding test.

The Spec column identifies the section of the SSML 1.1 Specification from which the assertion was derived.

The Conformance column indicates whether or not the SSML 1.1 Specification requires the synthesis processor to implement the feature described by the test assertion.

The Test Type column indicates whether or not the associated test requires an adaptation to the testing environment. If the test assertion is considered manual, then the instructions section of the associated test gives more details about how the test must be modified (see Appendix A.1).

The Test Category column is described in section 7.3.3. The Profile column identifies the SSML Profile of the feature. Finally, the Assertion column describes the assertion.

The Results column summarizes the results from the companies that ran the test suite. It lists, for each feature, the number of implementations that passed, failed, or did not implement the feature.
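
For readers who want to process the results mechanically, the following sketch parses one row of the table into named fields. It assumes the tab-separated plain-text layout of this rendering of the table; the column keys and the parse_row function are hypothetical names chosen for this example.

```python
from typing import Dict, Union

# Column keys in table order (hypothetical names for the columns described above).
COLUMNS = (
    "assert_id", "spec", "conformance", "test_type",
    "test_category", "profile", "assertion",
    "pass", "fail", "not_impl",
)


def parse_row(line: str) -> Dict[str, Union[str, int]]:
    """Parse one tab-separated results row into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row: Dict[str, Union[str, int]] = dict(zip(COLUMNS, fields))
    # The Spec column is written as a bracketed section number, e.g. "[2.1]".
    row["spec"] = fields[1].strip("[]")
    # The three Results sub-columns are counts of implementations.
    for key, value in zip(COLUMNS[-3:], fields[-3:]):
        row[key] = int(value)
    return row


if __name__ == "__main__":
    sample = ("343\t[2.1]\tRequired\tAuto\tAbs_rating_simple\tCore\t"
              "The meta element must occur before all other elements and "
              "text contained within the root speak element.\t2\t0\t0")
    print(parse_row(sample))
```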

Assert ID	Spec	Conformance	Test Type	Test Category	Profile	Assertion	Results: Pass	Results: Fail	Results: N/I
343	[2.1]	Required	Auto	Abs_rating_simple	Core	The meta element must occur before all other elements and text contained within the root speak element.	2	0	0
344	[2.1]	Required	Auto	Abs_rating_simple	Core	metadata elements must occur before all other elements and text contained within the root speak element.	2	0	0
345	[2.1]	Required	Auto	Abs_rating_simple	Core	lexicon elements must occur before all other elements and text contained within the root speak element.	2	0	0
178	[3.1.1]	Required	Auto	Abs_rating_simple	Core	The version number for this specification is 1.1.	2	0	0
179	[3.1.1]	Required	Auto	Abs_rating_simple	Core	The xml:lang attribute is required on the speak element.	2	0	0
180	[3.1.1]	Required	Auto	Abs_rating_simple	Core	The xml:base attribute may be present on the speak element.	2	0	0
181	[3.1.1]	Required	Auto	Abs_rating_simple	Core	The version attribute must be present on the speak element.	2	0	0
182	[3.1.1]	Required	Auto	Abs_rating_simple	Core	The onlangfailure attribute is permitted in the speak element.	2	0	0
251	[3.1.1]	Required	Auto	Abs_rating_simple	Core	Before the speak element is executed, the synthesis processor must select a default voice.	2	0	0
252	[3.1.1]	Required	Manual	Abs_rating_simple	Core	A language speaking failure will occur as soon as the first text is encountered if the language of the text is one that the default voice cannot speak.	2	0	0
389	[3.1.1]	Required	Auto	Abs_rating_simple	Core	The lookup element can occur in a speak element.	2	0	0
379	[3.1.1.1]	Required	Auto	Paired_Simple	Core	The startmark can be used to specify that rendering begins from a specific mark. If the startmark is specified, then rendering starts at the startmark. If startmark is not specified, rendering begins at the beginning of the document.	2	0	0
380	[3.1.1.1]	Required	Auto	Paired_Simple	Core	The end of rendering can be specified using the endmark. If the endmark is specified, then rendering ends at the endmark. If the endmark is not specified, rendering ends at the document end.	2	0	0
381	[3.1.1.1]	Required	Auto	Paired_Simple	Core	Startmark and endmark can be used to control both the start and end of rendering.	2	0	0
382	[3.1.1.1]	Required	Auto	Paired_Simple	Core	If the startmark is after the endmark, then no audio is generated.	2	0	0
383	[3.1.1.1]	Required	Auto	Abs_rating_simple	Core	It is an error if the value given for either startmark or endmark is not a valid mark in the document.	2	0	0
340	[3.1.2]	Required	Auto	Abs_rating_simple	Core	Language information is inherited down the document hierarchy.	2	0	0
341	[3.1.2]	Required	Auto	Abs_rating_simple	Core	Language information nests, i.e. inner attributes overwrite outer attributes.	2	0	0
3	[3.1.3]	Required	Auto	Abs_rating_simple	Core	The base URI declaration affects the interpretation of a relative URI specified by the audio element's src attribute.	2	0	0
4	[3.1.3]	Required	Manual	Paired_Simple	Core	The base URI declaration affects the interpretation of a relative URI specified by the lexicon element's uri attribute.	2	0	0
5	[3.1.3.1]	Required	Auto	Abs_rating_simple	Core	When both are available, the base URI is defined by xml:base instead of by metadata discovered during a protocol interaction.	2	0	0
6	[3.1.3.1]	Required	Auto	Abs_rating_simple	Core	When both are available, the base URI is defined by xml:base instead of by the current document.	2	0	0
7	[3.1.3.1]	Required	Manual	Abs_rating_simple	Core	When both are available, the base URI is defined by metadata discovered during a protocol interaction instead of by the current document.	2	0	0
253	[3.1.4]	Required	Auto	Abs_rating_simple	Core	The xml:id attribute must be unique to the document.	2	0	0
254	[3.1.4]	Required	Auto	Abs_rating_simple	Core	The xml:id attribute is a permitted attribute in p elements.	2	0	0
255	[3.1.4]	Required	Auto	Abs_rating_simple	Core	The xml:id attribute is a permitted attribute in s elements.	2	0	0
256	[3.1.4]	Required	Auto	Abs_rating_simple	Core	The xml:id attribute is a permitted attribute in w elements.	2	0	0
333	[3.1.5.1]	Required	Auto	Abs_rating_simple	Core	The pronunciation information contained within a lexicon document is used for words defined within the enclosing document.	2	0	0
334	[3.1.5.1]	Required	Auto	Abs_rating_simple	Core	Any number of lexicon elements may occur as immediate children of the speak element.	2	0	0
335	[3.1.5.1]	Required	Auto	Abs_rating_simple	Core	The lexicon element must have an xml:id attribute.	2	0	0
336	[3.1.5.1]	Required	Auto	Abs_rating_simple	Core	The name of xml:id attribute must be unique to the current SSML document.	2	0	0
338	[3.1.5.1]	Required	Auto	Abs_rating_simple	Core	A type attribute is optional to specify the media type of the pronunciation lexicon document.	2	0	0
339	[3.1.5.1]	Required	Auto	Abs_rating_simple	Core	The default value of the type attribute is application/pls+xml.	2	0	0
346	[3.1.5.1]	Required	Auto	Abs_rating_simple	Core	The lexicon element MUST have a uri attribute.	2	0	0
348	[3.1.5.1]	Required	Auto	Abs_rating_simple	Core	The lexicon element MAY have a fetchtimeout attribute that specifies the timeout for fetches.	2	0	0
349	[3.1.5.1]	Required	Auto	Abs_rating_simple	Core	The lexicon element MAY have a maxage attribute that indicates that the document is willing to use content whose age is no greater than the specified time.	2	0	0
350	[3.1.5.1]	Required	Auto	Abs_rating_simple	Core	The lexicon element MAY have a maxstale attribute that indicates that the document is willing to use content that has exceeded its expiration time by no more than the specified amount of time.	2	0	0
316	[3.1.5.2]	Required	Auto	Abs_rating_simple	Core	The lookup element must have a ref attribute.	2	0	0
318	[3.1.5.2]	Required	Auto	Abs_rating_simple	Core	A lookup element may contain other lookup elements.	2	0	0
319	[3.1.5.2]	Required	Auto	Abs_rating_simple	Core	The child lookup elements have higher precedence.	2	0	0
320	[3.1.5.2]	Required	Auto	Abs_rating_simple	Core	The audio element can occur in a lookup element.	2	0	0
321	[3.1.5.2]	Required	Auto	Abs_rating_simple	Core	The break element can occur in a lookup element.	2	0	0
322	[3.1.5.2]	Required	Auto	Abs_rating_simple	Core	The emphasis element can occur in a lookup element.	2	0	0
323	[3.1.5.2]	Required	Auto	Abs_rating_simple	Core	The lang element can occur in a lookup element.	2	0	0
324	[3.1.5.2]	Required	Auto	Abs_rating_simple	Core	The mark element can occur in a lookup element.	2	0	0
325	[3.1.5.2]	Required	Auto	Abs_rating_simple	Core	The p element can occur in a lookup element.	2	0	0
326	[3.1.5.2]	Required	Auto	Abs_rating_simple	Core	The phoneme element can occur in a lookup element.	2	0	0
327	[3.1.5.2]	Required	Auto	Abs_rating_simple	Core	The prosody element can occur in a lookup element.	2	0	0
328	[3.1.5.2]	Required	Auto	Abs_rating_simple	Core	The say-as element can occur in a lookup element.	2	0	0
329	[3.1.5.2]	Required	Auto	Abs_rating_simple	Core	The sub element can occur in a lookup element.	2	0	0
330	[3.1.5.2]	Required	Auto	Abs_rating_simple	Core	The s element can occur in a lookup element.	2	0	0
331	[3.1.5.2]	Required	Auto	Abs_rating_simple	Core	The voice element can occur in a lookup element.	2	0	0
332	[3.1.5.2]	Required	Auto	Abs_rating_simple	Core	The w element can occur in a lookup element.	2	0	0
8	[3.1.6]	Required	Manual	Abs_rating_simple	Core	The seeAlso property of name attribute is used to specify a resource that might provide additional metadata information about the content.	2	0	0
9	[3.1.6]	Required	Auto	Abs_rating_simple	Core	Either a name or http-equiv attribute is required.	2	0	0
10	[3.1.6]	Required	Auto	Abs_rating_simple	Core	It is an error to provide both name and http-equiv attributes.	2	0	0
11	[3.1.7]	Required	Manual	Abs_rating_simple	Core	The metadata element is a container in which information about the document can be placed using a metadata schema.	2	0	0
12	[3.1.7]	Required	Auto	Abs_rating_simple	Core	Any metadata schema can be used with metadata, but it is recommended that the Resource Description Format (RDF) schema [RDF-SCHEMA] be used in conjunction with the general metadata properties defined in the Dublin Core Metadata Initiative [DC].	2	0	0
361	[3.1.8.1]	Required	Auto	Abs_rating_simple	Core	The lookup element can occur in a p element.	2	0	0
362	[3.1.8.1]	Required	Auto	Abs_rating_simple	Core	The lookup element can occur in an s element.	2	0	0
193	[3.1.8.2]	Required	Auto	Paired_Complex	Core	The token element allows the author to indicate its content is a token and to eliminate token (word) segmentation ambiguities of the synthesis processor.	2	0	0
195	[3.1.8.2]	Required	Auto	Paired_Complex	Core	xml:lang is a defined attribute on the token element to identify the written language of the content.	2	0	0
196	[3.1.8.2]	Required	Auto	Abs_rating_simple	Core	xml:id is a defined attribute on the token element.	2	0	0
199	[3.1.8.2]	Required	Auto	Abs_rating_simple	Core	The text contents of the token element and its subelements are together considered to be one token for lexical lookup purposes as follows: 1. All markup within the token element is removed (leaving the contents of the markup). 2. All remaining text is concatenated together in the order in which it appears in the document. 3. Leading and trailing spaces are removed from this single block of text. 4. Multiple contiguous white space characters are converted into a single space. 5. The result is treated as a single token.	2	0	0
200	[3.1.8.2]	Required	Auto	Abs_rating_simple	Core	role is an OPTIONAL defined attribute on the token element.	2	0	0
201	[3.1.8.2]	Required	Auto	Abs_rating_simple	Core	The role attribute takes as its value one or more white-space separated QNames.	2	0	0
202	[3.1.8.2]	Required	Auto	Abs_rating_simple	Core	The w element can contain text to be rendered.	2	0	0
203	[3.1.8.2]	Required	Auto	Abs_rating_simple	Core	The audio element can occur in a w element.	2	0	0
204	[3.1.8.2]	Required	Auto	Abs_rating_simple	Core	The break element can occur in a w element.	2	0	0
205	[3.1.8.2]	Required	Auto	Abs_rating_simple	Core	The emphasis element can occur in a w element.	2	0	0
206	[3.1.8.2]	Required	Auto	Abs_rating_simple	Core	The mark element can occur in a w element.	2	0	0
207	[3.1.8.2]	Required	Auto	Abs_rating_simple	Core	The phoneme element can occur in a w element.	2	0	0
208	[3.1.8.2]	Required	Auto	Abs_rating_simple	Core	The prosody element can occur in a w element.	2	0	0
209	[3.1.8.2]	Required	Auto	Abs_rating_simple	Core	The say-as element can occur in a w element.	2	0	0
210	[3.1.8.2]	Required	Auto	Abs_rating_simple	Core	The sub element can occur in a w element.	2	0	0
212	[3.1.8.2]	Required	Auto	Abs_rating_simple	Core	The w element can occur in an audio element.	2	0	0
213	[3.1.8.2]	Required	Auto	Abs_rating_simple	Core	The w element can occur in an emphasis element.	2	0	0
214	[3.1.8.2]	Required	Auto	Abs_rating_simple	Core	The w element can occur in a lang element.	2	0	0
215	[3.1.8.2]	Required	Auto	Abs_rating_simple	Core	The w element can occur in a lookup element.	2	0	0
216	[3.1.8.2]	Required	Auto	Abs_rating_simple	Core	The w element can occur in a prosody element.	2	0	0
217	[3.1.8.2]	Required	Auto	Abs_rating_simple	Core	The w element can occur in a speak element.	2	0	0
218	[3.1.8.2]	Required	Auto	Abs_rating_simple	Core	The w element can occur in a p element.	2	0	0
219	[3.1.8.2]	Required	Auto	Abs_rating_simple	Core	The w element can occur in an s element.	2	0	0
220	[3.1.8.2]	Required	Auto	Abs_rating_simple	Core	The w element can occur in a voice element.	2	0	0
347	[3.1.8.2]	Required	Auto	Paired_Complex	Core	The w element is an alias for the token element.	2	0	0
13	[3.1.9]	Required	Manual	Paired_Simple	Core	When the value for the interpret-as attribute is unknown or unsupported by a processor, it must render the contained text as if no interpret-as value were specified.	2	0	0
14	[3.1.9]	Required	Manual	Paired_Simple	Core	When the value for the format attribute is unknown or unsupported by a processor, it must render the contained text as if no format value were specified.	2	0	0
15	[3.1.9]	Required	Manual	Abs_rating_simple	Core	The interpret-as attribute is always required.	2	0	0
16	[3.1.9]	Required	Manual	Abs_rating_simple	Core	The format attribute is optional.	2	0	0
17	[3.1.9]	Required	Manual	Abs_rating_simple	Core	When the content of the element contains other text in addition to the indicated content type, the synthesis processor must attempt to render such text.	2	0	0
18	[3.1.9]	Optional	Manual	Abs_rating_simple	Core	A synthesis processor should pronounce the contained text in a manner in which such content is normally produced for the language.	2	0	0
19	[3.1.9]	Required	Manual	Paired_Simple	Core	The detail attribute can be used for all say-as interpret-as types.	2	0	0
20	[3.1.9]	Required	Manual	Paired_Simple	Core	Every value of the detail attribute must render all of the informational content in the contained text.	2	0	0
21	[3.1.9]	Required	Manual	Paired_Simple	Core	If the detail attribute is not specified, the level of detail that is produced by the synthesis processor depends on the text content and the language.	2	0	0
22	[3.1.9]	Required	Manual	Paired_Simple	Core	When the value for the detail attribute is unknown or unsupported by a processor, it must render the contained text as if no value were specified for the detail attribute.	2	0	0
23	[3.1.9]	Required	Manual	Abs_rating_simple	Core	The say-as element can only contain text to be rendered.	2	0	0
24	[3.1.9]	Required	Manual	Abs_rating_simple	Core	When the content of the say-as element contains additional text next to the content that is in the indicated format and interpret-as type, then this additional text MUST be rendered.	2	0	0
25	[3.1.9]	Required	Manual	Abs_rating_simple	Core	When the content of the say-as element contains no content in the indicated interpret-as type or format, the processor must render the content as if the attributes are not present.	2	0	0
26	[3.1.9]	Optional	Manual	Abs_rating_simple	Core	When the content of the say-as element contains no content in the indicated interpret-as type or format the processor should notify the environment of the mismatch.	1	0	1
183	[3.1.10]	Required	Auto	Paired_Simple	Core	The phoneme element may be empty.	2	0	0
184	[3.1.10]	Required	Manual	Paired_Simple	Core	The phoneme element provides a phonetic pronunciation for the contained text.	2	0	0
185	[3.1.10]	Required	Manual	Abs_rating_simple	Core	The ph attribute is a required attribute that specifies the phoneme string.	2	0	0
186	[3.1.10]	Required	Manual	Paired_Simple	Core	The alphabet attribute is an optional attribute that specifies the phonetic alphabet. The default value is processor-specific.	2	0	0
187	[3.1.10]	Optional	Manual	Paired_Simple	Core	Synthesis processors should support a value for alphabet of "ipa", corresponding to characters composing the International Phonetic Alphabet [IPA].	2	0	0
188	[3.1.10]	Required	Manual	Abs_rating_simple	Core	It is an error if a value for alphabet is specified that is not known or cannot be applied by a synthesis processor.	2	0	0
189	[3.1.10]	Required	Manual	Abs_rating_simple	Core	No elements can occur within the content of the phoneme element.	2	0	0
191	[3.1.10]	Optional	Manual	Abs_rating_simple	Core	For processors that support IPA, the processor must syntactically accept all legal ph values.	2	0	0
192	[3.1.10]	Optional	Auto	Abs_rating_simple	Core	For processors supporting the IPA alphabet, the processor should produce output when given unicode IPA codes that can reasonably be considered to belong to the current language.	2	0	0
393	[3.1.10]	Required	Auto	Abs_rating_simple	Core	The type attribute is an optional attribute that indicates additional information about how the pronunciation information is to be interpreted. One of the allowed values for this attribute is "default", which has no implications.	2	0	0
394	[3.1.10]	Required	Manual	Abs_rating_simple	Core	The type attribute is an optional attribute that indicates additional information about how the pronunciation information is to be interpreted. One of the allowed values for this attribute is "ruby", which indicates that the pronunciation information is from ruby text.	2	0	0
395	[3.1.10]	Required	Auto	Paired_Simple	Core	The default value of the 'type' attribute is "default".	2	0	0
190	[3.1.10.1]	Required	Manual	Abs_rating_simple	Core	The only valid values for the alphabet attribute are "ipa", registered values and vendor-defined strings of the form "x-organization" or "x-organization-alphabet".	2	0	0
27	[3.1.11]	Required	Auto	Abs_rating_simple	Core	The sub element is employed to indicate that the specified text replaces the contained text for pronunciation.	2	0	0
28	[3.1.11]	Required	Auto	Abs_rating_simple	Core	The alias attribute is required.	2	0	0
30	[3.1.11]	Optional	Auto	Paired_Simple	Core	The processor should apply text normalization to the alias value.	2	0	0
31	[3.1.11]	Required	Auto	Abs_rating_simple	Core	The sub element can only contain text (no elements).	2	0	0
235	[3.1.12]	Required	Auto	Abs_rating_simple	Core	The lang element must have an xml:lang attribute.	2	0	0
236	[3.1.12]	Required	Auto	Abs_rating_simple	Core	There is no text structure associated with the language change indicated by the lang element. It may be used to specify the language of the content at a level other than a paragraph, sentence or word level.	2	0	0
237	[3.1.12]	Required	Auto	Abs_rating_simple	Core	The lang element can contain text to be rendered.	2	0	0
238	[3.1.12]	Required	Manual	Abs_rating_simple	Core	The lang element can contain the audio element.	2	0	0
239	[3.1.12]	Required	Auto	Abs_rating_simple	Core	The lang element can contain the break element.	2	0	0
240	[3.1.12]	Required	Auto	Abs_rating_simple	Core	The lang element can contain the emphasis element.	2	0	0
241	[3.1.12]	Required	Auto	Abs_rating_simple	Core	The lang element can contain the p element.	2	0	0
242	[3.1.12]	Required	Auto	Abs_rating_simple	Core	The lang element can contain the phoneme element.	2	0	0
243	[3.1.12]	Required	Auto	Abs_rating_simple	Core	The lang element can contain the prosody element.	2	0	0
244	[3.1.12]	Required	Auto	Abs_rating_simple	Core	The lang element can contain the say-as element.	2	0	0
245	[3.1.12]	Required	Auto	Abs_rating_simple	Core	The lang element can contain the sub element.	2	0	0
246	[3.1.12]	Required	Auto	Abs_rating_simple	Core	The lang element can contain the s element.	2	0	0
247	[3.1.12]	Required	Auto	Abs_rating_simple	Core	The lang element can contain the voice element.	2	0	0
248	[3.1.12]	Required	Auto	Abs_rating_simple	Core	The lang element can contain the w element.	2	0	0
249	[3.1.12]	Required	Auto	Abs_rating_simple	Core	The lang element can contain the lang element.	2	0	0
250	[3.1.12]	Required	Auto	Abs_rating_simple	Core	The lang element can contain the mark element.	2	0	0
367	[3.1.12]	Required	Auto	Abs_rating_simple	Core	The lookup element can occur in a lang element.	2	0	0
221	[3.1.13]	Required	Auto	Abs_rating_simple	Core	The onlangfailure attribute is a defined attribute for the speak element.	2	0	0
222	[3.1.13]	Required	Auto	Abs_rating_simple	Core	The onlangfailure attribute is a defined attribute for the lang element.	2	0	0
223	[3.1.13]	Required	Auto	Abs_rating_simple	Core	The onlangfailure attribute is a defined attribute for the desc element.	2	0	0
224	[3.1.13]	Required	Auto	Abs_rating_simple	Core	The onlangfailure attribute is a defined attribute for the p element.	2	0	0
225	[3.1.13]	Required	Auto	Abs_rating_simple	Core	The onlangfailure attribute is a defined attribute for the s element.	2	0	0
226	[3.1.13]	Required	Auto	Abs_rating_simple	Core	The onlangfailure attribute is a defined attribute for the w element.	2	0	0
227	[3.1.13]	Required	Auto	Abs_rating_simple	Core	A language speaking failure occurs whenever the synthesis processor decides that the currently-selected voice cannot speak the declared language of the text.	2	0	0
228	[3.1.13]	Required	Manual	Abs_rating_simple	Core	If the onlangfailure attribute is set to "changevoice", the processor must report a language speaking failure and, if a voice exists that can speak the language, the processor must switch to that voice and speak the content.	2	0	0
229	[3.1.13]	Required	Auto	Abs_rating_simple	Core	If the onlangfailure attribute is set to "changevoice", the processor must report a language speaking failure and, if a voice does not exist that can speak the language, the processor must choose another behavior (either ignoretext or ignorelang).	2	0	0
230	[3.1.13]	Required	Auto	Abs_rating_simple	Core	If the onlangfailure attribute is set to "ignoretext", the processor must report a language speaking failure and will not attempt to render the text that is in the failed language.	2	0	0
231	[3.1.13]	Required	Auto	Abs_rating_simple	Core	If the onlangfailure attribute is set to "ignorelang", the processor must report a language speaking failure and will ignore the change in language and speak as if the content were in the previous language.	2	0	0
232	[3.1.13]	Required	Manual	Abs_rating_simple	Core	If the onlangfailure attribute is set to "processorchoice", the processor must report a language speaking failure and choose the behavior (either changevoice, ignoretext, or ignorelang).	2	0	0
233	[3.1.13]	Required	Auto	Abs_rating_simple	Core	The value of this attribute is inherited down the document hierarchy.	2	0	0
234	[3.1.13]	Required	Manual	Abs_rating_simple	Core	The top-level default value for this attribute is "processorchoice".	2	0	0
257	[3.2.1]	Required	Auto	Abs_rating_simple	Core	gender: attribute indicating the preferred gender of the voice to speak the contained text. value : "female".	2	0	0
258	[3.2.1]	Required	Auto	Abs_rating_simple	Core	gender: attribute indicating the preferred gender of the voice to speak the contained text. value : "male".	2	0	0
259	[3.2.1]	Required	Auto	Abs_rating_simple	Core	gender: attribute indicating the preferred gender of the voice to speak the contained text. value : "neutral".	2	0	0
260	[3.2.1]	Required	Auto	Abs_rating_simple	Core	age: attribute indicating the preferred age of the voice to speak the contained text. integer value : 5	2	0	0
261	[3.2.1]	Required	Manual	Abs_rating_simple	Core	variant: attribute indicating a preferred variant of the other voice characteristics to speak the contained text. Integer value : manual	2	0	0
262	[3.2.1]	Required	Manual	Abs_rating_simple	Core	name: attribute indicating a platform-specific voice name to speak the contained text. value : manual.	2	0	0
263	[3.2.1]	Required	Manual	Abs_rating_simple	Core	name: attribute indicating a platform-specific voice name to speak the contained text. The value may be a space-separated list of names. Value : manual.	2	0	0
267	[3.2.1]	Required	Manual	Abs_rating_simple	Core	Although each attribute individually is optional, at least one must be specified any time the voice element is used.	2	0	0
268	[3.2.1]	Optional	Manual	Paired_Complex	Core	Relative changes in prosodic parameters should be carried across voice changes. Test with pitch attribute.	2	0	0
269	[3.2.1]	Optional	Manual	Paired_Complex	Core	Relative changes in prosodic parameters should be carried across voice changes. Test with range attribute.	2	0	0
270	[3.2.1]	Optional	Manual	Paired_Complex	Core	Relative changes in prosodic parameters should be carried across voice changes. Test with rate attribute.	2	0	0
271	[3.2.1]	Optional	Manual	Paired_Complex	Core	Relative changes in prosodic parameters should be carried across voice changes. Test with volume attribute.	2	0	0
272	[3.2.1]	Required	Auto	Abs_rating_simple	Core	gender: attribute indicating the preferred gender of the voice to speak the contained text. value : "".	2	0	0
273	[3.2.1]	Required	Auto	Abs_rating_simple	Core	age: attribute indicating the preferred age of the voice to speak the contained text. value : ""	2	0	0
274	[3.2.1]	Required	Auto	Abs_rating_simple	Core	variant: attribute indicating a preferred variant of the other voice characteristics to speak the contained text. value : ""	2	0	0
275	[3.2.1]	Required	Manual	Abs_rating_simple	Core	languages: optional attribute indicating the list of languages the voice can speak, with optional accent indication per language, value: "en:zh zh:zh".	2	0	0
276	[3.2.1]	Required	Manual	Abs_rating_simple	Core	languages: optional attribute indicating the list of languages the voice can speak, with optional accent indication per language, value: "".	2	0	0
277	[3.2.1]	Required	Manual	Abs_rating_simple	Core	A voice satisfies the languages feature if, for each language/accent pair in the list, the voice is documented (see Voice descriptions) as reading/speaking a language that matches the Extended Language Range given by language according to the Extended Filtering matching algorithm [BCP47, Matching of Language Tags (3.3.2)]	2	0	0
278	[3.2.1]	Required	Manual	Abs_rating_simple	Core	A voice satisfies the languages feature if, for each language/accent pair in the list, if an accent is given, the voice is documented (see Voice descriptions) as reading/speaking the language above with an accent that matches the Extended Language Range given by accent according to the Extended Filtering matching algorithm [BCP47, Matching of Language Tags (3.3.2)], except that the script and extension subtags of the accent MUST be ignored by the synthesis processor.	2	0	0
279	[3.2.1]	Required	Manual	Abs_rating_simple	Core	Language/accent pairs must be separated by white space	2	0	0
280	[3.2.1]	Required	Auto	Abs_rating_simple	Core	The top-level default value for all feature attributes is "", the empty string.	2	0	0
281	[3.2.1]	Required	Manual	Abs_rating_simple	Core	If no voice is identified for which the values of all voice feature attributes listed in the required attribute value are matched, there is voice selection failure.	2	0	0
282	[3.2.1]	Required	Manual	Abs_rating_simple	Core	Valid values of required are a space-separated list composed of values from the list of feature names: "name", "languages", "gender", "age", "variant" or the empty string "".	2	0	0
283	[3.2.1]	Required	Manual	Abs_rating_simple	Core	The default value for required attribute is "languages".	2	0	0
284	[3.2.1]	Required	Manual	Abs_rating_simple	Core	ordering: OPTIONAL attribute that specifies the matching priority ordering for voice selection.	2	0	0
285	[3.2.1]	Required	Auto	Abs_rating_simple	Core	Valid values of ordering are a space-separated list composed of values from the list of feature names: "name", "languages", "gender", "age", "variant" or the empty string "".	2	0	0
287	[3.2.1]	Required	Auto	Abs_rating_simple	Core	The default value of ordering attribute is "languages".	2	0	0
288	[3.2.1]	Required	Manual	Abs_rating_simple	Core	If there is voice selection failure and onvoicefailure="priorityselect", then if there is at least one voice that matches by feature priority, one such voice must be used.	2	0	0
291	[3.2.1]	Required	Manual	Abs_rating_simple	Core	A conforming synthesis processor must report an onvoicefailure if voice selection is not successful.	2	0	0
292	[3.2.1]	Required	Manual	Abs_rating_simple	Core	The default value for onvoicefailure attribute is "priorityselect".	2	0	0
293	[3.2.1]	Required	Manual	Abs_rating_simple	Core	If one or more available voices are identified for which the values of all voice feature attributes listed in the required attribute value are matched, then voice selection is successful.	2	0	0
294	[3.2.1]	Required	Manual	Abs_rating_simple	Core	priorityselect: the synthesis processor uses the values of all voice feature attributes to select a voice by feature priority, where the starting candidate set is the set of all available voices.	2	0	0
295	[3.2.1]	Required	Manual	Abs_rating_simple	Core	keepexisting - if voice selection fails, the voice does not change.	2	0	0
296	[3.2.1]	Required	Manual	Abs_rating_simple	Core	processorchoice - if voice selection fails, the synthesis processor chooses the behavior (either priorityselect or keepexisting).	2	0	0
297	[3.2.1]	Required	Manual	Abs_rating_simple	Core	Although each attribute individually is optional, it is an error if no attributes are specified when the voice element is used.	2	0	0
299	[3.2.1]	Required	Manual	Abs_rating_simple	Core	When the value of the required attribute is the empty string "", if one or more voices are available any of the voices is considered a successful match; otherwise there is voice selection failure.	2	0	0
300	[3.2.1]	Required	Manual	Abs_rating_simple	Core	If one or more voices are identified for which the values of all voice feature attributes listed in the required attribute value are matched, then out of those voices, one that matches by feature priority must be selected.	2	0	0
301	[3.2.1]	Required	Manual	Abs_rating_simple	Core	Voice attributes are inherited down the tree, including within elements that change the language.	2	0	0
302	[3.2.1]	Required	Manual	Abs_rating_simple	Core	If a voice is changed by the processor as a result of a language speaking failure, the prior voice is restored when that voice is again able to speak the content.	2	0	0
304	[3.2.1]	Required	Manual	Abs_rating_simple	Core	Changes in voice are scoped and apply only to the content of the element in which the change occurred.	2	0	0
351	[3.2.1]	Required	Auto	Abs_rating_simple	Core	The voice element can contain the audio element.	2	0	0
352	[3.2.1]	Required	Auto	Paired_Complex	Core	The voice element can contain the break element.	2	0	0
353	[3.2.1]	Required	Auto	Abs_rating_simple	Core	The voice element can contain the emphasis element.	2	0	0
354	[3.2.1]	Required	Auto	Abs_rating_simple	Core	The voice element can contain the lang element.	2	0	0
355	[3.2.1]	Required	Auto	Abs_rating_simple	Core	The voice element can contain the mark element.	2	0	0
356	[3.2.1]	Required	Auto	Abs_rating_simple	Core	The voice element can contain the p, s, and w elements.	2	0	0
357	[3.2.1]	Required	Auto	Abs_rating_simple	Core	The voice element can contain the phoneme element.	2	0	0
358	[3.2.1]	Required	Auto	Abs_rating_simple	Core	The voice element can contain the prosody element.	2	0	0
359	[3.2.1]	Required	Auto	Abs_rating_simple	Core	The voice element can contain the say-as element.	2	0	0
360	[3.2.1]	Required	Auto	Abs_rating_simple	Core	The voice element can contain the sub element.	2	0	0
363	[3.2.1]	Required	Auto	Abs_rating_simple	Core	The lookup element can occur in a voice element.	2	0	0
32	[3.2.2]	Required	Auto	Abs_rating_simple	Core	level: the level attribute indicates the strength of emphasis to be applied. Value : "moderate".	2	0	0
33	[3.2.2]	Required	Auto	Paired_Complex	Core	level: the level attribute indicates the strength of emphasis to be applied. The default level is "moderate".	2	0	0
34	[3.2.2]	Required	Auto	Paired_Complex	Core	level: the level attribute indicates the strength of emphasis to be applied. "strong" >= "moderate".	2	0	0
35	[3.2.2]	Required	Auto	Paired_Complex	Core	level: the optional level attribute indicates the strength of emphasis to be applied. Value : "strong".	2	0	0
36	[3.2.2]	Required	Auto	Paired_Simple	Core	level: The "none" level is used to prevent the speech synthesis processor from emphasizing words that it might typically emphasize.	2	0	0
37	[3.2.2]	Required	Manual	Paired_Simple	Core	Comparison with and without "none" emphasis on specified sentences. Must be customized by hand: the IR participant must write a sentence in which their TTS automatically places emphasis.	2	0	0
38	[3.2.2]	Required	Manual	Paired_Complex	Core	level: The "reduced" level is effectively the opposite of emphasizing a word.	2	0	0
39	[3.2.2]	Required	Auto	Paired_Complex	Core	level: the level attribute indicates the strength of emphasis to be applied. "moderate" >= "none".	2	0	0
366	[3.2.2]	Required	Auto	Abs_rating_simple	Core	The lookup element can occur in an emphasis element.	2	0	0
40	[3.2.3]	Required	Auto	Paired_Simple	Core	A break with no attributes must produce a break with a prosodic strength greater than that which the processor would otherwise have used if no break element was supplied.	2	0	0
41	[3.2.3]	Required	Auto	Abs_rating_simple	Core	time and strength: The time and strength attributes are optional for the break element.	2	0	0
42	[3.2.3]	Required	Auto	Abs_rating_simple	Core	The break element must always be empty.	2	0	0
43	[3.2.3]	Required	Auto	Abs_rating_simple	Core	strength: legal value: "x-strong".	2	0	0
44	[3.2.3]	Required	Auto	Abs_rating_simple	Core	strength: legal value: "strong".	2	0	0
45	[3.2.3]	Required	Auto	Abs_rating_simple	Core	strength: legal value: "medium".	2	0	0
46	[3.2.3]	Required	Auto	Abs_rating_simple	Core	strength: legal value: "weak".	2	0	0
47	[3.2.3]	Required	Auto	Abs_rating_simple	Core	strength: legal value: "x-weak".	2	0	0
48	[3.2.3]	Required	Auto	Abs_rating_simple	Core	strength: legal value: "none".	2	0	0
49	[3.2.3]	Required	Auto	Paired_Simple	Core	strength: default value == medium.	2	0	0
50	[3.2.3]	Required	Auto	Abs_rating_simple	Core	time: legal value in seconds "s".	2	0	0
51	[3.2.3]	Required	Auto	Abs_rating_simple	Core	time: legal value in milliseconds "ms".	2	0	0
52	[3.2.3]	Required	Auto	Paired_Complex	Core	strength: comparative test, "weak" equal to or stronger than "x-weak".	2	0	0
53	[3.2.3]	Required	Auto	Paired_Complex	Core	strength: comparative test, "medium" equal to or stronger than "weak".	2	0	0
54	[3.2.3]	Required	Auto	Paired_Complex	Core	strength: comparative test, "strong" equal to or stronger than "medium".	2	0	0
55	[3.2.3]	Required	Auto	Paired_Complex	Core	strength: comparative test, "x-strong" equal to or stronger than "strong".	2	0	0
56	[3.2.3]	Required	Auto	Abs_rating_simple	Core	If both 'strength' and 'time' are supplied, the processor will insert a break with a duration as specified by the time attribute, with other prosodic changes in the output based on the value of the strength attribute.	2	0	0
57	[3.2.3]	Optional	Manual	Paired_Simple	Core	strength: comparative test, the value "none" indicates that no prosodic break boundary should be output, which can be used to prevent a prosodic break that the processor would otherwise produce.	1	1	0
58	[3.2.4]	Required	Auto	Abs_rating_simple	Core	Although each attribute individually is optional, it is an error if no attributes are specified.	2	0	0
59	[3.2.4]	Required	Auto	Abs_rating_simple	Core	pitch: legal value, number followed by "Hz".	2	0	0
60	[3.2.4]	Required	Auto	Abs_rating_simple	Core	pitch: legal value, relative positive change, "+" number followed by "Hz".	2	0	0
61	[3.2.4]	Required	Auto	Abs_rating_simple	Core	pitch: legal value, relative negative change, "-" number followed by "Hz".	2	0	0
62	[3.2.4]	Required	Auto	Abs_rating_simple	Core	pitch: legal value, relative negative semitone change, "-" number followed by "st".	2	0	0
63	[3.2.4]	Required	Auto	Abs_rating_simple	Core	pitch: legal value, relative positive semitone change, "+" number followed by "st".	2	0	0
64	[3.2.4]	Required	Auto	Abs_rating_simple	Core	pitch: legal value: "x-high".	2	0	0
65	[3.2.4]	Required	Auto	Abs_rating_simple	Core	pitch: legal value: "high".	2	0	0
66	[3.2.4]	Required	Auto	Abs_rating_simple	Core	pitch: legal value: "medium".	2	0	0
67	[3.2.4]	Required	Auto	Abs_rating_simple	Core	pitch: legal value: "low".	2	0	0
68	[3.2.4]	Required	Auto	Abs_rating_simple	Core	pitch: legal value: "x-low".	2	0	0
69	[3.2.4]	Required	Auto	Abs_rating_simple	Core	pitch: legal value: "default".	2	0	0
70	[3.2.4]	Required	Auto	Paired_Simple	Core	pitch: comparative test, no pitch is equal to "default".	2	0	0
71	[3.2.4]	Required	Auto	Paired_Simple	Core	pitch: comparative test, "low" higher than or equal to "x-low".	2	0	0
72	[3.2.4]	Required	Auto	Paired_Simple	Core	pitch: comparative test, "medium" higher than or equal to "low".	2	0	0
73	[3.2.4]	Required	Auto	Paired_Simple	Core	pitch: comparative test, "high" higher than or equal to "medium".	2	0	0
74	[3.2.4]	Required	Auto	Paired_Simple	Core	pitch: comparative test, "x-high" higher than or equal to "high".	2	0	0
75	[3.2.4]	Required	Auto	Abs_rating_simple	Core	range: legal value, number followed by "Hz".	2	0	0
76	[3.2.4]	Required	Auto	Abs_rating_simple	Core	range: legal value, relative positive change, "+" number followed by "Hz".	2	0	0
77	[3.2.4]	Required	Auto	Abs_rating_simple	Core	range: legal value, relative negative change, "-" number followed by "Hz".	2	0	0
78	[3.2.4]	Required	Auto	Abs_rating_simple	Core	range: legal value, relative negative semitone change, "-" number followed by "st".	2	0	0
79	[3.2.4]	Required	Auto	Abs_rating_simple	Core	range: legal value, relative positive semitone change, "+" number followed by "st".	2	0	0
80	[3.2.4]	Required	Auto	Abs_rating_simple	Core	range: legal value: "x-high".	2	0	0
81	[3.2.4]	Required	Auto	Abs_rating_simple	Core	range: legal value: "high".	2	0	0
82	[3.2.4]	Required	Auto	Abs_rating_simple	Core	range: legal value: "medium".	2	0	0
83	[3.2.4]	Required	Auto	Abs_rating_simple	Core	range: legal value: "low".	2	0	0
84	[3.2.4]	Required	Auto	Abs_rating_simple	Core	range: legal value: "x-low".	2	0	0
85	[3.2.4]	Required	Auto	Abs_rating_simple	Core	range: legal value: "default".	2	0	0
86	[3.2.4]	Required	Auto	Paired_Complex	Core	range: comparative test, no range is equal to "default".	2	0	0
87	[3.2.4]	Required	Auto	Paired_Complex	Core	range: comparative test, "low" higher than or equal to "x-low".	2	0	0
88	[3.2.4]	Required	Auto	Paired_Complex	Core	range: comparative test, "medium" higher than or equal to "low".	2	0	0
89	[3.2.4]	Required	Auto	Paired_Complex	Core	range: comparative test, "high" higher than or equal to "medium".	2	0	0
90	[3.2.4]	Required	Auto	Paired_Complex	Core	range: comparative test, "x-high" higher than or equal to "high".	2	0	0
92	[3.2.4]	Required	Auto	Abs_rating_simple	Core	rate: legal value: "x-fast".	2	0	0
93	[3.2.4]	Required	Auto	Abs_rating_simple	Core	rate: legal value: "fast".	2	0	0
94	[3.2.4]	Required	Auto	Abs_rating_simple	Core	rate: legal value: "medium".	2	0	0
95	[3.2.4]	Required	Auto	Abs_rating_simple	Core	rate: legal value: "slow".	2	0	0
96	[3.2.4]	Required	Auto	Abs_rating_simple	Core	rate: legal value: "x-slow".	2	0	0
97	[3.2.4]	Required	Auto	Abs_rating_simple	Core	rate: legal value: "default".	2	0	0
98	[3.2.4]	Required	Auto	Paired_Complex	Core	rate: comparative test, no rate is equal to "default".	2	0	0
99	[3.2.4]	Required	Auto	Paired_Complex	Core	rate: comparative test, "slow" faster than or equal to "x-slow".	2	0	0
100	[3.2.4]	Required	Auto	Paired_Complex	Core	rate: comparative test, "medium" faster than or equal to "slow".	2	0	0
101	[3.2.4]	Required	Auto	Paired_Complex	Core	rate: comparative test, "fast" faster than or equal to "medium".	2	0	0
102	[3.2.4]	Required	Auto	Paired_Complex	Core	rate: comparative test, "x-fast" faster than or equal to "fast".	2	0	0
103	[3.2.4]	Required	Auto	Abs_rating_simple	Core	duration: legal value in seconds.	2	0	0
104	[3.2.4]	Required	Auto	Abs_rating_simple	Core	duration: legal value in milliseconds.	2	0	0
106	[3.2.4]	Required	Auto	Abs_rating_simple	Core	volume: legal value, a number preceded by "+" and immediately followed by "dB".	2	0	0
107	[3.2.4]	Required	Auto	Abs_rating_simple	Core	volume: legal value, a number preceded by "-" and immediately followed by "dB".	2	0	0
108	[3.2.4]	Required	Auto	Abs_rating_simple	Core	volume: legal value: "x-loud".	2	0	0
109	[3.2.4]	Required	Auto	Abs_rating_simple	Core	volume: legal value: "loud".	2	0	0
110	[3.2.4]	Required	Auto	Abs_rating_simple	Core	volume: legal value: "medium".	2	0	0
111	[3.2.4]	Required	Auto	Abs_rating_simple	Core	volume: legal value: "soft".	2	0	0
112	[3.2.4]	Required	Auto	Abs_rating_simple	Core	volume: legal value: "x-soft".	2	0	0
113	[3.2.4]	Required	Auto	Abs_rating_simple	Core	volume: legal value: "silent".	2	0	0
114	[3.2.4]	Required	Auto	Abs_rating_simple	Core	volume: legal value: "default".	2	0	0
115	[3.2.4]	Optional	Auto	Paired_Complex	Core	volume: comparative test, no volume is equal to "default".	2	0	0
117	[3.2.4]	Required	Auto	Paired_Simple	Core	volume: comparative test, "silent" amounts to specifying minus infinity decibels (dB).	2	0	0
118	[3.2.4]	Required	Auto	Paired_Complex	Core	volume: comparative test, "soft" louder than or equal to "x-soft".	2	0	0
119	[3.2.4]	Required	Auto	Paired_Complex	Core	volume: comparative test, "medium" louder than or equal to "soft".	2	0	0
120	[3.2.4]	Required	Auto	Paired_Complex	Core	volume: comparative test, "loud" louder than or equal to "medium".	2	0	0
121	[3.2.4]	Required	Auto	Paired_Complex	Core	volume: comparative test, "x-loud" louder than or equal to "loud".	2	0	0
122	[3.2.4]	Required	Auto	Abs_rating_simple	Core	contour: legal value, relative negative percentage change, "-" number followed by "%".	2	0	0
123	[3.2.4]	Required	Auto	Abs_rating_simple	Core	contour: legal value, number followed by "Hz".	2	0	0
125	[3.2.4]	Required	Auto	Abs_rating_simple	Core	contour: legal value, relative positive percentage change, "+" number followed by "%".	2	0	0
126	[3.2.4]	Required	Auto	Abs_rating_simple	Core	contour: legal value, relative positive change, "+" number followed by "Hz".	2	0	0
127	[3.2.4]	Required	Auto	Abs_rating_simple	Core	contour: legal value, relative negative change, "-" number followed by "Hz".	2	0	0
128	[3.2.4]	Required	Auto	Abs_rating_simple	Core	contour: legal value, relative negative semitone change, "-" number followed by "st".	2	0	0
129	[3.2.4]	Required	Auto	Abs_rating_simple	Core	contour: legal value, relative positive semitone change, "+" number followed by "st".	2	0	0
130	[3.2.4]	Required	Auto	Abs_rating_simple	Core	contour: legal value: "x-high".	2	0	0
131	[3.2.4]	Required	Auto	Abs_rating_simple	Core	contour: legal value: "high".	2	0	0
132	[3.2.4]	Required	Auto	Abs_rating_simple	Core	contour: legal value: "medium".	2	0	0
133	[3.2.4]	Required	Auto	Abs_rating_simple	Core	contour: legal value: "low".	2	0	0
134	[3.2.4]	Required	Auto	Abs_rating_simple	Core	contour: legal value: "x-low".	2	0	0
135	[3.2.4]	Required	Auto	Abs_rating_simple	Core	contour: legal value: "default".	2	0	0
136	[3.2.4]	Required	Auto	Paired_Complex	Core	contour: comparative test, time positions less than 0% are ignored.	2	0	0
137	[3.2.4]	Required	Auto	Paired_Complex	Core	contour: comparative test, time positions greater than 100% are ignored.	2	0	0
138	[3.2.4]	Required	Auto	Abs_rating_complex	Core	contour: comparative test, relative values for pitch are relative to the pitch just before the contained text.	2	0	0
139	[3.2.4]	Required	Auto	Paired_Complex	Core	contour: comparative test, contour takes precedence over pitch.	2	0	0
140	[3.2.4]	Required	Auto	Paired_Complex	Core	contour: comparative test, contour takes precedence over range.	2	0	0
142	[3.2.4]	Required	Auto	Abs_rating_simple	Core	pitch: legal value, relative positive percentage change, "+" number followed by "%".	2	0	0
143	[3.2.4]	Required	Auto	Abs_rating_simple	Core	pitch: legal value, relative negative percentage change, "-" number followed by "%".	2	0	0
145	[3.2.4]	Required	Auto	Abs_rating_complex	Core	range: legal value, relative positive percentage change, "+" number followed by "%".	2	0	0
146	[3.2.4]	Required	Auto	Abs_rating_complex	Core	range: legal value, relative negative percentage change, "-" number followed by "%".	2	0	0
147	[3.2.4]	Required	Auto	Abs_rating_simple	Core	rate: legal value, relative percentage change, number followed by "%".	2	0	0
153	[3.2.4]	Required	Auto	Abs_rating_simple	Core	pitch: legal value, "Hz" is case-sensitive.	2	0	0
154	[3.2.4]	Required	Auto	Abs_rating_simple	Core	pitch: legal value, "st" is case-sensitive.	2	0	0
155	[3.2.4]	Required	Auto	Abs_rating_simple	Core	range: legal value, "Hz" is case-sensitive.	2	0	0
156	[3.2.4]	Required	Auto	Abs_rating_simple	Core	range: legal value, "st" is case-sensitive.	2	0	0
158	[3.2.4]	Required	Auto	Abs_rating_simple	Core	contour: legal value, "Hz" is case-sensitive.	2	0	0
159	[3.2.4]	Required	Auto	Abs_rating_simple	Core	contour: legal value, "st" is case-sensitive.	2	0	0
160	[3.2.4]	Required	Auto	Paired_Simple	Core	duration: comparative test, duration takes precedence over rate.	2	0	0
161	[3.2.4]	Optional	Auto	Abs_rating_complex	Core	rate: the default rate for a voice should be such that it is experienced as a normal speaking rate for the voice when reading aloud text.	2	0	0
364	[3.2.4]	Required	Auto	Abs_rating_simple	Core	The lookup element can occur in a prosody element.	2	0	0
162	[3.3.1]	Required	Manual	Abs_rating_simple	Core	If text only output is not required, the processor must try to play the referenced audio document (Raw (headerless) 8kHz 8-bit mono mu-law [PCM] single channel).	2	0	0
163	[3.3.1]	Required	Auto	Paired_Simple	Core	If the audio document cannot be played and text only output is not required, the alternate content must be rendered. The alternate content may be empty.	2	0	0
164	[3.3.1]	Required	Auto	Paired_Simple	Core	If the alternate content contains an audio element that cannot be played, the processor must recursively attempt to find its alternate content to render.	2	0	0
165	[3.3.1]	Optional	Auto	Abs_rating_simple	Core	If the audio element is not successfully rendered, the synthesis processor should continue processing and should notify the hosting environment.	2	0	0
166	[3.3.1]	Required	Auto	Paired_Simple	Core	If the audio document cannot be played and text only output is not required, the alternate content must be rendered. The alternate content may contain text.	2	0	0
167	[3.3.1]	Required	Auto	Paired_Simple	Core	If the audio document cannot be played and text only output is not required, the alternate content must be rendered. The alternate content may contain other markup.	2	0	0
168	[3.3.1]	Required	Auto	Paired_Simple	Core	If the audio document cannot be played and text only output is not required, the alternate content must be rendered. The alternate content may contain desc elements.	2	0	0
169	[3.3.1]	Required	Auto	Paired_Simple	Core	If the audio document cannot be played and text only output is not required, the alternate content must be rendered. The alternate content may contain other audio elements.	2	0	0
170	[3.3.1]	Required	Manual	Abs_rating_simple	Core	If text only output is not required, the processor must try to play the referenced audio document (Raw (headerless) 8kHz 8-bit mono A-law [PCM] single channel).	2	0	0
171	[3.3.1]	Required	Auto	Abs_rating_simple	Core	If text only output is not required, the processor must try to play the referenced audio document (WAV (RIFF) 8kHz 8-bit mono mu-law [PCM] single channel).	2	0	0
172	[3.3.1]	Required	Auto	Abs_rating_simple	Core	If text only output is not required, the processor must try to play the referenced audio document (WAV (RIFF) 8kHz 8-bit mono A-law [PCM] single channel).	2	0	0
365	[3.3.1]	Required	Auto	Abs_rating_simple	Core	The lookup element can occur in an audio element.	2	0	0
386	[3.3.1]	Required	Auto	Abs_rating_simple	Core	fetchhint tells the synthesis processor whether or not it can attempt to optimize rendering by pre-fetching audio. The value "prefetch" permits, but does not require, the processor to pre-fetch the audio.	2	0	0
387	[3.3.1]	Required	Auto	Abs_rating_simple	Core	fetchhint tells the synthesis processor whether or not it can attempt to optimize rendering by pre-fetching audio. The value "safe" says that audio is fetched only when it is needed, never before.	2	0	0
388	[3.3.1]	Required	Auto	Abs_rating_simple	Core	If src is absent, the audio element behaves as if src were present with a legal URI but the document could not be fetched.	2	0	0
390	[3.3.1]	Required	Auto	Abs_rating_simple	Core	fetchtimeout: the timeout for fetches.	2	0	0
391	[3.3.1]	Required	Auto	Abs_rating_simple	Core	maxage: indicates that the document is willing to use content whose age is no greater than the specified time (cf. 'max-age' in HTTP 1.1 [RFC2616]). The document is not willing to use stale content, unless maxstale is also provided.	2	0	0
392	[3.3.1]	Required	Auto	Abs_rating_simple	Core	maxstale: indicates that the document is willing to use content that has exceeded its expiration time (cf. 'max-stale' in HTTP 1.1 [RFC2616]). If maxstale is assigned a value, then the document is willing to accept content that has exceeded its expiration time by no more than the specified amount of time.	2	0	0
371	[3.3.1.1]	Required	Auto	Paired_Simple	Extended	clipBegin: offset from start of media to begin rendering. This offset is measured in normal media playback time from the beginning of the media.	2	0	0
372	[3.3.1.1]	Required	Auto	Paired_Simple	Extended	clipEnd: offset from start of media to end rendering. This offset is measured in normal media playback time from the beginning of the media.	2	0	0
373	[3.3.1.1]	Required	Auto	Paired_Simple	Extended	repeatCount: number of iterations of media to render. A fractional value describes a portion of the rendered media.	2	0	0
374	[3.3.1.1]	Required	Auto	Paired_Simple	Extended	repeatDur: total duration for repeatedly rendering media. This duration is measured in normal media playback time from the beginning of the media.	2	0	0
375	[3.3.1.1]	Required	Auto	Paired_Simple	Extended	repeatDur takes precedence over repeatCount in determining the total time for rendering media.	2	0	0
384	[3.3.1.1]	Required	Auto	Abs_rating_simple	Extended	If clipBegin is after clipEnd, no audio will be produced.	2	0	0
385	[3.3.1.1]	Required	Auto	Abs_rating_simple	Extended	If clipEnd is after the end of the audio, then rendering ends at the audio end.	2	0	0
377	[3.3.1.2]	Required	Auto	Paired_Simple	Extended	The soundLevel attribute specifies the relative volume of the referenced audio.	2	0	0
378	[3.3.1.3]	Required	Auto	Paired_Simple	Extended	The speed attribute controls the playback speed of the referenced audio, to speed up or slow down the effective rate of play relative to the original speed of the waveform.	2	0	0
173	[3.3.2]	Required	Auto	Paired_Simple	Core	The mark element does not affect the speech output process. Test of markers at word boundaries.	2	0	0
174	[3.3.2]	Required	Manual	Abs_rating_simple	Core	When processing a mark element, the synthesis processor must do one or both of the following: (1) inform the hosting environment with the value of the name attribute and with information allowing the platform to retrieve the corresponding position in the rendered output or (2) when audio output of the SSML document reaches the mark, issue an event that includes the value of the name attribute. The processor must send the event to the destination specified by the hosting environment.	2	0	0
175	[3.3.2]	Required	Auto	Paired_Simple	Core	The mark element does not affect the speech output process. Test of markers at the boundaries of the input text.	2	0	0
176	[3.3.3]	Optional	Manual	Paired_Simple	Core	If text only output is required, the content of the desc element(s) should be rendered instead of other alternative content.	0	0	2
177	[3.3.3]	Optional	Manual	Paired_Simple	Core	If text only output is required, the content of the desc element(s) should be rendered instead of other alternative content. The xml:lang attribute can be used to indicate that the content of the element is in a different language from that of the content surrounding the element.	0	0	2

9. References

[PM]: Psychometric methods, Guilford J.P., 1954, New York, McGraw-Hill.
[PC]: The method of paired comparisons for social values, Thurstone L.L., in "Journal of Social Psychology", 1927, nr. 11, pp. 384-400.

Appendices

Appendix A - Test assertion XML API definition

This appendix describes a lightweight framework for authoring SSML tests. The framework encourages a consistent format by making Absolute Rating and Paired Comparison tests straightforward to author. By employing a stylesheet, vendors may adapt the framework to their own test infrastructure. For example, a test infrastructure may present the instruction for a test visually instead of via synthesized text.

The Test API consists of a superset of the schema specified in the SSML 1.1 Specification with the addition of a set of four elements in their own namespace (http://www.w3.org/2002/ssml-conformance). The main element of the test API is <conf:test>; it can contain the following element containers:

A.1 Instruction

The <conf:instruction> element holds the instructions that the tester needs in order to evaluate the test. For example, for a Paired Comparison test, the instruction might describe the differences expected between the audio produced for the reference markup and for the test markup if the test is to be assessed as "pass". The instruction is written in plain text and is compulsory. If the test assertion is labelled as Manual, then the instruction will specify the adaptation of the testing environment that is required to execute the test. There are two kinds of possible adaptations:

The test is about a processor-specific feature. These tests are marked in the <conf:instruction> by "Manual='PLAT_DEP'".
The test is language dependent. These tests are marked in the <conf:instruction> by "Manual='LANG_DEP'".
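
As an illustration of how a test harness might detect these markers, the following is a minimal Python sketch. It assumes the marker appears literally in the instruction text as Manual='PLAT_DEP' or Manual='LANG_DEP'; the function name manual_adaptation and the sample test document are hypothetical and are not part of the test suite.

```python
import re
import xml.etree.ElementTree as ET

CONF_NS = "http://www.w3.org/2002/ssml-conformance"

# Assumption: the marker is written in the instruction text exactly as
# Manual='PLAT_DEP' or Manual='LANG_DEP' (either quote style accepted here).
MANUAL_MARKER = re.compile(r"Manual\s*=\s*['\"](PLAT_DEP|LANG_DEP)['\"]")


def manual_adaptation(test_xml):
    """Return 'PLAT_DEP', 'LANG_DEP', or None if no marker is present."""
    root = ET.fromstring(test_xml)
    instruction = root.find("{%s}instruction" % CONF_NS)
    if instruction is None:
        return None
    text = "".join(instruction.itertext())
    match = MANUAL_MARKER.search(text)
    return match.group(1) if match else None


# Hypothetical test document, for illustration only.
SAMPLE = """<conf:test xmlns:conf="http://www.w3.org/2002/ssml-conformance">
  <conf:instruction>Manual='LANG_DEP' Replace the text with a sentence
  in the language under test.</conf:instruction>
  <conf:test_markup>
    <speak version="1.1" xml:lang="en-US"
           xmlns="http://www.w3.org/2001/10/synthesis">Hello</speak>
  </conf:test_markup>
</conf:test>"""

print(manual_adaptation(SAMPLE))
```

A harness could use the returned value to route the test to a human operator instead of running it unattended.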

A.2 Reference markup

This (optional) <conf:reference_markup> element is used to indicate the reference SSML document for Paired Comparison tests. If the element contains no children, the raw text from the test markup is used instead as the reference. Otherwise the contained SSML markup is employed. Note that the SSML markup must include the <speak> element.
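
The fallback rule described above can be sketched in Python as follows. This is an illustrative reading of the rule, not the Working Group's tooling: the function name resolve_reference and the sample document are hypothetical, and whitespace is collapsed in the same spirit as an XPath normalize-space() call.

```python
import xml.etree.ElementTree as ET

CONF = "{http://www.w3.org/2002/ssml-conformance}"
SSML = "{http://www.w3.org/2001/10/synthesis}"


def resolve_reference(test_xml):
    """Classify the reference of a test document.

    Returns ("none", None) when there is no reference_markup element,
    ("markup", <speak element>) when SSML is supplied, and
    ("raw_text", <string>) when the element is empty and the raw text
    of the test markup is used instead.
    """
    root = ET.fromstring(test_xml)
    reference = root.find(CONF + "reference_markup")
    if reference is None:
        return ("none", None)
    speak = reference.find(SSML + "speak")
    if speak is not None:
        return ("markup", speak)
    test_speak = root.find(CONF + "test_markup/" + SSML + "speak")
    # Strip all markup and collapse runs of whitespace.
    raw = " ".join("".join(test_speak.itertext()).split())
    return ("raw_text", raw)


# Hypothetical test with an empty reference element.
SAMPLE = """<conf:test xmlns:conf="http://www.w3.org/2002/ssml-conformance">
  <conf:instruction>The test audio should be louder.</conf:instruction>
  <conf:reference_markup/>
  <conf:test_markup>
    <speak version="1.1" xml:lang="en-US"
           xmlns="http://www.w3.org/2001/10/synthesis">
      <prosody volume="loud">The cat jumped over the moon</prosody>
    </speak>
  </conf:test_markup>
</conf:test>"""

print(resolve_reference(SAMPLE))
```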

A.3 Test markup

The <conf:test_markup> element indicates the SSML test markup. This element always contains SSML markup and is compulsory. Note that (as with the reference) the SSML markup must include the <speak> element.

A.4 Document Type Definition

This section contains the DTD (ssml11-conf.dtd) for the Test API markup. The Test API DTD includes a placeholder for SSML content and hence test documents may be validated directly against the Test API DTD.

<!-- Placeholder for SSML -->
<!ELEMENT speak ANY>
<!ATTLIST speak
    version CDATA #REQUIRED
    xml:lang CDATA #IMPLIED
    xmlns CDATA #IMPLIED
    xmlns:xsi CDATA #IMPLIED
    xsi:schemaLocation CDATA #IMPLIED
    xml:base CDATA #IMPLIED
    onlangfailure CDATA #IMPLIED
    startmark CDATA #IMPLIED
    endmark CDATA #IMPLIED
>

<!-- Control prefixing - can be 'switched off' in internal subset -->
<!ENTITY % Conf.prefixed "INCLUDE" >

<!-- Declare the actual namespace -->
<!ENTITY % Conf.xmlns "http://www.w3.org/2002/ssml-conformance" >

<!-- Declare the prefix -->
<!ENTITY % Conf.prefix "conf" >

<![%Conf.prefixed;[
<!ENTITY % Conf.pfx "%Conf.prefix;:" >
]]>
<!ENTITY % Conf.pfx "" >

<![%Conf.prefixed;[
<!ENTITY % Conf.xmlns.attrib
    "xmlns:%Conf.prefix; CDATA #FIXED '%Conf.xmlns;'"
>
]]>
<!ENTITY % Conf.xmlns.attrib
     "xmlns CDATA  #FIXED '%Conf.xmlns;'"
>

<!-- Qualified names -->
<!ENTITY % Conf.test.qname "%Conf.pfx;test" >
<!ENTITY % Conf.instruction.qname "%Conf.pfx;instruction" >
<!ENTITY % Conf.reference_markup.qname "%Conf.pfx;reference_markup" >
<!ENTITY % Conf.test_markup.qname "%Conf.pfx;test_markup" >

<!-- Define the content model -->
<!ELEMENT %Conf.test.qname;
    (%Conf.instruction.qname;,
    (%Conf.reference_markup.qname;)?,
    %Conf.test_markup.qname;) >

<!ATTLIST %Conf.test.qname; %Conf.xmlns.attrib; >

<!ELEMENT %Conf.instruction.qname; (#PCDATA) >

<!ELEMENT %Conf.reference_markup.qname; (speak)? >

<!ELEMENT %Conf.test_markup.qname; (speak) >
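
Where a validating parser is not at hand, the content model above can be approximated with a short structural check. The Python sketch below is an assumption-laden illustration: check_test_structure and the two sample documents are hypothetical, and the check covers only element order and nesting, not attributes.

```python
import xml.etree.ElementTree as ET

CONF = "{http://www.w3.org/2002/ssml-conformance}"
SSML = "{http://www.w3.org/2001/10/synthesis}"


def check_test_structure(test_xml):
    """Return a list of problems; an empty list means the structure is OK."""
    root = ET.fromstring(test_xml)
    if root.tag != CONF + "test":
        return ["root element is not conf:test"]
    problems = []
    children = [child.tag for child in root]
    with_ref = [CONF + "instruction", CONF + "reference_markup",
                CONF + "test_markup"]
    without_ref = [CONF + "instruction", CONF + "test_markup"]
    if children not in (with_ref, without_ref):
        problems.append(
            "expected instruction, optional reference_markup, test_markup")
    # reference_markup may be empty; test_markup must hold one speak element.
    for name, may_be_empty in (("reference_markup", True),
                               ("test_markup", False)):
        for container in root.findall(CONF + name):
            inner = [child.tag for child in container]
            if inner == [SSML + "speak"] or (may_be_empty and inner == []):
                continue
            problems.append(name + " must contain a single speak element")
    return problems


SPEAK = ('<speak version="1.1" xml:lang="en-US" '
         'xmlns="http://www.w3.org/2001/10/synthesis">Hello</speak>')
GOOD = ('<conf:test xmlns:conf="http://www.w3.org/2002/ssml-conformance">'
        '<conf:instruction>Must hear the audio to pass.</conf:instruction>'
        '<conf:test_markup>' + SPEAK + '</conf:test_markup></conf:test>')
# Same document with the compulsory instruction removed.
BAD = ('<conf:test xmlns:conf="http://www.w3.org/2002/ssml-conformance">'
       '<conf:test_markup>' + SPEAK + '</conf:test_markup></conf:test>')

print(check_test_structure(GOOD))
print(check_test_structure(BAD))
```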

A.5 Test examples

The following examples illustrate the use of the Test API. These examples were written to help validate the stylesheet (see section A.6) used to generate the tests.

Example 1 - abs_rating_simple

The following test illustrates an abs_rating_simple test where no reference markup is required:

<?xml version="1.0" encoding="UTF-8" ?> 
<!-- @ Copyright 2008 W3C (MIT, ERCIM, Keio), All Rights Reserved.
     See http://www.w3.org/Consortium/Legal/. @ -->
	 
<conf:test xmlns:conf="http://www.w3.org/2002/ssml-conformance">
    <conf:instruction>
        Must hear the audio to pass.
    </conf:instruction>
	
    <conf:test_markup>
        <speak xml:lang="en-US" version="1.1"
               xmlns="http://www.w3.org/2001/10/synthesis">
            <w>listen<audio src="turca.wav"/></w>
        </speak>
    </conf:test_markup>
	
</conf:test>

After transformation, the instruction markup is:

<?xml version="1.0" encoding="UTF-8"?>
<speak xmlns="http://www.w3.org/2001/10/synthesis"
       version="1.1" xml:lang="en-US" >
    Must hear the audio to pass.
</speak>

After transformation, the test markup is:

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.1" xml:lang="en-US"
 xmlns="http://www.w3.org/2001/10/synthesis">
   <metadata>
      <rdf:RDF xmlns:dc="http://purl.org/metadata/dublin_core#"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:rdfs="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#">
         <rdf:Description
          dc:Description="Must hear the audio to pass."/>
      </rdf:RDF>
   </metadata>
   <w>listen<audio src="turca.wav"/></w>
</speak>

Example 2 - abs_rating_complex

The following test illustrates an abs_rating_complex test where no reference markup is required:

<?xml version="1.0" encoding="UTF-8" ?> 
<!-- @ Copyright 2003 W3C (MIT, ERCIM, Keio), All Rights Reserved.
     See http://www.w3.org/Consortium/Legal/. @ -->
	 
<conf:test xmlns:conf="http://www.w3.org/2002/ssml-conformance">
    <conf:instruction>
        The word moon in this sentence should sound
        at a higher pitch (+20Hz) to pass this test.
    </conf:instruction>
	
    <conf:test_markup>
        <speak xml:lang="en-US" version="1.1"
               xmlns="http://www.w3.org/2001/10/synthesis">
            The cat jumped over the <prosody contour="(0%,+20Hz)">moon</prosody>.
        </speak>
    </conf:test_markup>
	
</conf:test>

After transformation, the instruction markup is:

<?xml version="1.0" encoding="UTF-8"?>
<speak xmlns="http://www.w3.org/2001/10/synthesis"
       version="1.1" xml:lang="en-US" >
    The word moon in this sentence should sound
    at a higher pitch (+20Hz) to pass this test.
</speak>

After transformation, the test markup is:

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.1" xml:lang="en-US"
 xmlns="http://www.w3.org/2001/10/synthesis">
   <metadata>
      <rdf:RDF xmlns:dc="http://purl.org/metadata/dublin_core#"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:rdfs="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#">
         <rdf:Description
          dc:Description="The word moon in this sentence should sound
                          at a higher pitch (+20Hz) to pass this test."/>
      </rdf:RDF>
   </metadata>
   The cat jumped over the <prosody contour="(0%,+20Hz)">moon</prosody>.
</speak>

Example 3 - paired_simple

The following test illustrates a paired_simple comparison where the reference is constructed from the supplied SSML markup:

<?xml version="1.0" encoding="UTF-8"?>
<conf:test xmlns:conf="http://www.w3.org/2002/ssml-conformance">
   <conf:instruction>
    In order for the test to pass, the test audio
    should be louder than or equal to the reference audio
   </conf:instruction>

   <conf:reference_markup>
      <speak version="1.1" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/10/synthesis">
         <prosody volume="medium">The cat jumped over the moon</prosody>
      </speak>
   </conf:reference_markup>

   <conf:test_markup>
      <speak version="1.1" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/10/synthesis">
         <prosody volume="loud">The cat jumped over the moon</prosody>
      </speak>
   </conf:test_markup>
</conf:test>

After transformation, the instruction markup is:

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.1" xml:lang="en-US"
 xmlns="http://www.w3.org/2001/10/synthesis">
   In order for the test to pass, the test audio
   should be louder than or equal to the reference audio
</speak>

After transformation, the reference markup is:

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.1" xml:lang="en-US"
 xmlns="http://www.w3.org/2001/10/synthesis">
   <metadata>
      <rdf:RDF xmlns:dc="http://purl.org/metadata/dublin_core#"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:rdfs="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#">
         <rdf:Description
           dc:Description="In order for the test to pass, the test audio
                          should be louder than or equal to the reference audio"/>
      </rdf:RDF>
   </metadata>
   <prosody volume="medium">The cat jumped over the moon</prosody>
</speak>

After transformation, the test markup is:

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.1" xml:lang="en-US"
 xmlns="http://www.w3.org/2001/10/synthesis">
   <metadata>
      <rdf:RDF xmlns:dc="http://purl.org/metadata/dublin_core#"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:rdfs="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#">
         <rdf:Description
          dc:Description="In order for the test to pass, the test audio
                          should be louder than or equal to the reference audio"/>
      </rdf:RDF>
   </metadata>
   <prosody volume="loud">The cat jumped over the moon</prosody>
</speak>

Example 4 - paired_complex

The following test illustrates a paired_complex comparison where the reference is constructed from the supplied SSML markup:

<?xml version="1.0" encoding="UTF-8"?>
<!-- @ Copyright 2004 W3C (MIT, ERCIM, Keio), All Rights Reserved.
      See http://www.w3.org/Consortium/Legal/. @ -->
	   
<conf:test xmlns:conf="http://www.w3.org/2002/ssml-conformance">
    <conf:instruction>
        Tests that contour takes precedence over range.
        The two tests must sound exactly the same.
    </conf:instruction>
    <conf:reference_markup>
        <speak version="1.1" xml:lang="en-US"
               xmlns="http://www.w3.org/2001/10/synthesis">
            <prosody contour="(0%,+20Hz) (10%,+30Hz) (40%,+10Hz)">
                The cat jumped over the moon.
            </prosody>
        </speak>
    </conf:reference_markup>
    <conf:test_markup>
        <speak version="1.1" xml:lang="en-US"
               xmlns="http://www.w3.org/2001/10/synthesis">
            <prosody contour="(0%,+20Hz) (10%,+30Hz) (40%,+10Hz)" range="x-low">
                The cat jumped over the moon.
            </prosody>
        </speak>
    </conf:test_markup>
</conf:test>

After transformation, the instruction markup is:

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.1" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/10/synthesis">
   Tests that contour takes precedence over range.
   The two tests must sound exactly the same.
</speak>

After transformation, the reference markup is:

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.1" xml:lang="en-US"
 xmlns="http://www.w3.org/2001/10/synthesis">
   <metadata>
      <rdf:RDF xmlns:dc="http://purl.org/metadata/dublin_core#"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:rdfs="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#">
         <rdf:Description
           dc:Description="Tests that contour takes precedence over range.
                           The two tests must sound exactly the same."/>
      </rdf:RDF>
   </metadata>
   <prosody contour="(0%,+20Hz) (10%,+30Hz) (40%,+10Hz)">
      The cat jumped over the moon.
   </prosody>
</speak>

After transformation, the test markup is:

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.1" xml:lang="en-US"
 xmlns="http://www.w3.org/2001/10/synthesis">
   <metadata>
      <rdf:RDF xmlns:dc="http://purl.org/metadata/dublin_core#"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:rdfs="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#">
         <rdf:Description
          dc:Description="Tests that contour takes precedence over range.
                          The two tests must sound exactly the same."/>
      </rdf:RDF>
   </metadata>
   <prosody contour="(0%,+20Hz) (10%,+30Hz) (40%,+10Hz)" range="x-low">
      The cat jumped over the moon.
   </prosody>
</speak>

A.6 Sample XSLT Template Definition

The following is a listing of an XSLT (ssml11-test.xsl) that can be used to transform the Test API into valid SSML. The XSLT is parameterizable: the parameter "mode" may be set to one of "instruction", "reference", or "test".

<?xml version="1.0" encoding="UTF-8"?>
<!-- Copyright 1998-2004 W3C (MIT, ERCIM, Keio), All Rights Reserved. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:ssml="http://www.w3.org/2001/10/synthesis"
                xmlns:conf="http://www.w3.org/2002/ssml-conformance"
                xmlns="http://www.w3.org/2001/10/synthesis"
                exclude-result-prefixes="ssml conf">

<!-- ################### -->
<!-- P a r a m e t e r s -->
<!-- ################### -->
<xsl:param name="mode" select="'test'"/>
   <!-- select = 'instruction', 'reference', or 'test' -->

<!-- ################ -->
<!-- T o p  L e v e l -->
<!-- ################ -->
<xsl:output method="xml" indent="yes" encoding="UTF-8"/>

<xsl:template match="/">
    <xsl:choose>
        <xsl:when test="$mode = 'instruction'">
            <xsl:apply-templates select="//conf:instruction"/>
        </xsl:when>
        <xsl:when test="$mode = 'reference'">
            <xsl:apply-templates select="//conf:reference_markup"/>

            <!-- For consistency, always return a valid SSML document -->
            <xsl:if test="count(//conf:reference_markup) = 0">
                <speak version="1.1">
                    <xsl:call-template name="meta"/>
                </speak>
            </xsl:if>
        </xsl:when>
        <xsl:when test="$mode = 'test'">
            <xsl:apply-templates select="//conf:test_markup"/>
        </xsl:when>
        <xsl:otherwise>
            Error - unknown mode type: <xsl:value-of select="$mode"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

<!-- ##################### -->
<!-- I n s t r u c t i o n -->
<!-- ##################### -->
<xsl:template match="conf:instruction">
    <speak version="1.1" xml:lang="en-US">
        <xsl:value-of select="."/>
    </speak>
</xsl:template>

<!-- #################  -->
<!-- R e f e r e n c e  -->
<!-- #################  -->
<xsl:template match="conf:reference_markup">
    <xsl:choose>
        <xsl:when test="0 = count(child::*)">
            <speak version="1.1">
                <xsl:apply-templates
                 select="//conf:test_markup/ssml:speak/@xml:lang"/>
                <xsl:call-template name="meta"/>
                <xsl:value-of
                 select="normalize-space(//conf:test_markup/ssml:speak)"/>
            </speak>
        </xsl:when>
        <xsl:otherwise>
            <xsl:call-template name="copy_speak"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

<!-- ################ -->
<!-- T e s t  S S M L -->
<!-- ################ -->
<xsl:template match="conf:test_markup">
    <xsl:call-template name="copy_speak"/>
</xsl:template>

<!-- ############################## -->
<!-- H e l p e r  T e m p l a t e s -->
<!-- ############################## -->
<!-- Copy the speak element -->
<xsl:template name="copy_speak">
          <xsl:element name="speak">
                  <xsl:apply-templates select="ssml:speak/@*"/>
                  <xsl:call-template name="meta"/>
                  <xsl:apply-templates select="ssml:speak/node()"/>
          </xsl:element>
</xsl:template>

<!-- Do copy without the namespace information duplicated -->
<xsl:template match="*">
        <xsl:element 
         name="{name()}"><xsl:apply-templates select="@* | node()"/>
        </xsl:element>
</xsl:template>
<xsl:template match="@*">
        <xsl:attribute name="{name()}">
                <xsl:value-of select="."/>
        </xsl:attribute>
</xsl:template>
<xsl:template match="text()">
        <xsl:value-of select="."/>
</xsl:template>

<!-- Meta data -->
<xsl:template name="meta">
    <metadata>
        <rdf:RDF
         xmlns:rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs = "http://www.w3.org/TR/1999/PR-rdf-schema-19990303#"
         xmlns:dc = "http://purl.org/metadata/dublin_core#">
            <xsl:element name="rdf:Description">
                <xsl:attribute name="dc:Description">
                    <xsl:copy-of
                     select="normalize-space(//conf:instruction)"/>
                </xsl:attribute>
            </xsl:element>
        </rdf:RDF>
    </metadata>
</xsl:template>

</xsl:stylesheet>

Appendix B - Downloading tests

The "ssml11-ir-20091215.zip" archive contains a number of resources. The SSML tests are ordered by test assertion id and are organized into folders where the folder name corresponds to the assertion id. In addition the archive includes the following:

B.1 The Manifest

"manifest.xml" is a file containing the complete information about test assertions written in the SSML Implementation Report project. The structure of the Manifest presents a root element called <tests> ; this is the container of all the Test Assertions. Every Test Assertion is represented by an <assertion> element containing CDATA that represents the description of the test assertion. At the end of the file is the <contribs> element; this lists all the people who have contributed to the Implementation Report preparation. The <assertion> element must contain a <start> element that references the main test file and may optionally contain several <dep> element that identify the other tests useful to complete the test case. Here's the DTD (manifest.dtd) for the Manifest:

<!ELEMENT tests (test+)>
<!ELEMENT test (assertion, start*, dep*)>

<!ELEMENT assertion (#PCDATA)>
<!ATTLIST assertion
    id CDATA #REQUIRED
    spec CDATA #REQUIRED
    conformance (Required | Optional) #REQUIRED
    test-type (Manual | Auto) #REQUIRED
    category (Abs_rating_simple | Abs_rating_complex | Paired_Simple | Paired_Complex) #REQUIRED
    profile (Core | Extended) #REQUIRED
>

<!ELEMENT dep EMPTY>
<!ATTLIST dep
    uri CDATA #REQUIRED
    type CDATA #REQUIRED
>
<!ELEMENT start EMPTY>
<!ATTLIST start
    uri CDATA #REQUIRED
    type CDATA #REQUIRED
>

The nature of a test assertion is described by several attributes on the <assertion> element. These attributes identify the kind of assertion and indicate how the related tests are structured.

id: Test Assertion Identifier.
spec: reference to the relevant section of the SSML 1.1 Specification.
conformance: whether the current test assertion is optional or required; its value is either “Optional” or “Required”.
test-type: whether the test can be executed without manual modification or intervention; its value is either “Manual” or “Auto”.
category: the test category; its value is one of “Abs_rating_simple”, “Abs_rating_complex”, “Paired_Simple”, “Paired_Complex”.
profile: the profile of the tested feature; its value is either “Core” or “Extended”.

<start> and <dep> elements are characterized by the following attributes:

uri: relative URI that links the referenced file;
type: referenced file mime type.

For instance here’s a fragment of the manifest.xml document:

<tests>
[…]
 <test>
  <assertion id="63" spec="3.2.4" test-type="Auto" category="Abs_rating_simple"
             profile="Core">
      pitch: legal value, relative positive semitone change,
             "+" number followed by "st".
  </assertion>
  <start uri="63/63.txml" type="text/x-txml"/>
 </test>
[…]
</tests>

Here’s another fragment of the manifest.xml document to show the use of the <dep> element:

<tests>
[…]
 <test>
  <assertion id="170" spec="3.3.1" test-type="Auto" category="Abs_rating_simple"
             profile="Core">
      If text only output is not required, the processor must try
      to play the referenced audio document 
      (Raw (headerless) 8kHz 8-bit mono A-law [PCM] single channel)     
  </assertion>
  <start uri="170/ta_170.txml" type="text/x-txml"/>
  <dep uri="170/beep_a.raw" type="audio/basic"/>
 </test>
[…]
</tests>
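
To show how a harness might consume the Manifest, here is a minimal Python sketch that lists the start and dependency files for each assertion. The function name list_test_files is hypothetical, and the embedded manifest is a shortened, well-formed fragment modelled on the excerpts above rather than the real manifest.xml.

```python
import xml.etree.ElementTree as ET

# Shortened fragment for illustration; the real manifest.xml is larger.
MANIFEST = """<tests>
 <test>
  <assertion id="170" spec="3.3.1" test-type="Auto"
             category="Abs_rating_simple" profile="Core">
      If text only output is not required, the processor must try
      to play the referenced audio document
  </assertion>
  <start uri="170/ta_170.txml" type="text/x-txml"/>
  <dep uri="170/beep_a.raw" type="audio/basic"/>
 </test>
</tests>"""


def list_test_files(manifest_xml):
    """Map each assertion id to its (role, uri, mime type) entries."""
    files_by_id = {}
    for test in ET.fromstring(manifest_xml).iter("test"):
        assertion = test.find("assertion")
        entries = [(child.tag, child.get("uri"), child.get("type"))
                   for child in test
                   if child.tag in ("start", "dep")]
        files_by_id[assertion.get("id")] = entries
    return files_by_id


for assertion_id, entries in list_test_files(MANIFEST).items():
    for role, uri, mime in entries:
        print(assertion_id, role, uri, mime)
```

Keeping the role ("start" or "dep") alongside each URI lets a harness fetch dependencies first and then run the main test file.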

B.2 The Report Submission Template

The template (ssml11-ir-results-template.xml) has to be filled in by the participating company, following the rules described in Section 3. An excerpt of the template is shown below.

<system-report name="YOUR-SYSTEM-NAME-HERE">
<testimonial>YOUR-WELL-FORMED-TESTIMONIAL-CONTENT-HERE</testimonial>
<assert id="Id1" res="pass|fail|not-impl">OPTIONAL-NOTES-HERE</assert>
        [...]
<assert id="Idn" res="pass|fail|not-impl">OPTIONAL-NOTES-HERE</assert>
</system-report>
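
A filled-in report can be summarised by counting the res values, as in the Python sketch below. The sample report, its system name, its assertion ids and its results are invented purely for illustration, and the function name tally_results is hypothetical; only the element and attribute names come from the template excerpt.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample data; not taken from any submitted report.
REPORT = """<system-report name="EXAMPLE-SYSTEM">
<testimonial>Example testimonial text.</testimonial>
<assert id="1" res="pass"/>
<assert id="2" res="pass">Checked with a local audio file.</assert>
<assert id="3" res="not-impl"/>
</system-report>"""

ALLOWED = ("pass", "fail", "not-impl")


def tally_results(report_xml):
    """Count assertions per result value, rejecting unknown values."""
    counts = Counter({value: 0 for value in ALLOWED})
    for element in ET.fromstring(report_xml).iter("assert"):
        result = element.get("res")
        if result not in ALLOWED:
            raise ValueError("unexpected res value: %r" % result)
        counts[result] += 1
    return dict(counts)


print(tally_results(REPORT))
```

Rejecting unknown res values early helps catch a template that still contains the literal placeholder "pass|fail|not-impl".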

B.3 The Stylesheet

A specific stylesheet transforms the meta markup language used to write the tests into valid SSML documents (the stylesheet structure is described in Appendix A.6). The stylesheet produces three valid SSML documents containing, respectively, the instructions, the reference and the test itself. Parameterization is used so that a single stylesheet can produce all three documents.

Appendix C - Acknowledgements

The Voice Browser Working Group would like to acknowledge the contributions of several individuals:

DeZhi Huang (France Telecom), who managed the SSML-IR website that allowed contributors to manage and execute the assertions and tests.
Patrizio Bergallo (Loquendo), who provided us with all SSML-IR website modifications we needed.
All the contributors to the SSML 1.0 IR plan, on which this work is based.

Thanks to Kazuyuki Ashimura, Jim Larson, and Matt Womer for important management support. Many thanks go to France Telecom for hosting the SSML-IR website.