Copyright © 2019 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
The objective of the Pronunciation Task Force is to develop normative specifications and best practices guidance, collaborating with other W3C groups as appropriate, to provide for proper pronunciation in HTML content when using text to speech (TTS) synthesis. This document provides various user scenarios highlighting the need for standardization of pronunciation markup, to ensure consistent and accurate representation of the content. The requirements that come from the user scenarios provide the basis for the technical requirements/specifications.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This is a First Public Working Draft of Pronunciation User Scenarios by the Accessible Platform Architectures Working Group. It was initially developed by the Pronunciation Task Force to provide various user scenarios highlighting the need for standardization of pronunciation markup, to ensure consistent and accurate representation of the content. The requirements that come from the user scenarios provide the basis for the technical requirements/specifications.
To comment, file an issue in the W3C pronunciation GitHub repository. If this is not feasible, send email to public-pronunciation@w3.org. Comments are requested by 14 October 2019. In-progress updates to the document may be viewed in the publicly visible editors' draft.
Publication as a First Public Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 1 March 2019 W3C Process Document.
This section is non-normative.
As part of the Accessible Platform Architectures (APA) Working Group, the Pronunciation Task Force (PTF) is a collaboration of subject matter experts working to identify and specify the optimal approach for delivering reliably accurate pronunciation across browser and operating environments. From the introduction of the Kurzweil reading aid in 1976 to the more sophisticated synthetic speech currently used to assist communication and as a reading aid for the visually impaired and those with reading disabilities, the technology has found multiple applications in education, communication, entertainment, and other areas. Beyond helping to teach spelling and pronunciation in different languages, Text-to-Speech (TTS) has become a vital technology for providing access to digital content on the web and through mobile devices.
The challenges that TTS presents include, but are not limited to: the inability to accommodate regional variation and presentation of every phoneme present throughout the world; the incorrect determination by TTS of the pronunciation of content in context; and the current inability to influence other pronunciation characteristics such as prosody and emphasis.
The purpose of developing user scenarios is to facilitate discussion and further requirements definition for pronunciation standards developed within the PTF prior to review by the APA. There are numerous interpretations of what form a user scenario should take. Within the user experience research (UXR) body of practice, a user scenario is a written narrative related to the use of a service from the perspective of a user or user group. Importantly, the context of use is emphasized, as is the desired outcome of use. There are potentially thousands of user scenarios for a technology such as TTS; however, the focus for the PTF is on the core scenarios that relate to the kinds of users who will engage with TTS.
User scenarios, like personas, represent a composite of real-world experiences. In the case of the PTF, the scenarios were derived from interviews with people who were end-consumers of TTS, as well as submitted narratives and industry examples from practitioners. The scenarios take several formats. Several are general goal- or task-oriented scenarios. Others elaborate on a richer context, for example, educational assessment.
The following user scenarios are organized on the three perspectives of TTS use derived from analysis of the qualitative data collected from the discovery work:
Ultimately, the quality of TTS rendering by assistive technologies varies widely according to a user's context. The following user scenarios reinforce the necessity for accurate pronunciation from the perspective of those who consume digitally generated content.
The advent of graphical user interfaces (GUIs) for the management and editing of text content has given rise to content creators not requiring technical expertise beyond the ability to operate a text editing application such as Microsoft Word. The following scenario summarizes the general use, accompanied by a hypothetical application.
In the educational assessment field, providing accurate and concise pronunciation for students with auditory accommodations, such as text-to-speech (TTS), or students using screen readers is vital for ensuring content validity and alignment with the intended construct, which objectively measures a test taker's knowledge and skills. For test administrators/educators, pronunciations must be consistent across instruction and assessment in order to avoid test bias or impact effects for students. Some additional requirements for test administrators include, but are not limited to, the following scenarios:
$a^3 - b^3 = (a - b)(a^2 + ab + b^2)$
may incorrectly render through some technologies and applications as a3-b3=(a-b)(a2+ab+b2).
One extension of content management in TTS is as a means of encoding and preserving spoken text for academic analyses, irrespective of discipline, subject domain, or research methodology.
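The equation example above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the exponents are marked up with HTML `<sup>` elements; the source does not say which markup the original content uses. The names `flatten` and `speak_exponents` and the phrase "to the power" are hypothetical and do not reflect any specific screen reader or TTS engine.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text content, optionally verbalizing <sup> content."""

    def __init__(self, verbalize_sup):
        super().__init__()
        self.verbalize_sup = verbalize_sup
        self.in_sup = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "sup":
            self.in_sup = True

    def handle_endtag(self, tag):
        if tag == "sup":
            self.in_sup = False

    def handle_data(self, data):
        if self.in_sup and self.verbalize_sup:
            # Hypothetical wording; a real TTS engine may phrase this differently.
            self.parts.append(f" to the power {data} ")
        else:
            self.parts.append(data)


def _collect(markup, verbalize_sup):
    parser = _TextCollector(verbalize_sup)
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


def flatten(markup):
    """Drop all markup, as a naive text extraction would."""
    return _collect(markup, verbalize_sup=False)


def speak_exponents(markup):
    """Keep the exponent information by turning it into words."""
    return _collect(markup, verbalize_sup=True)


# Assumed markup for the equation; not taken from the source document.
EQUATION = "a<sup>3</sup>-b<sup>3</sup>=(a-b)(a<sup>2</sup>+ab+b<sup>2</sup>)"

print(flatten(EQUATION))
print(speak_exponents(EQUATION))
```

In this sketch, `flatten` yields the ambiguous string quoted above, in which "a3" can no longer be distinguished from "a cubed", while `speak_exponents` carries the exponent through to the text handed to the synthesizer.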
Technical standards for software development assist organizations and individuals to provide accessible experiences for users with disabilities. The final user scenarios in this document are considered from the perspective of those who design and develop software.
This section is non-normative.
The following people contributed to the development of this document.