This repo contains the client libraries that demonstrate Microsoft’s algorithms for processing spoken language. With these APIs, developers can easily add speech-driven actions to their applications. In certain cases, the APIs also allow real-time interaction with the user. See the tech in action on our demo page or learn more about the API with our documentation.
Convert spoken audio to text. The API can be directed to turn on the microphone and recognize audio from it in real time, to recognize audio from a different real-time audio source, or to recognize audio from a file. In all cases, real-time streaming is available, so partial recognition results are returned while the audio is still being sent to the server.
Convert spoken audio to intent. Similar to Speech Recognition, Speech Intent Recognition returns recognized text from audio input. In addition, it returns structured information about the incoming speech, so that apps can easily parse the intent of the speaker and drive further action.
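As a rough illustration of what "structured information" can mean for an app, the sketch below parses a made-up intent payload and picks the action to run. The JSON shape, the field names (`query`, `intents`, `entities`, `score`), and the helper `top_intent` are all assumptions for illustration. They are not the documented response schema, so check the API documentation for the actual format.

```python
import json

# Hypothetical payload: these field names are assumptions for illustration,
# not the documented response schema of the service.
SAMPLE_RESPONSE = """
{
  "query": "turn on the living room lights",
  "intents": [
    {"intent": "TurnOn", "score": 0.92},
    {"intent": "None", "score": 0.05}
  ],
  "entities": [
    {"entity": "living room lights", "type": "Device"}
  ]
}
"""


def top_intent(response_text, min_score=0.5):
    """Return (intent_name, entities_by_type) for the best-scoring intent.

    Returns (None, {}) when there are no intents or the best score is
    below min_score, so the caller can fall back to plain recognized text.
    """
    payload = json.loads(response_text)
    intents = payload.get("intents", [])
    if not intents:
        return None, {}
    best = max(intents, key=lambda item: item["score"])
    if best["score"] < min_score:
        return None, {}
    entities = {e["type"]: e["entity"] for e in payload.get("entities", [])}
    return best["intent"], entities


intent, entities = top_intent(SAMPLE_RESPONSE)
if intent == "TurnOn":
    # The app would drive its own action here, e.g. switch on the named device.
    print("Turning on:", entities.get("Device"))
```

The confidence threshold is a design choice in this sketch: when no intent is confident enough, the app can ignore the intent and work with the recognized text alone.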
With this API, developers can easily convert text to spoken audio. When applications need to “talk” back to their users, this API converts text generated by the app into audio that can be played back to the user. See the tech in action on our demo page or learn more about the API with our documentation.
To get started, select the technology that you are interested in.
We welcome contributions and are always looking for new SDKs, input, and suggestions. Feel free to file issues on the repo and we'll address them as we can. You can also learn more about how you can help on the Contribution Rules & Guidelines.
For questions, feedback, or suggestions about Microsoft Cognitive Services, feel free to reach out to us directly.
All Microsoft Cognitive Services SDKs and samples are licensed with the MIT License. For more details, see LICENSE.
Sample images are licensed separately; please refer to LICENSE-IMAGE.