Wednesday 3 September 2014

Using Text To Speech For Windows Phone 8.1

Text to speech in Windows Phone Silverlight apps is remarkably simple. However, as with everything, the more you want to do, the more complex it gets. Having said that, this tutorial should guide you through most of the things you will commonly want to do. First, we created a button in our XAML that, when clicked, calls the following method with a word of our choosing:


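  // SpeechSynthesizer comes from the Windows.Phone.Speech.Synthesis namespace.
  private async void Play_Word(string word)
  {
      var synth = new SpeechSynthesizer();
      // Speak the plain text; the method resumes here once speech has finished.
      await synth.SpeakTextAsync(word);
  }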



The async keyword marks a method that may contain await expressions. Such a method runs synchronously until it reaches an await on an operation that has not yet completed; at that point control returns to the caller, so the caller's thread (here the UI thread) is not blocked. This matters because speaking text may be a long-running task, and it ensures that the user doesn't experience a lack of responsiveness. The await keyword, in turn, means that the rest of the method is not executed until the awaited call has completed. We actually instantiated our SpeechSynthesizer in the constructor; the instantiation appears inside Play_Word above only for completeness. From here on we refer to the SpeechSynthesizer as _synth.
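To see the ordering that async and await produce, here is a minimal sketch that can run outside a phone app. It is only an illustration under stated assumptions: AwaitSketch, FakeSpeakAsync and Log are hypothetical names, and Task.Delay stands in for the time SpeakTextAsync would spend speaking.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AwaitSketch
{
    // Records the order in which the steps run.
    public static readonly List<string> Log = new List<string>();

    // Hypothetical stand-in for SpeakTextAsync: the delay plays the
    // role of the time spent speaking the word.
    private static async Task FakeSpeakAsync(string word)
    {
        Log.Add("start " + word);
        await Task.Delay(100);
        Log.Add("end " + word);
    }

    public static async Task RunAsync()
    {
        // Runs synchronously up to the first incomplete await inside
        // FakeSpeakAsync, then returns an unfinished Task to us.
        Task speaking = FakeSpeakAsync("ant");

        // The caller is free to carry on while the "speech" is in progress.
        Log.Add("caller continues");

        // Nothing below this line runs until the speech has completed.
        await speaking;
        Log.Add("after await");
    }
}
```

After RunAsync completes, Log holds "start ant", "caller continues", "end ant" and "after await", in that order. The second entry is the point of the exercise: the caller got control back before the speech had finished.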

In our app we used text to speech to say a word for the user to spell. However, after using it several times we realised the default rate of speech was too high: short words such as 'ant' and 'bee' were spoken too quickly to be distinguishable. We therefore switched to using SSML (Speech Synthesis Markup Language) to speak the text at our desired rate. Our amended method was as follows:


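  private async void Play_Word(string word)
  {
      // A verbatim string (@"...") may span several lines; quotes inside it are doubled.
      string ssml = String.Format(@"<speak version=""1.0""
        xmlns=""http://www.w3.org/2001/10/synthesis""
        xml:lang=""en-US"">
        <prosody rate=""-2"">{0}</prosody>
        </speak>", word);
      // Speak the SSML document at the reduced rate set by the prosody element.
      await _synth.SpeakSsmlAsync(ssml);
  }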



The speak element is the root of an SSML document: it must surround the tags that specify how your words will be spoken, and its xmlns attribute declares the SSML XML namespace. The prosody element controls the rate, pitch and output volume of the text it encloses. As we only wanted to control the rate, we set only the rate attribute of prosody. There are other aspects of speech you can control as well, and the W3C SSML specification documents them.
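One thing to watch when building SSML with String.Format is that a word containing characters that are special in XML, such as & or <, would produce an invalid document. As a hedged sketch of one way around this, the same speak/prosody structure can be built with LINQ to XML (System.Xml.Linq), which escapes the text for you. SsmlSketch and BuildSsml are hypothetical names chosen for this example, not part of any speech API.

```csharp
using System;
using System.Xml.Linq;

public static class SsmlSketch
{
    // The SSML namespace, as used in the xmlns attribute of <speak>.
    private static readonly XNamespace Ssml = "http://www.w3.org/2001/10/synthesis";

    // Builds a <speak> document whose text is wrapped in a <prosody>
    // element with the given rate. XElement escapes the text content,
    // so characters such as & and < are safe to pass in.
    public static string BuildSsml(string word, string rate)
    {
        var speak = new XElement(Ssml + "speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", "en-US"),
            new XElement(Ssml + "prosody",
                new XAttribute("rate", rate),
                word));

        // DisableFormatting keeps the output on one line with no added whitespace.
        return speak.ToString(SaveOptions.DisableFormatting);
    }
}
```

The resulting string could then be passed to SpeakSsmlAsync in place of the String.Format result, for example `await _synth.SpeakSsmlAsync(SsmlSketch.BuildSsml(word, "-2"));`.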
