Jade’s Status Update for Feb 15

This week I worked on finding the components I need for audio input, choosing a speech recognition package, and investigating text-to-speech packages.

For the audio input hardware, I decided to use a simple TRS-to-USB sound card plugged into the Raspberry Pi. I also chose a set of speakers and investigated which microphones to use.

Because speech recognition is such a big part of our project, I decided to use off-the-shelf parts for the analog audio input processing so that I can spend more time getting speech processing working.

To choose a speech recognition package, I looked through several options. The table below compares a few that I considered.

| Package | Pros | Cons |
|---|---|---|
| PocketSphinx | Has been shown to work on embedded devices such as the Raspberry Pi. Completely offline. | Hard to set up. Can be slow because all processing happens on the Pi. |
| libsprec | Works on embedded devices. Uses the Google Speech API, which works well for speech recognition. Processing happens off the Pi. | Requires Google Speech API keys. |
| Julius | Works on embedded devices. Low latency, small footprint. | Inaccurate recognition. Slow install process. |
| SpeechRecognition (Python) | Easy to install. Works with PocketSphinx for offline processing. Can switch to other speech recognition engines. | No guarantees on processing latency; will need testing. |

I decided to use SpeechRecognition mostly because it is the most flexible of these packages. SpeechRecognition is a Python package that lets you choose which speech recognition engine to use. Some of the engines it supports include CMU Sphinx, Google Speech Recognition, the Houndify API, IBM Speech to Text, and various others. Its flexibility and relative ease of installation made it appealing to me. It also works with PocketSphinx, which means it can run completely offline.
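
To illustrate the flexibility point, here is a minimal sketch of switching engines by name. It uses SpeechRecognition's `Recognizer`, `AudioFile`, `record`, `recognize_sphinx` (offline, needs the `pocketsphinx` package) and `recognize_google` (online); the `ENGINE_METHODS` table and the helper functions are hypothetical names for this example, and only two engines are wired up.

```python
# Hypothetical dispatch table: engine name -> SpeechRecognition Recognizer method.
ENGINE_METHODS = {
    "sphinx": "recognize_sphinx",  # offline; requires the pocketsphinx package
    "google": "recognize_google",  # online; sends audio to Google's web API
}


def load_audio(path):
    """Read a WAV/AIFF/FLAC file into an AudioData object.

    speech_recognition is imported here so the dispatcher below can be
    exercised without the package installed."""
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file
    return recognizer, audio


def transcribe(recognizer, audio, engine="sphinx"):
    """Run the chosen engine on `audio` and return the recognized text."""
    try:
        method_name = ENGINE_METHODS[engine]
    except KeyError:
        raise ValueError(f"unknown engine {engine!r}; choose from {sorted(ENGINE_METHODS)}") from None
    return getattr(recognizer, method_name)(audio)
```

With the packages installed, `recognizer, audio = load_audio("hello.wav")` followed by `transcribe(recognizer, audio, "sphinx")` would run fully offline, and changing the last argument to `"google"` switches to the online engine without touching the rest of the code.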

After choosing a speech recognition package, I started installing it and getting it working. Currently the basic unit tests are failing, so I am debugging those. I am also working on installing PocketSphinx's databases.
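
When tests fail right after an install, a quick first check is whether Python can find the packages at all. The sketch below assumes the import names `speech_recognition` (for SpeechRecognition) and `pocketsphinx`; the helper name is hypothetical.

```python
import importlib.util

# Import names assumed for SpeechRecognition and PocketSphinx.
REQUIRED_MODULES = ("speech_recognition", "pocketsphinx")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty result from `missing_modules()` points the debugging toward configuration or model files rather than the install itself.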

I am a little behind where I would like to be because I haven't found a text-to-speech package that I want to use. I was considering CMU Flite because it is supposed to be a lightweight text-to-speech engine. The main issue with that package is that its built-in voice might not sound friendly to children. To mitigate this, I'm considering finding another text-to-speech application with voices that are friendly to children. The alternative is to use CMU Flite and pitch-shift its output with a phase vocoding algorithm so that the voice sounds friendlier.
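
A minimal sketch of the pitch-shifting option, assuming NumPy (no library has been chosen yet). A phase vocoder first time-stretches the signal, changing its duration while keeping its frequencies, and then the result is resampled by the same factor, which restores the duration and scales every frequency by `2 ** (semitones / 12)`. The function names, the FFT size of 1024 and the hop of 256 are illustrative choices, not tuned values, and the final resampling uses plain linear interpolation without an anti-aliasing filter.

```python
import numpy as np


def stft(x, n_fft=1024, hop=256):
    """Short-time Fourier transform with a Hann window (no padding)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=1024, hop=256):
    """Weighted overlap-add, divided by the steady-state window-power sum
    (so the first and last frames fade in/out instead of blowing up)."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    out = np.zeros(n_fft + hop * (len(frames) - 1))
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame * window
    return out / (np.sum(window ** 2) / hop)


def time_stretch(x, rate, n_fft=1024, hop=256):
    """Phase-vocoder time stretch: rate < 1 lengthens, rate > 1 shortens."""
    if len(x) < n_fft + hop:
        raise ValueError("signal too short for the chosen n_fft/hop")
    spec = stft(x, n_fft, hop)
    n_frames, n_bins = spec.shape
    # Expected phase advance per hop for each bin's center frequency.
    omega = 2 * np.pi * hop * np.arange(n_bins) / n_fft
    steps = np.arange(0, n_frames - 1, rate)
    steps = steps[steps < n_frames - 1]  # guard against float rounding at the end
    phase = np.angle(spec[0])
    out = np.empty((len(steps), n_bins), dtype=complex)
    for k, t in enumerate(steps):
        i, frac = int(t), t - int(t)
        # Interpolate magnitude between neighbouring analysis frames.
        mag = (1 - frac) * np.abs(spec[i]) + frac * np.abs(spec[i + 1])
        out[k] = mag * np.exp(1j * phase)
        # Deviation from the expected advance, wrapped to [-pi, pi].
        dphi = np.angle(spec[i + 1]) - np.angle(spec[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        phase = phase + omega + dphi  # accumulate the true per-hop advance
    return istft(out, n_fft, hop)


def pitch_shift(x, semitones, n_fft=1024, hop=256):
    """Shift pitch by `semitones` while keeping the original length."""
    factor = 2.0 ** (semitones / 12.0)
    # 1) Stretch duration by `factor`; frequencies are unchanged.
    stretched = time_stretch(x, 1.0 / factor, n_fft, hop)
    # 2) Read `factor` stretched samples per output sample, scaling every
    #    frequency by `factor`; positions past the end are filled with 0.
    positions = np.arange(len(x)) * factor
    return np.interp(positions, np.arange(len(stretched)), stretched, right=0.0)
```

Flite's WAV output could be read with the standard `wave` module, converted to floats, and passed through something like `pitch_shift(samples, 3)` to raise the voice by three semitones. A real implementation would probably also want a low-pass filter before resampling upward and some listening tests to pick the shift amount.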

In the upcoming week, I want to get speech recognition working on the Raspberry Pi and decide exactly which text-to-speech application to use.
