04/06
Progress
I spent the first half of the week finding a workable virtual keyboard for entering item names and prices through the standard GUI. The initial plan was to use a built-in virtual keyboard on the Raspberry Pi (RPi), but the packages currently available are not compatible with the 64-bit RPi OS. I therefore switched to a virtual keyboard component inside the web application. The current keyboard is attached.
On the VUI side, the web app can now provide a financial report or an entry list in response to the voice command “generate report/get entries from [start date] to [end date]”. Because the dates are given as “Year-Month”, the filter runs from the 1st day of the start month to the last day of the end month. The web app can also respond with audio output that corresponds to the user's voice input. Once all response text files are finalized, the app should be usable by visually impaired users.
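The date-range rule above can be sketched as follows. This is a minimal sketch, not the app's actual code: it assumes the transcribed command arrives as text in the form "generate report from 2024-02 to 2024-04", and the names `COMMAND_RE` and `parse_range_command` are hypothetical.

```python
import calendar
import re
from datetime import date

# Hypothetical pattern for the transcribed command (assumption: dates arrive as YYYY-MM).
COMMAND_RE = re.compile(
    r"^(generate report|get entries) from (\d{4})-(\d{1,2}) to (\d{4})-(\d{1,2})$",
    re.IGNORECASE,
)


def parse_range_command(text):
    """Return (action, start_date, end_date), or None if the text is not a valid command."""
    match = COMMAND_RE.match(text.strip())
    if match is None:
        return None
    action = match.group(1).lower()
    start_year, start_month, end_year, end_month = (int(g) for g in match.groups()[1:])
    if not (1 <= start_month <= 12 and 1 <= end_month <= 12):
        return None
    try:
        # The filter starts on the 1st day of the start month.
        start = date(start_year, start_month, 1)
        # monthrange returns (weekday of the 1st, number of days in the month).
        last_day = calendar.monthrange(end_year, end_month)[1]
        # The filter ends on the last day of the end month.
        end = date(end_year, end_month, last_day)
    except ValueError:
        return None  # e.g. a year outside the range supported by datetime
    if start > end:
        return None
    return action, start, end
```

Using `calendar.monthrange` avoids hard-coding month lengths, so leap-year Februaries are handled without special cases.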
I also ran unit tests on the audio input component and on the rendering of all web pages.
Schedule
I am on schedule now.
Next step
I plan to implement the VUI for the entry-entering function and test it. I will also work with our team on larger-scale tests that involve more subsystems.
Verification
- Tests for speech recognition accuracy:
To test the speech recognition pipeline, I wrote a script that uses a key press to start a session of recording and speech-to-text conversion. For each unit test, I pressed the corresponding key and spoke a voice command. The script should output text that is identical to the spoken input.
Since the expected user audio inputs are short sentences that start with standard verb keywords, I first ran unit tests on the verb keywords in both quiet and crowded environments. The words tested include “Enter”, “Get”, “Generate” and similar verbs. The accuracy was about 95% for these verbs, with relatively low accuracy on the first input after the program starts and relatively high accuracy on later inputs.
Then I tested whether the script could accurately catch numbers, including both prices and dates. With the keyword “dollar” attached to the price, the recognition accuracy reached 99% on the current test cases. For example, “five point three dollars” was converted directly to “$5.3”. As for dates, the “Year-Month” pattern was converted to text without difficulty in most cases. However, the accuracy for the month “May” was lower than for any other month, so we will need a fallback for when it is converted incorrectly.
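One possible fallback for a misrecognized month is sketched below. It is only an illustration under a strong assumption: that the incorrect text is spelled close to the intended month name. The function `resolve_month` and the `cutoff` value are hypothetical, and the closest-match step uses `difflib` from the Python standard library.

```python
import difflib

MONTHS = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]


def resolve_month(token, cutoff=0.6):
    """Return the month number (1-12) for a transcribed token, or None if no month is close."""
    word = token.strip().lower()
    if word in MONTHS:
        return MONTHS.index(word) + 1
    # Fall back to the month name with the highest string similarity, if any passes the cutoff.
    close = difflib.get_close_matches(word, MONTHS, n=1, cutoff=cutoff)
    if close:
        return MONTHS.index(close[0]) + 1
    return None  # the caller could ask the user to repeat the month
```

If the real errors for “May” turn out to be similar-sounding words with different spellings, a small lookup table of observed mistakes, or a spoken confirmation step, would likely work better than string similarity.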
As for item names, the accuracy drops back to 95% because of the large number of possible nouns. The pipeline reliably catches items with relatively distinctive, complex pronunciations, whereas words with simpler pronunciations may be converted to similar-sounding words that are incorrect.
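The per-category accuracy figures above could be tallied with a small helper like the one below. This is a sketch under assumptions, not the script used for the tests: it treats a trial as correct only when the converted text matches the expected text after lowercasing and collapsing whitespace, and the names `normalize` and `accuracy_by_category` are hypothetical.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def accuracy_by_category(trials):
    """trials: iterable of (category, expected_text, converted_text).

    Returns a dict mapping each category to its fraction of exact matches.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, expected, converted in trials:
        total[category] += 1
        if normalize(expected) == normalize(converted):
            correct[category] += 1
    return {category: correct[category] / total[category] for category in total}
```

Keeping verbs, prices, dates and item names as separate categories makes it easier to see which kind of input needs a fallback.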
- Tests for web application VUI:
After the speech recognition pipeline was integrated into the web application, I repeated the unit tests from the previous category inside the app. The performance was generally the same, which suggests that the web application framework does not affect the speech recognition process.