Lynn’s Status Report for 04/27

Progress

The primary task for this week was integration testing of the whole system. I invited two volunteers, handed the device to them, and let them explore its functionality. I recorded the latency from the moment a user gave a command to the moment the system output the corresponding result. Another metric was the overall accuracy of the system: I counted the number of times the system produced a “Try again” output, which indicates that it failed to interpret the command. After the testing sessions ended, I asked each volunteer for comments on the system design and recorded their thoughts.
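
As a rough illustration, the bookkeeping for these two metrics can be scripted along the following lines. Here run_command() is a hypothetical stand-in for one full voice-command round trip, not our actual test code:

```python
import random
import time

def run_command() -> str:
    """Hypothetical stand-in for one voice-command round trip.
    In the real test this was a live interaction with the device."""
    time.sleep(random.uniform(0.5, 1.5))      # simulated response delay
    return random.choice(["ok", "ok", "ok", "Try again"])

latencies, failures, trials = [], 0, 20
for _ in range(trials):
    start = time.perf_counter()
    response = run_command()
    latencies.append(time.perf_counter() - start)
    if response == "Try again":               # command was not understood
        failures += 1

print(f"mean latency: {sum(latencies) / len(latencies):.2f} s")
print(f"accuracy: {1 - failures / trials:.1%}")
```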

After testing finished, I cleaned up the data and discussed the results with my teammates. We will adjust the user interface design based on the feedback and present the final version on demo day.

Schedule

I am on schedule.

Next Step

Next week I will work with Yuxuan to design and record the final video, write the final report, and conduct the final demo with all teammates.

Lynn’s Status Report for 04/20

Progress

This week I worked with Yuxuan to finish implementing all functionalities of the application. We first connected the models trained by each teammate and built a helper function that renders different pages based on the verb in the user’s command. We then designed the VUI for entering a new entry, modifying an existing entry, and deleting an entry. The user can record a purchased item simply by saying the item name and price; the word2vec model assigns a category based on the item name, and the created entry is saved to the database. To modify an entry, the user gives a command with the entry number, which renders a modification page; the user can then request specific changes to a field and save them by saying “confirm”. After the functions were implemented, we ran several groups of tests to uncover potential edge cases for each action.
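
The dispatch logic can be sketched roughly as follows; the page names and verb list are illustrative placeholders rather than our exact implementation:

```python
# Minimal sketch of the verb-dispatch helper described above.
def handle_command(text: str) -> str:
    """Map the leading verb of a recognized command to a page/action."""
    verb = text.strip().lower().split()[0] if text.strip() else ""
    routes = {
        "enter": "entry_page",      # create a new purchase entry
        "modify": "modify_page",    # edit an existing entry by number
        "delete": "delete_page",    # remove an entry
        "get": "entry_list_page",   # list entries in a date range
        "generate": "report_page",  # render the financial report
    }
    return routes.get(verb, "try_again")  # unknown verb -> "Try again" prompt

print(handle_command("generate report from 2024-01 to 2024-03"))  # report_page
```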

As for testing, I finished unit testing of the speech recognition model and calculated its accuracy by counting misparsed words. The result is about 98.3% accuracy, well above the target we set earlier in the design requirements.

Schedule

I am on schedule now.

Next Step

I will work with my team to finish all the remaining work for the final demo including the poster, video, and report. 

New tools and knowledge

Since I am responsible for the speech recognition part of our project, I first learned how to use the Python libraries we selected, including PyAudio, noisereduce, SpeechRecognition, and gTTS, to implement the audio-to-text pipeline. To understand the basic functions of these libraries, I read through their documentation and looked up sample usage of the methods.
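
For reference, a minimal audio-to-text round trip with SpeechRecognition looks roughly like this; recognize_google() is one available backend, and the settings in our actual pipeline may differ:

```python
import speech_recognition as sr  # Microphone support requires PyAudio

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
    print("Speak a command...")
    audio = recognizer.listen(source)

try:
    # recognize_google() is one backend the library provides; ours may differ.
    print("Heard:", recognizer.recognize_google(audio))
except sr.UnknownValueError:
    print("Try again")  # the audio could not be interpreted
```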

After receiving the Raspberry Pi and the monitor, our team spent a lot of time setting up the system and installing all the required libraries. Since the Raspberry Pi runs Linux rather than the Windows or macOS we use on our laptops, unexpected errors can appear at any step of the process. For example, we used “sudo apt install” for most of the libraries, but this command did not work when we tried to install gTTS. We searched forums and spent quite some time trying different solutions; in the end, we succeeded in installing the library and running it in our web application.

Lynn’s Status Report for 04/06

Progress

I spent the first half of the week finding a workable virtual keyboard to allow standard GUI input of item names and prices. The initial plan was to use the built-in virtual keyboard on the RPi, but the packages currently available are not compatible with the 64-bit RPi OS. I therefore switched to a virtual keyboard component inside the web application. The current keyboard is attached.

As for the VUI side, the web app can now provide a financial report and an entry list in response to the audio command “generate report/get entries from [start date] to [end date]”. Since the date information is given as “Year-Month”, the filter range is set from the first day of the start month to the last day of the end month. The web app can also produce audio output corresponding to the user’s voice input. Once all response text files are finalized, the app should be usable by visually impaired users.
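
The month-widening step can be illustrated with calendar.monthrange; this is a minimal sketch, not our exact filtering code:

```python
import calendar
from datetime import date

def month_range(start: str, end: str) -> tuple[date, date]:
    """Expand "YYYY-MM" bounds to the 1st of the start month
    and the last day of the end month."""
    sy, sm = (int(x) for x in start.split("-"))
    ey, em = (int(x) for x in end.split("-"))
    last_day = calendar.monthrange(ey, em)[1]  # days in the end month
    return date(sy, sm, 1), date(ey, em, last_day)

print(month_range("2024-01", "2024-02"))  # (2024-01-01, 2024-02-29)
```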

I also ran unit tests on the audio input component and the rendering functionality of all web pages.

Schedule

I am on schedule now.

Next Step

I am planning to implement the VUI for the entry-entering function and test it. I will also work with the team on larger-scale tests that involve more subsystems.

Verification

  • Tests for speech recognition accuracy: 

To test the speech recognition pipeline, I wrote a script in which a key press starts one recording-and-transcription session. To perform unit tests, I pressed the corresponding key and gave voice commands; the script should output converted text identical to the spoken input.
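
A stripped-down version of such a test harness might look like this; the pass/fail comparison and the recognizer backend are illustrative assumptions:

```python
import speech_recognition as sr

# Each Enter press starts one record-and-transcribe session; the transcript
# is compared against the expected command typed beforehand.
recognizer = sr.Recognizer()
while True:
    expected = input("Expected command (blank to quit): ").strip().lower()
    if not expected:
        break
    with sr.Microphone() as source:
        input("Press Enter, then speak...")
        audio = recognizer.listen(source)
    try:
        heard = recognizer.recognize_google(audio).lower()
    except sr.UnknownValueError:
        heard = ""
    print("PASS" if heard == expected else f"FAIL: heard {heard!r}")
```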

Since the expected user audio inputs are short sentences starting with standard verb keywords, I first ran unit tests on the verb keywords in both quiet and crowded environments. Words tested include “Enter”, “Get”, “Generate”, and similar verbs. Accuracy was about 95% for these verbs, relatively low on the first input after the program starts and higher afterward.

Then I tested whether the script could capture numbers, both prices and dates, accurately. With the keyword “dollar” attached to the price, recognition accuracy reaches 99% on the current test cases; for example, “five point three dollars” is translated directly to “$5.3”. As for dates, the “Year-Month” pattern is converted without difficulty in most cases. However, accuracy for the month “May” is lower than for all other months, so we will need a fallback for inaccurate translations.
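
A hypothetical post-processing step for pulling the price out of the recognized text could look like this; the regex and function name are illustrative only:

```python
import re

# Accept either the recognizer's "$5.3" form or a trailing "dollars" keyword.
PRICE = re.compile(r"\$(\d+(?:\.\d+)?)|(\d+(?:\.\d+)?)\s*dollars?", re.I)

def extract_price(text: str) -> float | None:
    m = PRICE.search(text)
    if not m:
        return None
    return float(m.group(1) or m.group(2))

print(extract_price("enter coffee $5.3"))         # 5.3
print(extract_price("enter coffee 5.3 dollars"))  # 5.3
```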

As for item names, accuracy drops to 95% again due to the large number of possible nouns. The pipeline can capture items with relatively distinctive, complex pronunciations, whereas words with simpler pronunciations may be converted to similar-sounding but incorrect words.

  • Tests for web application VUI:

After the speech recognition pipeline was integrated into the web application, I repeated the unit tests listed in the previous category inside the app. Performance was essentially the same, which indicates that the web application framework does not affect the speech recognition process.

Lynn’s Status Report for 03/30

Progress

I first tested the speech recognition pipeline on the RPi, and it worked as desired. I then spent most of my time this week implementing the web application. I wrote the CSS styling based on the UI design diagrams and constructed the financial report charts. Users can now view financial report charts for the time range they choose. The current page layouts are attached below.

As for the VUI side, the user can now press a button to give voice commands, and the web app redirects to the appropriate page based on the input keywords. For example, if the user says “generate report”, the report page is rendered and the charts are displayed.

I also wrote initial scripts for generating audio output. The script converts an input txt file to audio and plays it directly; the audio file is deleted after playback to save memory.
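
A minimal sketch of this text-to-speech script, assuming gTTS for synthesis and the mpg123 command-line player for blocking playback (the player choice is an assumption):

```python
import os
from gtts import gTTS

def speak_file(txt_path: str) -> None:
    """Read a response text file, synthesize it, play it, then clean up."""
    with open(txt_path) as f:
        text = f.read()
    gTTS(text=text, lang="en").save("response.mp3")
    os.system("mpg123 response.mp3")  # blocking playback; player is an assumption
    os.remove("response.mp3")         # delete after playing to save memory

speak_file("report_response.txt")  # hypothetical template file
```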

Schedule

I am on schedule now.

Next Steps

I am planning to design the audio output template txt files and generate the corresponding content from user data within the web app. Users should then receive audio output when interacting with the application.

Lynn’s Status Report for 03/23

Progress

This week our team worked on setting up the RPi and installing the Python libraries. I tested the speech recognition libraries and verified that they worked as expected. I also connected the speech recognition pipeline to the primary NLP model. After running some unit tests on the combined system, I can conclude that verb commands are distinguished with relatively high accuracy.

After that, I worked with Yuxuan on some adjustments to the web app design. We figured out how to filter the requested data efficiently when the user asks to view an entry or a financial report. I started working on the styling of the web application, which will also be my primary task next week.

Schedule

I am on schedule.

Next Steps

I will focus on the CSS styling of the web app first, and then construct the financial report page with a pie chart and a histogram.

I will also implement the “press to speak” button and wire it to the speech recognition pipeline.

Lynn’s Status Report for 03/16

Progress

This week I worked on finalizing the speech recognition pipeline and did basic testing of the subsystem. To simplify the script and save memory, I decided not to save the audio input temporarily as .wav files; instead, I feed the byte frames directly into the noisereduce and speech recognition methods.

After researching and running preliminary tests on the first version of the script, I realized that it is not necessary to include both a “start” and an “end” event to control audio recording manually. In the modified recording and speech recognition pipeline, recording terminates automatically after a set duration, and the current session ends if nothing is heard from the speaker for another set period.
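
A rough sketch of this pipeline, with illustrative (not our tuned) time and silence thresholds:

```python
import numpy as np
import pyaudio
import noisereduce as nr
import speech_recognition as sr

RATE, CHUNK, MAX_SECONDS, SILENCE_SECONDS = 16000, 1024, 8, 2

# Raw PyAudio frames stay in memory (no temporary .wav), get denoised with
# noisereduce, and are wrapped in an AudioData object for the recognizer.
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)

frames, silent_chunks = [], 0
for _ in range(int(RATE / CHUNK * MAX_SECONDS)):       # hard time limit
    chunk = np.frombuffer(stream.read(CHUNK), dtype=np.int16)
    frames.append(chunk)
    # illustrative amplitude threshold for "silence"
    silent_chunks = silent_chunks + 1 if np.abs(chunk).mean() < 100 else 0
    if silent_chunks > RATE / CHUNK * SILENCE_SECONDS:  # end-of-speech timeout
        break
stream.stop_stream()
stream.close()
pa.terminate()

audio = np.concatenate(frames).astype(np.float32)
reduced = nr.reduce_noise(y=audio, sr=RATE).astype(np.int16)

recognizer = sr.Recognizer()
audio_data = sr.AudioData(reduced.tobytes(), RATE, 2)  # 2 bytes per sample
print(recognizer.recognize_google(audio_data))
```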

The current script can recognize the standard commands with acceptable accuracy.

A major focus of the speech recognition process is the price value. Currently, a price is recognized accurately if the “dollar” keyword is included. On the other hand, if the speaker gives a vague command such as “four-sixty”, the recognizer converts it directly to “460”, which differs from the expected value. We need further discussion on how to handle this.

Schedule

I am a little behind schedule on testing the scripts on the RPi, but I will catch up next week.

Next Step

I will test the pipeline on the RPi in both quiet and crowded environments. I will also work with Yuxuan to implement the web app and connect the front-end buttons to the speech recognition pipeline.

Lynn’s Status Report for 03/09

Progress

The primary focus for the week was the design report. I spent time composing the Architecture and/or Principle of Operation and System Implementation sections, as well as parts of the trade studies. To clarify the design details and hierarchy, I added several new diagrams for both the hardware and software systems. I also formatted the design report for the team once the content was finished.

I spent the rest of the week diving deeper into the signal processing section. After successfully recording audio using PyAudio, I started writing scripts for noise reduction and tested different noise reduction libraries on audio recorded in louder environments. I also wrote scripts to test the speech recognition function.

Schedule

I am on schedule.

Next Step

Once the primary noise reduction tests pass, I plan to test the functionality on the RPi, since we have now set up the Raspberry Pi and monitor. I will also start constructing the whole signal processing pipeline.

Lynn’s Status Report for 02/24

During Monday’s meeting, I gave the design review presentation for our team. As the presenter, I spent most of my time in the first half of the week preparing for the talk. Following the job breakdown, my focus for the design review was still the Solution Approach and Implementation Plan, so I made the slides for these parts and finalized the presentation script with my teammates. I also updated the block diagram accordingly.

After the presentation, I started writing Python scripts for audio input with the microphone connected. Per the design, the PyAudio library is used to start and stop speech recording for user input commands. The input stream is then fed to the noisereduce algorithms from PyPI.

Schedule

I am on schedule.

Next Step

During the team meeting, the different parts of the design report were discussed and assigned. Next week I will therefore focus on writing my parts of the design report: Architecture and/or Principle of Operation, System Implementation, and parts of the trade studies. I will also write more scripts for audio input analysis.

Lynn’s Status Report for 02/17

Progress

After the weekly meeting with the instructor and TA, our team started to finalize the design details. We discussed further design choices at length and tried to answer the “why” behind each choice. My primary focus was the completed version of the front-end UI and the finite state machine diagram of the user interaction flow. After that, Yuxuan and I worked together to specify the MVC design of the web application from both an ordinary user’s and a visually impaired user’s perspective.

To prepare for the design review presentation next week, I read through the documentation of spaCy and word2vec to gain a further understanding of the NLP process. 

A copy of the finalized UI and FSM is attached.

Schedule

I am on schedule.

Next Step

As the presenter of the design review, I will first give the presentation during Monday’s or Wednesday’s class. After that, I will start on the initial microphone setup, since the hardware components we ordered last week have arrived.

Lynn’s Status Report for 02/10

Progress

The primary focus for this week was the proposal presentation. Although I was not the presenter this time, researching the technical challenges and corresponding solution approaches was still my major effort. For the audio input section of our design, signal processing libraries including PyAudio, the noisereduce module from PyPI, and SpeechRecognition will be used to reduce noise and convert audio to text strings. Natural language processing models such as spaCy and a pre-trained BERT model are then considered for recognizing user commands and categorizing the purchased items. I designed a general block diagram from user input through all expected components to the various forms of output financial reports.

After the proposal presentation, I started designing the front-end UI layout of our web application. An initial draft is attached.

I also researched the suitability of the spaCy library for identifying and matching user commands to specific functionalities. The Tokenizer and Matcher in the library should be useful here, and I wrote several Python scripts to familiarize myself with spaCy.
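
One of those familiarization scripts could look roughly like this minimal Matcher example; the verb list is an illustrative guess at our command set:

```python
import spacy
from spacy.matcher import Matcher

# Requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
matcher.add("COMMAND_VERB",
            [[{"LEMMA": {"IN": ["enter", "get", "generate", "modify", "delete"]}}]])

doc = nlp("generate report from January to March")
for match_id, start, end in matcher(doc):
    print("matched command verb:", doc[start:end].text)  # -> "generate"
```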

Schedule

I am on schedule.

Next Step

Next week I will first focus on finalizing the front-end UI design, and our team will design the MVC framework of the web application together. I will also continue researching and implementing the NLP models.