Team Status Report for 4/27

Risk Mitigation

The main risk to our system is a possible failure to parse voice commands. To mitigate this, we listed the supported commands that are guaranteed to work and integrated these instructions into the “help” command. The user can simply say “help” to get sample commands and start exploring from there.


Design Changes

There is no change in our design.


Updated Schedule

The updated schedule is as reflected in the final presentation.


List of all tests

Unit tests

| Test | Expected Performance | Actual Performance |
| --- | --- | --- |
| Latency (voice command) | 4 seconds to render the page | Average 4.52 seconds to render the page |
| Battery life | Consumes less than 50% of power with monitor on for 1 hour | Power drops from 100% to 77%, consuming 23% of total power |
| Portability | 500 grams | 500 grams |
| Accessibility | 32% of right half of screen | 35.45% of right half of screen |
| Noise reduction | 90% accuracy of integration test in an environment under 70 dB | 9 out of 10 commands work as expected (90% accuracy) |


| Test | Expected Performance | Actual Performance |
| --- | --- | --- |
| Audio to text | Less than 20% word error rate (WER) | 98.3% accuracy; edge cases exist |
| Text to command (NLP) | Identify verb with 100% accuracy; identify item name, money, date, and number with 95% accuracy; NLP process takes less than 3 s | Identify verb, money, date, and number with 100% accuracy; identify item name with 96% accuracy; NLP process takes about 2.5 s |
| Item classification (GloVe) | 90% of item names correctly classified | 18 out of 20 item names correctly classified (90% accuracy) |
| Voice response | All audio requests should be assisted with a voice response | The app reads out the content of the response (entries, report, etc.) |

System tests

User Experience Test: Each of the 5 volunteers has half a day to interact with our product without our interference. Volunteers are expected to give feedback on what they like and dislike about the product.

Overall Latency: Test different commands (e.g., record/change/delete entry, view entry, generate report). Expect <15 s from pressing the button to rendering the page for all commands.

Overall Accuracy: Expect >95% accuracy for the whole system.

Test Findings

There are some improvements that could be made to the user interface. The font could be larger, even for users with normal vision. We support the “remove” command but not “delete”, even though the interface shows delete buttons, which could be misleading. The audio instruction is too long for some users to remember which commands are supported. It might be necessary to tell users to follow the instructions strictly; otherwise command accuracy will be very low. We will make the corresponding improvements before the final demo.

Lynn’s Status Report for 04/27

Progress

The primary task for this week was integration testing of the whole system. I invited two volunteers to conduct the test by handing them the device and letting them explore the functionality. I recorded the time interval from the user giving a command to the system outputting the corresponding result. Another metric was the overall accuracy of the system: I counted the number of times the system produced a “Try again” output, meaning it failed to interpret the command. After the testing ended, I asked each volunteer for comments on the design of the system and recorded their thoughts.
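
For reference, a minimal sketch of how these two metrics could be logged during such a session, assuming a hypothetical handler entry point `handle` that returns "Try again" on failure (as our system does):

```python
import time

results = []  # one (latency_seconds, succeeded) pair per command

def timed_command(handle, audio):
    """Wrap the command handler to record latency and success for one request."""
    start = time.monotonic()
    response = handle(audio)             # hypothetical pipeline entry point
    latency = time.monotonic() - start
    succeeded = response != "Try again"  # failure marker described above
    results.append((latency, succeeded))
    return response

def summarize():
    latencies = [lat for lat, _ in results]
    print(f"average latency: {sum(latencies) / len(latencies):.2f} s")
    print(f"overall accuracy: {sum(ok for _, ok in results) / len(results):.1%}")
```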

After the testing finished, I cleaned up the data and discussed the results with my teammates. We will adjust the user interface design based on the feedback and present the final version on demo day.

Schedule

I am on schedule.

Next Step

Next week I will work with Yuxuan to design and record the final video, and with all teammates to write the final report and conduct the final demo.

Yuxuan’s Status Report for 4/27

Progress

I finalized our web application by adding audio output for the “modify”, “delete”, and “help” actions, so that every voice request is now answered with voice output. The “help” command triggers a voice instruction on how to use the app, giving the user sample commands such as “enter apple for 4.99 dollars” to play around with.
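
A minimal sketch of how such a voice response can be produced with gTTS (the helper name and file path are illustrative; how the app plays or serves the file depends on the web framework):

```python
from gtts import gTTS

HELP_TEXT = (
    "You can say: enter apple for 4.99 dollars, "
    "view entries, or generate report."
)

def speak(text, path="/tmp/response.mp3"):
    """Synthesize a voice response and return the audio file path."""
    gTTS(text=text, lang="en").save(path)
    return path

speak(HELP_TEXT)  # e.g. triggered by the "help" command
```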

I also conducted the user experience test with a volunteer. After the participant explored the system, I recorded the overall accuracy and latency of their command requests and responses.


Schedule

I am on schedule.


Next Steps

Next week our team will work on the poster, video, and final report, and prepare for the demo.

Yixin’s Status Report for 04/27/2024

  • What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

I gave our product to two users this week and collected their feedback. One of them said that some commands did not have an audio response. I relayed this feedback to Yuxuan, who has added audio responses for those commands. In addition, I am working on the final poster: I have prepared the new information (e.g., what we have achieved) and will do the formatting next week.

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

On schedule.

  • What deliverables do you hope to complete in the next week?
  1. prepare the final deliverables

Team Status Report for 04/20

Risk Mitigation

Our team collaborated to finish all the required functionality of our application this week. While implementing the web application's request handler functions, we found that some verbs were not distinguished correctly, which hurt performance. We therefore communicated quickly and applied a newly trained spaCy model to handle the edge cases. We also built specific handlers for potential failures found during testing.

We also worked together to prepare for the final presentation. We conducted unit tests on each module separately and did some preliminary integration testing on the web application. After that, we designed the final presentation slides.

Design Changes

No design changes were made.

Updated Schedule

There is no schedule update this week.

Lynn’s Status Report for 04/20

Progress

This week I worked with Yuxuan to finish implementing all functionality of the application. We first connected the models trained by each teammate and built a complete helper function that renders different pages based on the input verb.

Then we designed the VUI for entering a new entry, modifying existing entries, and deleting an entry. The user can record a purchase by simply saying the item name and price; the word2vec model assigns a category based on the item name, and the created entry is saved to the database. To modify an entry, the user gives a command with the entry number and a modifying page is rendered; the user can then request specific changes to a field and save them by saying “confirm”. After the functions were implemented, we ran several groups of tests to find potential edge cases for each action.
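
A rough sketch of what such a verb-based dispatch helper might look like in Django (view and template names here are illustrative, not our exact code):

```python
from django.shortcuts import redirect, render

def dispatch_command(request, parsed):
    """Route a parsed voice command, e.g. {"verb": "enter", "item": "apple",
    "price": 4.99}, to the page that handles it."""
    verb = parsed.get("verb")
    if verb == "enter":
        return redirect("create_entry")  # save the entry, then show the list
    if verb in ("modify", "remove"):
        return redirect("edit_entry", entry_id=parsed["number"])
    if verb == "generate":
        return render(request, "report.html", {"dates": parsed.get("dates")})
    return render(request, "home.html", {"error": "Try again"})
```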

As for testing, I finished unit tests for the speech recognition model and calculated accuracy by counting misparsed words. The outcome is about 98.3% accuracy, which is much better than the expected performance set earlier in the design requirements.
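
As an illustration of the counting approach (not our exact script), per-word accuracy can be approximated by comparing transcripts position by position; a full WER calculation would use edit distance instead:

```python
def word_accuracy(reference, hypothesis):
    """Fraction of words transcribed correctly, compared position by position.
    zip() truncates to the shorter transcript, which is fine for a rough count."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    return sum(r == h for r, h in zip(ref, hyp)) / len(ref)

print(word_accuracy("enter apple for five dollars",
                    "enter apple for five dollars"))  # 1.0
```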

Schedule

I am on schedule now.

Next step

I will work with my team to finish all the remaining work for the final demo including the poster, video, and report. 

New tools and knowledge

Since I am responsible for the speech recognition part of our project, I first learned how to use the Python libraries we selected, including PyAudio, noisereduce, SpeechRecognition, and gTTS, to implement the audio-to-text pipeline. To understand the basic functions of these libraries, I read through the documentation and searched for sample usage of the methods.
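
A condensed sketch of an audio-to-text pipeline built from these libraries (SpeechRecognition records through its PyAudio backend; 16-bit samples are assumed):

```python
import numpy as np
import noisereduce as nr
import speech_recognition as sr

def listen_and_transcribe():
    """Record from the microphone, denoise the samples, then transcribe."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:              # PyAudio under the hood
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)

    # Denoise the raw 16-bit samples before recognition.
    samples = np.frombuffer(audio.get_raw_data(), dtype=np.int16)
    cleaned = nr.reduce_noise(y=samples.astype(np.float32), sr=audio.sample_rate)
    cleaned_audio = sr.AudioData(cleaned.astype(np.int16).tobytes(),
                                 audio.sample_rate, audio.sample_width)
    return recognizer.recognize_google(cleaned_audio)
```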

After receiving the Raspberry Pi and the monitor, our team spent a lot of time setting up the system and installing all the required libraries. Since the Linux system on the Raspberry Pi differs from the Windows or macOS we use on our laptops, unexpected errors could occur at any step. For example, we used “sudo apt install” for most of the libraries, but this did not work when we tried to install gTTS. We searched forums and spent quite some time trying possible solutions; in the end, we succeeded in installing and running the library in our web application.

Yuxuan’s Status Report for 4/20

Progress

This week I implemented the whole VUI part of our web app with Lynn. Specifically, we parsed the voice commands processed by the spaCy models and handled edge cases that spaCy failed to catch. Based on the command and the current page, we render a new page with the necessary parameters and generate audio output as assistance for visually impaired users. The supported voice commands on each starting page are listed below, followed by a sketch of how this mapping might be encoded.

Home page: enter an entry, view entry list (in a time range), generate report (in a time range)

Entry list page: enter an entry, modify an entry, delete an entry, view entry list, generate report

Modify page: confirm modification, enter an entry, view entry list, generate report

Report page: enter an entry, view entry list, generate report
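
One plausible way to encode this mapping so the handler can reject commands that are not valid on the current page (page keys and verb names are condensed for illustration):

```python
SUPPORTED = {
    "home":   {"enter", "view", "generate"},
    "list":   {"enter", "modify", "delete", "view", "generate"},
    "modify": {"confirm", "enter", "view", "generate"},
    "report": {"enter", "view", "generate"},
}

def is_supported(page, verb):
    """True if `verb` is a valid command on `page`."""
    return verb in SUPPORTED.get(page, set())
```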

I also ran the test for item classification accuracy. I prepared 20 item names, labeled them with their correct categories, and then fed those item names to my classification model built on GloVe word embeddings to get the actual results.
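
The test harness could look roughly like the following sketch, assuming gensim loads the vectors and using stand-in category names (items outside the GloVe vocabulary would need separate handling):

```python
from gensim.models import KeyedVectors

# gensim >= 4 reads the raw GloVe text format with no_header=True.
kv = KeyedVectors.load_word2vec_format("glove.6B.100d.txt",
                                       binary=False, no_header=True)
CATEGORIES = ["food", "entertainment", "clothing", "transport"]  # illustrative

def classify(item):
    """Pick the category whose embedding is most similar to the item's."""
    return max(CATEGORIES, key=lambda c: kv.similarity(item.lower(), c))

labeled = [("apple", "food"), ("movie", "entertainment")]  # 20 items in the real test
accuracy = sum(classify(w) == c for w, c in labeled) / len(labeled)
```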


Schedule

I am on schedule.


Next steps

We will continue with our user experience test and prepare for the final demo next week.


This week’s special

I learned word2vec embeddings and was responsible for customizing a word2vec model for our item classification function. I learned how to use the pretrained model by studying examples of the word2vec model and selecting the methods useful for my customization.

Through this capstone project I was able to learn by doing. Sometimes I would not figure out the best way to implement a feature until I started coding it. For example, we searched for a long time for a way to pass data from the previous web page but could not find an effective method online. Later, when we tried to keep an error message across pages, we realized we could use sessions to store such data and retrieve it later. There were many similar design choices we had to make during the actual implementation, and I learned to weigh tradeoffs and actual requirements when choosing methods.
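
A minimal sketch of this session approach in Django (view names and the validation check are placeholders):

```python
from django.shortcuts import redirect, render

def handle_command(request):
    """On failure, stash the message in the session for the next page."""
    command = request.POST.get("command", "")
    if not command:                         # placeholder validation check
        request.session["error"] = "Try again"
    return redirect("home")

def home(request):
    error = request.session.pop("error", None)  # read once, then clear
    return render(request, "home.html", {"error": error})
```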

Yixin’s Status Report for 04/20/2024

  • What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

I trained a spaCy model that can identify ITEM, with about 100 commands as the training dataset for the final version. I also retrained the model to add two labels, CHANGE and CATEGORY. CHANGE identifies which part of the entry (item name, price, or classification) the user wants to modify. CATEGORY identifies the new classification to assign to an item (e.g., in the command “change category to entertainment”, “entertainment” gets the label CATEGORY). There is also a label named CARDINAL, used in commands like “delete entry number 5” or “modify entry number 1”: “5” and “1” are parsed from the commands and labeled CARDINAL.
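
A sketch of how such training data can be prepared for spaCy 3 with character-offset entity spans (the three examples are illustrative; the real dataset has about 100 commands):

```python
import spacy
from spacy.tokens import DocBin

TRAIN = [
    ("delete entry number 5",            [(20, 21, "CARDINAL")]),
    ("change category to entertainment", [(19, 32, "CATEGORY")]),
    ("enter apple for 4.99 dollars",     [(6, 11, "ITEM")]),
]

nlp = spacy.blank("en")
db = DocBin()
for text, spans in TRAIN:
    doc = nlp.make_doc(text)
    # char_span returns None if offsets don't align with token boundaries.
    doc.ents = [doc.char_span(s, e, label=lab) for s, e, lab in spans]
    db.add(doc)
db.to_disk("train.spacy")  # consumed by `python -m spacy train`
```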

In addition, I tested all these new labels (ITEM, CHANGE, CATEGORY, and CARDINAL). The accuracy for CHANGE is not ideal, so we will not use this label in our model. The other three labels have good accuracy (>95%) and meet our expectations.

In addition, since I will give the final presentation next week, I also spent some time preparing for it.

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

On schedule.

  • What deliverables do you hope to complete in the next week?
  1. prepare for the presentation
  2. integration testing
  • This Week’s Special

One of the things I learned is how to install everything we need on hardware. In past courses, there was always a detailed, clear guide from the professors on setting up software like Python and Django. However, installing all the packages we needed on the Raspberry Pi took much longer than expected. We had to figure out why spaCy would not work on our Pi. We searched online for a while and finally found a post describing the same problem; it reminded us that spaCy requires a 64-bit OS, and we were not sure our Pi had one. We therefore reinstalled a 64-bit OS, and spaCy worked. After this experience, I will check requirements and compatibility carefully before installing anything.

I also encountered some problems when training spaCy, a package I used for the first time. To acquire this new knowledge, I read through the Linguistic Features and Training Models parts of spaCy’s guide, which helped me choose which features our app would use and which model I would train. I also found step-by-step guides that taught me how to prepare the dataset. When I ran into errors, I searched for posts describing the same problems and tried their fixes.


Lynn’s Status Report for 04/06


Progress

I spent the first half of the week finding a usable virtual keyboard to allow standard GUI input of item names and prices. The initial plan was to use the built-in virtual keyboard on the RPi, but the packages currently available are not compatible with the 64-bit RPi OS. I therefore switched to a virtual keyboard component within the web application. The current keyboard is attached.

On the VUI side, the web app can now provide a financial report and entry list based on the audio input “generate report/get entries from [start date] to [end date]”. Since the date information is of the form “Year-Month”, the filtering range is set from the 1st day of the start month to the last day of the end month. The web app can also provide audio output corresponding to the user's voice input. Once all response text files are finalized, the app should be usable by visually impaired users.
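
The month-boundary computation can be done with the standard library; a small sketch (the function name is illustrative):

```python
import calendar
from datetime import date

def month_range(start, end):
    """Turn "Year-Month" strings into the filtering dates described above:
    the 1st of the start month through the last day of the end month."""
    sy, sm = map(int, start.split("-"))
    ey, em = map(int, end.split("-"))
    return date(sy, sm, 1), date(ey, em, calendar.monthrange(ey, em)[1])

print(month_range("2024-01", "2024-03"))  # (2024-01-01, 2024-03-31)
```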

I also ran unit tests on the audio input part and the rendering functionality for all web pages.

Schedule

I am on schedule now.

Next step

I plan to implement the VUI for the entry-entering function and test it. I will also work with the team to conduct larger-scale tests involving more subsystems.

Verification

  • Tests for speech recognition accuracy: 

To test the speech recognition pipeline, I wrote a script in which a keyboard press starts a session of recording and transcribing. To perform unit tests, I pressed the corresponding key and gave voice commands; the script should output converted text identical to the audio input.

Since the expected user audio inputs are short sentences starting with standard verb keywords, I first ran unit tests on the verb keywords in both quiet and crowded environments. Words tested include “Enter”, “Get”, “Generate”, and similar verbs. Accuracy was about 95% for these verbs, relatively low right after the program starts and higher after the first input.

Then I tested whether the script could accurately catch numbers, including both prices and dates. With the keyword “dollar” attached to the price, recognition accuracy reached 99% on the current test cases; for example, “five point three dollars” is translated directly to “$5.3”. As for dates, the “Year-Month” pattern is successfully converted to text in most cases. However, accuracy for the month “May” is lower than for all other months, so we will need a fallback for inaccurate translations.

As for item names, accuracy drops to 95% due to the large number of possible nouns. The pipeline catches items with relatively unique, complex pronunciations, whereas words with simpler pronunciations may be converted to similar but incorrect words.

  • Tests for web application VUI:

After the speech recognition pipeline was integrated into the web application, I ran the same unit tests listed above within the app. The performance was generally the same, which means the web application framework does not affect the speech recognition process.

Yuxuan’s Status Report for 4/6

Progress

This week I continued installing necessary modules on the RPi. I tried to run a small script to load the Google News word2vec model on the RPi, but the Pi kept hanging, probably due to the model's gigantic size. I decided to switch to another word embedding method, GloVe (Global Vectors for Word Representation), which offers models of various sizes and dimensions. I downloaded the GloVe model with 400,000 words and 100 dimensions and successfully loaded it on the RPi.
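
Loading the downloaded vectors with gensim might look like this (the file name is the standard GloVe download; gensim >= 4 accepts the raw GloVe text format with no_header=True):

```python
from gensim.models import KeyedVectors

glove = KeyedVectors.load_word2vec_format("glove.6B.100d.txt",
                                          binary=False, no_header=True)
print(glove.most_similar("coffee", topn=3))  # quick sanity check on the Pi
```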

I also implemented the modify button for each entry on the entry list page. This includes displaying the id of each entry and creating a new HTML page and a new function in views.py to handle the modification action.

Schedule

I am on schedule.

Next Steps

Next week I will continue implementing the delete function for entries and help with the audio input functions of our web app. I will also rewrite the word2vec script using the new GloVe model and incorporate item classification into the NLP process.

Verification method

Test for manual input feature: To test that the basic manual input functionality is implemented as expected, we will run our web app following the flow chart in our design report, covering all of its branches and corner cases.

Test for item classification accuracy: We will randomly select 20 item names outside our dataset, label them with their correct categories, and feed them into my item classification script built upon the GloVe model. We will then determine the accuracy of its prediction, which is expected to be 90% or above.

Test for latency: If the whole speech processing pipeline takes more than 3 seconds on average, I will consider reducing the number of dimensions of the GloVe model to shorten loading time.