Team Status Report for 4/27

Risk Mitigation

The main risk in our system is a possible failure to parse voice commands. To mitigate this, we listed the supported commands that are guaranteed to work and integrated these instructions into the “help” command. The user can simply say “help” to hear sample commands and start exploring from there.


Design Changes

There is no change in our design.


Updated Schedule

The updated schedule is as reflected in the final presentation.


List of all tests

Unit tests

Test | Expected Performance | Actual Performance
Latency (voice command) | 4 seconds to render the page | Average of 4.52 seconds to render the page
Battery life | Consumes less than 50% of power with the monitor on for 1 hour | Power drops from 100% to 77%, consuming 23% of total power
Portability | 500 grams | 500 grams
Accessibility | 32% of right half of screen | 35.45% of right half of screen
Noise reduction | 90% accuracy on the integration test in an environment under 70 dB | 9 out of 10 commands work as expected (90% accuracy)


Test | Expected Performance | Actual Performance
Audio to text | Less than 20% word error rate (WER) | 98.3% accuracy; edge cases exist
Text to command (NLP) | Identify verb with 100% accuracy; identify item name, money, date, and number with 95% accuracy; NLP process takes less than 3 s | Identifies verb, money, date, and number with 100% accuracy; identifies item name with 96% accuracy; NLP process takes about 2.5 s
Item classification (GloVe) | 90% of item names correctly classified | 18 out of 20 item names correctly classified (90% accuracy)
Voice response | All audio requests should be assisted with a voice response | The app reads out the content of the response (entries, report, etc.)

System tests

User Experience Test: Each of the 5 volunteers has half a day to interact with our product without our interference. Volunteers are expected to give feedback on what they like and dislike about the product.

Overall Latency: Test different commands (e.g., record/change/delete entry, view entry, generate report). Expect <15s from pressing the button to rendering the page for all commands.

Overall Accuracy: Expect >95% accuracy for the whole system.

Test Findings

There are several improvements that could be made to the user interface. The font could be larger for sighted users. We support the “remove” command but not the “delete” command, while the pages contain delete buttons, which could be misleading. The audio instructions are too long for some users to remember which commands are supported. It might be necessary to tell the user to follow the instructions strictly; otherwise, command accuracy will be very low. We will make the corresponding improvements before the final demo.

Yuxuan’s Status Report for 4/27

Progress

I finalized our web application by adding audio output for the “modify”, “delete”, and “help” actions, so that every voice request is now answered with voice output. The “help” command triggers a voice instruction on how to use the app, giving the user sample commands such as “enter apple for 4.99 dollars” to play around with.
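As an illustration of the voice output, here is a minimal sketch of speaking the “help” instructions aloud. It assumes the pyttsx3 library, since this report does not name the exact text-to-speech backend we used.

import pyttsx3  # assumed TTS library, not necessarily the one in our app

HELP_TEXT = (
    "You can say: enter apple for 4.99 dollars, "
    "view entries from last week, or generate report for last month."
)

def speak_help():
    engine = pyttsx3.init()          # default system voice
    engine.setProperty("rate", 150)  # slow down slightly for clarity
    engine.say(HELP_TEXT)
    engine.runAndWait()              # block until playback finishes

if __name__ == "__main__":
    speak_help()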

I also conducted the user experience test with a volunteer. After the participant explored the system, I recorded the overall accuracy and latency of their command requests and responses.


Schedule

I am on schedule.


Next Steps

Next week our team will work on the poster, video, and final report, and prepare for the demo.

Yuxuan’s Status Report for 4/20

Progress

This week I implemented the whole VUI part of our web app with Lynn. Specifically, we parsed the voice commands processed by the SpaCy models and handled edge cases that SpaCy failed to catch; a rough sketch of this parsing step appears after the list below. Based on the command and the current page, we rendered a new page with the necessary parameters and generated audio output as assistance for visually impaired users. The supported voice commands on each starting page are listed as follows.

Home page: enter an entry, view entry list (in a time range), generate report (in a time range)

Entry list page: enter an entry, modify an entry, delete an entry, view entry list, generate report

Modify page: confirm modification, enter an entry, view entry list, generate report

Report page: enter an entry, view entry list, generate report
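As promised above, a minimal sketch of the parsing step, assuming SpaCy’s small English model (the fields extracted here are simplified; our real parser handles many more edge cases):

import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model size

def parse_command(text):
    # Extract the action verb and key entities from a transcript.
    doc = nlp(text)
    verb = next((t.lemma_ for t in doc if t.pos_ == "VERB"), None)
    money = next((e.text for e in doc.ents if e.label_ == "MONEY"), None)
    dates = [e.text for e in doc.ents if e.label_ == "DATE"]
    return {"verb": verb, "money": money, "dates": dates}

print(parse_command("enter apple for 4.99 dollars"))
# e.g. {'verb': 'enter', 'money': '4.99 dollars', 'dates': []}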

I also ran the test for item classification accuracy. I prepared 20 item names, labeled them with their correct categories, and then fed those item names to my classification model, which is based on the word2vec method, to obtain the actual results.


Schedule

I am on schedule.


Next Steps

We will continue with our user experience tests and prepare for the final demo next week.


This week’s special

I learned about word2vec embeddings and was responsible for customizing a word2vec model for our item classification function. I learned how to use the pretrained model by studying usage examples and selecting the methods useful for my customization.

Through this capstone project I was able to learn by doing. Sometimes I wouldn’t figure out the best way to implement a feature until I started coding it up. For example, we searched for a long time for a method to pass data from the previous web page but could not find an effective way online. Later, when we tried to keep an error message in the session, we realized we could use sessions to store such data and retrieve it later. There were many similar design choices we had to make during the actual implementation of the project, and I learned to consider tradeoffs and actual requirements when choosing between methods.
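A minimal sketch of that session idea in Django (the view names and session key are hypothetical, not our actual code):

from django.shortcuts import redirect, render

def save_entry(request):
    # Validation failed, so stash a message for the next page to read.
    request.session["error_message"] = "Amount must be a positive number."
    return redirect("entry_list")

def entry_list(request):
    # pop() retrieves the stored message and clears it in one step.
    error = request.session.pop("error_message", None)
    return render(request, "entries.html", {"error": error})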

Yuxuan’s Status Report for 4/6

Progress

This week I continued downloading the necessary modules on the RPi. I tried to run a small script to load the Google News word2vec model on the RPi, but the Pi always got stuck, probably due to the enormous size of the model. I decided to switch to another word embedding method, GloVe (Global Vectors for Word Representation), which offers models of various sizes and dimensions. I downloaded the GloVe model with 400,000 words and 100 dimensions, and I successfully loaded it on the RPi.
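For reference, a minimal sketch of loading a comparable 100-dimensional GloVe model through gensim’s downloader (the exact file and loading route we used may have differed):

import gensim.downloader as api

# "glove-wiki-gigaword-100" is a ~400,000-word, 100-dimensional
# vocabulary from gensim-data; api.load returns a KeyedVectors object.
kv = api.load("glove-wiki-gigaword-100")

print(kv["apple"].shape)                 # (100,)
print(kv.similarity("apple", "banana"))  # cosine similarity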

I also implemented the modify button for each entry on the entry list page. This includes displaying the id of each entry, creating a new HTML page, and adding a new function in views.py to handle the modification action.
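A minimal sketch of what such a views.py function can look like (SpendingEntry and its fields are hypothetical names for illustration):

from django.shortcuts import get_object_or_404, redirect, render
from .models import SpendingEntry  # hypothetical model

def modify_entry(request, entry_id):
    entry = get_object_or_404(SpendingEntry, pk=entry_id)
    if request.method == "POST":
        # Overwrite only the fields present in the submitted form.
        entry.item_name = request.POST.get("item_name", entry.item_name)
        entry.amount = request.POST.get("amount", entry.amount)
        entry.save()
        return redirect("entry_list")
    return render(request, "modify.html", {"entry": entry})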

Schedule

I am on schedule.

Next Steps

Next week I will continue to implement the delete function for the entries and help with the audio input functions of our web app. I will also rewrite the word2vec script using the new GloVe model and incorporate the item classification into the NLP process.

Verification method

Test for manual input feature: To test that the basic manual input functionalities are implemented as expected, we will run our web app following the flow chart in our design report, covering all of its branches and corner cases.

Test for item classification accuracy: We will randomly select 20 item names outside our dataset, label them with their correct categories, and feed them into my item classification script built upon the GloVe model. We will then determine the accuracy of its predictions, which is expected to be 90% or above.
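A minimal sketch of how this accuracy check can be scored (classify stands in for the GloVe-based script; the two labeled items shown are placeholders for the full set of 20):

# classify(name) -> predicted category; assumed interface of the
# GloVe-based classification script.
labeled_items = [
    ("apple", "food"),
    ("bus ticket", "transport"),
    # ... 18 more held-out item names ...
]

def accuracy(classify, items):
    correct = sum(1 for name, label in items if classify(name) == label)
    return correct / len(items)

# The test passes if accuracy(classify, labeled_items) >= 0.90.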

Test for latency: If the whole speech processing pipeline takes more than 3 seconds on average, I should consider reducing the number of dimensions of the GloVe model to cut loading time.


Yuxuan’s Status Report for 3/30

Progress

This week I spent most of my time installing and integrating the necessary modules and components on the RPi with my team. Besides that, I customized the Google word2vec model to classify item names into our five preset categories, as sketched below. The code computes a score for a new item name’s similarity to the dataset words in each category and assigns the new item the category with the highest score. The script will be finalized with a larger word dataset for all categories and integrated into our web app.
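A minimal sketch of this similarity-scoring idea (the seed words are illustrative; the real dataset is larger):

import gensim.downloader as api
import numpy as np

kv = api.load("word2vec-google-news-300")  # pretrained Google News model

CATEGORIES = {
    "food": ["apple", "bread", "coffee"],
    "transport": ["bus", "taxi", "gasoline"],
    # ... remaining categories ...
}

def classify(item_name):
    if item_name not in kv:
        return None  # out-of-vocabulary; the real script handles this
    scores = {}
    for category, seeds in CATEGORIES.items():
        sims = [kv.similarity(item_name, w) for w in seeds if w in kv]
        scores[category] = np.mean(sims) if sims else -1.0
    return max(scores, key=scores.get)  # highest-scoring category

print(classify("banana"))  # likely "food"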

Schedule

I am on schedule.

Next Steps

Next week I will integrate the word2vec classification into the NLP pipeline on the RPi, which includes downloading the Google word2vec model, setting up the initial word dataset in a migration file, and integrating the classification feature into views.py. I will also work with my team to implement the audio input features (linking parsed commands to different actions) for our web app.

Team Status Report for 3/30

Risk Mitigation

This week our team integrated all components and scripts onto the Raspberry Pi to make sure they are compatible. There was an issue when installing SpaCy, which we solved by upgrading the Raspbian OS from 32-bit to 64-bit. We successfully installed SpaCy, PyAudio, and all related modules on the RPi, set up Django, and were able to run our web app there. We also incorporated the microphone and confirmed that SpeechRecognition works on the RPi.
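A minimal smoke test of the kind we ran with the microphone, using the SpeechRecognition and PyAudio stack named above (the actual script differed):

import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:                  # requires PyAudio
    recognizer.adjust_for_ambient_noise(source)  # calibrate to room noise
    print("Say something...")
    audio = recognizer.listen(source)

try:
    print("Heard:", recognizer.recognize_google(audio))
except sr.UnknownValueError:
    print("Could not understand the audio.")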

Design Changes

No design changes were made.

Updated Schedule

The main functions left to implement include text-to-speech conversion, the audio input feature integrated into the web app, and report generation in the web app. Below is an updated schedule for the rest of the project.

Yuxuan’s Status Report for 3/23

Progress

This week I worked with my team to install Python libraries and modules on the Raspberry Pi, which took us a lot of time because the libraries are supported by different Python versions.

I also spent a lot of time on the web app. So far, I have added some dummy data in the migration files, made the “submit” button functional by saving the entry form to the database, and displayed all entries within the selected time range on the record page. Below are snapshots of the code and the website.
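As a rough illustration of the time-range query (a minimal sketch with assumed names such as SpendingEntry, not the code from the snapshots):

from django.shortcuts import render
from .models import SpendingEntry  # hypothetical model

def record_page(request):
    start = request.GET.get("start")  # e.g. "2024-03-01"
    end = request.GET.get("end")      # e.g. "2024-03-23"
    entries = SpendingEntry.objects.filter(date__range=(start, end))
    return render(request, "records.html", {"entries": entries})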

Schedule

I am on schedule.

Next Steps

I will customize the Google News word2vec model for our app and integrate the classification script with the rest of the speech recognition and NLP pipeline.

Yuxuan’s Status Report for 3/16

Progress

My main focus this week was the web application. I wrote HTML files for the 3 main pages: new entry, spending list, and financial report. I also implemented a navigation bar for jumping between pages, and I initialized the 6 categories in a migration file so that the user can select from the preset categories in a dropdown when creating an entry. Some buttons are placeholders for now, and the CSS styling still needs polish.
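A minimal sketch of seeding preset categories in a Django data migration (the app label, model, and category names here are assumptions, not our actual code):

from django.db import migrations

PRESET_CATEGORIES = ["Food", "Transport", "Housing",
                     "Entertainment", "Health", "Other"]  # illustrative

def seed_categories(apps, schema_editor):
    Category = apps.get_model("tracker", "Category")  # assumed names
    for name in PRESET_CATEGORIES:
        Category.objects.get_or_create(name=name)

class Migration(migrations.Migration):
    dependencies = [("tracker", "0001_initial")]
    operations = [migrations.RunPython(seed_categories)]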

Schedule

I am on schedule.

Next Steps

Next week I will continue working on the web app, improving the styles of the components and completing the functionalities. I aim to make the app functional with manual input by the interim demo.

Yuxuan’s Status Report for 3/9

Progress

This week I focused on composing the design review document. Specifically, I worked on the introduction, the word2vec part of the system implementation, the word2vec and database parts of the trade studies, and the testing, project management, and summary sections. Since I was responsible for the word2vec part, I tried out a pretrained word2vec model provided by gensim on my laptop. Setting up the library and the model took a long time because gensim does not support the latest versions of Python. I was able to get the vector representation of a word and the similarity between two words. Having familiarized myself with the library and customized the model for our use, installing and using the required modules on the Raspberry Pi should go smoothly as the next step.
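The calls I tried out look roughly like this (a minimal sketch using gensim’s downloader; my actual setup steps differed):

import gensim.downloader as api

kv = api.load("word2vec-google-news-300")  # large download, ~1.6 GB

vec = kv["coffee"]                     # 300-dimensional word vector
print(vec.shape)                       # (300,)
print(kv.similarity("coffee", "tea"))  # cosine similarity in [-1, 1]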

Schedule

I am behind schedule because the design report took me longer than expected. I will focus on web app development next week as I have slack time after next week based on the original schedule.

Next Steps

For next week I will implement the basic functions of the app (not including the audio input function) and continue integrating components on the Raspberry Pi with my teammates.

Team Status Report for 3/9

Risk Mitigation

We started assembling the Raspberry Pi and the monitor last week in case any changes needed to be made before we wrote the design report. The integration worked as expected, and we can now program on the Raspberry Pi with the touchscreen and a keyboard. Next week we will also integrate the microphone (which should work once plugged in) and test the built-in speaker and the on-screen keyboard to make sure all hardware components function together.

Design Changes

We originally planned to use a 10000 mAh power bank, but after recalculating the power consumption, we found we may only need a 3500 mAh power bank, a change reflected in the design report.

Updated Schedule

We are a bit behind schedule due to the time spent on the design report. The updated schedule is attached below.

This Week’s Special

Part A was written by Yixin, Part B was written by Yuxuan, and Part C was written by Lynn (Tianyi).

Part A

In considering global factors, this app addresses the fundamental need for financial management across different demographics. People all over the world, not only students and not only people in Pittsburgh, need to track their spending. By leveraging voice recognition technology, the app significantly lowers the barrier to entry. It would certainly help people with limited literacy or visual impairments, but it would also improve the experience of general users. This inclusivity ensures that people, regardless of their technological proficiency or physical capabilities, can manage their finances with ease.

Our current app’s focus on English-speaking users is designed to refine and perfect the user experience, ensuring that the core functionalities—such as expense tracking, report generation, and voice recognition—are robust and user-friendly. This strategic approach allows us to cater effectively to a significant portion of the global population, providing them with a powerful tool for financial management.

Part B

Our target users are mainly visually impaired people and the elderly, and our design takes two main cultural factors into consideration.

One factor is social inclusion. Both visually impaired and elderly people tend to feel marginalized by society. In particular, many money-tracking tools on the market are applications for computers or mobile phones, to which these groups may have no access. Our product, however, provides inexpensive access to money tracking. By supporting audio input and output, we enable these groups to use a money-tracking tool like everyone else, supporting their sense of belonging in society.

The other factor is simple operation. The elderly and the visually impaired might have trouble interacting with a complex system because they cannot see the page or understand its components. Therefore, we designed a simple UI so that users can interact with the app almost effortlessly and nearly hands-free. This enhances the user experience for the target groups while keeping the essential functionalities of a money-tracking app.

Part C

A major design consideration in our product is power consumption. While a portable power bank can be attached to the device when customers use it in environments without an accessible outlet, it is generally preferable to plug the device into a stable charger to guarantee that the system functions properly. Continuous charging may result in excessive power consumption, so to avoid potential power waste, customers are encouraged to disable HDMI when they are not using the device.

Another concern is the screen radiation emitted by the device’s touchscreen monitor. Long-term exposure to such an environment may cause health problems. However, the average screen time for our design is estimated to be under 30 minutes per day, so such radiation problems are unlikely to arise.