Lynn’s Status Report for 03/16

Progress

This week I worked on finalizing the speech recognition pipeline and ran basic tests on the subsystem. To simplify the script and save memory, I decided not to save the audio input temporarily as .wav files. Instead, I feed the byte frames directly into the noisereduce and speech recognition methods.
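As a rough illustration of skipping the .wav round trip, the sketch below joins raw audio chunks in memory and decodes them into integer samples. It assumes 16-bit little-endian mono PCM, and the helper name `frames_to_samples` is hypothetical; the actual script may use a different sample format and would more likely hand a NumPy array to the noise reduction step.

```python
import struct


def frames_to_samples(frames, sample_width=2):
    """Join raw PCM chunks and decode them into a list of signed integers.

    Assumes 16-bit little-endian mono audio (an assumption, not confirmed
    by the report). Nothing is written to disk.
    """
    raw = b"".join(frames)
    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit samples")
    if len(raw) % sample_width:
        raise ValueError("byte count is not a multiple of the sample width")
    count = len(raw) // sample_width
    # "<" = little-endian, "h" = signed 16-bit integer
    return list(struct.unpack("<%dh" % count, raw))


# Two chunks, as a stream might deliver them
chunks = [struct.pack("<2h", 1, -2), struct.pack("<1h", 300)]
samples = frames_to_samples(chunks)
```

Keeping the audio in memory avoids file I/O on every command, at the cost of holding the whole utterance in RAM until it has been processed.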

After some research and preliminary testing on the first version of the script, I realized that it is not necessary to include both a “start” and an “end” event to control audio recording manually. In the modified version of the audio recording and speech recognition pipeline, recording terminates automatically after a set time, and the current session ends if nothing is heard from the speaker for another set period.
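The two stop conditions described above could be sketched as follows. This is a minimal illustration and not the actual script: the function names, the RMS threshold of 500, and the use of chunk counts in place of seconds are all assumptions, and the audio is again assumed to be 16-bit little-endian PCM.

```python
import math
import struct


def rms(chunk):
    """Root-mean-square amplitude of a 16-bit little-endian PCM chunk."""
    count = len(chunk) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, chunk[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def record_until_silence(chunks, max_chunks, silence_chunks, threshold=500.0):
    """Collect chunks until a run of quiet chunks or a hard length limit.

    Returns the kept chunks and the reason recording stopped.
    """
    kept = []
    quiet_run = 0
    for chunk in chunks:
        kept.append(chunk)
        # Extend the quiet run on a quiet chunk, reset it on a loud one
        quiet_run = quiet_run + 1 if rms(chunk) < threshold else 0
        if quiet_run >= silence_chunks:
            return kept, "silence"
        if len(kept) >= max_chunks:
            return kept, "max_duration"
    return kept, "stream_ended"


loud = struct.pack("<4h", 1000, -1000, 1000, -1000)
quiet = struct.pack("<4h", 0, 0, 0, 0)
kept, reason = record_until_silence([loud, loud, quiet, quiet, loud], 10, 2)
```

With a fixed chunk size and sample rate, each chunk count maps directly to a duration, so the two limits correspond to the "set time" and the silence period.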

The current script can recognize the standard commands with acceptable accuracy.

A major focus of the speech recognition process is the price. Currently, the price is recognized accurately if the “dollar” keyword is included. However, if the speaker gives an ambiguous spoken command such as “four-sixty”, the recognizer converts it directly to “460”, which differs from the expected value. We may need further discussion on how to deal with this.
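One possible way to handle the ambiguity, offered only as a starting point for that discussion, is a post-processing heuristic on the transcript. The sketch assumes that a bare number of three or more digits with no “dollar” keyword and no decimal point was meant as dollars and cents; the function `interpret_price` and this rule are hypothetical and have not been adopted by the team.

```python
import re
from decimal import Decimal


def interpret_price(transcript):
    """Guess the intended price from a recognized transcript.

    Heuristic (an assumption): a bare integer with 3+ digits and no
    "dollar" keyword is read as dollars and cents, so "460" -> 4.60.
    """
    text = transcript.lower()
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        return None  # no number found in the transcript
    number = match.group()
    if "dollar" in text or "." in number:
        return Decimal(number)  # trust an explicit amount as recognized
    if len(number) >= 3:
        return Decimal(number) / 100  # "460" -> 4.60
    return Decimal(number)


price = interpret_price("460")
```

The trade-off is that a real purchase of 460 dollars would be misread unless the speaker says “dollars”, so a confirmation prompt may still be needed.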

Schedule

I am a little behind schedule on testing the scripts on the RPi, but I will catch up next week.

Next Step

I will test the pipeline on the RPi in both quiet environments and crowded environments. Also, I will work with Yuxuan to implement the web app and connect the front-end buttons to the speech recognition pipeline.

Yuxuan’s Status Report for 3/16

Progress

My main focus this week was the web application. I wrote HTML files for the three main pages: new entry, spending list, and financial report. I also implemented a navigation bar to jump between pages. I initialized the six categories in a migration file so that the user can select from the preset categories in a dropdown when creating an entry. Some buttons are placeholders for now, and the CSS styling also needs further refinement.

Schedule

I am on schedule.

Next Steps

Next week I will continue working on the web app, improving the styles of the components and completing the functionality. I aim to make the app functional with manual input by the interim demo.

Lynn’s Status Report for 03/09

Progress

The primary focus for the week was the design report, and I spent time composing the Architecture and/or Principle of Operation section, the System Implementation section, and some parts of the trade studies. To better clarify the design details and hierarchy, I added several new diagrams for both the hardware and software systems. I also formatted the design report for the team after the contents were finished.

I spent the rest of the week diving deeper into the signal processing section. After successfully recording audio using PyAudio, I started to write scripts for noise reduction and tested the different noise reduction libraries using audio recorded in noisier environments. I also wrote scripts to test the speech recognition function.

Schedule

I am on schedule.

Next Step

Once noise reduction passes preliminary testing, I plan to test the functionality on the RPi, since we have set up the Raspberry Pi and monitor. I will also start to construct the whole pipeline for signal processing.

Yixin’s Status Report for 03/09/2024

  • What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

Last week, I set up the hardware with my group. The Raspberry Pi, monitor, and microphone are connected, and the hardware setup works well. The monitor's touch screen is responsive, and we could run some Python programs on it. Next week, we need to set up the on-screen keyboard for the monitor (connect a wired keyboard first, download the packages needed for the on-screen keyboard, then disconnect the wired keyboard).

As before, my main task is to work with spaCy. I adjusted the training strategy for spaCy slightly because we refined the requirements for our app. Before this week, I planned to train the model to assign labels to the action verbs we want. This week we refined the design of our app, and we will require the user to use imperative sentences. Therefore, I will use spaCy's POS attributes to identify the verbs directly. I have tested the new design, and it has high accuracy.

In addition, I still want to train the model to make sure it can assign labels (items and prices) to the corresponding words in the sentence. I have written the script to train the labels and have already trained the model with a small dataset (10 words for each label). This seems to work. I will train it with larger datasets next week to test the accuracy.

I also wrote many parts of the design report, and our team met to discuss some design details.

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

on schedule

  • What deliverables do you hope to complete in the next week?
  1. download on-screen keyboard
  2. train the spaCy model with larger datasets

Yuxuan’s Status Report for 3/9

Progress

This week I focused on composing the design review document. Specifically, I worked on the introduction, the word2vec part of system implementation, the word2vec and database parts of the trade studies, and the testing, project management, and summary sections. Since I was responsible for the word2vec part, I tested a pretrained word2vec model provided by gensim on my laptop. It took me a long time to set up the library and the model because gensim does not support the latest versions of Python. I was able to get the vector representation of a word and the similarity between two words. Now that I have familiarized myself with the library and customized the model for our use, installing and using the required modules on the Raspberry Pi, which is the next step, should go smoothly.
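For reference, the similarity between two word vectors is usually measured as cosine similarity, which is what gensim reports for a word pair. The sketch below computes it in plain Python on made-up three-dimensional vectors; real word2vec vectors have hundreds of dimensions, and the values here are purely illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors (hypothetical, not taken from any trained model)
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Values near 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate unrelated directions.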

Schedule

I am behind schedule because the design report took me longer than expected. I will focus on web app development next week as I have slack time after next week based on the original schedule.

Next steps

Next week I will implement the basic functions of the app (not including the audio input function) and continue integrating components on the Raspberry Pi with my teammates.

Team Status Report for 3/9

Risk Mitigation

We started assembling the Raspberry Pi and the monitor last week in case any changes needed to be made before we wrote the design report. The integration worked as expected, and we are now able to program on the Raspberry Pi with the touch screen and a keyboard. Next week we will also integrate the microphone (which should be available once plugged in) and test the built-in speaker and the on-screen keyboard to make sure all hardware components work together.

Design Changes

We originally planned to use a 10,000 mAh power bank, but after recalculating the power consumption, we might only need a 3,500 mAh power bank, a change reflected in the design report.
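A capacity estimate of this kind typically multiplies the average current draw by the desired runtime and divides by a conversion efficiency. The sketch below shows only the shape of that calculation. The input values are placeholders, not the team's measurements, and the function name and the 0.85 default efficiency are assumptions.

```python
def required_capacity_mah(avg_current_ma, runtime_hours, efficiency=0.85):
    """Estimate the power bank capacity needed for a given runtime.

    avg_current_ma: average current drawn by the device, in mA
    runtime_hours:  desired runtime on battery, in hours
    efficiency:     fraction of rated capacity actually delivered
                    (lumps together conversion losses; assumed value)
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return avg_current_ma * runtime_hours / efficiency


# Placeholder inputs for illustration only
estimate = required_capacity_mah(avg_current_ma=600, runtime_hours=5, efficiency=0.8)
```

Because the result scales linearly with both current and runtime, halving either input halves the capacity needed.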

Updated Schedule

We are a bit behind schedule due to the time spent on the design report. The updated schedule is attached below.

This Week Special

Part A was written by Yixin, Part B was written by Yuxuan, and Part C was written by Lynn (Tianyi).

Part A

In considering global factors, this app addresses the fundamental need for financial management across different demographics. People all over the world, not only students or people in Pittsburgh, need to track their spending. By leveraging voice recognition technology, the app significantly lowers the barrier to entry for users. It would help people with limited literacy or visual impairments in particular, but it would also improve the experience for general users. This inclusivity ensures that people, regardless of their technological proficiency or physical capabilities, can manage their finances efficiently and easily.

Our current app’s focus on English-speaking users is designed to refine and perfect the user experience, ensuring that the core functionalities—such as expense tracking, report generation, and voice recognition—are robust and user-friendly. This strategic approach allows us to cater effectively to a significant portion of the global population, providing them with a powerful tool for financial management.

Part B

Our target users are mainly visually impaired people and the elderly, and our design takes two main cultural factors into consideration.

One factor is social inclusion. Both visually impaired and elderly people tend to feel marginalized by society. Specifically, many money tracking tools on the market are applications on computers or mobile phones, to which these groups might have no access. Our product, however, provides affordable access to money tracking for these groups. By supporting audio input and output, we enable them to use a money tracking tool like everyone else, supporting their sense of belonging in society.

The other factor is simple operation. The elderly and the visually impaired might have trouble interacting with a complex system because they cannot see the page or understand the components. Therefore, we designed a simple UI so that users can interact with the app with little effort and almost hands-free. This enhances the user experience for the targeted groups while keeping the essential functionalities of a money tracking app.

Part C

A major design consideration in our product is power consumption. A portable power bank is attached to the device for use in environments without an accessible outlet, but it is generally preferable to plug the device into a stable charger to guarantee that the system functions properly. Continuous charging may result in excessive power consumption. To avoid wasting power, customers are encouraged to disable HDMI when they are not using the device.

Another concern is the radiation emitted by the touchscreen monitor in the device. Long-term exposure to such radiation may cause health problems. However, the average screen time for our design is estimated to be under 30 minutes per day, so such problems are unlikely to occur.