Shiyi Zhang’s Status Report for 03/18/2023

Personal Accomplishments

  • Frontend

This past week, I focused on adjusting our frontend code, specifically transitioning it from Django to Tkinter. We made this decision because our current project priorities lie elsewhere and we need the frontend to be operational as quickly as possible. However, we may switch back to Django as we near the completion of the project, since it provides more styling options.

Currently, the pages can wait for output variables from the backend, such as the system response, and then display the text with a typewriter effect. The pages also disable audio input while waiting for a response from the backend or while the typewriter effect is still running.
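A minimal sketch of how such a typewriter effect could be built in Tkinter, assuming a single label and a fixed delay. The names typewriter_frames, run_demo, and the "typing" flag are hypothetical and are not taken from our actual frontend code.

```python
def typewriter_frames(text):
    """Yield the successive prefixes displayed while 'typing' the text."""
    for end in range(1, len(text) + 1):
        yield text[:end]


def run_demo(text="Welcome! Please order your first item!", delay_ms=40):
    """Open a small window and reveal the text one character at a time."""
    import tkinter as tk  # imported here so typewriter_frames works without a display

    root = tk.Tk()
    label = tk.Label(root, text="", font=("Helvetica", 16), wraplength=400)
    label.pack(padx=20, pady=20)
    frames = typewriter_frames(text)
    # Hypothetical flag: audio input code could check it before listening.
    state = {"typing": True}

    def step():
        try:
            label.config(text=next(frames))
        except StopIteration:
            state["typing"] = False  # typing finished, audio input may resume
            return
        root.after(delay_ms, step)  # schedule the next character

    step()
    root.mainloop()
```

Calling run_demo() on a machine with a display shows the effect. Scheduling each character with after() keeps the Tkinter event loop responsive, which is one way to keep audio input disabled only for as long as the text is still being typed.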

  • Sensor

I have installed the operating system, a fan, and some heat sinks onto our Raspberry Pi. The code for the sensor has also been transferred to the RPi and is functioning correctly.

On schedule

Yes, my progress is on schedule.

Next week

I will fetch Lisa’s NLP code and incorporate it into my workspace, connecting it to my frontend. I will also integrate the sensor code with the frontend.

Lisa Xiong’s Status Report For 3/18/2023

Personal Accomplishments

I collaborated with Nina to finish our NLP and database integration this week. The output of my NLP algorithm can now feed directly into the database as an Order object. I started writing the speech recognition code with Nina, and the system is able to convert user input into text in near real-time. The speech recognition system is still weak on noise reduction (as described in Nina’s status report) and speech conversion speed, which we plan to improve soon. I also fixed some bugs in my NLP algorithm while making it account for more use cases, such as when a user orders and then deletes an item in the same sentence. Another change I made in the NLP is that when we cannot detect the quantity of a menu item in the sentence, we default it to 1.
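The quantity default can be illustrated with a short sketch. The NUMBER_WORDS table and the resolve_quantity helper are assumptions made for this example. The real NLP code works on spaCy tokens rather than plain strings.

```python
# Hypothetical lookup table for this sketch, not the project's actual vocabulary.
NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def resolve_quantity(quantity_text=None):
    """Return the ordered quantity, defaulting to 1 when none is detected."""
    if quantity_text is None:
        return 1  # no quantity word was found in the sentence
    word = quantity_text.strip().lower()
    if word.isdigit():
        return int(word)
    return NUMBER_WORDS.get(word, 1)  # unknown words also fall back to 1


print(resolve_quantity("two"))  # quantity stated by the customer
print(resolve_quantity())       # quantity missing, falls back to 1
```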

Schedule

I am on track with our schedule this week: the NLP and database integration is almost finished, and the speech recognition work has started.

Plans for Next Week

I will continue to work with Nina to improve our NLP and database integration, and tune the speech recognition to our microphone so that it can convert user speech faster and more accurately. If I have enough time left, I will next work on the integration between the speech recognition and NLP systems.

Nina Duan’s Status Report For 3/18/2023

Personal Accomplishment

In addition to completing the ethics assignment, I integrated our database module and preliminary NLP module with Lisa and modified the microphone & speech recognition system provided by Python’s SpeechRecognition library.

After integration, our system is now able to extract menu items and quantities from simple sentences, add them to an Order object, and upload that object to the database. However, there are still flaws with this simple system because we have yet to implement the checkout portion of the NLP module.

The open-source SpeechRecognition library provides a basic real-time speech recognition functionality that can be used with an external microphone. This process, however, doesn’t allow room for noise reduction. Therefore, I explored the source code of the library, determined where the microphone’s input is read, and extended it to utilize a noise reduction algorithm. For now, it uses a simple, deterministic noise cancellation algorithm that attempts to cancel out low amplitudes by mixing with the signal’s inversion. By slightly altering this visualization tool, I was able to visualize the difference. This is what it looks like when I speak at conversational volume from a distance of ~0.7m, with a restaurant ambience noise YouTube video playing in the background (graphs are in time domain; top = raw microphone input, bottom = filtered input):

No speech, only noise.
Speech with noise, with amplitude decreased.
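As a rough illustration of the idea, here is one possible reading of "cancelling low amplitudes by mixing with the signal's inversion". The function name, the threshold value, and the use of a plain list of integer samples are all assumptions. The actual code operates on the audio buffers inside the SpeechRecognition library.

```python
def suppress_low_amplitudes(samples, threshold):
    """Cancel samples quieter than the threshold by adding their inversion."""
    cleaned = []
    for sample in samples:
        if abs(sample) < threshold:
            inverted = -sample
            cleaned.append(sample + inverted)  # sample plus its inversion is 0
        else:
            cleaned.append(sample)  # louder samples (likely speech) pass through
    return cleaned


# Quiet background samples are zeroed, louder ones are kept unchanged.
print(suppress_low_amplitudes([5, -3, 120, -200, 10], threshold=50))
```

Because the rule is deterministic and works sample by sample, it is cheap to run in real time, but it cannot remove noise that is as loud as the speech itself.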
About Schedule

I have caught up to the schedule. The microphone has been set up, and preliminary signal processing code has been written.

Plans for Next Week

I will continue to work with Lisa to improve our NLP & database modules, as this is the core part of our system. In addition, I will start installing necessary dependencies on and transferring our code to the microcontroller (RPi 4).

Team Status Report For 3/18/2023

Risks

The greatest risk that we currently face is the low performance of the speech recognition system. As we started writing the speech recognition code, we realized that although the Python SpeechRecognition library usually returns coherent sentences, which helps our NLP system parse the input, the speed and accuracy of recognition are not very promising. We will test how changing certain parameters in the SpeechRecognition library affects the accuracy of the output, and in the worst case, we can switch to another speech recognition library compatible with Python.

Design Changes

We may be able to use fewer infrared sensors. We conducted testing this week and found that the system was still able to detect relatively short human figures accurately with just one or two sensors.

To accommodate inflexibilities in our current speech recognition and NLP modules, we decided that checkout can only be triggered by certain keywords (“checkout,” “finish,” and “done”). We will also be taking item orders one by one, so a sample interaction would look like this:

Kiosk: “Welcome! Please order your first item!”

Customer: “One hamburger, please.”

Kiosk: “You’ve ordered one hamburger. Is this correct?”

Customer: “Yes.”

Kiosk: “One hamburger, confirmed. Please order your next item, or say ‘finish’ to checkout.”

Customer: “Checkout.”

Kiosk: “Are you ready to checkout?”

Customer: “Yes.”

Kiosk: “Checkout successful! Your total is $XX.XX. Your order number is XX. You will be called when your order is ready. Thank you for using Meal By Words!”
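The keyword trigger can be sketched as follows. The function name is_checkout_request and the punctuation handling are assumptions. Only the three keywords come from the design described above.

```python
CHECKOUT_KEYWORDS = {"checkout", "finish", "done"}


def is_checkout_request(utterance):
    """Return True if any word of the utterance is a checkout keyword."""
    words = (word.strip(".,!?\"'").lower() for word in utterance.split())
    return any(word in CHECKOUT_KEYWORDS for word in words)


print(is_checkout_request("Checkout."))               # keyword present
print(is_checkout_request("One hamburger, please."))  # regular item order
```

Note that this simple version treats "check out" (two words) as a regular utterance, so the transcription would need to be normalized first if the speech recognizer splits the word.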

Schedule

There is no schedule change this week. Everyone is on track with our plan.

Shiyi Zhang’s Status Report for 03/11/2023

Personal Accomplishments

During Spring break, I continued working on the client-side UI and now have two pages: one that appears when user speech is detected, and another that displays our menu and added items.

Page #1

This page is supposed to be voice-operated, but as we have not yet received the microphone, I have decided to use a click button that listens to the laptop’s microphone for now. By utilizing Mozilla’s Web Speech API and its JavaScript functions, the page is capable of displaying real-time transcribed text in the provided text area.

Page #2

This is where the customer views the menu and reviews their order before checkout.

Schedule

The client-side UI is close to completion, but it’s currently not talking to any sub-system such as the Django backend, so my progress is a bit behind schedule. I don’t think it’s too much of a problem since the mic/tool kit will arrive next week, and utilizing the outputs from the sensors should not take too long.

Next week

Next week I will be working on making the sensors and the mic work with the backend and, if I have time, with the frontend as well. I will work with Lisa on the mic part since she is responsible for language parsing.

Lisa Xiong’s Status Report For 3/11/2023

Personal Accomplishments

Since the last status report, I have been working on parsing menu items with multiple words, because I realized that the dependency parser does not treat multiple words as a single token. The first solution I came up with was to add all menu items to a list of named entities so that the NER pipeline could recognize them. However, spaCy would not treat the menu items as named entities even when I capitalized their initial letters. Token matching could work, but it would compromise a lot of flexibility when handling varied sentence structures, since the token matcher needs more rigid rules than the dependency matcher. I solved the problem by defining a new set of dependency matching rules for menu items with multiple words. For example, for the menu items “veggie burger” and “chicken burger”, the pattern uses “burger” as the anchor token and finds its immediate adjective dependent (either “veggie” or “chicken”) and its quantifier dependent (a number or determiner).
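The anchor-and-dependents rule can be illustrated without spaCy by walking a hand-written dependency parse. This is only a sketch of the idea: the Token class, the MODIFIERS table, and the dependency labels in the example sentence are assumptions, and the real implementation expresses the rule as spaCy dependency matching patterns instead.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    dep: str   # dependency label, e.g. "det", "nummod", "compound"
    head: int  # index of this token's head in the sentence


# Anchor token -> words that turn it into a multi-word menu item.
MODIFIERS = {"burger": {"veggie", "chicken"}}
QUANTITY_DEPS = {"nummod", "det"}  # number or determiner


def match_multiword_items(tokens):
    """Return (menu item, quantity word) pairs found around each anchor token."""
    matches = []
    for index, token in enumerate(tokens):
        allowed = MODIFIERS.get(token.text.lower())
        if allowed is None:
            continue  # not an anchor token
        # Immediate dependents of the anchor.
        children = [t for i, t in enumerate(tokens) if t.head == index and i != index]
        modifier = next((c.text.lower() for c in children if c.text.lower() in allowed), None)
        if modifier is None:
            continue
        quantity = next((c.text.lower() for c in children if c.dep in QUANTITY_DEPS), None)
        matches.append((f"{modifier} {token.text.lower()}", quantity))
    return matches


# Hand-written parse of "I want a chicken burger" (labels are illustrative).
sentence = [
    Token("I", "nsubj", 1),
    Token("want", "ROOT", 1),
    Token("a", "det", 4),
    Token("chicken", "compound", 4),
    Token("burger", "dobj", 1),
]
print(match_multiword_items(sentence))
```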

The following command line output shows the natural language processing system’s input and output.

For changing item entries, the current solution is to find a set of keywords indicating the change (“remove”, “delete”, “add”, “change”), which should be the immediate head of the menu item token, and change the order information accordingly. The actual ordering situations will be more complex than this scenario, and we plan to fix it after having a functioning system that reaches MVP.

Schedule

I have caught up with the schedule for NLP system programming: an MVP version of the algorithm can be completed by Monday, March 13 at the latest. Nina and I have not started integrating the database and the NLP system yet; the schedule change is mentioned in the team status report for this week. We did coordinate on what type of data structure the NLP system should return so that the information can feed directly into the database, and I believe the integration should not take long since we already have compatible data structures.

Plans for Next Week

I plan to work with Nina to integrate my NLP system with the database next week. I will also start working on programming the speech recognition system.

Team Status Report For 3/11/2023

New Tools

To properly access and control our cloud database, Redis, we need to use RedisLabs, an online platform for viewing and manipulating database settings, and RedisInsight, a desktop application that visualizes and allows manually changing database data. These tools will also allow us to view statistics about the database, such as latency and number of accesses, which may help with testing speed of service in the future.

In addition, we are planning to use jQuery to write JavaScript faster and more easily. jQuery also works across multiple browsers, so our code stays compatible regardless of which features a given browser supports.

Risks

As we developed the natural language processing algorithm further, we realized that it relies heavily on the grammatical structure of input sentences to capture the necessary information. The most significant risk is that if our speech recognition system fails to generate grammatically coherent sentences, it will be difficult to integrate the speech recognition and natural language processing subsystems. To mitigate the risk, we are ready to use the token matcher on top of the dependency matcher to capture keywords in the sentences instead of grammatical structures.

The risk with regard to the UIs is that some of the Bootstrap templates we are currently using are unstable. Because these templates are hosted remotely, some load reliably while others may fail when the servers they live on are poorly maintained. Therefore, we are considering using static styling (CSS, SCSS, and JavaScript) only, but the decision is not finalized yet.

Design Changes

Our design has not changed from our design review report, but we solidified a few design details.

First, we finalized our menu and constructed an immutable dictionary for future use:

cheeseburger        $7.99
hamburger           $6.99
veggie burger       $7.49
chicken burger      $7.49
beef sandwich       $8.99
chicken sandwich    $8.99
hot dog             $4.99
corn dog            $5.99
taco                $6.99
donut               $3.99
fries               $2.99
onion rings         $4.99
fountain drink      $1.29
coffee              $3.29
ice cream           $2.99

For the MVP, we do not plan on allowing customizations or size selections.
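One way to build such an immutable dictionary in Python is with types.MappingProxyType, sketched below. Storing prices in cents and the helper names order_total_cents and format_price are assumptions made for this example. Only the items and prices come from the menu above.

```python
from types import MappingProxyType

# Read-only view of the menu. Prices are stored in cents to avoid float rounding.
MENU_CENTS = MappingProxyType({
    "cheeseburger": 799,
    "hamburger": 699,
    "veggie burger": 749,
    "chicken burger": 749,
    "beef sandwich": 899,
    "chicken sandwich": 899,
    "hot dog": 499,
    "corn dog": 599,
    "taco": 699,
    "donut": 399,
    "fries": 299,
    "onion rings": 499,
    "fountain drink": 129,
    "coffee": 329,
    "ice cream": 299,
})


def order_total_cents(quantities):
    """Sum the price of every ordered item, given a {name: quantity} mapping."""
    return sum(MENU_CENTS[item] * count for item, count in quantities.items())


def format_price(cents):
    """Format an amount in cents as a dollar string."""
    return f"${cents // 100}.{cents % 100:02d}"


print(format_price(order_total_cents({"hamburger": 1, "fries": 2})))
```

Any attempt to assign to MENU_CENTS raises a TypeError, which guards against accidental price changes at runtime.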

Second, the cloud database and the staff-side module will maintain a server-client-like relationship. When the staff-side module’s subscriber thread receives notification of a new order (sent by the customer-side module when a customer checks out), it requests the order’s information from the database by spawning a child thread. This eliminates the need to constantly poll the cloud database for new data.

Third, we may change how we conduct speech interactions (e.g. near real-time parsing vs. letting customers speak one sentence and then parse) based on how well the noise-reduction and the speech recognition libraries work together.

Schedule

We moved the integration between the database and NLP to the week after spring break, since the MVP versions of the two subsystems have only just been completed. As a result, the tasks following database and NLP integration have been pushed back as well.

Nina Duan’s Status Report For 3/11/2023

Personal Accomplishment

Other than completing the design review report with my teammates, I also worked on a couple of tasks.

1. Voice Synthesizer Script

To assist Shiyi with developing an accessible UI, I created a voice synthesizer script using the open source library Google Text-to-Speech (gTTS). The script allows the user to synthesize any English text from both an input prompt and the command line:

2. Database and customer-side model for orders, items, and the menu

I finalized the representations of orders, items, and the menu both on the cloud database and in local storage:

The design review report goes into detail about the models and how they interact with each other, so I won’t repeat that here. The important thing to note is that, by design, the local copy won’t be uploaded to the cloud until the customer finishes ordering by calling checkout().
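The "upload only on checkout" behavior can be sketched like this. Apart from the checkout() name, everything here (the Order fields, add_item, and the upload callback standing in for the cloud database write) is a simplified assumption rather than the actual model.

```python
class Order:
    """Local copy of an order that is uploaded only at checkout."""

    def __init__(self, order_num):
        self.order_num = order_num
        self.items = {}          # item name -> quantity, kept locally
        self.checked_out = False

    def add_item(self, name, quantity=1):
        self.items[name] = self.items.get(name, 0) + quantity

    def checkout(self, upload):
        """Send the finished order to the database through the upload callback."""
        upload(self.order_num, dict(self.items))
        self.checked_out = True


cloud = {}  # plain dict standing in for the cloud database
order = Order(order_num=1)
order.add_item("hamburger")
order.add_item("fries", 2)
print(cloud)  # still empty: nothing is uploaded before checkout
order.checkout(lambda num, items: cloud.update({num: items}))
print(cloud)
```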

I have tested the flow and successfully added sample orders into the cloud database:

3. Staff-side model for orders

I designed a model to represent orders for the staff-side as well:

This object will automatically be generated when a subscriber to the Redis pub/sub channel receives a new orderNum. It allows the staff to view order items, cross out prepared items (using finishItem()), and remove completed orders from the cloud database (using removeOrder()).
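A simplified sketch of the staff-side object is shown below. The method names finishItem() and removeOrder() come from the description above. The class name StaffOrder, its fields, and the plain dict used in place of the cloud database are assumptions.

```python
class StaffOrder:
    """Staff-side view of one order fetched from the database."""

    def __init__(self, order_num, items, database):
        self.order_num = order_num
        self.remaining = dict(items)  # items still to be prepared
        self.finished = []            # items already crossed out
        self.database = database      # dict standing in for the cloud database

    def finishItem(self, name):
        """Cross out an item once it has been prepared."""
        if name in self.remaining:
            del self.remaining[name]
            self.finished.append(name)

    def removeOrder(self):
        """Remove the completed order from the database."""
        self.database.pop(self.order_num, None)


database = {17: {"hamburger": 1, "fries": 2}}
staff_order = StaffOrder(17, database[17], database)
staff_order.finishItem("fries")
staff_order.finishItem("hamburger")
staff_order.removeOrder()
print(staff_order.finished, database)
```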

4. Redis pub/sub and fetching orders from the database

The Redis pub/sub channel is shared by the customer-side modules (publishers) and the staff-side modules (subscribers). Once the customer-side order publishes its orderNum, the staff-side subscriber thread will receive a message containing the orderNum and spawn a child thread to fetch that orderNum’s information from the database.

I have implemented this functionality as well, but it still requires more testing.
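The flow can be sketched with only the standard library, using a queue.Queue in place of the Redis pub/sub channel and a dict in place of the cloud database. Both substitutions, and the deliver function itself, are assumptions made so the example runs without a Redis server.

```python
import queue
import threading


def deliver(order_nums, database):
    """Publish order numbers and return what the subscriber side fetched."""
    channel = queue.Queue()  # stand-in for the shared pub/sub channel
    fetched = {}
    lock = threading.Lock()

    def fetch_order(order_num):
        # Child thread: look up one order in the database.
        order = database.get(order_num)
        with lock:
            fetched[order_num] = order

    def subscriber():
        # Subscriber thread: block until a message arrives, so no polling is needed.
        workers = []
        while True:
            order_num = channel.get()
            if order_num is None:  # sentinel that ends this demo
                break
            worker = threading.Thread(target=fetch_order, args=(order_num,))
            worker.start()
            workers.append(worker)
        for worker in workers:
            worker.join()

    listener = threading.Thread(target=subscriber)
    listener.start()
    for order_num in order_nums:  # customer side publishes its orderNum
        channel.put(order_num)
    channel.put(None)
    listener.join()
    return fetched


print(deliver([17], {17: {"hamburger": 1, "fries": 2}}))
```

Spawning a child thread per order keeps the subscriber free to receive the next message while a slower database fetch is still in progress.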

About Schedule

Since all of us are slightly behind, the database and NLP integration has not happened yet. I am fairly confident that the database component is functionally complete, and unit testing has been conducted. Therefore, once we meet again next week, Lisa and I will be able to start utilizing the database with data from the NLP module.

Plans for Next Week

Our microphone and infrared sensor are set to arrive next week. Therefore, I will shift gears and start programming the microphone against our RPi 4.

Lisa and I will also try to integrate our NLP modules and database modules during the mandatory lab meetings.

Lisa Xiong’s Status Report For 2/25/2023

Personal Accomplishments

I started programming the natural language processing system for the project this week using Python’s spaCy library. I used the built-in en_core_web_sm pipeline for the basic parsing with its tokenizer, tagger, parser, and NER. There are two ideas for grammar rule matching that I experimented with. The dependency matcher is able to get the menu item and the quantity even with multiple words in between, such as “a splendidly delicious hamburger”, but it is complicated to set rules for words that have no direct dependency relationship. If the parser fails to detect that the item quantity word is related to the menu item word, there is no way to identify them using the dependency matcher. The token matcher works better in this situation, since its rules can be based on part-of-speech tags or other token properties instead. However, the token matcher is not very good at identifying the relationship between different parts of the sentence and might require more edge-case accommodations. Based on these findings, I will attempt to use both matchers to create a more comprehensive algorithm.

Schedule

I’m slightly behind our original schedule, since I haven’t made significant progress on the natural language processing algorithm due to the large amount of time spent in familiarizing myself with spaCy and the matchers. Our Gantt chart plan has changed to accommodate this. Since the database also has not been established yet, the integration between database and natural language processing system can be pushed back until after spring break to allocate more time for the development of both tools. I will make sure to work on the algorithm more in the next week and get an MVP version by the end of spring break at the latest.

Plans for Next Week

I plan to refine my natural language processing algorithm while working on the design report with my teammates. By the end of next week, my algorithm should be able to parse the user input in the following situations:

  1. When there are words between the quantity and item name (e.g., “a beautifully packaged cheeseburger”)
  2. When the user makes an attempt to change the order using some easily detectable keywords (e.g., “remove the diet coke” / “I wanna add another cheeseburger”)

Shiyi Zhang’s Status Report for 02/25/2023

Personal Accomplishments

Over the past week, I’ve been working on creating the cart page for our kiosk. To make things easier, we decided to use Django, which will allow us to write our backend in Python. For the frontend, we thought it would be best to stick with HTML and rely on Bootstrap for styling. With this approach, we can keep things simple while still creating a great user experience.

Here’s what the page will look like (some places are displaying source code because they are expecting outputs from the backend, which is currently under development):

Code:

On schedule?

This week I’m a bit behind schedule, as we have a lot of pages to create. Originally I planned to make at least two pages per week. However, due to other commitments, such as wrapping up another project and taking a final exam for a half-semester class, I fell behind. I will get us back on track next week.

Deliverables for next week

I’m planning to create two additional web pages. The first page will be displayed when the kiosk is listening to the customer speaking, and the second page will be an error page that will only be shown when the customer’s audio quality is poor and we need them to repeat their request. These pages will help improve the overall user experience by providing clear instructions and feedback to our customers.