Team Status Reports – Team D3: Meal By Words

Team Status Report For 4/29/2023

Testing

1. Audio to text

Due to the nature of our system, we mainly care that the speech recognition system recognizes the correct word, not its tense or grammatical number. Therefore, different tenses of a verb are treated as the same word (e.g. “wake,” “woke,” and “waken” are considered the same). Similarly, we don’t distinguish between singular and plural nouns (e.g. “hamburger” and “hamburgers” are considered the same).

The average accuracy across 10 samples was 87.9%. A more detailed report of the test results is available at “Nina Duan’s Status Report For 4/22/2023.”
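A minimal sketch of this matching rule, under stated assumptions: the `LEMMAS` table and the `base_form` and `word_accuracy` helpers are hypothetical names made up for illustration, and words are compared position by position, which is simpler than a proper word-error-rate alignment.

```python
# Hypothetical lookup table mapping inflected forms to a base form.
# A real lemmatizer would replace this hand-written table.
LEMMAS = {
    "woke": "wake",
    "waken": "wake",
    "hamburgers": "hamburger",
}


def base_form(word):
    """Lowercase a word and map it to its base form when one is known."""
    word = word.lower()
    return LEMMAS.get(word, word)


def word_accuracy(expected, recognized):
    """Fraction of expected words matched position by position,
    ignoring tense and singular/plural differences."""
    expected_words = [base_form(w) for w in expected.split()]
    recognized_words = [base_form(w) for w in recognized.split()]
    if not expected_words:
        return 0.0
    matches = sum(e == r for e, r in zip(expected_words, recognized_words))
    return matches / len(expected_words)
```

With this rule, a transcript that says “hamburger” where the speaker said “hamburgers” still counts as fully correct.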

2. Text to command

We tested the NLP system by sending it text input that simulates the parsed output of the speech recognition system, i.e. with capitalization and punctuation removed. Some example inputs have already been listed in “Lisa Xiong’s Status Report For 4/22/2023,” and a detailed report will be included in “Lisa Xiong’s Status Report For 4/29/2023.” The NLP system is able to reach 100% accuracy when parsing basic commands.
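The input preparation can be sketched as follows. This is an illustration only: `simulate_recognizer_output` is a hypothetical helper name, and it strips ASCII punctuation only, so apostrophes inside words are removed as well.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def simulate_recognizer_output(sentence):
    """Lowercase a sentence and strip ASCII punctuation,
    mimicking the raw text a speech recognizer returns."""
    stripped = sentence.lower().translate(_PUNCTUATION_TABLE)
    # Collapse any repeated whitespace left behind by removed punctuation.
    return " ".join(stripped.split())


print(simulate_recognizer_output("I want one hamburger, please!"))
```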

3. Order upload latency

This tests the latency between the customer-side uploading an order and the staff-side receiving the uploaded order. To calculate the difference, we printed the time when the order was sent and the time when the order was received. “Time” is defined as the amount of time, in seconds, since the epoch (same as how Unix defines time).

The average latency across 20 samples, collected over 2 days, was 1.638s. The median was 1.021s. A more detailed report of the test results is available at “Nina Duan’s Status Report For 4/22/2023.”
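The timestamp arithmetic behind these figures can be sketched as follows. The helper names `upload_latency` and `summarize` are hypothetical. One assumption worth stating: when the sender and receiver run on different machines, their clocks need to be synchronized for the difference to be meaningful.

```python
import statistics
import time


def upload_latency(sent_at, received_at):
    """Latency in seconds between two epoch timestamps."""
    return received_at - sent_at


def summarize(latencies):
    """Return the (mean, median) of a list of latencies in seconds."""
    return statistics.mean(latencies), statistics.median(latencies)


# time.time() returns the number of seconds since the epoch as a float.
sent_at = time.time()
received_at = time.time()
print(upload_latency(sent_at, received_at))
```

Reporting the median alongside the mean is useful here because a few slow uploads on a congested network pull the mean up much more than the median.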

The latency falls in an acceptable range but fluctuates depending on the latency of the network.

4. Order upload accuracy

This test checks that the order received by the staff-side is the same as the order uploaded from the customer-side. We hard-coded 10 different orders, with varying order items and quantity, and uploaded them to the database.

The resulting accuracy was 100%. We found no mismatch between the order received by the staff-side and the order uploaded from the customer-side.

5. Kiosk activation latency and accuracy

This test checks how long it takes for the distance sensor to detect the presence of an approaching customer. We tested distances of 20 cm, 30 cm, 40 cm, 50 cm, 60 cm, and 80 cm. For 20 – 60 cm, the resulting accuracy was 100%. For 80 cm, the accuracy was around 70%, but since we only expect to detect customers within 60 cm of the kiosk, we consider this goal achieved. Latency was always under 2 seconds, and 93% of the time it was below 1.5 seconds.

6. Latency and accuracy of distance detection for the mic

This test checks how long it takes (latency) for the distance sensor attached to our mic to detect whether the person is speaking close enough to the mic (within 25 cm) when it is their turn to speak. In addition, we tested how accurate the results are. On average, it took 1.3 seconds to detect that a customer was not close enough (> 25 cm). The accuracy rate was 95% on average.

7. Integration test

We have done end-to-end testing among ourselves to make sure that the system is functioning as expected. Our next steps will be documenting the end-to-end order accuracy through thorough tests and finding volunteers to give feedback on the overall design.

Risks

Our entire system depends on the internet. Therefore, if the WiFi at the demo location fails, our system will fail. However, since we are using our laptops to run the system logic, we could use personal hotspots from our phones to keep the system up.

Another risk, which we mentioned in the 4/22 status report, is the noise level at Wiegand Gym. We hope that our demo can be placed in one of the smaller rooms for a quieter environment.

Design Changes

We did not implement any design changes this week. We have finalized our system and are proceeding to the integration/volunteer testing phase of the project.

Schedule

We are on track with our current schedule.

Team Status Report For 4/22/2023

Risks

The most significant risk is that if our demo location is the Wiegand Gym, the surrounding background noise will be challenging for our speech detection and recognition. Our system currently works under capstone lab noise, which is similar to a restaurant’s noise level; however, it’s difficult to simulate the environment of Wiegand Gym (room size, the number of people coming to the demos, and the echoes) and, hence, we can’t properly test that our system won’t be affected. We will set up the booth with our microphone facing its back wall, hoping that the sound shield can block the noises coming from outside the booth.

Design Changes

Instead of timing out after two minutes of inactivity, our system now uses an ultrasonic distance sensor to detect whether a customer is present in the ordering area. This means that we will be able to terminate an order interaction and remove the current order if the serviced customer walks away from the kiosk without checking out. This ultrasonic distance sensor will also be used to wake up our kiosk when a new customer approaches. We will also attach an ultrasonic distance sensor to the mic so that we can remind the customer to stand closer to the mic when speaking.
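The wake and cancel behavior described above can be sketched as a small state machine. This is an illustration under assumptions: the `KioskSession` class, its event names, and the 60 cm default threshold are hypothetical, and a real system would likely require several consecutive out-of-range readings before cancelling, rather than one.

```python
class KioskSession:
    """Tracks whether an order is active based on distance readings."""

    def __init__(self, presence_threshold_cm=60.0):
        # Assumed threshold: readings at or below it count as "customer present".
        self.presence_threshold_cm = presence_threshold_cm
        self.active = False
        self.order = {}

    def update(self, distance_cm):
        """Process one distance reading and return the event it triggers, if any."""
        present = distance_cm <= self.presence_threshold_cm
        if present and not self.active:
            # A new customer approached: wake up the kiosk.
            self.active = True
            return "wake"
        if not present and self.active:
            # The customer walked away without checking out: discard the order.
            self.active = False
            self.order.clear()
            return "cancel"
        return None
```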

In addition, we are still debating whether to add a computer vision system that detects the number of customers waiting in line. This depends on whether we can find two poles that are tall enough to hang the overhead camera for detection. If the poles are not tall enough, the detection success rate would drop significantly, and the camera/CV part would not be worth having at all.

Schedule

We are on track with our current schedule.

Team Status Report For 4/8/2023

Risks

The most significant risk at this stage is the sensitivity of the sensors. As we are ordering new distance sensors to replace the infrared sensor we had, we will have to test whether they can accurately detect incoming customers for our purpose. If the new distance sensors perform unsatisfactorily, we will either fall back to our original infrared sensor or use OpenCV for human detection.

A minor risk is the previously mentioned latency of our speech recognition and NLP system. Although we were able to greatly reduce the lag, the system still takes 1s to 5s to respond to the customer when they say a menu item. This risk is no longer as damaging as it was because the lag has been reduced to a range that isn’t very noticeable. However, as we start testing our system with volunteers, we may need to further optimize the system if the lag causes bad experiences.

Design Changes

Through testing with the RPi 4 some more this past week, we’ve found that it is insufficient to drive the sensors, the microphone, the speech recognition and NLP loop, and the customer-side UI all at the same time. Therefore, we decided to use one of our laptops as the main CPU. The microphone will be plugged into the laptop, which runs the customer-side UI and the backend server that supports it. The sensors will be driven by an Arduino Uno Rev3.

We’ve also decided to switch from using PIR sensors to using ultrasonic module HC-SR04 distance sensors, because they could provide detailed information about our customers (exactly how far away they are from the kiosk and the microphone) rather than just whether they are detected.
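For reference, an ultrasonic sensor of this kind reports the round-trip time of a sound pulse, which is converted to a distance by multiplying by the speed of sound and halving. A minimal sketch of that conversion, written in Python for illustration even though the sensor code itself runs on the Arduino; the function name is hypothetical and the speed of sound is an approximate value for air at room temperature:

```python
SPEED_OF_SOUND_CM_PER_US = 0.0343  # approximate, in air at about 20 degrees Celsius


def echo_to_distance_cm(echo_duration_us):
    """Convert a round-trip echo time in microseconds to a one-way distance in cm."""
    # The pulse travels to the target and back, so halve the total path.
    return echo_duration_us * SPEED_OF_SOUND_CM_PER_US / 2


print(echo_to_distance_cm(1000))
```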

Schedule

We are on track with our previous schedule, finishing all our assigned tasks before the interim demo. However, because we switched from the RPi 4 to an Arduino, we need to adjust our sensor code to accommodate. We’ve updated our Gantt chart to reflect the additional work required to do so.

Team Status Report For 4/1/2023

Risks

The biggest risk we are facing is the difference between our current development environment (macOS) and that of the Raspberry Pi (Raspberry Pi OS). While our prototype backend runs relatively smoothly on macOS, it behaves in unexpected ways when migrated to the Raspberry Pi. We are still trying to find the root cause of this. Our fallback is running everything on our laptop instead, which means we might switch to using an Arduino to drive the sensors. The hardware components therefore would not include a Raspberry Pi. However, this is not a finalized decision. We will continue debugging on our Raspberry Pi this weekend.

Another risk is the overall speech-processing speed. The time it takes for our system to listen to the user input, convert to text, parse into entries, and add to a local order object is longer than our ideal goal of 1 second. Because we are unable to correctly determine the end of speech every time, sometimes the speech recognition module keeps listening after the customer has finished speaking.

Design Changes

We modified our menu to accommodate some NLP edge cases. The current menu is:

cheeseburger      $7.99
hamburger         $6.99
chicken burger    $7.49
beef sandwich     $8.99
chicken sandwich  $8.99
hot dog           $4.99
corn dog          $5.99
taco              $6.99
donut             $3.99
fries             $2.99
onion rings       $4.99
cheesecake        $5.99
fountain drink    $1.29
coffee            $3.29
ice cream         $2.99

We are planning to add three more infrared sensors to increase detection accuracy. We might also use OpenCV as assistance to better detect people with special needs such as children or people sitting in wheelchairs.

Schedule

We have pushed back the design of the staff UI since integration, testing, and revisions will take up all the time before our interim demo. An up-to-date schedule has been attached.

Team Status Report For 3/25/2023

Risks

Although we have successfully integrated our microphone, speech recognition, and NLP modules, the functionality is still rather limited. For now, the system only has an accuracy of ~50% when translating from speech to text. In addition to exploring more noise cancellation algorithms, we will also find ways to limit how long a customer can speak. For example, we will ask customers to order items one by one instead of placing the entire order in one sentence. We will also repeat the detected item and quantity to the customer and ask them to confirm. In addition, at any time during the ordering process, the customer can say “remove XX item” to remove an item from the order. Hopefully, these measures are enough to guarantee that we don’t mistakenly order unwanted items for the customers.

In addition, we are currently having trouble installing the required spaCy package on our Raspberry Pi due to operating system incompatibility. We have tried 32-bit RPi OS as well as 64-bit RPi OS but have had no luck so far. This weekend we will try Ubuntu. In the worst case, we might use sockets to request and fetch NLP and speech recognition results from another computer. Another fallback option is to simply run our backend modules on a laptop, as we have already tested them on macOS.

Design Changes

For the NLP system, we have changed the way order deletion is processed. Previously when the user input includes a deletion keyword but no quantity is present, we chose to not process the request. Now in this situation, the NLP system will consider it a request to delete all of the mentioned menu items in the order since that is the more intuitive intention. For example, when the customer says “no cheeseburgers”, we should be able to remove all cheeseburger entries in the current order.
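The new deletion rule can be sketched as follows. This assumes, for illustration only, that an order is a dictionary mapping item names to quantities; the `remove_item` helper is a hypothetical name.

```python
def remove_item(order, item, quantity=None):
    """Remove `quantity` of `item` from `order`.

    When no quantity is given (e.g. "no cheeseburgers"), remove every
    entry of that item, since that is the more intuitive intention.
    """
    if item not in order:
        return order
    if quantity is None or quantity >= order[item]:
        # No quantity given, or at least as many as were ordered: drop the entry.
        del order[item]
    else:
        order[item] -= quantity
    return order


print(remove_item({"cheeseburger": 3, "fries": 1}, "cheeseburger"))
```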

In addition, for our web application, we switched back to using Django from pure Python because it provides better support for client-server integration.

Schedule

We broke some larger tasks down into smaller chunks to better keep track of everyone’s progress.

One major schedule change we made is pushing the staff UI design to early April. As this is a post-MVP feature, we will work on it after all other subsystems have been integrated.

Currently, everyone is on track with the new schedule.

Team Status Report For 3/18/2023

Risks

The greatest risk that we are currently facing is the low performance of the speech recognition system. As we started writing the speech recognition algorithm, we realized that although the Python SpeechRecognition library usually returns coherent sentences, which helps our NLP system parse the input, the speed and accuracy of recognition are not very promising. We will test how changing certain settings in the SpeechRecognition library affects the accuracy of the output, and in the worst case, we can switch to other speech recognition libraries compatible with Python.

Design Changes

We may be able to use fewer infrared sensors. We conducted testing this week and found that the system was still able to detect relatively short human figures accurately with just one or two sensors.

To accommodate inflexibilities in our current speech recognition and NLP modules, we decided that checkout can only be triggered by certain keywords (“checkout,” “finish,” and “done”). We will also be taking item orders one by one, so a sample interaction would look like this:

Kiosk: “Welcome! Please order your first item!”

Customer: “One hamburger, please.”

Kiosk: “You’ve ordered one hamburger. Is this correct?”

Customer: “Yes.”

Kiosk: “One hamburger, confirmed. Please order your next item, or say ‘finish’ to checkout.”

Customer: “Checkout.”

Kiosk: “Are you ready to checkout?”

Customer: “Yes.”

Kiosk: “Checkout successful! Your total is $XX.XX. Your order number is XX. You will be called when your order is ready. Thank you for using Meal By Words!”
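The keyword-triggered checkout can be sketched as a simple word lookup. This is an illustration, not the actual parsing logic: `wants_checkout` is a hypothetical helper, and it would miss the two-word form “check out” unless that case is handled separately.

```python
CHECKOUT_KEYWORDS = {"checkout", "finish", "done"}


def wants_checkout(utterance):
    """Return True when any checkout keyword appears as a word in the utterance."""
    # Treat periods and commas as separators so "Checkout." still matches.
    words = utterance.lower().replace(".", " ").replace(",", " ").split()
    return any(word in CHECKOUT_KEYWORDS for word in words)


print(wants_checkout("Checkout."))
```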

Schedule

There is no schedule change this week. Everyone is on track with our plan.

Team Status Report For 3/11/2023

New Tools

To properly access and control our cloud database, Redis, we need to use RedisLabs, an online platform for viewing and manipulating database settings, and RedisInsight, a desktop application that visualizes and allows manually changing database data. These tools will also allow us to view statistics about the database, such as latency and number of accesses, which may help with testing speed of service in the future.

In addition, we are planning to use some jQuery libraries to write JavaScript faster and more easily. jQuery also works across multiple browsers, so our code stays compatible regardless of which features a given browser supports.

Risks

As we developed the natural language processing algorithm further, we realized that it relies heavily on the grammatical structure of input sentences to capture the necessary information. The most significant risk is that if our speech recognition system fails to generate grammatically coherent sentences, it will be difficult to integrate the speech recognition and natural language processing subsystems. To mitigate the risk, we are ready to use the token matcher on top of the dependency matcher to capture key words in the sentences instead of grammar structures.
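To illustrate the idea of matching key words rather than grammar, here is a pure-Python stand-in. It does not use the NLP library’s matcher classes; `MENU_ITEMS`, `NUMBER_WORDS`, `singularize`, and `spot_items` are all hypothetical names, and the input is assumed to be lowercase text without punctuation.

```python
MENU_ITEMS = ["cheeseburger", "hamburger", "hot dog", "fries"]  # subset for illustration
NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3}


def singularize(phrase):
    """Drop a trailing "s" when that yields a known menu item."""
    if phrase.endswith("s") and phrase[:-1] in MENU_ITEMS:
        return phrase[:-1]
    return phrase


def spot_items(text):
    """Find (item, quantity) pairs by scanning tokens, ignoring sentence grammar."""
    tokens = text.lower().split()
    found = []
    quantity = 1
    i = 0
    while i < len(tokens):
        if tokens[i] in NUMBER_WORDS:
            # Remember the most recent number word for the next item found.
            quantity = NUMBER_WORDS[tokens[i]]
            i += 1
            continue
        # Prefer the longest menu item that starts at this token.
        for length in (2, 1):
            candidate = singularize(" ".join(tokens[i:i + length]))
            if candidate in MENU_ITEMS:
                found.append((candidate, quantity))
                quantity = 1
                i += length
                break
        else:
            # No menu item starts here: skip this token.
            i += 1
    return found


print(spot_items("two hot dogs and fries please"))
```

Because it only looks for known words, this approach still finds the items in a scrambled transcript such as “hamburger one want i”, at the cost of ignoring any meaning carried by word order.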

The risk with regard to the UIs is that some Bootstrap templates we are currently using are unstable. They are loaded from remote servers, so whether they work depends on how well those servers are maintained: some templates always load correctly, while others sometimes fail. Therefore, we are considering using static styling (CSS, SCSS, and JavaScript) only, but the decision is not finalized yet.

Design Changes

Our design has not changed from our design review report, but we solidified a few design details.

First, we finalized our menu and constructed an immutable dictionary for future use:

cheeseburger      $7.99
hamburger         $6.99
veggie burger     $7.49
chicken burger    $7.49
beef sandwich     $8.99
chicken sandwich  $8.99
hot dog           $4.99
corn dog          $5.99
taco              $6.99
donut             $3.99
fries             $2.99
onion rings       $4.99
fountain drink    $1.29
coffee            $3.29
ice cream         $2.99

For the MVP, we do not plan on allowing customizations or size selections.
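One way to build such an immutable dictionary in Python is a read-only `MappingProxyType` view. The sketch below is one possible approach, not necessarily the one we used; the `order_total` helper is hypothetical, and `Decimal` is chosen here to avoid floating-point rounding in prices.

```python
from decimal import Decimal
from types import MappingProxyType

# Read-only view of the menu: assigning through MENU raises TypeError.
MENU = MappingProxyType({
    "cheeseburger": Decimal("7.99"),
    "hamburger": Decimal("6.99"),
    "veggie burger": Decimal("7.49"),
    "chicken burger": Decimal("7.49"),
    "beef sandwich": Decimal("8.99"),
    "chicken sandwich": Decimal("8.99"),
    "hot dog": Decimal("4.99"),
    "corn dog": Decimal("5.99"),
    "taco": Decimal("6.99"),
    "donut": Decimal("3.99"),
    "fries": Decimal("2.99"),
    "onion rings": Decimal("4.99"),
    "fountain drink": Decimal("1.29"),
    "coffee": Decimal("3.29"),
    "ice cream": Decimal("2.99"),
})


def order_total(order):
    """Sum price times quantity for an order given as {item: quantity}."""
    return sum((MENU[item] * qty for item, qty in order.items()), Decimal("0"))


print(order_total({"hamburger": 2, "fries": 1}))
```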

Second, the cloud database and the staff-side module will maintain a server-client-like relationship. When the staff-side module’s subscriber thread receives notification of a new order (sent by the customer-side module when a customer checks out), it requests the order’s information from the database by spawning a child thread. This eliminates the need to constantly poll the cloud database for new data.
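The notify-then-fetch pattern can be sketched with standard-library stand-ins. Everything here is an assumption made for illustration: a `queue.Queue` plays the role of the notification channel, a plain dictionary plays the role of the cloud database, and the `None` sentinel exists only so the sketch terminates.

```python
import queue
import threading

database = {}                  # stand-in for the cloud database
notifications = queue.Queue()  # stand-in for the notification channel
received = []                  # orders fetched by the staff side
received_lock = threading.Lock()


def fetch_order(order_id):
    """Child thread: request one order's details from the database."""
    order = database[order_id]
    with received_lock:
        received.append((order_id, order))


def subscriber():
    """Block on the channel and spawn a child thread per notification."""
    children = []
    while True:
        order_id = notifications.get()  # blocks until a notification arrives
        if order_id is None:            # sentinel used to stop this sketch
            break
        child = threading.Thread(target=fetch_order, args=(order_id,))
        child.start()
        children.append(child)
    for child in children:
        child.join()


def check_out(order_id, order):
    """Customer side: store the order first, then notify the staff side."""
    database[order_id] = order
    notifications.put(order_id)


listener = threading.Thread(target=subscriber)
listener.start()
check_out(1, {"hamburger": 1})
check_out(2, {"fries": 2, "coffee": 1})
notifications.put(None)
listener.join()
print(sorted(received, key=lambda pair: pair[0]))
```

The subscriber thread sleeps inside the blocking `get()` call until a notification arrives, which is what removes the need for polling. Writing the order before sending the notification ensures the fetch never races ahead of the data.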

Third, we may change how we conduct speech interactions (e.g. near real-time parsing vs. letting customers speak one sentence and then parse) based on how well the noise-reduction and the speech recognition libraries work together.

Schedule

We moved the integration between the database and NLP to the week after spring break since the MVP versions of the two subsystems have just been completed. As a result, the tasks following database and NLP integration have been pushed back as well.

Team Status Report For 2/25/2023

Teaming

This week we created a GitHub repository for our project’s code files.

Each member has made progress on some of the assigned tasks, which will be explained in detail in everyone’s status reports. A simple version of the natural language processing system has been created; it is able to detect menu items based on basic sentence structures (such as “I want one hamburger” / “A cheeseburger, please”), and we are still in the process of debugging and determining the ideal approach to process more complicated grammar structures and tackle edge cases.

While working on our individual tasks next week, we will write the design review report together.

Risks

The most significant risk is falling behind schedule, since most of the work is taking longer than expected. We will make sure to allocate enough slack time before the final deadline to accommodate potential schedule changes. We have also decided to continue working on the project over spring break to make more progress.

Design Changes

Since it’s unlikely that we will be able to get AWS credit through the capstone course, we plan on switching our cloud database to Redis. We also considered Replit, which an instructor suggested. However, the free version only allows us to create public repositories, which doesn’t satisfy one of our basic requirements. Fortunately, this change doesn’t affect our design much, as our project only relies on a few basic functionalities that are common among most NoSQL cloud databases. In addition, since we have a few spare infrared sensors, we might use multiple sensors to detect the presence of a customer in order to increase detection accuracy.

Schedule

We have updated our schedule according to the current week’s progress.

The setup of the infrared sensor is shifted to an earlier date since the Raspberry Pi has already arrived.

The natural language processing system is taking longer than expected to program, so we have extended its timeline by a week and may continue polishing it while integrating the database and NLP system.

Team Status Report For 2/18/2023

Principles of Engineering, Science, and Mathematics

Modularity – We broke our design down into smaller chunks that each manage a cohesive group of tasks. For example, the program that runs on the Raspberry Pi consists of two modules: one monitors the infrared sensor and wakes up the main backend loop; the other manages the heavy-lifting for speech parsing and recognition. These modules can further be broken down into submodules such as signal processing, speech-to-text translation, and text parsing (NLP).
Ethicality – One of the main goals of our project is to improve the welfare of fast-food restaurant employees. We believe that the success of our system will alleviate the burden of kitchen staff, enabling them to focus only on preparing food. Our infrared sensor and ordering station will also accommodate customers in wheelchairs as well as children.

Risks

Since we are still in the design phase of our project, the most significant risk that could jeopardize its success is failing to consider important design requirements, which would lead to fundamental flaws in our design. To mitigate this risk, we will carefully review feedback from our design presentation and discuss potential problems with our instructors.

Design Changes

We finalized our design for the design review presentation and created a system diagram for the current design:

We have already requested and received a Raspberry Pi 4 with 8GB memory from the ECE inventory. Once we present our design and receive feedback, we will start ordering the hardware components (infrared sensor, microphone, and sound shield).

Schedule

We reformatted our schedule and took spring break into consideration.

Here’s the updated version:

Team Status Report For 2/11/2023

Our project includes considerations for customer convenience, employee welfare, and restaurant cost reduction. Our system will provide an alternative ordering approach to fast food restaurant customers, and reduce the number of cashiers required. This could also improve existing employees’ working conditions, as they no longer need to shuffle between the counter and the kitchen and can focus on food preparation.

This week, we updated our Gantt Chart to increase slack time at the end of the project timeline. This time will allow us to conduct more end-to-end tests if necessary and fix unexpected issues with our final product. We also created preliminary designs for our whole system, separating the system into hardware, front-end software, and back-end software components. Use-case requirements and testing metrics were updated based on our research about existing fast-food services and hardware systems. Next week, we will finalize our design, prepare for the upcoming design presentation, and start gathering necessary project components.