Nina Duan’s Status Report For 4/29/2023

Personal Accomplishment
  1. Bug fixes

I found a bug where the frontend’s instruction display fails to update even though the system is playing the audio for the current instruction. I traced this to an earlier modification to the threads in the backend and made changes accordingly. The backend now supplies the current instruction strings to the frontend.

  2. Integration testing

I’ve started conducting end-to-end tests by running through the entire process, from approaching the kiosk to successfully checking out. However, these tests are still fairly preliminary, mainly checking that the entire workflow is reasonable and bug-free. I haven’t recorded any quantitative measurements yet.

About Schedule

I am on track with the schedule.

Plans for Next Week

Other than working on the final poster, video, and report, I will continue to work with my teammates to conduct integration and volunteer testing.

Team Status Report For 4/29/2023

Testing

1. Audio to text

Due to the nature of our system, we mainly care that the speech recognition system recognizes the correct word, not its verb tense or plurality. Therefore, verbs of different tenses are considered the same word (e.g., “wake,” “woke,” and “waken” are treated as the same). Similarly, we don’t distinguish between singular and plural nouns (e.g., “hamburger” and “hamburgers” are treated as the same).
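
To make the matching rule concrete, the comparison can be done on lemmas. Below is a minimal sketch, assuming spaCy (which our NLP module already uses) and its small English model are installed; the normalize() helper is illustrative, not part of our codebase.

```python
# Minimal sketch: compare words by lemma so tense/plural variants match.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def normalize(word: str) -> str:
    # Reduce a word to its lemma ("hamburgers" -> "hamburger").
    return nlp(word.lower())[0].lemma_

print(normalize("hamburgers") == normalize("hamburger"))  # expected: True
```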

The average accuracy across 10 samples was 87.9%. A more detailed report of the test results is available in “Nina Duan’s Status Report For 4/22/2023.”

2. Text to command

We tested the NLP system by sending it text input, simulating the parsed result of the speech recognition system by removing capitalization and punctuation. Some example inputs are listed in “Lisa Xiong’s Status Report For 4/22/2023,” and a detailed report will be included in “Lisa Xiong’s Status Report For 4/29/2023.” The NLP system reaches 100% accuracy when parsing basic commands.
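
The preprocessing step is essentially a one-liner; a sketch, with the function name chosen for illustration:

```python
import string

def simulate_transcript(sentence: str) -> str:
    # Lowercase and strip punctuation, mimicking the parsed output of the
    # speech recognition system before it reaches the NLP module.
    return sentence.lower().translate(
        str.maketrans("", "", string.punctuation)
    ).strip()

print(simulate_transcript("I want two cheeseburgers."))  # i want two cheeseburgers
```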

3. Order upload latency

This tests the latency between the customer-side uploading an order and the staff-side receiving the uploaded order. To calculate the difference, we printed the time when the order was sent and the time when the order was received. “Time” is defined as the amount of time, in seconds, since the epoch (same as how Unix defines time).
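
The measurement reduces to printing an epoch timestamp on each side; a minimal sketch, where upload_order() and the pub-sub callback are hypothetical stand-ins for our database client:

```python
import time

def upload_order(order):
    ...  # hypothetical stand-in for the real database upload

# Customer side: log the send time just before uploading.
def send(order):
    print(f"sent: {time.time():.6f}")  # seconds since the Unix epoch
    upload_order(order)

# Staff side: the pub-sub callback logs the arrival time; the difference
# between the two printed timestamps is the latency we report.
def on_order_received(order):
    print(f"received: {time.time():.6f}")
```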

The average latency across 20 samples, collected over 2 days, was 1.638s. The median was 1.021s. A more detailed report of the test results is available in “Nina Duan’s Status Report For 4/22/2023.”

The latency falls in an acceptable range but fluctuates depending on the latency of the network.

4. Order upload accuracy

This test checks that the order received by the staff-side is the same as the order uploaded from the customer-side. We hard-coded 10 different orders, with varying items and quantities, and uploaded them to the database.

The resulting accuracy was 100%. We found no mismatch between the order received by the staff-side and the order uploaded from the customer-side.

5. Kiosk activation latency and accuracy

This test checks how long it takes for the distance sensor to detect the presence of an approaching customer. We tested distances of 20 cm, 30 cm, 40 cm, 50 cm, 60 cm, and 80 cm. For 20–60 cm, the resulting accuracy was 100%. For 80 cm, the accuracy was around 70%, but since we only expect to detect customers within 60 cm of the kiosk, we consider the goal achieved. Latency was always under 2 seconds, and 93% of the time it was below 1.5 seconds.

6. Latency and accuracy of distance detection for the mic

This test checks how long it takes (latency) for the distance sensor attached to our mic to detect whether a customer is speaking close enough (within 25 cm) to the mic when it is their turn to speak, as well as how accurate those detections are. On average, it took 1.3 seconds to detect that a customer was not close enough (> 25 cm), and the accuracy rate was 95%.

7. Integration test

We have done end-to-end testing among ourselves to make sure that the system is functioning as expected. Our next steps will be documenting the end-to-end order accuracy through thorough tests and finding volunteers to give feedback on the overall design.

Risks

Our entire system depends on the internet. Therefore, if the WiFi at the demo location fails, our system will fail. However, since we are using our laptops to run the system logic, we could use personal hotspots from our phones to keep the system up.

Another risk, mentioned in the 4/22 status report, is the noise level at Wiegand Gym. We hope that our demo location can be placed in one of the smaller rooms for a quieter environment.

Design Changes

We did not implement any design changes this week. We have finalized our system and are proceeding to the integration/volunteer testing phase of the project.

Schedule

We are on track with our current schedule.

Nina Duan’s Status Report For 4/22/2023

Personal Accomplishment
  1. Order termination

I added the ability to terminate the speech recognition system. Calling this method terminates all background threads and the currently running interaction and deletes the order information that belongs to that interaction. When our new distance sensor detects that the customer has walked away, our system will use this functionality to terminate the current order interaction.
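
A minimal sketch of the termination mechanism, assuming the background threads poll a shared threading.Event (the names are illustrative, not our exact code):

```python
import threading

stop_event = threading.Event()

def interaction_loop():
    # Background thread: services the current interaction until told to stop.
    while not stop_event.is_set():
        ...  # listen, parse, and update the current order

def terminate():
    # Called when the distance sensor reports that the customer walked away.
    stop_event.set()  # every loop checking the event exits
    ...               # also delete the current interaction's order information
```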

  2. Testing

The first test I conducted involved the latency between the customer-side uploading an order and the staff-side receiving the uploaded order. To calculate the difference, I printed the time when the order was sent and the time when the order was received. “Time” is defined as the amount of time, in seconds, since the epoch (same as how Unix defines time). I conducted two groups of ten tests (twenty in total) and received varying results.

CMU-SECURE (4/12/2023)

Trial # Time Sent Time Arrived Total Time (Time Arrived – Time Sent)
1 1681311756.920146 1681311757.988204 1.068058
2 1681311787.9240708 1681311788.301242 0.377171278
3 1681312039.466178 1681312040.4595578 0.9933798313
4 1681312140.5965528 1681312144.366512 3.769959211
5 1681312195.167861 1681312197.255147 2.087286
6 1681312260.151395 1681312263.733936 3.582541
7 1681312359.745095 1681312360.1503391 0.405244112
8 1681312407.597444 1681312417.1346428 9.537198782
9 1681312475.726104 1681312478.991681 3.265577
10 1681312525.983286 1681312526.7069042 0.723618269
Avg. 2.581003348
Median 1.577672


CMU-SECURE (4/17/2023)

Trial # Time Sent Time Arrived Total Time (Time Arrived – Time Sent)
1 1681743612.060843 1681743612.5518022 0.4909591675
2 1681743614.6954062 1681743615.077455 0.3820488453
3 1681743618.92501 1681743619.474676 0.549666
4 1681743624.654081 1681743625.0546181 0.400537014
5 1681744253.4430232 1681744254.947194 1.504170895
6 1681744293.227913 1681744294.318719 1.090806
7 1681744319.980497 1681744320.395576 0.415079
8 1681744338.203062 1681744338.6219149 0.4188528061
9 1681744356.008338 1681744356.6730611 0.6647231579
10 1681744377.0767202 1681744378.1253068 1.048586607
Avg. 0.6965429493
Median 0.5203125838

I then tested our speech recognition system for audio-to-text accuracy.

Due to the nature of our system, we mainly care that the speech recognition system recognizes the correct word, not its verb tense or plurality. Therefore, verbs of different tenses are considered the same word (e.g., “wake,” “woke,” and “waken” are treated as the same). Similarly, we don’t distinguish between singular and plural nouns (e.g., “hamburger” and “hamburgers” are treated as the same).

Sentence Spoken # of Words Spoken Sentence Recognized # of Words Correctly Recognized Accuracy (Words Correctly Recognized / Words Spoken)
“I’d like two hamburgers.” 5 “I like to hamburgers” 3 60%
“I want two cheeseburgers.” 4 “I want to cheeseburger” 3 75%
“One beautifully-packaged chicken sandwich, please.” 6 “1 beautifully packaged chicken sandwich please” 6 100%
“I want to order a hundred cheesecakes.” 7 “I want to order 100 cheesecake” 7 100%
“Get me two hamburgers.” 4 “Get me to Hamburg” 2 50%
“I’d like one fries and three fountain drinks.” 9 “I like 1 fries and 3 fountain drinks” 8 88.9%
“Check out.” 2 “Check out” 2 100%
“Hello, let’s go with four tacos and three ice creams.” 11 “Hello let’s go with 4 taco and 3 ice creams” 11 100%
“I’d like one cup of coffee.” 7 “I like 1 cup of coffee.” 6 85.7%
“Fifty corn dogs.” 3 “50 corn dog” 3 100%
Avg. 87.9%

This is not a comprehensive test. We will continue to monitor the accuracy of our speech recognition system as we start integration testing.

About Schedule

I am on track with the schedule.

Plans for Next Week

Other than attending the final presentations, my teammates and I will start conducting integration tests with volunteers.

Team Status Report For 4/22/2023

Risks

The most significant risk is that if our demo location is the Wiegand Gym, the surrounding background noise will be challenging for our speech detection and recognition. Our system currently works under capstone-lab noise, which is similar to a restaurant’s noise level. However, it is difficult to simulate the environment of Wiegand Gym (room size, the number of people coming to the demos, and the echoes), so we cannot properly test that our system won’t be affected. We will set up the booth with our microphone facing its back wall, hoping that the sound shield can block the noise coming from outside the booth.

Design Changes

Instead of timing out after two minutes of inactivity, our system now uses an ultrasonic distance sensor to detect whether a customer is present in the ordering area. This means that we will be able to terminate an order interaction and remove the current order if the serviced customer walks away from the kiosk without checking out. This ultrasonic distance sensor will also be used to wake up our kiosk when a new customer approaches. We will also attach an ultrasonic distance sensor to the mic so that we can remind the customer to stand closer to the mic when speaking.

In addition, we are still debating whether to add a computer vision system that detects the number of customers waiting in line. This depends on whether we can find two poles tall enough to hang the overhead camera for detection. If the poles are not tall enough, the detection success rate would drop significantly, making the camera/CV component not worth having at all.

Schedule

We are on track with our current schedule.

Nina Duan’s Status Report For 4/8/2023

Verification and Validation Plan

Other than running through the entire order workflow (from ordering the first item to checking out) without the UI, which we showed during the interim demo, I’m planning on conducting the following tests:

  1. Using Python’s built-in function for getting the current system time, measure the difference between the time an order is uploaded to the database after a customer confirms checkout and the time the staff-side UI’s backend is notified of its existence. Ideally, this should be less than 0.5s (500ms), which would allow the staff-side UI to fetch the data from the database and display the new order within the anticipated 1s latency requirement.
  2. Verify that the order fetched from the database matches the order the customer placed; all of the following parameters should match: order number, order time, items ordered, and total price (see the sketch after this list).
  3. Find audio clips with different levels of background noise and play them to the microphone. The speech recognition accuracy should stay above 85%, which will allow our NLP to recognize menu items most of the time.
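
For test 2, the comparison reduces to a field-by-field equality check. A minimal sketch, assuming orders are represented as dicts with these keys (our actual representation differs):

```python
def orders_match(placed: dict, fetched: dict) -> bool:
    # All four parameters named in the requirement must match exactly.
    fields = ("order_number", "order_time", "items", "total_price")
    return all(placed[f] == fetched[f] for f in fields)
```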

After completing each individual test, we will get together as a group and perform some integration tests, preferably with volunteers with different speech habits or from different cultural backgrounds.

Personal Accomplishment

With Lisa’s NLP support, I was able to add a “confirm” functionality to our checkout process. Now, instead of directly checking the customer out when they say “checkout,” the system will ask the customer to review their order. If the customer says “yes,” the system will check them out through the same process as before. Otherwise, the system will return to the previous state, where the customer can add more items or remove existing items.

I also fixed a bug in our system that allowed customers to remove items they didn’t order. Before, the system would respond to a “remove” request with “you have removed …” without checking whether the order contained said item. Now, the system only says so when the customer has indeed ordered the item they want to remove.

To better support Shiyi’s frontend design, I created a separate thread for indicating when the customer should speak and when they should stop speaking. This thread will be used to control a microphone icon on the customer-side UI. When the system is listening for customer speech, the microphone icon will flash green and invite the customer to speak. Otherwise, the icon will let the customer know that the system is currently unable to hear what they are saying. This long-running thread terminates when the customer confirms checkout, so it can also be used to detect when the checkout process is complete and, therefore, to control when the customer-side UI navigates to the “order complete” page.
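
Conceptually, the indicator thread just toggles a shared flag around each listening window. A simplified sketch, where recognize() and handle() are hypothetical placeholders:

```python
import threading

listening = threading.Event()      # set -> UI flashes the mic icon green
checkout_done = threading.Event()  # set -> UI navigates to "order complete"

def recognize():
    ...  # hypothetical blocking speech-recognition call

def handle(text):
    ...  # hypothetical parsing / response step

def indicator_loop():
    while not checkout_done.is_set():
        listening.set()    # invite the customer to speak
        text = recognize()
        listening.clear()  # system is busy; speech is not being heard
        handle(text)
```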

About Schedule

I am on track with the schedule.

Plans for Next Week

I will work with Shiyi to integrate the customer-side UI with the newly edited backend. I will also work with Lisa to integrate the preliminary staff-side UI with the database’s pub-sub functionality. At the same time, I will conduct the tests mentioned in the “Verification and Validation Plan” section.

Team Status Report For 4/8/2023

Risks

The most significant risk, given our current progress, is the sensitivity of the sensors. As we are ordering new distance sensors to replace the infrared sensor we had, we will have to test whether they can accurately detect incoming customers for our purposes. If the new distance sensors perform unsatisfactorily, we will either fall back to our original infrared sensor or use OpenCV for human detection.

A minor risk is the previously mentioned latency of our speech recognition and NLP system. Although we were able to greatly reduce the lag, the system still takes 1s to 5s to respond to the customer when they say a menu item. This risk is no longer as damaging as it was because the lag has been reduced to a range that isn’t very noticeable. However, as we start testing our system with volunteers, we may need to further optimize the system if the lag causes bad experiences.

Design Changes

Through further testing with the RPi 4 this past week, we found that it cannot drive the sensors, the microphone, the speech recognition and NLP loop, and the customer-side UI all at the same time. Therefore, we decided to use one of our laptops as the main CPU. The microphone will be plugged into the laptop, which runs the customer-side UI and the backend server that supports it. The sensors will be driven by an Arduino Uno Rev3.

We’ve also decided to switch from PIR sensors to ultrasonic HC-SR04 distance sensors, because they provide detailed information about our customers (exactly how far they are from the kiosk and the microphone) rather than just whether someone is detected.
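
Since the sensors now hang off the Arduino while the system logic runs on a laptop, the laptop can read distance values over USB serial. A sketch, assuming the Arduino prints one centimeter reading per line (the port name and baud rate are assumptions):

```python
import serial  # pyserial

with serial.Serial("/dev/ttyACM0", 9600, timeout=1) as port:
    while True:
        line = port.readline().decode(errors="ignore").strip()
        if not line:
            continue
        distance_cm = float(line)
        if distance_cm <= 60:  # our detection range for the kiosk
            print(f"customer detected at {distance_cm:.0f} cm")
```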

Schedule

We are on track with our previous schedule, finishing all our assigned tasks before the interim demo. However, because we switched from the RPi 4 to an Arduino, we need to adjust our sensor code to accommodate. We’ve updated our Gantt chart to reflect the additional work required to do so.

Nina Duan’s Status Report For 4/1/2023

Personal Accomplishment

1. Voice Generation

I modified the voice synthesizing script I wrote earlier in the semester to support mass generation of constant messages (e.g. “Welcome to Meal by Words,” “please order your next item,” etc.) from the command line. I also created some helper functions so that the logic of the script can be called in real time to generate messages that haven’t been prepared already.
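
Roughly, the command-line flow looks like the sketch below; synthesize() stands in for the actual TTS call, and the message list is abbreviated:

```python
import argparse

CONSTANT_MESSAGES = [
    "Welcome to Meal by Words.",
    "Please order your next item.",
]

def synthesize(text, out_path):
    ...  # hypothetical: render `text` to an audio file at `out_path`

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Pre-generate constant messages.")
    parser.add_argument("--outdir", default="audio")
    args = parser.parse_args()
    for i, message in enumerate(CONSTANT_MESSAGES):
        synthesize(message, f"{args.outdir}/constant_{i}.wav")
```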

2. Order Interaction Workflow

I laid out the entire backend workflow of the order interaction in code. The interaction is as follows (a simplified sketch appears after the list):

  1. (After the back end has been woken up by an infrared sensor) Play the synthesized welcome message.
  2. Ask the customer to order the first item. The system does support ordering and/or removing multiple items at a time, but, to maintain a relatively high item detection rate, we are limiting it to one item (with quantity) at a time for the MVP.
  3. Parse customer speech and detect menu items.
  4. If an item is detected, repeat the item and its quantity back to the customer. Otherwise, the system will ask the customer to repeat their order item after 15 seconds.
  5. Ask the customer to order the next item. They can also start the sentence with “remove” to remove a certain amount of an item, or say “checkout” to checkout.
  6. Repeat steps 3 to 5 until the customer says “checkout.”
  7. Upload the order to the database and give the customer their order number.
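
A simplified sketch of the loop above; the helper names are illustrative placeholders, not our actual code:

```python
def play(message_id): ...                 # synthesized-audio playback
def listen(timeout): ...                  # speech-to-text; None if nothing heard
def apply_item_command(order, text): ...  # NLP: add or remove the detected item
def upload(order): ...                    # push the finished order to the database

def run_interaction(order):
    play("welcome")                      # step 1
    play("order_first_item")             # step 2
    while True:
        text = listen(timeout=15)        # step 3
        if text is None:
            play("please_repeat")        # step 4: nothing detected in 15 s
            continue
        if "checkout" in text:
            break                        # step 6: customer is done ordering
        apply_item_command(order, text)  # steps 4-5: repeat back, add, or remove
        play("order_next_item")          # step 5
    upload(order)                        # step 7
    play("your_order_number")
```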

In the future, we are also planning on adding a confirm feature for checkout, so the customer will be asked to confirm their order (displayed on customer UI and/or spoken out loud by voice synthesizer) before step 7.

About Schedule

I am on track with the schedule.

However, it’s important to note that because we are still having trouble with migrating our code to the microcontroller, some of the completed tasks may need to be reevaluated.

Plans for Next Week

I will work with Lisa to add a confirm feature to our system. This will require support from both the NLP module and the overall order interaction workflow. I will also fine-tune parameters such as the energy-level threshold of our speech recognition system and the amplitude of our noise cancellation filter to better accommodate the RPi environment. However, if we do need to replace the microcontroller with some other back end controller, we will also make the decision in the coming week.

If time permits, I will start integrating the cloud database with a preliminary, command-line-based staff-side UI.

Team Status Report For 4/1/2023

Risks

The biggest risk we are facing is the difference between our current development environment (MacOS) and that of the Raspberry Pi (RPiOS). While our prototype backend flows relatively smoothly on MacOS, it behaves in unexpected ways when migrated to the Raspberry Pi, and we are still trying to find the root cause. Our fallback is to run everything on a laptop instead, which means we might switch to using an Arduino; the hardware components would then not include a Raspberry Pi. However, this is not a finalized decision. We will continue debugging on our Raspberry Pi this weekend.

Another risk is the overall speech-processing speed. The time it takes for our system to listen to the user input, convert to text, parse into entries, and add to a local order object is longer than our ideal goal of 1 second. Because we are unable to correctly determine the end of speech every time, sometimes the speech recognition module keeps listening after the customer has finished speaking.

Design Changes

We modified our menu to accommodate some NLP edge cases. The current menu is:

cheeseburger       $7.99
hamburger          $6.99
chicken burger     $7.49
beef sandwich      $8.99
chicken sandwich   $8.99
hot dog            $4.99
corn dog           $5.99
taco               $6.99
donut              $3.99
fries              $2.99
onion rings        $4.99
cheesecake         $5.99
fountain drink     $1.29
coffee             $3.29
ice cream          $2.99

We are planning to add three more infrared sensors to increase detection accuracy. We might also use OpenCV to help detect people the sensors could miss, such as children or people using wheelchairs.

Schedule

We have pushed back the design of the staff UI, since integration, testing, and revisions will take up all the time before our interim demo. An up-to-date schedule has been attached.

Nina Duan’s Status Report For 3/25/2023

Personal Accomplishment

I worked with Lisa to integrate the microphone and speech recognition modules with the NLP module. For now, the system is able to correctly find the desired microphone, listen for and transcribe speech, parse long sentences like “I’d like one burger and two fries and three veggie burgers” in the background (in another thread), and store the parsed items under a single local Order object.

The first line in the screenshot is transcribed from speech.

About Schedule

I am on track with the schedule.

Plans for Next Week

I will continue to work with Lisa to improve our microphone, speech recognition, and NLP modules, as the usability of our system depends heavily on these parts. In addition, I will start creating ways of handling errors. For example, what should the system do if it fails to parse the customer’s speech? How should the system react if it times out?

Team Status Report For 3/25/2023

Risks

Although we have successfully integrated our microphone, speech recognition, and NLP modules, the functionality is still rather limited. For now, the system only has an accuracy of ~50% when translating from speech to text. In addition to exploring more noise cancellation algorithms, we will also find ways to limit how long a customer can speak. For example, we will ask customers to order items one by one instead of placing the entire order in one sentence. We will also repeat the detected item and quantity to the customer and ask them to confirm. In addition, at any time during the ordering process, the customer can say “remove XX item” to remove an item from the order. Hopefully, these measures are enough to guarantee that we don’t mistakenly order unwanted items for the customers.

In addition, we are currently having trouble installing the required spaCy package on our Raspberry Pi due to operating-system incompatibility. We have tried the 32-bit RPi OS as well as the 64-bit RPi OS but have had no luck so far. This weekend we will try Ubuntu. In the worst case, we might use sockets to request and fetch NLP and speech recognition results from another computer. Another fallback option is to simply run our backend modules on a laptop, as we have already tested them on MacOS.

Design Changes

For the NLP system, we have changed the way order deletion is processed. Previously, when the user input included a deletion keyword but no quantity, we chose not to process the request. Now, the NLP system treats such input as a request to delete all of the mentioned menu items from the order, since that is the more intuitive intention. For example, when the customer says “no cheeseburgers,” we remove all cheeseburger entries from the current order.
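
A minimal sketch of the new deletion rule, assuming the order is a dict mapping menu items to quantities (names are illustrative):

```python
def apply_deletion(order, item, quantity=None):
    # With no quantity given ("no cheeseburgers"), remove every entry of the item.
    if quantity is None:
        order.pop(item, None)
        return
    remaining = order.get(item, 0) - quantity
    if remaining > 0:
        order[item] = remaining
    else:
        order.pop(item, None)
```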

In addition, for our web application, we switched back to using Django from pure Python because it provides better support for client-server integration. 

Schedule

We broke some larger tasks into smaller chunks to better keep track of everyone’s progress.

One major schedule change we made is pushing the staff UI design to early April. As this is a post-MVP feature, we will work on it after all other subsystems have been integrated.

Currently, everyone is on track with the new schedule.