Shiyi Zhang’s Status Report for 04/29/2023

Personal Accomplishments

  1. Constructed two wooden stands so the sensors can remain stationary. One of the sensors needs to be on the monitor while the other needs to be on the table. The stand for the monitor sensor locks onto the monitor; the other will need to be duct-taped to the table on demo day.
  2. Conducted tests. I conducted various tests for the UI and the sensors. The test results can be found in the group status report for this week.

Schedule

My progress is on schedule.

Next week

I will be using the newly arrived alligator clips next week to extend our wires. This will allow us to place the sensors connected to the Arduino farther away on both the monitor and the table. I will also be working with my teammates on the final report, poster, and video. Furthermore, I will use the remaining time to search for possible corner cases and ensure that they do not occur on demo day.

Lisa Xiong’s Status Report For 4/29/2023

Personal Accomplishments

This week I have completed our final presentation and started working on the final poster and report. I have also run more unit tests on the NLP system; the results are attached below. All simple commands involving only additions or deletions reached 100% accuracy; the only test cases that went wrong were those where addition and deletion commands appear in the same sentence. Fortunately, given our requirement that users order item-by-item, such complicated sentences should not occur in our system’s use case. The inaccurate test cases are hard to fix because different menu items can generate different dependency trees even within the exact same sentence structure; accounting for one edge case often means giving up on another.

| Trial | Test string | Addition Accuracy | Removal Accuracy | Inaccurate cases |
|---|---|---|---|---|
| 1 | can I have a [menu item (sg)] please | 100% | N/A | |
| 2 | two [menu item (pl)] | 100% | N/A | |
| 3 | two [menu item (sg)] | 100% | N/A | |
| 4 | five beautifully packaged [menu item (pl)] | 100% | N/A | |
| 5 | a box of [menu item (sg)] please | 100% | N/A | |
| 6 | twenty [menu item1 (pl)] and three [menu item3 (pl)] | 100% | N/A | |
| 7 | remove [menu item (pl)] please | N/A | 100% | |
| 8 | i will get rid of [menu item (pl)] | N/A | 100% | |
| 9 | one [menu item (sg)] oh wait can i get rid of a terrible [menu item (sg)] please | 80% | 100% | Hot dog, fries, onion rings |
| 10 | i want three [menu item (pl)] can i delete two [menu item (pl)] | 80% | 100% | Cheeseburger, chicken burger, chicken sandwich |
Schedule

I am on track with our schedule.

Plans for Next Week

I will work with my teammates on the final report, poster, and video submission. We will also find volunteers to test our system.

Nina Duan’s Status Report For 4/29/2023

Personal Accomplishment
  1. Bug fixes

I found a bug where the instructions on the frontend display failed to update even though the system was playing the corresponding instruction audio. I traced this to a previous modification to the backend threads and made changes accordingly. Now, the backend correctly supplies the current instruction strings to the frontend.

  2. Integration testing

I’ve started conducting end-to-end tests by running through the entire process, from approaching the kiosk to successfully checking out. However, these tests are still fairly preliminary, mainly checking that the entire workflow is reasonable and bug-free. I haven’t recorded any quantitative measurements yet.

About Schedule

I am on track with the schedule.

Plans for Next Week

Other than working on the final poster, video, and report, I will continue to work with my teammates to conduct integration and volunteer testing.

Team Status Report For 4/29/2023

Testing

1. Audio to text

Due to the nature of our system, we mainly care that the speech recognition system recognizes the correct word, not its verb tense or grammatical number. Therefore, verbs of different tenses are considered the same word (e.g. “wake,” “woke,” “waken” are considered the same). Similarly, we don’t distinguish between singular and plural nouns (e.g. “hamburger” and “hamburgers” are considered the same).

The average accuracy across 10 samples was 87.9%. A more detailed report of the test results is available at “Nina Duan’s Status Report For 4/22/2023.”

2. Text to command

We tested the NLP system by sending it text input, simulating the parsed output of the speech recognition system by removing capitalization and punctuation. Some example inputs are listed in “Lisa Xiong’s Status Report For 4/22/2023”, and a detailed report is included in “Lisa Xiong’s Status Report For 4/29/2023”. The NLP system reaches 100% accuracy when parsing basic commands.

3. Order upload latency

This tests the latency between the customer-side uploading an order and the staff-side receiving the uploaded order. To calculate the difference, we printed the time when the order was sent and the time when the order was received. “Time” is defined as the amount of time, in seconds, since the epoch (same as how Unix defines time).
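As a sketch of this measurement, the harness could look roughly like the code below; the `send_order`/`wait_for_order` hooks are hypothetical placeholders, not our actual upload code.

```python
import time

# Hypothetical measurement harness: record epoch timestamps on both
# sides and report the difference as the upload latency.
def measure_latency(send_order, wait_for_order):
    t_sent = time.time()    # seconds since the epoch, as Unix defines time
    send_order()
    wait_for_order()        # blocks until the staff-side sees the order
    t_arrived = time.time()
    return t_arrived - t_sent

# Summary statistics over repeated trials:
def summarize(latencies):
    ordered = sorted(latencies)
    n = len(ordered)
    avg = sum(ordered) / n
    mid = ordered[n // 2] if n % 2 else (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    return avg, mid
```

The reported average and median are computed the same way over the collected samples.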

The average latency across 20 samples, collected over 2 days, was 1.638s. The median was 1.021s. A more detailed report of the test results is available at “Nina Duan’s Status Report For 4/22/2023.”

The latency falls in an acceptable range but fluctuates depending on the latency of the network.

4. Order upload accuracy

This test checks that the order received by the staff-side matches the order uploaded from the customer-side. We hard-coded 10 different orders, with varying order items and quantities, and uploaded them to the database.

The resulting accuracy was 100%. We found no mismatch between the order received by the staff-side and the order uploaded from the customer-side.

5. Kiosk activation latency and accuracy

This test checks how long it takes for the distance sensor to detect the presence of an approaching customer. We tested distances of 20 cm, 30 cm, 40 cm, 50 cm, 60 cm, and 80 cm. For 20–60 cm, the resulting accuracy was 100%. For 80 cm, the accuracy was around 70%, but since we only expect to detect customers within 60 cm of the kiosk, we would say we achieved our goal. Latency was always under 2 seconds, and 93% of the time it was below 1.5 seconds.

6. Latency and accuracy of distance detection for the mic

This test checks how long it takes (latency) for the distance sensor attached to our mic to detect whether the person is speaking close enough (within 25 cm) to the mic when it is their turn to speak. In addition, we tested how accurate the results are. On average, it took 1.3 seconds to detect that a customer was not close enough (> 25 cm). The accuracy rate was 95% on average.

7. Integration test

We have done end-to-end testing among ourselves to make sure that the system is functioning as expected. Our next steps will be documenting the end-to-end order accuracy through thorough tests and finding volunteers to give feedback on the overall design.

Risks

Our entire system depends on the internet. Therefore, if the WiFi at the demo location fails, our system will fail. However, since we are using our laptops to run the system logic, we could use personal hotspots from our phones to keep the system up.

Another risk, which we mentioned in the 4/22 status report, is the noise level at Wiegand Gym. We hope that our demo location can be placed in one of the smaller rooms for a quieter environment.

Design Changes

We did not implement any design changes this week. We have finalized our system and are proceeding to the integration/volunteer testing phase of the project.

Schedule

We are on track with our current schedule.

Shiyi Zhang’s Status Report for 04/22/2023

Personal Accomplishments

This week, I’ve been working on bringing together all the different components of our project. The application can now move through the full flow: waking up from sleep mode, taking orders for the current customer, processing a checkout, and then returning to sleep mode if there are no new customers in line.

Additionally, I’ve also styled our web application to make it visually appealing for anyone looking for some fast food. I’ve also filmed a video that shows the entire ordering process for our final presentation slides.

I’ve also conducted some tests to measure the response time of our sensors. On average, it takes our sensors about 1-2 seconds to detect the presence of a customer and begin the ordering process. Additionally, our tests revealed that when a customer is not speaking close enough to our microphone, it takes approximately 1 second for our system to recognize their speech.

Schedule

My progress is on schedule.

Next week

I’ve found some minor bugs in our application, and I will be addressing them next week. One issue is the occasional skipping of an alert display that reminds the customer that it is time to speak or be silent. Another is an occasional mismatch between the speech recognition result and the frontend display of that result, caused by using two different libraries for the same job.

Lisa Xiong’s Status Report For 4/22/2023

Personal Accomplishments

I have completed all the staff UI features by adding buttons which use Nina’s order item removal function, allowing the staff to delete order items once they are done. The details of the staff UI will be shown in the final presentation as a quick demo.

The unit tests for the NLP system have been completed. I made test files with basic sentence structures, such as “can I have a [menu item] please”, “2 [menu items]” and “five beautifully packaged [menu items]”, and replaced [menu item] with the actual names of the items. Based on the unit test results, I modified our NLP program to handle the parsing of “2 ice creams”. Usually the quantifier (“2” in this case) is directly linked to the last word of the menu item, but for “ice cream” it modifies “ice”, which made my existing NLP rules inapplicable. The NLP system passed all basic sentence structure tests with 100% accuracy.
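The workaround can be illustrated with a deliberately simplified sketch (this is not our actual spaCy rule set; the menu set and helper below are made up for illustration): rather than trusting the word the quantifier attaches to, we match the longest full menu-item name after the number.

```python
# Simplified illustration: when the quantifier's head word is not the
# last word of a menu item (as with "ice cream"), fall back to matching
# the full multi-word item name against the menu.
MENU = {"ice cream", "hamburger", "cheeseburger"}

def parse_quantity(words):
    """Return (quantity, item) for inputs like ['2', 'ice', 'creams']."""
    if not words or not words[0].isdigit():
        return None
    qty = int(words[0])
    rest = " ".join(words[1:]).rstrip("s")  # crude singularization
    # Try the longest menu names first, so "ice cream" wins even though
    # a dependency parse would link "2" to "ice" rather than "cream".
    for item in sorted(MENU, key=len, reverse=True):
        if rest == item:
            return qty, item
    return None
```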

Schedule

I am on track with our schedule; the staff UI is finished according to plan.

Plans for Next Week

I will work with my teammates on the final presentation and final video submission. We will also find volunteers to test our system.

Nina Duan’s Status Report For 4/22/2023

Personal Accomplishment
  1. Order termination

I added to our system the ability to terminate the speech recognition system. By calling this method, all background threads and the currently-running interaction will be terminated. The order information that belongs to the current interaction will also be deleted. When our new distance sensor detects that the customer has walked away, our system will use this functionality to terminate the current order interaction.
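A minimal sketch of this termination pattern, using hypothetical class and method names rather than our real code, could look like:

```python
import threading

# Hypothetical sketch: background threads poll a shared Event and exit
# cleanly when terminate() is called; the current order is discarded.
class SpeechSession:
    def __init__(self):
        self._stop = threading.Event()
        self.order = {"items": []}
        self._worker = threading.Thread(target=self._listen_loop, daemon=True)

    def start(self):
        self._worker.start()

    def _listen_loop(self):
        while not self._stop.is_set():
            # Real code would block on microphone input here.
            self._stop.wait(0.05)

    def terminate(self):
        """Stop background threads and delete the current order."""
        self._stop.set()
        self._worker.join()
        self.order = None
```

The distance sensor’s “customer walked away” callback would simply call `terminate()` on the active session.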

  2. Testing

The first test I conducted involved the latency between the customer-side uploading an order and the staff-side receiving the uploaded order. To calculate the difference, I printed the time when the order was sent and the time when the order was received. “Time” is defined as the amount of time, in seconds, since the epoch (same as how Unix defines time). I conducted two groups of ten tests (twenty in total) and received varying results.

CMU-SECURE (4/12/2023)

| Trial # | Time Sent | Time Arrived | Total Time (Time Arrived – Time Sent) |
|---|---|---|---|
| 1 | 1681311756.920146 | 1681311757.988204 | 1.068058 |
| 2 | 1681311787.9240708 | 1681311788.301242 | 0.377171278 |
| 3 | 1681312039.466178 | 1681312040.4595578 | 0.9933798313 |
| 4 | 1681312140.5965528 | 1681312144.366512 | 3.769959211 |
| 5 | 1681312195.167861 | 1681312197.255147 | 2.087286 |
| 6 | 1681312260.151395 | 1681312263.733936 | 3.582541 |
| 7 | 1681312359.745095 | 1681312360.1503391 | 0.405244112 |
| 8 | 1681312407.597444 | 1681312417.1346428 | 9.537198782 |
| 9 | 1681312475.726104 | 1681312478.991681 | 3.265577 |
| 10 | 1681312525.983286 | 1681312526.7069042 | 0.723618269 |
| Avg. | | | 2.581003348 |
| Median | | | 1.577672 |

CMU-SECURE (4/17/2023)

| Trial # | Time Sent | Time Arrived | Total Time (Time Arrived – Time Sent) |
|---|---|---|---|
| 1 | 1681743612.060843 | 1681743612.5518022 | 0.4909591675 |
| 2 | 1681743614.6954062 | 1681743615.077455 | 0.3820488453 |
| 3 | 1681743618.92501 | 1681743619.474676 | 0.549666 |
| 4 | 1681743624.654081 | 1681743625.0546181 | 0.400537014 |
| 5 | 1681744253.4430232 | 1681744254.947194 | 1.504170895 |
| 6 | 1681744293.227913 | 1681744294.318719 | 1.090806 |
| 7 | 1681744319.980497 | 1681744320.395576 | 0.415079 |
| 8 | 1681744338.203062 | 1681744338.6219149 | 0.4188528061 |
| 9 | 1681744356.008338 | 1681744356.6730611 | 0.6647231579 |
| 10 | 1681744377.0767202 | 1681744378.1253068 | 1.048586607 |
| Avg. | | | 0.6965429493 |
| Median | | | 0.5203125838 |

I then tested our speech recognition system for audio-to-text accuracy.

Due to the nature of our system, we mainly care that the speech recognition system recognizes the correct word, not its verb tense or grammatical number. Therefore, verbs of different tenses are considered the same word (e.g. “wake,” “woke,” “waken” are considered the same). Similarly, we don’t distinguish between singular and plural nouns (e.g. “hamburger” and “hamburgers” are considered the same).

| Sentence Spoken | # of Words Spoken | Sentence Recognized | # of Words Correctly Recognized | Accuracy (Correctly Recognized / Spoken) |
|---|---|---|---|---|
| “I’d like two hamburgers.” | 5 | “I like to hamburgers” | 3 | 60% |
| “I want two cheeseburgers.” | 4 | “I want to cheeseburger” | 3 | 75% |
| “One beautifully-packaged chicken sandwich, please.” | 6 | “1 beautifully packaged chicken sandwich please” | 6 | 100% |
| “I want to order a hundred cheesecakes.” | 7 | “I want to order 100 cheesecake” | 7 | 100% |
| “Get me two hamburgers.” | 4 | “Get me to Hamburg” | 2 | 50% |
| “I’d like one fries and three fountain drinks.” | 9 | “I like 1 fries and 3 fountain drinks” | 8 | 88.9% |
| “Check out.” | 2 | “Check out” | 2 | 100% |
| “Hello, let’s go with four tacos and three ice creams.” | 11 | “Hello let’s go with 4 taco and 3 ice creams” | 11 | 100% |
| “I’d like one cup of coffee.” | 7 | “I like 1 cup of coffee.” | 6 | 85.7% |
| “Fifty corn dogs.” | 3 | “50 corn dog” | 3 | 100% |
| Avg. | | | | 87.9% |
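The per-sentence scores can be approximated with a word-level comparison like the sketch below. The normalization here (lowercasing, stripping punctuation and a trailing “s”) is only a rough stand-in for the tense/number folding described above, and it does not handle numerals spoken as words (e.g. “fifty” vs. “50”).

```python
import string

def normalize(word):
    # Fold case, punctuation, and a plural "s" so that, e.g.,
    # "hamburgers." and "Hamburger" compare equal. This is a crude
    # approximation of the tense/number folding described above.
    return word.lower().strip(string.punctuation).rstrip("s")

def word_accuracy(spoken, recognized):
    """Fraction of spoken words found (after normalization) in the output."""
    spoken_words = [normalize(w) for w in spoken.split()]
    recognized_words = [normalize(w) for w in recognized.split()]
    correct = sum(1 for w in spoken_words if w in recognized_words)
    return correct / len(spoken_words)
```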

This is not a comprehensive test. We will continue to monitor the accuracy of our speech recognition system as we start integration testing.

About Schedule

I am on track with the schedule.

Plans for Next Week

Other than attending the final presentations, my teammates and I will start conducting integration tests with volunteers.

Team Status Report For 4/22/2023

Risks

The most significant risk is that if our demo location is the Wiegand Gym, the surrounding background noise will be challenging for our speech detection and recognition. Our system currently works under capstone lab noise, which is similar to a restaurant’s noise level; however, it’s difficult to simulate the environment of Wiegand Gym (room size, the number of people coming to the demos, and the echoes) and, hence, we can’t properly test that our system won’t be affected. We will set up the booth with our microphone facing its back wall, hoping that the sound shield can block the noises coming from outside the booth.

Design Changes

Instead of timing out after two minutes of inactivity, our system now uses an ultrasonic distance sensor to detect whether a customer is present in the ordering area. This means that we will be able to terminate an order interaction and remove the current order if the serviced customer walks away from the kiosk without checking out. This ultrasonic distance sensor will also be used to wake up our kiosk when a new customer approaches. We will also attach an ultrasonic distance sensor to the mic so that we can remind the customer to stand closer to the mic when speaking.
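The sensor-driven behavior amounts to a small state machine, sketched below. The code and the 60 cm presence threshold are illustrative assumptions, not our calibrated implementation.

```python
# Illustrative kiosk state machine: wake on approach, terminate the
# current order if the customer walks away without checking out.
PRESENCE_CM = 60  # assumed distance within which a customer is "present"

class Kiosk:
    def __init__(self):
        self.state = "SLEEP"
        self.order = None

    def on_distance(self, cm):
        """Called with each ultrasonic distance reading, in centimeters."""
        if self.state == "SLEEP" and cm <= PRESENCE_CM:
            self.state = "ORDERING"   # new customer approached: wake up
            self.order = []
        elif self.state == "ORDERING" and cm > PRESENCE_CM:
            self.state = "SLEEP"      # customer walked away: drop the order
            self.order = None
```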

In addition, we are still debating whether to add a computer vision system that detects the number of customers waiting in line. This depends on whether we can find two poles tall enough to hang the overhead camera for detection. If the poles are not tall enough, the detection success rate would drop significantly, making the camera/CV component not worth having at all.

Schedule

We are on track with our current schedule.

Shiyi Zhang’s Status Report for 04/08/2023

Tests

For the parts I am responsible for (the distance sensors, the user interface, and possibly a camera), I have conducted some unit tests on the PIR sensors and the user interface. Our distance sensors have not arrived yet, so tests for them will be delayed, but we expect them to arrive next week.

  1. User interface: The duration of the timeout given to the text transcription (i.e., audio input to text) has a direct impact on the completeness of the transcribed text. After testing timeouts of 1, 2, and 3 seconds, I found that 3 seconds was the safest option, as the text rarely got cut off. However, this extended delay came at the expense of user experience. In contrast, a timeout of 1 second provided a better user experience but required the customer to speak quickly with no gap at all, or risk having their speech cut off. After weighing these options, I ultimately decided to go with a timeout of 1 second. In addition, I have tested edge cases including receiving no speech for longer than 30 seconds (should go into INACTIVE mode and delete the current, incomplete order), checking out (should submit order), and receiving unrecognizable speech (should wait). They work as intended. However, I have not tested the UI with the sensors and the camera installed. My plan is to test whether the UI can reflect the number of people waiting in line, whether it can remind the customer to get closer to the mic, and whether it can switch to the appropriate page when no customer is around.
  2. Sensors and the camera: My plan is to experiment with tilting the sensors to find the optimal angle for detecting people within a specific distance range, while ignoring those beyond that range. There are several factors to consider, including the location of the sensors and how to distinguish between an individual and a large crowd. Once we have the sensors installed and calibrated, I will evaluate their performance in terms of accuracy and speed. Specifically, I’ll be looking at how accurately the system can count the number of people in line (the actual number of people vs. the number we calculate), as well as how quickly the camera/OpenCV can process the data (how many seconds it takes to count the number of people).

Personal Accomplishments

This week, my focus has been on integrating the backend and the frontend. Nina added flag variables and a new interface for the frontend, which is now used by the frontend to read the status of the speech recognition and natural language processing parts of the system. As a result of the changes, the frontend now has the ability to detect when it’s time for customers to speak and when the system is processing and won’t accept any audio inputs. Additionally, I implemented code that can transcribe speech to text to display on the screen. This will enable customers who are hard of hearing to view their order.

Aside from the frontend work, I’ve also been working on the hardware aspect of the project. Since the distance sensors have not arrived yet, I have been exploring the use of OpenCV to better understand what the customer is doing. As a result, the system can now detect the number of people waiting in line, as well as identify if a person is present.

Schedule

My progress has been slightly delayed because the distance sensors haven’t arrived yet, and we have just switched from RPI to Arduino. However, to avoid any further delays, I’ve implemented a backup solution using OpenCV and a camera. This should ensure that our progress won’t be affected, even if the sensors never arrive. We also have a second backup plan in place, which involves using PIR sensors. I have already written the necessary code for this option, so we are prepared.

Next week

Once the distance sensors arrive, my plan is to install them on our Arduino and then debug the code I have prepared for them. Additionally, I intend to integrate these sensors with a camera, using the OpenCV library, so that the OpenCV part knows when and when not to check the surroundings.

Lisa Xiong’s Status Report For 4/8/2023

Personal Accomplishments

There are two changes I made to the NLP algorithm this week. The first: when deleting all of a certain item (for example, when the customer says “no hamburgers”), instead of using a hard-coded 1000 (a quantity larger than any reasonable order), I now use Nina’s getItemQuantity function to retrieve the menu item’s actual quantity and delete exactly that number from the order. The second: automatically changing the word “hamburg” to “hamburger” during parsing, since the speech recognition system often mishears “hamburger” as “hamburg,” and the word “hamburg” does not commonly appear in fast food orders.
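Roughly, the first change could look like the sketch below; the order dict and `get_item_quantity` are illustrative stand-ins for the real data structures and Nina’s getItemQuantity, not our actual code.

```python
# Hypothetical sketch of the "no hamburgers" change: delete the item's
# actual quantity instead of a hard-coded large number like 1000.
order = {"hamburger": 3, "fries": 2}

def get_item_quantity(item):
    """Stand-in for getItemQuantity: the item's current quantity."""
    return order.get(item, 0)

def remove_item(item, quantity=None):
    """Remove `quantity` of `item`; None means remove all of it."""
    if quantity is None:                      # the "no hamburgers" case
        quantity = get_item_quantity(item)
    order[item] = max(0, order.get(item, 0) - quantity)
    if order[item] == 0:
        order.pop(item)
```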

I also have started designing the staff UI. My plan is to make each order appear in a small box, and make the entire page resemble a Kanban board.

Testing Algorithms

I have run tests with basic sentence structures for all the menu items, giving the NLP system user input as text and asking it to print out the final order. Two examples of basic sentences are “I want [quantity] [menu item]” and “Can I get [quantity] [menu item]?”. I have also conducted some tests with sentences involving the removal of menu items, such as “Can I get rid of a [menu item]?” or “No [menu item]”. My initial plan was to use a timer to test the speed of NLP processing, but I soon realized that the processing speed is well below 1 second, so the speed test was no longer necessary. There are no design or use case requirements specifically set for my NLP system or staff UI; the order accuracy metric involving the customer UI, speech recognition, NLP, and database will be tested at a later stage with us and volunteers imitating the actual ordering situation.

Schedule

I am on track with our old (and updated) schedule (see team status report section for details). I have started working on the staff UI, and I will keep optimizing the NLP system when we test during integration.

Plans for Next Week

I will work on the staff UI design next week and aim to get it done before the next status report is due. If more edge cases of ordering commands arise during testing, I will also update the NLP system to accommodate them.