Nina Duan’s Status Report For 4/29/2023

Personal Accomplishment
  1. Bug fixes

I found a bug where the instruction shown on the frontend fails to update even though the system is playing the audio for that instruction. I traced this to an earlier modification to the backend threads and fixed it accordingly. The backend now supplies the current instruction strings to the frontend.

  2. Integration testing

I’ve started conducting end-to-end tests by running through the entire process, from approaching the kiosk to successfully checking out. These tests are still fairly preliminary, mainly checking that the entire workflow is reasonable and bug-free; I haven’t recorded any quantitative measurements yet.

About Schedule

I am on track with the schedule.

Plans for Next Week

Other than working on the final poster, video, and report, I will continue to work with my teammates to conduct integration and volunteer testing.

Nina Duan’s Status Report For 4/22/2023

Personal Accomplishment
  1. Order termination

I added the ability to terminate the speech recognition system. Calling this method terminates all background threads and the currently running interaction, and deletes the order information that belongs to that interaction. When our new distance sensor detects that the customer has walked away, the system will use this functionality to terminate the current order interaction.

  2. Testing

The first test I conducted measured the latency between the customer side uploading an order and the staff side receiving it. To calculate the difference, I printed the time when the order was sent and the time when it was received, where “time” is the number of seconds since the Unix epoch. I conducted two groups of ten tests (twenty in total) and received varying results.
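
In outline, each trial timestamped the order on send and again on receipt, then took the difference. A minimal sketch (the send/notify hooks here are stand-ins for our actual upload and pub/sub notification code):

```python
import time
from statistics import mean, median

def timestamp() -> float:
    """Seconds since the Unix epoch, matching how the trials were logged."""
    return time.time()

def run_trial(send_order, wait_for_order) -> float:
    """One latency trial: epoch time on send vs. epoch time on receipt.
    `send_order` and `wait_for_order` stand in for the real customer-side
    upload and staff-side notification."""
    time_sent = timestamp()
    send_order()
    wait_for_order()
    time_arrived = timestamp()
    return time_arrived - time_sent

def summarize(latencies):
    """Aggregate a group of trials the same way as the tables that follow."""
    return {"avg": mean(latencies), "median": median(latencies)}
```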

CMU-SECURE (4/12/2023)

Trial #   Time Sent            Time Arrived         Total Time (Time Arrived – Time Sent)
1         1681311756.920146    1681311757.988204    1.068058
2         1681311787.9240708   1681311788.301242    0.377171278
3         1681312039.466178    1681312040.4595578   0.9933798313
4         1681312140.5965528   1681312144.366512    3.769959211
5         1681312195.167861    1681312197.255147    2.087286
6         1681312260.151395    1681312263.733936    3.582541
7         1681312359.745095    1681312360.1503391   0.405244112
8         1681312407.597444    1681312417.1346428   9.537198782
9         1681312475.726104    1681312478.991681    3.265577
10        1681312525.983286    1681312526.7069042   0.723618269
Avg.      2.581003348
Median    1.577672

CMU-SECURE (4/17/2023)

Trial #   Time Sent            Time Arrived         Total Time (Time Arrived – Time Sent)
1         1681743612.060843    1681743612.5518022   0.4909591675
2         1681743614.6954062   1681743615.077455    0.3820488453
3         1681743618.92501     1681743619.474676    0.549666
4         1681743624.654081    1681743625.0546181   0.400537014
5         1681744253.4430232   1681744254.947194    1.504170895
6         1681744293.227913    1681744294.318719    1.090806
7         1681744319.980497    1681744320.395576    0.415079
8         1681744338.203062    1681744338.6219149   0.4188528061
9         1681744356.008338    1681744356.6730611   0.6647231579
10        1681744377.0767202   1681744378.1253068   1.048586607
Avg.      0.6965429493
Median    0.5203125838

I then tested our speech recognition system for audio-to-text accuracy.

Due to the nature of our system, we mainly care whether the speech recognition system recognizes the correct word, not its verb tense or number. Therefore, verbs of different tenses are considered the same word (e.g., “wake,” “woke,” and “waken”), and we don’t distinguish between singular and plural nouns (e.g., “hamburger” and “hamburgers” are considered the same).
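
A rough version of this scoring rule can be sketched with naive suffix stripping (an illustration only: irregular pairs like “wake”/“woke” would need a real lemmatizer, and our actual matching may differ):

```python
def normalize(word: str) -> str:
    """Crude stem: lowercase, drop punctuation, and strip common suffixes
    so regular tense/plural variants compare equal."""
    w = "".join(c for c in word.lower() if c.isalnum())
    for suffix in ("es", "s", "ed", "ing"):
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[:-len(suffix)]
    return w

def word_accuracy(spoken: str, recognized: str) -> float:
    """Fraction of spoken words whose normalized form appears anywhere in
    the recognized transcript (order-insensitive, a simplification)."""
    spoken_words = [normalize(w) for w in spoken.split()]
    recognized_set = {normalize(w) for w in recognized.split()}
    hits = sum(1 for w in spoken_words if w in recognized_set)
    return hits / len(spoken_words)
```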

Sentence Spoken | # of Words Spoken | Sentence Recognized | # Correctly Recognized | Accuracy (Correct / Spoken)
“I’d like two hamburgers.” | 5 | “I like to hamburgers” | 3 | 60%
“I want two cheeseburgers.” | 4 | “I want to cheeseburger” | 3 | 75%
“One beautifully-packaged chicken sandwich, please.” | 6 | “1 beautifully packaged chicken sandwich please” | 6 | 100%
“I want to order a hundred cheesecakes.” | 7 | “I want to order 100 cheesecake” | 7 | 100%
“Get me two hamburgers.” | 4 | “Get me to Hamburg” | 2 | 50%
“I’d like one fries and three fountain drinks.” | 9 | “I like 1 fries and 3 fountain drinks” | 8 | 88.9%
“Check out.” | 2 | “Check out” | 2 | 100%
“Hello, let’s go with four tacos and three ice creams.” | 11 | “Hello let’s go with 4 taco and 3 ice creams” | 11 | 100%
“I’d like one cup of coffee.” | 7 | “I like 1 cup of coffee.” | 6 | 85.7%
“Fifty corn dogs.” | 3 | “50 corn dog” | 3 | 100%
Overall accuracy (51 of 58 words correct): 87.9%

This is not a comprehensive test. We will continue to monitor the accuracy of our speech recognition system as we start integration testing.

About Schedule

I am on track with the schedule.

Plans for Next Week

Other than attending the final presentations, my teammates and I will start conducting integration tests with volunteers.

Nina Duan’s Status Report For 4/8/2023

Verification and Validation Plan

Other than running through the entire order workflow (from ordering the first item to checking out) without the UI, which we showed during the interim demo, I’m planning on conducting the following tests:

  1. Using Python’s built-in function for getting the current system time, measure the difference between the time an order is uploaded to the database after a customer confirms checkout and the time the staff-side UI’s backend is notified of its existence. Ideally, this difference should be less than 0.5 s (500 ms), which leaves the staff-side UI enough time to fetch the data from the database and display the new order within the anticipated 1 s latency requirement.
  2. Verify that the order fetched from the database matches the order the customer placed. All of the following parameters should match: order number, order time, items ordered, and total price.
  3. Find audio clips with different levels of background noise and play them to the microphone. The speech recognition accuracy should be kept above 85%. This will allow our NLP to recognize menu items most of the time.
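
The matching check in item 2 boils down to a field-by-field comparison; the field names below are placeholders, not necessarily our actual schema:

```python
# Hypothetical field names for the four parameters that must match.
ORDER_FIELDS = ("order_num", "order_time", "items", "total_price")

def orders_match(placed: dict, fetched: dict) -> bool:
    """Every tracked field of the order fetched from the database must
    equal the order the customer actually placed."""
    return all(placed.get(f) == fetched.get(f) for f in ORDER_FIELDS)
```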

After completing each individual test, we will get together as a group and perform some integration tests, preferably with volunteers with different speech habits or from different cultural backgrounds.

Personal Accomplishment

With Lisa’s NLP support, I was able to add a “confirm” functionality to our checkout process. Now, instead of directly checking the customer out when they say “checkout,” the system will ask the customer to review their order. If the customer says “yes,” the system will check them out through the same process as before. Otherwise, the system will return to the previous state, where the customer can add more items or remove existing items.

I also fixed a bug in our system that allowed customers to remove items they didn’t order. Before, the system would respond to a “remove” request with “you have removed …” without checking whether the order contains said item. Now, the system will only say so when the customer has, indeed, ordered the item they wish to remove.

To better support Shiyi’s frontend design, I created a separate thread for indicating when the customer should speak and when they should stop speaking. This thread will be used to control a microphone icon on the customer-side UI. When the system is listening for customer speech, the microphone icon will flash green and invite the customer to speak. Otherwise, the icon will let the customer know that the system is currently unable to hear what they are saying. This long-running thread terminates when the customer confirms to check out, so it can also be used to detect when the checkout process is complete. Therefore, it can also control when the customer-side UI navigates to the “order complete” page.
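
The indicator thread can be sketched with threading.Event flags (the flag names and the UI callback are illustrative stand-ins, not our exact implementation):

```python
import threading

# Hypothetical flags: the backend sets `listening` while the recognizer is
# accepting speech, and sets `done` once checkout is confirmed.
listening = threading.Event()
done = threading.Event()

def mic_icon_thread(update_icon, poll_interval=0.1):
    """Long-running thread that drives the customer-side microphone icon.
    `update_icon(active)` is a stand-in for the real UI callback."""
    while not done.is_set():
        update_icon(listening.is_set())  # flash green while listening
        done.wait(poll_interval)         # interruptible sleep
    update_icon(False)                   # checkout complete: icon off
```

Because the thread exits only when `done` is set, whatever joins it can also trigger navigation to the “order complete” page.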

About Schedule

I am on track with the schedule.

Plans for Next Week

I will work with Shiyi to integrate the customer-side UI with the newly edited backend. I will also work with Lisa to integrate the preliminary staff-side UI with the database’s pub-sub functionality. At the same time, I will conduct the tests mentioned in the “Verification and Validation Plan” section.

Nina Duan’s Status Report For 4/1/2023

Personal Accomplishment

1. Voice Generation

I modified the voice synthesizing script I wrote earlier in the semester to support mass generation of constant messages (e.g. “Welcome to Meal by Words,” “please order your next item,” etc.) from the command line. I also created some helper functions so that the logic of the script can be called in real time to generate messages that haven’t been prepared already.
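
In outline, the batch-generation part of the script looks roughly like this (the message list and helper names are illustrative; gTTS’s save() call needs network access, so the import is deferred into the function):

```python
# A couple of the constant messages mentioned above.
CONSTANT_MESSAGES = [
    "Welcome to Meal by Words",
    "Please order your next item",
]

def filename_for(message: str) -> str:
    """Derive a stable output file name from the message text."""
    slug = "".join(c if c.isalnum() else "_" for c in message.lower())
    return f"{slug}.mp3"

def synthesize_all(messages=CONSTANT_MESSAGES, outdir="."):
    """Mass-generate the constant messages as mp3 files with gTTS."""
    from gtts import gTTS  # imported lazily: synthesis requires network
    for msg in messages:
        gTTS(text=msg, lang="en").save(f"{outdir}/{filename_for(msg)}")
```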

2. Order Interaction Workflow

I laid out the entire backend workflow of the order interaction in code. The interaction is as follows:

  1. (After the back end has been woken up by an infrared sensor) Play the synthesized welcome message.
  2. Ask the customer to order the first item. The system does support ordering and/or removing multiple items at a time, but, to maintain a relatively high item detection rate, we are limiting it to one item (with quantity) at a time for the MVP.
  3. Parse customer speech and detect menu items.
  4. If an item is detected, repeat the item and its quantity back to the customer. Otherwise, the system will ask the customer to repeat their order item after 15 seconds.
  5. Ask the customer to order the next item. They can also start the sentence with “remove” to remove a certain amount of an item, or say “checkout” to check out.
  6. Repeat steps 3 to 5 until the customer says “checkout.”
  7. Upload the order to the database and give the customer their order number.
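
The steps above can be condensed into a sketch like the following, with the speech, NLP, TTS, and database modules stubbed out as parameters (the parse-result shape is an assumption for illustration):

```python
def order_interaction(listen, parse_item, speak, upload_order):
    """Condensed order loop: listen/parse_item/speak/upload_order are
    stand-ins for the speech-recognition, NLP, TTS, and database modules."""
    speak("Welcome to Meal by Words")               # step 1
    speak("Please order your first item")           # step 2
    order = {}
    while True:
        utterance = listen()                        # step 3: transcribe speech
        if "checkout" in utterance.lower():         # step 6: exit condition
            break
        action, item, qty = parse_item(utterance)   # step 3: detect menu items
        if item is None:
            speak("Please repeat your order item")  # step 4: nothing detected
            continue
        if action == "remove":                      # step 5: removals
            order[item] = max(order.get(item, 0) - qty, 0)
            speak(f"Removed {qty} {item}")
        else:
            order[item] = order.get(item, 0) + qty
            speak(f"You ordered {qty} {item}")      # step 4: repeat back
        speak("Please order your next item")        # step 5
    order_num = upload_order(order)                 # step 7
    speak(f"Your order number is {order_num}")
    return order
```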

In the future, we are also planning on adding a confirm feature for checkout, so the customer will be asked to confirm their order (displayed on customer UI and/or spoken out loud by voice synthesizer) before step 7.

About Schedule

I am on track with the schedule.

However, it’s important to note that because we are still having trouble with migrating our code to the microcontroller, some of the completed tasks may need to be reevaluated.

Plans for Next Week

I will work with Lisa to add a confirm feature to our system. This will require support from both the NLP module and the overall order interaction workflow. I will also fine-tune parameters such as the energy-level threshold of our speech recognition system and the amplitude of our noise cancellation filter to better accommodate the RPi environment. However, if we do need to replace the microcontroller with some other back end controller, we will also make the decision in the coming week.

If time permits, I will start integrating the cloud database with a preliminary, command-line-based staff-side UI.

Nina Duan’s Status Report For 3/25/2023

Personal Accomplishment

I worked with Lisa to integrate the microphone and speech recognition modules with the NLP module. For now, the system is able to correctly find the desired microphone, listen for and transcribe speech, parse long sentences like “I’d like one burger and two fries and three veggie burgers” in the background (in another thread), and store the parsed items under a single local Order object.

The first line in the screenshot is transcribed from speech.

About Schedule

I am on track with the schedule.

Plans for Next Week

I will continue to work with Lisa to improve our microphone, speech recognition, and NLP modules, as the usability of our system depends heavily on these parts. In addition, I will start creating ways of handling errors. For example, what should the system do if it fails to parse the customer’s speech? How should the system react if it times out?

Nina Duan’s Status Report For 3/18/2023

Personal Accomplishment

In addition to completing the ethics assignment, I integrated our database module and preliminary NLP module with Lisa and modified the microphone & speech recognition system provided by Python’s SpeechRecognition library.

After integration, our system is now able to extract menu items and quantities from simple sentences, add them to an Order object, and upload that object to the database. However, there are still flaws with this simple system because we have yet to implement the checkout portion of the NLP module.

The open-source SpeechRecognition library provides a basic real-time speech recognition functionality that can be used with an external microphone. This process, however, doesn’t allow room for noise reduction. Therefore, I explored the source code of the library, determined where the microphone’s input is read, and extended it to utilize a noise reduction algorithm. For now, it uses a simple, deterministic noise cancellation algorithm that attempts to cancel out low amplitudes by mixing with the signal’s inversion. By slightly altering this visualization tool, I was able to visualize the difference. This is what it looks like when I speak at conversational volume from a distance of ~0.7m, with a restaurant ambience noise YouTube video playing in the background (graphs are in time domain; top = raw microphone input, bottom = filtered input):

(Screenshots omitted: one pair with no speech, only noise; one pair with speech plus noise, its amplitude decreased by the filter.)

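
The filter described above behaves like a simple noise gate: samples whose amplitude stays below a threshold are summed with their own inversion (s + (−s) = 0), while louder samples pass through unchanged. A minimal pure-Python illustration (the real filter operates on the SpeechRecognition library’s raw audio buffer, not a Python list):

```python
def noise_gate(samples, threshold):
    """Zero out low-amplitude samples by mixing in their inversion;
    samples at or above the threshold pass through unchanged."""
    return [0 if abs(s) < threshold else s for s in samples]
```
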
About Schedule

I have caught up to the schedule. The microphone has been set up, and preliminary signal processing code has been written.

Plans for Next Week

I will continue to work with Lisa to improve our NLP & database modules, as this is the core part of our system. In addition, I will start installing necessary dependencies on and transferring our code to the microcontroller (RPi 4).

Nina Duan’s Status Report For 3/11/2023

Personal Accomplishment

Other than completing the design review report with my teammates, I also worked on several tasks.

1. Voice Synthesizer Script

To assist Shiyi with developing an accessible UI, I created a voice synthesizer script using the open-source library Google Text-to-Speech (gTTS). The script allows the user to synthesize any English text from both an input prompt and the command line.

2. Database and customer-side model for orders, items, and the menu

I finalized the representations of orders, items, and the menu, both on the cloud database and in local storage.

The design review report goes into detail about these models and how they interact with each other, so I won’t repeat that here. The important thing to note is that, by design, the local copy won’t be uploaded to the cloud until the customer finishes ordering by calling checkout().

I have tested the flow and successfully added sample orders to the cloud database.

3. Staff-side model for orders

I designed a model to represent orders for the staff side as well.

This object will automatically be generated when a subscriber to the Redis pub/sub channel receives a new orderNum. It allows the staff to view order items, cross out prepared items (using finishItem()), and remove completed orders from the cloud database (using removeOrder()).
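
A sketch of this staff-side model (the constructor arguments and storage layout are simplified, and `db` stands in for the Redis client):

```python
class StaffOrder:
    """Staff-side view of one order, built when the subscriber receives
    a new orderNum (fields here are illustrative)."""

    def __init__(self, db, order_num, items):
        self.db = db                      # stand-in for the Redis client
        self.order_num = order_num
        self.items = {name: {"qty": qty, "done": False}
                      for name, qty in items.items()}

    def finishItem(self, name):
        """Cross out a prepared item on the staff-side UI."""
        self.items[name]["done"] = True

    def removeOrder(self):
        """Remove the completed order from the cloud database."""
        self.db.delete(self.order_num)
```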

4. Redis pub/sub and fetching orders from the database

The Redis pub/sub channel is shared by the customer-side modules (publishers) and the staff-side modules (subscribers). Once the customer-side order publishes its orderNum, the staff-side subscriber thread will receive a message containing the orderNum and spawn a child thread to fetch that orderNum’s information from the database.

I have implemented this functionality as well, but it still requires more testing.
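
In redis-py, a subscriber receives each event as a dict with `type` and `data` fields, so the subscriber side described above mainly needs to ignore non-message events and hand the published orderNum to a child thread. A sketch, with the database fetch stubbed out as a parameter:

```python
import threading

def handle_message(message, fetch_order):
    """Given a redis-py pub/sub message dict, spawn a child thread that
    fetches the published orderNum from the database. Returns the thread,
    or None for non-message events (e.g. subscribe confirmations)."""
    if message.get("type") != "message":
        return None
    order_num = message["data"].decode()  # payloads arrive as bytes
    worker = threading.Thread(target=fetch_order, args=(order_num,), daemon=True)
    worker.start()
    return worker
```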

About Schedule

Since all of us are slightly behind, the database and NLP integration hasn’t happened yet. I am fairly confident that the database component is functionally complete, and unit testing has been conducted. Therefore, once we meet again next week, Lisa and I will be able to start feeding the database data from the NLP module.

Plans for Next Week

Our microphone and infrared sensor are set to arrive next week. Therefore, I will shift gears and start programming the microphone against our RPi 4.

Lisa and I will also try to integrate our NLP modules and database modules during the mandatory lab meetings.

Nina Duan’s Status Report For 2/25/2023

Personal Accomplishment

On Wednesday, I presented my group’s design review presentation and received valuable feedback.

After following up on our request for AWS credit, I was told that we should use a free, open-source database instead of AWS DynamoDB. As a result, I spent some time experimenting with Replit, the database an instructor recommended, and with Redis. However, Replit’s free version only allows us to create public repositories, which could raise academic-integrity issues, so I ended up choosing Redis. While Redis is built to support storage of complex data structures, it also works perfectly well with the small-scale, simple key-value pairs we plan to store. In addition, because it is an open-source database, there are many sample projects and usages we can draw inspiration from.

As of now, I have finished setting up the cloud database and written skeletal Python code for simple data insertion, removal, and modification.
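
The skeletal operations amount to a few hash commands. A sketch using redis-py-style method names (`hset`, `hgetall`, `delete`); the key format and field names are placeholders, and `r` is any client exposing those commands:

```python
def insert_order(r, order_num, fields):
    """Insertion/modification: store the order as a hash keyed by its number."""
    r.hset(f"order:{order_num}", mapping=fields)

def get_order(r, order_num):
    """Fetch all fields of one order."""
    return r.hgetall(f"order:{order_num}")

def remove_order(r, order_num):
    """Delete the order outright."""
    r.delete(f"order:{order_num}")
```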

I will follow up with a more detailed storage model design in my next status report and our design review report.

About Schedule

Since we switched to Redis in the middle of the week, I have fallen behind schedule. However, because our project only relies on a few basic functionalities that are common among most noSQL cloud databases, the switch won’t require drastic changes to our design.

Plans for Next Week

Other than crafting the design review report, I will create object classes representing customer orders and related subcategories in Python, which will match how they are stored in the cloud database. Completing this will allow us to integrate the cloud database with our NLP algorithm, which Lisa is still fine-tuning.

Nina Duan’s Status Report For 2/18/2023

Principles of Engineering, Science, and Mathematics – Relevant Courses

Many courses touched on the importance of ethical considerations in engineering. For example, both 18-100 and 18-500 had slides dedicated to the societal/economic/environmental impact of engineering.

Modularity is also emphasized in many ECE and CS courses. Project-heavy courses such as 18-341, 18-349, 15-445, 17-214 especially focused on this, since modularity makes a large project more testable and maintainable.

Personal Accomplishment

For the first half of the week, I focused on researching microcontrollers and databases.

In the end, my teammates and I decided to use a Raspberry Pi for speech recognition because it can interface with sensors and microphones and has a CPU and memory powerful enough to drive a speech recognition algorithm. I also found some sample projects that use a Raspberry Pi for signal processing.

I focused my database research on comparing Amazon DynamoDB and Redis:

Aspect | DynamoDB | Redis (Remote Dictionary Server)
General | Commercial system (pay) | Open-source; can be used for commercial purposes
Storage Model | Key-value; document model | Key-value; secondary database models: document store, graph DBMS, and spatial DBMS
Partitioning | Sharding | Sharding
Performance | 20+ million requests/sec; reads and writes fast regardless of table size | In-memory database (requires a large amount of memory to run quickly); optimized for complicated data structures
Durability & Availability | 3 separate zones; data still available even if one zone goes offline | Open-source version not very durable (diskless DB)
Security | Encryption | No encryption
Use Cases | Applications that require high-speed data writing and reading | Session cache, chat, messaging, and queues; geospatial data, live streams, and real-time analytics
Pricing | On-demand mode: based on number of accesses | Free, open-source

For now, we are planning on using DynamoDB because our project requires fast insertions and deletions but not complex data structures. The final decision will, of course, also depend on whether we’re able to get AWS credit through this course.

For the second half of the week, I worked on preparing for the design review presentation.

About Schedule

I am on track with our schedule. In fact, we were able to get a Raspberry Pi 4 and start playing with it ahead of time.

Plans for Next Week

I will be presenting our design in class.

In addition, I will discuss the potential of getting AWS credits with the instructors and start familiarizing myself with DynamoDB’s APIs.

Once we review the design review feedback, my teammates and I will also place orders for the hardware components.

Nina Duan’s Status Report For 2/11/2023

Personal Accomplishment

This week, I mainly focused on conducting research for our proposal use-case requirements and some components necessary to achieve them.

To properly quantify our project’s service expectations, I took a look into research about service times of existing fast food restaurants and found this 2016 research by QSR Magazine particularly interesting. Although it is about drive-thru service specifically, the research data does suggest that customers expect an average service time of about 200 seconds. A news report from 2020 claims that drive-thru has been slowing down in recent years, which means the expectation nowadays could potentially be even lower.

To achieve our use-case requirements (see proposal), we need one or more directional microphones that can receive verbal input from a distance of 0.5 m to 1.0 m. They will be driven by a Raspberry Pi or an Arduino, so they must offer USB or I2S connectivity. Here’s a list of some options I’ve found so far:

  • WM8960 I2S Microphone
    • Raspberry Pi connectivity; compatible with Raspberry Pi Zero/Zero W/Zero WH/2B/3B/3B+
    • Comes with demo and development guide in Python
  • MP34DT01 I2S Microphone
    • More compatible with Arduino, includes device-specific library
    • CircuitPython module (in Python, C)
  • Samson Go USB Mic
    • Compatible with Raspberry Pi and laptops (Mac & Windows)

In addition, I took a look at available commercial databases. In our proposal, we chose to use a noSQL cloud database, which leaves us with two prominent options:

  1. AWS DynamoDB: fast insertions and deletions, but less customizable and structured
  2. Redis: supports secondary database models like the document store, graph DBMS, and spatial DBMS

About Schedule

Since we have scheduled two weeks for preliminary research (i.e. to be completed by 2/19/2023, the due date for our design presentation), I am on track with our schedule.

Plans for Next Week

I hope to finalize our microphone, microcontroller, and database selection by next Wednesday, which will allow us to finish our design and start gathering chosen components. I will also be our group’s presenter, so the rest of the week will be spent on polishing our presentation slides and preparing for Monday’s presentation.