I found a bug where the instruction text on the frontend display fails to update even though the system is playing the audio for the current instruction. This turned out to be caused by an earlier modification to the backend threads, and I made changes accordingly. The backend now supplies the current instruction strings to the frontend.
Integration Testing
I’ve started conducting end-to-end tests by running through the entire process, from approaching the kiosk to successfully checking out. These tests are still fairly preliminary, mainly meant to check that the entire workflow is reasonable and bug-free; I haven’t recorded any quantitative measurements yet.
About Schedule
I am on track with the schedule.
Plans for Next Week
Other than working on the final poster, video, and report, I will continue to work with my teammates to conduct integration and volunteer testing.
I added the ability to terminate the speech recognition system. Calling the new termination method stops all background threads and the currently running interaction, and deletes the order information belonging to that interaction. When our new distance sensor detects that the customer has walked away, the system will use this functionality to terminate the current order interaction.
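As a rough sketch of how this teardown can be wired up (the class and method names below are illustrative, not our actual identifiers), a shared threading.Event lets every background loop exit cleanly:

```python
import threading

class SpeechSession:
    """Hypothetical sketch of a terminable speech-recognition session."""

    def __init__(self):
        self.stop_event = threading.Event()
        self.order = {}  # order info for the current interaction
        self.worker = threading.Thread(target=self._listen_loop, daemon=True)
        self.worker.start()

    def _listen_loop(self):
        # The background thread checks the event each iteration so it
        # can exit promptly once terminate() is called.
        while not self.stop_event.is_set():
            self.stop_event.wait(0.1)  # placeholder for listen/parse work

    def terminate(self):
        """Called when the distance sensor reports the customer walked away."""
        self.stop_event.set()           # signal the background thread(s)
        self.worker.join(timeout=2.0)   # wait for them to wind down
        self.order.clear()              # delete the current order info
```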
Testing
The first test I conducted measured the latency between the customer side uploading an order and the staff side receiving it. To calculate the difference, I printed the time when the order was sent and the time when it was received, where “time” is the number of seconds since the epoch (as in Unix time). I conducted two groups of ten tests (twenty in total) and received varying results.
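For reference, the measurement boils down to calling time.time() (seconds since the Unix epoch, as a float) on both sides; upload_order below is a hypothetical stand-in for our actual upload call:

```python
import time

def upload_order(order):
    """Hypothetical stand-in for the customer-side upload call."""

order = {"orderNum": 1, "items": {"hamburger": 2}}

sent = time.time()          # seconds since the Unix epoch, as a float
upload_order(order)
print(f"sent:    {sent}")

# ...printed on the staff side when the order notification arrives:
arrived = time.time()
print(f"arrived: {arrived}")
print(f"total:   {arrived - sent}")   # Total Time = Time Arrived - Time Sent
```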
CMU-SECURE (4/12/2023)

| Trial # | Time Sent (s since epoch) | Time Arrived (s since epoch) | Total Time (Arrived - Sent, s) |
| --- | --- | --- | --- |
| 1 | 1681311756.920146 | 1681311757.988204 | 1.068058 |
| 2 | 1681311787.9240708 | 1681311788.301242 | 0.377171278 |
| 3 | 1681312039.466178 | 1681312040.4595578 | 0.9933798313 |
| 4 | 1681312140.5965528 | 1681312144.366512 | 3.769959211 |
| 5 | 1681312195.167861 | 1681312197.255147 | 2.087286 |
| 6 | 1681312260.151395 | 1681312263.733936 | 3.582541 |
| 7 | 1681312359.745095 | 1681312360.1503391 | 0.405244112 |
| 8 | 1681312407.597444 | 1681312417.1346428 | 9.537198782 |
| 9 | 1681312475.726104 | 1681312478.991681 | 3.265577 |
| 10 | 1681312525.983286 | 1681312526.7069042 | 0.723618269 |
| Avg. | | | 2.581003348 |
| Median | | | 1.577672 |
CMU-SECURE (4/17/2023)

| Trial # | Time Sent (s since epoch) | Time Arrived (s since epoch) | Total Time (Arrived - Sent, s) |
| --- | --- | --- | --- |
| 1 | 1681743612.060843 | 1681743612.5518022 | 0.4909591675 |
| 2 | 1681743614.6954062 | 1681743615.077455 | 0.3820488453 |
| 3 | 1681743618.92501 | 1681743619.474676 | 0.549666 |
| 4 | 1681743624.654081 | 1681743625.0546181 | 0.400537014 |
| 5 | 1681744253.4430232 | 1681744254.947194 | 1.504170895 |
| 6 | 1681744293.227913 | 1681744294.318719 | 1.090806 |
| 7 | 1681744319.980497 | 1681744320.395576 | 0.415079 |
| 8 | 1681744338.203062 | 1681744338.6219149 | 0.4188528061 |
| 9 | 1681744356.008338 | 1681744356.6730611 | 0.6647231579 |
| 10 | 1681744377.0767202 | 1681744378.1253068 | 1.048586607 |
| Avg. | | | 0.6965429493 |
| Median | | | 0.5203125838 |
I then tested our speech recognition system for audio-to-text accuracy.
Due to the nature of our system, we mainly care that the speech recognition system recognizes the correct word, not its tense or number. Therefore, verbs of different tenses are treated as the same word (e.g. “wake,” “woke,” and “waken” are considered the same), and we don’t distinguish between singular and plural nouns (e.g. “hamburger” and “hamburgers” are considered the same).
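One way to implement this normalization is to lemmatize every word before comparison. This is only a sketch, assuming NLTK and its WordNet data are installed; a hand-written mapping over our small menu vocabulary would work just as well:

```python
# pip install nltk; then: python -m nltk.downloader wordnet
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

def normalize(word):
    # Reduce the word to a base form: verbs first ("woke" -> "wake"),
    # then nouns ("hamburgers" -> "hamburger").
    as_verb = lemmatizer.lemmatize(word.lower(), pos="v")
    return lemmatizer.lemmatize(as_verb, pos="n")

assert normalize("woke") == normalize("wake")
assert normalize("hamburgers") == normalize("hamburger")
```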
Other than running through the entire order workflow (from ordering the first item to checking out) without the UI, which we showed during the interim demo, I’m planning on conducting the following tests:
Using Python’s built-in function for getting the current system time, measure the difference between the time an order is uploaded to the database after a customer confirms checkout and the time the staff-side UI’s backend is notified of its existence. Ideally, this difference should be under 0.5 s (500 ms), leaving the staff-side UI enough time to fetch the data from the database and display the new order within the anticipated 1 s latency requirement.
Verify that the order fetched from the database matches the order the customer placed: the order number, order time, items ordered, and total price should all match (see the sketch after this list).
Find audio clips with different levels of background noise and play them to the microphone. Speech recognition accuracy should stay above 85%, which will let our NLP module recognize menu items most of the time.
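For the second test, the comparison itself is a simple field-by-field check; a sketch (the field names mirror our database model, and the fetched dict would come from Redis in the real test):

```python
def orders_match(placed, fetched):
    """Field-by-field comparison of the order the customer placed
    against the one the staff side fetched from the database."""
    fields = ("orderNum", "orderTime", "items", "totalPrice")
    return all(placed[f] == fetched[f] for f in fields)

placed = {"orderNum": 42, "orderTime": 1681743612.06,
          "items": {"hamburger": 2}, "totalPrice": 9.98}
fetched = dict(placed)        # in the real test, fetched from Redis
assert orders_match(placed, fetched)
```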
After completing each individual test, we will get together as a group and perform integration tests, preferably with volunteers who have different speech habits or come from different cultural backgrounds.
Personal Accomplishment
With Lisa’s NLP support, I was able to add a “confirm” functionality to our checkout process. Now, instead of directly checking the customer out when they say “checkout,” the system will ask the customer to review their order. If the customer says “yes,” the system will check them out through the same process as before. Otherwise, the system will return to the previous state, where the customer can add more items or remove existing items.
I also fixed a bug that allowed customers to remove items they didn’t order. Previously, the system would respond to a “remove” request with “you have removed…” without checking whether the order contained that item. Now, the system only says so when the customer has indeed ordered the item they want to remove.
To better support Shiyi’s frontend design, I created a separate thread that indicates when the customer should speak and when they should stop speaking. This thread controls a microphone icon on the customer-side UI: while the system is listening for customer speech, the icon flashes green to invite the customer to speak; otherwise, the icon lets the customer know that the system currently cannot hear them. This long-running thread terminates when the customer confirms checkout, so it can also be used to detect when the checkout process is complete and, therefore, to control when the customer-side UI navigates to the “order complete” page.
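Conceptually, the thread is a long-running loop driven by two flags; a simplified sketch, in which the ui object and its methods are hypothetical placeholders for Shiyi’s frontend hooks:

```python
import threading

listening = threading.Event()   # set while the mic is open for speech
done = threading.Event()        # set once the customer confirms checkout

def indicator_loop(ui):
    # Runs for the whole interaction; it ends at checkout confirmation,
    # so its exit doubles as the "order complete" signal for the UI.
    while not done.is_set():
        if listening.is_set():
            ui.show_mic_icon(flashing=True)   # invite the customer to speak
        else:
            ui.show_mic_icon(flashing=False)  # system can't hear right now
        done.wait(0.2)
    ui.navigate_to("order-complete")
```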
About Schedule
I am on track with the schedule.
Plans for Next Week
I will work with Shiyi to integrate the customer-side UI with the newly edited backend. I will also work with Lisa to integrate the preliminary staff-side UI with the database’s pub-sub functionality. At the same time, I will conduct the tests mentioned in the “Verification and Validation Plan” section.
1. Voice Synthesizer Script
I modified the voice synthesis script I wrote earlier in the semester to support mass generation of constant messages (e.g. “Welcome to Meal by Words,” “Please order your next item,” etc.) from the command line. I also created helper functions so the script’s logic can be called at runtime to generate messages that haven’t been prepared in advance.
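The mass-generation portion boils down to looping gTTS over a table of constant messages; a minimal sketch (the message table and file layout here are illustrative, not the script’s actual contents):

```python
# pip install gTTS
from gtts import gTTS

# Constant messages the kiosk plays over and over; synthesize them
# once ahead of time rather than at every interaction.
CONSTANT_MESSAGES = {
    "welcome": "Welcome to Meal by Words",
    "next_item": "Please order your next item",
}

def generate_all(out_dir="."):
    for name, text in CONSTANT_MESSAGES.items():
        gTTS(text, lang="en").save(f"{out_dir}/{name}.mp3")

if __name__ == "__main__":
    generate_all()
```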
2. Order Interaction Workflow
I laid out the entire backend workflow of the order interaction in code. The interaction is as follows (a condensed sketch appears after this list):
1. (After the backend has been woken up by an infrared sensor) Play the synthesized welcome message.
2. Ask the customer to order the first item. The system does support ordering and/or removing multiple items at a time, but, to maintain a relatively high item detection rate, we are limiting it to one item (with quantity) at a time for the MVP.
3. Parse the customer’s speech and detect menu items.
4. If an item is detected, repeat the item and its quantity back to the customer. Otherwise, the system will ask the customer to repeat their order item after 15 seconds.
5. Ask the customer to order the next item. They can also start the sentence with “remove” to remove a certain amount of an item, or say “checkout” to check out.
6. Repeat steps 3 to 5 until the customer says “checkout.”
7. Upload the order to the database and give the customer their order number.
In the future, we are also planning on adding a confirmation feature for checkout, so the customer will be asked to confirm their order (displayed on the customer UI and/or spoken aloud by the voice synthesizer) before step 7.
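A condensed sketch of this control flow; the callables passed in (play, listen, parse_items, db) are placeholders for our voice synthesizer, speech recognition, NLP, and database modules:

```python
def order_interaction(play, listen, parse_items, db):
    """Condensed sketch of the workflow above; the callables are
    placeholders for the real modules."""
    play("Welcome to Meal by Words")             # step 1
    play("Please order your first item")         # step 2
    order = {}
    while True:
        speech = listen(timeout=15)              # step 3: transcribe speech
        if not speech:                           # nothing heard in 15 s
            play("Please repeat your order item")
            continue
        if "checkout" in speech:
            break                                # step 6 exit condition
        removing = speech.startswith("remove")
        for item, qty in parse_items(speech):    # MVP: one item per turn
            order[item] = order.get(item, 0) + (-qty if removing else qty)
            play(f"You have {'removed' if removing else 'ordered'} {qty} {item}")  # step 4
        play("Please order your next item")      # step 5
    order_num = db.upload(order)                 # step 7
    play(f"Your order number is {order_num}")
```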
About Schedule
I am on track with the schedule.
However, it’s important to note that because we are still having trouble migrating our code to the microcontroller, some of the completed tasks may need to be reevaluated.
Plans for Next Week
I will work with Lisa to add a confirm feature to our system. This will require support from both the NLP module and the overall order interaction workflow. I will also fine-tune parameters such as the energy-level threshold of our speech recognition system and the amplitude of our noise cancellation filter to better accommodate the RPi environment. If we do need to replace the microcontroller with another backend controller, we will make that decision in the coming week.
If time permits, I will start integrating the cloud database with a preliminary, command-line-based staff-side UI.
I worked with Lisa to integrate the microphone and speech recognition modules with the NLP module. For now, the system can correctly find the desired microphone, listen for and transcribe speech, parse long sentences like “I’d like one burger and two fries and three veggie burgers” in the background (in another thread), and store the parsed items in a single local Order object.
[Screenshot: parsed output; the first line is transcribed from speech.]
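The glue code is close to this sketch, built on SpeechRecognition’s listen_in_background; parse_items is a placeholder for Lisa’s NLP module, and the dict stands in for our local Order object:

```python
# pip install SpeechRecognition
import speech_recognition as sr

def parse_items(text):
    """Placeholder for the NLP module."""
    return []

order = {}  # stand-in for our local Order object

recognizer = sr.Recognizer()
mic = sr.Microphone()  # picks the desired input device (index configurable)

def on_speech(rec, audio):
    # Called from a background thread for each detected phrase.
    try:
        text = rec.recognize_google(audio)
    except sr.UnknownValueError:
        return  # unintelligible audio; keep listening
    for item, qty in parse_items(text):
        order[item] = order.get(item, 0) + qty

with mic as source:
    recognizer.adjust_for_ambient_noise(source)

# Spawns the listener thread; call stop_listening() to end it.
stop_listening = recognizer.listen_in_background(mic, on_speech)
```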
About Schedule
I am on track with the schedule.
Plans for Next Week
I will continue to work with Lisa to improve our microphone, speech recognition, and NLP modules, as the usability of our system depends heavily on these parts. In addition, I will start creating ways of handling errors. For example, what should the system do if it fails to parse the customer’s speech? How should the system react if it times out?
In addition to completing the ethics assignment, I worked with Lisa to integrate our database module with the preliminary NLP module, and I modified the microphone & speech recognition system provided by Python’s SpeechRecognition library.
After integration, our system can extract menu items and quantities from simple sentences, add them to an Order object, and upload that object to the database. However, this simple system still has flaws because we have yet to implement the checkout portion of the NLP module.
The open-source SpeechRecognition library provides basic real-time speech recognition that can be used with an external microphone. This pipeline, however, leaves no room for noise reduction. I therefore explored the library’s source code, determined where the microphone input is read, and extended it to apply a noise reduction algorithm. For now, it uses a simple, deterministic noise cancellation algorithm that attenuates low-amplitude samples by mixing them with the signal’s inversion. By slightly altering this visualization tool, I was able to visualize the difference. This is what it looks like when I speak at conversational volume from a distance of ~0.7 m, with a restaurant-ambience noise YouTube video playing in the background (graphs are in the time domain; top = raw microphone input, bottom = filtered input):
[Figures: left, no speech, only noise; right, speech with noise, with its amplitude decreased.]
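The filter itself amounts to only a few lines. Here is a sketch of the idea on a buffer of 16-bit samples (the threshold and 90% mix ratio are illustrative, not our tuned values):

```python
import numpy as np

THRESHOLD = 500  # illustrative amplitude cutoff for 16-bit samples

def denoise(frames: bytes) -> bytes:
    """Attenuate low-amplitude (likely noise-only) samples by mixing
    them with a scaled inversion of themselves, leaving speech intact."""
    signal = np.frombuffer(frames, dtype=np.int16).copy()
    quiet = np.abs(signal) < THRESHOLD
    # x + (-0.9 * x) keeps only 10% of each quiet sample's amplitude.
    signal[quiet] = signal[quiet] // 10
    return signal.tobytes()
```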
About Schedule
I have caught up to the schedule. The microphone has been set up, and preliminary signal processing code has been written.
Plans for Next Week
I will continue to work with Lisa to improve our NLP & database modules, as this is the core part of our system. In addition, I will start installing necessary dependencies on and transferring our code to the microcontroller (RPi 4).
Other than completing the design review report with my teammates, I also worked on a couple of tasks.
1. Voice Synthesizer Script
To assist Shiyi with developing an accessible UI, I created a voice synthesizer script using the open-source library Google Text-to-Speech (gTTS). The script lets the user synthesize any English text from either an input prompt or the command line.
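The script’s core is close to this sketch (assuming gTTS; the argument handling and output path are illustrative):

```python
# pip install gTTS
import sys
from gtts import gTTS

# Take the text from the command line if given, otherwise prompt for it.
text = " ".join(sys.argv[1:]) or input("Text to synthesize: ")
gTTS(text, lang="en").save("output.mp3")
print("Saved to output.mp3")
```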
2. Database and Customer-Side Model for Orders, Items, and the Menu
I finalized the representations of orders, items, and the menu, both in the cloud database and in local storage.
The design review report goes into detail about these models and how they interact with each other, so I won’t repeat that here. The important thing to note is that, by design, the local copy isn’t uploaded to the cloud until the customer finishes ordering by calling checkout().
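A simplified sketch of the local model and its checkout() upload; the field names follow this report, while the Redis connection details and key scheme are placeholders:

```python
import json
import time
import redis

r = redis.Redis(host="localhost", port=6379)  # placeholder connection details

class Order:
    """Local copy of a customer's order; nothing reaches the cloud
    until checkout() is called."""

    def __init__(self, order_num):
        self.orderNum = order_num
        self.items = {}  # menu item name -> quantity

    def add_item(self, name, qty=1):
        self.items[name] = self.items.get(name, 0) + qty

    def checkout(self):
        # Upload the finished order, then announce its orderNum on the
        # shared pub/sub channel for the staff side to pick up.
        key = f"order:{self.orderNum}"          # illustrative key scheme
        r.set(key, json.dumps({"orderNum": self.orderNum,
                               "orderTime": time.time(),
                               "items": self.items}))
        r.publish("orders", str(self.orderNum))
```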
I have tested this flow and successfully added sample orders to the cloud database.
3. Staff-Side Model for Orders
I designed a model to represent orders on the staff side as well.
This object will automatically be generated when a subscriber to the Redis pub/sub channel receives a new orderNum. It allows the staff to view order items, cross out prepared items (using finishItem()), and remove completed orders from the cloud database (using removeOrder()).
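A sketch of the staff-side object; finishItem() and removeOrder() are the methods named above, while the key scheme and connection details are placeholders:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)  # placeholder connection details

class StaffOrder:
    """Staff-side view of one order, built when its orderNum arrives
    over the pub/sub channel."""

    def __init__(self, order_num):
        self.orderNum = order_num
        data = json.loads(r.get(f"order:{order_num}"))
        self.items = data["items"]           # item name -> quantity
        self.finished = set()

    def finishItem(self, name):
        self.finished.add(name)              # cross out a prepared item

    def removeOrder(self):
        r.delete(f"order:{self.orderNum}")   # order fully prepared
```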
4. Redis Pub/Sub and Fetching Orders from the Database
The Redis pub/sub channel is shared by the customer-side modules (publishers) and the staff-side modules (subscribers). Once the customer-side order publishes its orderNum, the staff-side subscriber thread will receive a message containing the orderNum and spawn a child thread to fetch that orderNum’s information from the database.
I have implemented this functionality as well, but it still requires more testing.
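A sketch of the subscriber side using redis-py’s pub/sub interface; fetch_order is a placeholder for the fetching logic described above:

```python
import threading
import redis

r = redis.Redis(host="localhost", port=6379)  # placeholder connection details

def fetch_order(order_num):
    """Placeholder: pull order:<num> from the database and hand it
    to the staff-side modules."""

def subscriber_loop():
    pubsub = r.pubsub()
    pubsub.subscribe("orders")                # shared channel
    for message in pubsub.listen():
        if message["type"] != "message":
            continue                          # skip subscribe confirmations
        order_num = message["data"].decode()
        # Spawn a child thread so a slow fetch never blocks the channel.
        threading.Thread(target=fetch_order, args=(order_num,)).start()
```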
About Schedule
Since all of us are slightly behind, the database and NLP integration hasn’t happened yet. I am fairly confident that the database component is functionally complete, and unit testing has been conducted. Once we meet again next week, Lisa and I will be able to start populating the database with data from the NLP module.
Plans for Next Week
Our microphone and infrared sensor are set to arrive next week. Therefore, I will shift gears and start programming the microphone against our RPi 4.
Lisa and I will also try to integrate our NLP modules and database modules during the mandatory lab meetings.
On Wednesday, I presented my group’s design review presentation and received valuable feedback.
After following up on our request for AWS credit, I was told that we should use a free, open-source database instead of AWS DynamoDB. As a result, I spent some time experimenting with Replit’s database, which an instructor recommended, and with Redis. However, Replit’s free version only allows us to create public repositories, which could raise academic integrity issues, so I ended up choosing Redis. While Redis is built to support storage of complex data structures, it also works perfectly well with the small-scale, simple key-value pairs we plan to store. In addition, because it is an open-source database, there are many sample projects and usages we can draw inspiration from.
As of now, I have finished setting up the cloud database and written skeletal Python code for simple data insertion, removal, and modification.
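The skeletal code amounts to a handful of redis-py calls; a sketch (host, port, and password are placeholders for our cloud instance):

```python
# pip install redis
import redis

r = redis.Redis(host="<cloud-host>", port=6379, password="<password>")

r.set("order:1", '{"items": {"hamburger": 2}}')   # insertion
r.set("order:1", '{"items": {"hamburger": 3}}')   # modification (overwrite)
print(r.get("order:1"))                           # retrieval (returns bytes)
r.delete("order:1")                               # removal
```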
I will follow up with a more detailed storage model design in my next status report and our design review report.
About Schedule
Since we switched to Redis in the middle of the week, I have fallen behind schedule. However, because our project relies only on a few basic functionalities that are common to most noSQL cloud databases, the switch won’t require a drastic change to our design.
Plans for Next Week
Other than crafting the design review report, I will create Python object classes representing customer orders and related subcategories, matching how they are stored in the cloud database. Completing this will allow us to integrate the cloud database with our NLP algorithm, which Lisa is still fine-tuning.
Principles of Engineering, Science, and Mathematics – Relevant Courses
Many courses touched on the importance of ethical considerations in engineering. For example, both 18-100 and 18-500 had slides dedicated to the societal/economic/environmental impact of engineering.
Modularity is also emphasized in many ECE and CS courses. Project-heavy courses such as 18-341, 18-349, 15-445, 17-214 especially focused on this, since modularity makes a large project more testable and maintainable.
Personal Accomplishment
For the first half of the week, I focused on researching microcontrollers and databases.
In the end, my teammates and I decided to use a Raspberry Pi for speech recognition because it can interface with sensors and microphones and has a CPU and enough memory to drive a speech recognition algorithm. I also found some sample projects that use a Raspberry Pi for signal processing.
I focused my database research on comparing Amazon DynamoDB and Redis:
| | DynamoDB | Redis (Remote Dictionary Server) |
| --- | --- | --- |
| General | Commercial system (paid) | Open source; can be used for commercial purposes |
| Storage Model | Key-value; document model | Key-value; secondary database models: document store, graph DBMS, and spatial DBMS |
| Partitioning | Sharding | Sharding |
| Performance | 20+ million requests/sec; reads & writes fast regardless of table size | In-memory database (requires a large amount of memory to run quickly); optimized for complicated data structures |
| Durability & Availability | 3 separate zones; data still available even if one zone goes offline | Open-source version not very durable (diskless DB) |
| Security | Encryption | No encryption |
| Use Cases | Applications that require high-speed data writing and reading | Session cache, chat, messaging, and queues; geospatial data, live streams, and real-time analytics |
| Pricing | On-demand mode: based on number of accesses | Free, open source |
For now, we are planning on using DynamoDB because our project requires fast insertions and deletions but not complex data structures. The final decision will, of course, also depend on whether we’re able to get AWS credit through this course.
For the second half of the week, I worked on preparing for the design review presentation.
About Schedule
I am on track with our schedule. In fact, we were able to get a Raspberry Pi 4 and start playing with it ahead of time.
Plans for Next Week
I will be presenting our design in class.
In addition, I will discuss the potential of getting AWS credits with the instructors and start familiarizing myself with DynamoDB’s APIs.
Once we review design review feedback, my teammate and I will also place orders for the hardware components.
This week, I mainly focused on conducting research for our proposal use-case requirements and some components necessary to achieve them.
To properly quantify our project’s service expectations, I looked into research on the service times of existing fast-food restaurants and found this 2016 study by QSR Magazine particularly interesting. Although it covers drive-thru service specifically, the data suggests that customers expect an average service time of about 200 seconds. A news report from 2020 claims that drive-thru service has been slowing down in recent years, which means the expectation nowadays could be even lower.
To achieve our use-case requirements (see proposal), we need one or more directional microphones that can receive verbal input from a distance of 0.5 m to 1.0 m. They will be driven by a Raspberry Pi or an Arduino, so they need USB or I2S connectivity. Here is a list of some options I’ve found so far:
- More compatible with Arduino; includes a device-specific library (a CircuitPython module, in Python and C)
- Samson Go USB Mic: compatible with Raspberry Pi and laptops (Mac & Windows)
I also took a look at available commercial databases. In our proposal, we chose to use a noSQL cloud database, which leaves us with two prominent options:
AWS DynamoDB: fast insertions and deletions, but less customizable and structured
Redis: supports secondary database models like the document store, graph DBMS, and spatial DBMS
About Schedule
Since we have scheduled two weeks for preliminary research (i.e. to be completed by 2/19/2023, the due date for our design presentation), I am on track with our schedule.
Plans for Next Week
I hope to finalize our microphone, microcontroller, and database selection by next Wednesday, which will allow us to finish our design and start gathering chosen components. I will also be our group’s presenter, so the rest of the week will be spent on polishing our presentation slides and preparing for Monday’s presentation.