Rebecca’s Status Report for February 22, 2025

Report

  • The remaining parts (Rasppis, SD cards, camera, HDMI cables for testing) were ordered Monday/Tuesday, and most of them arrived by Friday. Unfortunately, the parts still outstanding are the Rasppis themselves, which is bottlenecking my progress.
  • I’ve drafted the CAD as expected (see below), which took far longer than I anticipated; in hindsight, I probably should have seen that coming. Note for future modelling: do not use splines. Splines are the highway to incompletely constrained sketches, and I regret every one of them.

  • I’ve flashed the SD cards with Raspberry Pi OS so I can boot the boards as soon as they arrive (expected Monday). Once they’re here, Diya and I can sit down to check the model and run the power-draw tests I need.
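To make those power-draw tests easier to line up with readings from a USB power meter, below is a minimal logging sketch of the kind I might run alongside the gesture pipeline. It only records proxies (CPU load, core voltage, throttling flags); the actual current numbers still come from the meter. It assumes Raspberry Pi OS (for vcgencmd) and psutil installed via pip, and the file name and sampling window are placeholders.

```python
# Hypothetical logging helper (not the actual test script): records CPU load,
# core voltage, and throttling flags while the gesture pipeline runs, so the
# rows can be lined up against a USB power meter's current readings.
# Assumes Raspberry Pi OS (for vcgencmd) and `pip install psutil`.
import csv
import subprocess
import time

import psutil


def vcgencmd(*args: str) -> str:
    """Run a vcgencmd query and return its raw output, e.g. 'throttled=0x0'."""
    result = subprocess.run(
        ["vcgencmd", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def log_power_proxy(path: str = "power_log.csv", seconds: int = 60) -> None:
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["t_seconds", "cpu_percent", "core_volts", "throttled"])
        start = time.time()
        while time.time() - start < seconds:
            writer.writerow([
                round(time.time() - start, 1),
                psutil.cpu_percent(interval=1),  # blocks ~1 s per sample
                vcgencmd("measure_volts", "core"),
                vcgencmd("get_throttled"),
            ])


if __name__ == "__main__":
    log_power_proxy()
```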

Progress Schedule 

  • A few of the tasks I expected to be done this Friday/Saturday did not get done because of the delivery delay. I cannot measure the power consumption of a board I do not have.
  • If I don’t get horribly unlucky, this should be done early next week; some of next week’s tasks may end up getting pushed into spring break, but we have that slack time there for that very reason. Most of the time dedicated to this class for the upcoming week is likely to be spent writing the design report.

Next Week’s Deliverables 

  • The design report, obviously, is due at the end of next week. This is a team deliverable.
  • The MediaPipe/OpenCV-on-Rasppi tests I expected to run this week. Once we know the power consumption, I can figure out what kind of battery I’ll need.
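As a placeholder for that battery math, here is the back-of-envelope calculation I expect to run once I have a measured draw; the 2.5 W figure and 90% boost-converter efficiency are assumptions, not measurements.

```python
# Back-of-envelope battery sizing with placeholder numbers; the 2.5 W figure
# is a guess that the measured draw will replace.
measured_draw_w = 2.5    # assumed load power of Pi Zero W + camera (W)
runtime_h = 1.0          # one-hour operation spec
battery_v = 3.7          # nominal LiPo cell voltage
boost_efficiency = 0.9   # assumed efficiency of the 3.7 V -> 5 V boost board

energy_wh = measured_draw_w * runtime_h / boost_efficiency
capacity_mah = energy_wh / battery_v * 1000
print(f"~{capacity_mah:.0f} mAh at {battery_v} V")  # about 750 mAh with these numbers
```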

Charvi’s Status Report for 2/22/25

This week, I got some more work done on the webapp.

I was able to add profile functionality, with basic profile picture editing, bio adding, and follow / unfollow functionality. Here are some screenshots:

https://drive.google.com/drive/folders/11YqsSXDr60VctZHMILzMGM5wH-JmnyWg?usp=sharing

(WordPress is seriously degrading the image quality so that nothing is readable, so I linked them here instead ^^).

It’s still quite barebones, but the functionality works and can be built upon.
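For a sense of how the follow / unfollow piece is structured, here is a rough sketch of the kind of model and view involved. It assumes Django, and the model, field, and URL names are illustrative rather than the actual CookAR code.

```python
# Hypothetical Django sketch of the follow/unfollow relation; model, field,
# and URL names are illustrative, not the actual CookAR code.
from django.contrib.auth.models import User
from django.db import models
from django.shortcuts import get_object_or_404, redirect


class Profile(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)
    bio = models.TextField(blank=True)
    picture = models.ImageField(upload_to="profile_pics/", blank=True)
    following = models.ManyToManyField(
        "self", symmetrical=False, related_name="followers", blank=True
    )


def toggle_follow(request, username):
    """Follow the target profile if not already followed, otherwise unfollow."""
    target = get_object_or_404(Profile, user__username=username)
    me = request.user.profile
    if target in me.following.all():
        me.following.remove(target)
    else:
        me.following.add(target)
    return redirect("profile", username=username)  # assumes a 'profile' URL name
```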

Earlier in the week, our team also spent a decent amount of time further specifying the design requirements for our design review presentation and working on the slides themselves.

I was able to add the following / unfollowing and basic networking functionality to the webapp, as well as make the website navigable via hyperlinks, but it is not as clean as I would like and there are a few (though small) bugs. I was also unable to add the score / level functionality.

Overall, I was not able to get a satisfactory amount of work done on this project this week due to other classwork and other circumstances, so I am running behind.

As for what I have to do to get back on track, this week I really want to be completely done with the app functionality. Diya is tasked with working on parts of the networking (namely deploying on AWS and adding more advanced friending features such as requests), so I will focus on the score part and also clean up everything else. Then, depending on what my teammates are doing, I will either work on the website front end and make it look better, or help Diya out with the gesture recognition.
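As a starting point for the score part, here is a small sketch of one way the scoring and levels could work: a fixed number of points per completed recipe and a level derived from the running total. The point value and level thresholds are placeholders I still need to decide on.

```python
# Hypothetical scoring helpers: fixed points per completed recipe, with the
# level derived from the running score. Point value and thresholds are placeholders.
POINTS_PER_RECIPE = 10
LEVEL_THRESHOLDS = [0, 30, 80, 150, 250]  # score needed to reach each level


def score_for(completed_recipes: int) -> int:
    return completed_recipes * POINTS_PER_RECIPE


def level_for(score: int) -> int:
    return sum(1 for threshold in LEVEL_THRESHOLDS if score >= threshold)


assert level_for(score_for(4)) == 2  # 40 points -> level 2 with these thresholds
```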

 

Diya’s Status Report for 02/22/2025

This past week, I was catching up on a lot of work since I was really sick the previous week and also had a midterm on Thursday. Despite that, I made significant progress on the project. I worked on the design presentation slides and presented them on Monday. Additionally, I have been working on OpenCV gesture recognition, ensuring it runs locally on my computer. The setup is now complete, and I am currently in the process of testing the accuracy of the model. Now that I have the gesture recognition working locally, the project is back on schedule. The progress aligns with our timeline, and I am ready to move forward with the next steps.
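For reference, the local setup is essentially along these lines: a webcam loop feeding frames into the MediaPipe Hands solution and drawing the detected landmarks. This is a minimal illustration of the pipeline rather than my exact test script, and the parameters (one hand, 0.7 detection confidence) are placeholders.

```python
# Minimal local test loop: webcam frames -> MediaPipe Hands landmarks.
# Illustrative only; assumes `pip install mediapipe opencv-python`.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.7) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                mp_draw.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)
        cv2.imshow("hands", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()
```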

For the upcoming week, I plan to

  1. Continue testing the accuracy of the gesture recognition model.
  2. Work on Figma design for the website interface.
  3. Start working on the networking portion of the project for the webapp.
  4. Begin drafting and finalizing the design review report submission.

Charvi’s Status Report for 2/15/25

This week, I focused on getting a basic webapp with basic functionality working locally. This included the add-recipe and select-recipe functionality, as well as logging in, logging out, and registering users.

I got this done (though it looks quite bare bones).
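For context, the data model behind these screens is roughly of this shape. This is a hedged sketch assuming Django; the field and method names are illustrative, not the actual schema.

```python
# Hypothetical Django models behind the add/select-recipe screens;
# field and method names are illustrative.
from django.contrib.auth.models import User
from django.db import models


class Recipe(models.Model):
    author = models.ForeignKey(User, on_delete=models.CASCADE)
    title = models.CharField(max_length=200)
    steps = models.TextField(help_text="One step per line")
    created = models.DateTimeField(auto_now_add=True)

    def step_list(self) -> list[str]:
        """Split the stored text into the per-step list shown during cooking."""
        return [s.strip() for s in self.steps.splitlines() if s.strip()]
```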

Here are some screenshots (WordPress is lowering the quality of these photos a lot and I cannot figure out how to fix it, but essentially there is basic user registration, log in, recipe input, and recipe selection functionality):

 

Our team also spent a lot of time this week redefining our project goals, scope, and complexity. This is explained more in the team status report, but what it means for the webapp is that we plan to add a lot of networking functionality, specifically friending users, viewing the progress of others, and scoring / achievements. This means there is quite a lot I will need to add to / rework in the webapp.

In addition, we decided to no longer use Unity XR development, since it is intended for XR game development and seems like overkill for what we want to do (static text images). Instead, we will generate the static images in the application in a simpler way.
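For example, one simple way to do that would be to render each recipe step to a static image with Pillow, along these lines. This is just a sketch of the idea; the image size, font, and file name are placeholders.

```python
# Sketch of rendering one recipe step to a static image with Pillow;
# the image size, font, and file name are placeholders.
from PIL import Image, ImageDraw, ImageFont


def render_step(text: str, path: str = "step.png") -> None:
    img = Image.new("RGB", (640, 360), "black")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()  # swap in a larger TTF for real use
    draw.text((20, 20), text, fill="white", font=font)
    img.save(path)


render_step("Step 1: Dice the onion")
```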

My goals for this week according to my previous report were to get the basic MVP for the website done, which I had mostly done (though as mentioned, it will need a rework). I also wanted to work with Unity XR development a little, which I did not do, due to the rework of our goals.

Due to the project goal rework, I would say I am running behind schedule, though I don’t think by too much: I can reuse the existing basic webapp and DB models I have created; I just need to move things around and add some more functionality.

My goals for next week are, as mentioned before, to add networking functionality, friending / following functionality, and score / level / achievement functionality for completing recipes to the webapp.

Diya’s Status Report for 02/15/2025

This week was quite challenging for me as I was sick for most of it. Last week, I was recovering from a bacterial infection, and unfortunately, I came down with the flu this week, which led to a visit to urgent care. Despite that, I was still able to contribute to the project, particularly in refining our approach to hand gesture recognition and pivoting my role to contribute more effectively.

Initially, I had misunderstood the gesture recognition task, thinking I needed to find a dataset and train a model myself. However, after further research, I realized that MediaPipe provides a pretrained model with 90% accuracy for gesture recognition, meaning I could integrate it directly without training a new model. This required a shift in my focus, and I pivoted to handling the networking aspect of the project to add complexity and depth to my contribution.

Beyond that, I have been actively involved in facilitating group meetings, translating our use case requirements into quantitative design requirements, and preparing for the design review presentation this week.

Given my health issues, my progress is slightly behind where I initially wanted to be, but I have taken steps to ensure that I am back on track. Since the gesture recognition aspect is now streamlined with MediaPipe, I have moved focus to the networking component, which is a new responsibility. I am catching up by working on setting up the foundational pieces of the social network feature in our web app.

Next week, I plan to make significant progress on the networking component of the project. Specifically, I aim to set up user authentication for the web app so users can create accounts; implement user profiles, which will include cooking levels, past recipe attempts, and preferences; and develop a basic social network feature where users can add friends and view their cooking activities.

 

Team Status Report for February 15, 2025

Project Risks and Mitigation Strategies

  • The most significant risk is that the AR display won’t arrive until mid-March, so the integration has had to be pushed back. The board has an HDMI output, so we can test the system using a computer monitor instead of the AR display. If the AR display arrives later than expected, we will conduct most of our testing on an external monitor.
  • Two other significant risks are that the gesture recognition algorithm cannot run on the Raspberry Pi, or that its power demand while running is too high for a reasonably weighted battery. If either of these turns out to be true, we can offload some of the computation onto the web app via the board’s wireless functionality.

Changes to System Design

  • We are adding networking and social features to the web app. This involves adding a scoring incentive for completing recipes; scores are then translated into levels displayed on the user profile. Users can follow each other and view each other’s progress on profiles. We will also deploy our application on AWS EC2. This change was necessary to restore complexity to the project, since we are directly using the pre-trained model from MediaPipe for gesture recognition.

Schedule Progress

We have reworked and detailed our Gantt chart.

Meeting Specific Needs

Part A was written by Diya, Part B was written by Charvi, and Part C was written by Rebecca.

Please write a paragraph or two describing how the product solution you are designing will meet a specified need…

Part A: … with respect to considerations of public health, safety or welfare.

By using the gesture interaction method, users can navigate recipes without touching the screen, which reduces cross-contamination risks, especially when handling raw ingredients. Additionally, walking users through a recipe step by step helps them focus on one task at a time, which allows beginner cooks to gain confidence. Gesture control also means fewer distractions such as phones or tablets, minimizing the risk of accidents in the kitchen. Moreover, the social network feature allows users to track their progress and connect with other beginner cooks, promoting a sense of community among new cooks.

Part B: … with consideration of social factors.

Our target user group is “new cooks” – this includes people who don’t cook often, younger adults and children in new environments where they have to start cooking for themselves, and people who have been bored or confused by cooking on their own. Our CookAR product will allow people to connect with one another on a platform focused on cooking and trying out new recipes, so they are motivated by like-minded peers to further their culinary knowledge and reach. In addition, by gamifying the cooking process and allowing users to level up based on how many new recipes they have tried, CookAR will also motivate people to try new recipes and engage with each other’s profiles, which display the same information about which recipes were tried and how many.

Part C: … with consideration of economic factors.

A lightweight headset made from relatively inexpensive parts (fifteen-dollar Raspberry Pi boards, at most a few dollars for each of the remaining peripherals; the most expensive part is the FLCoS display, and even that is only a few tens of dollars for a single off-the-shelf unit) is ideal for a target audience of people trying to get into something they haven’t done much, or any, of before, and who are therefore unlikely to want to spend a lot of money on a tool like this. Compared to a more generalized heads-up display on the market (or rather, formerly on the market) like the Google Glass, which retailed for $1,500, this construction is cheap while still being fairly resilient.

Additionally, this same hardware and software framework could be generalized to a wide variety of tasks with marginal changes, and a hypothetical “going-into-production” variant of this product could easily swap the four-year-old Raspberry Pi Zero W for something that takes advantage of those years of silicon development (for instance, additional accelerators like an NPU tailored to our needs), such that scale offsets the increase in individual part price.

Rebecca’s Status Report for February 15, 2025

Report

  • I spent much of this week reworking the hardware decisions, because I realized in our meeting Tuesday morning, after walking through the hardware specs of the ESP32s and the demands of the software, that they almost certainly would not cut it. I decided to open the options to boards that demand 5V input, or recommend 5V input for heavy computation, and to achieve this voltage by using a boost board on a 3.7V LiPo battery. After considering a wide variety of boards, I narrowed my options down to two:
    • The Luckfox Pico Mini, which follows the Raspberry Pi Pico form factor; it is extremely small (~5g) but has image-processing and neural-network accelerators. It has more RAM than an ESP32 (64MB in the spec, about 34MB of usable space according to previous users), but still not a huge amount.
    • The Raspberry Pi Zero W, which has more RAM than the Luckfox (512MB) and a quad-core chip. It is also about twice the weight of the Luckfox (~10g), but it has native AV out, which seems to be fairly unusual, and Bluetooth LE capability. This makes it ideal for driving the microdisplay, which takes AV in, so I will not have to additionally purchase a converter board.
  • The decision was primarily about which board to use for the camera input. Without intensive testing, it seems to me that if either is capable of running the MediaPipe/OpenCV algorithm we plan to use for gesture recognition, both would be; so it comes down to weight, speed, and ease of use.
  • Ultimately I’ve decided to go with two Raspberry Pi Zero W boards, as learning the development process for two different boards (even closely related ones, as these are) would cost more time than I have to give. Additionally, if the Rasppi is not capable of running the algorithm, it already has wireless capability, so it is simpler to offload some of the computation onto the web app than it would be if I had to acquire an additional Bluetooth shield for the Luckfox, or pipe information through the other Rasppi’s wireless connection.
  • Power consumption will be an issue with these more powerful boards. After we get the algorithm running on one, I plan to test its loaded power consumption and judge the size of the battery I will need to meet our one-hour operation spec from there.
  • Additionally, considering the lightness of the program that runs the display (the Rasppi was chosen for this part for its native AV out, not its computational power), it may be possible to run both peripherals from a single board. I plan to test this once we have the recognition algorithm and a simple display-generation program functional. If so, I will be able to trade the weight of the board I’m dropping for 10g more battery, which would give me more flexibility on lifetime.
  • Because of the display’s extremely long lead time, I plan to develop and test the display program using the Rasppi’s HDMI output, so it will be almost entirely functional when the display arrives (only needing to switch over to AV output) and I can bring it online immediately.
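The display program itself should be quite light; a sketch of the kind of skeleton I have in mind for the HDMI testing phase is below, assuming pygame is available on the board. The text and display parameters are stand-ins, not the final program.

```python
# Hypothetical display-program skeleton for HDMI testing: fullscreen text that
# will later go out over AV instead. Assumes `pip install pygame`; text and
# sizes are placeholders.
import pygame

pygame.init()
screen = pygame.display.set_mode((0, 0), pygame.FULLSCREEN)
font = pygame.font.Font(None, 72)
clock = pygame.time.Clock()

step_text = "Step 1: Dice the onion"
running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT or (
            event.type == pygame.KEYDOWN and event.key == pygame.K_ESCAPE
        ):
            running = False
    screen.fill((0, 0, 0))
    label = font.render(step_text, True, (255, 255, 255))
    screen.blit(label, label.get_rect(center=screen.get_rect().center))
    pygame.display.flip()
    clock.tick(30)

pygame.quit()
```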

Progress Schedule

Due to the display’s extremely long lead time and the changes we’ve made to the specs of the project, we’ve reworked our schedule from the ground up. The new Gantt chart can be found in the team status report for this week.

Next Week’s Deliverables

  • The initial CAD draft got put off because I sank so much time into board decisions. I believe this is okay, because selecting the right hardware now will make our lives much easier later. Additionally, the time at which I’ll be able to print the headset frame has been pushed out significantly, so I’ve broken the CAD up into several steps, which are marked on the Gantt chart. The early draft, which is just the shape of the frame (and includes the time for me to become reacquainted with OnShape), should be mostly done by the end of next week. I expect this to take maybe four or five more hours.
  • The Rasppis will be ordered on Amazon Prime. They will arrive very quickly. At the same time I will order the camera, microHDMI->HDMI converter and an HDMI cable, so I can boot the boards immediately upon receipt and get their most basic I/O operational this week or very early next week.

Team Status Report for February 8, 2025

Project Risks and Mitigation Strategies 

  • Gesture Recognition Accuracy and Performance Issues
    • Risk: the accuracy of the gesture detection might be inconsistent or there might be limitations in the model chosen
    • Mitigation: test multiple approaches (MediaPipe, CNNs, Optical Flow) to determine the most robust method, then fine-tune the model
    • If vision-based recognition is very unreliable, explore sensor-based alternatives such as integrating IMUs for gesture detection
  • Microcontroller compatibility
    • Risk: the microcontroller needs to support real-time data processing for gesture recognition and the AR display without latency issues
    • Mitigation: carefully evaluate microcontroller options to ensure compatibility with the CV model. The intended camera board is designed for intensive visual processing.
      • If the microcontroller is not suitable for the CV model, we will look into offloading some of the processing from the microcontroller to the laptop. This may require sending a great deal of data wirelessly and must be approached with caution.
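One way to keep that wireless traffic manageable, if we do offload, is to send only the extracted hand-landmark coordinates (a few hundred bytes per frame) rather than raw video. A rough sketch of that idea is below; the host, port, and message format are hypothetical.

```python
# Rough sketch of the offload idea: send only landmark coordinates as JSON,
# not raw frames, so the wireless link carries a few hundred bytes per frame.
# Host, port, and message format are placeholders.
import json
import socket

RECEIVER_HOST, RECEIVER_PORT = "192.168.1.50", 5005  # hypothetical laptop/web-app endpoint


def send_landmarks(landmarks: list[tuple[float, float, float]]) -> None:
    """Send one frame's 21 (x, y, z) hand landmarks as a single UDP datagram."""
    payload = json.dumps({"landmarks": landmarks}).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (RECEIVER_HOST, RECEIVER_PORT))


# Example: a dummy frame of 21 identical landmarks
send_landmarks([(0.5, 0.5, 0.0)] * 21)
```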

Changes to the System Design 

  • Finalizing the device selection: there are fewer development-board options than bare modules; however, we need a development board, as we do not have the time to sink into creating our own environment. So we will be using the ESP32-DevKitC-VE development board, which carries a WROVER-E module. This has the most storage capacity for its form factor and a reasonable price.
  • Refining the computer vision model approach: initially we only considered a CNN-based classification model for gesture recognition, but after more research we are also testing MediaPipe and Optical Flow for potential improvements

Schedule Progress

Our deadlines do not start until next week, so our schedule remains the same.

 

Rebecca’s Status Report for February 8, 2025

Report

  • I have researched & decided upon specific devices for use in the project. I will need two microcontrollers, a microdisplay, a small camera, and a battery, all of which combined are reasonable to mount to a lightweight headset.
    • The microcontroller I will use for the display is the ESP32-WROVER-E (datasheet linked), via the development kit ESP32-DevKitC-VE. I will additionally use an ESP32-Cam module for the camera and controller.
      • I considered a number of modules and development boards. I decided that it was necessary to purchase a development board rather than just the module, as it is both less expensive and will save me time interfacing with the controller: the development board comes with a micro-USB port for loading instructions from the computer, as well as easily accessible pinouts.
      • The datasheet for the ESP32-Cam notes that a 5V power supply is recommended; however, it is possible to power it from the 3.3V supply.
      • The ESP32-Cam module does not have a USB port on the board, so I will also need to use an ESP-32-CAM-MB Adapter. As this is always required, these are usually sold in conjunction with the camera board.
    • The display I will use is a 0.2″ FLCoS display, which comes with an optics module so the image can be reflected from the display onto a lens.
    • The camera I will use is an OV2640 camera as part of the ESP32-Cam module.
    • The battery I will use is a 3.3V rechargeable battery, likely a LiPo or LiFePO4, but I need to nail down the current-draw requirements for the rest of my devices before I finalize exactly which power supply I’ll use.
  • I have found an ESP32 library for generating composite video, which is the input that the microdisplay takes. The GitHub repo is linked here.
  • I have set up and begun to get used to an ESP-IDF environment (it works in VS Code). I have also used the Arduino IDE before, which seems to be the older preferred environment for programming ESP32s.
  • I have begun to draft the CAD for the 3D-printed headset.

Progress Schedule

  • Progress is on schedule. Our schedule’s deadlines do not begin until next week.
  • I’m worried about the lead time on the FLCoS display. I couldn’t find anyone selling a comparable device with a quicker lead time (though I could find several displays that were much larger and cost several hundred dollars). The very small size (0.2″) seems to be fairly unusual. I may have to reshuffle some tasks around if it does not arrive before the end of February/spring break. This could delay the finalization of our hardware.

Next Week’s Deliverables

  • By the end of the weekend (Sunday) I plan to have submitted the purchasing forms for the microcontrollers, camera, and display, so that I can talk to my TA Monday for approval, and the orders can go out on Tuesday. In the time between now and Tuesday, I’ll finalize my battery choice so it can hopefully go through on Thursday, or early the following week.
  • By the end of next week I plan to have the CAD for the 3D-printed headset near-complete, with the specific exception of the precise dimensions for the device mounting points, which I expect to need physical measurements that I can’t get from the spec sheets. Nailing down these dimensions should only require modifying a few constraints, assuming my preliminary estimates are accurate, so when the devices come in (the longest lead time is the display, which seems to be a little longer than two weeks) I expect CAD completion to take no more than an hour or so, with printing doable within a day or so thereafter.
  • I plan to finish reading through the ESP32 composite video library and begin to write the code for the display generation so that when it is delivered I can quickly proof successful communication and begin testing.
  • I plan to work through the ESP32-Cam guide so that when it arrives (much shorter lead time than the display) I can begin to test and code it, and we can validate the wireless connections.

Diya’s Status Report for 02/08

This week my primary focus was researching gesture recognition algorithms and setting up the necessary environment to begin implementation. Since I am relatively new to this field, I dedicated a significant amount of time to understanding the different approaches I can use to implement real-time gesture recognition and looking at the feasibility of integrating them into the CookAR glasses.

I have detailed some algorithms I have researched below: 

  1. Google MediaPipe – MediaPipe Hand Tracking 
    1. MediaPipe Hand Tracking offers real-time hand tracking and provides 21 3D hand landmarks, which allows us to determine hand position, orientation, and gesture.
    2. It has >90% real-time accuracy and is designed to be lightweight, so it can run on a microcontroller with limited processing power.
    3. For the environment setup, I am using Python, specifically the MediaPipe Python package.
    4. Next steps include defining the gestures we want the algorithm to recognize (swipe left or right for next, open palm for pause, etc.) and then recording the landmark data for each gesture. After this, I will extract the relevant features from the landmark data, such as the distances between key joints, the angles between fingers, and the velocity of the hand movement.
    5. I am planning to use a simple rule-based approach, with thresholds on the distances/angles, as the model for gesture classification (see the sketch after this list). I also looked into more robust options, such as training a machine learning classifier on the extracted features; there, I could use TensorFlow Lite to run the model efficiently on the microcontroller. I will start with the simple rule-based approach and pivot to a more robust model if needed.
    6. Since we are using Unity for the AR display, I also have to create a script that receives the gesture data and updates the AR elements accordingly. This is something I am looking more into. 
  2. Hidden Markov Models for Dynamic Gestures: 
    1. Used to recognize sequences of movements, so this would be ideal for gestures that involve many different hand positions over time.
    2. Requires a dataset of recorded gestures for training. I found some preliminary datasets with gestures:
      1. https://www.visionbib.com/bibliography/contentspeople.html#Face%20Recognition,%20Detection,%20Tracking,%20Gesture%20Recognition,%20Fingerprints,%20Biometrics
      2. An American Sign Language dataset for recognizing basic gestures
    3. Could be implemented using TensorFlow, but it would need gesture-sequence data for effective training.
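Here is the rule-based sketch referenced in item 5 above: simple thresholds on MediaPipe hand-landmark geometry, with the landmark indices following MediaPipe's numbering (0 = wrist, 4/8/12/16/20 = fingertips). The threshold values are placeholders I would still need to tune.

```python
# Sketch of a rule-based classifier: simple thresholds on MediaPipe hand-landmark
# geometry. Indices follow MediaPipe's numbering (0 = wrist, 4/8/12/16/20 =
# fingertips); threshold values are placeholders.
import math

WRIST = 0
FINGERTIPS = (4, 8, 12, 16, 20)


def dist(a, b) -> float:
    """2D distance between two normalized landmarks (each with .x and .y)."""
    return math.dist((a.x, a.y), (b.x, b.y))


def is_open_palm(landmarks, threshold: float = 0.25) -> bool:
    """All fingertips are 'far' from the wrist in normalized image coordinates."""
    wrist = landmarks[WRIST]
    return all(dist(landmarks[tip], wrist) > threshold for tip in FINGERTIPS)


def swipe_direction(prev_landmarks, curr_landmarks, threshold: float = 0.15):
    """Return 'left'/'right' if the wrist moved far enough between two sampled frames."""
    dx = curr_landmarks[WRIST].x - prev_landmarks[WRIST].x
    if dx > threshold:
        return "right"
    if dx < -threshold:
        return "left"
    return None
```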

Technical Challenges 

  1. I need to gather data in different lighting conditions and against different backgrounds to make sure the testing is robust.
  2. I can also synthetically create more training data by adding noise, varying the lighting, and rotating hand images.
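A small sketch of what that augmentation could look like with OpenCV and NumPy is below; the rotation angle, brightness factor, and noise level are illustrative.

```python
# Sketch of synthetic augmentation: rotate, brighten/darken, and add noise to a
# hand image with OpenCV/NumPy. Parameters are illustrative.
import cv2
import numpy as np


def augment(image: np.ndarray, angle: float = 15.0,
            brightness: float = 1.2, noise_std: float = 10.0) -> np.ndarray:
    h, w = image.shape[:2]
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    out = cv2.warpAffine(image, rot, (w, h))                  # rotate
    out = cv2.convertScaleAbs(out, alpha=brightness, beta=0)  # lighting change
    noise = np.random.normal(0, noise_std, out.shape)         # sensor-like noise
    return np.clip(out.astype(np.float32) + noise, 0, 255).astype(np.uint8)
```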

Setting up the Development Environment

Since this is my first time working with gesture recognition, I spent time getting the necessary tools and dependencies installed: 

  • Installed necessary libraries like OpenCV, MediaPipe, TensorFlow 
  • Configured Jupyter Notebook for testing different models and algorithms 

Progress Update

 I would say that I am slightly behind schedule in terms of actual implementation but on track in terms of understanding the concepts and setting up the groundwork. The research and initial setup phase took longer than expected but now that I have a better understanding of the algorithms and their implementation, I should be able to move forward with actually implementing code. 

To catch up, I plan to: 

  1. Run and analyze sample gesture recognition models in Python 
  2. Begin experimenting with CNN models for static gesture classification 

Next Week’s Deliverables: 

By the end of the week, I aim to have: 

  • A working MediaPipe hand tracking prototype capturing and displaying hand keypoints 
  • A basic CNN model for static gesture classification