Team Status Report for February 15, 2025

Project Risks and Mitigation Strategies

  • The most significant risk is that the AR display will not arrive until mid-March, so its integration has had to be pushed back. The board has an HDMI output, so we can test the system on a computer monitor in the meantime; if the display arrives later than expected, we will conduct most of our testing on an external monitor.
  • Two other significant risks are that the gesture recognition algorithm cannot run on the Raspberry Pi, or that its power demand is too high for a reasonably-weighted battery. If either turns out to be true, we can offload some of the computation to the web app via the board’s wireless functionality.

Changes to System Design

  • We are adding a networking and social feature to the web app. Completing recipes will earn users points, which translate into levels displayed on their profiles; users can follow each other and view each other’s progress. We will also deploy the application on AWS EC2. This change was necessary to add complexity back into the project, since we are using the pre-trained MediaPipe model directly for gesture recognition.
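
To make the scoring-to-level mapping concrete, a minimal framework-agnostic sketch is below. The point values, level threshold, and class structure are placeholders for illustration, not finalized design; the actual implementation will live in the web app we deploy on EC2.

```python
from dataclasses import dataclass, field

# Assumed scoring rule (illustrative only): each completed recipe is worth a
# fixed number of points, and every 100 points advances the user a level.
POINTS_PER_RECIPE = 10   # placeholder, not a finalized value
POINTS_PER_LEVEL = 100   # placeholder, not a finalized value

@dataclass
class Profile:
    username: str
    completed_recipes: list = field(default_factory=list)  # recipe names
    following: set = field(default_factory=set)            # usernames followed

    @property
    def points(self) -> int:
        return len(self.completed_recipes) * POINTS_PER_RECIPE

    @property
    def level(self) -> int:
        return self.points // POINTS_PER_LEVEL

    def follow(self, other: "Profile") -> None:
        self.following.add(other.username)

# Example usage: alice completes recipes, bob follows her and sees her level.
alice, bob = Profile("alice"), Profile("bob")
alice.completed_recipes += ["pasta", "omelette", "stir fry"]
bob.follow(alice)
print(alice.points, alice.level)  # points and level shown on alice's profile
```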

Schedule Progress

We have reworked and detailed our Gantt chart.

Meeting Specific Needs

Part A was written by Diya, Part B was written by Charvi, and Part C was written by Rebecca.

Please write a paragraph or two describing how the product solution you are designing will meet a specified need…

Part A: … with respect to considerations of public health, safety or welfare.

By using gesture interaction, users can navigate recipes without touching a screen, which reduces cross-contamination risks, especially when handling raw ingredients. Walking users through recipes step by step also helps them focus on one task at a time, which allows beginner cooks to gain confidence. With gesture control, users also have fewer distractions, such as phones or tablets, minimizing the risk of accidents in the kitchen. Finally, the social network feature allows users to track their progress and connect with other beginner cooks, promoting a sense of community among new cooks.

Part B: … with consideration of social factors.

Our target user group is “new cooks”: people who don’t cook often, younger adults and children in new environments where they have to start cooking for themselves, and people who have been bored or confused by cooking on their own. Our CookAR product will let people connect with one another on a platform focused on cooking and trying out new recipes, so they can be motivated by like-minded peers to further their culinary knowledge and range. In addition, by gamifying the cooking process and letting users level up based on how many new recipes they have tried, CookAR will motivate people to try new recipes and engage with each other’s profiles, which display the same information about which recipes were tried and how many.

Part C: … with consideration of economic factors.

A lightweight headset made from relatively inexpensive parts (fifteen-dollar Raspberry Pi boards, a few dollars at most for each of the remaining peripherals; the most expensive part is the FLCoS display, and even that is only a few tens of dollars for a single unit off the shelf) is ideal for our target audience: people trying to get into something they haven’t done much of before, who are unlikely to want to spend a lot of money on a tool like this. Compared to a more generalized heads-up display formerly on the market, like the Google Glass, which retailed for $1,500, this construction is cheap while still being fairly resilient.

Additionally, this same hardware and software framework could be generalized to a wide variety of tasks with marginal changes, and a hypothetical production variant of this product could easily swap out the four-year-old Raspberry Pi Zero W for something that takes advantage of the intervening years of silicon development (for instance, additional accelerators such as an NPU tailored to our needs), with scale offsetting the increase in individual part price.

Rebecca’s Status Report for February 15, 2025

Report

  • I spent much of this week reworking the hardware decisions, because I realized in our meeting Tuesday morning, after walking through the hardware specs of the ESP32s and the demands of the software, that they almost certainly would not cut it. I decided to widen the options to boards that require 5V input, or recommend 5V input for heavy computation, and to achieve this voltage by running a boost board off a 3.7V LiPo battery. After considering a wide variety of boards, I narrowed my options down to two:
    • The Luckfox Pico Mini, which follows the Raspberry Pi Pico form factor; it is extremely small (~5g) but has image processing and neural network accelerators. It has more RAM than an ESP32 (64MB in the spec, about 34MB usable according to previous users), but still not a huge amount.
    • The Raspberry Pi Zero W, which has more RAM than the Luckfox (512MB) and a quad-core chip. It is also about twice the weight of the Luckfox (~10g), but it has a native AV output, which seems to be fairly unusual, as well as Bluetooth LE capability. This makes it ideal for driving the microdisplay, which takes AV input, so I will not have to purchase an additional converter board.
  • The decision was primarily which board to use for the camera input. Without intensive testing, it seems to me that if either is capable of running the MediaPipe/OpenCV algorithm we plan to use for gesture recognition, both would be, so it comes down to weight, speed, and ease of use.
  • Ultimately I’ve decided to go with two Raspberry Pi Zero W boards, as learning the development process for two different boards (even closely related ones, as these are) would cost more time than I have to give. Additionally, if the Rasppi is not capable of running the algorithm, it already has wireless capability, so it is simpler to offload some of the computation onto the web app than it would be if I had to acquire an additional Bluetooth shield for the Luckfox, or pipe information through the other Rasppi’s wireless connection.
  • Power consumption will be an issue with these more powerful boards. After we get the algorithm running on one, I plan to measure its loaded power consumption and size the battery I will need to meet our one-hour operation spec from there (a rough sizing calculation is sketched after this list).
  • Additionally, considering how lightweight the display program is (the Rasppi was chosen for that role for its native AV output, not its computational power), it may be possible to run both peripherals from a single board. I plan to test this once we have the recognition algorithm and a simple display generation program functional. If so, I can trade the weight of the dropped board for 10g more battery, which would give me more flexibility on lifetime.
  • Because of the display’s extremely long lead time, I plan to develop and test the display program using the Rasppi’s HDMI output, so that it will be almost entirely functional (only needing to switch over to AV output) when the display arrives, and I can bring it online immediately.
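
To make the battery-sizing step concrete, here is a rough back-of-the-envelope calculation. Every number in it is a placeholder assumption to be replaced with measured values once the algorithm is running on the board.

```python
# Rough battery sizing sketch. Every number here is a placeholder assumption,
# to be replaced with measured values once the algorithm runs on the board.
load_current_a = 0.8      # assumed average 5V draw of the Pi + camera under load (A)
battery_voltage = 3.7     # nominal LiPo cell voltage (V)
boost_efficiency = 0.90   # assumed efficiency of the 3.7V -> 5V boost board
target_hours = 1.0        # our one-hour operation spec
margin = 1.2              # 20% headroom for cell aging and current peaks

# The cell supplies more current than the 5V load draws, because the boost
# converter trades voltage for current (with some loss).
cell_current_a = load_current_a * 5.0 / (battery_voltage * boost_efficiency)

required_mah = cell_current_a * target_hours * 1000 * margin
print(f"~{required_mah:.0f} mAh LiPo for {target_hours} h at {load_current_a} A load")
```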

Progress Schedule

Due to the display’s extremely long lead time and the changes we’ve made to the specs of the project, we’ve reworked our schedule from the ground up. The new Gantt chart can be found in the team status report for this week.

Next Week’s Deliverables

  • The initial CAD draft got put off because I sank so much time into board decisions. I believe this is okay, because selecting the right hardware now will make our lives much easier later. Additionally, the point at which I’ll be able to print the headset frame has been pushed out significantly, so I’ve broken the CAD up into several steps, which are marked on the Gantt chart. The early draft, which is just the shape of the frame (and includes time for me to become reacquainted with OnShape), should be mostly done by the end of next week. I expect this to take four or five more hours.
  • The Rasppis will be ordered on Amazon Prime, so they should arrive very quickly. At the same time I will order the camera, a microHDMI-to-HDMI converter, and an HDMI cable, so I can boot the boards immediately upon receipt and get their most basic I/O operational this week or very early next week.
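
For that basic I/O check, a minimal camera smoke test might look like the sketch below. It assumes the camera enumerates as a V4L2 device at index 0; if the CSI camera is only exposed through libcamera, the picamera2 library would be the fallback.

```python
import cv2

# Basic camera smoke test: grab a handful of frames and report their size.
# Device index 0 is an assumption; the actual index depends on how the
# camera enumerates on the Pi.
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise SystemExit("Camera did not open; check the ribbon cable / device index")

for i in range(10):
    ok, frame = cap.read()
    if not ok:
        raise SystemExit(f"Frame {i} failed to read")
    print(f"frame {i}: {frame.shape[1]}x{frame.shape[0]}")

cap.release()
```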

Team Status Report for February 8, 2025

Project Risks and Mitigation Strategies 

  • Gesture Recognition Accuracy and Performance Issues
    • Risk: the accuracy of the gesture detection might be inconsistent or there might be limitations in the model chosen
    • Mitigation: test multiple approaches (MediaPipe, CNNs, optical flow) to determine the most robust method, and then fine-tune the chosen model (a rough optical-flow sketch follows this list)
    • If vision-based recognition is very unreliable, explore sensor-based alternatives such as integrating IMUs for gesture detection
  • Microcontroller compatibility
    • Risk: the microcontroller needs to support the real time data processing for the gesture recognition and AR display without latency issues
    • Mitigation: carefully evaluate microcontroller options to ensure compatibility with CV model. The intended camera board is designed for intensive visual processing.
      • If the microcontroller is not suitable for the CV model, we will look into offloading some of the processing from the microcontroller to the laptop. This may require sending a great deal of data wirelessly and must be approached with caution.
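
As a rough illustration of the optical-flow option mentioned above, the sketch below flags left/right swipes from the mean horizontal flow between consecutive frames. The threshold, frame size, and camera index are arbitrary placeholders, not tested values.

```python
import cv2
import numpy as np

# Dense optical flow between consecutive frames; the mean horizontal motion
# is used as a crude left/right swipe signal. All constants are untuned.
SWIPE_THRESHOLD = 2.0  # mean horizontal flow (pixels/frame); placeholder

cap = cv2.VideoCapture(0)  # camera index 0 is an assumption
ok, prev = cap.read()
if not ok:
    raise SystemExit("Could not read from the camera")
prev_gray = cv2.cvtColor(cv2.resize(prev, (320, 240)), cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(cv2.resize(frame, (320, 240)), cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mean_dx = float(np.mean(flow[..., 0]))
    if mean_dx > SWIPE_THRESHOLD:
        print("swipe right detected")
    elif mean_dx < -SWIPE_THRESHOLD:
        print("swipe left detected")
    prev_gray = gray

cap.release()
```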

Changes to the System Design 

  • Finalizing the device selection: there are fewer development board options than bare modules; however, we need a development board, as we do not have the time to sink into creating our own programming environment. We will therefore use the ESP32-DevKitC-VE development board, which carries a WROVER-E controller and offers the most storage capacity for its form factor at a reasonable price.
  • Refining the computer vision model approach: we initially considered only a CNN-based classification model for gesture recognition, but after more research we are also testing MediaPipe and optical flow for potential improvements.

Schedule Progress

Our deadlines do not start until next week, so our schedule remains the same.

 

Rebecca’s Status Report for February 8, 2025

Report

  • I have researched & decided upon specific devices for use in the project. I will need two microcontrollers, a microdisplay, a small camera, and a battery, all of which combined are reasonable to mount to a lightweight headset.
    • The microcontroller I will use for the display is the ESP32-WROVER-E (datasheet linked), via the development kit ESP32-DevKitC-VE. I will additionally use an ESP32-Cam module for the camera and controller.
      • I considered a number of modules and development boards. I decided it was necessary to purchase a development board rather than just the module, as it is less expensive overall and will save me time interfacing with the controller: the development board comes with a micro USB port for loading instructions from the computer as well as easily accessible pinouts.
      • The datasheet for the ESP32-Cam notes that a 5V power supply is recommended; however, it is possible to power it from the 3.3V supply.
      • The ESP32-Cam module does not have a USB port on the board, so I will also need an ESP32-CAM-MB adapter. As this is always required, the adapter is usually sold in conjunction with the camera board.
    • The display I will use is a 0.2″ FLCoS display, which comes with an optics module so the image can be reflected from the display onto a lens.
    • The camera I will use is an OV2640 camera as part of the ESP32-Cam module.
    • The battery I will use is a 3.3V rechargeable battery, likely a LiPo or LiFePO4; I need to nail down the current draw requirements for the rest of my devices before I finalize exactly which power supply I’ll use.
  • I have found an ESP32 library for generating composite video, which is the input that the microdisplay takes. The github is here.
  • I have set up and begun to get used to an ESP-IDF environment (it works in VSCode). I have also used the Arduino IDE before, which seems to be the older preferred environment for programming ESP32s.
  • I have begun to draft the CAD for the 3D-printed headset.

Progress Schedule

  • Progress is on schedule. Our schedule’s deadlines do not begin until next week.
  • I’m worried about the lead time on the FLCoS display. I couldn’t find anyone selling a comparable device with a quicker lead time (though I did find several displays that were much larger and cost several hundred dollars); the very small size (0.2″) seems to be fairly unusual. I may have to reshuffle some tasks if it does not arrive before the end of February/spring break, which could delay the finalization of our hardware.

Next Week’s Deliverables

  • By the end of the weekend (Sunday) I plan to have submitted the purchasing forms for the microcontrollers, camera, and display, so that I can talk to my TA Monday for approval, and the orders can go out on Tuesday. In the time between now and Tuesday, I’ll finalize my battery choice so it can hopefully go through on Thursday, or early the following week.
  • By the end of next week I plan to have the CAD for the 3D-printed headset near-complete, with the specific exception of the precise dimensions of the device mounting points, which I expect to need physical measurements that I can’t get from the spec sheets. Nailing down these dimensions should only require modifying a few constraints, assuming my preliminary estimates are accurate, so when the devices come in (the longest lead time is the display, which seems to be a little over two weeks) I expect finishing the CAD to take no more than an hour or so, with printing doable within a day or so thereafter.
  • I plan to finish reading through the ESP32 composite video library and begin to write the code for the display generation so that when it is delivered I can quickly proof successful communication and begin testing.
  • I plan to work through the ESP32-Cam guide so that when it arrives (much shorter lead time than the display) I can begin to test and code it, and we can validate the wireless connections.

Diya’s Status Report for 02/08

This week my primary focus was researching gesture recognition algorithms and setting up the environment needed to begin implementation. Since I am relatively new to this field, I dedicated a significant amount of time to understanding the different approaches I can use to implement real-time gesture recognition and evaluating their feasibility for integration into the CookAR glasses.

I have detailed some algorithms I have researched below: 

  1. Google MediaPipe – MediaPipe Hand Tracking 
    1. MediaPipe Hand Tracking offers real-time hand tracking and provides 21 3D hand landmarks, which allows us to determine hand position, orientation, and gesture.
    2. It reports greater than 90% accuracy in real time and is designed to be lightweight, so it can run on a microcontroller with limited processing power.
    3. For the environment setup, I am using Python, specifically the MediaPipe Python package.
    4. Next steps include defining the gestures we want the algorithm to recognize (e.g., swipe left or right for next, open palm for pause, etc.) and recording the landmark data for each gesture. After this, I will extract relevant features from the landmark data, such as the distances between key joints, the angles between fingers, and the velocity of the hand movement.
    5. I am planning to use a simple rule-based system, with thresholds on distances and angles, as the model for gesture classification. I also looked into more robust options, such as training a machine learning classifier on the extracted features; in that case, I could use TensorFlow Lite to run the model efficiently on the microcontroller. I am going to start with the simple rule-based approach and pivot to the more robust model if needed (a minimal sketch of the rule-based approach appears after this list).
    6. Since we are using Unity for the AR display, I also have to create a script that receives the gesture data and updates the AR elements accordingly. This is something I am looking into further.
  2. Hidden Markov Models for Dynamic Gestures: 
    1. HMMs are used to recognize sequences of movements, so they would be ideal for gestures that involve many different hand positions over time.
    2. They require a dataset of recorded gestures for training. I found some preliminary gesture datasets:
      1. https://www.visionbib.com/bibliography/contentspeople.html#Face%20Recognition,%20Detection,%20Tracking,%20Gesture%20Recognition,%20Fingerprints,%20Biometrics
      2. American Sign Language Dataset to recognize basic gestures 
    3. They could be implemented using TensorFlow, but would need gesture-sequence data for effective training.
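
A minimal sketch of the rule-based MediaPipe approach I described above is below. The open-palm heuristic, landmark indices used, and detection confidence are untuned guesses that I expect to revise once I record real landmark data.

```python
import cv2
import mediapipe as mp

# Minimal MediaPipe Hands prototype with a crude rule-based check for an
# "open palm" (all four fingertips above their PIP joints when the hand is
# upright). Landmark indices follow the MediaPipe hand model; the confidence
# threshold and the heuristic itself are untuned guesses.
mp_hands = mp.solutions.hands
FINGERTIPS = [8, 12, 16, 20]   # index, middle, ring, pinky fingertips
PIP_JOINTS = [6, 10, 14, 18]   # corresponding PIP joints

def is_open_palm(landmarks) -> bool:
    # Image y grows downward, so "above" means a smaller y value.
    return all(landmarks[tip].y < landmarks[pip].y
               for tip, pip in zip(FINGERTIPS, PIP_JOINTS))

cap = cv2.VideoCapture(0)  # camera index 0 is an assumption
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.7) as hands:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            landmarks = results.multi_hand_landmarks[0].landmark
            if is_open_palm(landmarks):
                print("open palm -> pause")  # placeholder for the Unity/display hook
cap.release()
```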

Technical Challenges 

  1. I need to gather data in different lighting conditions and with different backgrounds to make sure the testing is robust
  2. I can also synthetically create more training data by adding noise, varying the lighting and rotating hand images 
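
A small sketch of that synthetic augmentation is below; the rotation range, brightness shift, and noise level are guesses I will tune later, and the image path is a placeholder.

```python
import cv2
import numpy as np

def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a synthetically varied copy of a hand image.

    Applies a random rotation, a brightness/contrast shift, and Gaussian
    noise. The parameter ranges are placeholder guesses, not tuned values.
    """
    h, w = img.shape[:2]

    # Random rotation about the image center (+/- 20 degrees).
    angle = rng.uniform(-20, 20)
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    out = cv2.warpAffine(img, m, (w, h), borderMode=cv2.BORDER_REFLECT)

    # Random brightness/contrast shift to mimic different lighting.
    alpha = rng.uniform(0.7, 1.3)   # contrast
    beta = rng.uniform(-30, 30)     # brightness
    out = cv2.convertScaleAbs(out, alpha=alpha, beta=beta)

    # Additive Gaussian noise.
    noise = rng.normal(0, 8, out.shape).astype(np.float32)
    return np.clip(out.astype(np.float32) + noise, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
sample = cv2.imread("hand.jpg")  # placeholder path to a captured hand image
augmented = [augment(sample, rng) for _ in range(5)]
```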

Setting up the Development Environment

Since this is my first time working with gesture recognition, I spent time getting the necessary tools and dependencies installed: 

  • Installed necessary libraries like OpenCV, MediaPipe, TensorFlow 
  • Configured Jupyter Notebook for testing different models and algorithms 

Progress Update

I would say that I am slightly behind schedule in terms of actual implementation, but on track in terms of understanding the concepts and setting up the groundwork. The research and initial setup phase took longer than expected, but now that I have a better understanding of the algorithms and their implementation, I should be able to move forward with actually writing code.

To catch up, I plan to: 

  1. Run and analyze sample gesture recognition models in Python 
  2. Begin experimenting with CNN models for static gesture classification 

Next Week’s Deliverables: 

By the end of the week, I aim to have: 

  • A working MediaPipe hand tracking prototype that captures and displays hand keypoints
  • A basic CNN model for static gesture classification 
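
A possible starting point for that CNN is sketched below, assuming Keras, a 64x64 grayscale input, and four gesture classes; all of these are placeholders until the dataset and gesture set are finalized.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Minimal CNN for static gesture classification. The 64x64 grayscale input
# and the four gesture classes are assumptions for illustration; the real
# dataset and label set are still to be collected.
NUM_CLASSES = 4

model = tf.keras.Sequential([
    layers.Input(shape=(64, 64, 1)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(train_images, train_labels, epochs=10)  # once gesture data is collected
```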

 

Charvi’s Status Report for 2/8/25

This week, our team worked on the project proposal, website, and presentation.

I worked individually on creating the Gantt chart for the team as well as the testing schedule and general schedule, and our team met multiple times to go over our proposal presentation as well as general project requirements and details.

I presented for the team, and after presenting we got some interesting questions and new considerations from our peers to look into.

One particularly interesting question pertained to hand gesture recognition: it may not be feasible given the circumstances of being in a kitchen with a lot of moving parts. I think this is something we will have to test early.

I also spent some time looking into XR (specifically AR) development in Unity. We did a brief exploration earlier, but I wanted to get started with development, so I spent some time getting familiar with Unity in general, and I will be ready to use it going forward.

I am generally on schedule, though we will need to pick up the pace, especially since web app development is supposed to be essentially finished within the next week or two.

This upcoming week, I hope to get a web app MVP finished and our basic databases set up. I hope this won’t be too bad, as both Diya and I have taken web app development, though it has been a little over a year, so we might need to refresh ourselves a bit; I still don’t anticipate it being too difficult. Once this is done, if I have time, I want to do some basic Unity AR development and test it on existing hardware (my roommate has a Meta Quest), just to make sure that there are no huge obstacles.