Team Status Report for February 8, 2025

Project Risks and Mitigation Strategies 

  • Gesture Recognition Accuracy and Performance Issues
    • Risk: the accuracy of gesture detection may be inconsistent, or the chosen model may have limitations
    • Mitigation: test multiple approaches (MediaPipe, CNNs, Optical Flow) to determine the most robust method, then fine-tune the chosen model
    • If vision-based recognition proves very unreliable, explore sensor-based alternatives such as integrating IMUs for gesture detection
  • Microcontroller compatibility
    • Risk: the microcontroller needs to support real-time data processing for gesture recognition and the AR display without latency issues
    • Mitigation: carefully evaluate microcontroller options to ensure compatibility with the CV model. The intended camera board is designed for intensive visual processing.
      • If the microcontroller is not suitable for the CV model, we will look into offloading some of the processing from the microcontroller to the laptop. This may require sending a great deal of data wirelessly and must be approached with caution.

Changes to the System Design 

  • Finalizing the device selection: There are fewer development-board options than bare modules; however, we need a development board because we do not have the time to sink into creating our own environment. We will therefore use the ESP32-DevKitC-VE development board, which carries an ESP32-WROVER-E module. Of the boards we considered, it offers the most storage capacity for its form factor at a reasonable price.
  • Refining the computer vision model approach: We initially considered only a CNN-based classification model for gesture recognition, but after more research we are also testing MediaPipe and Optical Flow for potential improvements.

Schedule Progress

Our deadlines do not start until next week, so our schedule remains the same.

 

Rebecca’s Status Report for February 8, 2025

Report

  • I have researched & decided upon specific devices for use in the project. I will need two microcontrollers, a microdisplay, a small camera, and a battery, all of which combined are reasonable to mount to a lightweight headset.
    • The microcontroller I will use for the display is the ESP32-WROVER-E (datasheet linked), via the development kit ESP32-DevKitC-VE. I will additionally use an ESP32-Cam module for the camera and controller.
      • I considered a number of modules and development boards. I decided to purchase a development board rather than just the module, as it is both less expensive and will save me time interfacing with the controller: the development board comes with a micro-USB port for loading instructions from the computer as well as easily accessible pinouts.
      • The datasheet for the ESP32-Cam notes that a 5V power supply is recommended; however, it is possible to power the board from the 3.3V supply.
      • The ESP32-Cam module does not have a USB port on the board, so I will also need an ESP32-CAM-MB adapter. As the adapter is always required, it is usually sold in conjunction with the camera board.
    • The display I will use is a 0.2″ FLCoS display, which comes with an optics module so the image can be reflected from the display onto a lens.
    • The camera I will use is an OV2640 camera as part of the ESP32-Cam module.
    • The battery I will use is a 3.3V rechargeable battery, likely a Li-Po or LiFePO4, but I need to nail down the current-draw requirements of the rest of my devices before I finalize exactly which power supply I’ll use.
  • I have found an ESP32 library for generating composite video, which is the input the microdisplay takes. The GitHub repository is linked here.
  • I have set up and begun getting used to an ESP-IDF environment (it works in VS Code). I have also used the Arduino IDE before, which seems to be the older preferred environment for programming ESP32s.
  • I have begun to draft the CAD for the 3D-printed headset.

Progress Schedule

  • Progress is on schedule. Our schedule’s deadlines do not begin until next week.
  • I’m worried about the lead time on the FLCoS display. I couldn’t find anyone selling a comparable device with a quicker lead time (though I could find several displays that were much larger and cost several hundred dollars). The very small size (0.2″) seems to be fairly unusual. I may have to reshuffle some tasks around if it does not arrive before the end of February/spring break. This could delay the finalization of our hardware.

Next Week’s Deliverables

  • By the end of the weekend (Sunday) I plan to have submitted the purchasing forms for the microcontrollers, camera, and display, so that I can talk to my TA Monday for approval, and the orders can go out on Tuesday. In the time between now and Tuesday, I’ll finalize my battery choice so it can hopefully go through on Thursday, or early the following week.
  • By the end of next week I plan to have the CAD for the 3D-printed headset near-complete, with the specific exception of the precise dimensions for the device mounting points, which I expect will require physical measurements I can’t get from the spec sheets. Nailing down those dimensions should only require modifying a few constraints, assuming my preliminary estimates are accurate, so when the devices come in (the longest lead time is the display, which seems to be a little longer than two weeks) I expect CAD completion to take no more than an hour or so, with printing doable within a day or so thereafter.
  • I plan to finish reading through the ESP32 composite video library and begin writing the code for display generation so that, when the display is delivered, I can quickly prove successful communication and begin testing.
  • I plan to work through the ESP32-Cam guide so that when it arrives (much shorter lead time than the display) I can begin to test and code it, and we can validate the wireless connections.

Diya’s Status Report for 02/08

This week my primary focus was researching gesture recognition algorithms and setting up the environment needed to begin implementation. Since I am relatively new to this field, I dedicated a significant amount of time to understanding the different approaches I can use to implement real-time gesture recognition and evaluating their feasibility for integration into the CookAR glasses.

I have detailed some algorithms I have researched below: 

  1. Google MediaPipe – MediaPipe Hand Tracking 
    1. MediaPipe Hand Tracking offers real-time hand tracking and provides 21 3D hand landmarks, which lets us determine hand position, orientation, and gesture.
    2. It has >90% real-time accuracy and is designed to be lightweight, so it can run on a microcontroller with limited processing power.
    3. For the environment setup, I am using Python, specifically the MediaPipe Python package.
    4. Next steps include defining the gestures we want the algorithm to recognize (e.g., swipe left/right for next, open palm for pause) and recording the landmark data for each gesture. After that, I will extract the relevant features from the landmark data, such as the distances between key joints, the angles between fingers, and the velocity of the hand movement.
    5. For gesture classification, I am planning to use a simple rule-based approach built on thresholds for those distances and angles. I also looked into more robust models, such as training a machine learning classifier on the extracted features; in that case I could use TensorFlow Lite to run the model efficiently on the microcontroller. I will start with the simple rule-based approach and pivot to the more robust model if needed (a rough sketch of this pipeline follows the list below).
    6. Since we are using Unity for the AR display, I also have to create a script that receives the gesture data and updates the AR elements accordingly. This is something I am still looking into.
  2. Hidden Markov Models for Dynamic Gestures: 
    1. HMMs recognize sequences of movements, so they would be ideal for gestures that involve many different hand positions over time.
    2. Training requires a dataset of recorded gestures. I found preliminary datasets with gestures:
      1. https://www.visionbib.com/bibliography/contentspeople.html#Face%20Recognition,%20Detection,%20Tracking,%20Gesture%20Recognition,%20Fingerprints,%20Biometrics
      2. An American Sign Language dataset for recognizing basic gestures
    3. This could be implemented using TensorFlow, but it would need gesture-sequence data for effective training.
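To make the rule-based pipeline in item 5 above concrete, here is a minimal sketch of what I have in mind. This is only an illustration: the gesture set, the distance/velocity thresholds, and the UDP port that the Unity script would listen on are placeholder assumptions rather than final design decisions.

# Sketch: MediaPipe hand tracking + simple rule-based gesture classification.
# Gestures, thresholds, and the Unity UDP address below are placeholders.
import socket

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

# Assumed address where a Unity script listens for gesture strings over UDP.
UNITY_ADDR = ("127.0.0.1", 5065)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)


def classify(hand_landmarks, prev_wrist_x):
    """Apply simple threshold rules to the 21 normalized landmarks."""
    lm = hand_landmarks.landmark
    wrist = lm[mp_hands.HandLandmark.WRIST]
    # Open palm ("pause"): all four fingertips above (smaller y than) their PIP joints.
    tips, pips = [8, 12, 16, 20], [6, 10, 14, 18]
    if all(lm[t].y < lm[p].y for t, p in zip(tips, pips)):
        return "pause", wrist.x
    # Swipe ("next"/"previous"): wrist moved horizontally faster than a threshold.
    if prev_wrist_x is not None:
        dx = wrist.x - prev_wrist_x
        if dx > 0.08:
            return "next", wrist.x
        if dx < -0.08:
            return "previous", wrist.x
    return None, wrist.x


cap = cv2.VideoCapture(0)
prev_x = None
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            gesture, prev_x = classify(results.multi_hand_landmarks[0], prev_x)
            if gesture:
                sock.sendto(gesture.encode(), UNITY_ADDR)  # Unity side updates the AR elements
        cv2.imshow("hand tracking", frame)
        if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
            break
cap.release()

The classify() rules are where a trained classifier (or a TensorFlow Lite model) could later be swapped in if the threshold approach turns out to be too brittle.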

Technical Challenges 

  1. I need to gather data in different lighting conditions and with different backgrounds to make sure the testing is robust.
  2. I can also synthetically create more training data by adding noise, varying the lighting, and rotating hand images (a small augmentation sketch follows this list).
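A minimal sketch of the synthetic augmentation idea above, using OpenCV and NumPy; the noise level, brightness range, and rotation angles are placeholder values that would still need tuning:

# Sketch: synthetic augmentation of hand images (noise, lighting, rotation).
# The parameter ranges below are illustrative placeholders.
import cv2
import numpy as np


def augment(image, rng=None):
    """Return a few augmented copies of a uint8 hand image."""
    if rng is None:
        rng = np.random.default_rng()
    out = []
    # 1. Additive Gaussian noise.
    noise = rng.normal(0, 10, image.shape).astype(np.float32)
    out.append(np.clip(image.astype(np.float32) + noise, 0, 255).astype(np.uint8))
    # 2. Lighting variation: scale brightness by a random factor.
    factor = rng.uniform(0.6, 1.4)
    out.append(np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8))
    # 3. Rotation about the image center.
    h, w = image.shape[:2]
    angle = rng.uniform(-20, 20)
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    out.append(cv2.warpAffine(image, m, (w, h)))
    return out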

Setting up the Development Environment

Since this is my first time working with gesture recognition, I spent time getting the necessary tools and dependencies installed: 

  • Installed necessary libraries like OpenCV, MediaPipe, TensorFlow 
  • Configured Jupyter Notebook for testing different models and algorithms 

Progress Update

I would say that I am slightly behind schedule in terms of actual implementation but on track in terms of understanding the concepts and setting up the groundwork. The research and initial setup phase took longer than expected, but now that I have a better understanding of the algorithms and their implementation, I should be able to move forward with actually writing code.

To catch up, I plan to: 

  1. Run and analyze sample gesture recognition models in Python 
  2. Begin experimenting with CNN models for static gesture classification (a starting-point sketch follows below)
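As a concrete starting point for the CNN experiment, here is a minimal Keras sketch. The input size, number of gesture classes, and layer widths are assumptions for illustration rather than final choices; the last two lines show the TensorFlow Lite conversion mentioned earlier for running a trained model on constrained hardware.

# Sketch: small CNN for static gesture classification.
# Input size, class count, and layer widths are placeholder assumptions.
import tensorflow as tf

NUM_GESTURES = 4  # e.g., next, previous, pause, none

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(96, 96, 1)),          # grayscale hand crops
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_GESTURES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=10, validation_split=0.2)

# TensorFlow Lite conversion of the (trained) model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
open("gesture_classifier.tflite", "wb").write(converter.convert())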

Next Week’s Deliverables: 

By the end of the week, I aim to have: 

  • A working MediaPipe hand-tracking prototype capturing and displaying hand keypoints
  • A basic CNN model for static gesture classification 

 

Charvi’s Status Report for 2/8/25

This week, our team worked on the project proposal, website, and presentation.

I worked individually on creating the Gantt chart for the team, as well as the testing schedule and general schedule, and our team met multiple times to go over our proposal presentation as well as general project requirements and details.

I presented for the team, and after presenting we got some interesting questions and new considerations from our peers to look into.

One particularly interesting question pertained to hand gesture recognition: it may not be feasible given the circumstances of a kitchen, where there are a lot of moving parts. I think this is something we will have to test early.

I also spent some time looking into XR (specifically AR) development in Unity; we did a brief exploration earlier, but I wanted to get started with development. I spent some time getting familiar with Unity in general and will be ready to use it going forward.

I am generally on schedule, though we will need to pick up the pace, especially since web app development is supposed to be essentially finished within the next week or two.

This upcoming week, I hope to get a web app MVP finished and our basic databases set up. I hope this won’t be too bad, as both Diya and I have taken web app development, though it has been a little over a year, so we might need to refresh ourselves a bit; I still don’t anticipate it being too difficult. Once this is done, if I have time, I want to do some basic Unity AR development and test it on existing hardware (my roommate has a Meta Quest), just to make sure that there are no huge obstacles.