Diya’s Status Report for 02/08

This week my primary focus was researching gesture recognition algorithms and setting up the environment needed to begin implementation. Since I am relatively new to this field, I dedicated a significant amount of time to understanding the different approaches I could use for real-time gesture recognition and assessing how feasible each would be to integrate into the CookAR glasses. 

I have detailed some algorithms I have researched below: 

  1. Google MediaPipe – MediaPipe Hand Tracking 
    1. MediaPipe Hand Tracking offers real-time hand tracking and provides 21 3D hand landmarks, which lets us determine hand position, orientation, and gesture. 
    2. It reports over 90% accuracy in real time and is designed to be lightweight, so it can run on a microcontroller with limited processing power. 
    3. For the environment setup, I am using Python, specifically the MediaPipe Python package. 
    4. Next steps include defining the gestures we want the algorithm to recognize (e.g., a left or right swipe for next, an open palm for pause) and recording the landmark data for each gesture. After this, I will extract the relevant features from the landmark data, such as the distances between key joints, the angles between fingers, and the velocity of the hand movement. 
    5. For gesture classification, I am planning to start with a simple rule-based approach using thresholds on those distances and angles. I also looked into more robust options, such as training a machine learning classifier on the extracted features; in that case I could use TensorFlow Lite to run the model efficiently on the microcontroller. I will start with the rule-based approach and pivot to the more robust model if needed (a minimal sketch of the landmark capture and one rule-based check is included after this list). 
    6. Since we are using Unity for the AR display, I also need to write a script that receives the gesture data and updates the AR elements accordingly. This is something I am still looking into. 
  2. Hidden Markov Models for Dynamic Gestures: 
    1. HMMs recognize sequences of movements, which makes them ideal for gestures that involve many different hand positions over time. 
    2. Training requires a dataset of recorded gestures. I found some preliminary sources:
      1. https://www.visionbib.com/bibliography/contentspeople.html#Face%20Recognition,%20Detection,%20Tracking,%20Gesture%20Recognition,%20Fingerprints,%20Biometrics
      2. An American Sign Language dataset for recognizing basic gestures 
    3. I would implement this using TensorFlow, but it would need gesture sequence data for effective training (a sketch of the per-gesture HMM idea is included after this list). 
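
To make the MediaPipe next steps concrete, below is a minimal sketch of the hand tracking prototype combined with one rule-based check (an "open palm" heuristic for pause). The landmark indices come from MediaPipe's 21-point hand model, but the heuristic itself and the assumption that the hand is roughly upright are placeholders I would replace with tuned distance/angle thresholds once I have recorded landmark data.

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils

# Fingertip and PIP-joint indices from MediaPipe's 21-point hand model.
FINGER_TIPS = [8, 12, 16, 20]   # index, middle, ring, pinky tips
FINGER_PIPS = [6, 10, 14, 18]   # corresponding PIP joints

def is_open_palm(landmarks):
    """Rule-based check: all four fingertips sit above their PIP joints.

    Landmark y values are normalized and increase downward, so "above"
    means a smaller y. This assumes a roughly upright hand.
    """
    return all(landmarks[tip].y < landmarks[pip].y
               for tip, pip in zip(FINGER_TIPS, FINGER_PIPS))

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=1,
                    min_detection_confidence=0.5,
                    min_tracking_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV captures BGR.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                mp_drawing.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)
                if is_open_palm(hand.landmark):
                    cv2.putText(frame, "PAUSE", (30, 60),
                                cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 255, 0), 3)
        cv2.imshow("CookAR hand tracking", frame)
        if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
            break
cap.release()
cv2.destroyAllWindows()
```

For the Unity side, one option I am considering is sending the recognized gesture label from this Python loop to the Unity script over a local socket, but I have not settled on the communication approach yet.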

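For the HMM option, the sketch below uses hmmlearn rather than TensorFlow purely to illustrate the idea of training one HMM per gesture and classifying a new sequence by log-likelihood; the feature sequences, gesture names, and state count are placeholder assumptions until I have real gesture recordings.

```python
import numpy as np
from hmmlearn import hmm  # stand-in here; the real version would be built in TensorFlow

def train_gesture_hmm(sequences, n_states=4):
    """Fit one HMM to a list of (frames x features) landmark-feature sequences."""
    X = np.concatenate(sequences)
    lengths = [len(s) for s in sequences]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model

def classify_sequence(models, sequence):
    """Pick the gesture whose HMM assigns the highest log-likelihood to the sequence."""
    scores = {name: m.score(sequence) for name, m in models.items()}
    return max(scores, key=scores.get)

# Hypothetical usage, once per-gesture recordings of landmark features exist:
# models = {"swipe_left": train_gesture_hmm(swipe_left_seqs),
#           "open_palm": train_gesture_hmm(open_palm_seqs)}
# predicted = classify_sequence(models, new_sequence)
```
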
Technical Challenges 

  1. I need to gather data in different lighting conditions and against different backgrounds to make sure the testing is robust.
  2. I can also synthetically create more training data by adding noise, varying the lighting, and rotating hand images (a rough sketch of this augmentation is included below). 
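
The sketch below shows the kind of augmentation I have in mind; the rotation range, brightness shift, and noise level are arbitrary placeholders I would tune once I see how the models respond.

```python
import cv2
import numpy as np

def augment_hand_image(image, rng=np.random.default_rng()):
    """Create a synthetic variant of a hand image: rotate, shift brightness, add noise."""
    h, w = image.shape[:2]
    # Random rotation about the image center (placeholder range of +/- 20 degrees).
    angle = rng.uniform(-20, 20)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    out = cv2.warpAffine(image, M, (w, h))
    # Brightness shift to mimic different lighting conditions.
    out = cv2.convertScaleAbs(out, alpha=1.0, beta=rng.uniform(-40, 40))
    # Additive Gaussian noise.
    noise = rng.normal(0, 10, out.shape).astype(np.float32)
    out = np.clip(out.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    return out
```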

Setting up the Development Environment

Since this is my first time working with gesture recognition, I spent time getting the necessary tools and dependencies installed: 

  • Installed necessary libraries like OpenCV, MediaPipe, TensorFlow 
  • Configured Jupyter Notebook for testing different models and algorithms 

Progress Update

I would say that I am slightly behind schedule in terms of actual implementation, but on track in terms of understanding the concepts and setting up the groundwork. The research and initial setup phase took longer than expected, but now that I have a better understanding of the algorithms and their implementation, I should be able to move forward with writing code. 

To catch up, I plan to: 

  1. Run and analyze sample gesture recognition models in Python 
  2. Begin experimenting with CNN models for static gesture classification (a rough sketch of the kind of model I have in mind is below) 
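
As a starting point for the CNN experiments, here is a minimal Keras sketch; the input size, number of classes, and layer sizes are placeholder assumptions until we settle on a dataset.

```python
import tensorflow as tf

NUM_CLASSES = 5  # placeholder count, e.g. open palm, fist, point, swipe-left pose, swipe-right pose

# Small CNN over grayscale hand crops; all sizes are placeholders to be tuned.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, validation_split=0.2, epochs=10)
```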

Next Week’s Deliverables: 

By the end of the week, I aim to have: 

  • A working MediaPipe hand tracking prototype capturing and displaying hand keypoints 
  • A basic CNN model for static gesture classification