Team Status Report for 4/27

The projection warp may not work in the demo environment, so we need to make sure calibration is completed before the actual demo time.

Additionally, we found that the projection calibration shifted slightly once we deployed the calibration code in Flutter. We had not accounted for the fact that Flutter handles transformations differently than Python/NumPy, which is why much of this week was spent adjusting the homography to work in Flutter. Regardless, we were able to make the necessary changes and complete integration.
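
For reference, here is a minimal sketch of the usual NumPy/OpenCV point-mapping convention our Python side follows (the matrix values below are placeholders, not our calibration result). One likely source of the mismatch is that Flutter's Matrix4 is a 4x4 structure stored column-major, so the same 3x3 homography cannot simply be copied over entry-for-entry.

```python
import numpy as np

# Placeholder 3x3 homography -- illustrative values only, not our calibration.
H = np.array([
    [1.02, 0.01, -15.0],
    [0.00, 0.98,   8.0],
    [1e-5, 2e-5,   1.0],
])

def apply_homography(H, x, y):
    """Map a point using the column-vector convention common to NumPy/OpenCV:
    [x', y', w]^T = H @ [x, y, 1]^T, followed by division by w."""
    xp, yp, w = H @ np.array([x, y, 1.0])
    return xp / w, yp / w

print(apply_homography(H, 100, 200))
```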

No changes made to system.

No schedule changes.

Unit Tests:

  • Button Press / Swipe Test
    • Tested how accurately and how quickly the system recognizes and executes button gestures.
    • Found that the timing is reasonable and the accuracy is about 90% (the gesture was recognized correctly in 9 out of 10 trials).
    • Realized that the video frame needs to be cropped/zoomed into the button region to detect gestures; otherwise the hand is too small for the gestures to be recognized.
  • Voice command Tests
    • Tested how accurately and how quickly the system recognizes and executes voice commands.
    • Found that the volume and pitch of one’s voice impact recognition. The full system had difficulty recognizing commands in Caroline’s voice but easily recognized Tahaseen’s voice commands when she spoke loudly and with a low pitch.
  • Recipe execution tests
    • We tested how the cooking process impacts the overall system. For example, when boiling water, we wanted to see if the steam was blocking the camera view.
    • Found that the steam does not greatly impact the view, since the burner is located to the side of the camera.

Sumayya Syeda’s Report for 4/27

Progress Update:

This week I attempted a new method for detecting gesture recognition regions. Previously, I was using SIFT to look for the template button image in the projection image captured by the camera, but poor lighting and projection warp made this method unreliable.

I realized that instead of using the image of the projection to detect the button, I can use the already calculated homography to map coordinates in the UI to coordinates in the projection. This removes the need for the camera to detect the button; the camera is used only to compute the homography. I decided to go with this method for the final demo as it recognizes the button region more reliably.

While testing gestures with the USB camera, I realized that I need to zoom into the frame around the button region, since the model cannot recognize gestures if the hand does not take up enough of the frame. This made it even more important to detect the button region accurately so that I can crop the right area; a sketch of the approach is below.
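
A minimal sketch of the idea, assuming the calibration homography H maps UI-pixel coordinates into camera-frame coordinates and the button's UI rectangle is known; the rectangle and variable names here are illustrative, not our exact code.

```python
import cv2
import numpy as np

def button_roi_in_frame(H, ui_rect):
    """Project a UI-space button rectangle into the camera frame using the
    calibration homography and return an axis-aligned box to crop."""
    x, y, w, h = ui_rect
    corners = np.float32([[x, y], [x + w, y], [x + w, y + h], [x, y + h]]).reshape(-1, 1, 2)
    projected = cv2.perspectiveTransform(corners, H).reshape(-1, 2)
    x0, y0 = projected.min(axis=0).astype(int)
    x1, y1 = projected.max(axis=0).astype(int)
    return x0, y0, x1, y1

# Usage: crop the frame so the hand fills more of the image before running
# the gesture model (the button rectangle below is a placeholder).
# x0, y0, x1, y1 = button_roi_in_frame(H, ui_rect=(120, 300, 180, 90))
# roi = frame[max(y0, 0):y1, max(x0, 0):x1]
```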

Schedule: On Track!

Next Steps:

  • Test all features with both recipes
  • Continue fine tuning gesture recognition
  • Practice for demo day
  • Work on Report & Poster

Sumayya Syeda’s Report for 4/20

Progress Update:

This week I went back to working on object tracking. I tried multiple methods, including SURF, ORB, and SIFT, and found that SIFT worked the best. The biggest difference between my SIFT algorithm from a few weeks ago and this recent version is the use of a perspective transform to get a better boundary around the target object.
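
The SIFT-plus-perspective-transform step roughly follows the standard OpenCV pattern below (a sketch with illustrative parameters, not my exact code): match features between the reference image and the frame, estimate a homography with RANSAC, then project the reference image's corners into the frame to get a tight boundary.

```python
import cv2
import numpy as np

def locate_object(reference, frame, min_matches=10):
    """Find the reference object in a frame and return its projected corners."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(reference, None)
    kp2, des2 = sift.detectAndCompute(frame, None)

    # Brute-force matching with Lowe's ratio test to keep good matches only.
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = []
    for pair in matches:
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            good.append(pair[0])
    if len(good) < min_matches:
        return None

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None

    # Project the reference image's corners into the frame for a tight boundary.
    h, w = reference.shape[:2]
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(corners, H)
```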

I am also working on region detection with the projector. Since we are using a light projector, the camera only sees the color channel that the projector is displaying at a given instant. The projector scans each single-channel color one by one, and the corresponding color in the projection shows up in the camera frame. For example, if the image I am looking for in the projection is red and the projector is scanning red, the camera will see the red image; but if the projector is scanning green or cyan, the camera will not see the image.

Since the frame rate is consistent and I know when the projector will be scanning each color, I can factor these properties into determining the correct time to capture a frame and see the target image.
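
A rough sketch of the timing idea. The channel period, channel order, and sync point below are entirely hypothetical; the real values would come from measuring our projector and camera.

```python
import time
import cv2

# Hypothetical timing values -- replace with measured projector/camera numbers.
CHANNEL_PERIOD_S = 1 / 180.0           # time the projector spends on one color field
CHANNEL_ORDER = ["red", "green", "blue"]
SYNC_T0 = time.monotonic()             # moment we observed the start of a red field

def current_channel(now):
    """Infer which color field the projector is scanning at time `now`."""
    elapsed = (now - SYNC_T0) % (CHANNEL_PERIOD_S * len(CHANNEL_ORDER))
    return CHANNEL_ORDER[int(elapsed // CHANNEL_PERIOD_S)]

def grab_frame_during(cap, target="red", timeout_s=1.0):
    """Keep grabbing frames until one lands while the target channel is displayed."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        ok, frame = cap.read()
        if ok and current_channel(time.monotonic()) == target:
            return frame
    return None

# cap = cv2.VideoCapture(0)
# red_frame = grab_frame_during(cap, target="red")
```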

Projection as seen by the camera

Image displayed on the table

Schedule: on track!

Next Week:

  • continue to test region detection
  • continue to improve gesture tracking/recognition latency
  • continue to test computer vision with full system

 

As you’ve designed, implemented and debugged your project, what new tools or new knowledge did you find it necessary to learn to be able to accomplish these tasks? What learning strategies did you use to acquire this new knowledge?

 

With this project, I was able to explore new methods in computer vision to detect and track both objects and gestures. It was important to thoroughly read the documentation from the source libraries, but also to follow advice and guidance from public forums. I learned to use various online resources such as YouTube videos, ChatGPT, and articles to debug my issues and learn new techniques.

Sumayya’s Status Report for 4/06

Progress Update:

From Last Week:

  • Set up camera with system – done
  • Test buttons and hand tracking with livestream video – done
  • Test reID with livestream video – done
  • integrate object tracking with camera / livestream – not done
    • Re-define how much to track and what user will see
  • Start processing the recipe – not done

I was able to demo gesture tracking mapped to button and swipe actions during the interim demo. This included setting up the external camera and testing gesture recognition/tracking with the projection screen. There was a large learning curve in figuring out how to use Meta’s API functions for a livestream, but I was able to run my code with a live video feed.

Something I noticed was that there was a lot of latency in recognizing the gestures. I need to determine whether this is because of distance, image quality, or too much processing happening at once.

I also implemented part of the calibration script that looks at the projected image and determines each button’s region and each swipe region. This was tested with a video input and worked very well; it is harder with a projection due to lighting and distance.

Schedule:

Slightly behind: Need to make more progress on object tracking since reID is complete.

Next Week Plans: 

  • improve accuracy and latency of detecting a hand gesture
  • add object tracking with live video
  • set up the Arducam camera with the AGX (we were using the Etron camera, but it has too much of a fisheye effect and its fps is not compatible with our projector)
  • Help with recipe processing

Verification Plans:

Gesture Accuracy

  • Description: for each gesture, attempt to execute it on the specified region and note if system recognizes correctly
  • Gestures:
    • Start Button
    • Stop Button
    • Replay Button
    • Prev Swipe
    • Next Swipe
  • Goal: 90% accuracy

 

Gesture Recognition Latency

  • Description: for each gesture, attempt to execute it on the specified region and measure how long the system takes to recognize the gesture
  • Goal: 3 seconds

Gesture Execution Latency

  • Description: for each gesture, attempt to execute it on the specified region and measure how long the system takes to execute the gesture once it has been recognized (a rough timing sketch is below)
  • Goal: 1 second
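
A minimal sketch of how these two latencies could be timed. The hooks `wait_for_recognition` and `wait_for_execution` are hypothetical stand-ins for wherever the recognition and execution callbacks actually fire in the system.

```python
import time

def timed_gesture_trial(wait_for_recognition, wait_for_execution):
    """Time one trial: gesture start -> recognition, and recognition -> execution.
    Both arguments are hypothetical blocking hooks into the system under test."""
    t_start = time.monotonic()
    wait_for_recognition()                 # returns once the gesture is recognized
    t_recognized = time.monotonic()
    wait_for_execution()                   # returns once the mapped action has run
    t_executed = time.monotonic()
    return t_recognized - t_start, t_executed - t_recognized

# recognition_latency, execution_latency = timed_gesture_trial(...)
# Compare against the goals: recognition within 3 seconds, execution within 1 second.
```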

Single Object Re-ID Detection Accuracy

  • Description: how accurately a single object is detected in a frame. An image of the object will first be taken; the system must then be able to detect this object again using the reference image.
  • Goal: 90% accuracy

 

Single Object Tracking Accuracy 

  • Description: single object can be smoothly tracked across the screen
  • Goal: given a set of continuous frames, object should be able to be tracked for 80-90% of the frames.

 

Multi Object Tracking Accuracy 

  • Description: multiple objects can be smoothly tracked across the screen
  • Goal: given a set of continuous frames, all intended objects should be able to be tracked for 80-90% of the frames.

 

Team Status Report for 3/30

Our biggest risk this week is the hardware setup and projector homography/calibration. Upon testing with our new projector mount, we found that we have to set up the projector to the side of the user rather than across from the user, which means the projection homography logic needs to be recalculated for the new rotation. Additionally, we realized that there are many more projector-internal factors, such as the keystone value, that impact the ratio and size of the projection. Both the projector setup and the homography calculations need more testing for better results.

No changes to schedule. We are on track.

Sumayya’s Status Report for 3/30

Progress Update:

From Last Week:

  • Implement button actions using gesture recognition – done
  • Implement Swipe Gesture only in a region of interest – done
    • currently this has only been implemented for functionality and not integrated into TC

I completed the integration for gestures and hand tracking. My feature can now take in a button location on the frame and check whether that button is being pressed. It can also check for a swipe motion in a specific region to indicate a next or prev action. I added a publisher/subscriber implementation that bridges communication between the main controller module and the CV module: once the CV module detects a gesture, it sends a message with the corresponding command to the CV topic, as sketched below.
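
The report does not name the messaging layer, so the sketch below uses a minimal in-process pub/sub stand-in purely to illustrate the flow: the CV module publishes the recognized command on a "cv" topic and the main controller subscribes to it. The gesture-to-command mapping shown is illustrative.

```python
from collections import defaultdict

class MessageBus:
    """Minimal in-process pub/sub stand-in (the real system's transport may differ)."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._subscribers[topic]:
            callback(message)

bus = MessageBus()

# Main controller: react to commands coming from the CV module.
bus.subscribe("cv", lambda cmd: print(f"controller received: {cmd}"))

# CV module: once a gesture is detected, publish the corresponding command.
def on_gesture_detected(gesture):
    command = {"press_start": "start", "swipe_left": "next", "swipe_right": "prev"}.get(gesture)
    if command:
        bus.publish("cv", command)

on_gesture_detected("swipe_left")   # -> controller received: next
```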

Video of Button Press:

https://drive.google.com/file/d/1UqzU4HLCSh_pWWE0EWWv2mxO4PwzIQHu/view?usp=share_link

Schedule: On Track!

Next Week Plans:

  • Set up camera with system
  • Test buttons and hand tracking with livestream video
  • Test reID with livestream video
  • integrate object tracking with camera / livestream
    • Re-define how much to track and what user will see
  • Start processing the recipe

 

 

Sumayya’s Report for 3/23

Progress Update:

I completed the Object Re-Identification logic. This feature detects the ingredients grid in the UI and then determines which cells are occupied. Once it identifies the occupied cell locations, it parses the JSON file that contains the list of ingredients that should be present in the grid. Each item has a property called “cell_pos” that holds the expected row and column of the ingredient, along with properties such as “name”, “filename”, and “filepath”. This allows the program to retrieve the ingredient name and label the image of the ingredient captured from the grid.

As a result of this feature, I created modular functions that can easily add and remove ingredients from the JSON file and scan individual cells for an ingredient versus scanning the entire grid at once; a sketch of the lookup is below.
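
A small sketch of the lookup this enables, assuming a JSON layout like the one described above (the field names come from the report, but the top-level key and the exact nesting of “cell_pos” are guesses):

```python
import json

def load_ingredients(json_path):
    """Index ingredients by their expected (row, col) grid position."""
    with open(json_path) as f:
        items = json.load(f)["ingredients"]   # top-level key is an assumption
    return {(item["cell_pos"]["row"], item["cell_pos"]["col"]): item for item in items}

def label_occupied_cells(occupied_cells, ingredients_by_pos):
    """Attach the expected ingredient name to each occupied cell, if one is listed."""
    labels = {}
    for row, col in occupied_cells:
        item = ingredients_by_pos.get((row, col))
        labels[(row, col)] = item["name"] if item else "unknown"
    return labels

# ingredients = load_ingredients("ingredients.json")
# print(label_occupied_cells([(0, 1), (2, 2)], ingredients))
```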

I used the following template of the UI (made by Caroline) to do my testing and added images of ingredients to reID. This is what the camera sees projected onto the table:

Here is a video demonstrating the reID process:

https://drive.google.com/file/d/1fGxNk6h5AN5JJqAoFRP-cgjNKqWtU1hy/view?usp=sharing

Schedule: On track!

Next Week Plans:

  • Implement button actions using gesture recognition
  • Implement Swipe Gesture only in a region of interest
    • currently this has only been implemented for functionality and not integrated into TC
  • Consider adding a cursor feature that follows user’s finger tip (beyond MVP)

Sumayya’s Status Report – 3/16

Progress Update:

I implemented the SIFT object tracker this weekend using a guide from Siromer. The tracking was accurate but quite slow, as shown in the video below. I applied the algorithm on every other frame, but this did not improve the speed. I need to do further research on how to improve performance.

Given the time constraints, I will be pausing my work on object tracking, as I have two reliable trackers so far (CSRT and SIFT). I will focus on the object re-identification logic that takes images of items in the Ingredients Grid and correlates each label with its ingredient.

https://drive.google.com/file/d/1vGplh-lNeOkQUPdKc1fmZiyRzDx353vl/view?usp=share_link

Schedule:

On Track!

Next Week Plans:

  • Object Re-identification / Labeling
    • Be able to identify Ingredients Grid
    • Be able to read from Labels JSON file to identify label of each cell
    • Take images of occupied cells and assign label to ingredient
    • Make this process modular so that new ingredients can be identified later in the recipe
  • Create JSON file template
  • Do gesture recognition on a region of interest (to mimic projected buttons)
  • Once above steps are done, create a unit function that can track specified object given its label.

Team Status Report for 3/9

The most significant risks that could jeopardize the success of the project are the processing speed of our object tracking algorithm and the calibration process of the projector. These have been risks throughout the semester, and we plan to manage them with extensive testing. The AGX should provide sufficient computing power for all CV tasks. We are working with multiple professors to make sure the warp logic/math is correct. The projector we are using was recently delivered, so we plan to test the logic soon.

No changes made to existing design of the system.

No changes to schedule.

Part A was written by Caroline, Part B was written by Sumayya, and Part C was written by Tahaseen.

Global Factors:

TableCast is a product that anyone can use, given that they have a table, outlets, and cooking equipment. Ease of use is an important factor that we considered during our design process. Even if someone does not have a background in technology, we designed a system that anyone can set up with our step-by-step instructions. For example, a server will launch automatically after the device boots, so a user does not have to log into the AGX and set it up manually. Instead, all they have to do is type a link into their browser, which most people, regardless of tech background, can do. Additionally, we will have an intuitive, visually guided calibration step that anyone can follow along with. TableCast is a product anyone from any background can use to improve their skills in the kitchen.

Cultural Factors:

The goal of TableCast in a kitchen environment is to make cooking easier. It encourages independence while also building a stronger sense of community. Our design allows users to easily follow recipes even if they have never made the dish before. We strongly believe in empowering individuals to make their own meals, especially if they are afraid of making mistakes in the kitchen. With TableCast, users can feel more confident in their abilities and improve their quality of life. Consequently, users are more likely to share what they made with their loved ones and to be active in community gatherings such as potlucks and picnics.

Environmental Factors:

Our product, TableCast, does not directly harm or benefit the natural environment. However, there are several long-term benefits to using our product rather than traditional paper cookbooks and/or expensive electronics in the kitchen. Reducing reliance on paper cookbooks cuts paper consumption, which supports reforestation efforts. Given the chaotic and messy nature of a kitchen, electronics can easily become damaged and require replacement. The resource and production pipeline of consumer electronics like cell phones and tablets is notorious for being noxious and wasteful. By keeping these devices out of a risky environment like the kitchen, their longevity is extended, reducing the need for regular replacements.



Sumayya’s Status Report – 3/9

Progress Update:

I completed the Swipe Gesture Recognition as planned in the last report. I will need to test it in a kitchen environment with our UI to further improve the gesture; this will be done once our UI is programmed.

I researched algorithms for object tracking along with some suggestions from Prof. Marios. I will be using the SIFT algorithm to track objects using their features. Since the object will be in a known location in the “first frame”, we will have an easy reference image to start tracking. In my research, I found a video by Shree K. Nayar explaining how feature tracking works. I plan to use this as a basis for my implementation.

Before attempting Nayar’s implementation, I tried some tracker libraries that already exist in OpenCV. Specifically, I tried KCF (Kernelized Correlation Filter) to track a couple of objects, following the example by Khwab Kalra. I found that the KCF algorithm works great for easily identifiable, slow-moving objects such as a car on a road, but it struggled to track a mouse moving across a screen. I’m not sure why this is the case yet and have much more testing to do with various example videos. OpenCV has 8 different trackers, all with different specializations; I will test each of them next week to see which works best. If they are not robust enough, I plan to use Nayar’s implementation with SIFT. My KCF experiments followed the usual OpenCV tracker loop, roughly as in the sketch below.
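
A sketch of that loop (not my exact code; depending on the OpenCV build, the constructor may live under `cv2.legacy` instead):

```python
import cv2

def track_with_kcf(video_path, init_bbox):
    """Track a single object with OpenCV's KCF tracker, given an initial box."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        return

    tracker = cv2.TrackerKCF_create()     # cv2.legacy.TrackerKCF_create() on some builds
    tracker.init(frame, init_bbox)        # init_bbox = (x, y, w, h)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        found, bbox = tracker.update(frame)
        if found:
            x, y, w, h = map(int, bbox)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("KCF tracking", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()
```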

Link to Car tracking using KCF: https://drive.google.com/file/d/1zUjeYSGWuIXzmaCbMIEO1zvzxUYZY6Lv/view?usp=sharing

Link to Mouse tracking using KCF:

https://drive.google.com/file/d/1zUjeYSGWuIXzmaCbMIEO1zvzxUYZY6Lv/view?usp=sharing

As for the AGX, I have received confirmation from Prof. Marios’s grad students that it has been flashed and is ready to be used.

Schedule Status:

On track!

Next Week Plans:

  • Implement and evaluate the 8 OpenCV trackers by next weekend. Select the tracker that works best for this project.
  • Implement SIFT for object tracking if the OpenCV trackers are not robust enough
  • If above step is complete, run algorithm on a real time video stream.