Sumayya Syeda’s Report for 4/27

Progress Update:

This week I tried a new method for detecting gesture recognition regions. Previously, I was using SIFT to look for the template button image in the projection image captured by the camera, but poor lighting and projection warp made that method unreliable.

I realized that instead of using the image of the projection to detect the button, I can use the already-computed homography to map coordinates in the UI to coordinates in the projection as seen by the camera. This removes the need for the camera to detect the button itself; the camera is only needed to compute the homography. I decided to go with this method for the final demo since it recognizes the button region more reliably.
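The idea, roughly, is a single perspective transform of the button's UI corners — a minimal sketch, where the saved homography file and the button coordinates are placeholders rather than our actual values:

```python
import cv2
import numpy as np

# Assumed: H is the 3x3 UI-to-camera homography already computed during calibration.
H = np.load("homography.npy")  # placeholder path for the saved calibration

# Button rectangle in UI (projected image) coordinates -- illustrative values only.
ui_corners = np.float32([[100, 200], [260, 200],
                         [260, 260], [100, 260]]).reshape(-1, 1, 2)

# Map the UI corners into the camera frame through the homography.
cam_corners = cv2.perspectiveTransform(ui_corners, H)

# Axis-aligned bounding box of the button region in the camera frame.
x, y, w, h = cv2.boundingRect(np.int32(cam_corners))
```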

While testing gestures with the USB camera, I realized that I need to zoom in to the frame / button region, since the model cannot recognize gestures unless the hand takes up more of the frame. This made it even more important to detect the button region so I can crop the area properly.
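Cropping then just means expanding the mapped bounding box by a margin and slicing the camera frame before running the recognizer — a rough sketch continuing from the box above, where `frame` is the current camera frame and the margin is an arbitrary value:

```python
# Expand the mapped button box so the whole hand fits, then crop the frame.
margin = 80  # pixels; an arbitrary value that would need tuning
y0, y1 = max(0, y - margin), min(frame.shape[0], y + h + margin)
x0, x1 = max(0, x - margin), min(frame.shape[1], x + w + margin)
button_roi = frame[y0:y1, x0:x1]  # this crop is what gets passed to gesture recognition
```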

Schedule: On Track!

Next Steps:

  • Test all features with both recipes
  • Continue fine-tuning gesture recognition
  • Practice for demo day
  • Work on Report & Poster

Sumayya Syeda’s Report for 4/20

Progress Update:

This week I went back to work on object tracking. I tried multiple methods, including SURF, ORB, and SIFT, and found that SIFT worked the best. The biggest difference between my SIFT implementation from a few weeks ago and this version is the use of a perspective transform to get a tighter boundary around the target object.
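For reference, the general SIFT-plus-perspective-transform pattern looks roughly like the sketch below; it is an illustration rather than the exact code, with `template_gray` standing in for a grayscale reference image of the object and `frame_gray` / `frame` for the current camera frame:

```python
import cv2
import numpy as np

sift = cv2.SIFT_create()
kp_t, des_t = sift.detectAndCompute(template_gray, None)  # reference image of the object
kp_f, des_f = sift.detectAndCompute(frame_gray, None)     # current camera frame

# Brute-force matching with Lowe's ratio test to keep only distinctive matches.
matches = cv2.BFMatcher().knnMatch(des_t, des_f, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

if len(good) >= 4:
    src = np.float32([kp_t[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_f[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    M, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    if M is not None:
        # Project the reference image's corners into the frame to get the object boundary.
        h, w = template_gray.shape
        corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
        boundary = cv2.perspectiveTransform(corners, M)
        cv2.polylines(frame, [np.int32(boundary)], True, (0, 255, 0), 2)
```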

I am also working on region detection with the projector. Since we are using a light projector, the camera only sees the regions of the projection whose color matches the channel the projector is scanning at a given time. The projector scans each single-color channel one by one, and the correspondingly colored regions of the projected image show up in the camera frame. For example, if the image I am looking for in the projection is red and the projector is scanning red, the camera will see the red image. But if the projector is scanning green or cyan, the camera will not see the image.

Since the frame rate is consistent and I know when the projector will be scanning each color, I can factor these properties into determining the correct time to capture a frame and see the target image.
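In other words, once the scan order and the number of camera frames per color are known, picking the right frame is just indexing into the scan cycle — a rough sketch with made-up timing numbers:

```python
# Made-up numbers for illustration: the projector cycles through its color
# channels in a fixed order, spending FRAMES_PER_COLOR camera frames on each.
SCAN_ORDER = ["red", "green", "blue"]
FRAMES_PER_COLOR = 10
CYCLE = FRAMES_PER_COLOR * len(SCAN_ORDER)

def channel_at(frame_index, offset=0):
    """Which color channel the projector is scanning when this frame is captured."""
    phase = (frame_index - offset) % CYCLE
    return SCAN_ORDER[phase // FRAMES_PER_COLOR]

# Keep only the frames captured while the target's color is being scanned,
# e.g. capture frame i when channel_at(i) == "red" for a red target image.
```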

Projection as seen by the camera

Image displayed on the table

Schedule: on track!

Next Week:

  • continue to test region detection
  • continue to improve gesture tracking/recognition latency
  • continue to test computer vision with full system

 

As you’ve designed, implemented and debugged your project, what new tools or new knowledge did you find it necessary to learn to be able to accomplish these tasks? What learning strategies did you use to acquire this new knowledge?

 

With this project, I was able to explore new methods in computer vision to detect and track both objects and gestures. It was important to be able to thoroughly read the documentation from the source libraries, but also to follow advice and guidance from public forums. I learned to use various online resources such as YouTube videos, ChatGPT, and articles to debug my issues and learn new techniques.

Sumayya’s Status Report for 3/30

Progress Update:

From Last Week:

  • Implement button actions using gesture recognition – done
  • Implement Swipe Gesture only in a region of interest – done
    • currently this has only been implemented for functionality and not integrated into TC

I completed the integration for gestures and hand tracking. Now my feature can take in a button location on the frame and check whether that button is being pressed. It can also check whether there is a swipe motion in a specific region to indicate a next or previous action. I added a publisher/subscriber implementation that bridges communication between the main controller module and the CV module. Essentially, once the CV module detects a gesture, it sends a message with the corresponding command to the CV topic.
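The bridge itself is specific to our controller code, but the pattern is essentially a minimal publish/subscribe bus like the sketch below; the topic name and command string here are placeholders:

```python
from collections import defaultdict

class MessageBus:
    """Minimal in-process publish/subscribe bus."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._subscribers[topic]:
            callback(message)

bus = MessageBus()

# The main controller listens on the CV topic for commands.
bus.subscribe("cv", lambda cmd: print(f"controller received: {cmd}"))

# Once the CV module detects a gesture (e.g. a swipe over the button region),
# it publishes the corresponding command to the CV topic.
bus.publish("cv", "NEXT")
```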

Video of Button Press:

https://drive.google.com/file/d/1UqzU4HLCSh_pWWE0EWWv2mxO4PwzIQHu/view?usp=share_link

Schedule: On Track!

Next Week Plans:

  • Set up camera with system
  • Test buttons and hand tracking with livestream video
  • Test reID with livestream video
  • integrate object tracking with camera / livestream
    • Re-define how much to track and what user will see
  • Start processing the recipe

 

 

Sumayya’s Status Report – 3/16

Progress Update:

I implemented the SIFT object tracker this weekend using a guide from Siromer. The tracking was accurate but quite slow, as shown in the video below. I applied the algorithm to every other frame, but this did not improve the speed. I need to do further research on how to improve performance.

Given the time constraints, I will be pausing my work on object tracking, as I have two reliable trackers so far (CSRT and SIFT). I will be focusing on object re-identification logic that takes images of the items in the Ingredients Grid and correlates each label with its ingredient.

https://drive.google.com/file/d/1vGplh-lNeOkQUPdKc1fmZiyRzDx353vl/view?usp=share_link

Schedule:

On Track!

Next Week Plans:

  • Object Re-identification / Labeling
    • Be able to identify Ingredients Grid
    • Be able to read from Labels JSON file to identify label of each cell
    • Take images of occupied cells and assign label to ingredient
    • Make this process modular so that new ingredients can be identified later in the recipe
  • Create JSON file template
  • Do gesture recognition on a region of interest (to mimic projected buttons)
  • Once the above steps are done, create a unit function that can track a specified object given its label.

Sumayya’s Status Report – 3/9

Progress Update:

I completed the Swipe Gesture Recognition as planned in the last report. I still need to test it in a kitchen environment with our UI to further improve the gesture; this will be done once our UI is programmed.

I researched algorithms for object tracking, drawing on some suggestions from Prof. Marios. I will be using the SIFT algorithm to track objects using their features. Since the object will be in a known location in the “first frame”, we will have an easy reference image to start tracking. In my research, I found a video by Shree K. Nayar explaining how feature tracking works, and I plan to use it as the basis for my implementation.

Before attempting Nayar’s implementation, I tried some tracker libraries that already exist in OpenCV. Specifically, I used the KCF (Kernelized Correlation Filter) tracker on a couple of objects, following the example by Khwab Kalra. I found that the KCF algorithm works great for easily identifiable, slow-moving objects such as a car on a road, but it struggled to track a mouse moving across a screen. I’m not sure why this is the case yet and have much more testing to do with various example videos. OpenCV has 8 different trackers, all with different specializations. I will test each of these trackers next week to see which works best. If they are not robust enough, I plan to use Nayar’s implementation with SIFT.
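For reference, the OpenCV tracking API follows the general pattern below. This is a sketch: the video path and initial bounding box are placeholders, and on some builds the constructor is cv2.legacy.TrackerKCF_create instead of cv2.TrackerKCF_create.

```python
import cv2

cap = cv2.VideoCapture("car.mp4")        # placeholder video path
ok, frame = cap.read()

# Initial bounding box of the target in the first frame (x, y, w, h) -- placeholder values.
bbox = (287, 23, 86, 320)

tracker = cv2.TrackerKCF_create()        # cv2.legacy.TrackerKCF_create() on some builds
tracker.init(frame, bbox)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, bbox = tracker.update(frame)  # success flag and updated box for this frame
    if found:
        x, y, w, h = [int(v) for v in bbox]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("KCF", frame)
    if cv2.waitKey(1) & 0xFF == 27:      # Esc to quit
        break
```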

Link to Car tracking using KCF: https://drive.google.com/file/d/1zUjeYSGWuIXzmaCbMIEO1zvzxUYZY6Lv/view?usp=sharing

Link to Mouse tracking using KCF:

https://drive.google.com/file/d/1zUjeYSGWuIXzmaCbMIEO1zvzxUYZY6Lv/view?usp=sharing

As for the AGX, I have received confirmation from Prof. Marios’s grad students that it has been flashed and is ready to be used.

Schedule Status:

On track!

Next Week Plans:

  • Test and evaluate the 8 OpenCV trackers by next weekend, and select the tracker that works best for this project.
  • Implement SIFT for object tracking if the OpenCV trackers are not robust enough
  • If the above steps are complete, run the chosen algorithm on a real-time video stream.

Sumayya’s Status Report for 2/24

Progress Update:

This week I made progress on gesture recognition using MediaPipe. I had already tested MediaPipe using the web browser demo, but this past week I worked on writing a Python script to make the Gesture Recognition model work with my laptop camera. The script was able to recognize all the gestures that MediaPipe was originally trained for.
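A sketch of what such a script generally looks like, using MediaPipe’s Gesture Recognizer task inside an OpenCV webcam loop, is shown below; the .task model path is a placeholder for the model file downloaded from MediaPipe, and this is an illustration rather than the exact script:

```python
import cv2
import mediapipe as mp
from mediapipe.tasks import python as mp_python
from mediapipe.tasks.python import vision

# Placeholder path to the pretrained gesture model downloaded from MediaPipe.
options = vision.GestureRecognizerOptions(
    base_options=mp_python.BaseOptions(model_asset_path="gesture_recognizer.task"))
recognizer = vision.GestureRecognizer.create_from_options(options)

cap = cv2.VideoCapture(0)                      # laptop webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb)
    result = recognizer.recognize(mp_image)
    if result.gestures:
        top = result.gestures[0][0]            # best gesture for the first detected hand
        cv2.putText(frame, f"{top.category_name} {top.score:.2f}",
                    (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("gestures", frame)
    if cv2.waitKey(1) & 0xFF == 27:            # Esc to quit
        break
```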

 

https://drive.google.com/file/d/1Xvm71s50BpO0O9d-hPm9-XWkQNBlrgQR/view?usp=share_link

Above is the link to a video demonstrating MediaPipe on my laptop. The angle of the gesture is very important (notice how the thumbs-down was hard to recognize due to poor wrist position/angle).

The following are the gestures we decided we will need throughout the program:

  • Open Palm (right hand)
  • Open Palm (left hand)
  • Swipe (left to right)
  • Swipe (right to left)

The first two gestures are already part of the model’s trained gesture set. For the Swipe gestures, I learned how to access the 21 hand landmarks and their properties, such as the x, y, and z coordinates. This originally proved difficult because the documentation was not easily accessible. Since a swipe is a translation along the x-axis, I plan to simply calculate the difference in the x-coordinate over a set of frames to detect a Swipe.
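A minimal sketch of that planned approach, using the classic mp.solutions.hands API to watch the wrist landmark’s x-coordinate over a sliding window; the window size and threshold below are guesses that would need tuning:

```python
import cv2
import mediapipe as mp
from collections import deque

hands = mp.solutions.hands.Hands(max_num_hands=1)
wrist_xs = deque(maxlen=15)            # recent wrist x-positions (normalized 0..1)
SWIPE_THRESHOLD = 0.25                 # fraction of frame width; a guess to be tuned

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        # Landmark 0 is the wrist; each landmark has normalized x, y, z coordinates.
        wrist = results.multi_hand_landmarks[0].landmark[0]
        wrist_xs.append(wrist.x)
        if len(wrist_xs) == wrist_xs.maxlen:
            dx = wrist_xs[-1] - wrist_xs[0]
            if dx > SWIPE_THRESHOLD:
                print("swipe left-to-right")
                wrist_xs.clear()
            elif dx < -SWIPE_THRESHOLD:
                print("swipe right-to-left")
                wrist_xs.clear()
```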

 

On the left you can see the x, y, z coordinates of each of the 21 landmarks for each frame in the video.

https://drive.google.com/file/d/15Q_YZcS0Vv8EEd6kOf7mQT8irsR37j3Y/view?usp=share_link

Above video shows what the Swipe Gesture looks like from right to left.

Schedule Status: 

I am on track with my gesture recognition and tracking schedule, but I am behind on flashing the AGX, as I still have not been able to get an external PC; communication with Cylab has been slow. I plan to talk to the professor next week and find a quick solution. I am not too concerned at the moment, as much of my testing with MediaPipe can be done on my laptop.

Next Week Plans:

  • Complete Swipe Gesture Recognition
  • Research algorithms for object tracking
  • Start implementing at least one algorithm for object tracking
  • Get a PC for flashing AGX and flash the AGX

 

 

Sumayya’s Status Report for 2/17

Progress Update:

This week I spent many hours attempting to flash the Xavier AGX. After trying multiple installation methods, I learned that it is extremely difficult to flash Nvidia products from an M1 chip computer, as it has an ARM64 architecture rather than the required AMD64 architecture. I attempted to flash from both of my teammates’ computers, but this also proved difficult. I reached out to Professor Marios for help and was fortunately able to acquire a spare PC.

Intel Chip Macbook unable to recognize the AGX

Additionally, I tried to use OpenPose and MediaPipe. Installing OpenPose ran into similar issues on my computer, but MediaPipe was very easy to use on the web. I was able to test some gestures on MediaPipe using the online demos and found it to be fairly robust. I plan to test the same gestures on OpenPose once I have it installed on the new PC so I can compare its performance against MediaPipe.

MediaPipe Recognizes “Thumbs-Up” gesture

I am currently working on a Python script to run the gesture recognition algorithm with my computer’s camera.

Schedule Status: On track!

Next Week Plans:

  • Have a running python script with camera streaming from laptop
  • Have the same python script running on the AGX with the Arducam
  • Flash Jetson on new PC

Sumayya’s Status Report for 2/10

This week I researched several libraries available for gesture tracking. In particular, I weighed the pros and cons of OpenPose vs. MediaPipe. Here is a table discussing the differences:

 

At the moment, we have decided to use OpenPose since we have the necessary processing power. Regardless, I plan to complete preliminary testing using both OpenPose and MediaPipe to judge how well each library recognizes gestures.

I was able to acquire the Xavier AGX and Arducam Camera module from the inventory and plan to start working with them this week.

I also spent a couple hours working with my team on creating material for the Proposal Presentation.

For next week I will:

  • Use Arducam camera module with AGX
    • Install necessary drivers
    • Be able to get a live feed
  • Test OpenPose and MediaPipe for accuracy
    • Start with basic gestures in front of camera
    • Transition to tests with hand on flat surface, camera facing down

Progress is on schedule.