Caroline’s Status Report for 2/24/24

I accomplished my task of making the voice commands faster and more accurate. I did this by modifying my previous code and searching online for tips on speeding it up. I experimented with different speech-to-text models, such as Vosk and Google Assistant, but found that the Whisper + Picovoice wake word combination worked the best. I also worked more on the Flutter UI, based on the design I created last week. I am on schedule.
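For reference, below is a minimal sketch of this wake-word-then-transcribe loop, assuming the pvporcupine/pvrecorder packages for the wake word and the SpeechRecognition library's local Whisper backend for transcription; the access key, built-in keyword, and model size are placeholders rather than our final configuration.

```python
import pvporcupine
import speech_recognition as sr
from pvrecorder import PvRecorder

ACCESS_KEY = "YOUR_PICOVOICE_ACCESS_KEY"  # placeholder

# Wake-word engine (a built-in keyword stands in for our custom one here).
porcupine = pvporcupine.create(access_key=ACCESS_KEY, keywords=["picovoice"])
recorder = PvRecorder(frame_length=porcupine.frame_length, device_index=-1)
recognizer = sr.Recognizer()

recorder.start()
try:
    while True:
        pcm = recorder.read()                    # one frame of 16-bit PCM samples
        if porcupine.process(pcm) >= 0:          # wake word detected
            recorder.stop()                      # free the mic for SpeechRecognition
            with sr.Microphone() as source:
                audio = recognizer.listen(source, phrase_time_limit=4)
            try:
                command = recognizer.recognize_whisper(audio, model="base.en")
                print("Heard command:", command)
            except sr.UnknownValueError:
                print("Could not understand the command")
            recorder.start()                     # resume wake-word listening
finally:
    recorder.stop()
    porcupine.delete()
```

In a setup like this, the main levers for latency are choosing one of the smaller English-only Whisper models (tiny.en or base.en) and capping the listening window with phrase_time_limit.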

I hope to modify the voice commands so that actions are actually taken when a command is spoken (i.e., actually pausing a video). I will also finish the Flutter UI so that we can test it on different table surfaces and see what it looks like.
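One lightweight way to attach actions to recognized text is a small dispatch table; the handlers below (pause_video, next_step) are hypothetical placeholders for whatever the UI ends up exposing.

```python
# Hypothetical command handlers; the real ones would call into the UI.
def pause_video():
    print("pausing video")

def next_step():
    print("advancing to the next recipe step")

COMMANDS = {
    "pause": pause_video,
    "next": next_step,
}

def dispatch(transcript: str) -> None:
    """Run the first command whose keyword appears in the transcript."""
    text = transcript.lower()
    for phrase, action in COMMANDS.items():
        if phrase in text:
            action()
            return
    print("no matching command for:", transcript)

dispatch("please pause the video")  # -> pausing video
```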

Sumayya’s Status Report for 2/24

Progress Update:

This week I made progress on gesture recognition using MediaPipe. I had already tested MediaPipe using the web browser demo, but this past week I worked on writing a Python script to make the Gesture Recognition model work with my laptop camera. The script was able to recognize all the gestures that MediaPipe was originally trained on.
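For context, a script along these lines can be built on MediaPipe's Gesture Recognizer task roughly as sketched below; this is a simplified stand-in rather than my exact code, and it assumes the gesture_recognizer.task model bundle has already been downloaded locally.

```python
import cv2
import mediapipe as mp
from mediapipe.tasks import python as mp_tasks
from mediapipe.tasks.python import vision

# Load the pre-trained gesture model (the path is an assumption; the bundle
# must be downloaded from the MediaPipe model page beforehand).
options = vision.GestureRecognizerOptions(
    base_options=mp_tasks.BaseOptions(model_asset_path="gesture_recognizer.task"),
    num_hands=1,
)
recognizer = vision.GestureRecognizer.create_from_options(options)

cap = cv2.VideoCapture(0)                         # laptop camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb)
    result = recognizer.recognize(mp_image)
    if result.gestures:
        top = result.gestures[0][0]               # best guess for the first hand
        print(top.category_name, round(top.score, 2))
    cv2.imshow("camera", frame)
    if cv2.waitKey(1) & 0xFF == 27:               # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```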

https://drive.google.com/file/d/1Xvm71s50BpO0O9d-hPm9-XWkQNBlrgQR/view?usp=share_link

Above is the link to a video demonstrating MediaPipe on my laptop. The angle of the gesture is very important (notice how the thumbs-down was hard to recognize due to poor wrist position/angle).

The following are the gestures we decided we will need throughout the program:

  • Open Palm (right hand)
  • Open Palm (left hand)
  • Swipe (left to right)
  • Swipe (right to left)

The first two gestures are already included among the gestures the model was trained on. For the Swipe gestures, I learned how to access the 21 hand landmarks and their properties, such as the x, y, and z coordinates. This originally proved difficult because the documentation was not easily accessible. Since a swipe is a translation along the x-axis, I plan to simply calculate the difference in the x-coordinate over a set of frames to determine a Swipe.
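A minimal sketch of that idea is below, here using MediaPipe's Hands solution to read the wrist landmark (index 0) each frame; the window length and displacement threshold are guesses that would need tuning, and the left/right labels flip if the frame is mirrored.

```python
from collections import deque

import cv2
import mediapipe as mp

WINDOW = 10        # frames to look back over (guess, to be tuned)
THRESHOLD = 0.25   # net x displacement as a fraction of frame width (guess)

xs = deque(maxlen=WINDOW)   # recent wrist x positions, normalized to [0, 1]
cap = cv2.VideoCapture(0)

with mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.6) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            wrist = results.multi_hand_landmarks[0].landmark[0]   # landmark 0 = wrist
            xs.append(wrist.x)
            if len(xs) == WINDOW:
                dx = xs[-1] - xs[0]          # net horizontal motion over the window
                if dx > THRESHOLD:
                    print("Swipe left-to-right")
                    xs.clear()
                elif dx < -THRESHOLD:
                    print("Swipe right-to-left")
                    xs.clear()
        else:
            xs.clear()                        # hand lost; reset the window
        cv2.imshow("hands", frame)
        if cv2.waitKey(1) & 0xFF == 27:       # Esc to quit
            break

cap.release()
cv2.destroyAllWindows()
```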

On the left you can see the x, y, z coordinates of each of the 21 landmarks for each frame in the video.

https://drive.google.com/file/d/15Q_YZcS0Vv8EEd6kOf7mQT8irsR37j3Y/view?usp=share_link

The video above shows what the Swipe gesture looks like from right to left.

Schedule Status: 

I am on track with my gesture recognition and tracking schedule. However, I am behind on flashing the AGX, as I still have not been able to get an external PC; communication with CyLab has been slow. I plan to talk to the professor next week and find a quick solution. I am not too concerned at the moment, since much of my MediaPipe testing can be done on my laptop.

Next Week Plans:

  • Complete Swipe Gesture Recognition
  • Research algorithms for object tracking
  • Start implementing at least one algorithm for object tracking
  • Get a PC for flashing AGX and flash the AGX

Team Status Report for 2/17

Status Report

The most significant risk this week relates to the change in mounting for the projector. Rather than orienting the projector to face straight down from above the cooking surface, it will be angled down from a position off to the side. We have a contingency plan to fall back on an overhead rig should the warping of the images fail. Both can be worked on concurrently with minimal extra effort, so there is no change to the timeline. This change was made to avoid the need to flip a heavy projector by 90 degrees. It will additionally allow for easier setup and reduce the need for expensive mounting hardware. Another design change is that we will be using MediaPipe for gesture recognition rather than OpenPose, in order to reduce the amount of computation required.

Product Solution Meeting Needs

Caroline Crooks: TableCast will encourage people to cook. People may be deterred from cooking for several reasons, such as an inability to read the small text of a recipe on a phone or simply the mental hurdle of starting to learn how to cook. The projected content makes cooking more accessible because instructional content and tools will be displayed across the countertop in an intuitive, easy-to-read way. Our user interface is designed for a smooth and accessible cooking experience. Encouraged to cook using our product, people can live healthier lifestyles. Safety is an integral part of our design. Because we are projecting our content, many of our components will be placed high above the table. We are designing a stable, secure system to ensure that components will not fall or have the potential to be knocked down. We plan to thoroughly test our mounting mechanisms and install our components carefully. We also recognize that cooking itself can be a hazardous activity. Our instructions and video content will be clear and will caution users to be vigilant during several steps of the recipe.

Sumayya Syeda: TableCast will allow users to easily access and create recipes across cultures. Many struggle to find the correct ingredients and follow the unique steps required to create dishes from different parts of the world. With a product like TableCast, it is much easier to follow intricate recipes with the help of images and guiding widgets projected onto the kitchen counter, in addition to voice commands. As a result, there is strong potential for better cultural appreciation. Furthermore, TableCast can increase one's confidence to cook in the kitchen, especially for those who need an organized process to cook. Users will no longer have to switch between their device and the dish while constantly being conscious of multiple tasks occurring at once. TableCast is a clean and streamlined solution that makes cooking more accessible across the world.

Tahaseen Shaik: TableCast is designed to be a lower-cost alternative to the current market solution of table displays. Rather than replacing an expensive kitchen countertop, our solution allows users to use their existing resources. Assembling TableCast is fairly straightforward as well: all it takes is setting up the tripod, making the appropriate connections, and beginning use. At the component level, we are using inexpensive parts to assemble and display the user interface. We also leverage the user's laptop in order to simplify our hardware. For distribution, we will be able to condense all the components into a relatively lightweight package, which would greatly reduce further economic costs. Consumption-wise, TableCast is an innovative product that is not readily available on the market and would fulfill an open market need. Users have historically turned to new technologies to supplement their learning process in the kitchen. Overall, we have taken great care to ensure our product is not unnecessarily expensive and have left room for upgrades.

Tahaseen’s Status Report for 02/17

This week, I collected a projector and a tripod from Prof. Sankaranarayan. We discussed different mounting solutions, specifically to avoid negative interactions with the cooking process (steam, smoke, food splatter, etc.). After some discussion, I decided to switch to having the projector angled from a side view, projecting onto the table. We would calculate the appropriate planar homographies in order to warp our image. I have begun the initial calculations and now need to test them. Additionally, the brightness of the projector is satisfactory for its small size, but it will require the lights in the testing space to be slightly dimmer than an average kitchen. Finally, I helped Sumayya troubleshoot flashing the AGX from my laptop. I was able to install the required SDK but had some trouble connecting the peripheral. However, we resolved this by using a PC.
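As a rough illustration of that warping step, a planar homography can be estimated and applied with OpenCV as sketched below; the corner coordinates are made up and would in practice come from calibrating the actual projector/table setup.

```python
import cv2
import numpy as np

# Corners of the UI frame we want to project (pixel coordinates of the source image).
h, w = 720, 1280
src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])

# Where those corners should land so the image appears rectangular on the table
# when projected from the side. These numbers are made up; real values would come
# from measuring the setup.
dst = np.float32([[180, 60], [1100, 40], [1250, 700], [40, 680]])

H, _ = cv2.findHomography(src, dst)             # 3x3 planar homography
ui = cv2.imread("ui_frame.png")                 # hypothetical rendered UI frame
warped = cv2.warpPerspective(ui, H, (w, h))     # pre-warped frame sent to the projector
cv2.imwrite("ui_frame_warped.png", warped)
```

Pre-warping the UI by this homography (or its inverse, depending on which direction is measured) is what makes the projected image land as a rectangle on the countertop.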

I am currently on track, but by next week I want to run a test with my computed homographies. I anticipate this being a larger task because it will require a lot of fine-grained testing. I also want to do an initial mockup of the web app.

Sumayya’s Status Report for 2/17

Progress Update:

This week I spent many hours attempting to flash the Xavier AGX. After trying multiple installation methods, I learned that it is extremely difficult to flash NVIDIA products from an M1 computer, since it has an ARM64 architecture rather than the required AMD64 architecture. I attempted to flash from both of my teammates' computers, but this also proved difficult. I reached out to Professor Marios for help and was fortunately able to acquire a spare PC.

Intel-chip MacBook unable to recognize the AGX

Additionally, I tried to use OpenPose and MediaPipe. Installing OpenPose ran into similar issues on my computer, but MediaPipe was very easy to use on the web. I was able to test some gestures on MediaPipe using the online demos and found it to be fairly robust. I plan to test the same gestures on OpenPose once I have it installed on the new PC so I can compare its performance against MediaPipe.

MediaPipe Recognizes “Thumbs-Up” gesture

I am currently working on the Python script to run the gesture recognition model with my computer's camera.

Schedule Status: On track!

Next Week Plans:

  • Have a running python script with camera streaming from laptop
  • Have the same python script running on the AGX with the Arducam
  • Flash Jetson on new PC

Caroline’s Status Report for 2/17/24

Table UI – I designed the layout of the user interface in Adobe XD, developing a few different layouts and designs to choose from. Below is an example of one design. I also downloaded Flutter and reacquainted myself with the framework through tutorials.

Voice Commands – I received the microphone and tested it with the speech recognition program on my laptop. I confirmed that it works and was able to transcribe speech.
I was also able to develop voice commands where the user says [wake word] [command], but it doesn't always work yet; I did get it to work a few times.

I am on track with my schedule.

Next week:

  • Voice commands – work on the script more to make it more accurate
  • UI – decide on the layout and color scheme, and work on integrating it with Flutter

Tahaseen’s Status Report for 02/10

This week, I researched the specs needed for the projector portion of the project and began outlining the specifications for mounting the product. This was the most challenging part because I had to determine what was an appropriate fit for our project. I reached out to Prof. Aswin Sankaranarayan for advice on this and on camera placement. I also worked with my team to outline the individual technical requirements and the testing requirements. Additionally, I reviewed the HTTPServer documentation and found it pretty easy to read. Setting up a server was intuitive and should be easy for the final product. I am on schedule.
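Assuming this refers to Python's built-in http.server module, a bare-bones server looks roughly like the following; the port and response body are placeholders.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Placeholder response; the real server would serve UI state or assets.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"TableCast server is running\n")

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), Handler).serve_forever()   # port 8000 is arbitrary
```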

For this upcoming week, I want to have a design for the hardware mount. Additionally, I need to acquire a projector with appropriate lumens and do some distance testing to verify brightness.

Team Status Report for 2/10

A significant risk we are currently considering is the mount for our hardware components. We need a large projector with strong light output (similar to a projector used in classrooms) for our application. The projector combined with the Xavier AGX will be heavy components in our product that need to be secured to the ceiling. As a result, we will be putting extra effort into designing and building a bracket and mount to secure the projector, AGX, and camera module. We plan to work with faculty and peers to get advice on design specifications and will be using CAD software to design the bracket and mount. If the overall device cannot be mounted to the ceiling, we will create another structure that can hold the weight of the device; this may include mounting to a wall instead.

There have been no changes to the existing design, and no update to schedule.

Sumayya’s Status Report for 2/10

I researched the multiple libraries available for gesture tracking this week. In particular, I weighed the pros and cons of OpenPose vs MediaPipe. Here is a table discussing the differences:

[Table comparing OpenPose and MediaPipe]

At the moment, we have decided to use OpenPose since we have the necessary processing power. Regardless, I plan to complete preliminary testing using both OpenPose and MediaPipe to judge how well each library recognizes gestures.

I was able to acquire the Xavier AGX and Arducam Camera module from the inventory and plan to start working with them this week.

I also spent a couple hours working with my team on creating material for the Proposal Presentation.

For next week I will:

  • Use Arducam camera module with AGX
    • Install necessary drivers
    • Be able to get a live feed
  • Test OpenPose and MediaPipe for accuracy
    • Start with basic gestures in front of camera
    • Transition to tests with hand on flat surface, camera facing down

Progress is on schedule.

Caroline’s Status Report for 2/10/24

I researched speech recognition software. Then, I experimented with Porcupine wake word recognition and the SpeechRecognition Python library with Whisper. Both were easy to use and have good documentation. I made a program that recognizes a wake word with Porcupine's library and printed out live speech with the SpeechRecognition library. The wake word recognition works quickly, but I had issues making the speech recognition transcribe speech as fast as I wanted. Additionally, I reached out to Prof. Sullivan this week for advice on microphones and decided to start with a wireless clip-on microphone to pick up voice commands. I also spent a couple of hours practicing the pitch presentation because it was my turn to present. Everything is on schedule.

Next week, I will make a faster speech recognition program and integrate it with wake words to get basic voice commands working. I will also work on the interface by wireframing the table and web UI and developing a list of all user interactions. Then, I will order a new microphone.