Kavish’s Status Report for 4/28/24

This week, apart from the final presentations, I spent my time optimizing our pipeline to decrease latency (we now achieve about 25 frames per second on average), which is an important metric for our use case. Additionally, I am working with Neeraj to improve our model's performance on a larger set of classes (we originally had 100 but are now pushing towards 2000). We realized we had made a couple of mistakes when training our previous model that hurt its real-time performance, such as training on frames that MediaPipe did not recognize properly and using a static batch size. Thus, we are changing the pre-processing of the data and retraining the model. I am on schedule for my deliverables, and we hope to have a fully finished product by early next week. After I integrate the pipeline with Sandra's LLM API code, I will work on implementing the project on the Jetson.
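To illustrate the pre-processing fix, here is a minimal sketch of how frames with failed detections can be filtered out before training; the function name and frame format are illustrative, not our exact code.

import mediapipe as mp

# Illustrative pre-processing pass: drop frames where MediaPipe fails to
# produce the landmarks SPOTER needs (pose plus both hands), so the
# classifier is never trained on missing or garbage keypoints.
def filter_valid_frames(frames):
    valid = []
    with mp.solutions.holistic.Holistic(static_image_mode=True) as holistic:
        for frame in frames:  # frames: list of RGB numpy arrays
            result = holistic.process(frame)
            # Keep the frame only if every landmark group was detected.
            if (result.pose_landmarks
                    and result.left_hand_landmarks
                    and result.right_hand_landmarks):
                valid.append((frame, result))
    return valid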

Kavish’s Status Report for 4/20/24

Over the last two weeks, I worked a lot on FPGA development, which has led to significant changes in our overall project. The first thing I realized, after many attempts, was that our pose estimation model (MediaPipe) is not easily quantizable and deployable to my FPGA's DPU. I was able to deploy another pose estimation model that had all of the supported operations and functions; however, since our classification model (SPOTER) requires MediaPipe landmarks from both hands and the pose, this was no longer a feasible option to pursue. Thus, I began optimizing and finalizing the inference of the MediaPipe model as much as I could and then began integrating it with Neeraj's classification model. I designed the system structure and am currently trying to finish all of the integration. I hope to have the final pipeline working by tonight so that we can do all of the testing and measurements tomorrow. We are a little behind schedule, but hopefully by putting in enough time over the next couple of days we can finish the final demo.
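At a high level, the integrated pipeline runs MediaPipe per frame and feeds a window of landmark vectors to SPOTER. Below is a minimal sketch of that flow; spoter_model and landmarks_to_tensor are placeholders for our actual model and feature packing, not real names from our code.

import mediapipe as mp
import torch

# Sketch of the integrated pipeline: MediaPipe produces pose and hand
# landmarks per frame, and the stacked landmark tensors for a clip are
# fed to the SPOTER classifier.
holistic = mp.solutions.holistic.Holistic()

def classify_clip(frames, spoter_model, landmarks_to_tensor):
    features = []
    for frame in frames:  # frames: list of RGB numpy arrays
        result = holistic.process(frame)
        features.append(landmarks_to_tensor(result))  # placeholder packing
    clip = torch.stack(features).unsqueeze(0)  # (1, T, num_landmarks, ...)
    with torch.no_grad():
        logits = spoter_model(clip)
    return logits.argmax(dim=-1)  # predicted sign class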

What I have learned: I have learned a lot of new tools with regards to FPGA development, in particular how to use Vivado and Vitis to synthesize designs and deploy them on DPUs. I also learned a lot of the basics behind machine learning models like CNNs, alongside libraries like PyTorch. In order to do so, I relied heavily on online resources like YouTube and research papers. Many times I also followed along with articles written by individuals who have tried to develop small projects on Kria boards.

Kavish’s Status Report for 4/6/24

This week was our interim demo, and I finished two subsystems for it: HPE on the FPGA and a live viewer application for displaying the results. I have already discussed the live viewer application over the last couple of status reports. For the demo, I implemented a holistic MediaPipe model for human pose estimation (full upper-body tracking) in software on the Kria SOM's processor. This model takes in a live video stream from a USB camera and outputs the detected landmark points in JSON format over UDP through the Ethernet port. This implementation is obviously unoptimized and not accelerated on the fabric yet. Over the next two weeks, I plan on accelerating this model on the fabric using the Vitis and Vivado tools and integrating it with the Jetson. I am not behind schedule, but depending on the difficulties I might have to put in more hours on the project. By next week, I hope to have significant progress on putting the model on the DPU, with maybe a few kinks left to solve.
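For reference, the demo loop looks roughly like the sketch below; the Jetson's address and port are placeholders, and only the pose landmarks are shown for brevity.

import json
import socket

import cv2
import mediapipe as mp

# Sketch of the demo loop: grab frames from the USB camera, run the
# holistic model, and send the landmarks as a JSON datagram to the Jetson.
JETSON_ADDR = ("192.168.1.2", 5005)  # placeholder address on our Ethernet link

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
cap = cv2.VideoCapture(0)

with mp.solutions.holistic.Holistic() as holistic:
    frame_id = 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV captures BGR.
        result = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.pose_landmarks:
            payload = {
                "id": frame_id,
                "pose": [(lm.x, lm.y, lm.z)
                         for lm in result.pose_landmarks.landmark],
            }
            sock.sendto(json.dumps(payload).encode(), JETSON_ADDR)
        frame_id += 1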

Verifying Subsystem (HPE on FPGA): Once I finish the implementation, there are two main ways I will verify my subsystem. The first is measuring latency: this should be fairly simple benchmarking, where I will run pre-recorded and live video streams and measure the end-to-end delay up to the model output. This will help me understand where to improve the speed of this subsystem. For the second part, I will test the accuracy of the model using PCK (percentage of correct keypoints). This means I will run a pre-recorded, labeled video stream through the subsystem and check that the predicted and true keypoints are within a threshold distance (50% of the head bone link).
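The PCK computation itself is simple; here is a minimal sketch (array shapes are illustrative).

import numpy as np

# Minimal PCK sketch: a predicted keypoint counts as correct if it lies
# within a threshold distance of the ground-truth keypoint, where the
# threshold is a fraction (here 50%) of the head bone link length.
def pck(pred, gt, head_bone_len, alpha=0.5):
    """pred, gt: (num_frames, num_keypoints, 2) arrays of pixel coords."""
    dists = np.linalg.norm(pred - gt, axis=-1)   # (num_frames, num_keypoints)
    threshold = alpha * head_bone_len            # scalar or (num_frames, 1)
    return float(np.mean(dists <= threshold))    # fraction of correct keypoints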

Kavish’s Status Report for 3/30/24

I have been working this week on finishing subsystems for the interim demo. I have mostly finalized the viewer, and it should be good for the demo. Once Sandra gets the LLM part working, I will work with her to integrate the viewer for a better demo. I have finished booting the Kria board and am currently working on integrating the camera with it. For the demo, I aim to show a processed image stream through the FPGA, since we are still working on finalizing the HPE model. I have been having some trouble working through the kinks of the board, so I am a little behind schedule. I will have to alter the timeline if I am unable to integrate the camera stream by the interim demo.

Kavish’s Status Report for 3/23/24

I finished up the code for the viewer and synchronizer that I described last week (corrected a few errors and made it product-ready). The only part left for the viewer is determining the exact JSON message format, which I can change and test when we begin integration. I had hoped to get back to FPGA development and make more progress on that end like I mentioned last week, but I had to spend time coming up with a solution for the end-of-sentence issue. The final solution we decided on is to run a double query to the LLM while maintaining a log of previous words given by the classification model. The first query aims to create as many sentences as possible based on the inputted log, while the second query determines whether the current log is a complete sentence. The output of the first query is sent as a JSON message to my viewer, which continuously presents the words; the log is cleared based on the response to the second query. Although I am behind on the hardware tasks, Sandra has now taken over testing and determining the correct prompts to make this idea work, so I will have time next week to get the FPGA working for the interim demo. My goal is to first have a basic HPE model working for the demo (even if it does not meet the specifications) and then spend the rest of the time finalizing it. I have updated the Gantt chart accordingly.
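In pseudocode terms, the double-query scheme looks like the sketch below. query_llm stands in for Sandra's LLM API wrapper, and the prompts shown are illustrative, not the final prompts she is testing.

# Sketch of the double-query scheme for end-of-sentence detection.
def process_new_word(word, log, query_llm, send_to_viewer):
    log.append(word)

    # Query 1: turn the running word log into the best candidate sentence(s).
    sentence = query_llm(
        "Form the most likely sentence(s) from these signed words: "
        + " ".join(log))
    send_to_viewer({"text": sentence})  # JSON message to the viewer

    # Query 2: decide whether the log already forms a complete sentence.
    done = query_llm(
        "Answer YES or NO: is this a complete sentence? " + " ".join(log))
    if done.strip().upper().startswith("YES"):
        log.clear()  # start accumulating the next sentence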

Kavish’s Status Report for 3/16/24

This week I split my time between working on the FPGA and developing a C++ project that will be useful for our testing and demo purposes. For the FPGA application, I am currently in the process of booting PetaLinux 2022. Once that process is finished, I will connect our USB camera and try to use my C++ project for a very basic demo. This demo is mainly to finalize the setup of the FPGA (before I develop the HPE model) and test the communication bandwidth across all the ports. The C++ project is an application that receives two streams of data (a thread to receive images and another thread to receive JSON data) from an Ethernet connection as UDP packets. The images and corresponding data must be pre-tagged with an ID on the transmitting end so that I can match them in my receiving application. A third thread pops an image from the input buffer queue and matches its ID with the corresponding JSON data (which is stored in a dictionary). It writes the data onto the image (which could be words, bounding boxes, instance segmentation, etc.) and saves the updated image to an output queue. A fourth thread then gets the data from the output queue and displays it. This project mainly relies on OpenCV for image processing and a GitHub library (nlohmann/json) for JSON parsing in C++. I do not think I am behind schedule just yet and should be able to stick to the development cycle as planned. Over the next week, I plan to finish booting the FPGA and then test running an image stream across to the Jetson.
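To make the matching step concrete, here is a Python sketch of the third thread's logic; the real implementation is in C++ with OpenCV and nlohmann/json, so the structures below are illustrative stand-ins.

import queue
import threading

import cv2

# Frames and JSON records arrive with matching IDs; the matching thread
# pairs them, draws the data onto the frame, and queues the annotated
# frame for the display thread.
input_q = queue.Queue()    # (frame_id, image) from the image-receiver thread
output_q = queue.Queue()   # annotated images for the display thread
json_store = {}            # frame_id -> parsed JSON, filled by the JSON thread
store_lock = threading.Lock()

def match_and_annotate():
    while True:
        frame_id, image = input_q.get()
        with store_lock:
            data = json_store.pop(frame_id, None)
        if data is not None:
            # Overlay could be words, bounding boxes, segmentation, etc.
            cv2.putText(image, str(data), (10, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
        output_q.put(image)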

Kavish’s Status Report for 3/9/24

My last two weeks were spent working mainly on the design report. This process made me look a bit deeper into the system implementation component of our project compared to the design presentation. Since there were numerous ways to implement the same functionality, we analyzed the different methods to understand their advantages in terms of functional efficiency and development efficiency. For the FPGA development, we will rely heavily on the Vitis AI tool and its model zoo and libraries. This decision was made with the fastest development time in mind, rather than focusing on optimizations from the get-go. Similarly, for the web application, we decided that we should initially just use the OpenCV library for viewing and debugging purposes and then transition to a more mature display platform later in the project. As for my progress on the HPE development on the FPGA, I am still running into a couple of roadblocks in fully compiling the project on the board. I am referring to others' work online on similar projects and will reach out for additional help as required. As for the schedule, I am on the border between on-track and a little behind due to the time spent on the design report. I do not need to alter the schedule as of right now, but I will update it next week if required.

Kavish’s Status Report for 2/24/24

Most of this week was spent preparing the design presentation and then translating that presentation into the report. Neeraj and I discussed our design quite extensively for the presentation and report. We looked at different RNN libraries and whether to use a GRU or an LSTM. I also spent time learning about the MediaPipe model, as I am quite weak on the machine learning side. Unfortunately, I was unable to work more on the FPGA development and still have to port a pre-trained model onto it for testing purposes. Although this does not put me behind the new schedule that we developed for the design presentation, I will need to put in some extra hours next week to get ahead of schedule. I plan on implementing a basic pose estimation model from the Vitis AI model zoo and then benchmarking its performance to fully validate our idea.
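For context on the GRU/LSTM trade-off we discussed, here is a tiny PyTorch comparison; the dimensions are illustrative placeholders, not our final design values.

import torch
import torch.nn as nn

# Both layers map a sequence of feature vectors to a sequence of hidden
# states; the GRU has fewer gates (fewer parameters, faster), while the
# LSTM keeps an extra cell state (more capacity).
x = torch.randn(8, 30, 128)            # (batch, time steps, features)

gru = nn.GRU(input_size=128, hidden_size=256, batch_first=True)
lstm = nn.LSTM(input_size=128, hidden_size=256, batch_first=True)

gru_out, _ = gru(x)
lstm_out, (h, c) = lstm(x)
print(gru_out.shape, lstm_out.shape)   # both (8, 30, 256)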

Kavish’s Status Report for 2/17/24

Over the last week, I got access to the Canvas and Piazza pages for the 18-643 class. Using those, I have followed along and finished the first few labs to understand the Vitis tool flow and the software emulation steps for testing. I also researched the question we are currently unsure about: whether doing pose estimation on the FPGA is worth it or not. Looking online, I found evidence that many pose estimation models have been ported onto our FPGA (KV260) and have achieved throughput comparable to many Jetsons. Although this is a positive sign, it is not conclusive evidence that our current plan is fully feasible. Thus, I am currently working on porting pre-trained models from the Vitis AI Model Zoo to measure the feasibility of the project myself. I am running into some roadblocks on the tool side of a few development steps (trying to find or build the correct configuration files and booting the kernel on the SD card), which I am trying to resolve. Although I am a little behind schedule, it is not yet far enough off to change anything on our project timeline. I should be on schedule if I can make significant progress in getting the FPGA running with a pre-trained pose estimation model from the model zoo. Not only will this help confirm whether our current design is feasible, but it will also put us on track to get MediaPipe or OpenPose running later down the line.

Kavish’s Status Report for 2/10/24

This week we finished and presented our proposal to the class and analyzed the feedback from the TAs and professors. Apart from that, I downloaded the Vivado and Vitis tools on my computer to start ramping up the FPGA development. I started exploring the basics of understanding block diagrams and stepping through the synthesis tool flow. I have also been working with Neeraj to understand different human pose estimation models (OpenPose and MediaPipe) and the feasibility of porting them onto the FPGA. I found a detailed resource that implemented the ICAIPose network on the DPU, and I will try to replicate their steps this week to gauge feasibility and performance. As long as I can get the board and tools set up by next week, I should be on schedule and maintain progress according to the Gantt chart.