Neeraj’s Status Report for 2/17/24 – Team E6: TransLingualVisionary

My main goal for this week was to experiment with various human pose estimation libraries. Primarily, I focused on determining whether OpenPose or MediaPipe would better fit our design pipeline. Both libraries have a history of running on smaller IoT devices, so either one could potentially work on an FPGA, as we intend.

I ran into issues installing the OpenPose models, so I might need more time to experiment with them. However, I have been able to test MediaPipe's hand detection model. It produces 21 landmarks per hand to capture hand position and pose, and it can distinguish left and right hands. It also outputs vectors holding the position of each landmark in the image. This means that we do not necessarily have to develop a CNN-RNN fusion model to account for spatial information and can instead use these vectors as inputs to an RNN to classify words. I have tested this with a few still photos, following MediaPipe's test documentation code, and the results are in our team report. Combining this with the OpenCV library, I have developed a script that takes in live video from a camera and returns output vectors containing the positions of the landmarks. This script represents the front end of our design pipeline, and we can use it for testing and verifying the hardware side of our design.
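As a rough illustration of what such a script could look like, here is a minimal sketch, not our actual code. It assumes MediaPipe's legacy `mp.solutions.hands` Python API and OpenCV's `VideoCapture`; the function names `landmarks_to_vector` and `stream_hand_vectors`, the `Point` stand-in type, and the choice to flatten each hand into a 63-element list are all hypothetical choices made for the example.

```python
from collections import namedtuple

NUM_LANDMARKS = 21  # MediaPipe Hands reports 21 landmarks per hand

# Hypothetical stand-in for a landmark, so the flattening step can be
# tried without a camera or the MediaPipe package installed.
Point = namedtuple("Point", "x y z")


def landmarks_to_vector(landmarks):
    """Flatten landmarks with .x, .y, .z into [x0, y0, z0, x1, y1, z1, ...]."""
    vector = []
    for lm in landmarks:
        vector.extend([lm.x, lm.y, lm.z])
    return vector


def stream_hand_vectors(camera_index=0):
    """Yield (handedness_label, vector) for every hand found in each frame."""
    # Imported here so the helper above works without these packages.
    import cv2
    import mediapipe as mp

    cap = cv2.VideoCapture(camera_index)
    with mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=2) as hands:
        try:
            while cap.isOpened():
                ok, frame = cap.read()
                if not ok:
                    break
                # OpenCV delivers BGR frames; MediaPipe expects RGB.
                results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                if not results.multi_hand_landmarks:
                    continue
                for hand, handed in zip(results.multi_hand_landmarks,
                                        results.multi_handedness):
                    label = handed.classification[0].label  # "Left" or "Right"
                    yield label, landmarks_to_vector(hand.landmark)
        finally:
            cap.release()
```

Each yielded vector could then be appended to a per-clip sequence and fed to the RNN; whether to keep the z coordinate or normalize relative to the wrist is a design choice this sketch leaves open.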

On another note, I have been looking into how prompt engineering works with LLMs. More specifically, I am looking at the following paper by Masaru Yamada:

https://arxiv.org/ftp/arxiv/papers/2308/2308.01391.pdf

The paper explores how integrating the purpose of the translation and the target audience into prompts affects the quality of translations produced by ChatGPT. The findings suggest that including suitable prompts related to the translation purpose and target audience can yield more flexible, higher-quality translations that better meet industry standards. Specifically, the prompts allowed ChatGPT to generate translations that were more culturally adapted and persuasive for marketing content, as well as more intelligible translations of culture-dependent idioms. The paper also demonstrates the practical application of translation concepts like dynamic equivalence by using prompts to guide creative translations. This paper could provide good insight into translating word fragments into full sentences, as well as which prompts we could begin experimenting with so that we can capture the nuances within ASL.
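To make the idea concrete, here is a small sketch of how a prompt might carry the translation purpose and target audience alongside the classified words. This is only an assumed starting point for experimentation, not a prompt taken from the paper; the function name `build_translation_prompt`, its parameters, and the prompt wording are all hypothetical.

```python
def build_translation_prompt(words, purpose, audience):
    """Build a prompt that states the purpose and audience of the translation.

    `words` is the sequence of word fragments produced by the classifier.
    """
    fragment_text = " ".join(words)
    return (
        "You are translating a sequence of ASL word fragments into natural English.\n"
        f"Purpose of the translation: {purpose}\n"
        f"Target audience: {audience}\n"
        f"Word fragments: {fragment_text}\n"
        "Write one fluent English sentence that preserves the intended meaning."
    )


# Example usage with made-up inputs.
prompt = build_translation_prompt(
    ["STORE", "I", "GO"],
    purpose="live captioning of a casual conversation",
    audience="hearing English speakers with no ASL knowledge",
)
print(prompt)
```

Varying the `purpose` and `audience` strings while holding the word fragments fixed would be one simple way to compare how much these two fields change the output.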

I am basically on schedule. The human pose estimation code should serve as a good foundation for our pre-processing model, and we can also use it for hardware testing. We are also on pace to start prompt engineering for our LLM.
