Sejal’s Status Report for 4/27/24

Accomplishments

This week, I worked with my teammates on improving the existing web app, deploying it, and integrating it with Bluetooth. Continuing from last week’s progress, we displayed the translated sentence on the HTML page, integrating the end-of-sentence logic and the LLM that structures the sentence. After this, I worked on the frontend UI by adding a home page, an instructions page, and our main functionality on a separate detect page, as shown in the images below. I positioned the elements similarly to what we originally decided in the wireframes from our design presentation and added some guiding text for a more seamless user experience. After running into issues deploying our web page with AWS, I tried another deployment method; however, I am still running into issues with building all the required libraries and packages for the deployment.

 

My progress is on schedule as deployment is the last step we have to complete in terms of the software before the final demo. 

Next week, we will complete deployment, connect with the hardware, and work on the rest of the final deliverables.

Team Report for 4/27/24

Main Accomplishments for This Week

  • 3D printing of phone attachment completed
  • Fully functional web application (laptop-based)
  • Progress on displaying text via Bluetooth
  • Final presentation

Risks & Risk Management

The biggest risk now is that the main video input function, getUserMedia, is restricted to secure contexts, namely localhost and HTTPS. However, to let phones open the webpage, we are currently deploying the web application on AWS over HTTP, so the webcam does not work when the web app is opened on a phone. We are looking for ways to serve the app over HTTPS; if we cannot fix this, we will revert to a laptop-based implementation.
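For reference, a minimal sketch of the check involved, assuming a standard browser environment (the function name is illustrative):

    // Minimal sketch: getUserMedia is only exposed in secure contexts
    // (HTTPS or localhost), so an HTTP deployment will not see it.
    async function startWebcam(video: HTMLVideoElement): Promise<void> {
      if (!window.isSecureContext || !navigator.mediaDevices?.getUserMedia) {
        // On an http:// origin, navigator.mediaDevices is typically undefined.
        throw new Error("Camera access requires HTTPS or localhost.");
      }
      const stream = await navigator.mediaDevices.getUserMedia({ video: true });
      video.srcObject = stream;
      await video.play();
    }

One common option is to serve the site over HTTPS (for example behind a TLS-terminating proxy or load balancer), which would lift this restriction without any frontend changes.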

Design Changes

We decided to switch from a mobile app to a web app due to the lack of MediaPipe support for iOS. In order to receive an accurate prediction, it is necessary to incorporate pose landmarks, which have only limited support on iOS. By using a web app, we can incorporate the necessary landmarks in a way we are already familiar with.

Schedule Changes

No schedule changes

Additional question for this week

We conducted unit tests on the CV module, and since our ML function depends on the CV implementation, our tests of ML accuracy and latency were conducted after CV-ML integration.

For the CV recognition requirements, we tested gesture recognition accuracy under two conditions: signing distance and background distraction.

  • For signing distance: we manually signed at different distances and verified whether hand and pose landmarks were accurately displayed on the signer. We took 5 samples at each interval: 1ft, 1.5ft, 2.0ft, 2.5ft, 3ft, 3.5ft, 4ft. Result: proper landmarks appear 100% of the time at distances between 1-3.9ft.
  • For background distractions: we manually created different background settings and verified whether hand and pose landmarks were accurately displayed on the signer. We took 20 test samples in total, with 2 samples at each level of 1-10 distractors in the background. Result: landmarks are drawn on the target subject 95% of the time (19 out of 20 samples).

 

For translation latency requirements, we measured time delay on the web application. 

  • For translation latency: we timed the seconds elapsed between the appearance of landmarks and the display of the English text (see the timing sketch below). We took 90 samples: each team member signed each of the 10 phrases 3 times. Result: word predictions appeared an average of 1100 ms after a gesture, and the translation appeared an average of 2900 ms after a sentence.
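The timing itself can be taken in the browser with performance.now(); the sketch below is illustrative, and the hook function names are hypothetical rather than the actual ones in our codebase:

    // Illustrative latency timing between landmark display and translation display.
    // markLandmarksShown / markTranslationShown are hypothetical hook points,
    // called where the frontend draws landmarks and renders the English text.
    let landmarksShownAt: number | null = null;

    function markLandmarksShown(): void {
      landmarksShownAt = performance.now();
    }

    function markTranslationShown(): void {
      if (landmarksShownAt !== null) {
        const latencyMs = performance.now() - landmarksShownAt;
        console.log(`Translation latency: ${latencyMs.toFixed(0)} ms`);
        landmarksShownAt = null;
      }
    }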

 

For the ML accuracy requirements:

  • The validation and training accuracies from training on the dataset were plotted, with the training accuracy reaching about 96% and the validation accuracy reaching about 93%. This suggests that the model should have high accuracy, with the possibility of some slight overfitting.
  • To test accuracy while using the model, each team member signed each phrase 3 times, along with 3 complex sentences that used these phrases in a structure similar to typical sign language. Result: phrase translation accuracy was 82/90, and sentence translation accuracy was 8/9. Note that it is somewhat difficult to judge whether a sentence is accurate because there can be multiple correct interpretations of a sentence.
  • While integrating additional phrases into the model, we also performed some informal testing. After training a model with 10 additional phrases, we signed each of these signs and found that the accuracy was significantly lower than the original model of only 10 signs from the DSL-10 dataset, with only around ⅓ of translations being accurate. Limiting the addition to a single extra phrase produced the same low accuracy, even after ensuring the new data was consistent with the existing dataset. As a result, we decided that our MVP would only consist of the original 10 phrases, because that model demonstrated high accuracy.

Ran’s Status Report for 4/27/24

  • What did you personally accomplish this week on the project? 

I helped prepare our final presentation slides earlier this week. I then finished the web application development and, with Sejal, completed the CV and ML functionality. I was also responsible for setting up AWS cloud deployment. However, I encountered some difficulties because one of the key video input functions is blocked under the HTTP protocol. I am still searching for solutions.

  • Is your progress on schedule or behind?

Yes, my progress is on schedule. Given that we are entering the final week of the project, I will make sure I keep up with our team’s pace and help my teammates out if difficulties arise.

  • What deliverables do you hope to complete in the next week?

Mobile phone compatible deployment

User experience survey distribution and collection

Final integration

Final poster design

Final report writing

Leia’s Status Report for 4/27/2024

Progress

I’ve been making steady progress on debugging the Bluetooth feature of the web app. Currently, the demo web app can successfully connect to and control the Arduino Nano 33 BLE. It can also transmit data between the two, including characters of text, and display that text on the OLED screen.
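For context, the connect-and-write flow uses the Web Bluetooth API roughly as sketched below; the service and characteristic UUIDs are placeholders rather than the ones defined in our Arduino sketch, and the types assume the web-bluetooth type definitions are installed.

    // Sketch of connecting to the Arduino Nano 33 BLE and writing text to a
    // writable characteristic. UUIDs below are placeholders, not our real ones.
    const SERVICE_UUID = "0000ffe0-0000-1000-8000-00805f9b34fb";        // placeholder
    const CHARACTERISTIC_UUID = "0000ffe1-0000-1000-8000-00805f9b34fb"; // placeholder

    async function sendTextOverBle(text: string): Promise<void> {
      const device = await navigator.bluetooth.requestDevice({
        filters: [{ services: [SERVICE_UUID] }],
      });
      const server = await device.gatt!.connect();
      const service = await server.getPrimaryService(SERVICE_UUID);
      const characteristic = await service.getCharacteristic(CHARACTERISTIC_UUID);
      // Encode the string as UTF-8 bytes before writing it to the characteristic.
      await characteristic.writeValue(new TextEncoder().encode(text));
    }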

 

Additional 3D prints of the case attachment are underway because of initial mishaps as well as design flaws recognized only after the prototype was developed. However, the printed parts of the refined design have been checked and are a significant improvement over the first model.

Next Steps

All that’s left on the hardware side is further debugging of the Bluetooth feature to enable string transmission from the web app to the Arduino, along with assembly of the 3D-printed parts.

Then the Bluetooth feature will be integrated with our final web app in its production stage, and user satisfaction will be tested. Outside of the actual project development, in preparation for the final demo we will be working on the final poster, video, and report.

Sejal’s Status Report for 4/20/24

Accomplishments

This week, my team and I worked on integrating our parts together. Since we had some issues with the mobile app and CoreML, we decided to switch to a web app after evaluating the trade-offs. To do this, Ran and I developed the frontend and backend functionality. MediaPipe provides examples and support for JavaScript, so we extracted MediaPipe landmarks in the frontend and sent these landmarks to the backend. I structured the landmarks in a way that was readable by the Python code and used the existing model to output a prediction. I then sent this prediction back to the frontend, as depicted in the image below.
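A minimal sketch of the frontend side of this exchange is below; the /predict/ endpoint name and the { prediction } response shape are assumptions for illustration, not the exact names in our Django project.

    // Sketch of the frontend-to-backend exchange. The /predict/ endpoint name
    // and the response shape ({ prediction: string }) are assumed for illustration.
    async function requestPrediction(
      landmarks: number[][],            // hand + pose landmark coordinates per frame
      csrfToken: string                 // Django's CSRF token, read from the page
    ): Promise<string> {
      const response = await fetch("/predict/", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "X-CSRFToken": csrfToken,
        },
        body: JSON.stringify({ landmarks }),
      });
      if (!response.ok) {
        throw new Error(`Prediction request failed: ${response.status}`);
      }
      const data: { prediction: string } = await response.json();
      return data.prediction;
    }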

My progress is on schedule as we are working on integrating our parts, but we will have to do some more testing as a team tomorrow before the final presentation to ensure our verification and validation metrics are satisfied by our working solution.

Next week, I will complete our integration, add to the UI to ensure a seamless user experience, and perform more testing as my team prepares for the final presentation, final report and final demo.

Additional question for this week:

To accomplish our tasks, I needed to refer to documentation on the OpenAI API to integrate LLM processing into our sentence prediction functionality. After switching to a web app, we needed resources on the best way to send information between the frontend and backend using Django, the framework we are using for this. We also found it helpful to use demos and existing implementations of MediaPipe with JavaScript. Some learning strategies I picked up were using existing video tutorials or demos and iterating on them to accomplish our specific task. I also used GitHub and GitHub issue forums to help debug, in case others had run into errors similar to mine.
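As a rough illustration of the request shape against the OpenAI chat completions endpoint (the model name and prompt wording are placeholders, not our exact configuration, and in practice the API key should stay server-side):

    // Illustrative sketch of an OpenAI chat-completions request that turns a list
    // of predicted phrases into a natural English sentence. Model name and prompt
    // are placeholders; the API key should be kept on the server in practice.
    async function structureSentence(phrases: string[], apiKey: string): Promise<string> {
      const response = await fetch("https://api.openai.com/v1/chat/completions", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${apiKey}`,
        },
        body: JSON.stringify({
          model: "gpt-3.5-turbo", // placeholder model name
          messages: [
            {
              role: "user",
              content: `Rewrite these signed phrases as one natural English sentence: ${phrases.join(", ")}`,
            },
          ],
        }),
      });
      const data = await response.json();
      return data.choices[0].message.content;
    }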

Team Status Report for 4/20

Main Accomplishments for This Week

  • 3D printing of phone attachment 
    • CAD model completed
    • Printing in progress
  • Integration of ML and MediaPipe functionality into web application
  • Progress on displaying text via bluetooth

Risks & Risk Management

  • As we complete our integration, the biggest risk would be if the prediction translation turns out to be significantly less accurate than the previous implementation using just a laptop webcam. If so, we will use the next week to improve our integration before the final demo, or revert to a laptop-based implementation.

Design Changes

  • We decided to switch from a mobile app to a web app due to the lack of MediaPipe support for iOS. In order to receive an accurate prediction, it is necessary to incorporate pose landmarks, which have only limited support on iOS. By using a web app, we can incorporate the necessary landmarks in a way we are already familiar with.

Schedule Changes

  • No schedule changes

Additional question for this week

To accomplish our gesture and pose detection feature, we relied heavily on the MediaPipe library. While it is a powerful library for generating landmarks on input images, videos, or live video streams, it took us a couple of weeks to study its official Google developer website, read its documentation, and follow its tutorials to set up the dependencies and experiment with the example code. In addition, we watched YouTube videos to learn the overall pipeline for implementing the CV module for gesture and pose recognition.
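For concreteness, the landmark extraction can be set up roughly as in the sketch below, assuming the @mediapipe/tasks-vision package; the wasm and model asset paths are placeholders in the style of MediaPipe's hosted examples rather than our exact configuration.

    // Sketch of real-time hand + pose landmark extraction with MediaPipe's
    // tasks-vision API. The wasm and model paths are placeholders.
    import {
      FilesetResolver,
      HandLandmarker,
      PoseLandmarker,
    } from "@mediapipe/tasks-vision";

    async function createLandmarkers() {
      const vision = await FilesetResolver.forVisionTasks(
        "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm" // placeholder
      );
      const handLandmarker = await HandLandmarker.createFromOptions(vision, {
        baseOptions: { modelAssetPath: "hand_landmarker.task" }, // placeholder path
        runningMode: "VIDEO",
        numHands: 2,
      });
      const poseLandmarker = await PoseLandmarker.createFromOptions(vision, {
        baseOptions: { modelAssetPath: "pose_landmarker_lite.task" }, // placeholder path
        runningMode: "VIDEO",
      });
      return { handLandmarker, poseLandmarker };
    }

    function detectFrame(
      video: HTMLVideoElement,
      handLandmarker: HandLandmarker,
      poseLandmarker: PoseLandmarker
    ) {
      const now = performance.now();
      // Each result holds an array of landmark lists (one per detected hand/person).
      const hands = handLandmarker.detectForVideo(video, now);
      const poses = poseLandmarker.detectForVideo(video, now);
      return { hands: hands.landmarks, poses: poses.landmarks };
    }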

To accomplish our tasks, we needed to refer to documentation on the OpenAI API to integrate LLM processing into our sentence prediction functionality. Although we are no longer focusing on mobile app development, while developing the mobile app we needed to refer to iOS app development documentation and demos to gain familiarity with Swift. After switching to a web app, we needed resources on the best way to send information between the frontend and backend using Django, the framework we are using for this. We also found it helpful to use demos and existing implementations of MediaPipe with JavaScript. A learning strategy we picked up was to use existing tutorials or demos and iterate on them to accomplish our specific task.

Leia’s Status Report for 4/20/2024

Progress

Because of our team’s shift in solution approach from a mobile app to a web app, I have been adapting and rewriting the Arduino code to implement Bluetooth functionality with a webpage. With a basic HTML, CSS, and TypeScript setup, the demo website has been able to connect to and control the Arduino unit over BLE. Textbox input has been added and tested to write sentences from the web app to the OLED screen via the Arduino. I’ve also been trying to transplant the necessary demo web app components into our actual project web app. However, I’m facing similar issues trying to run our Python scripts, because my virtual environments and the modules I download continue to be incompatible. Despite downgrading versions and installing the appropriate packages, it still does not work, so I will develop the Bluetooth features separately in a form that can be easily transferred into the web app with my team members.
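One detail worth noting for the string transmission work: BLE characteristic writes carry only a small payload per write (commonly around 20 bytes with the default MTU), so longer sentences may need to be sent in chunks. A sketch, with the chunk size assumed rather than measured on our hardware:

    // Sketch of chunked string transmission over a BLE characteristic.
    // The 20-byte chunk size assumes the default BLE MTU; the characteristic
    // is whichever writable text characteristic the Arduino sketch exposes.
    async function writeSentence(
      characteristic: BluetoothRemoteGATTCharacteristic,
      sentence: string,
      chunkSize = 20
    ): Promise<void> {
      const bytes = new TextEncoder().encode(sentence);
      for (let offset = 0; offset < bytes.length; offset += chunkSize) {
        await characteristic.writeValue(bytes.slice(offset, offset + chunkSize));
      }
    }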

The 3D-printed case and phone attachment are currently underway: the design is complete and printing is in progress.

Next Steps

I will retrieve the completed 3D print and fit the hardware parts into it to create the complete physical product of our project. I will also continue to refine text transmission over Bluetooth between the Arduino and our web app, and assist with the frontend if necessary.

With my team members, we will push into the testing stage to ensure our product meets our use case and design requirements. We plan to use surveys to measure user satisfaction, along with other methods to verify our project.

Additional Prompt

To operate the CAD software for importing and configuring models, I watched guidance videos provided by the program’s company to develop the final attachment. I also read documentation for the Web Bluetooth API to understand how a web app can implement BLE functions and pair with an external device such as an Arduino, in addition to transmitting data to and receiving data from it. Forums such as Stack Overflow and Chrome for Developers articles about communicating with Bluetooth devices over JavaScript helped solidify my understanding.

Ran’s Status Report for 4/20/24

  • What did you personally accomplish this week on the project? 

In the first week, I discovered a mistake in my earlier approach to integrating the CoreML model with the iOS app. I had inserted the model in the incorrect function, so the model was never called, producing null results. After I moved it to the target function, multiple errors were raised and the integration was largely unsuccessful. Moreover, MediaPipe only provides a hand landmarker package for iOS development, not pose landmarks, and my search for alternative packages, including QuickPose and the Xcode Vision library, did not indicate strong feasibility overall. So, after meeting with professors and TAs and after our internal group discussion, we officially decided to change our integration plan to a web application in the Django framework.

Accordingly, I was responsible for the overall codebase setup, the JavaScript hand and pose real-time recognition functions, and the data transmission between the frontend and backend. Over the 5 days starting this Monday, I set up the Django framework, converted the original MediaPipe CV module from Python to JavaScript with the same functionality, and enabled frontend and backend requests and responses.

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

Yes, my progress is on schedule. Given that we are entering the final weeks of the project, I still need to speed up my process as much as possible to leave time for further integration.

  • What deliverables do you hope to complete in the next week?

User experience survey distribution and collection

Improving accuracy

Improving UI

  • Additional question for new module learning

To accomplish our gesture and pose detection feature, we relied heavily on the MediaPipe module. While MediaPipe is a powerful library for generating landmarks on input images, videos, or live video streams, it took us a couple of weeks to study its official Google developer website, read its documentation, and follow its tutorials to set up the dependencies and experiment with the example code. In addition, we watched YouTube videos to learn the overall pipeline for implementing the CV module for gesture and pose recognition.

Sejal’s Status Report for 4/06/24

Accomplishments

This week, my team did our interim demo and began to integrate our respective parts. To prepare for the demo, I made some small modifications to the prompt sent in the API request to the LLM to ensure that it was outputting sentences that made sense given the predicted gestures.

Since we are using CoreML to run machine learning within our iOS app, my teammate Ran and I worked on the Swift code to integrate machine learning into the app. I converted my model to CoreML and wrote a function that behaves the same way as the large language model step in our machine learning processing.

I also continued trying to debug why the expanded dataset from last week wasn’t working. I re-recorded some videos, ensuring that MediaPipe was recognizing all the landmarks in the frame, with good lighting and the same format as the rest of the dataset. Again, while the training and validation accuracies were high, when testing gesture prediction with the model it recognized very few of the gestures. This might suggest that the model is not complex enough to handle the expanded amount of data. I continued to add layers to make the model more complex, but there did not seem to be any improvement.

My progress is on schedule as we are working on integrating our parts.

Next week, I am going to continue working with Ran to integrate the ML into the iOS app. I will also try to fine-tune the model structure to improve the existing one, and perform the testing described below.

Verification and Validation

Tests run so far: I have done some informal testing by signing in front of the webcam and checking whether it displayed the signs accurately.

Tests I plan to run: 

  • Quantitatively measure latency of gesture prediction
    • Since one of our use case requirements was to have a 1-3 second latency for gesture recognition, I will measure how long it takes after a gesture is signed for a prediction to appear.
  • Quantitatively measure latency of the LLM
    • Similar to measuring the latency of gesture prediction, it is important to also measure how long it takes the LLM to process the prediction and output a sentence, so I will measure this as well.
  • Quantitatively measure accuracy of gesture prediction
    • Since one of our use case requirements was to have a gesture prediction accuracy of > 95%, I will measure the accuracy of a signed gesture against its prediction.
  • Qualitatively determine accuracy of the LLM
    • Since there is no right/wrong output of the LLM, this testing is done qualitatively, to determine whether the sentence output makes sense based on the direct predictions and as a conversational sentence.

I will do the above tests in various lightings, backgrounds, and with distractions to ensure it corresponds to our use case requirements in different settings, simulating where the device might be used.

Ran’s Status Report for 4/06/24

  • What did you personally accomplish this week on the project? 

I have mostly finished integrating the ML model into the mobile application with CoreML. While the current model shows predictions on the phone screen, it lacks accuracy compared to what we obtained on a laptop. The reason is most likely that I have not extracted pose landmarks in the mobile app, because MediaPipe does not offer a pose package for iOS. Alternative packages include QuickPose, which I am now experimenting with and plan to integrate next week.

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

Yes, my progress is on schedule. Given that we are entering the final weeks of the project, I still need to speed up my process as much as possible to leave time for validation.

  • What deliverables do you hope to complete in the next week?

CoreML full functionality

Translation accuracy and latency testing

Mobile app to hardware connection