Team Report for 4/27/24

Main Accomplishments for This Week

  • 3D printing of phone attachment completed
  • Fully functional web application (laptop-based)
  • Progress on displaying text via Bluetooth
  • Final presentation

Risks & Risk Management

The biggest risk now is that getUserMedia, the main video input function, is restricted to secure contexts (localhost and HTTPS). However, to let phones open the webpage, we are currently deploying the web application on AWS over HTTP, so the webcam does not work when the web app is opened on a phone. We are looking for ways to resolve this issue; if we cannot fix it, we will revert to a laptop-based implementation.
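As a quick illustration of the constraint, the minimal sketch below (function and variable names are ours, for illustration only) shows why the camera silently fails over HTTP: the browser does not expose navigator.mediaDevices outside a secure context, so the page can only fall back to the laptop/localhost path described above.

    // Minimal sketch: navigator.mediaDevices is undefined in insecure (plain-HTTP)
    // contexts, so getUserMedia cannot be called there at all.
    async function startCamera(videoElement) {
      if (!window.isSecureContext || !navigator.mediaDevices?.getUserMedia) {
        // Served over HTTP on a phone: fall back to the laptop/localhost workflow.
        console.warn("getUserMedia unavailable: page is not in a secure context");
        return false;
      }
      const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: false });
      videoElement.srcObject = stream;
      await videoElement.play();
      return true;
    }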

Design Changes

We decided to switch from a mobile app to a web app due to the lack of MediaPipe support for iOS. Accurate prediction requires incorporating pose landmarks, for which iOS support is limited. By using a web app, we can incorporate the necessary landmarks in a framework we are already familiar with.
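To make the role of the pose landmarks concrete, the sketch below shows one way the web app can flatten MediaPipe's hand and pose landmark outputs into a single feature vector for the gesture classifier. The function name, zero-padding scheme, and feature layout are illustrative assumptions rather than our exact implementation; the landmark counts (21 per hand, 33 for the pose) are MediaPipe's standard outputs.

    // Sketch: combine hand and pose landmarks into one feature vector.
    function buildFeatureVector(handLandmarks, poseLandmarks) {
      const features = [];
      // Up to two hands, 21 (x, y, z) points each; pad with zeros if a hand is missing.
      for (let h = 0; h < 2; h++) {
        const hand = (handLandmarks ?? [])[h] ?? [];
        for (let i = 0; i < 21; i++) {
          const p = hand[i] ?? { x: 0, y: 0, z: 0 };
          features.push(p.x, p.y, p.z);
        }
      }
      // 33 pose points add upper-body context that hand landmarks alone lack.
      for (let i = 0; i < 33; i++) {
        const p = (poseLandmarks ?? [])[i] ?? { x: 0, y: 0, z: 0 };
        features.push(p.x, p.y, p.z);
      }
      return features; // length 2*21*3 + 33*3 = 225
    }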

Schedule Changes

No schedule changes

Additional question for this week

We conducted unit tests on the CV module, and since our ML function depends on the CV implementation, our tests of ML accuracy and latency were conducted after CV-ML integration.

For the CV recognition requirements, we tested gesture recognition accuracy under two conditions: signing distance and background distraction.

  • For signing distance: we manually signed at different distances and verified whether hand and pose landmarks were accurately displayed on the signer. We took 5 samples at each distance: 1 ft, 1.5 ft, 2 ft, 2.5 ft, 3 ft, 3.5 ft, and 4 ft. Result: correct landmarks appeared 100% of the time at distances between 1 and 3.9 ft.
  • For background distractions: we manually created different background settings and verified whether hand and pose landmarks were accurately displayed on the signer. We took 20 test samples in total, 2 samples at each level of 1-10 distractors in the background. Result: landmarks were drawn on the target subject 95% of the time (19 of 20 samples).

 

For the translation latency requirements, we measured the time delay on the web application.

  • For translation latency: we timed the interval between the appearance of landmarks and the display of English text. We took 90 samples: each team member signed each of the 10 phrases 3 times. Result: word predictions appeared an average of 1100 ms after a gesture, and sentence translations appeared an average of 2900 ms after a sentence (a sketch of how this timing can be instrumented follows).
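The timing can be instrumented directly in the browser with its high-resolution clock. The sketch below is a minimal illustration; the hook names are hypothetical and would be called from wherever the app draws landmarks and renders the translated text.

    // Record when landmarks first appear and when the English text is shown,
    // then log the elapsed time using performance.now().
    let landmarkShownAt = null;

    function onLandmarksDrawn() {
      // Call when hand/pose landmarks are first drawn for the current gesture.
      if (landmarkShownAt === null) landmarkShownAt = performance.now();
    }

    function onTranslationDisplayed() {
      // Call when the translated English text is written to the page.
      if (landmarkShownAt !== null) {
        const latencyMs = performance.now() - landmarkShownAt;
        console.log(`translation latency: ${latencyMs.toFixed(0)} ms`);
        landmarkShownAt = null; // reset for the next sample
      }
    }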

 

For the ML accuracy requirements:

  • The training and validation accuracies from training on the dataset were plotted, with training accuracy reaching about 96% and validation accuracy about 93%. This suggests the model should be highly accurate, with possibly some slight overfitting.
  • To test accuracy in actual use, each team member signed each phrase 3 times, along with 3 complex sentences that used these phrases in a structure similar to typical sign language. Result: phrase translation accuracy was 82/90, and sentence translation accuracy was 8/9. Note that judging whether a sentence is accurate is somewhat subjective, since a sentence can have multiple correct interpretations.
  • While integrating additional phrases into the model, we also performed some informal testing. After training a model with 10 additional phrases, we signed each of these signs and found that accuracy was significantly lower than with the original model of only 10 signs from the DSL-10 dataset, with only around one third of translations being accurate. Limiting the addition to a single extra phrase produced the same low accuracy, even after ensuring the new data was consistent with the rest. As a result, we decided that our MVP would consist of only the original 10 phrases, because that model demonstrated high accuracy.

Ran’s Status Report for 4/27/24

  • What did you personally accomplish this week on the project? 

I helped prepare our final presentation slides earlier this week. I then finished the web application development and, with Sejal, completed the CV and ML functionality. I was also responsible for setting up the AWS cloud deployment. However, I encountered difficulties because one of the key video input functions (getUserMedia) is blocked under the HTTP protocol. I am still searching for solutions.

  • Is your progress on schedule or behind?

Yes, my progress is on schedule. Given that we are entering the final week of the project, I will make sure I keep up with our team's pace and help my teammates out if difficulties arise.

  • What deliverables do you hope to complete in the next week?

Mobile phone compatible deployment

User experience survey distribution and collection

Final integration

Final poster designing

Final report writing

Ran’s Status Report for 4/20/24

  • What did you personally accomplish this week on the project? 

In the first week, I discovered a mistake in my earlier approach to integrating the CoreML model with the iOS app. I had inserted the model in the wrong function, so the model was never called and produced null results. After I moved it to the target function, multiple errors were raised and the integration was largely unsuccessful. Moreover, MediaPipe only provides a hand landmarker package for iOS development, not pose landmarks, and my search for alternative packages, including QuickPose and the Xcode Vision library, did not indicate strong feasibility overall. So, after meeting with professors and TAs and an internal group discussion, we officially decided to change our integration plan to a web application built on the Django framework.

Accordingly, I was responsible for the overall codebase setup, the JavaScript hand and pose real-time recognition functions, and the data transmission between frontend and backend. Over the 5 days starting this Monday, I set up the Django framework, converted the original MediaPipe CV module from Python to equivalent JavaScript, and enabled frontend-backend requests and responses.
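As a rough illustration of this frontend-backend exchange, a request from the browser to a Django view could look like the sketch below; the /predict/ endpoint, payload shape, and helper names are assumptions for illustration rather than our actual code.

    // Send extracted landmark features to the Django backend and return the
    // predicted phrase from the JSON response.
    async function sendLandmarks(featureVector) {
      const response = await fetch("/predict/", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          // Django expects a CSRF token on POST requests unless the view is exempted.
          "X-CSRFToken": getCookie("csrftoken"),
        },
        body: JSON.stringify({ landmarks: featureVector }),
      });
      const data = await response.json();
      return data.prediction; // e.g. the recognized phrase as a string
    }

    // Standard helper to read a cookie value by name.
    function getCookie(name) {
      const match = document.cookie.match(new RegExp("(^| )" + name + "=([^;]+)"));
      return match ? decodeURIComponent(match[2]) : "";
    }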

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

Yes, my progress is on schedule. Given that we are entering the final weeks of the project, I still need to work as quickly as possible to leave time for further integration.

  • What deliverables do you hope to complete in the next week?

User experience survey distribution and collection

Improving accuracy

Improving UI

  • Additional question for new module learning

To accomplish our gesture and pose detection feature, we relied heavily on the MediaPipe module. While MediaPipe is a powerful library for generating landmarks from image, video, or live-stream input, it took us a couple of weeks to study its official Google developer site, read its documentation, and follow its tutorials to set up the dependencies and experiment with the example code. We also watched YouTube videos to learn the overall pipeline for implementing the CV module for gesture and pose recognition.

Ran’s Status Report for 4/06/24

  • What did you personally accomplish this week on the project? 

I have mostly integrated the ML model into the mobile application with CoreML. While the current model shows predictions on the phone screen, it is less accurate than what we obtained on a laptop. The reason is most likely that I have not extracted pose landmarks in the mobile app, because MediaPipe does not offer a pose package for iOS. Alternative packages include QuickPose, which I am now experimenting with and plan to integrate next week.

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

Yes, my progress is on schedule. Given that we are entering the final weeks of the project, I still need to work as quickly as possible to leave time for validation.

  • What deliverables do you hope to complete in the next week?

CoreML full functionality

Translation accuracy and latency testing

Mobile app to hardware connection

Ran’s Status Report for 3/30/24

  • What did you personally accomplish this week on the project? 

Following the plan to migrate the CV and ML modules to a local iOS app, I was mainly responsible for getting the Swift version of the code to work this week. I successfully got the mobile app running, showing 21 landmarks per hand as expected. However, I am still debugging the CoreML interface that should connect the ML model we trained with the real-time MediaPipe landmarks.

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

I devoted a considerable amount of time to my task this week, but I am still a bit behind schedule. Since the milestone and project deadlines are approaching, I will make sure to seek help if I get truly stuck.

  • What deliverables do you hope to complete in the next week?

CoreML integration

Testing with ML latency and accuracy

Team Status Report for 3/30/24

Main Accomplishments for This Week

  • Mobile App
    • MediaPipe hand landmark detection
    • CoreML initiated

  • Machine Learning
    • Translation algorithms optimization
    • LLM initialization 

  • Hardware 
    • All wires soldered
    • Bluetooth connected
    • OLED screen display enabled

Risks & Risk Management

  • Since MediaPipe does not provide a pose detection model for iOS, not including pose landmarks might reduce accuracy. We will seek possible solutions, but if no reliable complementary pose detection model is available, we will adjust our ML model to predict from hand landmarks alone.

Design Changes

  • We decided to use an LLM instead of a traditional NLP pipeline to structure sentences. Since OpenAI provides a convenient interface, an LLM requires no complicated training on our side and produces more accurate results (a sketch of such a call is shown below).
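The sketch below illustrates the kind of call involved, using the OpenAI chat completions endpoint. The model name, prompt wording, and surrounding function are assumptions for illustration; in practice the call would run server-side so the API key is not exposed. It is written in JavaScript for consistency with the other sketches in this report, though the HTTP call itself is language-agnostic.

    // Ask the OpenAI chat completions API to turn recognized ASL glosses into
    // one grammatical English sentence. Model and prompt are placeholders.
    async function structureSentence(recognizedPhrases, apiKey) {
      const response = await fetch("https://api.openai.com/v1/chat/completions", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "Authorization": `Bearer ${apiKey}`, // keep the key on the server
        },
        body: JSON.stringify({
          model: "gpt-3.5-turbo",
          messages: [
            { role: "system", content: "Rewrite the given ASL glosses as one grammatical English sentence." },
            { role: "user", content: recognizedPhrases.join(" ") },
          ],
        }),
      });
      const data = await response.json();
      return data.choices[0].message.content.trim();
    }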

Schedule Changes

  • We may need an extra week to reach the final sentence translation milestone and the mobile app launch milestone. With staged progress, our overall system integration and final testing, verification and validation are expected to be finished in the last week. Accordingly, we have updated our schedule.

Ran’s Status Report for 3/23/24

  • What did you personally accomplish this week on the project? 

This week we decided to make a change to our implementation plan. Instead of incorporating cloud deployment, we switched to local processing on the phone. Since MediaPipe offers a Swift interface, I started setting up the appropriate environment and migrating our original CV module code. Although I initially encountered integration issues, I managed to get the application compiled and running on my phone. It currently captures video and shows landmarks in real time on the screen. The next task is to convert the ML module with CoreML, but all of our team members happen to be busy with other courses and interviews, so we will invest in-person, collaborative working time next week to stay on schedule.

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

Since some of the tasks have changed, I am currently behind schedule. Next week, I will allocate at least 6 hours outside regular class time to work on the remaining tasks, partly individually and partly with my teammates.

  • What deliverables do you hope to complete in the next week?

iOS-based CV module refinement

CoreML integration

Ran’s Status Report for 3/16/24

  • What did you personally accomplish this week on the project? 

After our meeting with Professor Savvides and Neha on Monday, I explored the mobile app stream transmission resources they shared and experimented with several methods, including ffmpeg, Apple's HTTP Live Streaming (HLS), and some SDKs/open-source libraries. In the end, I found that SwiftVideo might be a suitable package for mobile (local) to cloud-server video transmission. Meanwhile, my teammates suggested moving the CV and ML modules entirely onto the iPhone processor by implementing MediaPipe features in Objective-C and using CoreML for ML prediction. At this stage, I cannot tell which method will produce the better outcome, so I will run the two tasks in parallel and decide which to adopt by the middle of next week. I also helped test and debug the ML module for dynamic signing.

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

My progress is mostly on schedule. I plan to devote more time to writing code for 1) the cloud transmission API and 2) converting the CV module from Python into Objective-C.

  • What deliverables do you hope to complete in the next week?

Mobile app development

Team Status Report for 3/9/24

Main Accomplishments for The Past Two Weeks

* Major work was accomplished in the week of 3/2, as spring break was planned as a slack week.

  • Design report
  • ML model training for dynamic signs
  • Mobile app's video recording & saving capabilities
  • Cloud environment setup in AWS Amplify
  • Hardware components setup and connection

Risks & Risk Management

While the expected basic functions are being implemented, we have encountered issues with low quantitative performance, including prediction accuracy and overall latency.

  • The static ML model predicts single letters relatively accurately from real-time signing, but the two dynamic ML models we have incorporated and trained so far produced errors in live signing tests. To debug this, we are re-examining the way we extract landmarks from the CV module. In case of severe difficulty with dynamic sign prediction, we would pivot to a fallback mechanism that recognizes only static signs, leveraging existing models trained specifically for static sign recognition.
  • Another issue lies in the way we store and transmit data. At present, the mobile app captures the camera input and stores it in the photo album as a .mov file, which would likely be stored in the same format in the S3 database once we fully set up the AWS Amplify cloud environment. However, we aim for real-time, streaming-like transmission, which means the current solution (saving the file after pressing stop recording) does not satisfy our requirements. We will research reliable live transmission methods to solve this issue. If no feasible mitigation is available, we will fall back to a laptop-based web app that uses the webcam.

Design Changes

  • No design changes

Schedule Changes

  • No schedule changes

Additional Week-Specific Questions

Part A was written by Leia, Part B was written by Sejal, and Part C was written by Ran.

Part A: Global Factors (Leia)

Our project addresses a variety of global factors through its practicality. For those who are not technologically savvy, the procedure to install and use our translator is neither complex nor difficult: simply by downloading the app and attaching the display module to the phone, the user has an immediately functional translator. It is not restricted to one environment either, but is intended to be used everywhere and in many situations, from single-person conversations to group interactions. Additionally, its purpose is to serve the hard-of-hearing community, but it can technically be used by anyone, including the speaking community. Admittedly, because we focus on American Sign Language rather than the international variation and do not include other nations' versions, our product is not global in this respect. However, its ease of use makes it versatile for anyone who uses American Sign Language to communicate with anyone who speaks English.

Part B: Cultural Factors (Sejal)

Cultural factors such as language and communication norms vary among different groups of people. Our solution recognizes the importance of cultural diversity and aims to bridge communication barriers by facilitating real-time ASL translation. For the deaf community specifically, ASL is not just a language but also a vital part of their cultural identity. The product acknowledges the cultural significance of ASL by providing accurate translations and preserving the integrity of ASL gestures, fostering a sense of cultural pride and belonging among ASL users. By displaying written English subtitles for non-ASL users, the product promotes cultural understanding and facilitates meaningful interactions between individuals with different communication preferences, aiming to build inclusive communities. Additionally, the portable design of the product ensures that users can carry it with them wherever they go, accommodating the diverse needs and preferences of users from different cultural backgrounds.

Part C: Environmental Factors (Ran)

Our product does not harm the environment and consumes a minimal amount of energy. Our hardware consists of an OLED screen, a Li-Ion battery, an Arduino board, and a 3D-printed phone attachment. The OLED screen lasts longer than traditional display technologies for the same amount of energy, and it uses organic compounds that are less harmful in manufacturing. The rechargeable Li-Ion battery also reduces electronic waste by nearly eliminating the need for replacement, so the battery and screen support our product's sustainability through their extended use period and long lifespan. In addition, we enhance these environmental advantages by choosing a phone attachment 3D-printed from polylactic acid (PLA) filament. This material is derived from renewable resources, such as cornstarch or sugarcane, reducing reliance on finite fossil fuels. Moreover, the lower extrusion temperatures required for PLA during printing decrease energy consumption, making it a more energy-efficient option. Most importantly, PLA is biodegradable and emits fewer greenhouse gases and volatile organic compounds (VOCs).

Meanwhile, our product is small and highly portable; its operation requires no external energy input beyond the phone battery and the Li-Ion battery, and it produces no by-products during use.

Ran’s Status Report for 3/9/24

  • What did you personally accomplish this week on the project? 

I was mainly responsible for the iOS application programming and the cloud environment setup this week. I improved the UI of the mobile app and added focus and saving features. When the stop recording button is pressed, the video is automatically saved in .mov format (after obtaining the user's permission to access the local photo album). This feature could be easily integrated with the cloud deployment, where the movie file would instead be stored in the S3 database. However, I have not yet implemented real-time transmission to the cloud database, which could be a difficulty for future work. I also finished the design requirements, test & validation, and other subsections of the design review report.

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

My progress is mostly on schedule. I plan to devote more time to CV-ML integration, as there seem to be accuracy issues in dynamic signing prediction. I will also spend more time researching cloud transmission technologies and working on the phone application code.

  • What deliverables do you hope to complete in the next week?

Cloud deployment with real-time video transmission

CV and ML integration