Team Status Report for 4/27/24

Main Accomplishments for This Week

  • 3D printing of phone attachment completed
  • Fully functional web application (laptop-based)

  • Progress on displaying text via Bluetooth

  • Final presentation

Risks & Risk Management

The biggest risk now is that our main video input function, getUserMedia, is restricted to secure contexts (localhost and HTTPS). However, to let phones open the webpage, we are currently deploying the web application on AWS over HTTP, so the webcam does not work when we open the web app on a phone. We are looking for ways to resolve this issue, but if we cannot fix it, we will revert to a laptop-based implementation.

Design Changes

We decided to switch from a mobile app to a web app due to the lack of MediaPipe support for iOS. To receive an accurate prediction, it is necessary to incorporate pose landmarks, for which there is limited support on iOS. By using a web app, we can incorporate the necessary landmarks in a way we are already familiar with.

Schedule Changes

No schedule changes

Additional question for this week

We conducted unit tests on the CV module, and since our ML function is dependent on CV implementation, our tests on ML accuracy and latency were conducted after CV-ML integration.

For the CV recognition requirements, we tested gesture recognition accuracy under two conditions: signing distance and background distraction (a minimal sketch of the detection-rate check follows the list below).

  • For signing distance: we manually signed at different distances and verified whether hand and pose landmarks were accurately displayed on the signer. We collected 5 samples at each interval: 1 ft, 1.5 ft, 2 ft, 2.5 ft, 3 ft, 3.5 ft, 4 ft. Result: proper landmarks appear 100% of the time at distances between 1 and 3.9 ft.
  • For background distractions: we manually created different background settings and verified whether hand and pose landmarks were accurately displayed on the signer. We collected 20 testing samples in total, 2 samples at each trial, with 1-10 distractors in the background. Result: landmarks are drawn on the target subject 95% of the time (19 out of 20 samples).
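
Below is a minimal sketch of how this detection-rate check could be scripted with MediaPipe's Python Holistic solution; the folder layout and file paths are hypothetical placeholders, not our actual test setup (in practice we verified the landmarks visually on the live feed).

```python
# Sketch: count how often MediaPipe returns both pose and hand landmarks
# across saved test frames (paths and folder layout are hypothetical).
import glob
import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic

def detection_rate(image_paths):
    """Return the fraction of frames with both pose and at least one hand detected."""
    detected = 0
    with mp_holistic.Holistic(static_image_mode=True) as holistic:
        for path in image_paths:
            frame = cv2.imread(path)
            if frame is None:
                continue
            results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            has_pose = results.pose_landmarks is not None
            has_hand = (results.left_hand_landmarks is not None
                        or results.right_hand_landmarks is not None)
            if has_pose and has_hand:
                detected += 1
    return detected / max(len(image_paths), 1)

# e.g. the 5 samples captured at the 2.5 ft interval
print(detection_rate(glob.glob("samples/distance_2.5ft/*.jpg")))
```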

 

For translation latency requirements, we measured time delay on the web application. 

  • For translation latency: we timed the seconds elapsed between the appearance of landmarks and the display of English text. We collected 90 samples: each team member signed each of the 10 phrases 3 times. Result: word predictions appeared an average of 1100 ms after a gesture, and translations appeared an average of 2900 ms after a sentence (see the timing sketch below).
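
The latency figures above were timed manually; the sketch below shows one way the measurement could be automated with a small timing helper. The `LatencyLog` class and the sleep-based stand-in for the predict-and-display step are illustrative only, not part of our actual web application.

```python
# Sketch: automate latency measurement by wrapping the predict-and-display
# step in a timer and averaging over the collected samples.
import time
import statistics

class LatencyLog:
    def __init__(self):
        self.samples_ms = []

    def timed(self, fn, *args, **kwargs):
        """Run fn (e.g. a predict-and-display step) and record its latency in ms."""
        t0 = time.perf_counter()
        result = fn(*args, **kwargs)
        self.samples_ms.append((time.perf_counter() - t0) * 1000)
        return result

    def average_ms(self):
        return statistics.mean(self.samples_ms)

# Toy usage with a stand-in for the real prediction step:
log = LatencyLog()
for _ in range(90):
    log.timed(lambda: time.sleep(0.01))   # placeholder for predict + display
print(f"average latency: {log.average_ms():.0f} ms")
```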

 

For the ML accuracy requirements:

  • The validation and training accuracies from training on the dataset were plotted, with the training accuracy reaching about 96% and the validation accuracy reaching about 93%. This suggests that the model should have high accuracy, with the possibility of some slight overfitting (a minimal sketch of this comparison follows the list).
  • To test accuracy while using the model, each team member signed each phrase 3 times, along with 3 complex sentences that used these phrases in a structure similar to typical sign language. Results: 82/90 phrase translations were accurate, and 8/9 sentence translations were accurate. Note that it is somewhat difficult to determine whether a sentence is accurate because there may be multiple correct interpretations of a sentence.
  • While integrating additional phrases into the model, we also performed some informal testing. After training a model with 10 additional phrases, we signed each of these signs and found that the accuracy was significantly lower than that of the original model with only the 10 signs from the DSL-10 dataset, with only around ⅓ of translations being accurate. Limiting the addition to a single extra phrase showed the same low accuracy, even after ensuring the data was consistent across samples. As a result, we decided that our MVP would consist of only the original 10 phrases, since that model demonstrated high accuracy.
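
For reference, the sketch below shows how the training-versus-validation comparison can be read off a Keras training run; the tiny dense model and random data are placeholders and do not reflect our actual gesture classifier or dataset.

```python
# Sketch: compare training vs. validation accuracy from a Keras run to spot
# overfitting. The tiny dense model and random data are placeholders.
import numpy as np
from tensorflow import keras

num_features, num_classes = 225, 10      # e.g. a flattened landmark vector, 10 phrases
X = np.random.rand(900, num_features).astype("float32")   # placeholder data
y = np.random.randint(0, num_classes, size=(900,))

model = keras.Sequential([
    keras.Input(shape=(num_features,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(X, y, validation_split=0.2, epochs=20, batch_size=32, verbose=0)

print(f"final training accuracy:   {history.history['accuracy'][-1]:.2%}")
print(f"final validation accuracy: {history.history['val_accuracy'][-1]:.2%}")
# A persistent gap (e.g. ~96% vs. ~93%) suggests mild overfitting.
```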

Team Status Report for 4/20

Main Accomplishments for This Week

  • 3D printing of phone attachment 
    • CAD model completed
    • Printing in progress
  • Integration of ML and MediaPipe functionality into web application
  • Progress on displaying text via Bluetooth

Risks & Risk Management

  • As we are completing our integration, the biggest risk is that the prediction/translation turns out to be significantly less accurate than the previous implementation using just a laptop webcam. If this is the case, we will use the next week to improve our integration before the final demo, or revert to a laptop-based implementation.

Design Changes

  • We decided to switch from a mobile app to a web app due to the lack of MediaPipe support for iOS. To receive an accurate prediction, it is necessary to incorporate pose landmarks, for which there is limited support on iOS. By using a web app, we can incorporate the necessary landmarks in a way we are already familiar with.

Schedule Changes

  • No schedule changes

Additional question for this week

To accomplish our gesture and pose detection feature, we relied heavily on the MediaPipe module. While it is a powerful library for generating landmarks on input images, videos, or live video streams, it took us a couple of weeks to study its official Google developer site, read its documentation, and follow its tutorials to build the dependencies and experiment with the example code. In addition, we watched YouTube videos to learn the overall pipeline for implementing the CV module for gesture and pose recognition.
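
As a rough illustration of the pipeline those tutorials walk through, the sketch below operates on the results object returned by MediaPipe's Python Holistic solution and flattens the pose and hand landmarks into a single feature vector for a gesture classifier; the exact vector ordering is our own assumption, not something MediaPipe prescribes.

```python
# Sketch: flatten MediaPipe pose + hand landmarks into one feature vector.
# The 33-point pose and 21-point hand layouts come from MediaPipe; the
# ordering of the concatenation is an assumption for illustration.
import numpy as np

def to_feature_vector(results):
    """Concatenate pose, left-hand, and right-hand (x, y, z) coordinates,
    zero-filling any part MediaPipe did not detect in this frame."""
    def flatten(landmark_list, count):
        if landmark_list is None:
            return np.zeros(count * 3)
        return np.array([[lm.x, lm.y, lm.z] for lm in landmark_list.landmark]).flatten()

    pose = flatten(results.pose_landmarks, 33)
    left = flatten(results.left_hand_landmarks, 21)
    right = flatten(results.right_hand_landmarks, 21)
    return np.concatenate([pose, left, right])   # 225 values per frame
```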

To accomplish our tasks, we needed to refer to the OpenAI API documentation to integrate LLM processing into our sentence prediction functionality. Although we are no longer focusing on mobile app development, while developing the mobile app we needed to refer to iOS app development documentation and demos to gain familiarity with Swift. After switching to a web app, we needed resources on the best way to send information between the frontend and backend using Django, the framework we chose for this. We also found it necessary to use demos and existing implementations for incorporating MediaPipe with JavaScript. One learning strategy we picked up was to start from existing tutorials or demos and iterate on them to accomplish our specific task.
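
As an illustration, the sketch below shows the kind of call we make through the OpenAI Python client to turn a sequence of predicted ASL glosses into a grammatical English sentence; the model name and prompt wording are illustrative rather than our exact configuration.

```python
# Sketch: ask the OpenAI chat API to rewrite predicted ASL glosses as a
# grammatical English sentence. Model name and prompt are illustrative;
# requires the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

def glosses_to_sentence(glosses):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Rewrite the given ASL gloss sequence as one short, "
                        "grammatical English sentence. Do not add new information."},
            {"role": "user", "content": " ".join(glosses)},
        ],
    )
    return response.choices[0].message.content.strip()

print(glosses_to_sentence(["ME", "STORE", "GO", "TOMORROW"]))
# e.g. "I am going to the store tomorrow."
```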

Team Status Report for 4/6

Main Accomplishments for This Week

  • Porting our current ML implementation to CoreML to be iOS-app compatible
  • Adapting our app CV input to work with the ML program
  • Successful integration of the LLM model to grammatically fix direct ASL translations into proper sentences
  • Hardware approaching completion, with app-to-screen text functionality

Risks & Risk Management

  • As we’re in the final week of our project, possible risks are issues with the overall integration. The CV and ML are being worked on in one app and the Arduino and Bluetooth screen in another, so eventually they will have to be merged.
    • The risk mitigation for this is careful and honest communication across all team members. We don’t anticipate this to fail severely, but in the off chance that we cannot get them to combine, we will discuss which app to prioritize.
  • Another related concern that carries from last week is the difficulty incorporating a pose detection model as MediaPipe lacks it for iOS. This may lead to reduced accuracy, and our fallback if it continues to be unavailable is to focus entirely on hand landmarks.

Design Changes

  • No design changes

Schedule Changes

  • We added an extra week for NLP training and for the final system integration. 

Additional Week-Specific Question

  • Our validation tests for the overall project involve:
    • Distance tests to measure how close or far the user needs to be from the app camera
      • Our use case requirement was that the person must be between 1-3.9ft from the iPhone front camera, so we will test distances inside and outside this range to determine whether it meets this requirement.
    • Accuracy of ASL translations displayed on the OLED screen
      •  Our use case requirement was that the accuracy for gesture detection and recognition should be >= 95%, so we have to ensure the accuracy meets this requirement
    • Latency of text appearing on screen after gestures signed
      • We also have to ensure our latency meets the requirement of <= 1-3 seconds, consisting of the ML processing, LLM processing, and displaying on the OLED screen.
    • Accessibility and user experience surveys
      • We will have ASL users test the device and collect feedback through surveys in order to reach our user satisfaction requirement of > 90%

Team Status Report for 3/30/24

Main Accomplishments for This Week

  • Mobile App
    • MediaPipe hand landmark detection
    • CoreML initiated

  • ML learning
    • Translation algorithms optimization
    • LLM initialization 

  • Hardware 
    • All wires soldered
    • Bluetooth connected
    • OLED screen display enabled

Risks & Risk Management

  • Since MediaPipe does not provide a pose detection model for iOS, not including pose landmarks might reduce accuracy. We will seek possible solutions, but if no reliable complementary pose detection model is available, we will adjust our ML model to predict based on hand landmarks alone.

Design Changes

  • We decided to use an LLM instead of traditional NLP to structure sentences. Since OpenAI provides a convenient interface, the LLM does not require complicated training and produces more accurate results.

Schedule Changes

  • We may need an extra week to reach the final sentence translation milestone and the mobile app launch milestone. With staged progress, our overall system integration and final testing, verification and validation are expected to be finished in the last week. Accordingly, we have updated our schedule.

Team Status Report for 3/23/24

Main Accomplishments for This Week

  • Mobile App progress
    • Testing for Bluetooth capabilities
    • Discarding cloud deployment and switching to local iOS processing
    • Transforming the original CV module code to Swift

  • Hardware progress
    • Purchase of 3.7V 150mAh Adafruit battery
    • Working on connecting OLED screen to Arduino
  • Machine learning model progress
    • Continuing to compile/create data
    • Training for an additional 32 sign language gestures

Risks & Risk Management

  • No additional risks right now. With the interim demo approaching, we hope to have an integration of all our parts with definitive results.

Design Changes

  • We opted for a different approach by shifting from cloud deployment to local processing on the phone
    • Instead of relying on the app to interact with a database for data exchange, we are leveraging integration between the ML and CV Python scripts directly into the app package for streamlined retrieval. This is because the current video transmission process to the database raised concerns about reliability. This change also simplifies the app architecture and reduces reliance on external resources, hopefully leading to improved performance and flexibility.

Schedule Changes

  • No formal schedule changes, but we will increase the dedicated time allocated for collaboration next week to ensure adherence to the project timeline.

Team Status Report for 3/16/24

Main Accomplishments for This Week

  • The issues we encountered last week carried on to this week as well, but we have made progress in resolving them and continue to work on their solutions:
    • The dynamic machine learning models were not performing as expected: regardless of the gestures made, the same word(s) were being predicted. We narrowed the problem down to integration and received feedback to focus on how we are extracting our coordinates.
      • We were advised to identify a center of mass or main focal point, such as the wrist, to subtract from, rather than use raw x/y/z coordinates for landmarking (a minimal sketch of this normalization follows this list).
      • Hence, we updated our dynamic processing model code and have since been getting improved predictions.
    • The video-to-database transmission is currently in question. We want real-time streaming from the phone camera to the cloud environment so that gestures can be processed and interpreted immediately as they happen.
      • We received many articles on this concept to study further. Given the diversity of solutions to this problem, it is a little difficult to identify which is best suited to our situation.
      • Hence, we are considering having the iOS app and Xcode environment directly handle the machine learning and computer vision rather than outsourcing these operations to cloud database storage. We researched whether Python scripts and related code that use OpenCV, MediaPipe, TensorFlow, and Keras can be packaged with the app in Xcode and retrieved from within that package. So far this looks promising, but for safety, we will maintain our database.
  • Progress on Amplify setup
  • Arduino and Mobile App Bluetooth connection
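
A minimal sketch of the wrist-relative normalization described above, assuming MediaPipe's 21-point hand layout in which landmark 0 is the wrist:

```python
# Sketch: normalize hand landmarks relative to the wrist (landmark 0 in
# MediaPipe's 21-point hand layout) instead of feeding raw x/y/z values.
import numpy as np

def wrist_relative(hand_landmarks):
    """hand_landmarks: (21, 3) array of raw MediaPipe x, y, z coordinates."""
    coords = np.asarray(hand_landmarks, dtype=np.float32)
    return (coords - coords[0]).flatten()   # subtract the wrist as the focal point

# toy usage: 21 random landmarks -> 63-value feature vector
print(wrist_relative(np.random.rand(21, 3)).shape)   # (63,)
```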

Risks & Risk Management

  • With the interim demo approaching, we hope to have definitive outcomes in all our parts.
    • We are working on further accuracy and expansion of training data for our machine learning models. Our basic risk mitigation tactic for this in case of setbacks is to remain with static model implementation.
    • Regarding hardware, there is a safety concern with operating the LiPo battery, but that has been minimized by extremely careful and proper handling in addition to budget available in case of part replacement needed.
  • As mentioned in the Main Accomplishments section, there is a challenge with our plan to integrate the ML and CV with the mobile app. At first we considered the database, but because of streaming issues we shifted to giving the mobile app local script access to the ML and CV. We will keep working steadily toward this, but we will keep the database as a backup and may even delegate operations to a web app that can later be converted into a mobile app if the database-to-app transmission continues to be a risk.

Design Changes

  • No design changes

Schedule Changes

  • We are currently approaching our milestone to launch the mobile app. We will be working together to integrate it, but if we do not achieve everything we want with regard to the iOS app by then, we will change the date.

Team Status Report for 3/9/24

Main Accomplishments for The Past Two Weeks

* Major work was accomplished in the week of 3/2, as spring break was planned as a slack week.

  • Design report

  • ML model training for dynamic signs

  • Mobile app’s video recording & saving capabilities

  • Cloud environment setup in AWS Amplify
  • Hardware components setup and connection

Risks & Risk Management

While the expected basic functions are being implemented and carried out, we have encountered issues with low quantitative performance, including prediction accuracy and overall latency.

  • The static ML model predicts single letters from real-time signing relatively accurately, but the two dynamic ML models we have incorporated and trained so far produced errors in live signing tests. We are re-examining the way we extract landmarks from the CV module in order to debug this. In case of severe difficulties with dynamic sign prediction, we would pivot to a fallback mechanism that recognizes only static signs, leveraging existing models specifically trained for static sign recognition.
  • Another issue lies in the way we store and transmit data. At the present stage, the mobile app captures and stores the camera input in the photo album as a .mov file, which will likely be stored in the same format in the S3 database after we fully set up the AWS Amplify cloud environment. However, we aim for real-time, streaming-like transmission, which means the current solution of saving the file after pressing stop recording does not satisfy our requirements. We will conduct further research on reliable live transmission methods to solve this issue. The backup plan of a laptop-based web app that uses the webcam would be used if no feasible mitigation is available.

Design Changes

  • No design changes

Schedule Changes

  • No schedule changes

Additional Week-Specific Questions

Part A was written by Leia, Part B was written by Sejal, and Part C was written by Ran.

Part A: Global Factors (Leia)

Our project addresses a variety of global factors through its practicality. For those who are not technologically savvy, the procedure to install and use our translator is neither complex nor difficult. Simply by downloading the app and attaching the display module to the phone, the translator is immediately functional. It is not restricted to one environment either, but is intended to be used everywhere and for many situations, from single-person conversations to group interactions. Additionally, while its purpose is to serve the hard-of-hearing community, it can technically be used by anyone, including the speaking community. Admittedly, because we focus on American Sign Language rather than international variants and do not include other nations' sign languages, our product is not global in this respect. However, its ease of use makes it versatile for anyone who uses American Sign Language to communicate with anyone who speaks English.

Part B: Cultural Factors (Sejal)

Cultural factors such as language and communication norms vary among different groups of people. Our solution recognizes the importance of cultural diversity and aims to bridge communication barriers by facilitating real time ASL translation. For the deaf community specifically, ASL is not just a language but also a vital part of their cultural identity. The product acknowledges the cultural significance of ASL by providing accurate translations and preserving the integrity of ASL gestures, fostering a sense of cultural pride and belonging among ASL users. By displaying written English subtitles for non-ASL users, the product promotes cultural understanding and facilitates meaningful interactions between individuals with different communication preferences, aiming to build inclusive communities. Additionally, the portable design of the product ensures that users can carry it with them wherever they go, accommodating the diverse needs and preferences of users from different cultural backgrounds.

Part C: Environmental Factors (Ran)

Our product does not pose harm to the environment and consumes a minimal amount of energy. Our hardware is composed of an OLED screen, a Li-Ion battery, an Arduino board, and a 3D-printed phone attachment. The OLED screen lasts longer than traditional display technologies for the same amount of energy, and it uses organic compounds that are less harmful in manufacturing processes. The rechargeable Li-Ion battery also reduces overall electronic waste by almost eliminating the need for replacement. So, the battery and screen support our product's sustainability through their extended use period and long lifespan. In addition, we enhance the environmental advantages with the choice of a phone attachment 3D-printed from polylactic acid (PLA) filament. This material is derived from renewable resources, such as cornstarch or sugarcane, contributing to a reduction in reliance on finite fossil fuels. Moreover, the lower extrusion temperatures required for PLA during printing result in decreased energy consumption, making it a more energy-efficient option. Most importantly, PLA is biodegradable and emits fewer greenhouse gases and volatile organic compounds (VOCs).

Meanwhile, our product is small and highly portable; its operation requires no external energy input beyond the phone battery and the Li-Ion battery, and it produces no by-products during use.

Team Status Report for 2/24

Main Accomplishments for This Week

  • Design Review presentation

  • Swift language and Xcode environment setup
    • Initialization of mobile app with camera capabilities

  • Ordered and picked up the purchased inventory items (battery, OLED screen, and E-Ink screen)
  • Beginning of ML model training for dynamic signs 

Risks & Risk Management

  • Currently no significant risks for the whole team, but some issues encountered by teammates are as follows:
    • One teammate raised the issue that the Xcode-simulated iPhone does not have a camera implementation, so we are doing further research and testing to be able to use the iPhone camera in our app and integrate it with the rest of our code.
    • Another issue encountered was the significant amount of data that will be needed to produce an accurate ML model, as well as foreseen issues with integrating multiple data sets. Our team is mitigating risks relating to this by taking advantage of an iterative approach to training and testing the ML model with the CV processing.

Design Changes

  • No design changes

Schedule Changes

  • No schedule changes for now. However, if the intermediate integrations do not go well, we will probably use spring break to work on them.

Team Status Report for 2/17/2024

Main Accomplishments for This Week

  • Design presentation
  • Initial ML and CV combined integration for basic ASL alphabet testing

  • Confirmation of inventory items for purchase

Risks & Risk Management

  • Currently no significant risks for the whole team. Therefore, no mitigation needed. There are concerns for each team member in their respective roles, but nothing to the extent that they jeopardize the entire project.

Design Changes

  • Natural language processing (NLP) has been included in software development. Considering sign language does not directly translate into full, syntactic sentences, we realized we needed a machine learning algorithm for grammar correction to achieve proper translation. We intend to use open-source code after understanding NLP computation, and plan for it to be implemented in later stages. Specifically, it will be developed after the ASL ML algorithm and CV programming have been accomplished. Although this grows the software aspect a little more, team members are all on board to contribute to this part together to minimize any possible costs this may incur in the overall timeline.
  • Three reach goals have been confirmed for after MVP is completed: 1. Speech-to-text, 2. A signal for the end of a sentence by the ASL user (a flash of light, or an audio notification), and 3. Facial recognition to enhance the ASL translations. All of the above is for smoother, fluid conversation between the user and the receiver.

Schedule Changes

 

Additional – Status Report 2

Part A was written by Ran, B was written by Sejal and C was written by Leia.

Part A: Our project by nature enhances public health and welfare by ensuring effective communications for sign language users. In the context of health, both obtaining and expressing accurate information about material requirements, medical procedures, and preventive measures are vital. Our project facilitates these communications, contributing to the physiological well-being of users. More importantly, we aim to elevate the psychological happiness of sign language users by providing them with a sense of inclusivity and fairness in daily social interactions. In terms of welfare, our project enables efficient access to basic needs such as education, employment, community services and healthcare via its high portability and diverse use-case scenarios. Moreover, we make every effort to secure the functionality of mechanical and electronic components: the plastic backbone of our phone attachment will be 3-D printed with round corners, and the supporting battery will operate at a human-safe low voltage.

Part B: Our project prioritizes cultural sensitivity, inclusivity, and accessibility to meet the diverse needs of sign language users in various social settings. Through image processing, the system ensures clarity and accuracy in gesture recognition, accommodating different environments. The product will promote mutual understanding and respect among users from different cultural backgrounds, uniting them through effective communication. Additionally, recognizing the importance of ethical considerations in technology development, the product will prioritize privacy and data security, such as by implementing data protection measures to ensure transparent data practices throughout the user journey. By promoting trust and transparency, the product will foster positive social relationships and user confidence in the technology. Ultimately, the product aims to bridge communication barriers and promote social inclusion by facilitating seamless interaction through sign language translation, meeting the needs of diverse social groups and promoting inclusive communication in social settings.

Part C: Our product is meant to be manufactured and distributed at very low costs. The complete package is a free mobile application and a phone attachment, which will be 3D printed and require no screws, glue, or even assembly. The attachment is simply put on or taken off the phone at the user’s discretion, even if the phone has a case. The product’s most costly component is the Arduino, which is about $30, and we expect the total hardware side will amount to less than $100. Not only are production costs minimal, but given the product’s purpose is for equity and diversity, the product will not be exclusively distributed. Purchasing it is considered like buying any effective and helpful item for daily living. If it becomes a part of the market, it should not necessarily impact the current goods and services related to the deaf or hard-of-hearing communities. However, our product and software are optimized for Apple ecosystems. Our team members all use Apple products and hence, our project has the potential for cross-platform solutions but will not be tested for it. Currently, this may come as a cost for some users who do not use Apple operating systems. Still, since Apple products are popular and common, we feel our product is still overall economically reasonable.

Team Status Report for 2/10/2024

Main Accomplishments for This Week

  • Proposal presentation

Proposal presentation first slide and solution slide

  • ML library research
  • Inventory item analysis
  • Codespace setup
  • OpenCV & MediaPipe initialization

Hand detection with video feed
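
The hand-detection demo pictured above corresponds roughly to a loop like the following minimal sketch, using OpenCV and MediaPipe's Hands solution with the default webcam; it is not our exact script.

```python
# Sketch: minimal OpenCV + MediaPipe Hands loop for live hand detection
# with landmark drawing (illustrative, not our exact initialization code).
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)                       # default webcam
with mp_hands.Hands(max_num_hands=2, min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                mp_drawing.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)
        cv2.imshow("Hand detection", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):   # press q to quit
            break
cap.release()
cv2.destroyAllWindows()
```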

Risks & Risk Management

  • Risks: Although no significant risks have been identified at this point, we received feedback from faculty regarding their concern about our dataset collection. The collection process might turn out to be much more troublesome than we anticipated if our dataset source depends largely on our own captures.
  • Management: Thanks to valuable suggestions from the instructors, we decided to explore Kaggle and other sources. Sejal has started researching existing datasets, including the ASL Alphabet dataset and Sign Language MNIST (grayscale) on Kaggle. This change will not affect our timeline (we assigned one week starting from 2/14 for data collection), but Ran will be added to assist Sejal in the process.

Design Changes

  • We added a text-to-speech (and vice versa) feature as a reach task (post-MVP feature). Inspired by student questions from the presentation Q&A, we believe this improvement will significantly add to the overall user experience in real-world scenarios.

Schedule Changes

  • No changes have occurred.