Sejal’s Status Report for 3/16/24

This week, I worked on fixing the issue with predicting dynamic signs using the trained model. Previously, it would not produce an accurate prediction and instead predicted the same gesture regardless of the sign. I spent time debugging by stepping through each stage of the pipeline. When I predicted a gesture from a pre-recorded video instead of the webcam, the model was ~99% accurate, which told me the issue was related to differences in frame rate when using the webcam. After fixing this, I tested the model again and found that it was successfully predicting gestures, but only about 70% of the time. Using the script I made for predicting gestures from videos, I found that accuracy dropped when I inserted my own videos, meaning the model needs further training to recognize diverse signing conditions and environments. I then recorded some of my own videos for each phrase, added them to the dataset, and continued training the model.
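
To make the video-file prediction setup above concrete, here is a minimal sketch of that kind of script. It assumes a Keras model trained on fixed-length sequences of flattened MediaPipe Holistic landmarks; the sequence length, phrase list, model filename, and the extract_landmarks helper are illustrative assumptions, not our exact code.

```python
# Minimal sketch of a video-file prediction script (not our exact code). Assumes a
# Keras model trained on fixed-length sequences of flattened MediaPipe Holistic
# landmarks; SEQUENCE_LENGTH, PHRASES, and the model filename are placeholders.
import cv2
import numpy as np
import mediapipe as mp
from tensorflow.keras.models import load_model

SEQUENCE_LENGTH = 30                          # frames per gesture (assumed)
PHRASES = ["hello", "thank you", "please"]    # placeholder label order

def extract_landmarks(results):
    """Flatten pose and hand landmarks, zero-filling any part MediaPipe did not detect."""
    pose = (np.array([[lm.x, lm.y, lm.z] for lm in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 3))
    lh = (np.array([[lm.x, lm.y, lm.z] for lm in results.left_hand_landmarks.landmark]).flatten()
          if results.left_hand_landmarks else np.zeros(21 * 3))
    rh = (np.array([[lm.x, lm.y, lm.z] for lm in results.right_hand_landmarks.landmark]).flatten()
          if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, lh, rh])

def predict_from_video(video_path, model):
    cap = cv2.VideoCapture(video_path)
    frames = []
    with mp.solutions.holistic.Holistic(min_detection_confidence=0.5) as holistic:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            frames.append(extract_landmarks(results))
    cap.release()
    if not frames:
        raise ValueError(f"No readable frames in {video_path}")
    # Keep exactly SEQUENCE_LENGTH evenly spaced frames so the input shape matches training.
    idx = np.linspace(0, len(frames) - 1, SEQUENCE_LENGTH).astype(int)
    sequence = np.array(frames)[idx]
    probs = model.predict(sequence[np.newaxis, ...])[0]
    return PHRASES[int(np.argmax(probs))]

model = load_model("dynamic_signs.h5")        # hypothetical saved model
print(predict_from_video("sample_phrase.mov", model))
```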

My progress is slightly behind schedule: milestone 3, word translation, was supposed to be completed by this week, but I am still working on improving accuracy for word translation.

Next week, I hope to continue adding to the dataset and improving the accuracy of sign detection. I will do this by continuing to create my own videos and trying to integrate online datasets. The challenge is that the videos need to have a consistent number of frames, so I may need to do additional preprocessing when adding data (see the sketch below). Additionally, as we approach the interim demo, I will also be working with Ran and Leia to integrate our machine learning model into our Swift application.
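
One possible form that preprocessing could take, assuming the training pipeline expects a fixed number of frames per clip, is to resample each incoming dataset video to that frame count before landmark extraction. The target frame count, codec, output frame rate, and file paths here are assumptions for illustration.

```python
# One possible preprocessing pass for adding online dataset videos of varying length:
# resample every clip to a fixed frame count before landmark extraction. TARGET_FRAMES,
# the codec, and the output frame rate are assumptions, not settled project values.
import cv2
import numpy as np

TARGET_FRAMES = 30    # assumed sequence length expected by the model

def resample_video(in_path, out_path, target_frames=TARGET_FRAMES):
    cap = cv2.VideoCapture(in_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    if not frames:
        raise ValueError(f"No readable frames in {in_path}")

    # Evenly spaced indices: downsamples long clips and repeats frames for short ones.
    idx = np.linspace(0, len(frames) - 1, target_frames).astype(int)
    h, w = frames[0].shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), 30.0, (w, h))
    for i in idx:
        writer.write(frames[i])
    writer.release()

resample_video("raw/thank_you_01.mov", "processed/thank_you_01.mp4")  # example paths
```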

Ran’s Status Report for 3/16/24

  • What did you personally accomplish this week on the project? 

After our meeting with Professor Savvides and Neha on Monday, I explored the mobile app stream transmission resources they shared and experimented with several methods, including ffmpeg, Apple’s HTTP Live Streaming (HLS), and some SDKs/open-source libraries. In the end, I found that SwiftVideo might be a suitable package for transmitting video from the phone (local) to the cloud server. Meanwhile, my teammates suggested moving the CV and ML modules entirely onto the iPhone processor by implementing the MediaPipe features in Objective-C and using CoreML for prediction. At this stage, I cannot tell which method will produce the better outcome, so I will pursue the two tasks in parallel and decide which to adopt by the middle of next week. Moreover, I helped test and debug the ML module for dynamic signing.

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

My progress is mostly on schedule. I plan to devote more time to writing code for 1) the cloud transmission API and 2) porting the CV module from Python to Objective-C.

  • What deliverables do you hope to complete in the next week?

Mobile app development

Team Status Report for 3/16/24

Main Accomplishments for This Week

  • The issues we encountered last week carried on to this week as well, but we have made progress in resolving them and continue to work on their solutions:
    • The dynamic machine learning models were not performing as expected: regardless of the gesture made, the same word(s) were being predicted. We narrowed the problem down to integration and received feedback to focus on how we are extracting our coordinates.
      • We were advised to pick a center of mass or main focal point, such as the wrist, and subtract it from each landmark rather than using raw xyz coordinates (a sketch of this normalization appears after this list).
      • Accordingly, we updated our dynamic processing code and have been getting improved predictions.
    • The transmission path from video to database is still unresolved. We want real-time streaming from the phone camera to the cloud environment so that gestures can be processed and interpreted as they happen.
      • We received many articles on this topic to study further. Given the variety of possible solutions, it is difficult to identify which is best suited to our situation.
      • Hence, we are considering having the iOS app and Xcode environment handle the machine learning and computer vision directly rather than offloading these operations to cloud storage. We researched whether Python scripts that use OpenCV, MediaPipe, TensorFlow, and Keras can be packaged with the app in Xcode and invoked from within that package. So far this looks promising, but for safety we will maintain our database.
  • Progress on Amplify setup
  • Arduino and Mobile App Bluetooth connection
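
As referenced above, here is an illustrative sketch of the wrist-relative normalization idea: subtract a focal point from every landmark instead of feeding raw x/y/z coordinates to the model. The indexing follows MediaPipe Hands, where landmark 0 is the wrist; the function name and optional scaling step are assumptions, not our exact code.

```python
# Illustrative sketch (not our exact code) of the wrist-relative normalization advice:
# subtract a focal point from every landmark instead of using raw x/y/z coordinates.
# Indexing follows MediaPipe Hands, where landmark 0 is the wrist; the scaling step and
# function name are assumptions.
import numpy as np

def normalize_hand_landmarks(hand_landmarks):
    """hand_landmarks: MediaPipe list of 21 landmarks, each with .x, .y, .z attributes."""
    coords = np.array([[lm.x, lm.y, lm.z] for lm in hand_landmarks])  # shape (21, 3)
    wrist = coords[0]                 # landmark 0 is the wrist
    relative = coords - wrist         # translation-invariant: screen position no longer matters
    scale = np.linalg.norm(relative, axis=1).max()
    if scale > 0:
        relative /= scale             # optional: also normalize overall hand size
    return relative.flatten()         # 63-value feature vector for the model
```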

Risks & Risk Management

  • With the interim demo approaching, we hope to have definitive outcomes in all our parts.
    • We are working on improving accuracy and expanding the training data for our machine learning models. Our basic risk mitigation in case of setbacks is to fall back to the static model implementation.
    • Regarding hardware, there is a safety concern with operating the LiPo battery, but it has been minimized by extremely careful and proper handling, in addition to budget set aside in case a part needs to be replaced.
  • As mentioned in the Main Accomplishments section, there is a challenge with our plans to integrate ML and CV with the mobile app. At first we planned around the database, but because of the streaming issues we shifted to giving the mobile app local access to the ML and CV scripts. We will work steadily toward this, but we will keep the database as a backup, and could even delegate operations to a web app to be converted into a mobile app if database-to-app transmission continues to be a risk.

Design Changes

  • No design changes

Schedule Changes

  • We are approaching our milestone to launch the mobile app. We will work together to integrate it, but if we do not accomplish everything we want with the iOS app by then, we will adjust the date.

Leia’s Status Report for 3/16/2024

Progress

I was able to implement the Bluetooth connection between a mobile app and the Arduino unit. The app can currently switch the Arduino’s LED on and off and display temperature data from the Arduino, all over Bluetooth.

I tried to connect the battery to the Adafruit battery backpack, but some smoke came out. It turns out the battery’s wire connector is not compatible with Adafruit’s receiving port because the polarities are reversed: Adafruit puts the positive contact on the right side and the negative on the left, while the battery is the opposite. Although the Adafruit circuit appears fine, there is a possibility that the chip blew. I considered pulling the wires out of the battery’s connector to swap the contacts, but I was informed that I could accidentally short-circuit the battery if the exposed contacts touch each other.

Next Steps

I will be trying to connect the OLED screen to the Arduino. After that, I will try to display the temperature on it to ensure the connection across the app, Arduino, and screen is seamless. The next step is to display arbitrary text on the OLED, first directly from the Arduino with its uploaded sketch, and then from the app with the Arduino as the medium. I want a user to be able to type in a textbox on the app and immediately have that text shown on the screen in real time.

I will also need to purchase a new Adafruit backpack and their LiPo battery, because none of the third-party batteries I could find on Amazon are oriented in Adafruit’s polarity.

Leia’s Status Report for 3/9/2024

Progress

I tried to connect all the components together but encountered significant difficulties. When connecting the Arduino to the LiPo battery, the pins of the Arduino unit unfortunately do not fit into the breadboards I purchased. I tried direct coupling with jumper wires but found that this method carries risks: connecting an external power supply to an Arduino without proper voltage regulation may damage the unit, even when the battery’s output falls within the voltage range the Arduino accepts. I also do not have a micro USB cable to test the Arduino itself from my computer.

I wrote the Arduino IDE sketch to connect the Arduino unit to a mobile app over Bluetooth and to communicate with the OLED screen. The mobile app is a simple testing environment used solely to exercise the BLE capabilities and to try transmitting text between the two devices; it is not reflective of the actual mobile app we will be deploying. I have also downloaded all the necessary CAD models and begun creating a basic casing.

Next Steps

I found that I had not acquired the appropriate tools and materials for this project. I will be purchasing an Adafruit Battery Backpack, a voltage-regulating shield that protects the Arduino unit. I will also be getting an Arduino Nano 33 BLE Sense without headers; this model has additional sensors that will aid with testing, on top of the capabilities of the Nano 33 BLE. Without headers, I will solder the components together, which is preferable since the headers probably would not fit my breadboard either. Finally, I will purchase a micro USB cable so I can test without needing a battery source hooked to the Arduino. Everything else is still fine to use, especially the jumper wires, battery, and screens. My highest priorities are getting the battery and Arduino connected and getting the Bluetooth capabilities working.

Team Status Report for 3/9/24

Main Accomplishments for The Past Two Weeks

* Major work was accomplished in the week of 3/2, as spring break was planned as a slack week.

  • Design report

  • ML model training for dynamic signs

  • Mobile app’s Video recording & saving capabilities

  • Cloud environment setup in AWS Amplify
  • Hardware components setup and connection

Risks & Risk Management

While the expected basic functions are being implemented, we have encountered issues with quantitative performance, including prediction accuracy and overall latency.

  • The static ML model predicts single letters from real-time signing relatively accurately, but the two dynamic ML models we have incorporated and trained so far produced errors in live signing tests. We are re-examining the way we extract landmarks from the CV module in order to debug this. In case of severe difficulties with dynamic sign prediction, we would pivot to a fallback mechanism that recognizes only static signs, leveraging existing models specifically trained for static sign recognition.
  • Another issue lies in the way we store and transmit data. At present, the mobile app captures the camera input and stores it in the photo album as a .mov file, which is likely to be stored in the same format in the S3 bucket after we fully set up the AWS Amplify cloud environment. However, we aim for real-time, streaming-like transmission, which means the current solution (saving the file only after recording stops) does not satisfy our requirements. We will conduct further research on reliable live transmission methods to solve this issue. The backup plan of a laptop-based web app that uses the webcam would be adopted if no feasible mitigation is available.

Design Changes

  • No design changes

Schedule Changes

  • No schedule changes

Additional Week-Specific Questions

Part A was written by Leia, Part B was written by Sejal, and Part C was written by Ran.

Part A: Global Factors (Leia)

Our project addresses a variety of global factors through its practicality. For those who are not technologically savvy, the procedure to install and use our translator is neither complex nor difficult: simply download the app and attach the display module to the phone, and the translator is immediately functional. It is not restricted to one environment either, but intended to be used everywhere and in many situations, from single-person conversations to group interactions. Additionally, it is designed for the hard-of-hearing community but can be used by anyone, including the speaking community. Admittedly, because we focus on American Sign Language rather than the international variation and do not include other nations’ versions, our product is not global in that respect. However, its ease of use makes it versatile for anyone who uses American Sign Language to communicate with anyone who speaks English.

Part B: Cultural Factors (Sejal)

Cultural factors such as language and communication norms vary among different groups of people. Our solution recognizes the importance of cultural diversity and aims to bridge communication barriers by facilitating real-time ASL translation. For the deaf community specifically, ASL is not just a language but also a vital part of their cultural identity. The product acknowledges the cultural significance of ASL by providing accurate translations and preserving the integrity of ASL gestures, fostering a sense of cultural pride and belonging among ASL users. By displaying written English subtitles for non-ASL users, the product promotes cultural understanding and facilitates meaningful interactions between individuals with different communication preferences, aiming to build inclusive communities. Additionally, the portable design of the product ensures that users can carry it with them wherever they go, accommodating the diverse needs and preferences of users from different cultural backgrounds.

Part C: Environmental Factors (Ran)

Our product does not pose harm to the environment and consumes a minimal amount of energy. Our hardware is composed of an OLED screen, a Li-Ion battery, an Arduino board, and a 3D-printed phone attachment. The OLED screen lasts longer than traditional display technologies for the same amount of energy, and it uses organic compounds that are less harmful in manufacturing. The rechargeable Li-Ion battery also reduces overall electronic waste by almost eliminating the need for replacement. The battery and screen therefore support our product’s sustainability through their extended use period and long lifespan. In addition, we enhance these environmental advantages by 3D printing the phone attachment from polylactic acid (PLA) filament. This material is derived from renewable resources such as cornstarch or sugarcane, contributing to a reduced reliance on finite fossil fuels. Moreover, the lower extrusion temperatures required for PLA during printing result in decreased energy consumption, making it a more energy-efficient option. Most importantly, PLA is biodegradable and emits fewer greenhouse gases and volatile organic compounds (VOCs).

Meanwhile, our product is small and highly portable; its operation requires no external energy input other than the phone battery and the Li-Ion battery, and it produces no by-products during use.

Ran’s Status Report for 3/9/24

  • What did you personally accomplish this week on the project? 

I was mainly responsible for the iOS application programming and cloud environment setup this week. I improved the UI of the mobile app and added focus and saving features. After the stop recording button is pressed, the video is automatically saved in .mov format (after obtaining the user’s permission to access the local photo album). This feature could be integrated with the cloud deployment fairly easily, with the movie file stored in the S3 bucket instead (see the sketch below). However, I have not implemented real-time transmission to the cloud database, which could be a difficulty for future work. I also finished the design requirements, test & validation, and other subsections of the design review report.
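
For illustration only: the app itself would upload through Amplify’s Swift libraries rather than Python, but the equivalent cloud-side operation is a single S3 upload call, sketched here with boto3. The bucket name, object key, and filename are placeholders, not our provisioned resources.

```python
# For illustration only: the app would upload through Amplify's Swift libraries rather
# than Python, but the equivalent operation on the cloud side is a single S3 upload,
# sketched here with boto3. Bucket, key, and filename are placeholders.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="capture.mov",                        # the saved .mov recording
    Bucket="asl-translator-videos",                # placeholder Amplify-provisioned bucket
    Key="raw/capture.mov",                         # object key under a raw-uploads prefix
    ExtraArgs={"ContentType": "video/quicktime"},  # so S3 serves it with the right MIME type
)
```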

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

My progress is mostly on schedule. I plan to devote more time to CV-ML integration, as there seem to be accuracy issues in dynamic signing prediction. Moreover, I will spend more time researching cloud transmission technologies and working on the phone application code.

  • What deliverables do you hope to complete in the next week?

Cloud deployment with real-time video transmission

CV and ML integration

Sejal’s Status Report for 3/09/24

This week, while my group and I completed the design report, I continued to train a model with data from the DSL-10 dataset. I extracted the entire dataset, consisting of 75 videos for each of the 10 phrases. After a few rounds of training and adjusting the hyperparameters and structure from the initial training, I ended up with a model containing 3 LSTM layers and 3 dense layers, trained for 100 epochs. This resulted in a training accuracy of around 96% and a validation accuracy of around 94%. I visualized the confusion matrix, which showed balanced predictions across the phrases, and the training and validation accuracy plots, which showed a steady increase. After this, I used Ran’s computer vision processing code with MediaPipe and my trained model to display a prediction. However, the prediction was not very accurate, as it heavily favored one phrase regardless of the gesture being signed.
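
A sketch of the architecture described above (3 LSTM layers followed by 3 dense layers) for classifying the 10 DSL-10 phrases is shown below. The layer widths, sequence length, and per-frame feature size are illustrative assumptions, not the exact trained values.

```python
# Sketch of the architecture described above (3 LSTM layers followed by 3 dense layers)
# for classifying the 10 DSL-10 phrases. Layer widths, the sequence length, and the
# per-frame feature size are illustrative, not the exact trained values.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

SEQUENCE_LENGTH = 30   # frames per sample (assumed)
NUM_FEATURES = 225     # pose (33) + two hands (21 + 21) landmarks x 3 coordinates (assumed)
NUM_PHRASES = 10       # DSL-10 contains ten phrases

model = Sequential([
    LSTM(64, return_sequences=True, activation="relu",
         input_shape=(SEQUENCE_LENGTH, NUM_FEATURES)),
    LSTM(128, return_sequences=True, activation="relu"),
    LSTM(64, return_sequences=False, activation="relu"),  # final LSTM emits a single vector
    Dense(64, activation="relu"),
    Dense(32, activation="relu"),
    Dense(NUM_PHRASES, activation="softmax"),             # one probability per phrase
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# X: (num_samples, SEQUENCE_LENGTH, NUM_FEATURES), y: one-hot phrase labels
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=100)
```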

My progress is on schedule, as I am working on the model training for word translation and currently have a model that reaches ~95% accuracy during training.

Next week, I hope to continue working toward an accurate live prediction and debug where the issue might lie. I also hope to expand the data to incorporate more phrases to be detected, and go through a few more rounds of training and optimization.

Sejal’s Status Report for 2/24/24

This week, my team and I finished creating the slides for the design review presentation, and my teammate presented it in class. Along with that, I have been working on training the ML model beyond the basic alphabet. I gathered two datasets, How2Sign and DSL-10, which both contain many videos of users signing common words and phrases. For now, I am working only with the DSL-10 dataset, which contains signs for ten common daily phrases. From this dataset, I took 12 input samples across 3 different dynamic phrases and trained a small model on this data. I did this by first loading and preprocessing the data and randomly splitting it into training and testing sets. Then, I extracted features from MediaPipe’s hand and pose objects. I created an array of these landmarks and performed any necessary padding when the number of landmarks from the pose, right hand, or left hand differed. Next, I created a simple sequential model with two LSTM layers followed by a dense layer. With 10 epochs of training, this produced a model that I will expand further.
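
A minimal sketch of the per-frame feature extraction and padding step described above: pose, left-hand, and right-hand landmarks are concatenated into one fixed-length vector, zero-filling any part MediaPipe did not detect so every frame has the same shape. The function name is hypothetical; the landmark counts are MediaPipe’s.

```python
# Minimal sketch of the per-frame feature extraction and padding described above: pose,
# left-hand, and right-hand landmarks are concatenated into one fixed-length vector,
# zero-filling any part MediaPipe did not detect so every frame has the same shape.
# The function name is hypothetical; the landmark counts are MediaPipe's.
import numpy as np

POSE_DIM, HAND_DIM = 33 * 3, 21 * 3   # MediaPipe pose has 33 landmarks, each hand has 21

def landmarks_to_vector(results):
    """results: the object returned by MediaPipe Holistic's process() for one frame."""
    parts = []
    for attr, dim in (("pose_landmarks", POSE_DIM),
                      ("left_hand_landmarks", HAND_DIM),
                      ("right_hand_landmarks", HAND_DIM)):
        detected = getattr(results, attr)
        if detected is None:
            parts.append(np.zeros(dim))  # pad: this part was not detected in the frame
        else:
            parts.append(np.array([[lm.x, lm.y, lm.z] for lm in detected.landmark]).flatten())
    return np.concatenate(parts)         # always 225 values per frame
```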

My progress is on schedule, but I am a little worried about the next steps taking longer than I expect. Since this dataset only contains 10 phrases, it will be necessary to train on additional datasets to recognize more phrases. Since datasets might come in different formats, another consideration is how to ensure compatibility in the training process.

Next week, I hope to continue adding input samples from the DSL-10 dataset until I have significant data for all 10 phrases. I would also like to test this model iteratively to see whether any additional steps are necessary, such as further feature extraction or preprocessing of frames. To do this, I will evaluate the accuracy of the trained model on my teammate’s computer vision processing by testing signs live. Additionally, my team and I will be working on the Design Review Report that is due on Friday.

Team Status Report for 2/24

Main Accomplishments for This Week

  • Design Review presentation

  • Swift language and Xcode environment setup
    • Initialization of mobile app with camera capabilities

  • Ordered and picked up the inventory items purchased (battery, OLED screen, and e-ink screen)
  • Beginning of ML model training for dynamic signs 

Risks & Risk Management

  • Currently no significant risks for the whole team, but some issues encountered by teammates are as follows:
    • One issue is that the Xcode iPhone simulator does not implement a camera, so we are doing further research and testing to be able to use the iPhone camera in our app and integrate it with the rest of our code.
    • Another issue is the significant amount of data that will be needed to produce an accurate ML model, as well as foreseen difficulties with integrating multiple datasets. Our team is mitigating these risks by taking an iterative approach to training and testing the ML model with the CV processing.

Design Changes

  • No design changes

Schedule Changes

  • No schedule changes for now. However, if the intermediate integrations do not go well, we will probably use spring break to work on this.