Team Status Report for 4/6

Main Accomplishments for This Week

  • Porting our current ML implementation to CoreML so it is compatible with the iOS app
  • Adapting our app’s CV input to work with the ML program
  • Successfully integrating the LLM that restructures direct ASL translations into grammatically correct sentences
  • Hardware nearing completion, with app-to-screen text functionality

Risks & Risk Management

  • As we enter the final week of our project, the main risk is full system integration. The CV and ML components are being developed in one app and the Arduino and Bluetooth screen in another, so the two will eventually have to be merged.
    • Our mitigation is careful, honest communication across the team. We do not anticipate the merge failing severely, but in the unlikely event that the apps cannot be combined, we will decide which app to prioritize.
  • Another concern carried over from last week is the difficulty of incorporating a pose detection model, since MediaPipe does not provide one for iOS. This may reduce accuracy; if a pose model remains unavailable, our fallback is to rely entirely on hand landmarks.

Design Changes

  • No design changes

Schedule Changes

  • We added an extra week for NLP training and for the final system integration. 

Additional Week-Specific Question

  • Our validation tests for the overall project involve:
    • Distance tests to measure how close or far the user needs to be from the app camera
      • Our use case requirement was that the person must be between 1 and 3.9 ft from the iPhone front camera, so we will test distances inside and outside this range to determine whether the requirement is met.
    • Accuracy of ASL translations displayed on the OLED screen
      • Our use case requirement was that the accuracy of gesture detection and recognition be >= 95%, so we have to ensure the accuracy meets this requirement.
    • Latency of text appearing on the screen after gestures are signed
      • We also have to ensure our latency meets the 1-3 second requirement, which covers ML processing, LLM processing, and display on the OLED screen (a rough timing sketch follows this list).
    • Accessibility and user experience surveys
      • We will have ASL users test the device and collect feedback through surveys in order to meet our user satisfaction requirement of > 90%.
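
To make the latency breakdown above concrete, here is a rough, illustrative timing harness. The predict_gesture and restructure_sentence functions are placeholder stand-ins (with simulated delays) for our actual ML and LLM steps; OLED display time would be measured separately on-device, from the Bluetooth send to the screen update.

```python
import time

def predict_gesture(frames):
    """Placeholder for the CoreML/LSTM gesture prediction step."""
    time.sleep(0.2)  # simulated inference time
    return "I'm hungry"

def restructure_sentence(gloss):
    """Placeholder for the LLM (gpt-3.5) restructuring step."""
    time.sleep(0.5)  # simulated API round trip
    return "I am hungry."

def timed(fn, *args):
    """Run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    frames = []  # stand-in for a captured gesture clip
    gloss, ml_s = timed(predict_gesture, frames)
    sentence, llm_s = timed(restructure_sentence, gloss)
    print(f"ML: {ml_s:.2f}s  LLM: {llm_s:.2f}s  total (excluding display): {ml_s + llm_s:.2f}s")
```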

Leia’s Status Report for 4/6/2024

Progress

I have gotten the OLED to display text from the Arduino unit, as well as a variety of images and animations for testing purposes. The demo app that connects to the Arduino via Bluetooth now has a text field the user can type into. Transmitting that text from the app to the Arduino for display on the OLED is currently a work in progress.

Next Steps

I will be trying to get the app-to-OLED transmission working by the end of this week so that next week I can focus on designing and 3D printing the case that will hold the hardware components and latch onto the phone. After that, I will work with my team members to integrate their CV and ML app with my demo app into a single mobile application that handles all of our project’s goals.

Once that app is settled, I will build out the frontend to make it more user-friendly and simple to operate.

Verification

I’ve run example code provided by Arduino to test the fluidity and latency of the OLED screen. I’ve also examined the speed at which the app connects to the Arduino over Bluetooth, which has been effectively immediate. Other verification tests I will be running are:

  • additional latency tests of app-to-OLED text communication
  • phone attachment assessments that ensure it is easy to latch and remove
  • speed and formatting of ASL translations on OLED

Essentially, I will be handling all of the hardware inspections that ensure the physical product meets the use case and design requirements.

Sejal’s Status Report for 3/30/24

This week, I trained and tested the broader dataset I incorporated last week with 10 additional classes. I originally recorded 13 videos, then performed data augmentation to create more data: for example, I increased and decreased the brightness slightly and rotated the videos by a few degrees. After augmentation, I had the same amount of data for each class, since class balance is important in model training. When I tested this new model, it did not predict the new signs correctly, and it even decreased the accuracy on the original signs. After trying to diagnose the issue for a while, I went back to my original model and added only one new sign, ensuring that each video was consistent in the number of features MediaPipe could extract from it. However, this one additional sign was not predicted accurately either. After this, I fine-tuned some of the model parameters, such as adding more LSTM and dense layers, to see whether model complexity was the issue.
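
Below is a minimal sketch of this kind of per-frame augmentation, assuming OpenCV (cv2) video frames; the exact brightness offsets and rotation angles shown are illustrative, not necessarily the values used for training.

```python
import cv2
import numpy as np

def adjust_brightness(frame: np.ndarray, delta: int) -> np.ndarray:
    """Brighten (delta > 0) or darken (delta < 0) a BGR frame."""
    return cv2.convertScaleAbs(frame, alpha=1.0, beta=delta)

def rotate(frame: np.ndarray, angle_deg: float) -> np.ndarray:
    """Rotate a frame by a few degrees around its center."""
    h, w = frame.shape[:2]
    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    return cv2.warpAffine(frame, matrix, (w, h))

def augment_video(path: str):
    """Yield (original, brighter, darker, rotated) variants for each frame of a video."""
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        yield (frame,
               adjust_brightness(frame, 30),
               adjust_brightness(frame, -30),
               rotate(frame, 5.0))
    cap.release()
```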

While training this, I created some support for sentence display and structuring. I signaled the end of a sentence by detecting when hands have been out of the frame for 5 seconds, which resets the displayed words on the screen. Since sign language uses a different word order than written English, I worked on the LLM step that detects and corrects this structure. To do this, I used the OpenAI API to send a request after words have been predicted. The request asks the gpt-3.5 model to rewrite the words as readable English, which is then displayed on the webcam screen. After iterating on the prompt for a while, the LLM reliably turned the predicted words into accurate sentences and displayed them to the user. In the images below, the green text is the direct translation and the white text is the output from the LLM.
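
A minimal sketch of this LLM step is shown below, assuming the OpenAI v1 Python client with an OPENAI_API_KEY set in the environment; the function name and prompt wording are illustrative rather than the exact ones in our code.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def restructure_gloss(words: list[str]) -> str:
    """Ask gpt-3.5-turbo to turn a raw ASL gloss into readable English."""
    gloss = " ".join(words)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Rewrite the following ASL gloss as a grammatical English sentence. "
                        "Reply with the sentence only."},
            {"role": "user", "content": gloss},
        ],
    )
    return response.choices[0].message.content.strip()

# e.g. restructure_gloss(["where", "eat", "I'm hungry"]) might return
# "Where can I eat? I'm hungry."
```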

My progress is mostly on schedule since I have added the LLM for sentence structuring.

Next week, I will continue trying to optimize the machine learning model to incorporate more phrases successfully. I will also work with my teammates to integrate the model into the iOS app using CoreML.

Leia’s Status Report for 3/30/2024

Progress

I have successfully soldered all the components together: battery, Arduino, and OLED screen. The entire hardware side is now self-sufficient and functional. The battery powers the Arduino, and the OLED draws its power from the Arduino. The battery can be charged with inductive wireless charging, but as a safety precaution it will continue to be charged by plugging in the Arduino via the Micro USB cable, since the Adafruit backpack allows the battery to be charged and maintained this way. The demo app for the circuit can connect to and control the Arduino via Bluetooth, and the Arduino sketches uploaded to the unit drive the OLED screen properly, with low latency and clear imagery.

Next Steps

I will now be integrating features into the demo app that allow text to be transmitted over Bluetooth to the Arduino so the OLED can display it. The app’s appearance and overall frontend will be further refined in preparation for its integration with the machine learning and computer vision code in Xcode. I will also be combining the Arduino sketches I have for Bluetooth and OLED text display into one program that handles all necessary functionality.

Further down the line, I will plan the 3D-printed case that will fit the hardware components into a neat package and eventually attach to the phone.

Ran’s Status Report for 3/30/24

  • What did you personally accomplish this week on the project? 

Following the plan to migrate the CV and ML modules to the local iOS app, I was mainly responsible for getting the Swift version of the code to work this week. I successfully got the mobile app running, showing 21 landmarks per hand as expected. However, I am still debugging the CoreML interface that will connect the ML model we trained with the real-time MediaPipe landmarks.

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

I devoted a significant amount of time to my task this week, but I am still a bit behind schedule. Since the milestone and project deadlines are approaching, I will make sure to seek help if I get seriously stuck.

  • What deliverables do you hope to complete in the next week?

CoreML integration

Testing of ML latency and accuracy

Team Status Report for 3/30/24

Main Accomplishments for This Week

  • Mobile App
    • MediaPipe hand landmark detection
    • CoreML initiated

  • Machine learning
    • Translation algorithm optimization
    • LLM initialization

  • Hardware 
    • All wires soldered
    • Bluetooth connected
    • OLED screen display enabled

Risks & Risk Management

  • Since MediaPipe does not provide a pose detection model for iOS, excluding pose landmarks might reduce accuracy. We will look for possible workarounds, but if no reliable complementary pose detection model is available, we will adjust our ML model to make predictions based on hand landmarks alone.

Design Changes

  • We decided to use an LLM instead of a traditional NLP model to structure sentences. Since OpenAI provides a convenient interface, the LLM does not need complicated training and gives more accurate results.

Schedule Changes

  • We may need an extra week to reach the final sentence translation milestone and the mobile app launch milestone. With staged progress, our overall system integration and final testing, verification and validation are expected to be finished in the last week. Accordingly, we have updated our schedule.

Team Status Report for 3/23/24

Main Accomplishments for This Week

  • Mobile App progress
    • Testing for Bluetooth capabilities
    • Discarding cloud deployment and switching to local iOS processing
    • Transforming the original CV module code to Swift

  • Hardware progress
    • Purchase of 3.7V 150mAh Adafruit battery
    • Working on connecting OLED screen to Arduino
  • Machine learning model progress
    • Continuing to compile/create data
    • Training for an additional 32 sign language gestures

Risks & Risk Management

  • No additional risks right now. With the interim demo approaching, we hope to have all of our parts integrated, with definitive results.

Design Changes

  • We opted for a different approach by shifting from cloud deployment to local processing on the phone
    • Instead of relying on the app to interact with a database for data exchange, we are integrating the ML and CV Python scripts directly into the app package for streamlined retrieval. We made this change because the video transmission process to the database raised reliability concerns. It also simplifies the app architecture and reduces reliance on external resources, which should improve performance and flexibility.

Schedule Changes

  • No formal schedule changes, but we will increase the dedicated time allocated for collaboration next week to ensure adherence to the project timeline.

Sejal’s Status Report for 3/23/24

This week, I continued working on broadening the dataset and training the model. Unfortunately, it was difficult to find other readily available dynamic ASL datasets. I tried to download the How2Sign dataset, but there was an incompatibility issue with the download script. I tried to debug this for a bit and even reached out to the creator of the script, but haven’t reached a solution yet. I tried the MS-ASL dataset from Microsoft, but its data links to YouTube videos that were all set to private. I requested permission to access the Purdue RVL-SLLL dataset, but I haven’t gotten a response yet. I also looked at ASL-LEX, but it provides only one video per sign, which is not very helpful. Since it’s difficult to find datasets, I have been continuing to create my own videos, matching the details of the DSL-10 dataset videos I have already trained on, such as the same number of frames and the same number of videos per class. I have added 32 classes covering the most common phrases used in conversation for our use case: “good”, “morning”, “afternoon”, “evening”, “bye”, “what”, “when”, “where”, “why”, “who”, “how”, “eat”, “drink”, “sleep”, “run”, “walk”, “sit”, “stand”, “book”, “pen”, “table”, “chair”, “phone”, “computer”, “happy”, “sad”, “angry”, “excited”, “confused”, “I’m hungry”, “I’m tired”, “I’m thirsty”. Because there are already a lot of videos and there will be more, I am running into storage issues on my device, and I am wondering if there is a method or separate server that would allow quicker processing of large datasets like this (one possible mitigation is sketched below).
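
One possible way to ease the storage and processing burden would be to extract MediaPipe hand landmarks from each video once and keep only the compact arrays for training. The sketch below illustrates the idea; the sequence length, padding scheme, and file names are assumptions rather than our actual pipeline settings.

```python
import cv2
import numpy as np
import mediapipe as mp

mp_hands = mp.solutions.hands

def video_to_landmarks(path: str, max_frames: int = 30) -> np.ndarray:
    """Return an array of shape (max_frames, 2 * 21 * 3): x, y, z for up to two hands per frame."""
    rows = []
    cap = cv2.VideoCapture(path)
    with mp_hands.Hands(static_image_mode=False, max_num_hands=2) as hands:
        while len(rows) < max_frames:
            ok, frame = cap.read()
            if not ok:
                break
            results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            row = np.zeros(2 * 21 * 3, dtype=np.float32)
            if results.multi_hand_landmarks:
                for h, hand in enumerate(results.multi_hand_landmarks[:2]):
                    coords = [c for lm in hand.landmark for c in (lm.x, lm.y, lm.z)]
                    row[h * 63:(h + 1) * 63] = coords
            rows.append(row)
    cap.release()
    while len(rows) < max_frames:  # pad short clips so every sample has the same length
        rows.append(np.zeros(2 * 21 * 3, dtype=np.float32))
    return np.stack(rows)

# np.save("good_001.npy", video_to_landmarks("good_001.mp4"))  # a few KB instead of an MB-scale video
```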

My progress is still slightly behind schedule because I am still working on word translation. I plan to catch up this week as we prepare for the interim demo.

Next week, I will continue to train my custom dataset to allow for more variety in translated gestures. I will also work on continuous translation, since right now I am doing word-level translation but we eventually need continuous sentences. In addition, I will be working with my teammates to integrate our parts into the iOS app for deployment of the current state of our product.

Ran’s Status Report for 3/23/24

  • What did you personally accomplish this week on the project? 

This week we decided to make a change to our implementation plan: instead of incorporating cloud deployment, we switched to local processing on the phone. Since MediaPipe offers a Swift interface, I started setting up the appropriate environment and migrating our original code for the CV module. Although I initially encountered integration issues, I managed to get the application compiled and running on my phone. Currently it captures video and shows landmarks in real time on the screen. The next task is to port the ML module with CoreML, but all of our team members happen to be busy with other courses and interviews, so we will schedule in-person, collaborative working time next week to stay on schedule.

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

Since some of the tasks have changed, I am currently behind schedule. Next week, I will allocate at least 6 hours outside of regular class time to work on the remaining tasks, partially individually and partially with my teammates.

  • What deliverables do you hope to complete in the next week?

iOS-based CV module refinement

CoreML integration

Leia’s Status Report for 3/23/2024

Progress

I continued to practice American Sign Language, particularly basic greetings and the alphabet. I’ve also been tweaking the mobile app used for testing Bluetooth capabilities, working on frontend development and implementing a text box typing function. Because of the Adafruit issues mentioned in the last status report, another voltage regulator backpack was purchased along with a 3.7V 150mAh Adafruit battery, and both arrived this week. I tried connecting the OLED screen to the Arduino, but the loose jumper wires and unbending pins that came with both items need to be soldered to stay affixed. The same goes for the Adafruit backpack and battery.

Next Steps

Now that all components for the hardware side have been obtained, I will be going into the 18220 lab to solder the wires. The interim demo is approaching, so my goal is to have the entire hardware side joined together by then. I hope to at least make the Arduino module self-sufficient by connecting the battery and then hooking it to the OLED. Ideally, the OLED will also be working by then via the mobile app.