Ran’s Status Report for 2/24/24

  • What did you personally accomplish this week on the project? 

I was the speaker for the design review presentation, so I spent the first couple of days of last week reviewing and practicing for it. I then started setting up the backend codebase in Swift. Since I was not familiar with Xcode or the Swift language before, I watched a few online tutorials to get started. I ran into various issues while implementing the actual app code, including basic syntax errors, program termination because the simulator could not detect a camera, and permission inconsistencies when I tried to export the app to my phone for testing. Those issues took a while to fix, but with help from my teammates I was able to create a camera app that supports both photo and video capture. Although the current app is essentially a basic version of the iPhone’s default camera app, it can be integrated with our own app.

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

My progress is mostly on schedule. However, I have some work that needs to be integrated with my teammates’ on Monday (mainly the CV and ML phases). If the integration outcome fails to meet our expectations, we may devote some time over spring break to it. Moreover, although I completed my assigned task of setting up the mobile app backend, I anticipate that the upcoming workload for the mobile app may turn out to be heavier than we thought. Consequently, I will spend more time over spring break working on it.

  • What deliverables do you hope to complete in the next week?

Continue with mobile app backend programming

CV and ML integration

Recognition accuracy testing

Design review report writing

 

Leia’s Status Report for 2/24/2024

Progress

The components for our product have been ordered through the purchasing form: an Arduino Nano 33 BLE, an OLED 2.42” screen display module, an E-Ink 2.7” display, a 3.7V 2000mAh lithium polymer battery, and a breadboard and jumper wires kit. The two screens and the battery have arrived, and the rest of the parts will hopefully arrive this coming week. I have continued planning how I will connect everything and have been learning sign language on YouTube. I have also been practicing the Swift language and the Xcode environment via the Apple Developer tutorials. Specifically, there are three features I am trying to learn to integrate into the mobile app, both for our MVP and as a backup in case our future integration across the app, Arduino, machine learning, and computer vision goes awry:

1. Retrieving data from the internet, such as URLs, so we can port a web app into a mobile app.
2. Recognizing multi-touch screen gestures like taps, drags, and touch-and-hold.
3. Recognizing hand gestures from the phone camera with machine learning.

With Ran, I am also trying to figure out how to distribute our app onto our phones for testing purposes. She raised the issue that the Xcode-simulated iPhone does not have a camera implementation, so we are working to get the app onto our physical phones.

Next Steps

The third feature mentioned in the Progress section needs further analysis and discussion with team members. Its performance is still uncertain, and how it could be combined with our ASL-adapted computer vision and machine learning is an open question. For now, its primary use is to get our app to access the phone’s camera.

My plan is to get a working mobile app with functional buttons that lead to the settings page and the ASL page, where a small window in the corner shows what the phone’s front-facing camera sees. This will be broken down into further steps. Additionally, once I obtain the rest of the purchased components, I will connect the Arduino to the app using the BLE feature. I will attach an LED to the Arduino and see if the mobile app can control it. After that, I will hook a screen up to the Arduino, control the screen via the Arduino, and then control the screen via the app. I realized that it is still too early to work with CAD, so my priorities have shifted to working on the mobile app and operating the hardware.

Sejal’s Status Report for 2/17/24

This week I got started on a simple ML model and combined it with Ran’s computer vision algorithm for hand detection. I trained a CNN on Kaggle’s Sign Language MNIST dataset. Using the trained model, I took the video processing from the OpenCV and MediaPipe code, ran the prediction of which character was being signed, and displayed this prediction on the webcam screen, as shown below.

(Code on github https://github.com/LunaFang1016/GiveMeASign/tree/feature-cv-ml)
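For reference, below is a minimal sketch of the idea (not the exact code in the linked branch): a small Keras CNN over 28×28 grayscale inputs like Sign Language MNIST, plus a webcam loop that overlays the predicted letter on each frame. The fixed crop region, label mapping, and omitted training call are simplified placeholders; the actual pipeline uses the OpenCV/MediaPipe hand detection described above to locate the hand.

```python
import cv2
import numpy as np
import string
from tensorflow import keras
from tensorflow.keras import layers

# Small CNN over 28x28 grayscale inputs, matching Sign Language MNIST.
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(24, activation="softmax"),   # 24 static letters (J and Z are dynamic)
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=10, validation_split=0.1)  # training on the Kaggle data omitted here

# Illustrative index-to-letter mapping (the dataset's exact label scheme differs slightly).
letters = [c for c in string.ascii_uppercase if c not in "JZ"]

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Crude fixed crop standing in for the hand region located by the CV step.
    roi = cv2.cvtColor(frame[100:328, 100:328], cv2.COLOR_BGR2GRAY)
    roi = cv2.resize(roi, (28, 28)).astype("float32") / 255.0
    probs = model.predict(roi.reshape(1, 28, 28, 1), verbose=0)[0]
    cv2.putText(frame, f"Prediction: {letters[int(np.argmax(probs))]}", (10, 40),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    cv2.imshow("ASL", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
```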

Training this simple model got me thinking about the complexities required beyond it, such as handling both static and dynamic signs, and combining letters into words to form readable sentences. After further research on which neural network architecture to use, I decided to go with a combination of a CNN for static signs and an LSTM for dynamic signs. I also gathered datasets covering both static and dynamic signs from a variety of sources (How2Sign, MS-ASL, DSL-10, RWTH-PHOENIX-Weather 2014, Sign Language MNIST, ASLLRP).
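To make the planned hybrid concrete, here is a rough sketch under assumed input shapes: a per-frame CNN wrapped in TimeDistributed feeds an LSTM that classifies a short clip (a dynamic sign), while a standalone CNN like the one above handles single-frame static signs. The clip length, crop size, and number of classes are placeholders that will depend on the merged datasets.

```python
from tensorflow import keras
from tensorflow.keras import layers

SEQ_LEN, H, W = 30, 64, 64          # assumed: 30-frame clips of 64x64 grayscale crops
NUM_DYNAMIC_CLASSES = 100           # placeholder; depends on the merged datasets

# Per-frame feature extractor (CNN).
frame_encoder = keras.Sequential([
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
])

# Temporal model (LSTM) over the sequence of per-frame features.
dynamic_model = keras.Sequential([
    layers.Input(shape=(SEQ_LEN, H, W, 1)),
    layers.TimeDistributed(frame_encoder),   # apply the CNN to each frame in the clip
    layers.LSTM(128),                        # model motion across the clip
    layers.Dense(NUM_DYNAMIC_CLASSES, activation="softmax"),
])
dynamic_model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
dynamic_model.summary()
```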

My progress is on track with the schedule, as I have been working on model testing and gathering data from existing datasets.

Next week, I hope to do more training of the model using the gathered datasets and hopefully display more accurate predictions of not just letters, but words and phrases. We will also be working on the Design Review presentation and report.

Team Status Report for 2/17/2024

Main Accomplishments for This Week

  • Design presentation
  • Initial ML and CV combined integration for basic ASL alphabet testing
  • Confirmation of inventory items for purchase

Risks & Risk Management

  • Currently there are no significant risks for the team as a whole, so no mitigation is needed. There are concerns for each team member in their respective roles, but nothing that would jeopardize the entire project.

Design Changes

  • Natural language processing (NLP) has been added to the software development plan. Since sign language does not translate directly into full, syntactic sentences, we realized we need a machine learning component for grammar correction to achieve proper translation. We intend to build on open-source code once we understand the NLP computation involved, and plan to implement it in later stages, specifically after the ASL ML algorithm and the CV programming are done (see the sketch after this list). Although this expands the software scope somewhat, all team members are on board to contribute to this part together to minimize any cost to the overall timeline.
  • Three reach goals have been confirmed for after the MVP is completed: 1. speech-to-text, 2. a signal for the end of a sentence from the ASL user (a flash of light or an audio notification), and 3. facial recognition to enhance the ASL translations. All of these are aimed at a smoother, more fluid conversation between the user and the receiver.
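To illustrate what the NLP stage mentioned above could look like, the sketch below runs a gloss-style recognizer output through an off-the-shelf grammar-correction model via the Hugging Face transformers pipeline. The specific checkpoint named here is only an illustrative placeholder, not a decision; any open-source seq2seq grammar-correction model could be swapped in once we have evaluated the options.

```python
# Illustrative only: passing raw sign-by-sign ("gloss") recognizer output through an
# open-source grammar-correction model. The model name is an example placeholder.
from transformers import pipeline

corrector = pipeline("text2text-generation",
                     model="vennify/t5-base-grammar-correction")  # example checkpoint, not final

gloss = "ME GO STORE YESTERDAY BUY MILK"        # raw recognizer output for a signed sentence
result = corrector("grammar: " + gloss.lower(), max_new_tokens=40)
print(result[0]["generated_text"])               # prints the model's corrected sentence
```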

Schedule Changes

  • No changes have occurred.

Additional – Status Report 2

Part A was written by Ran, B was written by Sejal and C was written by Leia.

Part A: Our project by nature enhances public health and welfare by ensuring effective communication for sign language users. In the context of health, both obtaining and expressing accurate information about material requirements, medical procedures, and preventive measures are vital. Our project facilitates this communication, contributing to the physiological well-being of users. More importantly, we aim to elevate the psychological happiness of sign language users by providing them with a sense of inclusivity and fairness in daily social interactions. In terms of welfare, our project enables efficient access to basic needs such as education, employment, community services, and healthcare through its high portability and diverse use-case scenarios. Moreover, we make every effort to keep the mechanical and electronic components safe and functional: the plastic backbone of our phone attachment will be 3D-printed with rounded corners, and the supporting battery will operate at a human-safe low voltage.

Part B: Our project prioritizes cultural sensitivity, inclusivity, and accessibility to meet the diverse needs of sign language users in various social settings. Through image processing, the system ensures clarity and accuracy in gesture recognition, accommodating different environments. The product will promote mutual understanding and respect among users from different cultural backgrounds, uniting them through effective communication. Additionally, recognizing the importance of ethical considerations in technology development, the product will prioritize privacy and data security, such as implementing data protection measures to ensure transparent data practices throughout the user journey. By promoting trust and transparency, the product will foster positive social relationships and user confidence in the technology. Ultimately, the product aims to bridge communication barriers and promote social inclusion by facilitating seamless interaction through sign language translation, meeting the needs of diverse social groups and promoting inclusive communication in social settings.

Part C: Our product is meant to be manufactured and distributed at very low cost. The complete package is a free mobile application and a phone attachment, which will be 3D-printed and require no screws, glue, or even assembly. The attachment can simply be put on or taken off the phone at the user’s discretion, even if the phone has a case. The product’s most costly component is the Arduino, at about $30, and we expect the total hardware cost to come to less than $100. Not only are production costs minimal, but given that the product’s purpose is equity and diversity, it will not be exclusively distributed; purchasing it should be like buying any other effective, helpful item for daily living. If it becomes part of the market, it should not necessarily impact the current goods and services related to the deaf or hard-of-hearing communities. However, our product and software are optimized for the Apple ecosystem. Our team members all use Apple products, so while our project has the potential for cross-platform solutions, it will not be tested for them. Currently, this may come at a cost for users who do not use Apple operating systems. Still, since Apple products are popular and common, we feel our product is overall economically reasonable.

Ran’s Status Report for 2/17/24

  • What did you personally accomplish this week on the project?

After obtaining MediaPipe hand recognition on loaded video, I continued working on enabling live video input and detecting dynamic poses in addition to hands. Notably, with guidance from Professor Savvides and Neha, I was able to answer the frame processing rate question from last week: a loaded mp4 video may play at a frame rate too high for OpenCV to keep up with. However, this issue turned out to be irrelevant to our implementation plan, since our use case only involves a live feed from a webcam or phone camera. I added a live frame-rate monitoring function, which puts our recognition frame rate at roughly 10–15 fps under real-time dynamic capture conditions. Thanks to Sejal’s initial working ML model, at this point I have completed my assigned task of static alphabet gesture recognition. Our team then decided to also detect the user’s upper-body movement, essentially arm and shoulder poses, in addition to the intricate gestures of both hands. Accordingly, I experimented with the MediaPipe Holistic model and successfully implemented this recognition feature (as shown in the screenshot below; code is on GitHub). Lastly, as the presenter for the design review presentation, I prepared the presentation slides and rehearsed the talk.

[Screenshots: MediaPipe Holistic model detecting upper-body pose and hand landmarks on live video]
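For reference, a minimal sketch of this setup (not the exact code in our repo): MediaPipe Holistic on a live webcam feed with the upper-body pose and both hands drawn, plus the live frame-rate monitor overlaid on each frame.

```python
import time
import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)   # live feed, matching our use case
prev = time.time()

with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input.
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

        # Draw upper-body pose plus both hands' landmarks.
        mp_drawing.draw_landmarks(frame, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS)
        mp_drawing.draw_landmarks(frame, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS)
        mp_drawing.draw_landmarks(frame, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS)

        # Live frame-rate monitor.
        now = time.time()
        fps = 1.0 / (now - prev)
        prev = now
        cv2.putText(frame, f"{fps:.1f} FPS", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)

        cv2.imshow("Holistic", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

cap.release()
cv2.destroyAllWindows()
```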

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

My progress is on schedule.

  • What deliverables do you hope to complete in the next week?

Boost CV processing

Detect webcam feed-in of static alphabetical gestures

Design presentation slides and speech

Leia’s Status Report for 2/17/2024

Progress

I have done more research and trade-off analysis on the items we will need for the hardware side. I intend to purchase the Arduino Nano 33 BLE for its Bluetooth capability and compact size. For the product to be portable and rechargeable, I will attach a lithium-ion battery to the Nano to power it. The reason for this battery in particular is that most handheld devices use them, so when a person uses a wired or inductive/wireless charger for their phone, they can also charge the product’s battery. Currently, I am considering three different display types for the dual-screen aspect: LCD, OLED, and E-Ink. They are all similarly priced and have balanced pros and cons, so I plan to try all three types after finding the most suitable model of each. Each must be about 2.5” diagonally so the screen is large enough for the other person to see, but not so big that it makes the phone difficult to handle. Everything will be wire-connected, and I have planned in advance how I will connect the components together.

I did a minimalist design of the mobile app wireframes. After receiving feedback that I should check whether mobile app development for Apple operating systems requires the $99 annual developer subscription, I confirmed that I do not need that purchase to create an app for local use. The subscription is for distributing an app on Apple’s App Store, which we do not intend to do for our project.

Next Steps

After confirming with team members, I will submit purchase forms for the above items and plan in depth how I will connect them all together. I will also be practicing CAD software so I can eventually create a phone attachment that will hold all the components for 3D printing. It must not be too bulky and should be adjustable in tilt, which will require studying current phone stands on the market for comparison and development. Since I have established the wireframes for the mobile app, I will begin developing the front end in Xcode. When I have time, I will also delve into the back-end aspects to identify how I can connect the app to a cloud database. Further along the timeline, once I have the hardware parts, I will try to connect the Arduino to the mobile app via Bluetooth and test controlling the display from the app.

Sejal’s Status Report for 2/10/24

After presenting the project proposal on Monday, my group and I reflected on the questions and feedback we received and prepared to start our respective parts of the project. I started doing further research into the machine learning algorithm that will recognize ASL gestures. Since my teammate will be processing the datasets using OpenCV, I will begin with publicly available datasets that provide preprocessed images for sign language recognition tasks, for example the ASL Alphabet dataset and Sign Language MNIST, both on Kaggle. Since we decided to use TensorFlow and Keras, I looked into how existing projects use these technologies. With regard to training the neural network, I learned that convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are commonly used. 3D CNNs are also used for classification, especially with spatiotemporal data, and hybrid models combining CNNs and RNNs might be a good approach as well.
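As a first concrete step with these datasets, here is a minimal loader sketch for Sign Language MNIST, assuming the CSV layout from the Kaggle download (a label column followed by 784 pixel columns forming 28×28 grayscale images); the file names below follow the Kaggle listing and may differ locally.

```python
# Minimal loader sketch, assuming the Kaggle CSV layout (label + 784 pixel columns).
import numpy as np
import pandas as pd

def load_sign_mnist(csv_path):
    df = pd.read_csv(csv_path)
    labels = df["label"].to_numpy()
    # Reshape the 784 pixel columns into 28x28 grayscale images, scaled to [0, 1].
    images = df.drop(columns=["label"]).to_numpy(dtype="float32")
    images = images.reshape(-1, 28, 28, 1) / 255.0
    return images, labels

x_train, y_train = load_sign_mnist("sign_mnist_train.csv")   # file names per the Kaggle download
x_test, y_test = load_sign_mnist("sign_mnist_test.csv")
print(x_train.shape, y_train.shape)
```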

Our progress is on track relative to our schedule. During the next week, Ran and I will begin preparing a dataset. We will also allocate some time to learning ASL so we can use some of our own data. I also hope to do more research into neural network structures and decide on the best ones.

Ran’s Status Report for 2/10/24

  • What did you personally accomplish this week on the project? 

After helping finish up the proposal slides, I worked on getting OpenCV and MediaPipe set up on my laptop. I first experimented with sampling an mp4 video into frames stored as images in a local folder, and it worked. Later on, after solving some compatibility issues when initializing MediaPipe, I successfully used the MediaPipe hand detection feature to generate landmarks dynamically on the video input. These landmarks, as expected, exclusively cover the person’s hands and mark the fingers and knuckles. This accomplishment marked the completion of my assigned video processing task. However, I noticed that incorporating MediaPipe’s hand detection model slowed down the frame processing rate, so I will continue researching ways to speed up the process.

[Screenshots: hand detection landmarks overlaid on the video feed]
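A minimal sketch of the two steps described above (not the exact code in our repo): sampling an mp4 into frames saved to a local folder, then running MediaPipe hand detection and drawing the finger and knuckle landmarks on the video input. File paths are placeholders.

```python
import os
import cv2
import mediapipe as mp

# Step 1: sample a video into frames stored as images in a local folder.
os.makedirs("frames", exist_ok=True)
cap = cv2.VideoCapture("sample.mp4")   # placeholder path
i = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imwrite(f"frames/frame_{i:05d}.jpg", frame)
    i += 1
cap.release()

# Step 2: generate hand landmarks dynamically on the video input.
mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
cap = cv2.VideoCapture("sample.mp4")
with mp_hands.Hands(max_num_hands=2, min_detection_confidence=0.5) as hands:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                mp_drawing.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)
        cv2.imshow("Hands", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()
```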

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

My progress is on schedule.

  • What deliverables do you hope to complete in the next week?

Boost CV processing

Detect webcam feed-in of static alphabetical gestures

Design presentation slides and speech

Team Status Report for 2/10/2024

Main Accomplishments for This Week

  • Proposal presentation

[Screenshots: proposal presentation first slide and solution slide]

  • ML library research
  • Inventory item analysis
  • Codespace setup
  • OpenCV & MediaPipe initialization

[Screenshots: hand detection with video feed]
Risks & Risk Management

  • Risks: Although no significant risks have been identified at this point, we received feedback from faculty expressing concern about our dataset collection. The collection process could turn out to be much more troublesome than we anticipated if our dataset source depends largely on our own captures.
  • Management: Thanks to the valuable suggestions from the instructors, we decided to explore Kaggle and other sources. Sejal has started researching existing datasets, including Kaggle’s ASL Alphabet dataset and Sign Language MNIST (grayscale). This change will not affect our schedule (we had already assigned one week starting from 2/14 for data collection), but Ran will be added to assist Sejal in the process.

Design Changes

  • We added a text-to-speech (and vice versa) feature as a reach task (post-MVP feature). Inspired by student questions during the presentation Q&A, we believe this improvement will significantly add to the overall user experience in real-world scenarios.

Schedule Changes

  • No changes have occurred.

Leia’s Status Report for 2/10/24

Progress

I did a little more comparative research between Arduino and Raspberry Pi to ensure that using an Arduino module is right for our project. The articles support using the former, particularly the Arduino UNO, because of its user-friendliness, energy efficiency, simplicity, and versatility. Moreover, it supports Bluetooth as well as seamless connection to LCD displays. However, that unit is too bulky for our purposes, so I looked into the numerous other Arduino boards available and have honed in on the Nano 33 BLE. It is much smaller, capable of on-board machine learning in case we need it, and has Bluetooth. It also has 1 MB of flash memory, which is enough for storing simple text translations, which would take at most a couple of KB. I believe it can be coupled with 2.8–3.5 inch LCD displays; I examined its datasheet, and since it can connect to 16×2 displays, I expect it can do the same with wider screens. A backup option I am considering is the Arduino GIGA Display Bundle, which consists of an Arduino GIGA R1 WiFi and a GIGA Display Shield. It packages the board and display together and is relatively flat, but it is very complex and powerful, so it is not necessarily a good fit for our project.

I have set up the Xcode platform on my computer to prepare for Swift programming and the Arduino application. I also studied 1. how to develop an app that can control the Arduino via Bluetooth, and 2. how to connect to cloud storage through the app. Further plans for both are addressed in the “Next Steps” section.

Next Steps

A concern is how to attach a rechargeable battery to the Arduino so it does not need to be constantly plugged in for power. Further investigation is needed to find a charger that will not fry the board and is small, flat, and sleek enough. The integration between the battery and the Arduino needs to be added as a task. Moreover, I need to decide on which LCD display to get so I can determine whether a Nano can be wired to it.

For Arduino control with the mobile app, the first step will be to design a plain UI. Then, once the Arduino is acquired, I will test its Bluetooth capabilities, probably with LEDs or temperature sensing, which will then lead to testing text transmissions.

For cloud data retrieval from the app, I identified Firebase, an application development platform from Google that provides backend cloud computing services. I found a guide on how to install and use the Firebase SDK (software development kit) in Xcode, but cloud storage implementation needs to be discussed further with team members, as this concerns retrieving ML and CV data for app usage.