Jong Woo’s Status Report for 10/29/2022

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours): 

         This past week, I focused on applying the Canny edge detection filter and using non-maximum suppression to draw circles on top of the pre-existing braille characters. Canny edge detection, supported by OpenCV, extracts structural information from an image while dramatically reducing the amount of data to be processed. The output of the Canny edge filter is then fed into non-maximum suppression, which selects a single entity (dot) out of many overlapping candidates and draws an individual colored circle on top of each one.
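As a rough illustration, a minimal sketch of this idea is below. The file name, Canny thresholds, and the simple center-distance suppression rule are placeholders for illustration, not our final parameters:

```python
import cv2

def detect_dots(image_path, low=50, high=150):
    """Canny edges -> candidate circles -> simple non-maximum suppression."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(cv2.GaussianBlur(gray, (5, 5), 0), low, high)

    # Every external contour in the edge map becomes a candidate dot.
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    candidates = [cv2.minEnclosingCircle(c) for c in contours]  # ((x, y), r)

    # Non-maximum suppression: visit candidates largest-first and drop any
    # whose center falls inside an already-kept circle.
    candidates.sort(key=lambda c: c[1], reverse=True)
    kept = []
    for (x, y), r in candidates:
        if all((x - kx) ** 2 + (y - ky) ** 2 > kr ** 2 for (kx, ky), kr in kept):
            kept.append(((x, y), r))

    # Draw one colored circle per surviving dot.
    for (x, y), r in kept:
        cv2.circle(img, (int(x), int(y)), int(r), (0, 0, 255), 2)
    return img
```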

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?:

             Progress is on schedule, and the upcoming week will be primarily focused on group work regarding camera integration.

What deliverables do you hope to complete in the next week?:

        Camera integration is the primary goal to be completed by next week. This is essential for the upcoming interim demo on Nov 16th, and I believe that as long as camera integration is completed in time, our team will have a tangible deliverable by the interim demo date.

Kevin’s Status Report for 10/29/2022

Following our return from fall break, we spent some time this week debriefing and re-calibrating our expected deliverables for the Interim Demo. One important change made for more convenient development was pivoting to the Jetson Nano as our prototyping platform. Outside of working on the Ethics assignment, I spent some time this week partitioning the dataset into separate sets for cross-validation (train, validate, test), using roughly a 60/20/20 division, respectively. Because of the size of the dataset, I was confident that I could afford larger partitions for validation and testing. Once done, I formatted the dataset in accordance with the SageMaker tutorial for TensorFlow, then uploaded it to an AWS S3 bucket.
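A minimal sketch of how the split and upload might look (the bucket name, folder layout, and file extension here are placeholders, not our actual configuration):

```python
import pathlib
import random

import boto3  # assumes AWS credentials are already configured locally

random.seed(42)  # fixed seed so the partition is reproducible
files = list(pathlib.Path("dataset").rglob("*.png"))
random.shuffle(files)

# Roughly 60/20/20 train/validate/test division.
n = len(files)
splits = {
    "train": files[:int(0.6 * n)],
    "validate": files[int(0.6 * n):int(0.8 * n)],
    "test": files[int(0.8 * n):],
}

s3 = boto3.client("s3")
for split, paths in splits.items():
    for p in paths:
        # Keep the class label (parent folder name) in the S3 key.
        s3.upload_file(str(p), "braille-dataset-bucket",
                       f"{split}/{p.parent.name}/{p.name}")
```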

This weekend, I was granted AWS credits, which I will use to begin training our ML model on SageMaker. Since SageMaker offers multiple frameworks for image classification (MXNet, TensorFlow), I will test both to see which is more accurate. Furthermore, I am planning to use K-Fold cross-validation to test the robustness of our dataset. I am currently still training on the open-source dataset without any meaningful modifications outside of relabeling (see last weekly update); however, we hope to soon add more images that have been run through the pre-processing pipeline.
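For the K-Fold cross-validation mentioned above, a minimal sketch using scikit-learn's KFold could look like the following. Here build_model and the .npy file names are hypothetical stand-ins, not our actual pipeline:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.load("features.npy")  # hypothetical: flattened character images
y = np.load("labels.npy")    # hypothetical: braille class labels

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for fold, (train_idx, val_idx) in enumerate(kf.split(X)):
    model = build_model()  # hypothetical factory returning a fresh classifier
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))
    print(f"fold {fold}: accuracy {scores[-1]:.3f}")

# Low variance across folds suggests the dataset is robust to partitioning.
print(f"mean accuracy: {np.mean(scores):.3f}")
```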

Since we are beginning to pivot toward preparing hardware for our interim demo, I also took some time this week to work independently on bringing up the Jetson Nano and e-CAM50. However, I ran into some issues flashing the SD card due to a version mismatch between the on-board memory and the image provided by NVIDIA online. Since I do not have an Ubuntu system readily available, I will need to use the JetPack SDK Manager on the lab computers to resolve this.

As mentioned above, I’ve run into some unexpected blockers on both hardware bring-up and AWS, but I’m hoping to catch up early this week, ideally ending tomorrow with a working Jetson Nano, an integrated camera, and a working SageMaker model. The rest of the week will be spent measuring the results of tuning various parameters on SageMaker and choosing the best model for our application, in addition to working with Jay to integrate our phases.

Chester’s Status Report for 10/29/2022

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

This past week we came back together after being away on fall break. We received feedback on our design report, which was overall positive and gave us a continued path forward for development. The beginning of the week was also spent starting the ethics assignment in preparation for class next week. In my own subsystem of the product, I continued to develop the spell checking algorithm code so that it can take in words and output the nearest real word. At the moment, it only checks against a dictionary; no probability metric is placed on words to rank candidates when choosing the best fit for each correction. One possible way I researched for providing a probability is to use Python’s collections.Counter to break down a collection of large texts in conjunction with a dictionary. Words that appear more often in the corpus would then rank higher as candidate replacements for a misspelled word. Lastly, I authenticated the Google text-to-speech API for use on my local computer, and I am currently able to generate mp3 and wav files via the API.
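A sketch of that frequency-ranked idea, in the style of Norvig’s well-known corrector (big.txt is a placeholder corpus file, and only single-edit candidates are considered here):

```python
import re
from collections import Counter

# Word frequencies from a large corpus double as a crude probability ranking.
WORDS = Counter(re.findall(r"[a-z]+", open("big.txt").read().lower()))

def edits1(word):
    """All strings one delete/swap/replace/insert away from word."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    swaps = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + swaps + replaces + inserts)

def correct(word):
    if word in WORDS:  # already a real word
        return word
    candidates = [w for w in edits1(word) if w in WORDS]
    # Fall back to the input itself if no known word is one edit away.
    return max(candidates, key=WORDS.get) if candidates else word
```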

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

With two weeks until the interim demo, our product is in the stage of finalizing its separate subsystems. Our main goal is to have the separate subsystems demonstrate working functionality depicting the relative pipeline of our product. As of now, I am confident that my concatenation, spell checking, and text-to-speech components are viable and will be able to perform a rough full run-through within the two weeks ahead of us.

What deliverables do you hope to complete in the next week?

In the upcoming week, I would like to start connecting the spell checking algorithm code with the Google text-to-speech API, so I can verify the quality of the speech output, as well as start to hammer out inconsistencies in the spell checking algorithm itself. I would also like to begin some formal testing of this subsystem if there is time. This will most likely be the focus of the integration process in the two weeks before the interim demo.

Team Status Report 10/29/2022

  • What are the most significant risks that could jeopardize the success of the project?

At the moment, our main risks are associated with meeting timing requirements while making sure we can work with the hardware effectively. Since our e-CAM50 is built for the Jetson Nano, we are temporarily pivoting to the Nano platform and working on getting the camera integrated. From this experience, we are seeing that an extended ribbon cable connecting the camera to the Jetson will be essential for reasonable wearability. However, as important as wearability is, we do not want it to hinder our overall product capabilities: one thing Alex mentioned to us early in the design process was that lengthening the camera cable could significantly affect latency. Until now, we have mostly been working individually on our personal systems. Now that we are testing camera integration with the Nano and beginning to integrate our independent parts on the device, we may need to rely on WiFi, which the Nano provides and the Xavier AGX does not.

  •  How are these risks being managed? 

We currently have 18 days until the expected interim demo date of Nov 16th. Our goal for the interim demo is to showcase how raw captured data is processed at each stage of our pipeline, from the camera to the text-to-speech. Because we are temporarily pivoting to the Nano, we are putting less of a focus on latency so that we can focus on demoing functionality. As a result, we plan to work extensively on camera and software integration starting this coming week, and on speaker integration the week after. We believe that this schedule leaves enough time to troubleshoot any potential issues and further optimize our system.

  • What contingency plans are ready? 

If e-CAM50 integration goes wrong or the Nano does not provide the performance we need, our final contingency plan is to fall back on the non-wearable, fixed-format design using the Jetson Xavier AGX instead of the Nano. However, with proper time management and collaboration, we firmly believe that everything will be completed in time.

  • Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)?

Due to several constraints of the Jetson Xavier AGX (wireless connectivity, weight, I/O), we are considering altering our plan to work with the Jetson Nano. The Jetson Nano would provide WiFi capability and integrate well with the camera that we already have. It would also decrease power draw in case we want to package our final prototype as a portable, battery-powered wearable. The main trade-off is the difference in performance. That said, we believe the Nano will provide enough speed for the necessary pre/post-processing and classification subsystems.

  • Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward? 

This change is necessary due to our product’s use case and its need for mobility. With the Nano being smaller and providing its own WiFi, we can better integrate our design and make it more wearable and usable. The main cost of this change is the decrease in performance capability, but we believe the Nano will handle our processing sufficiently. Going forward, we do not believe it will change the overall course of our schedule, and the next two weeks will still be essential for the development of our product before the interim demo.

  • Provide an updated schedule if changes have occurred.

Following Fall Break, we debriefed and re-assessed our progress and what we needed to do before the Interim Demo. As a result, we’ve moved camera and speaker integration up earlier in our Gantt chart. As we move closer to the integration phase, we will need to keep a closer eye on the Gantt chart to make sure everyone is on the same page and ready to integrate their deliverables.

Jong Woo’s Status Report for 10/22/2022

Note: This weekly status report covers any work performed during the week of 10/15 as well as Fall Break

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours): 

         The week before the mid-semester break was primarily invested in crafting and reviewing the design report. Our team efficiently divided the workload based on the subsystems we were individually working on, then performed a thorough and considered revision to finalize the report.

          I spent some time this week looking into how the vertically and horizontally segmented images will be cropped and stored as separate image files in a folder. Since OpenCV has no dedicated function for cropping, NumPy array slicing will be adopted instead. Every pre-processed image can be read in and stored as a 2D array per color channel, so pixel ranges along the height and width can be specified to crop specific regions of an image.
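For example (the file names and coordinates here are arbitrary placeholders):

```python
import pathlib

import cv2

img = cv2.imread("segmented_page.png")  # a NumPy array of shape (H, W, 3)

# OpenCV images are NumPy arrays, so cropping is just slicing rows
# (height) then columns (width); no dedicated crop function is needed.
y1, y2, x1, x2 = 40, 100, 10, 55  # placeholder cell boundaries in pixels
cell = img[y1:y2, x1:x2]

pathlib.Path("cells").mkdir(exist_ok=True)
cv2.imwrite("cells/cell_00.png", cell)
```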

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?:

             Progress is on schedule, and given that this past week was the mid-semester break, our team will focus on getting things rolling again and combining our parallelized work in preparation for the approaching interim demo.

What deliverables do you hope to complete in the next week?:

                  I will implement the Canny edge filter along with the non-maximum suppression method to obtain the sharpest edge contrast.

Kevin’s Status Report for 10/22/2022

Note: This weekly status report covers any work performed during the week of 10/15 as well as Fall Break.

This past week (10/15), the team spent the majority of its time developing the design report, for which I performed an experiment to measure the performance of the pre-trained model we are evaluating. To do this, I first downloaded an offline copy of the labeled dataset made available by aeye-alliance. Then, I relabeled the dataset with braille unicode characters rather than English translations. I also manually scanned through each labeled image to make sure it was labeled correctly. Of the more than 20,000 images downloaded from online containers, I found only 16 mislabeled images and 2 that I deemed too unclear to use.

An example of mislabeled data within the “⠃” dataset.

Attribution of training data will be difficult to maintain if required. We can refer to the labeled data CSV files from aeye-alliance, which include a list of the sources of all images, but we will not be able to pinpoint the source of any single image.
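The relabeling pass described above might look roughly like this sketch; the folder layout and the three mapping entries shown are illustrative only, not the full script:

```python
import pathlib

# Illustrative subset of the English-name -> braille-unicode mapping.
RELABEL = {"a": "\u2801", "b": "\u2803", "c": "\u2809"}  # ⠁ ⠃ ⠉

# Rename each class folder from its English label to the braille character.
for folder in pathlib.Path("dataset").iterdir():
    if folder.is_dir() and folder.name in RELABEL:
        folder.rename(folder.parent / RELABEL[folder.name])
```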

Once I had the correct data in each folder, I wrote a Python script which loads the pre-trained model and crawls through the training dataset, making a prediction for each image. Each result is recorded in a CSV containing the correct value, the prediction, and the measured inference time. Using pandas and seaborn, I was able to visualize the resulting data as a confusion matrix. I found that the resulting confusion matrix did not quite reach the requirements we set for ourselves. There are also a number of imperfections in this experiment, which are described in the design report.

Confusion matrix generated by experiment
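The visualization step might look roughly like the following; the CSV file name and column names are assumptions, not the script’s exact schema:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# One row per test image: the true label, the model's prediction,
# and the measured inference time.
df = pd.read_csv("results.csv")  # assumed columns: actual, predicted, ms

# Cross-tabulating actual vs. predicted labels yields the confusion matrix.
matrix = pd.crosstab(df["actual"], df["predicted"])
sns.heatmap(matrix, annot=True, fmt="d", cmap="Blues")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.tight_layout()
plt.savefig("confusion_matrix.png")
```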

The rest of my time was spent writing my share of the design report. With the following week being Fall Break, I did not do as much work as described in our Gantt chart. I looked into how to use Amazon SageMaker to train a new ML model and set up an AWS account. I am still in alignment with my scheduled tasks, having built a large dataset and measured existing solutions in order to complete the design report. Next week, I hope to use this knowledge to quickly set up a SageMaker workflow to train and iterate on a model customized for our pre-processing pipeline.

Chester’s Status Report for 10/22/2022

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

In the week before fall break, most of our time on the project was dedicated to developing the design review report. This was slightly more time-consuming than anticipated, but the thorough examination and attention to detail were worth it. With more than half of the time spent on the design report, the remaining time was spent writing the spell checking algorithm and beginning initial error checking. The basic foundation is in place, but a lot of work remains for it to succeed in our final product. The infrastructure began with word concatenation, and then evolved to funnel words into the post-processing algorithm.

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

I don’t think fall break was entirely accounted for in the plan as a full break, and this might affect the sequential segmentation of our work. That being said, although we took more time off this week than usual, there is built-in slack that accounts for the time off. This slack also covers challenges faced, and we are currently on track and progressing well. I think the next week will be essential in making sure we stay on track and meet any necessary deadlines.

What deliverables do you hope to complete in the next week?

As I mentioned above, this week will be essential in straightening out any rough edges after the break. Coming back together, we will hit the ground running, most likely working in parallel to make significant strides. For me, this involves touching up the spell checking algorithm so that it works at a reasonable level, as well as developing the post-processing infrastructure into a smoother, more complete pipeline. This may also include coding the text-to-speech, if time is available.

Chester’s Status Report for 10/08/2022

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

This week I was significantly busier than most. Although still on schedule, I hope to catch up a bit more before break so that a proper chunk of work is done for when we come back. Over the course of this week, I started to design the software responsible for taking in characters and turning them into words to be processed by the spell check algorithm. On top of this, we finished the design review and started planning for the design report that is due next week.

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

Overall, our entire project is still on track, and we hope to be at a manageable place going into fall break next week. After some tough questions at the design review, I think we still have a very clear direction for what we want our product to be, but being able to clarify that and present it as a user product will be important for our final deliverable. 

What deliverables do you hope to complete in the next week?

In the next week, I hope to start testing a preliminary spell checking algorithm and write it into an infrastructure that includes the word concatenation. This will help me build a more foundational understanding of the timing constraints of the pre-processing section of the project. Although I wasn’t able to compare different text-to-speech APIs last week, I would like to do so in the upcoming week as well.

Kevin’s Status Report for 10/08/2022

This week, our team presented our design review for the final vision of Awareables. I spent the beginning of the week under the weather, which meant that we met fewer times as a whole group.

Individually, I spent some of the week experimenting with a pre-trained model that was trained on the 30,000-image set we intend to use for our own model. I started by feeding the model the pre-processed images that Jay provided me with last week. Of the four different filter outputs, non-max suppression yielded the best accuracy, with 85% of the characters recognized accurately (Blur3: 60%, Orig: 80%, Thresh3: 60%). That said, non-max suppression may be the most processing-heavy pre-processing method, so we will have to weigh the cost-benefit tradeoff there. Interestingly, most misidentified characters were misidentified as the letter “Q” (N, S, and T are all only a few “flips” away from Q). Furthermore, “K” is likely to be misidentified if its two dots are not aligned to the left side of the image.

It’s clear that using any pre-trained model will be insufficient for our use-case requirements. This further justifies our design choices to (1) train our own machine learning model (2) on a dataset modified to more closely resemble the output of our pre-processing pipeline. I have therefore been taking some time to look at various online learning resources for machine learning and neural networks, since as a group we have fairly little experience with the tools. My main question was how to choose the configuration of the hidden layers of a neural network. Some heuristics I have found are (1) the number of hidden-layer nodes should be close to sqrt(input layer nodes * output layer nodes), and (2) keep adding layers until test error no longer improves.
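As a concrete illustration of heuristic (1), here is a minimal Keras sketch; the input size and class count are placeholders, not our final architecture:

```python
import math

import tensorflow as tf

n_in, n_out = 28 * 28, 26  # placeholder pixel count and braille class count
n_hidden = round(math.sqrt(n_in * n_out))  # heuristic (1): ~143 nodes

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(n_hidden, activation="relu"),
    tf.keras.layers.Dense(n_out, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```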

Looking at the frameworks available, it seems most likely that I will be using Keras to configure a TensorFlow neural network, which, once trained, will be deployed on OpenCV. I will also take some time to experiment with decision trees and random forests in OpenCV using hand-picked features. Based on this and last week’s experience, it takes around 1-2 hours to train a model locally with the equipment I have on hand (20 epochs reaches 95+% accuracy against the test dataset). We are looking into using AWS SageMaker so that waiting for model training does not become a blocker.

Looking at our Gantt chart, we are heading into the development phase following our design review. It seems like most, if not all, of us are slightly ahead of schedule for the time we have budgeted (due to running individual experiments as part of our design review).

Next week, I expect to be able to have set up an AWS SageMaker workflow for iteratively training and testing models, and have created a modified dataset we can use to train and test.

Team Status Report for 10/08/2022

  1. What are the most significant risks that could jeopardize the success of the project?

      This week, the team focused on wrapping up and presenting our design review. We also spent some time experimenting with the Jetson and individually researching approaches for our respective phases. This early exploratory work has set us up nicely to begin writing our in-depth design report and finalize our bill of materials to order parts.

      Based on our research, we have also identified some further potential risks that could jeopardize the success of our project. While researching the classification phase, we realized that the time spent training iterations of our neural network may become a blocker for optimization and development. Originally, we had envisioned that we could use a pre-trained model or that we only needed to train a model once. However, it has become clear that iteration will be needed to optimize layer depth and size for best performance. Using the equipment we have on hand (Kevin’s RTX 3080), we were able to train a neural network for 20 epochs (13 batches per epoch) in around 1-2 hours. 

2. How are these risks being managed?

      To address training time as a possible blocker, we have reached out to Prof. Mukherjee to discuss options for an AWS workflow using SageMaker. Until this is working, we will have to be selective and intentional about what parameters we would like to test and iterate on.

3. What contingency plans are ready?

     While we expect to be able to use AWS or other cloud computing services to train our model, our contingency plan will likely be to fall back on local hardware. While this will be slower, we will simply need to be more intentional about our decisions as a result. 

     Based on initial feedback from our design review presentation, one of the things we will be revising for our design report will be clarity of the datapath. As such, we are creating diagrams which should help clearly visualize a captured image’s journey from sensor to text-to-speech. 

4. Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)?

One suggestion discussed at our design review was the difference between a comfortable reading speed and a comfortable comprehension speed. Prof. Yu pointed out that while we would like to replicate the performance of braille reading, text-to-speech at that word rate is unlikely to be comfortable to listen to and comprehend fully. As a result, we have adjusted our expectations and use-case requirements to take this into account. Based on our research, a comfortable comprehension speed is around 150 wpm. Knowing this metric will allow us to better tune our text-to-speech output.
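If we stay with Google Cloud Text-to-Speech, the tuning should mostly amount to adjusting the speaking_rate field; the 0.9 value below is an untested guess at approaching the 150 wpm target, not a measured setting:

```python
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

# speaking_rate=1.0 is the default speed; values below 1.0 slow the
# output toward a more comfortable comprehension rate.
response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(text="hello world"),
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3,
        speaking_rate=0.9,  # untested placeholder value
    ),
)
with open("sample.mp3", "wb") as f:
    f.write(response.audio_content)
```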

5. Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward? 

      Placing an upper limit on the final output speed of translated speech does not incur any monetary or performance costs.

6. Provide an updated schedule if changes have occurred. 

      Based on our Gantt chart, it seems that we have done a good job so far of budgeting time generously to account for lost time. As such, we are at pace with our scheduled tasks for the most part. In fact, we are partially ahead of schedule in some tasks due to experimentation we performed to drive the design review phase. However, one task we forgot to take into account in our original Gantt chart was the Design Report. We have modified the Gantt chart to take this into consideration, as below: