Kevin’s Status Report for 10/08/2022

This week, our team presented our design review for the final vision of Awareables. I spent the beginning of the week under the weather, which meant that we met fewer times as a whole group.

Individually, I spent some of the week experimenting with a pre-trained model that was trained on the 30,000-image set we intend to use for our own model. I started by feeding the model the pre-processed images that Jay provided me with last week. Of the four different filter outputs, non-max suppression yielded the best accuracy, with 85% of the characters recognized correctly (Blur3: 60%, Orig: 80%, Thresh3: 60%). That said, non-max suppression may be the most processing-heavy pre-processing method, so we will have to weigh the cost-benefit tradeoff there. Interestingly, most misidentified characters were misidentified as the letter “Q” (N, S, and T are all only a few “flips” away from Q). Furthermore, “K” is likely to be misidentified if its two dots are not aligned to the left side of the image.
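For reference, the per-filter accuracy check amounts to running the pre-trained model over each batch of pre-processed crops and comparing predictions to labels. A minimal sketch is below, assuming the pre-trained model is a Keras model and that the crops are sorted into folders by filter type; the file paths, folder names, and 28x28 input size are all placeholders, not our actual setup.

```python
# Sketch of the per-filter accuracy comparison (paths and sizes are hypothetical).
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("pretrained_braille_model.h5")  # hypothetical model file

def accuracy_for_filter(filter_dir):
    # Load grayscale crops labeled by their parent folder (one folder per letter).
    ds = tf.keras.utils.image_dataset_from_directory(
        filter_dir, color_mode="grayscale", image_size=(28, 28), batch_size=32
    )
    correct = total = 0
    for images, labels in ds:
        images = images / 255.0  # assumes the model expects inputs scaled to [0, 1]
        preds = np.argmax(model.predict(images, verbose=0), axis=1)
        correct += int(np.sum(preds == labels.numpy()))
        total += int(labels.shape[0])
    return correct / total

for name in ["nms", "blur3", "orig", "thresh3"]:  # hypothetical folder names per filter
    print(name, accuracy_for_filter(f"preprocessed/{name}"))
```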

It’s clear that using an off-the-shelf pre-trained model will be insufficient for our use-case requirements. This further justifies our design choices to (1) train our own machine learning model (2) on a dataset modified to more closely resemble the output of our pre-processing pipeline. Therefore, I have also been taking some time to look at online learning resources for machine learning and neural networks, since, as a group, we have fairly little experience with the tools. My main question was how to choose the configuration of the hidden layers of a neural network. Some heuristics I have found are (1) the number of hidden-layer nodes should be close to sqrt(input layer nodes * output layer nodes), and (2) keep adding layers until test error no longer improves.
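As a back-of-the-envelope application of heuristic (1), assuming a 28x28 grayscale crop per character and 26 output classes (both assumptions, not final design decisions):

```python
# Rough sizing of a single hidden layer via the sqrt(inputs * outputs) heuristic.
import math

input_nodes = 28 * 28   # flattened pixel count of one braille-character crop (assumed)
output_nodes = 26       # one class per letter (assumed)
hidden_nodes = round(math.sqrt(input_nodes * output_nodes))
print(hidden_nodes)     # ~143 nodes as a starting point, to be tuned by heuristic (2)
```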

Looking at the frameworks available, it seems most likely that I will be using Keras to configure a TensorFlow neural network, which, once trained, will be deployed through OpenCV. I will also take some time to experiment with decision trees and random forests in OpenCV using hand-picked features. Based on this and last week’s experience, training a model locally with the equipment I have on hand takes around 1-2 hours (20 epochs reaches 95+% accuracy against the test dataset). We are looking into AWS SageMaker so that waiting on model training does not become a blocker.
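A minimal sketch of the kind of Keras/TensorFlow network I expect to start from is below; the input shape, layer sizes, and training call are assumptions to be tuned, not a final architecture.

```python
# Baseline Keras model sketch (shapes and sizes are placeholders, not final).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),         # one pre-processed braille crop (assumed size)
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(143, activation="relu"),     # sized via the sqrt heuristic above
    tf.keras.layers.Dense(26, activation="softmax"),   # one output per letter
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=test_ds, epochs=20)  # ~1-2 hours locally per the numbers above
```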

Looking at our Gantt chart, we are heading into the development phase following our design review. It seems like most, if not all, of us are slightly ahead of schedule for the time we have budgeted (due to running individual experiments as part of our design review).

Next week, I expect to have set up an AWS SageMaker workflow for iteratively training and testing models, and to have created a modified dataset we can use for training and testing.
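The workflow I have in mind would package our training script and launch managed training jobs from the SageMaker Python SDK, roughly as sketched below. The entry-point script name, instance type, S3 bucket, and hyperparameters are all placeholders; the final setup depends on what Prof. Mukherjee and our AWS credits allow.

```python
# Rough sketch of a SageMaker training job (script name, bucket, and instance type are placeholders).
import sagemaker
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point="train.py",                     # our Keras training script (hypothetical name)
    role=sagemaker.get_execution_role(),        # works inside a SageMaker notebook/role
    instance_count=1,
    instance_type="ml.g4dn.xlarge",             # single-GPU instance; subject to available credits
    framework_version="2.9",
    py_version="py39",
    hyperparameters={"epochs": 20},
)
estimator.fit({"training": "s3://awareables-dataset/train"})  # placeholder S3 location
```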

Team Status Report for 10/08/2022

  1. What are the most significant risks that could jeopardize the success of the project?

      This week, the team focused on wrapping up and presenting our design review. We also spent some time experimenting with the Jetson and individually researching approaches for our respective phases. This early exploratory work has set us up nicely to begin writing our in-depth design report and finalize our bill of materials to order parts.

      Based on our research, we have also identified some further potential risks that could jeopardize the success of our project. While researching the classification phase, we realized that the time spent training iterations of our neural network may become a blocker for optimization and development. Originally, we had envisioned that we could use a pre-trained model or that we only needed to train a model once. However, it has become clear that iteration will be needed to optimize layer depth and size for best performance. Using the equipment we have on hand (Kevin’s RTX 3080), we were able to train a neural network for 20 epochs (13 batches per epoch) in around 1-2 hours. 

2. How are these risks being managed?

      To address training time as a possible blocker, we have reached out to Prof. Mukherjee to discuss options for an AWS workflow using SageMaker. Until this is working, we will have to be selective and intentional about what parameters we would like to test and iterate on.

3. What contingency plans are ready?

While we expect to be able to use AWS or other cloud computing services to train our model, our contingency plan is to fall back on local hardware. Since this will be slower, we will simply need to be more intentional about which parameters we test and iterate on.

Based on initial feedback from our design review presentation, one of the things we will be revising for our design report is the clarity of the datapath. To that end, we are creating diagrams that should help visualize a captured image’s journey from sensor to text-to-speech.

4. Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)?

One topic that came up during our design review was the difference between a comfortable reading speed and a comfortable comprehension speed. Prof. Yu pointed out that while we would like to replicate the performance of braille reading, it is unlikely that text-to-speech at that word rate would be comfortable to listen to and comprehend in full. As a result, we have adjusted our expectations and use-case requirements to take this into account. Based on our research, a comfortable comprehension speed is around 150 wpm. Knowing this metric will allow us to better tune our text-to-speech output.
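As a small illustration of how this cap could be applied, the sketch below uses pyttsx3 as a stand-in TTS engine (we have not committed to a specific text-to-speech library yet); its rate property is specified in words per minute.

```python
# Minimal sketch: capping text-to-speech output at a comfortable comprehension speed.
import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 150)   # ~150 wpm per our research on comprehension speed
engine.say("Translated braille text would be spoken here.")
engine.runAndWait()
```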

5. Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward? 

Placing an upper limit on the final output speed of the translated speech does not incur any monetary or performance costs.

6. Provide an updated schedule if changes have occurred. 

Based on our Gantt chart, it seems that we have done a good job so far of budgeting time generously to account for lost time. As such, we are on pace with our scheduled tasks for the most part. In fact, we are slightly ahead of schedule on some tasks thanks to the experimentation we performed to drive the design review phase. However, one task we forgot to account for in our original Gantt chart was the Design Report. We have modified the Gantt chart to take this into consideration, as shown below:

[Updated Gantt chart]

Kevin’s Status Report for 10/01/2022

This week, my focus was looking at existing solutions for braille character classification and investigating the tools I would need for an in-house solution. This would help us get a better idea of how we should allocate our time and effort later in the development phase. I took some time to set up and train the model from the GitHub repository I found last week. However, upon completion, I found that the training data was poorly labeled and, even accounting for the mislabeled data, the model was not able to accurately classify our braille inputs.

Despite this failed experiment, the repository gave us a good idea of how fast classification can be once a model, in this case a DNN, is trained. Jay was able to provide me with some sample images of what a cropped braille character will look like after his pre-processing pipeline. Unfortunately, I lost some time this weekend due to illness, but I hope to start next week by retraining the model with correctly labeled data and testing it against Jay’s inputs. If the pre-written solution turns out to be a dead end, the most likely alternative is to write our own featurization (e.g., Hough transforms) and feed the resulting features into OpenCV’s classification pipeline.
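If we do go the hand-rolled featurization route, a sketch of what it could look like is below: OpenCV’s circle Hough transform extracts dot positions from a cropped character, which are then mapped onto a fixed-length 2x3 occupancy vector that a classical classifier (decision tree, random forest) could consume. The file path and every parameter value are guesses to be tuned against Jay’s actual crops.

```python
# Sketch of Hough-transform featurization for one braille crop (all parameters are guesses).
import cv2
import numpy as np

crop = cv2.imread("braille_char.png", cv2.IMREAD_GRAYSCALE)  # hypothetical crop from the pre-processing pipeline
blurred = cv2.medianBlur(crop, 5)

circles = cv2.HoughCircles(
    blurred, cv2.HOUGH_GRADIENT,
    dp=1, minDist=10,        # minimum spacing between detected dots
    param1=50, param2=15,    # edge / accumulator thresholds
    minRadius=2, maxRadius=8,
)

# Map detected (x, y, r) circles onto a 2x3 dot-occupancy feature vector.
features = np.zeros(6)
if circles is not None:
    h, w = crop.shape
    for x, y, _r in circles[0]:
        col = 0 if x < w / 2 else 1          # left or right column of the cell
        row = min(2, int(3 * y / h))         # top, middle, or bottom row
        features[row * 2 + col] = 1
print(features)
```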

This week, I also took some time to design some diagrams for our design review, which will hopefully make it easier to communicate our vision during the presentation. It also helped us as a team to better understand our shared vision before moving into the development and implementation phase.

According to our Gantt chart, the main goals this week were to iron out our hardware and software design details and prepare the design presentation slides. We were able to accomplish the majority of this as a group, and some of us were even able to move ahead to initial implementation. One thing we need to make sure we do is draft a list of the parts we do not already have in inventory so we can order them online as soon as possible.

Looking ahead, this upcoming week, Jay will be presenting our design review. Outside of class, I hope to have either an existing modified solution working or to start working on my own ML pipeline that can successfully classify the outputs that Jay has shared with me.

Kevin’s Status Report for 09/24/2022

This week, my team and I worked on preparing and presenting the slide deck for our proposal presentation. To prepare for the presentation, I made sure to spend some time rehearsing and editing the final slide deck to fit the expected pace. Following the presentation, we received some insightful feedback on the directions our project could take as we move into the next phase.

Since I have been assigned to focus on character classification and testing, I spent the remaining time this week looking for open-source datasets, as well as printed artifacts we could use for testing, and researching algorithms we could use to featurize the segmented braille characters. For the former, I’ve found custom shops on Etsy that specialize in braille printing or sell braille goods, as well as dedicated online storefronts for braille goods. However, popular storefronts, such as Amazon, seem to have a limited selection. For the latter, Jay suggested that we look into Hough transforms, a technique that may be useful for extracting the positions of shapes in an image. I also found a GitHub repository with a pre-trained classifier that may be a good place to start, which I plan to test in the next week.

Everything has been on schedule during these first few weeks. Over the past week, we completed the joint deliverables for website bring-up and the proposal presentation. Personally, I have started research into more robust testing criteria and featurization strategies. Looking ahead, next week I expect to work with the team to develop a final technical design to present on the following Monday, in addition to experimenting with software options on my own. By the end of the week, we should also have an initial parts list for anything we may need to order in addition to the existing hardware we’ve requested from inventory.

Team Status Report for 09/24/2022

1. What are the most significant risks that could jeopardize the success of the project?

At this point in our project, most of our significant risks involve the overall success of the software we deliver. Alongside this, we are relying on the processing capabilities of the hardware to meet our quantitative requirements and on optimization to reach the performance we need. Also, if we are unable to find significant prior research on braille detection with computer vision, we will need a more bottom-up development approach, which could require spending more time on research and basic implementation rather than on optimization.

2. How are these risks being managed? 

By staying ahead of schedule in development, we can ensure we have plenty of time to do both unit testing and integration testing, which will give us a baseline for what needs to be worked on and optimized. We can also develop the software components in parallel so that it is easier to sidestep or add to the process if needed.

3. What contingency plans are ready? 

Working steps have been modularized and parallelized to facilitate team cooperation and collaboration.

4. Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)?

While we are still actively workshopping our design, the major decisions we made in the past weeks involve narrowing the scope of our project and ironing out the details of our MVP. After speaking with Professor Yu, it became clear that we want to prioritize functionality and performance to meet our use-case requirements, with form factor and comfort as a secondary goal. Therefore, we decided to follow Alex’s advice to develop our MVP on the Jetson Xavier, which provides ample headroom for optimization. However, due to its size and weight, the Jetson would not fit comfortably on a helmet or mounted to glasses, as we had originally envisioned. Therefore, we are likely to update our MVP to a wearable camera linked to a Jetson worn on a vest.

Following our Proposal Presentation, we received a lot of insightful feedback from our instructors and peers. In particular, there was some confusion about the technical details of our MVP and what our test environment would look like. As we move into the design stage of our development cycle, we will make sure to clarify these points in our report and presentation. This is especially important so that our team has a clear idea of our goal for the end of the semester and so that we can order relevant materials ahead of time. There were also questions about how our solution addresses the problems we introduced in our use case. As we have narrowed our scope down to a more manageable size, we have also made some tradeoffs in functionality. However, we hope that our MVP will provide a strong foundation from which the path to an ideal solution becomes clear.

5. Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward? 

Specifically, obtaining the actual Jetson Xavier board made us realize that it would not be realistic for users to carry all of the parts on top of a helmet, given the board’s weight and bulk. Therefore, we will be adopting a combination of camera-mounted glasses and a vest for our initial build. Since we have been in the design phase so far and have not yet built the hardware, this change does not incur any costs that require further mitigation.

6. Provide an updated schedule if changes have occurred. 

We have not made any changes to our schedule as a result of the updates we made to our design this week. Looking ahead on our Gantt chart, next week will be dedicated to planning out the technical details of our project and preparing our Design Review presentation. This will likely involve experimenting with the hardware and software involved in developing our MVP.