Chester’s Status Report 10/29/2022

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

This past week we came back together after being away on fall break. We received feedback on our design report, which was overall positive and gave us a continued path forward for development. The beginning of the week was also spent starting on the ethics assignment in preparation for class next week. In my own subsystem of the product, I continued developing the spell-checking code so that it can take in words and output the nearest real word. At the moment it only checks against a dictionary; no probability metric is placed on candidate words to rank them when choosing the best fit for each correction. One approach I researched for assigning probabilities is to use Python's Counter class to tally word frequencies from a large collection of texts, used in conjunction with a dictionary or a list of commonly used words. This would rank words by how often they appear, giving more frequent words a higher probability of being chosen as the replacement for a misspelled word. Lastly, I authenticated the Google Text-to-Speech API on my local computer and am currently able to generate MP3 and WAV files via the API.
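
To make this concrete, here is a minimal sketch of the Counter-based ranking idea; the corpus filename and helper names are hypothetical, and the candidate list is assumed to come from the existing dictionary check:

    import re
    from collections import Counter

    # Build a word-frequency table from a large text corpus
    # ("big_corpus.txt" is a placeholder for whatever texts we collect).
    with open("big_corpus.txt", encoding="utf-8") as f:
        WORD_COUNTS = Counter(re.findall(r"[a-z]+", f.read().lower()))

    TOTAL = sum(WORD_COUNTS.values())

    def probability(word):
        """Relative frequency of `word` in the corpus."""
        return WORD_COUNTS[word] / TOTAL

    def best_correction(candidates):
        """Among candidate corrections from the dictionary check,
        return the one the corpus says is most common."""
        known = [w for w in candidates if w in WORD_COUNTS]
        return max(known, key=probability) if known else None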

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

With two weeks until the interim demo, our product is at the stage of finalizing its separate subsystems. Our main goal is for each subsystem to demonstrate working functionality that reflects its place in the product pipeline. As of now, I am confident that my concatenation, spell-checking, and text-to-speech components are viable and that a rough end-to-end run-through will be possible within the two weeks ahead of us.

What deliverables do you hope to complete in the next week?

In the upcoming week, I would like to start connecting the spell-checking code with the Google Text-to-Speech API so I can verify the quality of the speech output, as well as begin hammering out inconsistencies in the spell-checking algorithm itself. If there is time, I would also like to begin some formal testing of this subsystem. This will most likely form the integration work of the next two weeks before the interim demo.
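
As a rough illustration of the text-to-speech step I have working locally, the following sketch synthesizes a corrected sentence to an MP3 file with the Google Cloud Text-to-Speech client library; the input text and output filename are placeholders, and credentials are assumed to be configured as described above:

    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()

    synthesis_input = texttospeech.SynthesisInput(text="Corrected sentence goes here.")
    voice = texttospeech.VoiceSelectionParams(language_code="en-US")
    audio_config = texttospeech.AudioConfig(audio_encoding=texttospeech.AudioEncoding.MP3)

    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # Write the returned audio bytes to disk.
    with open("output.mp3", "wb") as out:
        out.write(response.audio_content)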

Team Status Report 10/29/2022

  • What are the most significant risks that could jeopardize the success of the project?

At the moment, our main risks are associated with meeting timing requirements while making sure we can work with the hardware effectively. Since our e-CAM50 is built for the Jetson Nano, we are temporarily pivoting to the Nano platform and working on getting the camera integrated. From this experience, we are seeing that an extended ribbon cable connecting the camera to the Jetson will be essential for reasonable wearability. However, as important as wearability is, we do not want it to hinder our overall product capabilities. One thing Alex mentioned to us early in the design process was that lengthening the camera cable could significantly affect latency. Until now we have mostly been working individually on our personal systems; now that we are testing camera integration with the Nano and beginning to integrate our independent parts on the device, we may need to rely on Wi-Fi, which the Nano provides but the Xavier AGX does not.

  •  How are these risks being managed? 

We have 18 days until the expected interim demo date of Nov 16th. Our goal for the interim demo is to showcase how raw captured data is processed at each stage of our pipeline, from the camera through to text-to-speech. Because we are temporarily pivoting to the Nano, we are putting less of a focus on latency so that we can focus on demoing functionality. As a result, we plan to work extensively on camera and software integration starting this coming week, and on speaker integration the week after. We believe this schedule will leave enough time to troubleshoot potential issues and further optimize our system.

  • What contingency plans are ready? 

If e-CAM50 integration fails or the Nano does not provide the performance we need, our final contingency plan is to fall back on the non-wearable, fixed-format design using the Jetson Xavier AGX instead of the Nano. However, with proper time management and collaboration, we firmly believe everything will be completed in time.

  • Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)?

Due to several constraints of the Jetson Xavier AGX (wireless connectivity, weight, I/O), we are considering altering our plan to work with the Jetson Nano instead. The Jetson Nano provides Wi-Fi capability and integrates well with the camera that we already have. It also decreases power draw in case we want to package our final prototype as a portable, battery-powered wearable. The main trade-off is the difference in performance. With that said, we believe the Nano will provide enough speed for the necessary pre/post-processing and classification subsystems.

  • Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward? 

This change is necessary due to our product’s use case and its need for mobility. With the Nano being smaller and having its own Wi-Fi, we can better integrate our design and make it more wearable and usable. The main cost of this change is the decrease in performance, but we believe the Nano will be able to handle our processing sufficiently. Going forward, we do not believe it will change the overall course of our schedule, and the next two weeks will still be essential for the development of our product before the interim demo.

  • Provide an updated schedule if changes have occurred.

Following Fall Break, we debriefed and re-assessed our progress and what we needed to do before the Interim Demo. As a result, we’ve moved camera and speaker integration up earlier in our Gantt chart. As we move closer to the integration phase, we will need to keep a closer eye on the Gantt chart to make sure everyone is on the same page and ready to integrate their deliverables.

Jong Woo’s Status Report for 10/22/2022

Note: This weekly status report covers any work performed during the week of 10/15 as well as Fall Break

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours): 

         The week before mid-semester break was primarily spent crafting and reviewing the design report. Our team was able to efficiently divide the workload for the design report based on the subsystems each of us is working on, and we carried out a thorough revision to finalize it.

          I spent some time this week looking into how the vertically and horizontally segmented images would be cropped and stored as separate image files in a folder. Since OpenCV has no dedicated cropping function, NumPy array slicing will be used instead. Because every pre-processed image can be read in and stored as a 2D array per color channel, the pixel ranges for the height and width of an area can be specified to crop specific regions of an image.
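
As a small sketch of this approach (the filename and pixel bounds below are placeholder values, not final parameters):

    import cv2

    # Read a pre-processed image as a grayscale 2D array.
    img = cv2.imread("preprocessed_page.jpg", cv2.IMREAD_GRAYSCALE)

    # Crop a region by slicing rows (height) first, then columns (width).
    y0, y1 = 120, 180
    x0, x1 = 40, 80
    cell = img[y0:y1, x0:x1]

    cv2.imwrite("cell_row0_col0.jpg", cell)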

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?:

             Progress is on schedule. Given that this past week was the mid-semester break, our team will focus on getting things rolling again and combining our parallelized work in preparation for the approaching interim demo.

What deliverables do you hope to complete in the next week?:

                  I plan to implement the Canny edge filter along with non-maximum suppression to sharpen the final edge contrast.
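
A minimal sketch of this step with OpenCV is shown below; the filenames and thresholds are placeholders, and cv2.Canny already applies non-maximum suppression internally as part of the algorithm:

    import cv2

    img = cv2.imread("segmented_row.jpg", cv2.IMREAD_GRAYSCALE)

    # Smooth first, then detect edges.
    blurred = cv2.GaussianBlur(img, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)

    cv2.imwrite("edges.jpg", edges)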

Kevin’s Status Report for 10/22/2022

Note: This weekly status report covers any work performed during the week of 10/15 as well as Fall Break.

This past week (10/15), the team spent the majority of its time developing the design report, for which I performed an experiment to measure the performance of the pre-trained model we are evaluating. To do this, I first downloaded an offline copy of the labeled dataset made available by aeye-alliance. Then, I relabeled the dataset with braille Unicode characters rather than English translations. I also manually scanned through each labeled image to make sure it was labeled correctly. Of the more than 20,000 images downloaded from online containers, I found only 16 mislabeled images and 2 that I deemed too unclear to use.

An example of mislabeled data within the “⠃” dataset.

Attribution of training data will be difficult to maintain if required. We can refer to the labeled-data CSV files from aeye-alliance, which include a list of the sources of all images, but we will not be able to pinpoint the source of any single image.

Once I had the correct data in each folder, I wrote a Python script that loads the pre-trained model and crawls through the training dataset, making a prediction for each image. Each result is recorded in a CSV containing the correct value, the prediction, and the measured inference time. Using pandas and seaborn, I was able to visualize the resulting data as a confusion matrix. The results did not quite reach the requirements we put forth for ourselves. There are also a number of imperfections in this experiment, which are described in the design report.

Confusion matrix generated by experiment
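
For reference, a stripped-down version of the visualization step described above could look like the following; the CSV path and column names are assumptions rather than the exact ones used in my script:

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    # results.csv is assumed to hold one row per image with columns:
    # actual, predicted, inference_ms
    df = pd.read_csv("results.csv")

    matrix = pd.crosstab(df["actual"], df["predicted"])
    sns.heatmap(matrix, cmap="Blues")
    plt.xlabel("Predicted")
    plt.ylabel("Actual")
    plt.savefig("confusion_matrix.png")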

The rest of my time was spent writing my share of the design report. With the following week being Fall Break, I did not do as much work as described in our Gantt chart, but I did look into how to use Amazon SageMaker to train a new ML model and set up an AWS account. I am still in alignment with my scheduled tasks, having built a large dataset and measured existing solutions in order to complete the design report. Next week, I hope to use this knowledge to quickly set up a SageMaker workflow to train and iterate on a model customized for our pre-processing pipeline.

Chester’s Status Report 10/22/2022

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

In the week before fall break, most of our time on the project was dedicated to developing the design review report. This was slightly more time-consuming than anticipated, but the thorough examination and attention to detail were worth it. With more than half of my time spent on the design report, the remaining time went to writing out the spell-checking algorithm and beginning initial error checking. The basic foundation is in place, but there is still a lot of work needed for it to succeed in our final product. The infrastructure began with word concatenation and then evolved to funnel those words into the post-processing algorithm.
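
A simplified sketch of that concatenation step is shown below, assuming classified characters arrive as a stream and a space cell marks word boundaries (the spell_check helper in the comment is hypothetical):

    def characters_to_words(chars):
        """Concatenate a stream of classified characters into words,
        splitting on the space cell (represented here as ' ')."""
        words, current = [], []
        for ch in chars:
            if ch == " ":
                if current:
                    words.append("".join(current))
                    current = []
            else:
                current.append(ch)
        if current:
            words.append("".join(current))
        return words

    # Each word would then be handed to the spell-checking step, e.g.:
    # corrected = [spell_check(w) for w in characters_to_words(classified_chars)]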

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

Fall break was not entirely planned for as a full break, and this might affect how the work is sequenced. That said, although we are taking more time off this week than usual, there is slack built into the schedule that accounts for the time off. This slack is also meant for challenges we encounter, and we are currently on track and making good progress. The next week will be essential for making sure we stay on track and meet the necessary deadlines.

What deliverables do you hope to complete in the next week?

As I mentioned above, this week will be essential for straightening out any rough edges after the break. Coming back together, we will hit the ground running and will most likely work in parallel to make significant strides. For me, this means touching up the spell-checking algorithm so that it works at a reasonable level, as well as developing the post-processing infrastructure into a smoother, more complete pipeline. This may also include coding the text-to-speech step if time allows.

Chester’s Status Report 10/08/2022

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

This week I was significantly busier than most. Although still on schedule, I hope to catch up a bit more before break and put in more work so that a proper chunk is done by the time we come back. Over the course of this week, I started designing the software responsible for taking in characters and turning them into words to be processed by the spell-check algorithm. On top of this, we finished the design review and started planning for the design report that is due next week.

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

Overall, our entire project is still on track, and we hope to be at a manageable place going into fall break next week. After some tough questions at the design review, I think we still have a very clear direction for what we want our product to be, but being able to clarify that and present it as a user product will be important for our final deliverable. 

What deliverables do you hope to complete in the next week?

In the next week, I hope to start testing a preliminary spell-checking algorithm and to build it into an infrastructure that includes the word concatenation. This will help me develop a more foundational understanding of the timing constraints of the post-processing section of the project. Alongside this, although I wasn’t able to compare different text-to-speech APIs last week, I would like to try doing so in the upcoming week.

Kevin’s Status Report for 10/08/2022

This week, our team presented our design review for the final vision of Awareables. I spent the beginning of the week under the weather, which meant that we met fewer times as a whole group.

Individually, I spent some of the week experimenting with a pre-trained model that was trained on the 30,000-image set we intend to use for our own model. I started by feeding the model the pre-processed images that Jay provided me with last week. Of the four different filter outputs, non-max suppression yielded the best accuracy, with 85% of the characters recognized correctly (Blur3: 60%, Orig: 80%, Thresh3: 60%). That said, non-max suppression may be the most processing-heavy pre-processing method, so we will have to weigh the cost-benefit tradeoff there. Interestingly, most misidentified characters were misidentified as the letter “Q” (N, S, and T are all only some “flips” away from Q). Furthermore, “K” is likely to be misidentified if its two dots are not aligned to the left side of the image.

It’s clear that using any pre-trained model will be insufficient for our use-case requirements. This further justifies our design choices to (1) train our own machine learning model (2) on a dataset modified to more closely resemble the output of our pre-processing pipeline. I have therefore also been taking some time to look at various online learning resources for machine learning and neural networks, since as a group we have fairly little experience with the tools. My main question was how to choose the configuration of the hidden layers of a neural network. Two heuristics I have found are (1) the number of hidden-layer nodes should be close to sqrt(input layer nodes * output layer nodes), and (2) keep adding layers until the test error stops improving.
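
As a concrete illustration of heuristic (1), a minimal Keras sketch might size a single hidden layer as follows; the input and output dimensions are placeholder assumptions, not our final design:

    import math
    from tensorflow import keras

    n_in = 28 * 28   # flattened input size (placeholder, not our final crop size)
    n_out = 26       # one class per braille letter (placeholder)

    # Heuristic (1): hidden nodes ~ sqrt(input nodes * output nodes)
    n_hidden = round(math.sqrt(n_in * n_out))

    model = keras.Sequential([
        keras.layers.Input(shape=(n_in,)),
        keras.layers.Dense(n_hidden, activation="relu"),
        keras.layers.Dense(n_out, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])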

Looking at the frameworks available, it seems most likely that I will be using Keras to configure a TensorFlow neural network which, once trained, will be deployed on OpenCV. I will also take some time to experiment with decision trees and random forests in OpenCV using hand-picked features. Based on this and last week’s experience, it takes around 1-2 hours to train a model locally with the equipment I have on hand (20 epochs reaches 95+% accuracy against the test dataset). We are looking into using AWS SageMaker so that waiting on model training does not become a blocker.
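
For the decision-tree/random-forest experiment, OpenCV's built-in ML module could be used roughly as follows; the feature arrays here are random placeholders standing in for whatever hand-picked features we settle on:

    import cv2
    import numpy as np

    # Placeholder training data: N feature vectors of length 6 with integer labels.
    train_features = np.float32(np.random.rand(100, 6))
    train_labels = np.int32(np.random.randint(0, 26, size=(100, 1)))

    rtrees = cv2.ml.RTrees_create()
    rtrees.setMaxDepth(10)
    rtrees.setTermCriteria((cv2.TERM_CRITERIA_MAX_ITER, 50, 1e-6))
    rtrees.train(train_features, cv2.ml.ROW_SAMPLE, train_labels)

    # Predict on new feature vectors.
    _, predictions = rtrees.predict(np.float32(np.random.rand(5, 6)))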

Looking at our Gantt chart, we are heading into the development phase following our design review. It seems like most, if not all, of us are slightly ahead of schedule for the time we have budgeted (due to running individual experiments as part of our design review).

Next week, I expect to have set up an AWS SageMaker workflow for iteratively training and testing models, and to have created a modified dataset we can use to train and test.

Team Status Report for 10/08/2022

  1. What are the most significant risks that could jeopardize the success of the project?

      This week, the team focused on wrapping up and presenting our design review. We also spent some time experimenting with the Jetson and individually researching approaches for our respective phases. This early exploratory work has set us up nicely to begin writing our in-depth design report and finalize our bill of materials to order parts.

      Based on our research, we have also identified some further potential risks that could jeopardize the success of our project. While researching the classification phase, we realized that the time spent training iterations of our neural network may become a blocker for optimization and development. Originally, we had envisioned that we could use a pre-trained model or that we only needed to train a model once. However, it has become clear that iteration will be needed to optimize layer depth and size for best performance. Using the equipment we have on hand (Kevin’s RTX 3080), we were able to train a neural network for 20 epochs (13 batches per epoch) in around 1-2 hours. 

2. How are these risks being managed?

      To address training time as a possible blocker, we have reached out to Prof. Mukherjee to discuss options for an AWS workflow using SageMaker. Until this is working, we will have to be selective and intentional about what parameters we would like to test and iterate on.

3. What contingency plans are ready?

     While we expect to be able to use AWS or other cloud computing services to train our model, our contingency plan will likely be to fall back on local hardware. While this will be slower, we will simply need to be more intentional about our decisions as a result. 

     Based on initial feedback from our design review presentation, one of the things we will be revising for our design report will be clarity of the datapath. As such, we are creating diagrams which should help clearly visualize a captured image’s journey from sensor to text-to-speech. 

4. Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)?

     One suggestion that we discussed for our design review was the difference between a comfortable reading speed and a comfortable comprehension speed. Prof. Yu pointed out that while we would like to replicate the performance of braille reading, it is unlikely that text-to-speech at this word rate would be comfortable to listen to and comprehend entirely. As a result, we have adjusted our expectations and use-case requirements to take this into account. Based on our research, a comfortable comprehension speed is around 150wpm. Knowing this metric will allow us to better tune our text-to-speech output.

5. Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward? 

      Placing an upper limit on the final output speed of the translated speech does not incur any monetary or performance costs.

6. Provide an updated schedule if changes have occurred. 

      Based on our Gantt chart, it seems that we have done a good job so far of budgeting time generously to account for lost time. As such, we are at pace with our scheduled tasks for the most part. In fact, we are partially ahead of schedule in some tasks due to experimentation we performed to drive the design review phase. However, one task we forgot to take into account in our original Gantt chart was the Design Report. We have modified the Gantt chart to take this into consideration, as below:

 

Jong Woo’s Status Report for 10/08/2022

 

  • What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

          The first half of this week was spent preparing and rehearsing the Design Presentation, as I was the presenter for team B1-Awareables.

          The latter half of this week was primarily invested in further research on segmentation and a partial implementation of horizontal segmentation. After the region of interest (ROI) is obtained from the initial pre-processing, the next step is vertical and horizontal segmentation: dividing the original ROI into vertical and horizontal crops that are then processed into final cropped images of individual braille characters.

            After some research, I found that the approaches for horizontal and vertical segmentation should differ slightly. For horizontal segmentation, there are two options: 1) using Hough transform lines, or 2) manually grouping rows by label, assigning labels to consecutive rows based on whether they follow each other (i.e., a difference greater than 1 starts a new group), and then finding the mean row value associated with each label to mark the row lines for horizontal segmentation. For vertical segmentation specifically, since the spacing between dots depends on the letters that follow each other, an approach similar to 2) will need to be adapted using the Hough transform, which will be explored further in the coming weeks.
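
A rough sketch of approach 2) is shown below, assuming the braille dots appear as bright pixels after thresholding; filenames and thresholding choices are placeholders:

    import cv2
    import numpy as np

    roi = cv2.imread("roi.jpg", cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(roi, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Indices of rows that contain at least one dot pixel.
    rows_with_dots = np.where(binary.sum(axis=1) > 0)[0]

    # Rows whose indices differ by more than 1 start a new group,
    # so each group of consecutive rows corresponds to one line of braille.
    groups = np.split(rows_with_dots, np.where(np.diff(rows_with_dots) > 1)[0] + 1)

    for i, rows in enumerate(groups):
        line = roi[rows[0]:rows[-1] + 1, :]   # crop one horizontal segment
        cv2.imwrite(f"line_{i}.jpg", line)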

For the horizontal segmentation, this is the current result: 

Next steps would be to crop individual rows of the ROI and save them as separate segmented-row images, which will then be vertically segmented using Hough transforms:

(example images: further segmented rows of the ROI)

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

       Things that were due this week were 1) starting on vertical and horizontal segmentation of the currently pre-processed images to obtain cropped individual JPEGs of braille characters, and 2) research on non-max suppression methods. This week I made some progress with horizontal segmentation and studied in depth how non-max suppression would be applied to our final filter. All goals were met and my progress is currently on schedule.

  • What deliverables do you hope to complete in the next week?

         By the end of next week I plan to accomplish the following: 1) continue working on vertical and horizontal segmentation of the currently pre-processed images to obtain cropped individual JPEGs of braille characters, and 2) further research and apply non-max suppression filters.

Kevin’s Status Report for 10/01/2022

This week, my focus was on looking at existing solutions for braille character classification and investigating the tools I would need for an in-house solution. This will help us get a better idea of how we should allocate our time and effort later in the development phase. I took some time to set up and train the model from the GitHub repository I found last week. However, upon completion, I found that the training data was poorly labeled and that, even accounting for the mislabeled data, the model was not able to accurately classify our braille inputs.

Despite this failed experiment, the repository was able to give us a good idea of how fast classification can be once a model, in this case a DNN, is trained. Jay was able to provide me with some sample images of what a cropped braille character would look like after his pre-processing pipeline. Unfortunately, I lost some time this weekend due to illness, but I hope to start next week by retraining the model with correct data and testing it against Jay’s inputs. If the pre-written solution turns out to be a dead end, the most likely alternative is writing our own featurization (using the Hough transform, etc.) and feeding those features into OpenCV’s classification pipeline.
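
As an illustration of what such hand-written featurization might look like, the sketch below uses cv2.HoughCircles to detect dot locations in a cropped cell and maps them onto the six braille dot positions; every parameter value here is an untuned placeholder:

    import cv2
    import numpy as np

    cell = cv2.imread("cell.jpg", cv2.IMREAD_GRAYSCALE)
    cell = cv2.medianBlur(cell, 3)

    # Detect dot-like blobs in the cropped character cell.
    circles = cv2.HoughCircles(cell, cv2.HOUGH_GRADIENT, 1, 8,
                               param1=100, param2=10, minRadius=2, maxRadius=6)

    # Hand-picked feature: which of the six braille dot positions are occupied.
    feature = np.zeros(6, dtype=np.float32)
    if circles is not None:
        h, w = cell.shape
        for x, y, r in circles[0]:
            col = 0 if x < w / 2 else 1    # left or right column of the cell
            row = min(int(3 * y / h), 2)   # top, middle, or bottom row
            feature[row * 2 + col] = 1.0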

This week, I also took some time to design some diagrams for our design review, which will hopefully make it easier to communicate our vision during the presentation. It also helped us as a team to better understand our shared vision before moving into the development and implementation phase.

According to our Gantt chart, the main goals this week were to iron out our hardware and software design details and prepare the design presentation slides. We were able to accomplish the majority of this as a group, and some of us were even able to move ahead to initial implementation. One thing we need to make sure we do is draft a list of parts we do not already have in inventory so we can order them online as soon as possible.

Looking ahead, this upcoming week, Jay will be presenting our design review. Outside of class, I hope to have either an existing modified solution working or to start working on my own ML pipeline that can successfully classify the outputs that Jay has shared with me.