Jong Woo’s Status Report for 10/08/2022

 

  • What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

          The first half of this week was spent preparing for and rehearsing the Design Presentation, as I was the presenter for team B1-Awareables.

          The latter half of this week was primarily invested in further research on segmentation and a partial implementation of horizontal segmentation. After the region of interest (ROI) is obtained from the initial pre-processing, the next step is vertical and horizontal segmentation: dividing the original ROI into vertical and horizontal crops, which are then processed into final cropped images of individual braille characters.

            After some research, I found that the approaches for horizontal and vertical segmentation should differ slightly. For horizontal segmentation, there are two options: 1) using Hough transform lines, or 2) manually grouping individual rows by label, assigning labels to each row based on whether the rows follow each other (i.e., a diff > 1 starts a new label), and then finding the mean row value associated with each label to draw the row lines for horizontal segmentation. For vertical segmentation specifically, since the spacing between dots depends on the letters following each other, approach 2) will need to be adapted and combined with a Hough transform, which I will explore further in the coming weeks.
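As a rough sketch of approach 2), assuming the ROI has already been binarized so that braille dots appear as non-zero pixels (the file names and the diff threshold below are illustrative, not our actual implementation):

    import cv2
    import numpy as np

    # Sketch of approach 2): group consecutive dot-bearing rows into labels,
    # then take the mean row of each label as a horizontal line.
    roi = cv2.imread("roi_binary.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical pre-processed ROI

    # Rows that contain at least one braille dot (non-zero pixel).
    row_sums = np.count_nonzero(roi, axis=1)
    dot_rows = np.where(row_sums > 0)[0]

    # Start a new label whenever the gap to the previous dot row is > 1.
    labels = np.cumsum(np.diff(dot_rows, prepend=dot_rows[0]) > 1)

    # The mean row value per label marks the center of each band of dot rows.
    centers = [int(dot_rows[labels == k].mean()) for k in np.unique(labels)]

    # Draw the computed lines for visual inspection.
    vis = cv2.cvtColor(roi, cv2.COLOR_GRAY2BGR)
    for r in centers:
        cv2.line(vis, (0, r), (roi.shape[1] - 1, r), (0, 0, 255), 1)
    cv2.imwrite("horizontal_segmentation.jpg", vis)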

For the horizontal segmentation, this is the current result: 

Next steps would be to crop the individual rows of the ROI and save each one as its own segmented row image, which will then be further vertically segmented using Hough transforms (a sketch of this cropping step follows the example below):

EX) (cropped row of ROI) + (cropped row of ROI) + (further segmented rows of ROI) …
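A minimal sketch of the planned cropping step, assuming the row boundaries have already been computed as (top, bottom) pixel pairs (the paths and bounds below are placeholders):

    import os
    import cv2

    # Crop each detected row band out of the ROI and save it as its own image,
    # ready to be vertically segmented in a later step.
    roi = cv2.imread("roi_binary.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical ROI
    row_bounds = [(10, 52), (70, 112), (130, 172)]             # hypothetical (top, bottom) pairs

    os.makedirs("segmented_rows", exist_ok=True)
    for i, (top, bottom) in enumerate(row_bounds):
        row_crop = roi[top:bottom, :]
        cv2.imwrite(os.path.join("segmented_rows", f"row_{i:02d}.jpg"), row_crop)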

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

       Things that were due this week were 1) starting work on vertical and horizontal segmentation of the currently pre-processed images to obtain cropped individual JPEGs of braille characters, and 2) researching non-max suppression methods. This week I made progress on horizontal segmentation and studied in depth how non-max suppression would be applied to our final filter. All goals were met and my progress is currently on schedule.

  • What deliverables do you hope to complete in the next week?

         By the end of next week I plan to accomplish the following: 1) continue working on vertical and horizontal segmentation of the currently pre-processed images to obtain cropped individual JPEGs of braille characters, and 2) further research and apply non-max suppression filters.

Kevin’s Status Report for 10/01/2022

This week, my focus was on looking at existing solutions for braille character classification and investigating the tools I would need for an in-house solution. This should give us a better idea of how to allocate our time and effort later in the development phase. I took some time to set up and train the model from the GitHub repository I found last week. However, upon completion, I found that the training data was poorly labeled and, even accounting for the mislabeled data, the model was not able to accurately classify our braille inputs.

Despite this failed experiment, the repository gave us a good idea of how fast classification can be once a model, in this case a DNN, is trained. Jay was able to provide me with some sample images of what a cropped braille character would look like after his pre-processing pipeline. Unfortunately, I lost some time this weekend due to illness, but I hope to start next week by retraining the model with correct data and testing it against Jay’s inputs. If the pre-written solution turns out to be a dead end, the most likely alternative is writing our own featurization techniques (using the Hough transform, etc.) and feeding them into OpenCV’s classification pipeline, along the lines of the sketch below.
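As a rough, untuned sketch of what such a featurization step could look like (the Hough parameters and the 2x3 grid mapping are assumptions for illustration, not a final design):

    import cv2
    import numpy as np

    # Detect braille dots in a cropped character image with the Hough circle transform
    # and encode them as a fixed-length binary feature vector (one bit per dot position
    # in the 2x3 braille cell).
    def braille_features(char_img_path):
        img = cv2.imread(char_img_path, cv2.IMREAD_GRAYSCALE)
        img = cv2.medianBlur(img, 3)
        circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, dp=1, minDist=10,
                                   param1=50, param2=12, minRadius=2, maxRadius=10)
        features = np.zeros(6, dtype=np.float32)      # 2 columns x 3 rows of possible dots
        if circles is not None:
            h, w = img.shape
            for x, y, _r in circles[0]:
                col = 0 if x < w / 2 else 1           # left or right column of the cell
                row = min(int(3 * y / h), 2)          # top, middle, or bottom row
                features[row * 2 + col] = 1.0
        return features

These vectors could then be fed into one of OpenCV’s built-in classifiers, such as cv2.ml.KNearest or cv2.ml.SVM.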

This week, I also took some time to design some diagrams for our design review, which will hopefully make it easier to communicate our vision during the presentation. It also helped us as a team to better understand our shared vision before moving into the development and implementation phase.

According to our Gantt chart, the main goals this week were to iron out our hardware and software design details and prepare the design presentation slides. We accomplished the majority of this as a group, and some of us were even able to move ahead to initial implementation. One thing we still need to do is draft a list of the parts we do not already have in inventory so we can order them online as soon as possible.

Looking ahead, this upcoming week, Jay will be presenting our design review. Outside of class, I hope to have either an existing modified solution working or to start working on my own ML pipeline that can successfully classify the outputs that Jay has shared with me.

Jong Woo’s Status Report for 10/01/2022

  • What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

This week, using various OpenCV functions that grayscale an image, reduce noise and enhance edge contrast, or binarize an image, I tested the outcomes of applying different filters with controlled variables, in order to obtain a clean pre-processed image of braille text that will make it easier for our recognition ML model to identify each braille character. For example, running Python code along the lines of the sketch below displays various potential intermediate outcomes of applying filters to the original braille image.
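Since the exact code is not reproduced here, the following is an illustrative sketch with hypothetical file names and filter parameters:

    import cv2

    # Compare intermediate results of several standard pre-processing filters side by side.
    original = cv2.imread("braille_sample.jpg")             # hypothetical input image

    gray = cv2.cvtColor(original, cv2.COLOR_BGR2GRAY)       # grayscale
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)             # noise reduction
    bilateral = cv2.bilateralFilter(gray, 9, 75, 75)        # edge-preserving smoothing
    _, binary = cv2.threshold(blurred, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # binarization (Otsu)

    for name, img in [("original", original), ("gray", gray), ("blurred", blurred),
                      ("bilateral", bilateral), ("binary", binary)]:
        cv2.imshow(name, img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()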

After the initial image pre-processing stage, the image needs to be segmented both vertically and horizontally, and the individually cropped braille characters saved into a separate folder as consecutive JPEGs, which the recognition ML model will then translate into the corresponding English alphabet. This week, I worked on vertical segmentation; similar work will be applied to horizontal segmentation and to the cropping and saving in the coming weeks.

In order to parallelize our team’s workflow, I manually cropped out the first five to seven braille characters from the various versions of pre-processed images and handed them to Kevin for the next step of our algorithm, the recognition phase. Kevin will then test the recognition ML algorithm and report metrics on translation accuracy for the various clarities of pre-processed images. My goal is then to continuously enhance the image pre-processing until it meets the thresholds set by Kevin’s metrics requirements.

Last but not least, I investigated a further method to enhance the recognition accuracy rate: non-max suppression from the imutils library. Since colored circles are drawn on top of the existing braille dots, there is a good chance that this way of pre-processing and cropping individual braille characters will return a relatively higher accuracy rate. The code for non-max suppression will be written from scratch in the coming weeks; a reference sketch using the library version is shown below.
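A minimal sketch of how imutils’ non-max suppression could be applied to overlapping dot detections (the boxes below are hypothetical; our own from-scratch version would replace the library call):

    import numpy as np
    from imutils.object_detection import non_max_suppression

    # Suppress overlapping candidate boxes around the same braille dot,
    # keeping only one box per dot.
    boxes = np.array([
        [10, 10, 22, 22],   # (x1, y1, x2, y2) candidates around the same dot
        [11, 12, 23, 23],
        [40, 10, 52, 22],   # a second, separate dot
    ])
    picked = non_max_suppression(boxes, probs=None, overlapThresh=0.3)
    print(picked)           # one box per physical dot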

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

Things that were due this week were i) a collaborative initial hardware boot, ii) assessing tradeoffs for various image pre-processing and segmentation methods and starting to test the effectiveness of each pre-processing algorithm, and iii) starting to write some code from scratch for pre-processing. All goals were met and my progress is currently on schedule.

  • What deliverables do you hope to complete in the next week?

By the end of next week I plan to accomplish the following: 1) start working on vertical and horizontal segmentation of the currently pre-processed images to obtain cropped individual JPEGs of braille characters, and 2) research non-max suppression methods using the imutils library, drawing colored circles on top of pre-processed images for a potential boost in recognition accuracy.

Chester’s Status Report for 10/01/2022

This week, we began working directly with the hardware we ordered. This was an initial attempt to gauge the complexity of the Jetson AGX Xavier, as well as to understand the design implications it will have in the future and the changes we might have to make. After spending several hours working with it, we ran into many difficulties, including internet connectivity and getting a fresh start from a clean install. Although these are temporary setbacks, we are still confident that our timeline is well within schedule. In my own personal work, I researched several text-to-speech applications that could be integrated into the project and worked with USB speaker systems. One such application is Google’s lifelike text-to-speech synthesis, which takes in words/sentences and converts them into WAV files that can be played by an audio output device. Alongside the text-to-speech portion of the project, I spent time analyzing methods for spellcheck and refining a possible formula going forward: generate the set of possible words within 1-2 edits of the input word, then output the one with the highest probability (sketched below).
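As a rough illustration of that formula (the corpus file and probability model below are placeholders, not a final decision), a Norvig-style sketch might look like:

    import re
    from collections import Counter

    # Word frequencies from a hypothetical corpus serve as the probability model.
    WORDS = Counter(re.findall(r"[a-z]+", open("corpus.txt").read().lower()))

    def edits1(word):
        # All candidates exactly one edit (delete, replace, insert, transpose) away.
        letters = "abcdefghijklmnopqrstuvwxyz"
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [L + R[1:] for L, R in splits if R]
        replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
        inserts = [L + c + R for L, R in splits for c in letters]
        transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
        return set(deletes + replaces + inserts + transposes)

    def correct(word):
        # Prefer known words, then known words 1 edit away, then 2 edits away.
        candidates = ({word} & WORDS.keys()) \
            or (edits1(word) & WORDS.keys()) \
            or ({e2 for e1 in edits1(word) for e2 in edits1(e1)} & WORDS.keys()) \
            or {word}
        return max(candidates, key=lambda w: WORDS[w])   # highest corpus frequency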

That being said, we are finishing up the research and design phase of our project within the scheduled time. Soon we will transition to software development and architecture, and begin writing preliminary code. In the coming week, the goal is to have a working hardware system, a unified code base with version control, and a well-refined design for our code so we can get started. I would also like to test several text-to-speech platforms for latency/accuracy measurements.

Team Status Report for 10/01/2022

1. What are the most significant risks that could jeopardize the success of the project?

This week, our team was focusing on solidifying the technical details of our design. One of the main blockers for us was flashing the AGX Xavier board and getting a clean install of the newest OS. Because the necessary software was not available on the host devices that we had access to, we spent some time setting up the Xavier temporarily on the board itself. During this process, we also considered the pros and cons of using an Xavier when compared to the more portable, energy efficient Nano. 

Our work is split into three phases, the first of which is pre-processing, where the initial picture is taken and processed. In this initial phase, due to the difficulties of natural-scene braille detection, we are starting our image-processing pipeline with reasonably cropped images of braille text. However, since our use-case requirements and apparatus model specify a head-mounted camera, we may need to consider different ways the camera can be mounted to provide more reliable capture angles in case ML-based natural-scene braille detection does not meet our 90% use-case accuracy requirement.

The second phase of our algorithm is the recognition phase. For this phase, because we want to work with ML, the greatest risks are poorly labeled data and compatibility with the pre-processing phase. For the former, we encountered a public dataset that was not labeled correctly for English Braille. Therefore, we had to look for alternatives that could be used instead. To make sure that this phase is compatible with the phase before it, Jay has been communicating with Kevin to add the pre-processing output to the classifier’s training dataset.

The final phase of our algorithm is post-processing, which includes spellcheck and text-to-speech in our MVP. One design decision was whether to use an external text-to-speech API or build our own in-house software. We decided against an in-house solution because we think the final product would be better served by a tried-and-true publicly available package, particularly with respect to our latency metrics.

2. How are these risks being managed? 

These risks are being mitigated by working through the process of getting the Xavier running with a freshly flashed environment. This will let us work through some of the direct technical challenges like ethernet connectivity, storage capacity, and general processing power. By staying proactive and looking ahead, we can scale down to the Nano if necessary, or, if steady progress is made on the Xavier, we will be able to demo and use it for our final product. Overall, we have divided our work in such a way that none of us is heavily reliant on the others or on the hardware working perfectly (though it is of course necessary for testing and requirements).

3. What contingency plans are ready? 

As far as our core device is concerned, we have currently set up a Jetson AGX Xavier in REH340 and can run it via ssh. We will also be ordering a Jetson Nano, since we have concluded that our programs could also run on the Nano with reasonable latency, along with other perks such as WiFi support and the relative compactness of the device. For the initial image pre-processing phase, in case ML-based natural-scene detection returns unreliable accuracy, we are considering various ways to mount the camera in a regulated manner so the initial dimensions of the image can be controlled. For the second phase of our primary algorithm, recognition, we researched the possibility of using the Hough transform, which is also supported by OpenCV’s Hough transform functions, in case ML-based recognition returns unreliable accuracy. For our final phase, audio translation, we are currently investigating various web-based text-to-speech APIs.

4. Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)?

Overall, there were no significant changes made to our existing system design, except for creating a solidified testing approach. This testing approach validates the physical design of our product, quantifies “success”, and tests in a controlled environment. Alongside our testing approach, we are still in the process of deciding whether the Xavier is the correct fit for our project or whether we will have to pivot to the Nano for its WiFi capabilities and simpler design. At the moment, this would only change our system specs.

5. Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward? 

Adding a fully defined testing plan will allow us to properly measure our quantitative use-case requirements, as well as give our audience/consumers a better understanding of the product as a whole. In addition, the Nano will not cost any more for us to use since it is already available, but it may cost time to adjust to the new system and its capabilities. This system has a significantly lower power draw (positive) but slower processing speed (negative). Overall, we think it will still be able to meet our expectations and fit well into our product design. Because we are still ahead of schedule, this cost will be absorbed as part of our initial research and design phase.

6. Provide an updated schedule if changes have occurred. 

Everything is currently on schedule and does not require further revision.