Team Status Report for 10/08/2022

  1. What are the most significant risks that could jeopardize the success of the project?

      This week, the team focused on wrapping up and presenting our design review. We also spent some time experimenting with the Jetson and individually researching approaches for our respective phases. This early exploratory work has set us up nicely to begin writing our in-depth design report and finalize our bill of materials to order parts.

      Based on our research, we have also identified some further potential risks that could jeopardize the success of our project. While researching the classification phase, we realized that the time spent training iterations of our neural network may become a blocker for optimization and development. Originally, we had envisioned that we could use a pre-trained model or that we only needed to train a model once. However, it has become clear that iteration will be needed to optimize layer depth and size for best performance. Using the equipment we have on hand (Kevin’s RTX 3080), we were able to train a neural network for 20 epochs (13 batches per epoch) in around 1-2 hours. 

2. How are these risks being managed?

      To address training time as a possible blocker, we have reached out to Prof. Mukherjee to discuss options for an AWS workflow using SageMaker. Until this is working, we will have to be selective and intentional about what parameters we would like to test and iterate on.
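As a rough illustration of the kind of SageMaker workflow we have in mind, the sketch below launches a managed PyTorch training job; the entry-point script name, S3 path, and instance type are placeholder assumptions, not settled decisions.

```python
# Hypothetical SageMaker launch; script, bucket, and instance type are
# illustrative placeholders until we settle the workflow with Prof. Mukherjee.
import sagemaker
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train_braille_cnn.py",   # our local training script
    role=sagemaker.get_execution_role(),  # IAM role with SageMaker access
    instance_count=1,
    instance_type="ml.g4dn.xlarge",       # single-GPU training instance
    framework_version="1.12",
    py_version="py38",
)
# Kick off training against a labeled dataset staged in S3.
estimator.fit({"training": "s3://awareables-data/braille-train"})
```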

3. What contingency plans are ready?

While we expect to be able to use AWS or another cloud-computing service to train our model, our contingency plan will be to fall back on local hardware. Since local training is slower, we will simply need to be more intentional about which parameters and configurations we choose to test.

Based on initial feedback from our design review presentation, one of the things we will be revising for our design report is the clarity of the datapath. To that end, we are creating diagrams that should clearly visualize a captured image's journey from sensor to text-to-speech.

4. Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)?

One suggestion that came up during our design review was the difference between a comfortable reading speed and a comfortable comprehension speed. Prof. Yu pointed out that while we would like to replicate the performance of braille reading, text-to-speech at that word rate would be unlikely to be comfortable to listen to and comprehend fully. As a result, we have adjusted our expectations and use-case requirements to take this into account. Based on our research, a comfortable comprehension speed is around 150 wpm. Knowing this metric will allow us to better tune our text-to-speech output.
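For example, if we were to use an offline engine such as pyttsx3 (an assumption for illustration, not a decision), capping the output at the 150 wpm target would look something like this:

```python
# Minimal sketch: cap text-to-speech output at ~150 wpm using pyttsx3.
import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 150)  # target comprehension speed, in words/min
engine.say("Braille translation output at a comfortable listening rate.")
engine.runAndWait()
```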

5. Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward? 

Placing an upper limit on the final output speed of the translated speech does not incur any monetary or performance costs.

6. Provide an updated schedule if changes have occurred. 

Based on our Gantt chart, we have done a good job so far of budgeting time generously to account for lost time, so we are on pace with our scheduled tasks for the most part. In fact, we are ahead of schedule on some tasks thanks to the experimentation we performed to drive the design review phase. However, one task we forgot to account for in our original Gantt chart was the Design Report. We have modified the Gantt chart to take this into consideration, as below:

[Image: updated Gantt chart]

Jong Woo’s Status Report for 10/08/2022


  • What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

The first half of this week was spent preparing and rehearsing the Design Presentation, as I was the presenter for team B1-Awareables.

The latter half of this week was primarily invested in further research on segmentation and a partial implementation of horizontal segmentation. After I obtain the region of interest (ROI) from the initial pre-processing, the next step is vertical and horizontal segmentation: dividing the original ROI into vertical and horizontal strips so that it can be processed into final cropped images of individual braille characters.

After some research, I found that the approaches for horizontal and vertical segmentation should differ slightly. More specifically, for horizontal segmentation there are two options: 1) using Hough transform lines, or 2) manually grouping pixel rows by label, assigning labels based on whether consecutive rows follow one another (i.e., starting a new group when the row difference is greater than 1), and then finding the mean row value associated with each label to draw the line for each horizontal segment; a rough sketch of this grouping logic follows below. For vertical segmentation specifically, since the spacing between dots depends on which letters follow each other, approach 2) will need to be adapted and combined with a Hough transformation, which will be further explored in the coming weeks.
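As a sketch of approach 2), assuming a binarized ROI with white dots on a black background (the exact threshold is illustrative):

```python
# Group occupied pixel rows into bands; a gap (diff > 1) starts a new band.
import numpy as np

def horizontal_segments(binary_img, min_pixels=1):
    """Return (start, end) row indices for each horizontal band of dots."""
    row_counts = np.count_nonzero(binary_img, axis=1)
    occupied = np.where(row_counts >= min_pixels)[0]
    if occupied.size == 0:
        return []
    bands, start, prev = [], occupied[0], occupied[0]
    for r in occupied[1:]:
        if r - prev > 1:           # gap found: the previous band has ended
            bands.append((start, prev))
            start = r
        prev = r
    bands.append((start, prev))
    return bands

# Usage: crop each band of rows out of the binarized ROI.
# crops = [binary[s:e + 1, :] for s, e in horizontal_segments(binary)]
```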

For the horizontal segmentation, this is the current result: [Image: horizontally segmented ROI]

Next steps would be to crop the individual rows of the ROI and save them as individually segmented row images, which will then be further vertically segmented using Hough transforms:

EX) [Images: further segmented rows of the ROI]

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

Things that were due this week were i) starting work on vertical and horizontal segmentation of the currently pre-processed images to obtain cropped JPEGs of individual braille characters, and ii) researching non-max suppression methods. This week I made progress on horizontal segmentation and studied in depth how non-max suppression would be applied to our final filter. All goals were met, and my progress is currently on schedule.

  • What deliverables do you hope to complete in the next week?

By the end of next week I plan on accomplishing the following: 1) continuing work on vertical and horizontal segmentation of the currently pre-processed images to obtain cropped JPEGs of individual braille characters, and 2) further researching and applying non-max suppression filters.

Kevin’s Status Report for 10/01/2022

This week, my focus was on evaluating existing solutions for braille character classification and investigating the tools I would need for an in-house solution. This will help us better allocate our time and effort later in the development phase. I took some time to set up and train the model from the GitHub repository I found last week. However, upon completion, I found that the training data was poorly labeled and that, even accounting for the mislabeled data, the model was not able to accurately classify our braille inputs.

Despite this failed experiment, the repository gave us a good idea of how fast classification can be once a model, in this case a DNN, is trained. Jay was able to provide me with some sample images of what a cropped braille character would look like after his pre-processing pipeline. Unfortunately, I lost some time this weekend due to illness, but I hope to start next week by retraining the model with correct data and testing it against Jay's inputs. If the pre-written solution turns out to be a dead end, the most likely alternative is writing our own featurization using Hough transforms and similar techniques, then feeding the resulting features into OpenCV's classification pipeline.
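To make that alternative concrete, here is a hedged sketch of what Hough-based featurization feeding an OpenCV classifier might look like; the 2x3 grid mapping, parameter values, and the choice of k-nearest-neighbors are illustrative assumptions, not a settled design.

```python
# Featurize a braille cell crop with HoughCircles, then classify with
# OpenCV's k-nearest-neighbors. Parameters are illustrative, not tuned.
import cv2
import numpy as np

def featurize(cell_img):
    """Map detected dot centers onto the 2x3 braille grid -> 6-bit vector."""
    circles = cv2.HoughCircles(cell_img, cv2.HOUGH_GRADIENT, dp=1,
                               minDist=8, param1=50, param2=10,
                               minRadius=2, maxRadius=8)
    feature = np.zeros(6, dtype=np.float32)
    if circles is not None:
        h, w = cell_img.shape
        for x, y, _r in circles[0]:
            col = 0 if x < w / 2 else 1       # 2 columns of dots per cell
            row = min(int(3 * y / h), 2)      # 3 rows of dots per cell
            feature[row * 2 + col] = 1.0
    return feature

# Training/inference with featurized crops (labels 0-25 for a-z):
# knn = cv2.ml.KNearest_create()
# knn.train(np.stack(train_features), cv2.ml.ROW_SAMPLE,
#           np.array(train_labels, dtype=np.float32))
# _, result, _, _ = knn.findNearest(featurize(test_img).reshape(1, -1), k=3)
```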

This week, I also took some time to design some diagrams for our design review, which will hopefully make it easier to communicate our vision during the presentation. It also helped us as a team to better understand our shared vision before moving into the development and implementation phase.

According to our Gantt chart, the main goals this week were to iron out our hardware and software design details and prepare the design presentation slides. We accomplished the majority of this as a group, and some of us were even able to move ahead to initial implementation. One thing we still need to do is draft a list of the parts we do not already have from inventory, so that we can order them online as soon as possible.

Looking ahead to this upcoming week, Jay will be presenting our design review. Outside of class, I hope to either have an existing modified solution working or start building my own ML pipeline that can successfully classify the outputs Jay has shared with me.

Jong Woo’s Status Report for 10/01/2022

  • What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

This week, using various OpenCV functions that grayscale an image, reduce noise, enhance edge contrast, and binarize an image, I tested the outcomes of applying different filters with controlled variables to obtain a clean pre-processed image of braille text, which will make it easier for our ML recognition model to recognize each braille character. For example, running a Python script along the lines of the one below displays various potential intermediate outcomes of applying filters to the original braille image.
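Here is a minimal sketch of such a script, assuming a local photo braille.jpg; the specific filters and parameter values are illustrative choices, not our final pipeline.

```python
# Display intermediate pre-processing stages of a braille photo.
import cv2

img = cv2.imread("braille.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)       # i) grayscale
blur = cv2.GaussianBlur(gray, (5, 5), 0)           # ii) noise reduction
# iii) unsharp mask to enhance edge contrast
sharp = cv2.addWeighted(blur, 1.5, cv2.GaussianBlur(blur, (0, 0), 3), -0.5, 0)
# iv) Otsu binarization
_, binary = cv2.threshold(sharp, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

for name, stage in [("gray", gray), ("blur", blur),
                    ("sharp", sharp), ("binary", binary)]:
    cv2.imshow(name, stage)
cv2.waitKey(0)
cv2.destroyAllWindows()
```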

After the initial pre-processing stage, the image then needs to be segmented both vertically and horizontally, and the individually cropped braille characters saved into a separate folder as a sequence of JPEGs to be handled by the ML recognition model and translated into the corresponding English letters. This week, I worked on the vertical segmentation; similar work will be applied to horizontal segmentation and to the cropping and saving in the coming weeks.
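As a small sketch of the cropping-and-saving step, assuming char_crops already holds the individually cropped character images as NumPy arrays:

```python
# Save per-character crops as sequentially numbered JPEGs.
import os
import cv2

out_dir = "crops"
os.makedirs(out_dir, exist_ok=True)
for i, crop in enumerate(char_crops):
    cv2.imwrite(os.path.join(out_dir, f"char_{i:04d}.jpg"), crop)
```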

In order to parallelize our team's workflow, I manually cropped out the first five to seven braille characters from the various versions of the pre-processed images to hand to Kevin for the next step of our algorithm, the recognition phase. Kevin will then test the ML recognition algorithm and give metrics on the accuracy of translation for the various clarities of pre-processed images. It will then be my goal to continuously enhance the image processing to meet the thresholds required by Kevin's metrics.

Last but not least, I investigated a further method to enhance the accuracy rate of the recognition phase: non-max suppression, as provided by the imutils library. Since colored circles are drawn on top of the existing braille dots, there is a good chance that this way of pre-processing and cropping individual braille characters may return a relatively higher accuracy rate. The code for non-max suppression will be written from scratch in the coming weeks.
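As a starting point for that from-scratch implementation, a standard non-max suppression routine (NumPy only) might look like the following; the (x1, y1, x2, y2) box format and IoU threshold are assumptions for illustration.

```python
# Keep the highest-scoring detection and drop overlapping duplicates.
import numpy as np

def non_max_suppression(boxes, scores, iou_thresh=0.3):
    """boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) confidences."""
    order = np.argsort(scores)[::-1]   # highest-scoring boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of box i with every remaining box.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                  (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + area_r - inter)
        # Suppress boxes that overlap box i beyond the threshold.
        order = order[1:][iou <= iou_thresh]
    return keep
```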

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

Things that were due this week were i) a collaborative initial hardware boot, ii) assessing tradeoffs for various image pre-processing and segmentation methods and starting to test the effectiveness of each pre-processing algorithm, and iii) starting to write some pre-processing code from scratch. All goals were met, and my progress is currently on schedule.

  • What deliverables do you hope to complete in the next week?

By the end of next week I plan on accomplishing the following: 1) starting work on vertical and horizontal segmentation of the currently pre-processed images to obtain cropped JPEGs of individual braille characters, and 2) researching non-max suppression methods using the imutils library, drawing colored circles on top of pre-processed images for a potential boost in recognition accuracy.

Chester’s Status Report for 10/01/2022

This week, we began working directly with the hardware we ordered. This was an initial attempt to gauge the complexity of the Jetson AGX Xavier, as well as to understand the design implications it would have in the future and the changes we might have to make. After spending several hours working with it, we hit several difficulties, including internet connectivity and being able to start fresh from scratch. Although these are temporary setbacks, we are still confident that our timeline is well within schedule. In my own personal work, I researched several text-to-speech applications that could be integrated into the project and worked with USB speaker systems. One such application is Google's lifelike text-to-speech synthesis, which takes in words/sentences and converts them into WAV files that can be played by an audio output device. Alongside the text-to-speech portion of this project, I spent time analyzing methods for spellcheck and refining a possible formula going forward. This would involve generating the set of possible words within 1-2 edits of the input word, then outputting the one with the highest probability.
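A minimal sketch of that spellcheck formula (in the spirit of Norvig's classic corrector) is below; WORD_COUNTS is an assumed word-frequency table we would build from a text corpus.

```python
# Generate candidates within 1-2 edits, then pick the most frequent word.
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"
WORD_COUNTS = Counter()  # e.g. Counter(open("corpus.txt").read().split())

def edits1(word):
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    known = lambda ws: {w for w in ws if w in WORD_COUNTS}
    # Prefer the word itself, then 1-edit, then 2-edit candidates.
    candidates = (known([word]) or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                  or [word])
    return max(candidates, key=lambda w: WORD_COUNTS[w])
```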

That being said, we are finishing up the research and design phase of our project within the scheduled time. Soon we will transition to software development and architecture and begin writing preliminary code. In the coming week, the goals are a working hardware system, a unified code base with version control, and a well-refined design for our code so we can get started. I would also like to test several text-to-speech platforms for latency and accuracy measurements.

Team Status Report for 10/01/2022

1. What are the most significant risks that could jeopardize the success of the project?

This week, our team focused on solidifying the technical details of our design. One of the main blockers was flashing the AGX Xavier board and getting a clean install of the newest OS. Because the necessary software was not available on the host devices we had access to, we spent some time setting up the Xavier temporarily on the board itself. During this process, we also weighed the pros and cons of the Xavier compared to the more portable, energy-efficient Nano.

Our work is split into three phases, the first of which is pre-processing, where the initial picture is taken and processed. In this initial phase, due to the difficulty of natural-scene braille detection, we are currently starting our image-processing work from reasonably cropped images of braille text. However, since our use-case requirements and apparatus model specify a head-mounted camera, we may need to consider different ways of mounting the camera to provide more reliable capture angles in case ML-based natural-scene braille detection does not meet our 90% use-case accuracy requirement.

The second phase of our algorithm is the recognition phase. For this phase, because we want to work with ML, the greatest risks are poorly labeled data and compatibility with the pre-processing phase. For the former, we encountered a public dataset that was not labeled correctly for English Braille, so we had to look for alternatives. To make sure that this phase is compatible with the phase before it, Jay has been communicating with Kevin about adding the pre-processing output to the classifier's training dataset.

The final phase of our algorithm is post-processing, which includes spellcheck and text-to-speech in our MVP. One design decision was whether to use an external text-to-speech API or build our own in-house software. We decided against an in-house solution because we think the final product would be better served by a tried-and-true publicly available package, particularly with respect to our latency metrics.

2. How are these risks being managed? 

These risks are being mitigated by working through the process of getting the Xavier running with a freshly flashed environment. This will allow us to work through some of the direct technical challenges, such as Ethernet connectivity, storage capacity, and general processing power. By staying proactive and looking ahead, we can scale down to the Nano if necessary; if steady progress is made on the Xavier, we will be able to demo and use it for our final product. Overall, we have divided our work so that none of us is heavily reliant on the others or on the hardware working perfectly (though the hardware is, of course, necessary for testing and validating requirements).

3. What contingency plans are ready? 

As far as our core device is concerned, we have currently set up a Jetson Xavier AGX in REH340 and can run it via SSH. We will also be ordering a Jetson Nano, since we have concluded that our programs could also run on the Nano with reasonable latency, and it offers other perks such as Wi-Fi support and relative compactness. For the initial image pre-processing phase, in case ML-based natural-scene detection returns unreliable accuracy, we are considering various ways of mounting the camera in a controlled manner to constrain the initial dimensions of the image. For the second phase of our primary algorithm, recognition, we have researched the possibility of using Hough transforms, which OpenCV supports natively, in case ML-based recognition returns unreliable accuracy. For our final phase, audio output, we are currently investigating various web-based text-to-speech APIs.

4. Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)?

Overall, there were no significant changes made to our existing design of the system except for creating a solidified testing approach. This testing approach validates the physical design of our product, quantifies "success," and tests in a controlled environment. Alongside our testing approach, we are still in the process of deciding whether the Xavier is the correct fit for our project, or if we will have to pivot to the Nano for its Wi-Fi capabilities and simpler design. This would only change our system specs at the moment.

5. Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward? 

Adding a fully defined testing plan will allow us to properly measure our quantitative use-case requirements, as well as give our audience/consumers a better understanding of the product as a whole. In addition, the Nano will not cost any more for us to use, as it is already available, but it may cost time to adjust to the new system and its capabilities. This system has a significantly lower power draw (positive) but a slower processing speed (negative). Overall, we think it will still be able to meet our expectations and mold well into our product design. Because we are still ahead of schedule, this cost will be absorbed as part of our initial research and design phase.

6. Provide an updated schedule if changes have occurred. 

Everything is currently on schedule and does not require further revision.

Kevin’s Status Report for 09/24/2022

This week, my team and I worked on preparing and presenting the slide deck for our proposal presentation. To prepare for the presentation, I made sure to spend some time rehearsing and editing the final slide deck to fit the expected pace. Following the presentation, we received some insightful feedback on the directions our project could take as we move into the next phase.

Since I have been assigned to focus on character classification and testing, I spent the remaining time this week looking for open-source datasets, as well as printed artifacts we could use for testing, and researching algorithms we could use to featurize the segmented braille characters. For the former, I've found custom shops on Etsy that specialize in braille printing or sell braille goods, as well as dedicated online storefronts for braille goods; popular storefronts such as Amazon, however, seem to have a limited selection. For the latter, Jay suggested that we look into Hough transforms, a technique that may be useful for extracting the positions of shapes in an image. I also found a GitHub repository with a pre-trained classifier that may be a good starting point, which I am planning to test in the next week.

Everything has been on schedule during these first few weeks. During the past week, we completed the joint deliverables for website bring-up and the proposal presentation. Personally, I have started researching more robust testing criteria and featurization strategies. Looking ahead to next week, I expect to work with the team to develop a final technical design to present the following Monday, in addition to experimenting with software options on my own. By the end of the week, we should also have an initial parts list for anything we may need to order beyond the existing hardware we've requested from inventory.

Chester’s Status Report for 9/24/2022

We are currently working on the final design of our product, both as software and with respect to the application's wearability. This week was focused primarily on the proposal presentation slides and analysis of the feedback, followed by separate research for development.

In our schedule, the main bulk of our project comes after the design presentation, because we want a finalized structure before delving deeply into the work. This includes software design trade-offs as well as structuring the project to allow for parallelism and growth. Since development begins primarily after the design review, we are very much on time and have good breathing room at the beginning of our project.

The coming week will involve hands-on hardware integration, as well as development of the software and hardware design toward finalization. Working with the hardware will let us surface challenges and difficulties to iron out before settling on a final product design. Alongside this, we will be concluding research in our separate fields in order to begin software and hardware development.

Jong Woo’s Status Report for 09/24/2022


  • What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

I started research on image pre-processing and segmentation using OpenCV libraries. In order to convert an image or a photo taken from a camera into a binarized image that can then be segmented and recognized, various pre-processing steps need to be taken. For example, given an imported image of a braille document:

[Image: imported braille document]

we apply various OpenCV functions that i) convert the image to grayscale, ii) reduce the overall noise of the image, iii) enhance the edge contrast, and iv) binarize the image, producing progressively cleaner intermediate images at each stage.

Further on, the rows are segmented by i) finding the connected components and extracting their mean height and width using OpenCV's cv2.connectedComponentsWithStats function and np.mean(), ii) finding empty rows, defined as those with fewer than mean_h/2 pixels, and iii) grouping rows, assigning a label to each group, and then finding the mean row value associated with each label. After these successive steps, the image is segmented like this: [Image: row-segmented braille document]
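A sketch of those three steps, assuming a binarized image with white dots (the empty-row test follows the mean_h/2 rule above):

```python
# i) connected components -> mean dot size; ii) empty-row test;
# iii) group consecutive non-empty rows into text lines.
import cv2
import numpy as np

binary = cv2.imread("binary_braille.png", cv2.IMREAD_GRAYSCALE)

n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
mean_h = np.mean(stats[1:, cv2.CC_STAT_HEIGHT])  # skip label 0 (background)
mean_w = np.mean(stats[1:, cv2.CC_STAT_WIDTH])

# A pixel row is "empty" if it holds fewer than mean_h / 2 dot pixels.
row_counts = np.count_nonzero(binary, axis=1)
non_empty = row_counts >= mean_h / 2

# Group consecutive non-empty rows and take each group's mean row index.
line_centers, current = [], []
for r, occupied in enumerate(non_empty):
    if occupied:
        current.append(r)
    elif current:
        line_centers.append(int(np.mean(current)))
        current = []
if current:
    line_centers.append(int(np.mean(current)))
```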

The next step from here is to adopt the Hough transform to identify and recognize each of the segmented braille characters. That work will follow in the coming week.

  • Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

Things that were due this week were i) research on image pre-processing and segmentation and ii) initiating the hardware design. Everything planned was completed on track and as scheduled.

  • What deliverables do you hope to complete in the next week?

By the end of next week I plan on accomplishing the following: 1) a collaborative initial hardware design, in the form of camera-mounted glasses combined with a wearable vest that would hold the Jetson Xavier; 2) assessing tradeoffs for various image pre-processing and segmentation methods, and starting to test the effectiveness of each pre-processing algorithm to decide whether code needs to be written from scratch to meet our metrics requirements for recognition and translation; and 3) initiating camera integration.

Team Status Report for 09/24/2022

1. What are the most significant risks that could jeopardize the success of the project?

At this point in our project, most of our significant risks involve the general success of the software we deliver. Alongside this, we are relying on the processing capabilities of the hardware to support our quantitative requirements and must optimize for proper performance. Also, if we are unable to find significant prior research on braille detection with computer vision, we will need a more bottom-up development approach, which could demand more time for research rather than optimization.

2. How are these risks being managed? 

By staying ahead of schedule in development, we can ensure we have plenty of time for both unit testing and integration testing, which will give us a baseline for what needs to be worked on and optimized. We can continuously develop software in parallel so that it is easier to sidestep or add to the process if needed.

3. What contingency plans are ready? 

Working steps have been modularized and parallelized to facilitate team cooperation and collaboration.

4. Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)?

While we are actively workshopping our design, some of the major decisions we made in the past weeks involve narrowing the scope of our project and ironing out the details of our MVP. After speaking with Professor Yu, it became clear that we want to prioritize functionality and performance to meet our use-case requirements, with form factor and comfort as a secondary goal. Therefore, we decided to follow Alex's advice to develop our MVP on the Jetson Xavier, which provides ample headroom for optimization. However, due to its size and weight, the Jetson would not fit comfortably on a helmet or mounted to glasses, as we had originally envisioned. As a result, we are likely to update our MVP to a wearable camera linked to the Jetson worn on a vest.

Following our Proposal Presentation, we received a lot of insightful feedback from our instructors and peers. Namely, there was some confusion about the technical details of our MVP and what our test environment would look like. As we move into the design stage of our development cycle, we will make sure to emphasize these features for our report and presentation. This is especially important so that our team has a clear idea of our goal by the end of the semester and so that we can order relevant materials ahead of time. There were also questions about how our solution addressed the problems we introduced in our use case. As we have narrowed our scope down to a more manageable size, we have also managed some tradeoffs in terms of functionality. However, we hope that our MVP will provide a strong foundation from which the path to an ideal solution will become clear.

5. Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward? 

Specifically, obtaining the actual Jetson Xavier board made us realize that it would be realistically impossible for users to carry all the parts on top of a helmet, given the board's weight and bulk. Therefore, we will be adopting a combination of camera-mounted glasses and a vest for our initial build design. Since we have been in the design phase so far and haven't built the hardware yet, there are no costs that require further mitigation.

6. Provide an updated schedule if changes have occurred. 

We have not made any changes to our schedule as a result of the updates we made to our design this week. Looking ahead on our Gantt chart, next week will be dedicated to planning out the technical details of our project and preparing our Design Review presentation. This will likely involve experimenting with the hardware and software involved in developing our MVP.