Team Status Report for 12/10/2022

  1. What are the most significant risks that could jeopardize the success of the project?
  • Pre-processing: Pre-processing is complete, meets all design requirements, and also has a machine-learning-based alternative with longer latency.
  • Classification: Classification is more-or-less complete and meets the requirements set forth in our design review. We do not foresee any risks in this subsystem.
  • Hardware/integration: We are still in the process of measuring the latency of the entire system (a rough timing sketch follows this list), but we know that we are within around 5 seconds on the AGX Xavier, which is a big improvement over the Nano. We will continue to measure and optimize the system, but there is some risk that we fall slightly short of our latency requirement.
  • Report: We are beginning to outline the contents of our report and video. It is too early to say if any risks jeopardize our timeline.
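
As referenced above, one way the end-to-end latency could be broken down per stage is sketched below. The stage function names (capture_image, preprocess, classify, synthesize_audio) are placeholders standing in for our actual pipeline calls, not their real identifiers.

    import time

    def timed(stage_name, fn, *args, **kwargs):
        # Run one pipeline stage and report how long it took.
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        print(f"{stage_name}: {time.perf_counter() - start:.3f} s")
        return result

    def measure_pipeline(capture_image, preprocess, classify, synthesize_audio):
        # Time each stage individually, then report the end-to-end latency.
        total_start = time.perf_counter()
        frame = timed("capture", capture_image)
        dots = timed("pre-processing", preprocess, frame)
        text = timed("classification", classify, dots)
        timed("post-processing", synthesize_audio, text)
        print(f"end-to-end: {time.perf_counter() - total_start:.3f} s")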

2. How are these risks being managed? 

Nearly everything has been completed as planned. 

3. What contingency plans are ready? 

Post-Processing: At this point, no contingency plans are necessary given how everything is coming together.

4. Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)?

Since last week, we have been able to measure the performance of our system on the AGX Xavier, and have chosen to pivot back to the Xavier, as we had originally planned in our proposal and Design Review. 

5. Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward? 

This change was necessary to better meet our latency requirements: in the classification subsystem, inferences now run roughly 7x faster, which also improves the overall latency of the system.

6. Provide an updated schedule if changes have occurred. 

We are on schedule to present our solution at the final demo and make final measurements for our final report without making any schedule changes.

Team Status Report for 11/19/2022

  • What are the most significant risks that could jeopardize the success of the project?

This week, the team debriefed and began implementing changes based on feedback from our Interim Demo. We primarily focused on making sure that we had measurable data that could be used to justify decisions made in our system’s design. 

Pre-processing: I am primarily relying on the stats values (the left and top coordinates of individual braille dots, as well as the width and height of neighboring dots) from the "cv2.connectedComponentsWithStats()" function. I have compared the exact pixel locations in the output matrices against the original image and confirmed that the values are accurate. The remaining problem is redundant dots, an inherent flaw of connectedComponentsWithStats(): I need to remove the duplicate dots scattered around nearby locations using non_max_suppression. There is a small issue with it, and since I do not want to write the whole function myself I am looking for ways to fix it; once this is done, the pre-processing procedures are nearly complete.
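
For reference, a minimal sketch of how the dot statistics are pulled out of cv2.connectedComponentsWithStats() is shown below; the Otsu thresholding step and the area cutoffs are illustrative assumptions rather than our exact tuned values.

    import cv2

    def detect_dots(gray_image, min_area=5, max_area=500):
        # connectedComponentsWithStats expects a binary image, so binarize first.
        _, binary = cv2.threshold(gray_image, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(
            binary, connectivity=8)

        dots = []
        for i in range(1, num_labels):  # label 0 is the background
            left = stats[i, cv2.CC_STAT_LEFT]
            top = stats[i, cv2.CC_STAT_TOP]
            width = stats[i, cv2.CC_STAT_WIDTH]
            height = stats[i, cv2.CC_STAT_HEIGHT]
            area = stats[i, cv2.CC_STAT_AREA]
            if min_area <= area <= max_area:  # discard specks and oversized blobs
                dots.append((left, top, width, height))
        return dots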

Classification: Latency risks for classification were largely addressed this week by changing the input layer of our neural network to accept 10 images per inference. The number of images accepted per inference will be tuned later to optimize for our testing environment. In addition, the model was converted from MXNet to ONNX, which is interoperable with NVIDIA’s TensorRT framework. However, using TensorRT seems to have introduced some latency, with the counterintuitive result that inference is currently faster on the CPU.
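
To make the batched setup concrete, here is a rough sketch of how the exported model could be run through ONNX Runtime; the model filename and the (10, 1, 28, 28) input shape are placeholders, not the exact values from our pipeline.

    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession("braille_classifier.onnx",
                                   providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name

    def classify_batch(dot_crops):
        # dot_crops: float32 array of shape (10, 1, 28, 28), one braille cell per row.
        batch = np.asarray(dot_crops, dtype=np.float32)
        logits = session.run(None, {input_name: batch})[0]
        return logits.argmax(axis=1)  # predicted class index for each cell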

Post-processing: The primary concern with the post-processing section of the project at the moment is working out audio integration with the Jetson Nano. Given the difficulties we had with camera integration, we hope audio will be a simpler process, since we only need to send audio out rather than also recognize sound input.

 

  • How are these risks being managed? 

Pre-processing: I am looking more closely into the logic behind non_max_suppression for removing the redundant dots, which should make debugging easier.
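
To make the debugging target clearer, the sketch below shows the duplicate-removal logic we expect non_max_suppression to perform, written from scratch purely for illustration; the overlap threshold is a guess, not a tuned value.

    import numpy as np

    def suppress_duplicate_dots(boxes, overlap_thresh=0.3):
        # boxes: list of (left, top, width, height) tuples for each detected dot.
        if len(boxes) == 0:
            return []
        boxes = np.asarray(boxes, dtype=float)
        x1, y1 = boxes[:, 0], boxes[:, 1]
        x2, y2 = x1 + boxes[:, 2], y1 + boxes[:, 3]
        areas = (x2 - x1) * (y2 - y1)
        order = np.argsort(areas)  # ascending by area

        keep = []
        while order.size > 0:
            i = order[-1]  # keep the largest remaining box
            keep.append(i)
            # Overlap of every other box with the kept box, relative to its own area.
            xx1 = np.maximum(x1[i], x1[order[:-1]])
            yy1 = np.maximum(y1[i], y1[order[:-1]])
            xx2 = np.minimum(x2[i], x2[order[:-1]])
            yy2 = np.minimum(y2[i], y2[order[:-1]])
            overlap = (np.maximum(0, xx2 - xx1) *
                       np.maximum(0, yy2 - yy1)) / areas[order[:-1]]
            # Drop boxes that overlap the kept box too much; they are duplicates.
            order = order[:-1][overlap <= overlap_thresh]
        return boxes[keep].astype(int).tolist()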

Classification: More extensive measurements will be taken next week using different inference providers (CPU, TensorRT, CUDA) to inform our choice for the final system. 
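
A rough version of the timing harness we could use for these measurements is sketched below; the model path, batch shape, and iteration count are placeholders, and which providers are actually available depends on the onnxruntime build installed on the Jetson.

    import time
    import numpy as np
    import onnxruntime as ort

    dummy_batch = np.random.rand(10, 1, 28, 28).astype(np.float32)  # placeholder shape

    for provider in ["CPUExecutionProvider", "CUDAExecutionProvider",
                     "TensorrtExecutionProvider"]:
        if provider not in ort.get_available_providers():
            print(f"{provider}: not available in this build")
            continue
        session = ort.InferenceSession("braille_classifier.onnx", providers=[provider])
        input_name = session.get_inputs()[0].name
        session.run(None, {input_name: dummy_batch})  # warm-up run
        start = time.perf_counter()
        for _ in range(100):
            session.run(None, {input_name: dummy_batch})
        avg_ms = (time.perf_counter() - start) / 100 * 1000
        print(f"{provider}: {avg_ms:.2f} ms per batch")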

Post-processing: Now that the camera is integrated, it is important to shift towards the stereo output. I do think it will integrate more easily than the camera did, but it is still important that we get everything connected as soon as possible to avoid discovering additional hardware needs later on.

 

  • What contingency plans are ready? 

Pre-processing: If the built-in non_max_suppression() function does not work after continued debugging attempts, I will have to write it myself.

 

  • Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)?

Classification: The output of the classification pipeline has been modified to include not only the string of translated characters, but also a dictionary mapping the lowest-confidence character indices to their next 10 most likely letters. This metadata is provided to help improve the efficiency of the post-processing spell checker.
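
A minimal sketch of how this metadata could be assembled from the per-character probabilities is shown below; the field names, alphabet, and confidence cutoff are assumptions made for illustration, not the exact implementation.

    import numpy as np

    ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # placeholder label set

    def build_output(probs, confidence_cutoff=0.9):
        # probs: (num_characters, num_classes) softmax output, one row per braille cell.
        predicted = [ALPHABET[i] for i in probs.argmax(axis=1)]
        low_confidence = {}
        for idx, row in enumerate(probs):
            if row.max() < confidence_cutoff:
                # The next 10 most likely letters (best first), skipping the prediction,
                # for the spell checker to try as substitutions.
                ranked = row.argsort()[::-1][1:11]
                low_confidence[idx] = [ALPHABET[i] for i in ranked]
        return {"text": "".join(predicted), "low_confidence": low_confidence}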

 

  • Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward? 

This change was not strictly necessary, but if it works as intended it should significantly improve the overall efficiency of the pipeline. It also does not require any significant overhead in time or effort, so it is easy to implement.

 

  • Provide an updated schedule if changes have occurred. 

 

 

Team Status Report for 11/12/2022

  1. What are the most significant risks that could jeopardize the success of the project?
  • Pre-processing:
    • Currently, I am relying on OpenCV’s “cv2.connectedComponentsWithStats()” function, which outputs various statistical values about the input image, including the left and top coordinates as well as the width, height, and area of each detected component (braille dots, in our case). However, depending on the lighting and the image quality of the original photo, the accuracy of this stats function needs to be further tested before deciding what further modifications are needed (see the verification sketch after this list).
  • Classification:
    • On the classification side, one new risk that was introduced when testing our neural network inference on the Jetson Nano was latency. Since each character takes around 0.1 s to infer, processing characters sequentially means an especially long sentence could produce substantial latency (roughly 5 s for a 50-character line, for example).
  • Hardware:
    • The Jetson Nano hardware also presented some challenges due to its limited support as a legacy platform in the Jetson ecosystem. Missing drivers and slow package build times make bring-up particularly slow. This is, however, a one-time cost which should not have any significant impact on our final product.
  • Post-processing:
    • Another hardware-related risk to our final product is the audio integration capability of the Nano. Since this is one of the last parts of integration, any complications could be critical.
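
One way we could test the stats output under different lighting conditions is to draw each detected component back onto the source image and inspect it by eye; the sketch below assumes Otsu thresholding and is for illustration only.

    import cv2

    def visualize_components(gray_image, out_path="components_debug.png"):
        # Binarize, find connected components, and box each one on a color copy.
        _, binary = cv2.threshold(gray_image, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        num_labels, _, stats, _ = cv2.connectedComponentsWithStats(binary)
        debug = cv2.cvtColor(gray_image, cv2.COLOR_GRAY2BGR)
        for i in range(1, num_labels):  # skip the background label
            x = int(stats[i, cv2.CC_STAT_LEFT])
            y = int(stats[i, cv2.CC_STAT_TOP])
            w = int(stats[i, cv2.CC_STAT_WIDTH])
            h = int(stats[i, cv2.CC_STAT_HEIGHT])
            cv2.rectangle(debug, (x, y), (x + w, y + h), (0, 0, 255), 1)
        cv2.imwrite(out_path, debug)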

 

2. How are these risks being managed? 

 

  • Pre-processing:
    • At a basic level, a pixel-by-pixel comparison between the image and the matrices printed to the terminal will be carried out to understand the current accuracy level and to guide further tweaking of the parameters. Furthermore, the non_max_suppression() function is being investigated to mitigate some of the inaccuracies that can arise from the initial “connectedComponentsWithStats()” call.
  • Classification:
    • To address possible latency issues caused by per-character inference latency, we are hoping to convert our model from the MXNet framework to NVIDIA’s TensorRT, which the Jetson can use to run the model on a batch of images in parallel. This should reduce the sequential bottleneck that we are currently facing.
  • Hardware:
    • Since hardware risks are a one-time cost, as mentioned above, we do not feel that we will need to take steps to manage them at this time. However, we are considering using a docker image to cross-compile larger packages for the Jetson on a more powerful system.
  • Post-processing:
    • After finishing camera integration, we will work on interacting with audio through the USB port. We have a stereo adapter ready to connect to headphones (a quick playback test is sketched below).
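
As a first check once the adapter is plugged in, a short playback test like the one below could confirm that the Nano recognizes it as an output device; this assumes the sounddevice Python package is installed and that the adapter shows up as an ALSA device.

    import numpy as np
    import sounddevice as sd

    print(sd.query_devices())  # confirm the USB stereo adapter is listed

    SAMPLE_RATE = 44100
    t = np.linspace(0, 1.0, SAMPLE_RATE, endpoint=False)
    tone = 0.2 * np.sin(2 * np.pi * 440 * t).astype(np.float32)  # 1 s, 440 Hz test tone

    sd.play(tone, SAMPLE_RATE)  # pass device=<index> to target the adapter explicitly
    sd.wait()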

3. What contingency plans are ready? 

  • Classification:
    • If the inference time on the Jetson Nano is not significantly improved by moving to TensorRT, one contingency plan we have in place would be to migrate back to the Jetson AGX Xavier, which has significantly more computing power. While this comes at the expense of portability and power efficiency, it is within the parameters of our original design.
  • Post-Processing:
    • A possible sound board input/output PCB could be attached to the Nano to play sound. This comes with added expense and complexity, but it seems more likely to prove effective.

4. Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)?

Integrating each of our individual components into our overall software pipeline did not introduce any obvious challenges. Therefore, we did not think it was necessary to make any significant changes to our software system. However, in response to interim demo feedback, we are looking to create more definitive testing metrics when deciding on certain algorithms or courses of action. This will allow us to justify our choices moving forward and give our final report clarity. In addition to the testing, we are considering a more unified interaction between classification and post-processing that gives a more deterministic way of identifying which characters are most likely to be wrong.

5. Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward? 

The minor changes that we are making to the individual subsystems are crucial for the efficiency and effectiveness of our product, and they help ensure that we stay on top of optimal design decisions and the advice given by our professors and TAs.