Kevin’s Status Report for 11/19/22

This week, I was able to convert our trained neural network from Apache’s MXNet framework, which AWS uses to train image classification networks, to ONNX (Open Neural Network Exchange), an open-source ecosystem for interoperability between NN frameworks. Doing so allowed me to untether our software stack from MXNet, which was unreliable on the Jetson Nano. As a result, I was able to use the onnxruntime package to run our model using three different providers: CPU only, CUDA, and TensorRT.
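
For reference, a minimal sketch of what this conversion and loading flow can look like (the file names and input shape below are placeholders rather than our actual scripts, and the export call assumes MXNet 1.9’s ONNX utility):

    import numpy as np
    import mxnet as mx
    import onnxruntime as ort

    # Export the trained MXNet symbol/params pair to ONNX (mx.onnx in MXNet 1.9;
    # older releases expose the same call under mxnet.contrib.onnx).
    mx.onnx.export_model("braille-symbol.json", "braille-0050.params",
                         [(1, 1, 28, 28)], np.float32, "braille.onnx")

    # onnxruntime tries providers in order: TensorRT first, then CUDA, then CPU fallback.
    session = ort.InferenceSession(
        "braille.onnx",
        providers=["TensorrtExecutionProvider",
                   "CUDAExecutionProvider",
                   "CPUExecutionProvider"],
    )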

Surprisingly, when testing CPU against CUDA/TensorRT, CPU performed the best in inference latency. While I am not sure yet why this may be the case, there are some reports online of a similar issue where the first inference after a pause on TensorRT is much slower than subsequent inferences. Furthermore, TensorRT and CUDA have more latency overhead on startup, since the framework needs to set up kernels and send instructions to all the parallel units. This is not something that will affect our final product, however, because it is a one-time cost for our persistent system.
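
To make the comparison fair going forward, I plan to discard the first (warm-up) inference before timing. A rough sketch of that measurement, assuming the onnxruntime session from the sketch above and a placeholder input shape:

    import time
    import numpy as np

    def time_inference(session, batch, runs=50):
        input_name = session.get_inputs()[0].name
        session.run(None, {input_name: batch})           # warm-up run, not timed
        start = time.perf_counter()
        for _ in range(runs):
            session.run(None, {input_name: batch})
        return (time.perf_counter() - start) / runs      # mean seconds per inference

    batch = np.random.rand(1, 1, 28, 28).astype(np.float32)
    print(f"avg latency: {time_inference(session, batch) * 1000:.2f} ms")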

In addition to converting our model to ONNX, I also changed the model’s input layer to accept 10 images at a time rather than 1. Doing so allows more work to be done in a single inference, lowering the latency overhead of my phase. Because the number of images per inference will be a fixed value for a given model, I will make sure to tune this parameter to lower the number of “empty” inferences completed as we define our testing data set (how many characters per scan, etc.). It is also possible that as the input layer becomes larger, CPU inference becomes less efficient while GPU inference is able to parallelize, leading to better performance using TensorRT/CUDA.
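
Because the batch size is fixed, a partial final batch has to be padded before inference. A small sketch of that padding step (the 10x1x28x28 shape is a placeholder, and session is the onnxruntime session from above):

    import numpy as np

    BATCH = 10  # fixed batch size baked into the model's input layer

    def pad_batch(images):
        """Pad a short array of character images up to the fixed batch size."""
        batch = np.zeros((BATCH, 1, 28, 28), dtype=np.float32)
        batch[:len(images)] = images
        return batch, len(images)                        # count of real (non-"empty") slots

    images = np.random.rand(7, 1, 28, 28).astype(np.float32)  # e.g. a 7-letter word
    batch, valid = pad_batch(images)
    outputs = session.run(None, {session.get_inputs()[0].name: batch})
    predictions = outputs[0][:valid]                     # ignore the padded slots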

Finally, I was able to modify the output of my classification model to include confidence (inference probabilities) and the next N best predictions. This should help optimize post-processing for our problem space by narrowing the scope of the spell check search.
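
As a sketch of what that output looks like (the 26-letter class list and random scores here are placeholders; the real model’s output layer is larger):

    import numpy as np

    def top_n_predictions(scores, classes, n=10):
        """Softmax the raw scores into confidences and return the best guess
        plus the next n most likely characters."""
        exp = np.exp(scores - scores.max())
        probs = exp / exp.sum()
        order = np.argsort(probs)[::-1]
        best = (classes[order[0]], float(probs[order[0]]))
        runners_up = [(classes[i], float(probs[i])) for i in order[1:n + 1]]
        return best, runners_up

    classes = [chr(ord('a') + i) for i in range(26)]     # placeholder class list
    scores = np.random.rand(26).astype(np.float32)       # placeholder raw scores
    print(top_n_predictions(scores, classes))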

I did not have the opportunity this week to retrain / continue training the existing model using images passed through Jay’s pre-processing pipeline. However, as the details of the pipeline are still developing, this may have been a blessing in disguise. Next week, I will be focused on measuring the current model’s performance (accuracy, latency) using different pre-processing techniques and inference providers, as well as measuring cross-validation accuracy of our final training dataset. This information will be visualized and analyzed in our final report and will help inform our final design. In addition, I will also be integrating the trigger button for our final prototype.

Chester’s Status Report 11/19/2022

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

This week was spent building off of the previous demo’s feedback and a helpful meeting with the professor on Wednesday. Several things were made clear, chief among them the importance of testing and having quantitative metrics to back up our design decisions. This means running tests on pieces of software to concretely demonstrate why one approach is better than another. For the post-processing section, this includes quantifying the text that I already have, as well as testing large paragraphs containing errors of different sorts. It also ties into the confidence matrix that Kevin will now be providing.

One of the main elements of progress from this week was creating an algorithm that takes in Kevin’s confidence matrix dictionary and adjusts the current algorithm to run more optimally. This means checking different iterations of words through the classification pipeline, guided by the confidence reported by the ML algorithm. This gives us the opportunity to greatly reduce the current run time of checking every letter against every possible character in the word: we would only check characters that fall either below a certain confidence threshold or within a set percentage of the total letters, so that only about 10 possible characters need to be checked for each classification character given to me. Theoretically, based on our classification model’s 85-90% accuracy, this error correction would act as a surplus to classification, bringing us to a theoretical 90+%.
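
As an illustrative sketch of the idea (the dictionary, threshold, and confidence values here are placeholders, not the final algorithm):

    ENGLISH_WORDS = {"hello", "world", "braille"}        # placeholder dictionary

    def correct_word(word, confidences, alternatives, threshold=0.85):
        """word: classified string; confidences[i]: probability for letter i;
        alternatives[i]: next-best letters for position i from the classifier."""
        if word in ENGLISH_WORDS:
            return word
        for i, conf in enumerate(confidences):
            if conf >= threshold:
                continue                                 # trust high-confidence letters
            for letter in alternatives[i]:
                candidate = word[:i] + letter + word[i + 1:]
                if candidate in ENGLISH_WORDS:
                    return candidate
        return word                                      # no better candidate found

    print(correct_word("wprld", [0.95, 0.60, 0.97, 0.93, 0.92],
                       [[], ["o", "e"], [], [], []]))    # -> "world"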

 

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

We are currently still on track for the final product. We are hoping to complete the full hardware integration soon so that we can focus on developing the software to our liking.

 

What deliverables do you hope to complete in the next week?

I would like to start integrating the speaker/headphones with the Nano to make sure there are no difficulties transferring audio at runtime. I think it is also important in general that we focus more effort on getting software and functionality onto the Nano, as that will be harder to tweak if there are errors. Overall, I will continue polishing the spell-checking algorithm and audio processing as we integrate with the hardware.

 

Team Status Report 11/19/2022

  • What are the most significant risks that could jeopardize the success of the project?

This week, the team debriefed and began implementing changes based on feedback from our Interim Demo. We primarily focused on making sure that we had measurable data that could be used to justify decisions made in our system’s design. 

Pre-processing: I am primarily relying on the stats values (the left and top coordinates of individual braille dots, as well as the width and height of the neighboring dots) from the “cv2.connectedComponentsWithStats()” function. I have checked the exact pixel locations in the output matrices against the original image and have confirmed that the values are in fact accurate. My current dot redundancies come from an inherent limitation of the connectedComponentsWithStats() function, and I need to get rid of the redundant dots sporadically distributed in nearby locations using non_max_suppression. There is a small issue with this, and I do not want to write the whole function myself, so I am looking for ways to fix it; as long as this gets done, I am nearly done with the pre-processing procedures.
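
For reference, a minimal sketch of how these stats are pulled out of a binarized image (the file path and area bounds are placeholders and still need tuning):

    import cv2

    binary = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)
    num, labels, stats, centroids = cv2.connectedComponentsWithStats(binary, connectivity=8)

    dots = []
    for i in range(1, num):                              # label 0 is the background
        x, y, w, h, area = stats[i]
        if 20 < area < 400:                              # drop specks and merged blobs
            dots.append((x, y, w, h))

    print(f"{len(dots)} candidate braille dots")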

Classification: Latency risks for classification have been mostly addressed this week by changing the input layer of our neural network to accept 10 images per inference. The number of images accepted per inference will be tuned later to optimize against our testing environment. In addition, the model was converted from MXNet to ONNX, which is interoperable with NVIDIA’s TensorRT framework. However, using TensorRT seems to have introduced some latency to inference, resulting, counterintuitively, in faster inferences on the CPU.

Post-processing: The primary concern with the post-processing section of the project at the moment is working out how audio will be integrated with the Jetson Nano. Given some of the difficulties we had with camera integration, we hope this will be a less difficult process, since we only need to send audio out rather than also recognize sound input.

 

  • How are these risks being managed? 

Pre-processing: I am looking further into the logic behind non_max_suppression for getting rid of the redundant dots, which should facilitate the debugging process.

Classification: More extensive measurements will be taken next week using different inference providers (CPU, TensorRT, CUDA) to inform our choice for the final system. 

Post-processing: Now that the camera is integrated, it is important to shift towards the stereo output. I do think it will integrate more easily than the camera did, but it is still important that we get everything connected as soon as possible to avoid outstanding hardware needs later on.

 

  • What contingency plans are ready? 

Pre-processing: If the built-in non_max_suppression() function does not work after continued debugging attempts, I will have to write it myself.
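
As a rough sketch of what that fallback would look like (greedy suppression over (x, y, w, h) dot boxes; the overlap threshold is a placeholder):

    import numpy as np

    def nms(boxes, overlap_thresh=0.3):
        """Greedy non-max suppression: keep larger boxes, drop heavy overlaps."""
        if len(boxes) == 0:
            return []
        boxes = np.asarray(boxes, dtype=np.float32)
        x1, y1 = boxes[:, 0], boxes[:, 1]
        x2, y2 = x1 + boxes[:, 2], y1 + boxes[:, 3]
        areas = boxes[:, 2] * boxes[:, 3]
        order = np.argsort(areas)[::-1]
        keep = []
        while len(order) > 0:
            i = order[0]
            keep.append(int(i))
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])
            inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
            overlap = inter / areas[order[1:]]
            order = order[1:][overlap <= overlap_thresh]
        return keep                                      # indices of boxes to keep

    print(nms([(10, 10, 8, 8), (12, 11, 8, 8), (40, 40, 8, 8)]))  # one of the overlapping pair is dropped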

 

  • Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)?

Classification: The output of the classification pipeline has been modified to include not only a string of translated characters, but a dictionary of character indexes with the lowest confidence, as well as the next 10 predicted letters. This metadata is provided to help improve the efficiency of the post-processing spell checker.

 

  • Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward? 

This change was not strictly necessary, but if it works as intended it will significantly improve the overall efficiency of the pipeline. It also does not require any significant overhead in time or effort, so it is easy to implement.

 

  • Provide an updated schedule if changes have occurred. 

 

 

Team Status Report for 11/12/2022

  1. What are the most significant risks that could jeopardize the success of the project?
  • Pre-processing:
    • Currently, I am relying on OpenCV’s “cv2.connectedComponentsWithStats()” function, which outputs various statistical values about the input image, including the left and top coordinates as well as the width, height, and area of the most commonly appearing object (braille dots in our case). However, depending on the lighting or quality of the original image, the accuracy of this stats function needs to be further tested in order to determine what further modifications are needed. 
  • Classification:
    • On the classification side, one new risk that was introduced when testing our neural network inference on the Jetson Nano was latency. Since each character has around a 0.1s latency, if we were to process characters sequentially, an especially long sentence could produce substantial latency.
  • Hardware:
    • The Jetson Nano hardware also presented some challenges due to its limited support as a legacy platform in the Jetson ecosystem. Missing drivers and slow package build times make bring-up particularly slow. This is, however, a one-time cost which should not have any significant impact on our final product.
  • Post-processing:
    • Another hardware related risk to our final product is the audio integration capabilities of the Nano. Since this is one of the last parts of integration, complications could be critical. 

 

2. How are these risks being managed? 

 

  • Pre-processing:
    • At a primary level, a pixel-by-pixel comparison between the image and the printed matrices in the terminal will be performed to understand the current accuracy level and to guide further tweaking of the parameters. Furthermore, cv’s non_max_suppression() function is being investigated to mitigate some of the inaccuracies that can arise from the initial “connectedComponentsWithStats().” 
  • Classification:
    • To address possible latency issues as a result of individual character inference latency, we are hoping to convert our model from the mxnet framework to NVIDIA’s TensorRT, which the Jetson can use to run the model on a batch of images in parallel. This should reduce the sequential bottleneck that we are currently facing.
  • Hardware:
    • Since hardware risks are a one-time cost, as mentioned above, we do not feel that we will need to take steps to manage them at this time. However, we are considering using a docker image to cross-compile larger packages for the Jetson on a more powerful system.
  • Post-processing:
    • After finishing camera integration, we will work on interacting with audio through the usb port. We have a stereo adapter ready to connect to headphones.

3. What contingency plans are ready? 

  • Classification:
    • If the inference time on the Jetson Nano is not significantly improved by moving to TensorRT, one contingency plan we have in place is to migrate back to the Jetson AGX Xavier, which has significantly more computing power. While this comes at the expense of portability and power efficiency, it is within the parameters of our original design.
  • Post-Processing:
    • There is a sound board input/output PCB that we could attach to the Nano to play sound. This comes with added expense and complexity, but it seems likely to prove effective. 

4. Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)?

Integrating each of our individual components into our overall software pipeline did not introduce any obvious challenges. Therefore, we did not think it was necessary to make any significant changes to our software system. However, in response to interim demo feedback, we are looking to create more definitive testing metrics when deciding on certain algorithms or courses of action. This will allow us to justify our choices moving forward and give our final report clarity. In addition to the testing, we are considering a more unified interaction between classification and post-processing that helps create a more deterministic approach to identifying which characters are most likely to be wrong. 

5. Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward? 

The minor changes that we are making to the individual subsystems are crucial for the efficiency and effectiveness of our product. They also help ensure that we stay on top of optimal decisions and the advice given by our professors and TAs. 

 

 

Jong Woo’s Status Report for 11/12/2022

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours): 

         The former half of this week was dedicated to preparing for and executing the interim demo, as well as further debugging and parameter tweaking of the issues identified. More specifically, I am relying on OpenCV’s “cv2.connectedComponentsWithStats()” function, which outputs various statistical values about the input image, including the left and top coordinates as well as the width, height, and area of the most commonly appearing object (braille dots in our case). However, depending on the lighting or quality of the original image, the accuracy of this stats function needs to be further tested to determine what further modifications are needed. Therefore, I am currently performing a pixel-by-pixel comparison between the original image and the printed matrices containing the (x,y) coordinates of the common objects (braille dots) as well as their corresponding widths and heights, in an attempt to understand the current accuracy level and what changes would be required for accurate acquisition of the core values. Furthermore, I am currently working with cv’s non_max_suppression() function to mitigate some of the inaccuracies that can arise from the initial “connectedComponentsWithStats().” 
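
One way to visualize this comparison, rather than reading matrices off the terminal, is to draw the reported centroids back onto the image. A small sketch of that check (the file paths are placeholders, not my actual workflow):

    import cv2

    original = cv2.imread("original.png")
    binary = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)
    num, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)

    for cx, cy in centroids[1:]:                         # skip the background component
        cv2.circle(original, (int(cx), int(cy)), 3, (0, 0, 255), -1)

    cv2.imwrite("overlay.png", original)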

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?:

             There has been a bit of an unexpected delay due to my health after my 4th COVID booster shot, but on a general level progress is on schedule, and the upcoming week will be primarily focused on tweaking the parameters of cv’s “connectedComponentsWithStats()” and “non_max_suppression()” functions. 

What deliverables do you hope to complete in the next week?:

         Through the pixel-by-pixel comparison between the image and the printed matrices from the “connectedComponentsWithStats()” function, I hope to build an accuracy table for the current parameters in order to better tune them and obtain accurate (x,y) coordinates of the center of each braille dot, as well as the corresponding width and height. The accuracy of these values is critical for the final step of pre-processing: cropping via numpy slicing.
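
As a small sketch of what that final cropping step could look like (the margin and the example stats values are placeholders):

    import numpy as np

    MARGIN = 4                                           # placeholder padding in pixels

    def crop_cell(image, x, y, w, h, margin=MARGIN):
        """Slice a region around a dot using (x, y, w, h) from connectedComponentsWithStats."""
        top, left = max(y - margin, 0), max(x - margin, 0)
        return image[top:y + h + margin, left:x + w + margin]

    gray = np.zeros((480, 640), dtype=np.uint8)          # placeholder image
    print(crop_cell(gray, x=120, y=45, w=10, h=10).shape)   # -> (18, 18)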

Chester’s Status Report 11/12/2022

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

The beginning of the week was spent adding finishing touches to the demo and example structure so that everything would go smoothly. I was able to demonstrate the spell-checking algorithm I had created and test it on a small paragraph of words with errors. The algorithm showed strong promise in correcting incorrect words, but the dictionary I was using was missing a lot of words, which caused correct words to be changed. 

The demo also provided significant advice for moving forward. Not only did I bring to light several of my own issues in the spell-checking design, but the TAs and professors gave great feedback on how to improve going forward. One significant issue that I found in the design was the possibility of a character being incorrectly replaced with a space; if this happened, the word would be processed as two different words. Alongside this, we decided that it would be effective to add a pipeline that passes on the most significantly incorrect characters based on the confidence level determined in Kevin’s classification. This will allow us to minimize the latency of error checking on the post-processing side by limiting our correction to the specified characters provided. Given the initial 90% accuracy goal provided by Kevin, if 50 characters are analyzed, it would be reasonable to specify the top 5 lowest-confidence characters and pass them to the spell-checking algorithm. 

In addition to demo feedback, I tested an initial replacement of the dictionary method by adding a large English text to the file and parsing it for repetitions. This gives a very basic word probability that allows the algorithm to rank different choices rather than returning the first one that is found. 
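
A minimal sketch of that frequency-based ranking (the corpus path is a placeholder):

    import re
    from collections import Counter

    with open("big_english_corpus.txt") as f:
        WORD_COUNTS = Counter(re.findall(r"[a-z]+", f.read().lower()))

    def best_candidate(candidates):
        """Prefer the candidate seen most often in the corpus instead of the first match."""
        return max(candidates, key=lambda w: WORD_COUNTS[w])

    print(best_candidate(["hello", "hullo", "hallo"]))   # the most frequent word wins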

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

We are currently still on track for the final product. We are hoping to complete the full hardware integration soon so that we can focus on developing the software to our liking.

What deliverables do you hope to complete in the next week?

Looking forward, there are several factors in the spell-checking algorithm that I would like to analyze and iron out. This includes the actual breakdown of the sentence so that it comes back together correctly, as well as integrating the classification data with mine so that only certain characters are checked.

Kevin’s Status Report for 11/12/2022

This week was Interim Demo week. I spent some time this week bootstrapping an integrated demo of all our individual parts, which was fairly simple because of the detached and parallel nature of our pipeline. As part of this task, I built a wrapper class for making predictions on a directory of files using the classifier I trained on AWS. Since last week, the mxnet docs have luckily been restored, making this task substantially less confusing.

While the resulting software worked well on my local Ubuntu system, it was quite difficult getting all the dependencies working on the Jetson Nano, given that it is a legacy device with limited support from NVIDIA. Specifically, the Jetson’s hardware platform and older OS meant that package managers like pip rarely offered pre-built wheels for a quick and easy install. As a result, libraries such as mxnet had to be built locally, which took around a day given the Jetson Nano’s computing power. The alternative option would have been to cross-compile the package on a more powerful computer; however, I had trouble getting the provided dockerfiles for this working. There are still quite a few problems with the hardware that I will have to troubleshoot in the coming weeks.

This week I also used Jay’s pre-processing pipeline to create a second dataset for training my model. Next week, I hope to continue iterating on the existing model on AWS to make it more accurate and reliable for our use case. Furthermore, while per-character inference on the Jetson is fairly fast at around ~0.1s, when processing words character by character, this can add up to significant latency. As a result, I will be working on converting the mxnet model to TensorRT, which uses the Nano’s GPU to parallelize batch inference. This should also remove some of the difficulty of working with mxnet.

Jong Woo’s Status Report for 11/5/22

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours): 

This week, I put the finishing touches on the current pre-processing filters that will be used to train our ML model. For thresholding, Otsu thresholding, median thresholding, and Gaussian thresholding with various threshold boundaries have been investigated. Because the thresholded image will then be eroded (to reduce small noise by shrinking the currently existing dots) and dilated (extended to fill up the spaces to create a more complete circle), parameters have been tweaked multiple times and then fed into the erosion and dilation process, and individual results have been visually compared to opt for the better pre-processing results. For now, a Gaussian adaptive threshold with boundary parameters of 21 and 4 exhibits the best preliminary thresholding. Below is an image of the various thresholding parameters and their corresponding result images. 

(please zoom in using ctrl + / –  to view the images here and below)
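
As an illustrative sketch of this thresholding and erosion/dilation chain (the image path, kernel size, and inversion flag are placeholders; 21 and 4 are the adaptive-threshold block size and constant mentioned above):

    import cv2
    import numpy as np

    gray = cv2.imread("braille.jpg", cv2.IMREAD_GRAYSCALE)

    # Gaussian adaptive threshold with block size 21 and constant 4;
    # whether to invert depends on lighting and contrast.
    thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 21, 4)

    kernel = np.ones((3, 3), np.uint8)
    eroded = cv2.erode(thresh, kernel, iterations=1)     # shrink away small noise
    dilated = cv2.dilate(eroded, kernel, iterations=1)   # grow dots back into fuller circles

    cv2.imwrite("processed.png", dilated)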

Similarly, Canny edge filters, erosion, and dilation were all tested with various parameters to reach reasonable pre-processing results. Below is the code and the corresponding comparison image, which also includes the original braille image (img) as well as the final processed images (final1, final2). 

         Furthermore, the camera was integrated this week, and due to resolution and lighting variations, the masking filters will need to be tweaked correspondingly to continue producing reasonable pre-processing results. Below are the initially captured images with various color contrasts. 

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?:

             Progress is on schedule, and the upcoming week will be focused primarily on finishing vertical and horizontal segmentations that would lead into final cropping. 

What deliverables do you hope to complete in the next week?:

I hope to refine and finish the current horizontal segmentation, finish the remaining vertical segmentation, and lead into cropping.

Chester’s Status Report 11/5/2022

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

This week began with an initial ethics discussion. This discussion gave us some valuable feedback and information regarding the possible complications we might see if our product is taken to market, and the worst possible outcomes. Alongside the ethics discussion, as a group we continued to work on camera integration and setting up the Nano for the interim demo the coming week. This was a little frustrating, as the camera we initially wanted to work with caused the Nano to malfunction and not turn on at all after the driver was installed. 

Individually, I took this week to iron out the software infrastructure for my subsystem, designing classes for the spell checking as well as the text-to-speech interface. This will allow our final software product to be easily put together and run through a single main file. In addition, it will help to minimize overall latency by not rerunning initializations. The actual spell-checking software takes in a sentence and returns the sentence with all errors corrected in words that were not in the dictionary. At the moment this is a very basic (naive) algorithm that simply returns the first single-difference word found. I hope to assign a probability so that it returns the best-fit word by the final demo. 

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

Currently I am on track to have a simple demo of the subsystem for the interim demo, then scale it up and connect it with the other subsystems for the final product. 

What deliverables do you hope to complete in the next week?

In the upcoming week I would like to get the text-to-speech configured such that it can be called and immediately speak the input text without needing to convert it to a file beforehand. This should reduce latency and clutter. In the upcoming week and beyond, I would like to get a basic probability working for the spell-check algorithm so it doesn’t just return the first possible solution. 
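
One possible way to do this (pyttsx3 is just one offline option, not necessarily the library we will settle on) is an engine that speaks text directly without writing an audio file first:

    import pyttsx3

    engine = pyttsx3.init()
    engine.setProperty("rate", 150)                      # speaking rate in words per minute

    def speak(text):
        """Speak the given text directly; runAndWait blocks until playback finishes."""
        engine.say(text)
        engine.runAndWait()

    speak("hello world")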

Kevin’s Status Report for 11/05/2022

This week, I spent an unexpected bulk of my time setting up the Jetson Nano with our camera. Unfortunately, the latest driver for the e-CAM50/CUNX-NANO camera we had chosen to use was corrupting the Nano’s on-board firmware memory. As a result, even re-flashing the MicroSD card did not fix the issue, and the Nano was stuck on the NVIDIA splash screen when booting up. To fix this, I had to install Ubuntu on a personal computer and use NVIDIA’s SDK Manager to reflash the Nano board entirely. We will be pivoting to a USB webcam temporarily while we search for an alternative camera solution (if the USB webcam is not sufficient). Looking at the documentation, the Jetson natively supports USB webcams and Sony’s IMX219 sensor (which is also available in our inventory, but seems to provide worse clarity). I am also in contact with e-con Systems (the manufacturers of the e-CAM50), and am awaiting a response for troubleshooting the driver software. For future reference, the driver release I used was R07, on a Jetson Nano 2GB developer kit with a 64GB MicroSD card running JetPack 4.6 (L4T 32.6.1).
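
For the temporary USB webcam path, a quick sanity check is straightforward, since UVC webcams show up as standard V4L2 devices on the Jetson (device index 0 assumed here):

    import cv2

    cap = cv2.VideoCapture(0)                            # first USB camera (/dev/video0)
    ok, frame = cap.read()
    if ok:
        cv2.imwrite("capture_test.jpg", frame)
    cap.release()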

On the image classifier side, I was able to set up a Jupyter notebook on SageMaker for training an MXNet DNN model to classify braille. However, using the default suggested settings and the given dataset led to unsatisfactory results when training for more than 50 epochs from scratch (~4% validation accuracy). We will have to tune some parameters before trying again, but we will have to be careful not to over-test given our $100 AWS credit limit. Transfer learning from SageMaker’s pre-trained model (trained on ImageNet), conversely, allowed the model to converge to ~94+% validation accuracy within 10 epochs. However, testing with a separate test dataset has not been completed on this model yet. Once I receive the pre-processing pipeline from Jay, I would also like to run the dataset through our pre-processing and use that to train/test the models – perhaps even using it for transfer learning on the existing braille model.

One minor annoyance with using an MXNet DNN model is that Amazon seems to be the only company actively supporting the framework. As a result, documentation is lacking for how to deploy and run inferences without going through SageMaker/AWS. For example, the online documentation for MXNet is currently a broken link. This is important because we will need to run many inferences to measure the accuracy and reliability of our final and iterative models, and batch transforms are relatively expensive on AWS.

Next week is Interim Demo week, for which we hope to have each stage of our pipeline functioning. This weekend, we expect to complete integration and migration to a single Jetson board, then do some preliminary testing on the entire system. Meanwhile, I will be continuing to tune the SageMaker workflow to automate (a) testing model accuracy / confusion matrix generation (b) intake for new datasets. Once the workflow is low maintenance enough, I would like to help out with coding other parts of our system. In response to feedback we received from the ethics discussions, I am considering prototyping a feature that tracks the user’s finger as they move it over the braille as a “cursor” to control reading speed and location. This should help reduce overreliance and undereducation due to our device.