Team Status Report 11/19/2022

  • What are the most significant risks that could jeopardize the success of the project?

This week, the team debriefed and began implementing changes based on feedback from our Interim Demo. We primarily focused on making sure that we had measurable data that could be used to justify decisions made in our system’s design. 

Pre-processing: I am primarily relying on the stats values (the left and top coordinates of individual braille dots, as well as the width and height of the neighboring dots) from the “cv2.connectedComponentsWithStats()” function. I have compared the pixel locations in the output matrices against the original image and confirmed that the values are accurate. The redundant dots I currently see come from an inherent limitation of connectedComponentsWithStats(), so I need to remove the redundant dots scattered around nearby locations using non_max_suppression. There is still a small issue with this step. I would rather not write the whole function myself, so I am looking for ways to fix it; once this is done, I am nearly finished with the pre-processing procedures. 

Classification: Latency risks for classification have been mostly addressed this week by changing the input layer of our neural network to accept 10 images for a single inference. The number of images accepted per inference will be tuned later to optimize against our testing environment. In addition, the model was converted from MXNet to ONNX, which is interoperable with NVIDIA’s TensorRT framework. However, using TensorRT seems to have introduced some latency to inference, with the counterintuitive result that inference is currently faster on the CPU.
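To illustrate the batching idea, here is a minimal sketch of grouping character crops into batches of 10 before inference. The function name make_batches, the zero-padding of the last batch, and the 28x28 crop size in the usage note are assumptions for illustration, not details of the actual model input.

```python
import numpy as np

BATCH_SIZE = 10  # value from this report; expected to be tuned later


def make_batches(crops, batch_size=BATCH_SIZE):
    """Group equally sized character crops into fixed-size batches.

    The last batch is zero-padded so every batch has the same shape.
    Each entry is (batch_array, number_of_real_crops_in_batch).
    """
    batches = []
    for start in range(0, len(crops), batch_size):
        chunk = crops[start:start + batch_size]
        batch = np.zeros((batch_size,) + chunk[0].shape, dtype=chunk[0].dtype)
        batch[:len(chunk)] = np.stack(chunk)
        batches.append((batch, len(chunk)))
    return batches
```

For example, 23 crops of a hypothetical 28x28 size would produce three batches of shape (10, 28, 28), with only the first 3 slots of the last batch holding real data.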

Post-processing: The primary concern with the post-processing section of the project at the moment is determining how to integrate audio with the Jetson Nano. Given the difficulties we had with camera integration, we hope that audio will be an easier process, since we only need to output audio rather than also recognize sound input. 


  • How are these risks being managed? 

Pre-processing: I am studying the logic behind non_max_suppression more closely to make it easier to debug the removal of redundant dots. 

Classification: More extensive measurements will be taken next week using different inference providers (CPU, TensorRT, CUDA) to inform our choice for the final system. 
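A minimal sketch of how such measurements could be taken in a comparable way across providers. The benchmark helper and the stand-in workload are hypothetical; in practice the callable would wrap one inference call for each provider (CPU, TensorRT, CUDA).

```python
import statistics
import time


def benchmark(run_inference, batch, warmup=3, repeats=20):
    """Return (median, worst) latency in seconds for one callable."""
    for _ in range(warmup):              # warm-up runs are not timed
        run_inference(batch)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_inference(batch)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings), max(timings)


# Stand-in workload used only to show the call pattern.
def fake_provider(batch):
    return sum(x * x for x in batch)


median_s, worst_s = benchmark(fake_provider, list(range(1000)))
print(f"median {median_s * 1e3:.3f} ms, worst {worst_s * 1e3:.3f} ms")
```

Reporting the median alongside the worst case helps separate steady-state latency from one-off costs such as engine initialization.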

Post-processing: Now that the camera is integrated, it is important to shift towards the stereo output. I do think it will integrate more easily than the camera, but it is still important that we get everything connected as soon as possible so that any additional hardware needs surface early rather than later on. 


  • What contingency plans are ready? 

Pre-processing: If the built-in non_max_suppression() function does not work after continuous debugging attempts, I will have to write it myself. 
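If it comes to that, a hand-written version could look roughly like the sketch below. This is a generic greedy, overlap-based suppression that keeps the largest box in each cluster. It is not the library function mentioned above, and the name suppress_overlaps, the (left, top, width, height) box layout, and the 0.3 threshold are assumptions.

```python
import numpy as np


def suppress_overlaps(boxes, overlap_thresh=0.3):
    """Greedy non-maximum suppression over (left, top, width, height) boxes.

    Returns the indices of the boxes that are kept.
    """
    boxes = np.asarray(boxes, dtype=float)
    if boxes.size == 0:
        return []
    x1, y1 = boxes[:, 0], boxes[:, 1]
    x2, y2 = x1 + boxes[:, 2], y1 + boxes[:, 3]
    areas = boxes[:, 2] * boxes[:, 3]
    order = np.argsort(areas)[::-1]              # largest area first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with every remaining box
        iw = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= overlap_thresh]      # drop boxes that overlap box i
    return keep
```

Two nearly coincident detections of the same dot collapse into one index, while a dot elsewhere on the page is left untouched.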


  • Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)?

Classification: The output of the classification pipeline has been modified to include not only a string of translated characters, but also a dictionary of the character indexes with the lowest confidence, as well as the next 10 predicted letters. This metadata is provided to help improve the efficiency of the post-processing spell checker.


  • Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward? 

This change was not necessary but it will help improve the overall efficiency of the pipeline significantly if it is able to stand on its own. It also does not require any significant overhead in time or effort so it is easy to implement. 


  • Provide an updated schedule if changes have occurred. 



Jong Woo’s Status Report for 11/12/2022

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours): 

         The first half of this week was dedicated to preparing for and presenting the interim demo, as well as further debugging and parameter tweaking for known issues. More specifically, I am relying on OpenCV’s “cv2.connectedComponentsWithStats()” function, which outputs statistics for the input image, including the left and top coordinates as well as the width, height, and area of each detected object (braille dots in our case). However, because the results depend on the lighting and quality of the captured image, the accuracy of this stats function needs further testing before I make additional modifications. Therefore, I am currently doing a pixel-by-pixel comparison between the original image and the printed matrices containing the (x,y) coordinates of the detected objects (braille dots) and their corresponding widths and heights, to understand the current accuracy level and what further modifications are needed to obtain accurate values. Furthermore, I am currently working with the non_max_suppression() function to mitigate some of the inaccuracies that can arise from the initial “connectedComponentsWithStats()” output. 

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?:

             There has been a slight unexpected delay due to my health after my fourth COVID booster shot, but overall progress is on schedule, and the upcoming week will be primarily focused on tuning the parameters of cv2’s “connectedComponentsWithStats()” function and the “non_max_suppression()” function. 

What deliverables do you hope to complete in the next week?:

         By comparing the image pixel by pixel against the printed matrices from the “connectedComponentsWithStats()” function, I hope to build an accuracy table for the current parameters so that I can tune them to obtain accurate (x,y) coordinates for the center of each braille dot, as well as the corresponding width and height. The accuracy of these values is critical for the last step of pre-processing: cropping via numpy slicing.
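A minimal sketch of the cropping step, assuming each region is described by the left, top, width, and height values from the stats matrix. The crop_box name and the pad parameter are hypothetical.

```python
import numpy as np


def crop_box(image, left, top, width, height, pad=2):
    """Crop one region with padding, clamped to the image borders."""
    rows, cols = image.shape[:2]
    y0, y1 = max(top - pad, 0), min(top + height + pad, rows)
    x0, x1 = max(left - pad, 0), min(left + width + pad, cols)
    return image[y0:y1, x0:x1]                   # rows (y) first, then columns (x)
```

Note that numpy indexes rows before columns, so the y range comes first in the slice even though the stats are usually read as (x, y). An off-by-a-few-pixels error in the stats shifts the crop directly, which is why the accuracy of those values matters here.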

Chester’s Status Report 11/12/2022

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

The beginning of the week was spent adding finishing touches to the demo and example structure so that everything would go smoothly. I was able to demonstrate the spell-checking algorithm I had created and tested it on a small paragraph of words with errors. The algorithm showed strong promise in correcting incorrect words, but the dictionary I was using was missing many valid words, which caused correct words to be changed. 

The demo also provided significant advice for moving forward. Not only did I bring to light several issues of my own in the spell-checking design, but the TAs and professors also gave great feedback on how to improve. One significant issue that I found in the design was the possibility of a character being incorrectly replaced with a space, in which case the word would be processed as two different words. Alongside this, we decided that it would be effective to add a pipeline that passes on the characters most likely to be incorrect, based on the confidence level determined in Kevin’s classification. This will allow us to minimize the latency of error checking on the post-processing side by limiting our response to the specified characters. Given the initial 90% accuracy goal provided by Kevin, about 5 of every 50 characters analyzed would be expected to be wrong, so it would be reasonable to pass the 5 lowest-confidence characters to the spell-checking algorithm. 
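A minimal sketch of that selection, assuming the classifier returns one confidence value per character. The function name flag_low_confidence and the list-of-floats input are assumptions; the real interface between the two subsystems is still being designed.

```python
def flag_low_confidence(confidences, target_accuracy=0.90):
    """Return indices of the characters most likely to be wrong.

    The number flagged equals the expected error count at the target
    accuracy, so 50 characters at 90% accuracy gives 5 flagged characters.
    """
    expected_errors = round(len(confidences) * (1.0 - target_accuracy))
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return sorted(ranked[:expected_errors])      # lowest confidence, in text order
```

The spell checker can then restrict its candidate substitutions to the returned positions instead of trying every character in the sentence.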

In addition to demo feedback, I tested an initial replacement for the dictionary method by adding a large English text to the file and counting how often each word is repeated. This gives a very basic word probability that allows the algorithm to rank different choices rather than returning the first one that is found. 
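A minimal sketch of the frequency idea, assuming plain word counts stand in for probabilities. The function names and the regular expression used to split words are assumptions.

```python
import re
from collections import Counter


def build_word_counts(text):
    """Count lowercase word occurrences in a training text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def best_candidate(candidates, counts):
    """Pick the candidate seen most often in the text; None if none were seen."""
    seen = [word for word in candidates if counts[word] > 0]
    return max(seen, key=lambda word: counts[word]) if seen else None
```

With counts built from a large enough text, common words outrank rare ones among the possible corrections, which is the ranking behavior described above.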

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

We are currently still on track for the final product. We hope to finish full hardware integration soon so that we can focus on developing the software to our liking.  

What deliverables do you hope to complete in the next week?

Looking forward, there are several factors in the spell-checking algorithm that I would like to analyze and focus on ironing out. This includes the actual breakdown of the sentence so that it comes back together correctly, as well as embedding the classification data with mine so that it checks certain characters.

Kevin’s Status Report for 11/12/2022

This week was Interim Demo week. I spent some time this week bootstrapping an integrated demo of all our individual parts, which was fairly simple because of the detached and parallel nature of our pipeline. As part of this task, I built a wrapper class for making predictions on a directory of files using the classifier I trained on AWS. Since last week, the mxnet docs have luckily been restored, making this task substantially less confusing.

While the resulting software worked well on my local Ubuntu system, it was quite difficult getting all the dependencies working on the Jetson Nano, given that it is a legacy device with limited support from NVIDIA. Specifically, the Jetson’s hardware platform and older OS meant that package managers like pip rarely offered pre-built wheels for a quick and easy install. As a result, libraries such as mxnet had to be built locally, which took around a day given the Jetson Nano’s computing power. The alternative option would have been to cross-compile the package on a more powerful computer. However, I had trouble getting the provided Dockerfiles for this to work. There are still quite a few problems with the hardware that I will have to troubleshoot in the coming weeks.

This week I also used Jay’s pre-processing pipeline to create a second dataset for training my model. Next week, I hope to continue iterating on the existing model on AWS to make it more accurate and reliable for our use case. Furthermore, while per-character inference on the Jetson is fairly fast at around 0.1s, this can add up to significant latency when processing words character by character. As a result, I will be working on converting the mxnet model to TensorRT, which uses the Nano’s GPU to parallelize batch inference. This should also remove some of the difficulty of working with mxnet.

Jong Woo’s Status Report for 11/5/22

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours): 

This week, I put the finishing touches on the current pre-processing filters that will be used to train our ML model. For thresholding, Otsu thresholding, median thresholding, and Gaussian thresholding with various threshold boundaries have been investigated. Because the thresholded image will then be eroded (to reduce small noise by shrinking the existing dots) and dilated (to expand the dots and fill gaps, producing more complete circles), the parameters were tweaked multiple times and fed into the erosion and dilation process, and the individual results were visually compared to choose the better pre-processing results. For now, the Gaussian adaptive threshold with upper and lower boundaries of 21 and 4 is our preliminary choice. Below is the image of the various thresholding parameters and their corresponding result images. 

(please zoom in using ctrl + / –  to view the images here and below)

Similarly, Canny edge filters, erosion, and dilation were all tested with various parameters to reach reasonable pre-processing results. Below is the code and corresponding comparison image that also includes the original braille image (img) as well as the final processed images (final1, 2). 

         Furthermore, the camera was integrated this week, and because of resolution and lighting variations, the masking filters will need to be tweaked accordingly to keep producing reasonable pre-processing results. Below are the initially captured images with various color contrasts. 

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?:

             Progress is on schedule, and the upcoming week will be focused primarily on finishing vertical and horizontal segmentations that would lead into final cropping. 

What deliverables do you hope to complete in the next week?:

I hope to refine and finish the current horizontal segmentation, finish the remaining vertical segmentation, and lead into cropping.

Chester’s Status Report 11/5/2022

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours).

This week began with an initial ethics discussion. This discussion gave us some valuable feedback and information regarding the possible complications we might see if our product is taken to market and the worst possible outcomes. Alongside the ethical discussion, as a group we continued to work on camera integration and setting up the Nano for the interim demo the coming week. This was a little frustrating, as the camera we initially wanted to work with caused the Nano to malfunction and not turn on at all after the driver was downloaded. 

Individually, I took this week to iron out the software infrastructure for my subsystem, designing classes for the spell-checking as well as the text-to-speech interface. This will allow our final software product to be easily put together and run through a single main file. In addition, it will help to minimize overall latency by not rerunning initializations. The actual spell-checking software takes in a sentence and returns it with corrections applied to any words that were not in the dictionary. At the moment this is a very basic (naive) algorithm that simply returns the first word found that differs by a single character. By the final demo, I hope to assign probabilities so that it returns the best-fit word. 
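A minimal sketch of the naive approach, assuming that "single difference" means exactly one substituted letter and that the dictionary is a set of lowercase words. The function names are hypothetical and this is not the project's actual class structure.

```python
import string


def one_substitution_away(word):
    """Yield every string that differs from `word` by one substituted letter."""
    for i in range(len(word)):
        for letter in string.ascii_lowercase:
            if letter != word[i]:
                yield word[:i] + letter + word[i + 1:]


def naive_correct(sentence, dictionary):
    """Replace each unknown word with the first dictionary word one letter off."""
    corrected = []
    for word in sentence.lower().split():
        if word not in dictionary:
            word = next(
                (c for c in one_substitution_away(word) if c in dictionary),
                word,                            # keep the word if nothing matches
            )
        corrected.append(word)
    return " ".join(corrected)
```

Because the first match wins, the result depends on the order in which candidates are generated rather than on which word is most plausible, which is the weakness a probability ranking would address.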

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?

Currently I am on track to have a simple demo of the subsystem for the interim demo, then scale it up and connect it with the other subsystems for the final product. 

What deliverables do you hope to complete in the next week?

In the upcoming week I would like to get the text to speech configured such that it can be called and immediately speak the inputted text without needing to convert to a file beforehand. This should reduce latency and clutter. In the upcoming week and beyond I would like to try and get a basic probability working for the spell check algorithm so it doesn’t just return the first possible solution. 

Kevin’s Status Report for 11/05/2022

This week, I spent an unexpected bulk of my time setting up the Jetson Nano with our camera. Unfortunately, the latest driver for the e-CAM50/CUNX-NANO camera we had chosen to use was corrupting the Nano’s on-board firmware memory. As a result, even re-flashing the MicroSD card did not fix the issue and the Nano was stuck on the NVIDIA splash screen when booting up. To fix this, I had to install Ubuntu on a personal computer and use NVIDIA’s SDK manager to reflash the Nano board entirely. We will be pivoting to a USB webcam temporarily while we search for an alternative camera solution (if the USB webcam is not sufficient). Looking at the documentation, the Jetson natively supports USB webcams and Sony’s IMX219 sensor (which is also available in our inventory, but seems to provide worse clarity). I am also in contact with e-con systems (the manufacturers of e-CAM50), and am awaiting a response for troubleshooting the driver software. For future reference, the driver release I used was R07, on a Jetson Nano 2GB developer kit with a 64GB MicroSD card running JetPack 4.6 (L4T 32.6.1).

On the image classifier side, I was able to set up a Jupyter notebook on SageMaker for training an MXNet DNN model to classify braille. However, using default suggested settings and the given dataset led to unsatisfactory results when training for more than 50 epochs from scratch (~4% validation accuracy). We will have to tune some parameters before trying again, but we will have to be careful not to over-test given our $100 AWS credit limit. Transfer learning from SageMaker’s pre-trained model (trained on ImageNet), conversely, allowed the model to converge to 94+% validation accuracy within 10 epochs. However, testing with a separate test dataset has not been completed on this model yet. Once I receive the pre-processing pipeline from Jay, I would also like to run the dataset through our pre-processing and use that to train/test the models – perhaps even using it for transfer learning on the existing braille model.

One minor annoyance with using an MXNet DNN model is that it seems that Amazon is the only company actively supporting the framework. As a result, documentation is lacking for how to deploy and run inferences without going through SageMaker/AWS. For example, the online documentation for MXNet is currently a broken link. Running inference outside of AWS matters because we will need to run many inferences to measure the accuracy and reliability of our final model / iterative models, and batch transforms are relatively expensive on AWS.

Next week is Interim Demo week, for which we hope to have each stage of our pipeline functioning. This weekend, we expect to complete integration and migration to a single Jetson board, then do some preliminary testing on the entire system. Meanwhile, I will be continuing to tune the SageMaker workflow to automate (a) testing model accuracy / confusion matrix generation (b) intake for new datasets. Once the workflow is low maintenance enough, I would like to help out with coding other parts of our system. In response to feedback we received from the ethics discussions, I am considering prototyping a feature that tracks the user’s finger as they move it over the braille as a “cursor” to control reading speed and location. This should help reduce overreliance and undereducation due to our device.

Team Status Report for 11/05/2022

What are the most significant risks that could jeopardize the success of the project?

One issue we ran into this week was with connecting the eCAM50 MIPI CSI camera that we had expected to use initially. Due to unforeseen issues in the camera driver, the Jetson Nano is left in a boot loop after running the provided install script. We have reached out to the manufacturers for troubleshooting but have yet to hear back.

Looking at the feed from our two alternative cameras, the quality of video feed and the resulting captured image may not exhibit optimal resolution. Furthermore, the IMX219 camera with its ribbon design and wide angle FOV is highly vulnerable to shakes and distortions that can disrupt the fixed dimensional requirements for the original captured image, so further means to minimize dislocation should be investigated.


How are these risks being managed? 


There are alternative cameras that we can use and have already tested connecting to the Nano. One is a USB camera (Logitech C920) and the other is an IMX219, which is natively supported by the Nano platform and does not require additional driver installations. Overall, our product isn’t at risk, but there are trade-offs that we must consider when deciding on a final camera to use. The C920 seems to provide a clearer raw image since there is some processing being done on-board, but it will likely have higher latency as a result.

We will be mounting the camera in a fixed place (rod,…) and creating dimensional guidelines for placing the braille document to be interpreted. Since the primary users of our product could have visual impairments, we will add tangible physical components that serve as guides for placing the braille document.

What contingency plans are ready?

We have several contingency plans in place at the moment. Right now we are working with a temporary USB camera alternative to guarantee the camera feed and connection to the Nano. In addition, we also have another compatible camera that is smaller and has lower latency with a small quality trade off. Finally, our larger contingency plan is to work with the Jetson AGX Xavier connected to the wifi extension provided by the Nano, and mount the camera for the most optimal processing speeds.

Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)?

Since our last report, no significant changes have been made to the system design of our product. We are still in the process of testing out the camera integrated with the Jetson Nano, and depending on how this goes we will make a final decision as we start to finalize the software. In terms of individual subsystems, we are considering different variations of filtering that work best with segmentation and the opencv processing, as well as having the classification subsystem be responsible for concatenation to simplify translation of the braille language.

Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward?

Since we did not change any major sections of the system design, we do not have any costs associated currently. Right now, we are prepping for the interim demo and if everything goes well we will be in solid form to finish the product according to schedule. Individually, the work will not vary too much and if needed can be shifted around when people are more accessible to help. At this stage in the product, most of the software for our individual subsystems has been created and we can start to work more in conjunction with each other for the final demo.

Provide an updated schedule if changes have occurred.



Since we had been working based off of the deadlines on the Canvas assignments, our Gantt chart was structured under the impression that Interim Demos would take place during the week of Nov. 14. We have since changed our timeline to match the syllabus.

Jong Woo’s Status Report for 10/29/2022

What did you personally accomplish this week on the project? Give files or photos that demonstrate your progress. Prove to the reader that you put sufficient effort into the project over the course of the week (12+ hours): 

         This past week, I focused on further applying the Canny edge detection filter and using non-maximum suppression to draw circles on top of the existing braille characters. Canny edge detection, supported by OpenCV, allows structural information to be extracted from an image while dramatically reducing the amount of data to be processed. The edge-filtered image will then be fed into non-maximum suppression, which selects a single entity (dot) out of many overlapping entities and draws an individual colored circle on top of it. 

Is your progress on schedule or behind? If you are behind, what actions will be taken to catch up to the project schedule?:

             Progress is on schedule, and the upcoming week will be primarily focused on the group work regarding camera integration. 

What deliverables do you hope to complete in the next week?:

        Camera integration is the primary goal to be completed by next week. This is essential for the upcoming interim demo on Nov. 16, and I believe that as long as camera integration is completed in time, our team will be able to have a tangible deliverable by the interim demo date. 

Kevin’s Status Report for 10/29/22

Following our return from fall break, we spent some time this week to debrief and re-calibrate our expected deliverables for the Interim Demo. One important change that was made for more convenient development was pivoting to the Jetson Nano as our prototyping platform. Outside of working on the Ethics assignment, I spent some time this week partitioning the dataset into separate datasets for cross-validation (train, validate, test), using roughly a 60/20/20 division, respectively. Because of the size of the dataset, I was confident that I could use a larger partition for validating and testing. Once done, I formatted the dataset in accordance with the SageMaker tutorial for TensorFlow, then uploaded it to an AWS S3 Bucket.
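A minimal sketch of a 60/20/20 partition, assuming the dataset can be represented as a list of file names or indices. The split_dataset name and the fixed seed are assumptions added for reproducibility.

```python
import random


def split_dataset(items, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle and split items into train, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)        # seeded so the split is repeatable
    train_end = int(len(shuffled) * fractions[0])
    val_end = train_end + int(len(shuffled) * fractions[1])
    return shuffled[:train_end], shuffled[train_end:val_end], shuffled[val_end:]
```

Any remainder from rounding falls into the test partition, so every item lands in exactly one of the three lists.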

This weekend, I was granted AWS credits which I will use to begin training our ML model on SageMaker. Since SageMaker offers multiple frameworks for Image Classification (MXNet, TensorFlow), I will make sure to test both to see which is more accurate. Furthermore, I am planning to use K-Fold cross validation to test the robustness of our dataset. I am currently still training on the open-source dataset without any meaningful modifications outside of relabeling (see last weekly update); however, we hope to add some more images that have been run through the pre-processing pipeline soon.
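A minimal sketch of how K-Fold index sets can be generated, assuming the samples have already been shuffled. The k_fold_indices name and k=5 default are assumptions; SageMaker training jobs would still need to be launched once per fold.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for each of k folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size
```

Each sample is used for validation exactly once, so a large spread in accuracy between folds would suggest the dataset is not uniform enough for a single fixed split to be trusted.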

Since we are beginning to pivot toward preparing hardware for our interim demo, I also took some time this week to work independently on bringing up the Jetson Nano and eCAM-50. However, I ran into some issues flashing the SD card, due to a version mismatch between the on-board memory and the image provided by NVIDIA online. Since I do not have an Ubuntu system readily available, I will need to use Jetpack SDK manager on the lab computers to resolve this.

As mentioned above, I’ve run into some unexpected blockers both on hardware bring-up and AWS, but I’m hoping to catch up early this week, hopefully ending tomorrow with a working Jetson Nano and integrated camera, and a working SageMaker model. The rest of the next week will be spent measuring the results of tuning various parameters on SageMaker and choosing the best model to use for our application, in addition to working with Jay to integrate our phases.