- What are the most significant risks that could jeopardize the success of the project?
This week, the team debriefed and began implementing changes based on feedback from our Interim Demo. Our primary focus was collecting measurable data that we can use to justify our system's design decisions.
Pre-processing: I am primarily relying on the stats values (the left and top coordinates, width, and height of each detected braille dot) returned by the cv2.connectedComponentsWithStats() function. I have compared the exact pixel locations in the output matrices against the original image and confirmed that the values are accurate. The remaining problem is redundant dots: a known limitation of connectedComponentsWithStats() is that it sometimes reports multiple components clustered around a single dot, and I need to remove these duplicates using non_max_suppression. There is a small issue with it that I have not yet resolved, and since I would rather not write the whole function myself, I am looking for ways to fix it; once that is done, the pre-processing procedures are nearly complete.
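As a rough sketch of this extraction step (the file name, thresholding choice, and area filter below are placeholder assumptions, not our final values):

```python
import cv2

# braille_page.png is a placeholder; we assume dark dots on a light page,
# so an inverted Otsu threshold turns dots into white foreground blobs.
gray = cv2.imread("braille_page.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(
    binary, connectivity=8)

# Each row of stats is [left, top, width, height, area]; row 0 is the background.
boxes = []
for i in range(1, num_labels):
    x, y, w, h, area = stats[i]
    if area > 2:  # assumed speck filter; would be tuned on real scans
        boxes.append((x, y, x + w, y + h))
```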
Classification: Latency risks for classification have been largely addressed this week by changing the input layer of our neural network to accept 10 images in a single inference. The number of images accepted per inference will be tuned later against our testing environment. In addition, the model was converted from MXNet to ONNX, which is interoperable with NVIDIA's TensorRT framework. However, TensorRT appears to have added latency of its own: counterintuitively, inference is currently faster on the CPU.
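A minimal sketch of what batched inference looks like through ONNX Runtime (the model file name and input shape here are placeholders for our exported model):

```python
import numpy as np
import onnxruntime as ort

# "braille_classifier.onnx" stands in for our MXNet->ONNX export,
# whose input layer now takes a batch of 10 dot-cell images.
sess = ort.InferenceSession(
    "braille_classifier.onnx",
    providers=["CPUExecutionProvider"])  # swap providers to compare backends

input_name = sess.get_inputs()[0].name
batch = np.random.rand(10, 1, 28, 28).astype(np.float32)  # assumed input shape
probs = sess.run(None, {input_name: batch})[0]  # one forward pass, 10 predictions
```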
Post-processing: The primary post-processing concern at the moment is audio integration with the Jetson Nano. Given the difficulties we had with camera integration, there is some risk here, but we hope it will be a simpler process since we only need to output audio rather than also recognize sound input.
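One way this output path might look, assuming a stock text-to-speech CLI such as espeak is available on the Nano's Linux image (that choice of backend is an assumption, not a decision we have made):

```python
import subprocess

def speak(text: str) -> None:
    # espeak is an assumed TTS backend; it synthesizes `text` and plays it
    # on the default ALSA output device, which is all this direction needs.
    subprocess.run(["espeak", text], check=True)

speak("hello world")
```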
- How are these risks being managed?
Pre-processing: I am digging into the logic behind non_max_suppression to understand how it removes the redundant dots, which should make debugging faster.
Classification: More extensive measurements will be taken next week using different inference providers (CPU, TensorRT, CUDA) to inform our choice for the final system.
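A sketch of the timing harness we could use for those measurements (provider availability on the Nano, the model file, and the input shape are assumptions):

```python
import time
import numpy as np
import onnxruntime as ort

PROVIDERS = {
    "cpu": ["CPUExecutionProvider"],
    "cuda": ["CUDAExecutionProvider"],
    "tensorrt": ["TensorrtExecutionProvider", "CUDAExecutionProvider"],
}

batch = np.random.rand(10, 1, 28, 28).astype(np.float32)  # assumed input shape

for name, providers in PROVIDERS.items():
    sess = ort.InferenceSession("braille_classifier.onnx", providers=providers)
    input_name = sess.get_inputs()[0].name
    sess.run(None, {input_name: batch})  # warm-up; TensorRT builds its engine here
    start = time.perf_counter()
    for _ in range(100):
        sess.run(None, {input_name: batch})
    print(f"{name}: {(time.perf_counter() - start) / 100 * 1000:.2f} ms/inference")
```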
Post-processing: Now that the camera is integrated, it is important to shift toward the stereo output. I expect it to integrate more easily than the camera did, but we still need everything connected as soon as possible so that any additional hardware needs surface early rather than late.
- What contingency plans are ready?
Pre-processing: If the built-in non_max_suppression() function still does not work after continued debugging, I will write the suppression step myself, along the lines of the sketch below.
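A minimal hand-rolled fallback over the dot bounding boxes from pre-processing (the overlap threshold is an assumed tuning knob; real braille dots should barely overlap, so it can be low):

```python
import numpy as np

def nms(boxes, overlap_thresh=0.3):
    """Fallback NMS: keep a box, drop any remaining box that overlaps it too much.

    boxes is an (N, 4) array of [x1, y1, x2, y2] corners.
    """
    boxes = np.asarray(boxes, dtype=float)
    if len(boxes) == 0:
        return boxes
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(areas)[::-1]  # consider larger detections first
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(i)
        # intersection of box i with every remaining candidate
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= overlap_thresh]  # suppress heavy overlaps
    return boxes[keep]
```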
- Were any changes made to the existing design of the system (requirements, block diagram, system spec, etc)?
Classification: The output of the classification pipeline has been modified to include not only the string of translated characters, but also a dictionary that, for the characters predicted with the lowest confidence, maps each character's index to its next 10 most likely letters. This metadata is provided to help improve the efficiency of the post-processing spell checker.
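A sketch of how this output could be assembled from the network's softmax scores (the confidence threshold, function name, and exact structure are illustrative, not our final spec):

```python
import numpy as np

def classification_output(probs, alphabet, conf_thresh=0.9, k=10):
    # probs: (num_chars, num_classes) softmax scores, one row per braille cell.
    best = probs.argmax(axis=1)
    text = "".join(alphabet[i] for i in best)

    low_conf = {}
    for idx in np.where(probs.max(axis=1) < conf_thresh)[0]:
        ranked = np.argsort(probs[idx])[::-1]  # classes by descending score
        # next k most likely letters after the top prediction
        low_conf[int(idx)] = [alphabet[j] for j in ranked[1:k + 1]]
    return text, low_conf
```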
- Why was this change necessary, what costs does the change incur, and how will these costs be mitigated going forward?
This change was not strictly necessary, but if it proves reliable, it should significantly improve the overall efficiency of the pipeline by letting the spell checker concentrate on the least-confident characters. It also incurs no significant overhead in time or effort, so it is easy to implement.
- Provide an updated schedule if changes have occurred.