Team Status Report for 12.09.23

The most significant risks right now involve integration and testing. For integration, as we finalize our ML models, we have less time than we would like for debugging. This week we ran into many issues integrating our subsystems, whether due to incompatibilities, file path issues, or other problems. We will be working throughout the weekend and up until the demo to finalize our integration. Along with this comes the risk of insufficient testing: if we do not finish integration in time, we will not be able to conduct thorough testing of our complete system, including user testing. As mentioned last week, if we cannot test tomorrow, we will likely have to test during the demo on Monday so that we can include our results in the final report.

We have not made any changes to the design of our system since last week, and have not updated our schedule.

For our latency tests, we measured the time between pressing the start/stop button and observing the result. We took 10 trials and averaged the results. For the start button, we measured an average latency of 5 seconds, and for the stop button, an average of 60 ms. It is worth noting that these latencies depend on network delay and traffic at the time of measurement.
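Below is a minimal sketch of how such a measurement can be taken; the `trigger` callable is a hypothetical stand-in for pressing a button and waiting for the corresponding result.

```python
import time

def measure_latency(trigger, num_trials=10):
    """trigger() performs one start/stop action and blocks until the
    corresponding result is observed (hypothetical placeholder)."""
    samples = []
    for _ in range(num_trials):
        t0 = time.monotonic()
        trigger()  # e.g. press start and wait for the audio to begin
        samples.append(time.monotonic() - t0)
    return sum(samples) / len(samples)  # average over the trials
```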

For our weight and size measurements, we simply used a scale and a ruler. The weight of our attachment was 86 grams, and the size was 76 mm by 42 mm by 37 mm.

For battery life, we measured a minimum of 5 hours of constant device use. This was after testing the battery on 5 separate occasions.

For our graph detection ML model, we gathered around 2000 real images of lecture slides, which we split into a training and a validation set. The validation set contained about 100 images, and our unit tests checked whether the graph detection model produced good bounding boxes around the graphs in the slides. To measure accuracy, we used mean intersection over union (IoU) as a metric, calculating the overlap between the predicted and labeled bounding boxes divided by the total area they cover. We found that all graphs were detected (a 100% detection rate) and the mean IoU was about 95%, so the predicted boxes captured nearly the entire graph in most cases.
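For reference, a minimal sketch of the IoU computation described above, for axis-aligned boxes in (x1, y1, x2, y2) form (the box format here is an assumption):

```python
def iou(box_a, box_b):
    """Intersection over union for axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def mean_iou(pred_boxes, true_boxes):
    # Mean IoU over matched (predicted, labeled) box pairs in the validation set.
    return sum(iou(p, t) for p, t in zip(pred_boxes, true_boxes)) / len(pred_boxes)
```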

For our slide matching ML model, unit testing used a validation set of 110 images drawn from the slide images we captured with our device in TechSpark. We tested both components of our slide matching system. First, we tested the detection of bounding boxes around the slide number on each slide; these were detected with 100% accuracy. We then took the cropped slide-number boxes and ran a second model on them, which preprocesses the crop and classifies each digit. These tests revealed an accuracy of 73%, so our total accuracy for slide matching from unit testing is 73%.
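The second stage looks roughly like the sketch below; `segment_digits` and `digit_model` are hypothetical placeholders for our actual preprocessing and classifier, and the 28x28 input size is an illustrative assumption.

```python
import cv2
import numpy as np

def read_slide_number(crop_bgr, segment_digits, digit_model):
    # Preprocess: grayscale and binarize the cropped slide-number box.
    gray = cv2.cvtColor(crop_bgr, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    digits = []
    for digit_img in segment_digits(binary):  # left-to-right digit crops
        resized = cv2.resize(digit_img, (28, 28)).astype(np.float32) / 255.0
        pred = digit_model.predict(resized[None, :, :, None])  # per-digit classifier
        digits.append(str(int(np.argmax(pred))))
    return int("".join(digits)) if digits else None
```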

For our graph description ML model, our unit tests measured the mean accuracy of token-to-token matching against the reference descriptions. We did this on a set of about 75 graphs extracted from real slides, which revealed an accuracy of 96%.
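A simplified sketch of this metric is shown below; our exact tokenization and matching rules may differ slightly.

```python
def token_accuracy(generated: str, reference: str) -> float:
    """Fraction of reference tokens matched position by position."""
    gen_tokens = generated.lower().split()
    ref_tokens = reference.lower().split()
    if not ref_tokens:
        return 0.0
    matches = sum(g == r for g, r in zip(gen_tokens, ref_tokens))
    return matches / len(ref_tokens)

def mean_token_accuracy(pairs):
    """pairs: iterable of (generated, reference) description strings."""
    scores = [token_accuracy(g, r) for g, r in pairs]
    return sum(scores) / len(scores)
```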

Jaspreet’s Status Report for 12.09.23

This week, I continued to help with gathering images to test and train our ML models. I went to TechSpark and gathered about 400 total images of presentation slides being displayed on a large monitor. These slides had differently formatted slide numbers at the bottom right, and testing with these images helped us determine which format would be best for our slide matching model. I also added clips onto the side of our component case so that it can now attach to the side of glasses. However, in the middle of the week I tested positive for COVID, and I was unable to work for multiple days due to my sickness.

As was the case last week, our team's progress is behind schedule, and so is mine. Since we have not finished integration, I still have to place our code on the Jetson; the current plan is to do so on Sunday once integration is finalized. As a team, we must complete integration before the demo on Monday. Furthermore, we must complete user testing of our system either before or during the demo on Monday. After that, what remains is completing the final deliverables for our project.

Jaspreet’s Status Report for 12.2.23

Since the last status report, I have made a lot of progress on the hardware subsystem, helped with integration of our subsystems, and gathered data for our ML models.

Regarding the hardware subsystem, the component case was finally printed and assembled, and is completely finished. Printing the case ended up being much more difficult than I expected: prints would fail halfway through, someone would stop my print, or a print would succeed but have minor design errors that required a reprint. Despite these problems, I have now assembled the glasses attachment, which is pictured below.

The camera is on the right face, the buttons are on the top face, and the charging port, power switch, and other ports are on the left face. As you can see, there is some minor discoloration, but fixing this is not a priority at the moment. If I have extra time next week, I should be able to fix it relatively easily.

Furthermore, I have helped with the integration of the subsystems. Specifically, I added code to the Raspberry Pi so that once the start button is pressed, it will not only send an image to the Jetson, but also receive the extracted description corresponding to that image. It then sends this description to our iOS app, where it is read aloud. Currently though, our code is running locally on our laptops instead of the Jetson, since we are prioritizing making our system functional first.
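The round trip looks roughly like the sketch below; the endpoint paths and addresses are placeholders, not our exact implementation.

```python
import requests

JETSON_URL = "http://<jetson-ip>:5000/describe"  # placeholder address/route
APP_URL = "http://<iphone-ip>:8000/speak"        # placeholder address/route

def handle_start_press(image_path="capture.jpg"):
    # Send the captured image to the Jetson and wait for the description.
    with open(image_path, "rb") as f:
        resp = requests.post(JETSON_URL, files={"image": f}, timeout=30)
    description = resp.json().get("description", "")
    # Forward the description to the iOS app, which reads it aloud.
    requests.post(APP_URL, json={"text": description}, timeout=10)
```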

Finally, I spent many hours working on gathering image data for our ML models, as well as manually annotating our data. For our slide matching model, I used a script I had previously written on the Pi to gather images of slides that have slide numbers in boxes on the bottom right. One such picture is shown below. We were able to gather a couple hundred of these images. For our graph description model, I helped write graph descriptions for a few hundred graphs, including information about their trends and general shape.

My progress is currently behind schedule, since our team's progress is behind schedule. We are supposed to be testing our system with users, but since our system is not complete we cannot do so. I am also supposed to have placed our code on the Jetson, but we cannot do that without finalized code. We will have to spend time as a team finalizing the integration of our subsystems in order to get back on track.

In the next week, I hope to be able to put all necessary code on the Jetson so that the hardware is completely ready. I also will help finalize integration for our project. After that, I will help with user testing, as well as working on final deliverables for our project.

Team Status Report for 11.18.23

I think we have grown as a team in many skills, including communication, planning, and time management. One strategy we have started employing recently is setting regular times to meet outside of class to work on our project, even when our tasks are separate and do not strictly require other teammates to complete. This keeps us accountable and productive, and it was particularly useful before our interim demo. For joint tasks like data collection, we also set specific goals for how many images we wanted to gather per day, and we were able to stick to that schedule. One final strategy is to ask for help early on. Earlier in the semester, when we got stuck on something, we would try for a long time to figure it out on our own; with end-of-semester deadlines approaching, we have found that it is best to ask for help immediately to resolve any issues.

The most significant risk is still the ML models not working. The model we must have working for our final project is the slide detection model, which is necessary to get any output from our app at all. It must be able to at least identify the slide and output the text on the slide, even if the graph data cannot be extracted. We have been managing this risk by gathering a large number of images to train the model on, with contingency plans to take more images later if the model is not accurate enough. Similarly, we have a lot of data for the graph data extraction model, but it is all similarly formatted because we decided to auto-generate it with a Python script (a rough sketch of this kind of generation is shown below). If need be, we can find another dataset online that contains pre-tagged graph data in order to make the training set more diverse.
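As a rough illustration of the auto-generation approach, the sketch below renders a random line graph with matplotlib and saves its underlying data as the label; the file names and label format are illustrative assumptions, not our actual script.

```python
import json
import os

import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

def generate_graph(index, out_dir="generated"):
    os.makedirs(out_dir, exist_ok=True)
    x = np.linspace(0, 10, 50)
    y = np.random.uniform(-2, 2) * x + np.random.normal(0, 1, x.size)  # random trend + noise
    fig, ax = plt.subplots()
    ax.plot(x, y)
    ax.set_title(f"Metric {index} over time")
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Value")
    fig.savefig(os.path.join(out_dir, f"graph_{index}.png"))
    plt.close(fig)
    # Save the ground-truth data alongside the image for labeling.
    with open(os.path.join(out_dir, f"graph_{index}.json"), "w") as f:
        json.dump({"x": x.tolist(), "y": y.tolist()}, f)
```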

Here are some of the pictures we took in different lecture rooms across campus. We projected sample slides in HH-1107 and WEH-7500, as shown below, and we also took images in other rooms. The images below show what a slide looks like through our camera and the range of angles from which we captured slides.


We also fixed a bug (regarding the checkered box pattern instead of text) in the random slide generation, allowing for better quality slides to be produced. Here are some examples – we were able to generate 10,000 slides like this.

Jaspreet’s Status Report for 11.18.23

This week, I made progress on the CAD for our hardware component case. It was slightly difficult to import the CAD models for some of the components we bought, but all that remains is finalizing the case design that surrounds them. I also spent time this week programming our Raspberry Pi so we could use it to gather image data for our slide matching model; we were able to take close to 6000 images that we can train on. Finally, I ordered new cameras with different FOVs and dimensions so that we can compare their performance and their effects on our ML models.

My progress is behind schedule. I expected to have a printed component case by the end of the week, but have not been able to do so yet. In order to catch up, I will finish the CAD by Sunday so that we can print out the case as soon as possible. In the next week, I hope to fully complete my subsystem so we can begin testing.

Team Status Report for 11.11.23

The most significant risks that could jeopardize the success of the project are primarily related to the ML models: specifically, not having enough data and/or not getting accurate results. We saw poor results this past week when training the graph detection model on a very small number of images. Our contingency plans involve having the whole group participate in data collection and augmenting our data with a large number of auto-generated images, which we have already implemented.

Another risk is not being able to find enough testers, in particular visually-impaired people willing to work with us to test the product. Our contingency plan is to test the product heavily with sighted users, and we are managing the risk itself by beginning to reach out to potential visually-impaired testers.

We did not make any changes to the system design, and did not update our schedule.

We were able to connect up our Raspberry Pi, camera, and buttons as shown below.

We were able to get the graph generation and slide generation working this week. See Nithya’s post for images of this. We also got the stop button and the app to communicate by setting up a server on the app side.

Jaspreet’s Status Report for 11.11.23

This week, I finished implementing the hardware pipeline for sending images from our Raspberry Pi to our Jetson. There were several steps involved. First, I set up the Jetson to run a Flask server on startup so that it can receive images from the Raspberry Pi through a POST request. I then set up the Raspberry Pi camera capture to also run on startup, so that pressing the start button sends an image. I also worked with Aditi to set up the stop button, so that any audio description playing from the iOS app is stopped as soon as the button is pressed. I experimented with the RPi's camera settings, but will need to adjust them further in order to capture satisfactory images.
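The Jetson-side server is conceptually similar to the minimal Flask sketch below; the route and field names are placeholders rather than our exact implementation.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/upload", methods=["POST"])
def upload():
    image = request.files["image"]     # image sent by the Raspberry Pi
    image.save("latest_capture.jpg")   # handed off to the ML pipeline from here
    return jsonify({"status": "received"})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)  # reachable from the Pi over WiFi
```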

My progress is on schedule now that we have adjusted our schedule to account for our current level of progress. Despite being on track, there is still plenty of work to be done. The first and most important task for completing our project is to design and print the component case that will attach to the user's glasses. This must be completed within the next week according to our schedule. If I finish it faster than expected, I will work on decreasing the latency between when the start button is pressed and when the Jetson receives a new image. When designing, we expected this latency to be much smaller than it currently is, so I will look for ways to reduce it.

According to our schedule, in two weeks I will be running the following tests on the hardware system:

  1. I will measure the latency between when the start button is pressed and when the Jetson receives a new image. This is one component of the total latency from when the start button is pressed to when the user hears the audio description. The hardware component of the latency was estimated to be about 600 ms, but I did not properly account for the time it takes to actually capture an image. However, I do not see this being a major issue, as we allowed multiple seconds of leeway in our use case latency requirement.
  2. I will measure the total size and weight of the device. In our requirements, we stated that it had to be at most 25 mm x 35 mm x 100 mm in dimension, and at most 60 g in weight.
  3. I will measure the battery life and power of the device. We stated that the device should be usable for at least 6 hours at a time before needing to be recharged.

Team Status Report for 11.04.23

One major risk is that we will not be able to find an appropriate group of testers for our device. Without enough testers, we won't be able to gather enough quantitative data to indicate whether the various aspects of our design work as intended, and to tell whether we were successful in creating the device we proposed, we need enough quantitative results to compare against our requirements. To manage this, we need to reach out to our contacts and confirm that we can test our product with visually impaired volunteers. If we are unable to do this, we would instead have to settle for testing with volunteers who aren't visually impaired; although they would still provide useful feedback, it would not be ideal. Therefore, we should prioritize managing this risk in the coming week.

Another risk is gathering enough data for the graph description model. After looking into our previous Kaggle dataset in more detail, we found that many of the graph and axis titles are in a Slavic language and so will not be helpful for our English graph description model. To manage this risk, we plan to devote the next couple of days to searching for and gathering new graph data; our contingency plan, as mentioned in a previous status report, is to generate our own data, for which we will then create reference descriptions.

We have adjusted our schedule based on the weeks that we have left in the semester. We plan to finish our device within the next three weeks to leave enough time for testing and preparing our final deliverables.

The following is a test image of a presentation slide displayed on a laptop that was taken from the Raspberry Pi after pressing the “start” button. We can see that the camera brightness may need adjusting, but that it is functional.

For pictures related to progress made on the app this week, see Aditi’s status report.

Jaspreet’s Status Report for 11.04.23

This week, I continued to make progress on the pipeline for sending images to the Jetson from our Raspberry Pi. Now, the system is capable of saving an image after the start button is pressed, and it can take input from the stop button as well. However, the button setup is currently on a breadboard just so that it is ready for the demo this week. In the final setup, it should fit compactly within the component case. Once I have set up our HTTP server on the Jetson, we will be able to transfer the captured image via POST request.
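The breadboard button handling is conceptually similar to the sketch below, assuming the RPi.GPIO library with internal pull-ups; the pin numbers and the capture/stop helpers are placeholders for our actual wiring and code.

```python
import time
import RPi.GPIO as GPIO

START_PIN, STOP_PIN = 17, 27  # hypothetical BCM pin numbers

def capture_image():
    print("capturing image...")          # placeholder: save a camera frame

def send_stop_signal():
    print("stopping audio playback...")  # placeholder: notify the iOS app

GPIO.setmode(GPIO.BCM)
GPIO.setup([START_PIN, STOP_PIN], GPIO.IN, pull_up_down=GPIO.PUD_UP)
GPIO.add_event_detect(START_PIN, GPIO.FALLING,
                      callback=lambda ch: capture_image(), bouncetime=300)
GPIO.add_event_detect(STOP_PIN, GPIO.FALLING,
                      callback=lambda ch: send_stop_signal(), bouncetime=300)

try:
    while True:
        time.sleep(1)  # idle; button presses are handled via interrupts
finally:
    GPIO.cleanup()
```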

My progress is behind schedule. I did not realize that both the Raspberry Pi and NVIDIA Jetson Orin Nano would require so many external components in order to operate. Specifically, I had to obtain microSD cards as well as various cables for display or input purposes. I also had more trouble setting up the WiFi connections than I had anticipated, and in hindsight, I should have reached out for help as soon as I started encountering issues. In order to catch up, I will need to speed up the process for designing and printing all of our 3D printed parts.

In the next week, I will first finish preparations for the interim demo, which include setting up the HTTP server on the Jetson and connecting both the Raspberry Pi and Jetson to campus WiFi. After the demo, I will finally begin designing our hardware component case as well as the textured button caps. This will put me back on track for completing the hardware subsystem on time.

Jaspreet’s Status Report for 10.28.23

This week I continued working on implementing the image-to-server pipeline using our Raspberry Pi Zero and Unistorm camera. I realized that the OS I had configured on the SD card was not properly compatible, so I went back and redownloaded Raspberry Pi OS. I then reconfigured the Pi so that I could access it over SSH and use VNC Viewer. I still have to finish setting up the GPIO button input and sending an image from the camera to an external server.

I ended up having to spend time completing work for other classes, and was not able to complete the goals I set for this week. I plan to spend most of Sunday completing my tasks for this week so that I can stay on schedule. Then, next week, I will begin creating a CAD model of our 3D-printed component case.

Jaspreet’s Status Report for 10.21.23

This week, we received the hardware components that we ordered, and I will be able to begin work on testing and assembling them once we are back from Fall Break. While waiting for the components, I was able to test out capturing and sending images from the Raspberry Pi 4 and Arducam camera module that we borrowed. This will make it much easier to set up the same pipeline with the Raspberry Pi Zero and Unistorm camera module. The majority of the rest of the week was spent working on our design report, which took much longer than we expected.

My planned tasks for the near future are to set up an image-to-server data pipeline and create a 3D-printed case for our hardware components. To accomplish these tasks, I will have to look into how to capture an image with the Raspberry Pi based on a button press, and how to then send the image to a remote web server. For the case, I will have to look into how to create a functional 3D model in CAD software. I will also have to look into different methods of attaching our device to the side of glasses.

My progress is not on schedule, as we had planned to receive our components earlier in the week. However, when I submitted our order forms, I forgot to check them off with our group's TA, so our order was delayed. Therefore, according to our schedule, I will need to set up the image-to-server pipeline and test data transfer from our camera by the end of the week to catch up. In the next week, I hope to do this as well as set up a server on the Jetson so that we can test sending our images to it. Since I am behind schedule, it will be necessary to spend extra time to finish these tasks by the end of the week.

Team Status Report for 10.21.23

One risk we will have to consider is that our device's attachment mechanism will not be sufficiently secure or easy to use. When looking into how we could create a universal attachment for all types of glasses, we narrowed our options down to either a hooking mechanism or a magnetic mechanism. With a hooking mechanism, we risk that users may not be able to easily clasp our device on, and with a magnetic mechanism we risk that the device may not be secure enough. To manage the risk with a hooking mechanism, we can iterate over multiple designs and gather user feedback on which is easiest to use. For the magnetic mechanism, we can increase the strength of the magnet so that the attachment is more secure. It is worth noting that in the worst case, if neither solution works, the image capturing and audio description functionality of our device will still be testable.

Another risk to consider is the latency of the graph description model. After doing more research into how the CNN-LSTM model works (see Nithya's status report), we discovered that generating the graph description may take longer than we originally anticipated. Specifically, the sequence processor portion of the model generates the output sequence one word at a time, and each word is generated by performing a softmax over the entire vocabulary and then choosing the highest-probability output. This is discussed more in the "changes" section below, but we can manage this risk by (1) further limiting the length of the output description, and (2) modifying our use case and design requirements to accommodate this change.
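To illustrate why this is slow, the greedy decoding loop looks roughly like the sketch below; the model, tokenizer, and maximum length are hypothetical stand-ins, but the word-by-word argmax-over-vocabulary structure is what drives the latency.

```python
import numpy as np

def greedy_decode(model, tokenizer, image_features, max_len=30):
    """One forward pass per generated word: softmax over the vocabulary,
    then take the argmax (hypothetical Keras-style model/tokenizer)."""
    words = ["<start>"]
    for _ in range(max_len):
        seq = tokenizer.texts_to_sequences([" ".join(words)])[0][:max_len]
        seq = np.pad(seq, (0, max_len - len(seq)))[None, :]  # pad to fixed length
        probs = model.predict([image_features, seq], verbose=0)[0]
        next_word = tokenizer.index_word.get(int(np.argmax(probs)), "<end>")
        if next_word == "<end>":
            break
        words.append(next_word)
    return " ".join(words[1:])
```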

The biggest change we made was adding the new Canvas scraping functionality. We figured that it might be unnecessary, annoying, and difficult for the visually-impaired user to have to download the lecture PDF from Canvas, email it to themselves to get it onto their iPhone, and then upload it to the iOS app for our ML model to parse before the lecture. We also felt that this might take too much time and discourage people from using our project, especially if they have many back-to-back classes with short passing periods. So, we decided to add functionality where the user can simply click a button in the app and have the Flask server automatically scrape the most recent lecture PDF depending on which button the user clicks (a rough sketch of this scraping step appears after the cost discussion below). This incurs the following costs:

  1. We need to add an extra week to Aditi’s portion of the schedule to allow her to make the change.
  2. Professors must be willing to provide their visually-impaired students with an API key that they will put into the app so that the application will have access to the lectures in the Canvas course.
  3. The visually-impaired user will have to ask their professor for this API key.

To address cost (1), Aditi is already ahead of schedule, and this added functionality should not put her behind; if it does, one of the other team members can help take on some of the load. To mitigate (2), we will provide a disclaimer in the app explaining to professors that it will only scrape materials that are already available for the user to see, so there is no privacy concern. The app will only scrape the most recent lecture under the "Lectures" module, so unpublished files will not be extracted. We felt it was not necessary to mitigate (3), because asking for an API key will still likely be faster and less time-consuming than having to download and upload the new lecture PDF before every class.
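A rough sketch of the scraping step is shown below, assuming the standard Canvas REST API with a user-provided access token; the base URL, course ID handling, and file layout are illustrative assumptions rather than our actual implementation.

```python
import requests

CANVAS_BASE = "https://canvas.example.edu/api/v1"  # placeholder Canvas instance
TOKEN = "<user-provided API key>"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def fetch_latest_lecture_pdf(course_id, out_path="latest_lecture.pdf"):
    # List the course's PDF files, newest first, and download the most recent one.
    resp = requests.get(
        f"{CANVAS_BASE}/courses/{course_id}/files",
        headers=HEADERS,
        params={"sort": "created_at", "order": "desc",
                "content_types[]": "application/pdf"},
    )
    resp.raise_for_status()
    newest = resp.json()[0]
    pdf = requests.get(newest["url"], headers=HEADERS)
    with open(out_path, "wb") as f:
        f.write(pdf.content)
    return out_path
```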

Another change concerns the graph description model latency mentioned above. We will need to relax the design requirement for this specific portion of the before-class latency, which we had set at 10 seconds. We still don't have a specific estimate for how long the CNN-LSTM model will actually take for a given input graph, but we may need to increase this time bound. This should not be a problem, as we have a lot of wiggle room: in our use-case requirements we stated that the student should upload the slides 10 minutes before class, so the total before-class latency need only be under 10 minutes, and we are confident that we can process the slides in less than that amount of time.

We have adjusted our schedule based on the changes listed above, and have highlighted the schedule changes in red.


Jaspreet’s Status Report for 10.07.23

This week, I ordered the hardware components that we plan on using for our project. This includes the Raspberry Pi Zero, camera, battery, and Nvidia Jetson. One issue I ran into while ordering parts was that the original camera that I selected was out of stock in many stores, and would take multiple weeks to arrive in others. Therefore, I ordered the backup camera instead, which is the Unistorm Raspberry Pi Zero W Camera. This camera has the same resolution and FOV, and is a very similar size, so I feel comfortable ordering it as a replacement. I also ordered an Nvidia Jetson Orin Nano Dev Kit from ECE inventory, which we plan on using to host our server with our ML models. Finally, I spent some time working with a Raspberry Pi 4 and Arducam module to test how to send an image wirelessly from the Pi. I plan on making more progress on this throughout the coming week.

I am slightly behind schedule, as even though I have ordered all of the hardware components, I have not looked into how to give the buttons texture so that they can be easily differentiated. However, I don’t expect this to take too much time, and I should be able to figure out the solution this weekend. For the next week, while I wait for components I hope to continue testing out how to send images wirelessly with a Raspberry Pi 4 and compatible camera. However, I don’t expect this to take up a lot of time, so I plan on helping my group members with their work. Specifically, I plan on helping gather image data for training our ML models, and I will get more information on what data to gather from Nithya.

Jaspreet’s Status Report for 09.30.23

This week, I selected the necessary hardware components for our design, including the camera, battery, and computing device.

Computing Device: Raspberry Pi Zero WH. Our computing device needed to be able to send image data wirelessly to our server at the press of a button. It also needed to be small and lightweight so that it could attach comfortably to glasses. The Raspberry Pi Zero WH fulfills all of these requirements: the W indicates that it supports WiFi, and the H indicates that it has GPIO headers which can be connected to our buttons. It measures 65 mm x 30 mm x 10 mm and weighs 11 g, which is small enough for our purposes. Another plus is its built-in CSI camera connector, which we can take advantage of.

Camera: Arducam 5MP OV5647 Miniature Camera Module for Pi Zero. Since we are using a Raspberry Pi Zero, it makes sense to use a camera made exactly for that board, so I chose the Arducam Miniature Camera Module. The camera itself is about 6 mm x 6 mm and is attached to a 60 mm flex cable; in total it weighs about 2 g, which is small compared to other camera modules.

Battery: PiSugar 2 Power Module. After searching for rechargeable lithium batteries for the Raspberry Pi Zero, I came across the PiSugar 2. This is a custom board and battery made specifically for the Pi Zero, which makes it easier to power the Pi. It weighs about 25 g, which is relatively heavy, but most batteries that provide enough power for our use case requirements weigh about this much.

Buttons: Any medium sized push buttons will work for our use case. I need to look more into how I can texture these buttons to make it easier for a blind user to differentiate between the start and stop buttons.

The useful courses that helped me throughout this week include 18-441 Computer Networks and 18-349 Intro to Embedded Systems. In these courses I learned about sending data over wireless connections as well as using GPIO pins to read inputs from buttons.

My progress is now on schedule. In the next week I hope to order all necessary components, and begin working on the pipeline for sending an image from our camera through a Pi. I have acquired a Raspberry Pi 4 and a compatible camera that I can test on and use to gain insight into the work I will need to do once we receive our components.

Jaspreet’s Status Report for 09.23.23

This week, I focused primarily on preparing for the proposal presentation. My secondary goal was to do research on which hardware components we should be using for our design. These components are the camera, microcontroller, buttons, and battery.

I am behind schedule, as I expected to make more progress on selecting hardware components for our expected design. Therefore, I plan on spending extra time this weekend to catch up.

In the next week, I hope to have a completed first list of selected hardware components, with detailed explanations for why those selections were made. This includes listing out all components that were considered and the various tradeoffs between these components. SWaP-C must be considered for each component, especially since most of our use case requirements depend on the size, weight, and power consumption of our device.