Final Video Submission

Please find the link to our video demo detailing our final demonstration:

We are excited to host a live demo tomorrow, Friday, May 3rd at Wiegand Gymnasium, and we hope to see you all there!

April 27 Status Report — Surya Chandramouleeswaran

As the semester comes to a close, this status report aims to provide a final technical update along with a quick reflection on some of the lessons we have learned thus far as a group.

First, in the interest of a technical update on our progress: we are still debugging the MySQL Connector as we integrate the separate algorithms that make up our project. Specifically, we need to implement the logic to encode the captured image as binary data on the Pi, and to decode that binary data on the server side before it is written into the database.
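For concreteness, here is a minimal sketch of the round trip we are aiming for, using mysql-connector-python; the connection details, table, and column names are placeholders rather than our actual schema:

```python
import mysql.connector

# Placeholder credentials and schema, for illustration only.
conn = mysql.connector.connect(
    host="localhost", user="capstone", password="...", database="inventory"
)
cur = conn.cursor()

# Pi side: encode the captured image as raw bytes.
with open("capture.jpg", "rb") as f:
    img_bytes = f.read()

# Server side: a parameterized INSERT lets the connector handle the
# binary escaping; the image column is a BLOB/LONGBLOB.
cur.execute(
    "INSERT INTO captures (device, image) VALUES (%s, %s)",
    ("rpi-scale", img_bytes),
)
conn.commit()

# Decoding on read is symmetric: the BLOB column comes back as bytes.
cur.execute("SELECT image FROM captures ORDER BY id DESC LIMIT 1")
(blob,) = cur.fetchone()
with open("roundtrip.jpg", "wb") as f:
    f.write(blob)
```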

I was able to get SSOCR (seven-segment optical character recognition) working for still images; this would be used to communicate the scale readings to our database. Here is an example of how it works on a homemade test case:

[Image: homemade seven-segment test case]

Resulting output: [Image: SSOCR digit readout]

On the Pi-capturing end, the logic is to continuously capture and parse frames until a significant positive value is registered (something is placed on the scale). From there, a counter lets the weight reading stabilize, and the image is sampled. This image is then forwarded to the server, which parses it with SSOCR and writes the value into the database. Fine-tuning the SSOCR so it works across broad testing inputs remains a personal goal of mine in the coming week.
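A minimal sketch of that capture loop follows. It assumes OpenCV for frame capture and shells out to the ssocr command-line tool; the settle window and the ssocr flags are illustrative guesses, not our tuned values:

```python
import subprocess
import cv2

SETTLE_FRAMES = 30  # assumed settle window; tuned empirically in practice

def read_scale(frame_path: str) -> float:
    """Run ssocr on a saved frame and parse the printed digits."""
    # "-d -1" asks ssocr to auto-detect the digit count; exact flags
    # vary by ssocr version, so treat this invocation as illustrative.
    out = subprocess.run(["ssocr", "-d", "-1", frame_path],
                         capture_output=True, text=True)
    try:
        return float(out.stdout.strip())
    except ValueError:
        return 0.0  # unreadable frame

cap = cv2.VideoCapture(0)
settled = 0
while True:
    ok, frame = cap.read()
    if not ok:
        continue
    cv2.imwrite("frame.png", frame)
    # Wait for a significant positive value (something on the scale),
    # then let the reading stabilize before sampling.
    if read_scale("frame.png") > 0:
        settled += 1
        if settled >= SETTLE_FRAMES:
            cv2.imwrite("sample.png", frame)  # this frame goes to the server
            break
    else:
        settled = 0
cap.release()
```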

Another bucket list item we would like to work on is improving the front end of our website. Grace plans to take the lead on that as demo day approaches, with supplementary assistance from me and Steven depending on our progress with the MySQL Connector.

Some of the lessons we learned this semester include handling the tradeoff between building components yourself and buying premade parts. The tradeoff is between speed and customizability, and in general we favored the latter, since we saw the project as an opportunity to apply what we have learned to build new things rather than learning to use a premade system that may be restrictive for our use cases. Another lesson was the importance of routine documentation: status reports, weekly meetings, and writing assignments throughout the semester helped us maintain consistent progress so that no point of the design process devolved into an unbearable time crunch. This is a very transferable skill for our careers beyond graduation.

We would once again like to thank the 18-500 course staff, and all the educators at CMU whose advice and guidance over the past few years of undergraduate study were applied in the development of our project. While there remain some outstanding intricacies to resolve ahead of the final demonstration, we are excited for the opportunity to show off our hard work throughout the semester in the final demo showcase this coming Friday.

Grace Liu’s Status Report for April 27th, 2024

As Steven prepared for the final presentation, Surya and I provided feedback to ensure the presentation effectively portrayed our current work in progress and incorporated suggestions from previous presentations. While most of the hardware components remained static, the design tradeoffs involved weighing the pros and cons of different image classification libraries, which showed the greatest overall benefits for Tesseract OCR and ResNet-18. With that as the main focus of the presentation, we allocated less time to the video demos and explained more verbally.

Since we are approaching the final components of our project, I noticed there were still some frontend components of our web application that could be improved in terms of user experience, performance, and functionality. First, focusing on the UI: the previous designs had too much clutter on one side of the screen, so it is important to use appropriate whitespace so users are not overwhelmed with too much information. I had some trouble maintaining consistent branding and typography across the pages, since they range from creating new posts for the global page to displaying parsed caloric information from uploaded images. I will gather further feedback from user testing to see which pages are easier to navigate and proceed from there.

Additionally, since we have made substantial progress in integrating the subsystems, there were improvements to be made to the interactive elements of the web application once it was set up on Nginx. Since all the necessary buttons and forms have been implemented in the UI, I worked on making certain ones stand out so that users visually notice them first. Another issue that emerged was providing feedback and error states/messages to users after actions such as scanning a product with the camera. Helpful error messages let users immediately identify incorrect behavior and receive the feedback they need to correct it.

Since I previously helped Steven gather data for the ML algorithms, we also worked together on their testing and validation. Regarding testing specifically, because we had not yet reached our initial accuracy goals, we conducted various tests on our label reading and image classification algorithms to improve their validity. The crucial part of these tests is to ensure the algorithms perform well under controlled conditions before letting users use the product in real time. We previously spent a lot of time with clear images of nutritional labels and fresh produce, so the necessary next step is live testing with the RPi cameras, allowing us to assess how the ML algorithms adapt and perform in real time. This transition in testing is intended to close the gap between theoretical accuracy and the real-life applicability of the trained algorithms.

With the remaining time, our group will shift efforts towards maximizing the efficiency of our algorithms and making necessary fixes before the final demo on Friday. As I have worked on research posters prior to this class, I will mainly focus on finishing our poster before the deadline. My group and I look forward to presenting our semester's worth of work to the faculty, along with anyone else interested in learning more about our product and how it could be extended to wider applications in the future.

Team Status Report 04/27/24

Below are some of the tests that our team conducted:

  1. Image Classification Accuracy Tests: We compiled a dataset of 500 images of all forms (white background, normal background, and blurred background) and tested classification accuracy on it. The results fell into one of four categories: canned food, bananas, oranges, and apples. (Our system can classify into other groups if needed; the four categories simply keep the confusion matrix small.) Accuracy was around 89%; we are aiming for around 95%. The design change we will make to improve accuracy is background reduction, so that the image consists of only the primary object. A sketch of the evaluation loop behind these tests appears after this list.
  2. Image Classification Latency Tests: Using the same 500-image dataset, we tested classification speed. Our goal is around 2-3 seconds; the tests took 4-5 seconds on average, which is not too far off. Our primary focus is accuracy, but once accuracy reaches the ideal threshold, we will work on optimizing the classification algorithm (potentially removing layers).
  3. Text Extraction Accuracy Tests: We used our sample of 125 canned food images plus an additional 75 pictures we took ourselves to create a testing set of 200 samples, with two focuses: calorie extraction and product label extraction. The results indicated 92% accuracy in obtaining the right calorie count, which we ideally want above 96%. The primary issue is that the text extraction algorithm sometimes formats its output so that the calorie amount does not directly follow the word "calorie"; we plan to tweak the algorithm to handle edge cases like that. We were also able to extract the product label 98% of the time, but the output includes too many extraneous characters, so we will work on a parsing algorithm to make the label clear and concise for inventory documentation.
  4. Text Extraction Latency Tests: Using the same 200 samples, we measured 3-4 seconds for the whole calorie and product label extraction process. The results were skewed slightly by a few tests that took exceptionally long trying to find the calorie amount; we expect this value to decrease to 1-2 seconds after tweaking the calorie extraction algorithm. Our goal is 2-3 seconds, which gives us some buffer for adding a product label sanitization algorithm that turns the raw text blob into a clear, concise label.
  5. RPi and MySQL Connector Latency Tests: We used a set of 200 image captures from the RPi (100 product images and 100 scale readings) and measured the time from sending to being accessible via the web application. Our ideal range is 10-15 seconds, but the results averaged 22.4 seconds across our samples. We plan design changes that synchronize when the first and second cameras take their captures, and we can reduce the polling time spent waiting for the image to stabilize: the quickest time across our samples was 16.5 seconds, and the product and weight do not actually take long to stabilize, so a slight alteration to the Python script should improve latency considerably.
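For reference, the sketch below shows the kind of evaluation loop behind tests 1 and 2, assuming a PyTorch ResNet-18; the paths, transforms, and four-class head are placeholders rather than our exact configuration:

```python
import time
import torch
from torchvision import datasets, models, transforms

# Assumed preprocessing; our actual transform pipeline may differ.
tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("test_images/", transform=tfm)  # placeholder path
loader = torch.utils.data.DataLoader(dataset, batch_size=1)

model = models.resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 4)  # four categories
# model.load_state_dict(torch.load("model.pt"))      # placeholder checkpoint
model.eval()

correct, total, elapsed = 0, 0, 0.0
with torch.no_grad():
    for image, label in loader:
        start = time.perf_counter()
        pred = model(image).argmax(dim=1)
        elapsed += time.perf_counter() - start
        correct += int((pred == label).sum())
        total += len(label)

print(f"accuracy: {correct / total:.1%}, mean latency: {elapsed / total:.2f}s")
```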

Regarding progress, Steven was able to debug and fix our ML classification model with ResNet-18. There were many bugs in integrating it into the front-end; previously it was not outputting any result, so Steven tweaked it to process images correctly and output the classified results. He also improved our background reduction algorithm to make the RPi image clearer and easier to classify, since the database entry contains fewer extraneous features. This ultimately improved accuracy to around 89%, roughly in line with our desired goals; he will fine-tune it further to get the accuracy up to 95%.

Grace and Steven also conducted more testing and validation of the ML components, which included many unit tests with the data we compiled in previous weeks. As described above, we ran speed and accuracy tests on the ML algorithms for image classification and text extraction. Most of these tests were done locally first with stock images to keep testing quick and efficient; the next step is live testing with pictures from the RPi on the website.

Surya was able to achieve high accuracy with the seven-segment optical character recognition (SSOCR) that establishes communication between the scale and the MySQL database. The extracted weight is forwarded to the database to be written and is also used for backend calculations. This came after failed testing with Tesseract OCR, which has trouble extracting digital text fonts. He plans to spend the remaining time fine-tuning this algorithm to work across broad testing inputs.

Grace spent most of her time improving frontend components involving user experience, performance, and functionality. Since most earlier efforts were directed at the MySQL database retrieval system, the design and interactive sides of the website needed improvement, as it could initially be overwhelming to navigate all the different pages; she is considering adding instructions somewhere to help users with the functionality. Another issue that emerged after integrating the subsystems was providing enough feedback and error states/messages to users after actions such as scanning a product with the camera. It is crucial to provide helpful error messages so users can immediately identify incorrect behavior and receive the feedback they need.

Likewise, we all worked closely to deploy the web application on Nginx and to create a MySQL database connected to the RPi using MySQL Connector. This involved coordinating between two USB cameras and testing with one versus two RPis. We also wrote Python scripts to take images of both the object and the digital scale output.
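A rough sketch of that two-camera coordination, assuming both USB cameras appear as OpenCV device indices (the indices and filenames are placeholders):

```python
import cv2

def capture_pair(product_idx: int = 0, scale_idx: int = 1) -> None:
    """Grab one frame from each USB camera back to back, so the
    product image and the scale reading describe the same moment."""
    for idx, name in [(product_idx, "product.png"), (scale_idx, "scale.png")]:
        cam = cv2.VideoCapture(idx)
        ok, frame = cam.read()
        cam.release()
        if not ok:
            raise RuntimeError(f"camera {idx} returned no frame")
        cv2.imwrite(name, frame)

capture_pair()
```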

Steven Zeng Weekly Status Report 04/27/24

This week started off with my presentation, so I was able to present and take in the feedback and questions to better our project and my presentation skills. Thank you to all our peers and faculty for the thought and attention! I also worked with my group on the group poster early in the week.

I was able to debug and fix our ML classification model with ResNet-18. There were many bugs in integrating it into the front-end; previously it was not outputting any result, so I tweaked it to process images correctly and output the classified results. I also improved our background reduction algorithm to make the RPi image clearer and easier to classify, since the database entry contains fewer extraneous features. This ultimately improved accuracy to around 89%, roughly in line with our desired goals. I will fine-tune it further to hopefully get the accuracy up to 95%.

Likewise, I worked closely with Grace and Surya to deploy the web application on Nginx and to create a MySQL database connected to the RPi using MySQL Connector. I assisted Surya with forwarding data from the RPi to the SQL database; we worked closely with MySQL Connector and solved some slight configuration issues along the way. This involved coordinating between two USB cameras and testing with one versus two RPis. I also helped write Python scripts to take images of both the object and the digital scale output.

Lastly, I conducted more testing and validation with Grace on the ML components, which included many unit tests with the data we compiled in previous weeks. These tests are discussed more in the team status report, but we ran speed and accuracy tests on the ML algorithms for image classification and text extraction. Most of these tests were done locally first with stock images to keep testing quick and efficient; the next step is live testing with pictures from the RPi on the website.

Overall, this week involved a lot of collaborative work as we seek to integrate all our individual components into a working product. I hope to conduct more testing and fine-tuning of the ML models as we near the demo date. We are slowly seeing our product take full form.

Grace Liu’s Status Report for April 20th, 2024

With the final presentation and many other deadlines coming up, I am anticipating the final steps we will take towards a completed MVP for the final demo. One remaining piece of the user interface was integrating the MySQL database into the environment and setting up user roles governing what users are and are not allowed to manipulate. I learned most of this from online tutorials, along with prior knowledge from my Web Application Development and Security classes on preventing injection attacks.
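As a reminder of the injection-safe pattern involved, here is a minimal sketch using mysql-connector-python; the table and column names are placeholders:

```python
import mysql.connector

conn = mysql.connector.connect(user="app", password="...", database="inventory")
cur = conn.cursor()

username = "alice'; DROP TABLE entries; --"  # hostile input, rendered harmless

# The %s placeholder is filled in by the driver, never by string
# formatting, so hostile input is treated as data rather than SQL.
cur.execute("SELECT id, calories FROM entries WHERE username = %s", (username,))
rows = cur.fetchall()
```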

Different data input/output operations were defined, including uploading new entries after successful scans from the hardware, deletion by the user after consumption, and updating the daily intake after such a deletion. I also tested each database operation individually to ensure it behaves correctly. In the remaining time, I will work with Steven and Surya to further check the integration between the hardware and the web application display.
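Since the deletion and the daily-intake update should succeed or fail together, here is a sketch of the transactional version, with a placeholder schema and assuming mysql-connector-python:

```python
import mysql.connector

conn = mysql.connector.connect(user="app", password="...", database="inventory")
cur = conn.cursor()

def consume_entry(entry_id: int, username: str) -> None:
    """Delete a food entry and fold its calories into the daily total,
    atomically: either both statements commit or neither does."""
    try:
        cur.execute("SELECT calories FROM entries WHERE id = %s", (entry_id,))
        (calories,) = cur.fetchone()
        cur.execute("DELETE FROM entries WHERE id = %s", (entry_id,))
        cur.execute(
            "UPDATE daily_intake SET total = total + %s WHERE username = %s",
            (calories, username),
        )
        conn.commit()
    except Exception:
        conn.rollback()  # leave the entry and the total untouched
        raise
```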

Since another presentation is coming up next week, much of the group's effort was concentrated on revising and completing the final presentation, and I helped Steven with a video demo to include in it. Most of the changes from the design presentation involve highlighting differences between the original design and our current one, evolutions in the solution approach such as social and global considerations, and the testing completed so far to balance design trade-offs. Most of the software/hardware design changes were highlighted in our previous presentation, but we have decided to include additional pages in our web application for more versatile usage. For instance, in consideration of social concerns such as increasing interaction between users and preventing misuse, we included a social aspect that lets users of the same device optionally see each other's food activity. Additionally, there is a page that takes a user-entered calorie threshold and shows whether their calorie intake has exceeded their personal goal.

Regarding testing, since most of the web application functionality was completed for the interim demo, I used the extra time to help Steven with his ML algorithms and Surya with scale calibration. Since our project moved to ResNet-18 for the CNN, I helped Steven with image preprocessing to ensure the system receives the most readable images and achieves higher accuracy. We conducted batch testing to increase efficiency and make full use of our computational resources.

Regarding the scale, we did range and sensitivity testing to see whether small weight changes can be detected. That covers most of the testing that needed to be done; if there is additional time, we can also conduct durability testing to verify the scale's performance under different temperature conditions. While improving the performance of each subsystem, we are shifting our efforts towards system integration testing to monitor the system's ability to handle errors and how they can be reported to the user.

April 20 Status Report — Surya Chandramouleeswaran

The final list of items to complete on my end before the upcoming demo includes configuring the database, sending captured images to it, and slightly adapting this process for the scale measurement case.

On Monday I worked with Steven on writing some logic to auto-capture the image when the resolution is good enough. Thanks to the excellent camera quality (pictured below on a watch, for example), we've noticed better object recognition results and classification accuracies:

[Image: webcam capture of a watch with classification results]

Again, the goal for this stage is just to generate a clear image to send onward; the server will take care of recognition and the database update. It wouldn't make sense to host such intensive operations on the Pi for performance reasons. With that said, it is nice to see that the algorithm classified my watch as an "analog clock," with "stopwatch" being a somewhat valid runner-up, all things considered.
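One simple way to implement that auto-capture gate, which may or may not match our exact logic, is a variance-of-Laplacian sharpness score with an assumed threshold:

```python
import cv2

SHARPNESS_THRESHOLD = 100.0  # assumed cutoff; tuned per camera in practice

def is_sharp(frame) -> bool:
    """Variance of the Laplacian is a standard focus measure:
    blurry frames have few edges, hence low variance."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var() > SHARPNESS_THRESHOLD

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if ok and is_sharp(frame):
        cv2.imwrite("capture.jpg", frame)  # good enough to send onward
        break
cap.release()
```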

The image is then saved to a directory, and the mysql.connector family of functions uses BLOB columns to send a relatively large piece of data (an image) as binary that the MySQL database can receive and interpret.

Our web application uses AJAX (Asynchronous JavaScript and XML) to render user food entries on the website with their corresponding nutritional information. It renders whatever is in the database, so the image pops up on the user end as soon as the application has entered it into the DB.
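The server framework isn't pinned down in these posts, but as a sketch of the JSON endpoint such AJAX polling needs, assuming a Flask-style view and a placeholder schema:

```python
from flask import Flask, jsonify
import mysql.connector

app = Flask(__name__)

@app.route("/entries/<username>")
def entries(username):
    # The AJAX code on the page polls this endpoint and renders
    # whatever rows currently exist for the user.
    conn = mysql.connector.connect(user="app", password="...",
                                   database="inventory")
    cur = conn.cursor(dictionary=True)
    cur.execute(
        "SELECT name, calories, image_url FROM entries WHERE username = %s",
        (username,),
    )
    return jsonify(cur.fetchall())
```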

The other item to complete is the same process but for text recognition on the scale reading. This is easier because we just need to communicate scale numbers (much smaller data types than pictures) to MySQL. It involves building off existing frameworks but adapting them around a digit recognition algorithm as opposed to image classification.

As such, our primary focus is trying to optimize and improve the accuracy of all the subsystems we have implemented thus far. Efforts have been more group-oriented now than ever before so we can patch up the areas that need improvement to allow for as smooth a demo as possible.

Team Status Report 04/20/24

This week we had a wide variety of tasks to complete. The first was the final presentation: we all worked to combine aspects of our previous presentations with our new testing results and design tradeoffs. This involved collecting data and recording videos to include in the presentation, capturing our team's progress thus far.

We also focused our attention on forwarding data from the RPi cameras to the database. We need to record both the image of the product and the scale reading; these recordings are stored in a MySQL database, which we access through our Nginx cloud deployment. These tasks involved a lot of Python code on the RPi and work configuring it to write to the database properly. Since we have only recently focused our efforts on scale integration, we came up with a comprehensive testing strategy for it. Some calibration was done to ensure the scale provides accurate readings. Another emerging question was whether it can easily detect small changes in weight; sensitivity is a big consideration, since we want to minimize latency between weighing and transferring the data to our user interface.

Another area where we concentrated significant effort was the ML portion of our project. We tested ML classification on both the website end and the RPi end. Furthermore, we integrated ResNet-18 in place of GoogLeNet to improve performance; this involved many configuration details that produced a lot of errors. Likewise, we needed to fine-tune this classification algorithm to classify into the four primary groups. In addition, we wrote functions to alter images to reduce the effect of the background. This involved some background subtraction code and experimentation with Mask R-CNN to extract object instances from the picture; in our case there is only one object in the photo, so Mask R-CNN extracts just those pixels while ignoring the background. There are still some bugs in the implementation that we are debugging, but we hope to finish a working algorithm, which would greatly improve accuracy.
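A sketch of that background removal idea, using torchvision's off-the-shelf Mask R-CNN (the mask threshold and file paths are assumptions):

```python
import torch
from torchvision import transforms
from torchvision.models.detection import maskrcnn_resnet50_fpn
from PIL import Image

model = maskrcnn_resnet50_fpn(weights="DEFAULT")  # pretrained COCO weights
model.eval()

tensor = transforms.ToTensor()(Image.open("capture.jpg").convert("RGB"))
with torch.no_grad():
    out = model([tensor])[0]

if len(out["masks"]) > 0:
    # One object per photo in our setup: keep the highest-scoring
    # instance and zero out everything outside its mask.
    mask = out["masks"][0, 0] > 0.5  # assumed mask threshold
    foreground = tensor * mask
    transforms.ToPILImage()(foreground).save("foreground.jpg")
```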

In addition to classifying canned foods and fresh produce, we also implemented seven-segment display OCR to streamline digital capture of the scale reading. The camera inside the box had to be placed so that it gets a clear, unobstructed view of the reading. As in the previous ML classification, image preprocessing was done, including conversion to black and white and cropping the image to focus on the weight reading only; this keeps the images similar apart from the actual reading. Currently we do not have the best accuracy with Tesseract, but we will conduct further testing under various lighting conditions to verify that this subsystem interfaces correctly with the web application. A possible solution is to experiment with the different page segmentation modes, e.g., treating the image as a single line of text ('--psm 7'). Of course, good lighting will also play an important role in keeping the text as legible as possible, without shadowing or reflections.
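A sketch of that preprocessing plus Tesseract call, assuming the pytesseract wrapper (the crop box coordinates are placeholders for our fixed camera position):

```python
import cv2
import pytesseract

img = cv2.imread("scale_frame.png")
# Crop to the weight readout; coordinates are placeholders.
readout = img[100:220, 150:420]

# Black-and-white conversion keeps frames consistent apart from the digits.
gray = cv2.cvtColor(readout, cv2.COLOR_BGR2GRAY)
_, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# "--psm 7" treats the image as a single line of text; restricting the
# character set to digits and the decimal point also helps.
text = pytesseract.image_to_string(
    bw, config="--psm 7 -c tessedit_char_whitelist=0123456789."
)
print(text.strip())
```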

Last but not least, we did some work on the front end to better display the data. We implemented the infrastructure to access values in the MySQL database and query them by username, and we finished the code to create tables, lists, and graphs as needed in the web application. We hope to have all of this connected to the RPi and hardware by the end of next week, aiming for a working, fully integrated product that resembles our MVP.