April 6th Status Report — Surya Chandramouleeswaran

This was a busy yet exciting week for our group in preparation for the interim demonstration. I continued work on the hardware and classification components of our project with the goal of shifting to integration in the coming weeks.

Operating the RPi headless continued to present its own challenges. Interfacing directly with the board required an SSH connection and a remote viewer (typically RealVNC), which was often quite slow; as a result, observing the camera feed over the SSH connection showed significant frame lag and limited resolution. My goal for the coming weeks is to improve camera performance by moving to a headed monitor setup and trying a USB camera in place of the third-party Arducam module.
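To separate viewer lag from actual camera performance, one quick check is to measure the raw capture frame rate on the RPi itself, independent of the SSH/VNC viewer. A minimal sketch, assuming the camera enumerates as an OpenCV-visible device at index 0:

```python
# Measure raw capture FPS on the RPi, bypassing the remote viewer entirely.
import time
import cv2

cap = cv2.VideoCapture(0)                     # camera assumed at device index 0
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

frames, start = 0, time.time()
while time.time() - start < 5.0:              # sample the feed for five seconds
    ok, _ = cap.read()
    if ok:
        frames += 1
cap.release()

print(f"Raw capture rate: {frames / 5.0:.1f} FPS")
```

If this number is healthy while the VNC view stutters, the bottleneck is the remote connection rather than the camera itself.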

[Image: dependencies associated with classification hardware]

[Image: the camera's first image]

To elaborate, the plan is to automatically capture an image once the algorithm is at least 80 percent confident that it has classified the object correctly. The formal classification runs on the backend, but an 80 percent score from the rudimentary algorithm I've implemented on the RPi typically indicates the image is of sufficient quality, so it is a good heuristic for when to capture. The resulting image then needs to be sent to the database: once our web application is hosted, I'll register the IP addresses of both RPis so that the application accepts images from them. The user will then have the option to accept the image, or reject it if it is not of sufficient quality. I will implement these steps as soon as the new cameras come in (probably next week).
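A minimal sketch of this capture-and-forward logic, with the on-device classifier and the upload endpoint left as labeled placeholders (the real URL is assigned once the web application is hosted):

```python
# Hedged sketch: poll the camera, and once the on-RPi classifier reports
# >= 80% confidence, forward that frame to the database endpoint.
import cv2
import requests

CONFIDENCE_THRESHOLD = 0.80   # confident classification doubles as an image-quality check
UPLOAD_URL = "http://<web-app-host>/api/images/"   # placeholder until the app is hosted

def capture_and_forward(classify):
    """`classify` is a placeholder for the on-RPi model: frame -> (label, confidence)."""
    cap = cv2.VideoCapture(0)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                continue
            label, confidence = classify(frame)
            if confidence >= CONFIDENCE_THRESHOLD:
                _, jpeg = cv2.imencode(".jpg", frame)
                # The web app then lets the user accept or reject this image.
                requests.post(UPLOAD_URL,
                              files={"image": ("capture.jpg", jpeg.tobytes(), "image/jpeg")},
                              data={"label": label, "confidence": str(confidence)})
                return label, confidence
    finally:
        cap.release()
```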

Verification on the hardware side consists of evaluating classification accuracy and latency. There are inherent design tradeoffs among algorithm complexity, hardware limitations, and the time scale on which we want the product to operate.

This also entails evaluating system performance under various environmental conditions. Namely, we plan to conduct tests under different lighting conditions, angles, and distances to understand whether the algorithm maintains consistent accuracy and latency across different scenarios.

We are targeting an 80 percent confidence metric on the hardware side, given that the device just needs to take a clear enough picture and forward it to the database. Verification will therefore entail checking that classification accuracy meets or exceeds this threshold while maintaining respectable latency. Finally, it is important to balance quantitative verification with qualitative verification, so we will revisit these ideas with Prof. Savvides and Neha (our assigned TA) to build out a thorough verification of system performance on the hardware end.
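One way to automate the environmental tests described above is a small harness that runs the classifier over a labeled test set organized by condition and reports per-condition accuracy and mean latency. This is a sketch under an assumed directory layout (test_images/&lt;condition&gt;/&lt;true_label&gt;/*.jpg) with a placeholder `classify` function, not our finalized procedure:

```python
# Hypothetical verification harness: per-condition accuracy and latency.
import time
from collections import defaultdict
from pathlib import Path

import cv2

def evaluate(classify, test_root="test_images"):
    stats = defaultdict(lambda: {"correct": 0, "total": 0, "latency": 0.0})
    for path in Path(test_root).rglob("*.jpg"):
        condition, true_label = path.parts[-3], path.parts[-2]
        frame = cv2.imread(str(path))
        start = time.perf_counter()
        predicted, _ = classify(frame)             # placeholder classifier
        stats[condition]["latency"] += time.perf_counter() - start
        stats[condition]["total"] += 1
        stats[condition]["correct"] += int(predicted == true_label)
    for condition, s in sorted(stats.items()):
        print(f"{condition}: accuracy {s['correct'] / s['total']:.1%}, "
              f"mean latency {s['latency'] / s['total'] * 1000:.0f} ms")
```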

Our progress is coming along well on all fronts, and I am excited to continue improving the performance of our design.

Steven Zeng Status Report 04/06/24

This week my focus was on the demo and on getting my designated components working properly. Thank you to our peers and professors for the feedback and for making sure the interim demos went well.

I did a lot of work integrating the various ML features into the web application, which required working closely with Grace to set up the Django templates and functions. I integrated Tesseract OCR into views.py to output the caloric information on uploaded images; this also involved using the Pillow library to operate on images. To boost accuracy, I created functions that preprocess the uploaded image into grayscale and enlarged versions of the original, which greatly improved accuracy when we conducted tests. The next step is to incorporate the Raspberry Pi camera captures into the web application's OCR capabilities. Furthermore, I made progress with Grace in developing a model to represent inventory entries stored in the SQL database; we were able to display all uploaded items along with their caloric values. The next step is to create individual inventories for each logged-in user. I also plan to look into PaddleOCR as an alternative to Tesseract to potentially improve text-extraction accuracy.
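As an illustration of that preprocessing step, the sketch below (with hypothetical function names; pytesseract and Pillow assumed available) grayscales and enlarges an uploaded image before handing it to Tesseract:

```python
# Sketch of the preprocess-then-OCR step used on uploaded nutrition labels.
from PIL import Image
import pytesseract

def extract_label_text(image_file):
    """Grayscale and enlarge an uploaded image, then run Tesseract OCR."""
    img = Image.open(image_file)
    img = img.convert("L")                              # grayscale improves OCR contrast
    img = img.resize((img.width * 2, img.height * 2),   # enlarging helps on small label text
                     Image.LANCZOS)
    return pytesseract.image_to_string(img)
```

In the Django view, a function like this would run on the uploaded file before the calorie value is parsed out of the returned text.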

Furthermore, I worked alongside Surya to make sure the classification algorithm worked properly on the website. We used a GoogLeNet model alongside Keras and OpenCV. However, I plan to experiment with ResNet-18 to improve the classification results; this was suggested by Professor Marios, and I researched it thoroughly this week after the demo. The GoogLeNet model has so many categories that it led to a lot of misclassification, whereas ResNet-18 allows more flexibility, which would greatly benefit our product's efficiency and accuracy.
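As a starting point for that experiment, torchvision ships a pretrained ResNet-18 whose final layer can be swapped for a head over our own categories. This is an exploratory sketch that assumes we use PyTorch for the ResNet-18 trial (our current pipeline is Keras/OpenCV), with the four categories taken from our test plan:

```python
# Exploratory ResNet-18 fine-tuning setup (torchvision; assumptions noted above).
import torch.nn as nn
import torch.optim as optim
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False                     # freeze the pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 4)       # strawberries, bananas, oranges, canned

optimizer = optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```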

The next thing I worked on after the demo was the GPT-4 Vision API, which allows image inputs. This could greatly improve the accuracy of both our classification and OCR components, and it would simplify the code base tremendously. I did a lot of troubleshooting to get the syntax working, and I plan to evaluate how good the outputs are once everything runs according to plan.
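For reference, a sketch of the kind of call involved, using the openai Python client; the model name and prompt here are assumptions to be tuned once we can evaluate output quality:

```python
# Sketch of a GPT-4 Vision request on a captured image (openai>=1.0 client).
import base64
from openai import OpenAI

client = OpenAI()   # reads OPENAI_API_KEY from the environment

def query_vision(image_path):
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",               # assumed model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Identify the food item, or extract the calorie count "
                         "if this is a nutrition label."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        max_tokens=100,
    )
    return response.choices[0].message.content
```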

Lastly, I looked into cloud deployment using Nginx, which works well with the RPi; this is the next step in our project. Next week, I plan to work more on integrating and developing all the remaining ML functionality and database operations to display on the website. Likewise, I will work with everyone on getting the RPi, camera, and scale working as we integrate these components into the physical box and the web application.

Regarding the question we have to answer: I will need to verify various components of the ML design and integration. The first is classification accuracy, which must exceed the 90% specified in our design report. The algorithm will need to correctly classify between fruits and canned foods, as well as among three fruits (strawberries, bananas, and oranges). The comprehensive test will run over a compiled dataset of around 250 stock and self-taken images; these tests will take place locally and be automated in a Google Colab notebook. Likewise, the image text recognition needs to be close to 100 percent accurate in extracting the calorie amount from the nutrition label. These tests of OCR performance will consist of 200 nutrition labels of all types, angles, and colors, also run locally for automation and easier analysis. They will reflect my contribution to the project, as I took the lead on the ML design portion from the start. Finally, there will be latency tests against a threshold of under 5 seconds, based on research on user attention span; these likewise match the design requirements specified in our design report.
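A sketch of how the automated Colab check could look for the classification target; the CSV layout, column names, and `classify` function are assumptions for illustration:

```python
# Automated pass/fail accuracy check against the 90% design-report target.
import csv

def run_accuracy_test(classify, labels_csv="labels.csv", target=0.90):
    correct = total = 0
    with open(labels_csv) as f:
        for row in csv.DictReader(f):           # assumed columns: filename, true_label
            predicted = classify(row["filename"])
            correct += int(predicted == row["true_label"])
            total += 1
    accuracy = correct / total
    print(f"Accuracy: {accuracy:.1%} over {total} images "
          f"({'PASS' if accuracy >= target else 'FAIL'} vs {target:.0%} target)")
    return accuracy
```

The same structure extends to the OCR tests by comparing extracted calorie values against ground truth, and to the latency tests by timing each call.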