Steven Zeng Status Report 04/06/24

This week my focus was on the demo and on getting my designated components working properly. Thank you to the peers and professors for their feedback and for making sure the interim demos went well.

I did a lot of work integrating the various ML features into the web application, which required working closely with Grace to set up the Django templates and functions. I integrated Tesseract OCR into views.py to output the caloric information from uploaded images; this also involved using the Pillow library to operate on the images. To boost accuracy, I created functions that preprocess each uploaded image into gray-scaled and enlarged versions of the original, which greatly improved accuracy in our tests. The next step is to incorporate the Raspberry Pi camera captures into the web application's OCR pipeline. Furthermore, I made progress with Grace on developing a model to represent inventory entries stored in the SQL database. We were able to display all the uploaded items along with their caloric values; the next step is to create an individual inventory for each logged-in user. I also plan to look into PaddleOCR as an alternative to Tesseract to potentially improve text-extraction accuracy.
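As a rough illustration of the preprocessing described above (grayscale plus enlargement before OCR, then a pass over the extracted text to pull out the calorie figure), a minimal sketch might look like the following. The function names, the 2x scale factor, and the regex are my own assumptions for illustration, not our exact implementation:

```python
import re
from PIL import Image

def preprocess_for_ocr(img, scale=2):
    """Grayscale and enlarge an image before handing it to Tesseract."""
    img = img.convert("L")  # grayscale
    return img.resize((img.width * scale, img.height * scale), Image.LANCZOS)

def extract_calories(ocr_text):
    """Pull the first calorie figure out of raw OCR text, if present."""
    match = re.search(r"calories\D*(\d+)", ocr_text, re.IGNORECASE)
    return int(match.group(1)) if match else None

# The preprocessed image would then be fed to pytesseract, e.g.:
#   text = pytesseract.image_to_string(preprocess_for_ocr(Image.open("label.jpg")))
print(extract_calories("Nutrition Facts\nCalories 250\nTotal Fat 8g"))  # 250
```

Enlarging before OCR helps because Tesseract performs noticeably better when the label text is well above its minimum recognizable glyph size.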

Furthermore, I worked alongside Surya to make sure the text classification algorithm worked properly on the website. We used a pretrained GoogLeNet model alongside Keras and OpenCV. However, I plan to experiment with ResNet-18 to improve the classification results. This was suggested by Professor Marios, and I researched it thoroughly this week after the demo. The GoogLeNet model has so many output categories that it led to a lot of misclassification, whereas ResNet-18 allows more flexibility, which would greatly benefit our product's efficiency and accuracy.

The next thing I worked on after the demo was the GPT-4 Vision API, which allows image inputs. This could greatly improve the accuracy of both our classification and OCR components, and it would simplify the code base tremendously. I did a lot of troubleshooting to make sure the request syntax worked, and I plan to evaluate the quality of the outputs once everything runs according to plan.
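Since an API key can't go in a post, here is a minimal sketch of how such an image request is assembled; the base64 data-URL message shape follows OpenAI's chat-completions vision format, while the helper name and prompt are my own placeholders:

```python
import base64
import json

def build_vision_request(image_bytes, prompt):
    """Assemble a chat-completions payload that attaches an image as a
    base64 data URL, per OpenAI's vision message format."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "model": "gpt-4-vision-preview",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        "max_tokens": 300,
    }

payload = build_vision_request(b"\xff\xd8fake-jpeg-bytes",
                               "How many calories per serving?")
print(json.dumps(payload)[:40])
```

The payload would then be sent through the `openai` client's `chat.completions.create`; building it as a plain dict first makes it easy to unit-test the syntax without spending API calls.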

Lastly, I looked into cloud deployment using Nginx, which works well with the Raspberry Pi; this is the next step in our project. Next week, I plan to work on integrating and developing the remaining ML functionality and database operations for display on the website. Likewise, I will work with everyone on getting the Raspberry Pi, camera, and scale working as we integrate these components into the physical box and the web application.

Regarding the question we have to answer, I will need to verify various components of the ML design and integration. The first is classification accuracy, which our design report specifies must exceed 90%. The algorithm must correctly classify between fruits and canned foods, as well as among three fruits (strawberries, bananas, and oranges). The comprehensive test will use a compiled dataset of around 250 stock and self-taken images; these tests will run locally and be automated in a Google Colab notebook. Likewise, the image text recognition needs to be close to 100 percent accurate in extracting the calorie amount from the nutrition label. This will test the OCR algorithm's performance on 200 nutrition labels of varied types, angles, and colors, also run locally for automation and easier analysis. These tests will demonstrate my contribution to the project, as I took the lead on the ML design portion from the start. Finally, there will be latency tests that must meet a threshold of under 5 seconds, based on research on user attention span; these likewise correspond to the requirements specified in our design report.
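The accuracy and latency checks above could be automated with a small harness along these lines; the 90% and 5-second thresholds come from our design report, while the `classify` callable and the toy dataset are placeholders standing in for the real model and the 250-image set:

```python
import time

def evaluate(classify, labeled_images, threshold=0.90, max_latency=5.0):
    """Run a classifier over (image, label) pairs, checking accuracy
    against the 90% requirement and per-image latency against the
    5-second attention-span threshold."""
    correct = 0
    worst_latency = 0.0
    for image, label in labeled_images:
        start = time.perf_counter()
        prediction = classify(image)
        worst_latency = max(worst_latency, time.perf_counter() - start)
        correct += (prediction == label)
    accuracy = correct / len(labeled_images)
    return {
        "accuracy": accuracy,
        "worst_latency_s": worst_latency,
        "meets_accuracy": accuracy >= threshold,
        "meets_latency": worst_latency <= max_latency,
    }

# Placeholder classifier and dataset to show the report shape.
fake_data = [("img_banana", "banana"), ("img_orange", "orange")]
report = evaluate(lambda img: img.split("_")[1], fake_data)
print(report["meets_accuracy"], report["meets_latency"])  # True True
```

The same harness can wrap the OCR check by swapping in the label-extraction function and nutrition-label dataset, so one script covers all three verification requirements in Colab or locally.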
