Steven Zeng Status Report 04/20/24

I spent some time this week on the presentation and preparing what I am going to say. This included creating the demo video, which involved recordings of both the website interface with its ML capabilities and the RPi image retrieval system. Likewise, I had to compile all the testing results and graph them to present in the slideshow. This consisted of gathering the data I had collected previously and learning how to plot it using MATLAB.

In addition to the presentation, I focused much of my work on getting ResNet-18 to work and on simplifying its classification algorithm to output one of four results: orange, banana, strawberry, or canned food. Due to time constraints, this implementation was done solely on the web application rather than tested locally. I did some preliminary testing, and it worked well enough to include in the results section. However, I need to focus more on making the image sent from the RPi clear enough to classify. This includes implementing background reduction as well as an emphasis on the physical design to incorporate sufficient lighting to illuminate the object properly.
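As a rough sketch of that four-way classification in PyTorch: the class list below matches our categories, but the preprocessing constants and the assumption that fine-tuned weights are already loaded are illustrative rather than our exact code.

```python
import torch
import torch.nn as nn
from PIL import Image
from torchvision import models, transforms

CLASSES = ["orange", "banana", "strawberry", "canned food"]

# Start from an ImageNet-pretrained ResNet-18 and swap the final fully
# connected layer for a 4-way head over our categories.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(CLASSES))
model.eval()  # inference mode; fine-tuned weights would be loaded here

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # standard ImageNet
                         std=[0.229, 0.224, 0.225]),   # normalization
])

def classify(image_path: str) -> str:
    """Return one of the four category names for an input image."""
    img = Image.open(image_path).convert("RGB")
    batch = preprocess(img).unsqueeze(0)  # add a batch dimension
    with torch.no_grad():
        logits = model(batch)
    return CLASSES[int(logits.argmax(dim=1))]
```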

Lastly, I worked with Grace on setting up a SQL database that ensures safe access to and editing of data. This is necessary to ensure that the RPi data can be sent and stored for later access by the website. To do this, we also did a lot of work on cloud deployment using Nginx, which came with various complications. In the end, we were able to deploy it successfully, and now we just need to test retrieval and processing through the cloud.
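For reference, a minimal sketch of the Django-side MySQL configuration; the database name, credentials, and host below are placeholders rather than our deployment values.

```python
# settings.py (sketch) -- point Django's ORM at the shared MySQL database.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "fridge_db",        # placeholder database name
        "USER": "app_user",         # placeholder credentials
        "PASSWORD": "change-me",
        "HOST": "127.0.0.1",        # the cloud DB host in deployment
        "PORT": "3306",
        "OPTIONS": {
            # Fail fast on bad data instead of silently truncating it.
            "init_command": "SET sql_mode='STRICT_TRANS_TABLES'",
        },
    }
}
```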

Next week, I plan to work closely with Surya on the communication between the RPi and the website. Likewise, I will need to help with the placement of lights in the box to illuminate the image properly and prevent reflections and graininess. Then, I will tweak the ML classification and text recognition algorithms based on the original accuracy results. Lastly, I will work with Grace to ensure proper MySQL database access so that data is forwarded and processed according to our specifications.

Grace Liu’s Status Report for April 6th, 2024

This week was decently busy as my group members and I prepared for the interim demo. While not all of the components are properly connected together at this moment, we made good progress on each of the subcomponents. I also made some additional progress based on feedback and takeaways from the ethics discussion class since the last time we presented during our team meeting.

While merging components of the web application, including Surya's scale readings, Steven's ML classification, and my frontend additions for the user, there were conflicting changes that prevented pulls from our respective branches. Additionally, authentication issues came up with certain additions, so I had to trace back to the most recent Git commit that did not have the problem. This cost a lot of the time set aside for working on the functionalities, but the additional time we were kindly granted to figure out the authentication problems allowed me to finish most of the components we planned to have on our website.

To elaborate on the additions, I will include screenshots and explanations of pages included in our demo below:

Our WebSocket library and server have already been set up, but latency issues still persist, so during the validation phase I will look more into content delivery networks or choosing a different message format to minimize additional network traffic. For now, the page is able to display chat messages that are updated in the UI, but more testing during the deployment stage will have to be done to ensure users are receiving their respective messages.
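As a sketch of the chat flow, assuming Django Channels as the WebSocket library (the consumer, route, and payload names here are illustrative):

```python
# chat/consumers.py (sketch)
import json

from channels.generic.websocket import AsyncWebsocketConsumer

class ChatConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        # One group per room so messages fan out only to its members.
        self.room = self.scope["url_route"]["kwargs"]["room_name"]
        await self.channel_layer.group_add(self.room, self.channel_name)
        await self.accept()

    async def disconnect(self, code):
        await self.channel_layer.group_discard(self.room, self.channel_name)

    async def receive(self, text_data):
        message = json.loads(text_data)["message"]
        # Broadcast to the room; "chat.message" dispatches to chat_message.
        await self.channel_layer.group_send(
            self.room, {"type": "chat.message", "message": message}
        )

    async def chat_message(self, event):
        await self.send(text_data=json.dumps({"message": event["message"]}))
```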

The calorie consumption progress page below is something we considered when brainstorming how to improve the user experience and provide more fitness encouragement. By displaying a line graph of all of the user's daily tracking, they are able to see their overall improvement while using our application. I ran into issues with rendering the inputted data and getting the graph to refresh with each additional update, and I implemented enough testing to ensure efficient data synchronization between the client and the server.
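A minimal sketch of the server side of that graph: a Django view that rolls the logged-in user's entries up into daily totals for the chart to render. The CalorieEntry model and its field names are illustrative.

```python
# views.py (sketch)
from django.db.models import Sum
from django.db.models.functions import TruncDate
from django.http import JsonResponse

from .models import CalorieEntry  # illustrative model

def calorie_progress(request):
    daily = (
        CalorieEntry.objects.filter(user=request.user)
        .annotate(day=TruncDate("created_at"))
        .values("day")
        .annotate(total=Sum("calories"))
        .order_by("day")
    )
    # The frontend line chart consumes this as parallel label/value arrays.
    return JsonResponse({
        "labels": [d["day"].isoformat() for d in daily],
        "values": [d["total"] for d in daily],
    })
```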

This page is more a proof of concept, since there is still more work to be done connecting the classification and scale readings to the cloud database. Essentially, values input for a certain day are retrieved and displayed in the rendered chart below. I was having trouble saving values from previous days, but that is something the end product would include to show progress, perhaps with a diagram that directly compares day-to-day changes.

Lastly, also as a proof of concept and connected to the data retrieved from the calorie progress tracking page, users can set a calorie limit and see whether or not they have exceeded it on a daily basis. This is achieved through a simple if-else check in the HTML template (sketched below), where red means they have exceeded their set calorie limit and green means they are within it.
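A sketch of that template check, with illustrative variable and CSS class names:

```html
{# red when the day's total exceeds the limit, green otherwise #}
{% for day in daily_totals %}
  <tr class="{% if day.total > calorie_limit %}exceeded{% else %}within{% endif %}">
    <td>{{ day.date }}</td>
    <td>{{ day.total }} / {{ calorie_limit }} kcal</td>
  </tr>
{% endfor %}
```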

To answer this week's additional prompt: more testing has to be done on the individual components I am working on as well as on the API integrations. For instance, unit testing will be done to ensure the backend rejects invalid data such as negative calorie entries or values that exceed a set limit. CRUD operations will need to be tested to ensure they work correctly and that the changed data is stored correctly. This includes deleting chats between users, deleting posts or comments under posts, and deleting caloric entries on the different pages that include these functionalities. Beyond the proof of concept, we will also support deleting entries that have been added to the total daily consumption.
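A sketch of what those unit tests could look like with Django's test client; the endpoints and the entry model are illustrative stand-ins for our actual routes:

```python
# tests.py (sketch)
from django.contrib.auth.models import User
from django.test import TestCase

class CalorieEntryTests(TestCase):
    def setUp(self):
        self.user = User.objects.create_user("alice", password="pw")
        self.client.force_login(self.user)

    def test_rejects_negative_calories(self):
        # The backend should refuse invalid data outright.
        resp = self.client.post("/entries/add/", {"calories": -100})
        self.assertEqual(resp.status_code, 400)

    def test_delete_entry(self):
        # Round-trip a valid entry, then make sure deletion sticks.
        resp = self.client.post("/entries/add/", {"calories": 250})
        self.assertEqual(resp.status_code, 200)
        entry_id = resp.json()["id"]
        resp = self.client.post(f"/entries/{entry_id}/delete/")
        self.assertEqual(resp.status_code, 200)
```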

Some basic testing for user security would include compatibility across different browsers to ensure consistent behavior, usability testing to gather navigation and general user-interface feedback, and more security tests. Previously, I conducted tests for SQL injection and cross-site scripting attacks. Since we use OAuth, we must verify proper authentication, meaning only CMU users are able to use this application upon signing in. Another consideration is file-upload testing: users upload profile pictures and (for now) pictures for posts that will eventually come from the camera scanning, so these uploads will have to be tested to ensure malicious files cannot compromise the server.
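For the file-upload concern, a minimal sketch of server-side validation with a Django form; the size cap and accepted types are illustrative policy choices:

```python
# forms.py (sketch)
from django import forms

MAX_UPLOAD_BYTES = 5 * 1024 * 1024           # 5 MB cap, illustrative
ALLOWED_TYPES = {"image/jpeg", "image/png"}  # reject everything else

class ImageUploadForm(forms.Form):
    # ImageField already rejects files Pillow cannot parse as images.
    picture = forms.ImageField()

    def clean_picture(self):
        pic = self.cleaned_data["picture"]
        if pic.size > MAX_UPLOAD_BYTES:
            raise forms.ValidationError("Image exceeds the 5 MB limit.")
        if pic.content_type not in ALLOWED_TYPES:
            raise forms.ValidationError("Only JPEG and PNG are accepted.")
        return pic
```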

Considering that most of our subsystems have not been connected together yet, I plan to finish all my testing to prevent future complications with my specific subsystem. Specific requirements that must be met include proper updating and including as many prevention measures as possible for the most common security vulnerabilities. I look forward to further discussion with my group members and to identifying important validation points for merging the subcomponents.

Team Status Report 04/06/24

To start the week off, we made good progress in showcasing our project's individual components (RPi integration, Arducam, scale, physical box design, website ML functionality, and SQL database access and retrieval) for the interim demo. These were all shown separately during the demo.

In the context of validation, the evaluation is more system-centric and requires us to analyze how well the final product matches our problem statement and proposed approach. Some of the system-wide checks we plan to consider include the following:

    1. Image forwarding: How well does the RPi forward information to and from the MySQL database? With a 30-second overall latency requirement, image capture and forwarding should be done in a timely manner (which also means the first-level classification performed on the RPi should be light and efficient, given that an exact match is not required at that stage of the design).
    2. Concurrency: Principles of ACID and two-phase locking, implemented in the views of the web application, are important to ensure safe database access and retrieval. From our coursework, we have learned that the database is the most vulnerable component with regard to system failure. Beyond data sanitization and validation (which are separate concerns), we have to regulate access to elements in the database that are constantly being deleted and re-inserted. Particular edge cases include handling duplicates (eating the same thing multiple times in a day) and ensuring the DB generates new entries for each object; a transaction sketch follows this list.
    3. Tesseract OCR: Accuracy issues persist with text recognition in relation to image quality, so we have to perform more testing under different conditions for real-time processing scenarios. A more diverse set of images can be used and calibrated to see how text is best extracted: grayscale conversion, image cropping, font size, and background all matter. By systematically testing Tesseract's performance on these varied images, we aim to find the settings that yield the most accurate text extraction for reading food labels, enhancing the precision of our text recognition for this real-time application.
    4. Web App User Interaction: How can we ensure all of the user features work as expected? Performance testing will assess the web application's responsiveness and scalability; bottlenecks such as slow queries and inadequate indexing become even more important to identify once data is forwarded from the camera and scale readings. Usability, in terms of navigation and the transition between scanning and using the application, also matters for the user's ease of use when moving between the different functionality tabs.
    5. ML and Camera Integration: How accurate and fast will ML classification on the RPi with the ResNet-18 model be? We will test this using a test dataset as well as the RPi's timing functions. Likewise, we will compare this with the classification results from the website backend to determine which method is better and more efficient. Furthermore, we need to test image quality using either a USB camera or an Arducam to determine the frame rate that achieves maximum classification accuracy.
    6. Proper Database Retrieval and Storage: How will data (usernames, caloric intake, inventory items) be stored properly in the database once scanned, and how can we properly display it in the frontend? We have pages that graph caloric intake as well as display it at a daily granularity in the form of a list. We need to make sure items and information are stored under the logged-in user to prevent security issues. Likewise, we need to make sure users can only modify the database by scanning and/or removing objects directly from the inventory list. We will do extensive testing to prevent SQL injection and cross-site scripting attacks, and we will use the browser console to detect irregularities and potential security concerns via error and DB-retrieval statements.
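To make the concurrency point above concrete, here is a minimal sketch of guarding inventory edits with a transaction and row locks in a Django view; InventoryItem and CalorieEntry are illustrative model names, not our exact schema:

```python
# views.py (sketch)
from django.db import transaction

from .models import CalorieEntry, InventoryItem  # illustrative models

def consume_item(user, item_id):
    """Atomically remove an item from the inventory and log the calories."""
    with transaction.atomic():
        # select_for_update() locks the row so two concurrent requests
        # cannot both consume (delete) the same inventory item.
        item = (InventoryItem.objects
                .select_for_update()
                .get(pk=item_id, owner=user))
        # A fresh CalorieEntry row per consumption means eating the same
        # food twice in a day yields two distinct entries.
        CalorieEntry.objects.create(user=user, name=item.name,
                                    calories=item.calories)
        item.delete()
```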

Summary of individual work
Surya continued his work on RPi and camera integration. Despite challenges with operating the RPi in a headless setup, including slow SSH connections and lag in camera performance, he successfully implemented a rudimentary algorithm that captures images at 80 percent classification confidence. His next steps involve improving camera performance with a headed monitor setup and integrating image capture into the database, along with exploring options for classification performance. He thanks the 18500 teaching staff for their feedback on the hardware behind this project during the interim demo, with specific regard to camera performance and options.

Grace added more functional pages to the web application in accordance with feedback and to improve the general user experience. These pages include, but are not limited to, a chat function between users to discuss fitness-related goals, a calorie consumption progress graph demonstrating users' daily improvement, adding caloric entries by date and displaying them by day (eventually connected to the hardware classification components), and a page allowing users to set a daily calorie limit to measure necessary improvements. She plans to continue working on the frontend, ensuring the pages match how users can use the application efficiently and beneficially, and testing each page with user modifications or uploads that could be vulnerable to security attacks.

Steven continued working on his ML features and integrating them into the web application with Grace. Tesseract OCR components were added to views.py to extract food-label text from users' uploaded images. Additional work will be needed to improve accuracy, but images were converted to grayscale and enlarged relative to the originals, and during individual testing these improvements enhanced the text-extraction results. The next step is to incorporate Surya's camera subcomponent into capturing these OCR readings and to fully integrate it into a web application page. Steven also added a new model to store database entries where users input food items and their respective caloric values. Further improvements will add individual food inventories for the user currently logged into the application.

April 6th Status Report — Surya Chandramouleeswaran

This was a busy yet exciting week for our group in preparation for the interim demonstration. I continued work on the hardware and classification components of our project with the goal of shifting to integration in the coming weeks.

Operating the RPi in a "headless manner" continued to present challenges. Interfacing directly with the board required an SSH connection and a remote viewer (typically RealVNC), which could be quite slow at times. As such, observing camera performance through the SSH connection resulted in significant frame lag and limited resolution. My goal for the coming weeks is to improve our camera performance through a headed monitor setup and to try a USB camera as opposed to a third-party vendor's module (Arducam).

Dependencies associated with classification hardware:

Resulting camera’s first image!:


To elaborate, the plan is to automatically "capture" an image once the algorithm is 80 percent (or higher) confident that it has classified the object correctly. The formal classification operates on the backend, but an 80 percent benchmark from this rudimentary algorithm I've implemented on the RPi typically indicates the image is of sufficient quality, so it is a good heuristic for image capture. The resulting image then needs to be sent to the database. Once our web application is hosted, I'll add the IP addresses of both RPis to the database to allow it to accept images from them. The user will then have the option to accept the image or reject it if it is not of sufficient quality. I will implement these steps as soon as the new cameras come in (probably next week).
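A rough sketch of that capture loop as it might run on the RPi; classify() stands in for the rudimentary on-board model, and the upload endpoint is a placeholder:

```python
# capture.py (sketch)
import cv2
import requests

CONFIDENCE_THRESHOLD = 0.80
UPLOAD_URL = "https://example.com/api/images/"  # placeholder endpoint

def classify(frame):
    """Stand-in for the rudimentary on-RPi classifier, which returns a
    (label, confidence) pair for the current frame."""
    return "unknown", 0.0

def capture_when_confident(max_frames=300):
    cap = cv2.VideoCapture(0)  # default camera device
    try:
        for _ in range(max_frames):
            ok, frame = cap.read()
            if not ok:
                continue
            label, conf = classify(frame)
            if conf >= CONFIDENCE_THRESHOLD:
                # The formal classification happens on the backend; this
                # check only gates that the frame is clear enough to send.
                _, jpeg = cv2.imencode(".jpg", frame)
                requests.post(UPLOAD_URL, files={
                    "image": ("capture.jpg", jpeg.tobytes(), "image/jpeg"),
                })
                return label
    finally:
        cap.release()
    return None
```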

Verification in the context of hardware consists of evaluating classification accuracy and latency. There are inherent design tradeoffs between algorithm complexity, hardware limitations, and the overall usage time scale we want the product to operate within.

This also entails evaluating system performance under various environmental conditions. Namely, we plan to conduct tests under different lighting conditions, angles, and distances to understand whether the algorithm maintains consistent accuracy and latency across different scenarios.

We are looking for an 80 percent confidence metric on the hardware side, given that it just needs to take a clear enough picture and forward it to the database. Verification will therefore entail checking that classification accuracy meets or exceeds this threshold while maintaining respectable latency. Finally, it is important to balance quantitative verification with qualitative verification, so we will rehash these ideas once more with Prof. Savvides and Neha (our assigned TA) to build a thorough verification of system performance on the hardware end.

Our progress is coming in well on all fronts and I am excited to continue improving the performance of our design.

Steven Zeng Status Report 04/06/24

This week my focus was on the demo and working towards proper functionality of my designated components. Thank you to the peers and professors for the feedback and for making sure the interim demos went well.

I did a lot of work integrating the various ML features into the web application. This required working closely with Grace to set up the Django templates and functions. I was able to integrate Tesseract OCR into views.py to output the caloric information on uploaded images. This also involved using the Pillow library to operate on images. Likewise, to boost accuracy, I created functions that process the uploaded image into grayscale, enlarged versions of the original. This greatly improved accuracy when we conducted tests. The next step is to incorporate the Raspberry Pi camera captures into the web application's OCR capabilities. Furthermore, I made progress with Grace on developing a model to represent inventory entries stored in the SQL database. We were able to display all the uploaded items as well as their caloric values. The next step is to create individual inventories for each logged-in user. I also plan to look into PaddleOCR as an alternative to Tesseract to potentially improve text-extraction accuracy.
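A minimal sketch of that preprocessing pipeline with Pillow and pytesseract; the 2x enlargement factor is illustrative rather than a tuned value:

```python
# ocr.py (sketch)
import pytesseract
from PIL import Image, ImageOps

def extract_label_text(image_path: str, scale: int = 2) -> str:
    img = Image.open(image_path)
    # Drop color: Tesseract does better on high-contrast grayscale input.
    img = ImageOps.grayscale(img)
    # Enlarge so small nutrition-label fonts clear Tesseract's size floor.
    img = img.resize((img.width * scale, img.height * scale),
                     Image.Resampling.LANCZOS)
    return pytesseract.image_to_string(img)
```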

Furthermore, I worked alongside Surya to make sure the classification algorithm worked properly on the website. We used a GoogLeNet training model alongside Keras and OpenCV. However, I plan to experiment with ResNet-18 to improve the classification results. This was suggested by Professor Marios, and I researched it thoroughly this week after the demo. The GoogLeNet model had so many categories that it led to a lot of misclassification, while the ResNet-18 model allows more flexibility, which would greatly benefit our product's efficiency and accuracy.

The next thing I worked on after the demo involved the GPT-4 Vision API, which allows image inputs. This would greatly improve the accuracy of our classification and OCR pipelines, and it would simplify the code base tremendously. I did a lot of troubleshooting to make sure the syntax worked, and I plan to evaluate the quality of the outputs once everything runs according to plan.
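A sketch of the image-input call I was troubleshooting, using the openai Python client; the model name, prompt, and token limit are illustrative:

```python
# vision.py (sketch)
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def describe_food(image_path: str) -> str:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    # Vision-capable chat models accept images as base64 data URLs.
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Identify the food item or read its nutrition label."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        max_tokens=300,
    )
    return response.choices[0].message.content
```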

Lastly, I looked into cloud deployment using Nginx, which works well with the RPi. This is the next step in our project. Next week, I plan to work more on integrating and developing all the remaining ML functionality and database operations to display on the website. Likewise, I will work with everyone on getting the RPi, camera, and scale to work as we integrate these components into the physical box and the web application.

Regarding the question we have to answer: I will need to verify various components of the ML design and integration. The first is classification accuracy, which needs to exceed the 90% specified in our design report. The algorithm will need to correctly classify between fruits and canned foods as well as among three fruits (strawberries, bananas, and oranges). The comprehensive test will cover a compiled dataset of around 250 stock and self-taken images; these tests will run locally and be automated in a Google Colab file. Likewise, image text recognition needs to be close to 100 percent to correctly extract the calorie amount from the nutrition label, which will test the OCR algorithm's performance. Those tests will consist of 200 nutrition labels of all types and angles with various colors, also run locally for automation and easier analysis. They will show my contribution to the project, as I took the lead on the ML design portion from the start. Likewise, latency tests will need to meet a threshold of under 5 seconds, based on research on user attention span, in line with the design requirements specified in our design report.
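A sketch of how those accuracy and latency tests could be automated; the labeled-folder layout and the classify() import are hypothetical:

```python
# evaluate.py (sketch)
import time
from pathlib import Path

from classifier import classify  # hypothetical wrapper around the model

DATASET = Path("test_images")  # layout: test_images/<true_label>/*.jpg

def evaluate():
    correct, total, latencies = 0, 0, []
    for label_dir in DATASET.iterdir():
        for img in label_dir.glob("*.jpg"):
            start = time.perf_counter()
            predicted = classify(str(img))
            latencies.append(time.perf_counter() - start)
            correct += predicted == label_dir.name
            total += 1
    print(f"accuracy: {correct / total:.1%} (target: >90%)")
    print(f"mean latency: {sum(latencies) / len(latencies):.2f}s (target: <5s)")

if __name__ == "__main__":
    evaluate()
```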

Grace Liu’s Status Report for March 30th, 2024

This week, our efforts were focused on fully integrating one main subsystem for the interim demo happening next week. Since our main goal is to integrate correct ML food classification into the website, I focused on our shift towards TensorFlow for OCR, which takes a different approach from the traditional OCR libraries and the ChatGPT API we were previously using. This approach had more of the configuration issues that Steven and Surya encountered, but it would ultimately be more beneficial for classification and detection in our MVP. The connection between their work and my role is to experiment with how the TensorFlow model could be integrated into our web application as a proof of concept.

After the model selection, I set up a Python environment along with the necessary libraries for our web server to best suit our needs. I created the user interface where files are uploaded to the server from the camera, but more work needs to be done on image compatibility, resizing, and resolution to maximize the user experience on the globals page. The good thing about the TensorFlow Python model is that it can extract text from a user-scanned image and convert it into a readable format. This text is saved in the format we want and will be displayed under posts once that functionality works; it should also be saved into the database for the food inventory. Further testing of the OCR process will be needed to improve performance and scalability and to ensure it works as expected under various conditions, including errors and edge cases.

In addition to the TensorFlow integration, I tackled some challenges with GitHub version control. We encountered merge conflicts involving branch pulling and pushing that temporarily hindered quick progress. After going through Git's version-control features and each of our branches' changes, we resolved the merge conflicts efficiently.

Looking ahead, despite the various challenges we encountered, I will focus on refining our TensorFlow integration into the web application and tinkering with the other features we planned in light of the earlier in-class ethics discussion. The TensorFlow integration will enhance our functionality and enable more advanced, higher-performance image analysis.

Team Status Report — March 30

We shifted our whole focus to the demo and to getting our individual components to work. A big discovery across our components was the number of underlying issues that surfaced during integration: all of our earlier testing had been local and independent, so we ran into a lot of complications.

A few complications arose in our ML design. The first was the difficulty of using the ChatGPT API to take in and process images; it was slow, and inputting an image efficiently was hard. Furthermore, our local tests of some of the ML classification schemes were difficult to integrate into the Django web app. As a result, we had to shift course and adjust on the fly. This included using some of our backup plans, such as pre-trained datasets and classification APIs. The big issue with these was configuration: we spent hours making sure everything was installed at the right versions. Lastly, we decided to change our fruit test data to oranges, bananas, and strawberries instead of apples. We hope this change allows us to move on from classification for the moment and shift our focus towards label reading and OCR design as well as hardware integration.

Surya made major progress integrating the Raspberry Pi and configuring it with our internet; getting it onto CMU WiFi remains the primary issue right now. However, he was able to set it up with the SSH server and download the required packages for the video stream. We shifted towards doing a lot of computation on the Raspberry Pi itself to add to our design trade studies, and we hope to show all of this in the demo to showcase our experimentation process. Surya also did a lot of work configuring MacBook settings to run our website with all the required packages. There were many hardware issues he had to resolve just to get the website running. Ultimately, the website ran successfully on his computer and classified images to an acceptable accuracy.

Lastly, Grace created a new Gantt chart reflecting the schedule changes that had to be made after several technical challenges were encountered during our testing process. While there were unexpected delays with the OCR libraries and hardware configurations, we remained on track relative to our original project schedule thanks to the allocated slack time and our ability to adjust to such changes. Since we added features and ideas throughout the capstone process, including from the in-class ethics discussion, some slack was allocated towards those features, and some extra time was spent handling unanticipated technical issues. Ultimately, while the schedule changes were necessary, they ended up contributing a lot to our ability to work together as teammates and to adapt our project framework to necessary changes.

We hope to have a productive demo next week and to take in everyone's feedback to get closer to completing our final project. Likewise, we will start drafting the final report.

Steven Zeng Status Report 03/30/24

With the upcoming demo deadline, I shifted gears towards integrating all the ML components into the website. A lot of my focus was on integrating the ChatGPT API into the website for simple functionality. There were several synchronization issues and syntax bugs that I spent a while debugging. The goal was to allow label reading of an uploaded image, so there was less stress on database access; it was primarily just ML functionality. We ran into an error involving the inability to upload images to the API without understanding complex transformations. As a result, we ultimately decided to scrap this idea after trying it out.

This led me to experiment more with the Keras library and TensorFlow. The code involved a lot of particular configuration, which took a lot of time; this included Python compatibility, virtual environment issues, and various hardware protections we needed to overcome. We ultimately overcame these issues and have a working classification algorithm on the website that classifies between the various fruits. The algorithm works very well at classifying bananas and oranges, and it also provides other classification results. To handle canned food, we classify an item as canned food if it is not a fruit. We still need to look more into error edge cases to notify the user.
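A minimal sketch of that inference path with Keras, including the not-a-fruit-means-canned-food fallback; the weights file and output label order are illustrative:

```python
# classify.py (sketch)
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

FRUITS = {"orange", "banana", "strawberry"}
LABELS = ["orange", "banana", "strawberry", "other"]  # model output order

model = load_model("fruit_classifier.h5")  # placeholder weights file

def classify(img_path: str) -> str:
    img = image.load_img(img_path, target_size=(224, 224))
    batch = np.expand_dims(image.img_to_array(img) / 255.0, axis=0)
    predicted = LABELS[int(np.argmax(model.predict(batch)))]
    # Fallback rule from above: anything not recognized as one of our
    # fruits is treated as canned food.
    return predicted if predicted in FRUITS else "canned food"
```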

Lastly, our focus shifted towards OCR and label reading now that the ChatGPT API is not an option. The discussion involved deciding between doing it on the Raspberry Pi or on the website, and Surya and I experimented with both this week. We were able to use Tesseract to read nutrition labels after uploading an image; however, we are currently in the process of testing it on the hardware instead.

Overall, I made good progress towards practical implementation and the demo. I need to record everything I tested that ultimately did not work when integrating with our website. However, I was satisfied with our ability as a team to adjust on the fly and with the many backup options that allowed us to make significant progress despite some complications.

March 30th Status Report — Surya Chandramouleeswaran

With the interim demo coming up in a few days, we spent significant time this week working together as a group, ensuring we have a functional subsystem to demonstrate in the coming days.

The primary feature of our demo is showing our algorithms at work, hosted on a local website. An interesting complication we have run into, however, is the incompatibility of certain backend packages we are using, both within the Django framework and on our physical computers. Because the M1 MacBook chip has a different instruction set and architecture than a standard Intel chip, some of the Python packages cause segmentation faults and contain instructions that cannot be executed on an M1 CPU. The fix, unfortunately, involves painstakingly resolving each dependency by version matching and reading documentation on compatibility. Another important factor is learning how the DLL (dynamically linked library) layer translates back and forth between Python and the low-level language the computer interprets to execute an instruction. There are compatibility considerations at this stage as well; this is something we are all working to fix together in advance of our presentation. An example crash report can be found below:

Regarding RPi integration: we tried a "headless" setup, a workaround that involves writing the WiFi parameters under a customized OS setting and performing the rest of the setup through SSH. After some unfamiliar debugging, I was able to get this to work on my local WiFi settings. I will need to ensure this works on CMU WiFi by registering the device on the CMU-DEVICE network.
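For reference, one common headless recipe (on Raspberry Pi OS releases that still read this file) is to drop a wpa_supplicant.conf into the boot partition next to an empty file named ssh; the network values below are placeholders:

```
# /boot/wpa_supplicant.conf (sketch) -- read on first boot to join WiFi;
# an empty file named "ssh" alongside it enables the SSH server.
country=US
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1

network={
    ssid="MyHomeNetwork"
    psk="placeholder-password"
}
```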

Our goal is to have a video stream present on the webpage, with the separate stages of the project implemented as separate pages in the web application. Our Gantt chart and workflow have shifted around when the hardware comes in and the subsequent integration; I would say we are working more together on each of the subsystem components to ensure that we meet MVP. We plan to focus on hardware integration once the software is fully complete (by our interim demo, more or less).