Grace Liu’s Status Report for March 30th, 2024

This week, our efforts were focused on fully integrating one main subsystem for next week's interim demo. Since our main goal is to integrate correct ML-based food classification into the website, I focused on our shift toward TensorFlow for OCR, which takes a different approach from the ChatGPT API we were previously using. Steven and Surya ran into far more configuration issues with this approach, but it should ultimately be more beneficial for classification and detection in our MVP. My role connects to their work by experimenting with how the TensorFlow model could be integrated into our web application as a proof of concept.

After the model selection, I set up a Python environment, along with the necessary libraries, for a web server suited to our needs. I created the user interface where files from the camera are uploaded to the server, but more work remains on image compatibility, resizing, and resolution considerations to maximize the user experience on the globals page. The advantage of the TensorFlow Python model is that it can extract text from the user-scanned image and convert it into a readable format. That text is saved in the format we want and will be displayed under posts once that functionality works; it should also be saved into the database for the food inventory. Further testing of the OCR process is needed to improve performance and scalability and to ensure it behaves as expected under various conditions, including errors and edge cases.
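The save-to-database step can be sketched with Python's built-in sqlite3 module. The table and column names here are placeholders for illustration, not our actual schema, which Django manages through models:

```python
import sqlite3

# Hypothetical sketch: persist one OCR-extracted item into a
# food-inventory table so it can later appear under posts.
def save_inventory_item(conn, name, calories, raw_text):
    conn.execute(
        """CREATE TABLE IF NOT EXISTS food_inventory (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               name TEXT NOT NULL,
               calories INTEGER,
               raw_ocr_text TEXT)"""
    )
    cur = conn.execute(
        "INSERT INTO food_inventory (name, calories, raw_ocr_text) "
        "VALUES (?, ?, ?)",
        (name, calories, raw_text),
    )
    conn.commit()
    return cur.lastrowid  # row id of the newly saved item

conn = sqlite3.connect(":memory:")
row_id = save_inventory_item(conn, "canned corn", 60, "Calories 60 per serving")
print(row_id)  # → 1
```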

In addition to the TensorFlow integration, I tackled some challenges with GitHub version control. We encountered merge conflicts while pulling and pushing branches that temporarily slowed our progress. After working through Git's version-control features and reviewing each of our branch changes, we resolved the conflicts efficiently.

Looking ahead, despite the various challenges we encountered, I will focus on refining the TensorFlow integration in the web application and iterating on the other features we planned to prioritize in light of the earlier in-class ethics discussion. The TensorFlow integration will enhance our functionality and enable more advanced, higher-performance image analysis.

Team Status Report — March 30

We shifted our whole focus to the demo and to getting our individual components working. A big discovery across all our components was the number of underlying issues that surfaced during integration. All our earlier testing had been local and independent, so we ran into many complications.

A few complications arose from our ML design. The first was the difficulty of using the ChatGPT API to take in and process images: it was slow, and inputting an image efficiently was hard. Furthermore, some of our locally tested ML classification schemes were difficult to integrate into the Django web app. As a result, we had to change course and adjust on the fly, falling back on backup plans such as pre-trained datasets and classification APIs. The big issue with these was configuration; we spent hours resolving problems and making sure everything was installed at the right versions. Lastly, we changed our fruit testing data to oranges, bananas, and strawberries instead of apples. We hope this change lets us move on from classification for now and shift our focus toward label reading and OCR design as well as hardware integration.

Surya made major progress integrating the Raspberry Pi and configuring its internet connection. The primary remaining issue is configuring it for CMU WiFi. However, he was able to set it up with an SSH server and download the packages required for the video stream. We shifted toward doing much of the computation on the Raspberry Pi itself, which feeds into our planned trade studies; we hope to show all of this in the demo to showcase our experimentation process. Surya also did substantial work configuring MacBook settings to run our website with all the required packages. He had to resolve many hardware issues just to get the website to run. Ultimately, the website ran successfully on his computer and classified images with acceptable accuracy.

Lastly, Grace created a new Gantt chart reflecting the schedule changes forced by the technical challenges we encountered during testing. While there were unexpected delays with the OCR libraries and hardware configurations, we remained on track relative to our original project schedule thanks to the allocated slack time and our ability to adjust. Since we added features and ideas throughout the capstone process, including some from the in-class ethics discussion, some slack went toward those features and some extra time went to unanticipated technical issues. Ultimately, while the schedule changes were necessary, they contributed a lot to our ability to work together as teammates and to adapt our project framework to change.

We hope to have a productive demo next week and take in the feedback of everyone to get closer to completing our final project. Likewise, we will start drafting the final report.

Steven Zeng Status Report 03/30/24

With the demo deadline approaching, I shifted gears toward integrating all the ML components into the website. Much of my focus was on integrating the ChatGPT API into the website for basic functionality. There were several synchronization issues and syntax bugs that I spent a while debugging. The goal was to enable label reading of an uploaded image, so there was less emphasis on database access; it was primarily ML functionality. We ran into an error where we could not upload images to the API without applying complex transformations we did not fully understand. As a result, after trying it out, we ultimately decided to scrap this idea.

This led me to experiment more with the Keras library and TensorFlow. The code involved many particular configurations, which took a lot of time, including Python compatibility issues, virtual environment issues, and various hardware protections we needed to work around. We ultimately overcame these and now have a working classification algorithm on the website that distinguishes between the various fruits. The algorithm classified bananas and oranges very well, and it also provides other classification results. To handle canned food, we classify an item as canned food whenever it is not recognized as a fruit. We still need to look into error edge cases so we can notify the user.
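The fallback rule described above — treat anything that is not a recognized fruit as canned food — can be sketched as a small post-processing step. The class names and the confidence floor here are assumptions for illustration:

```python
# Decision rule applied after the classifier returns per-class scores.
FRUIT_CLASSES = {"banana", "orange", "strawberry"}
MIN_CONFIDENCE = 0.5  # below this, flag the edge case to the user

def interpret_prediction(scores):
    """scores: dict mapping class label -> model confidence."""
    label, confidence = max(scores.items(), key=lambda kv: kv[1])
    if confidence < MIN_CONFIDENCE:
        return "unknown"       # edge case: notify the user instead of guessing
    if label in FRUIT_CLASSES:
        return label
    return "canned food"       # non-fruit fallback

print(interpret_prediction({"banana": 0.91, "orange": 0.06, "tin_can": 0.03}))
# → banana
```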

Lastly, with the ChatGPT API no longer an option, our focus shifted to OCR and label reading. We discussed whether to run it on the Raspberry Pi or on the website, and Surya and I experimented with both this week. We were able to use Tesseract to read nutrition labels after an image is uploaded, and we are currently testing it on the hardware as well.
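Once Tesseract returns raw text, a post-processing step has to pull out the fields we care about. A minimal sketch of that parsing stage follows; the field patterns are simplified assumptions, since real label text is much messier:

```python
import re

# Extract a few nutrition fields from raw OCR text.
def parse_nutrition_text(text):
    fields = {}
    patterns = {
        "calories":  r"calories\D*(\d+)",
        "protein_g": r"protein\D*(\d+)",
        "sodium_mg": r"sodium\D*(\d+)",
    }
    for key, pat in patterns.items():
        m = re.search(pat, text, re.IGNORECASE)
        if m:
            fields[key] = int(m.group(1))
    return fields

sample = "Nutrition Facts\nCalories 240\nSodium 430mg\nProtein 8g"
print(parse_nutrition_text(sample))
# → {'calories': 240, 'protein_g': 8, 'sodium_mg': 430}
```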

Overall, I made good progress towards practical implementation and the demo. I need to record everything I tested that ultimately did not work when it came to integrating with our website. However, I was satisfied with our ability as a team to adjust on the fly and have many backup options that allow us to make significant progress despite some complications.

March 30th Status Report — Surya Chandramouleeswaran

With the interim demo coming up in a few days, we spent significant time this week working together as a group and ensuring we have a functional subsystem to demonstrate for the coming days.

The primary feature of our demo is showing our algorithms at work, hosted on a local website. An interesting complication we have run into, however, is the incompatibility of certain backend packages we are using, both within the Django framework and with our physical computers. Because the M1 MacBook chip uses the ARM instruction set rather than the x86 architecture of a standard Intel chip, some Python packages with precompiled native code cause segmentation faults or contain instructions that cannot execute on an M1 CPU. The fix, unfortunately, involves painstakingly resolving each dependency by version matching and reading compatibility documentation. Another important factor is understanding how the DLL (Dynamic Link Library) translates back and forth between Python and the low-level instructions the machine executes; there are compatibility considerations at this stage as well, and this is something we are all working to fix together in advance of our presentation. An example crash report can be found below:

Regarding RPi integration: we tried a “headless” setup, a workaround that involves writing the WiFi parameters into a customized OS configuration and performing the rest of the setup over SSH. After some unfamiliar debugging, I got this working on my local WiFi. I still need to make sure it works on campus by registering the device on the CMU-DEVICE network.
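For readers unfamiliar with the headless approach, it amounts to placing two files on the SD card's boot partition before first boot: an empty file named `ssh` to enable the SSH server, and a `wpa_supplicant.conf` holding the WiFi parameters. A minimal sketch, with the SSID and password as placeholders:

```
# /boot/wpa_supplicant.conf  (copied onto the boot partition)
country=US
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1

network={
    ssid="HomeNetwork"
    psk="password-goes-here"
}
```

On first boot the Pi joins the network, and the rest of the setup proceeds over `ssh pi@<address>`.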

Our goal is to have a video stream on the webpage, with the separate stages of the project implemented as separate pages of the web application. Our Gantt chart and workflow have shifted around the hardware's arrival and subsequent integration; I would say we are now working more together on each of the subsystem components to ensure that we meet MVP. We plan to focus on hardware integration once the software is essentially complete (by our interim demo, more or less).

Grace Liu’s Status Report for March 23rd, 2024

This week, I made significant progress on frontend design and usability. I used Bootstrap for formatted input and output components, which is ideal for responsive design, alongside custom CSS for the layouts in our web application that require more flexibility. I also worked on securing database access by sanitizing user inputs and outputs to prevent SQL injection and executable JavaScript attacks. It is important to validate and catch potentially malicious user input, and to encode output so that malicious data supplied by users cannot trigger questionable behavior in the web browser. This was motivated primarily by the ethics discussions, in which privacy was of utmost importance to our team; we now appreciate how big a responsibility it is for our capstone project.
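The output-encoding idea can be illustrated in a few lines. Django's template engine escapes by default, so this is just the principle rather than our actual rendering code:

```python
import html

# Escape user-supplied text before rendering it, so injected markup
# displays as literal text instead of executing in the browser.
def render_post(caption):
    return "<p>{}</p>".format(html.escape(caption))

print(render_post('Tasty! <script>alert("x")</script>'))
# → <p>Tasty! &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;</p>
```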

I also looked into WebSockets and improving their speed, efficiency, and syncing functionality. There was a big issue with a loop that caused extremely slow database access times. Returning to the earlier discussion, security concerns with this API include cross-site scripting, cross-site request forgery, and injection attacks, and input validation is a huge part of preventing these types of attacks. I defined data types for the expected input structure so there are constraints on user input, for instance on the messages users post to our globals page. I also added a lot of data to the database to simulate real-world use, including the images of all types that I gathered and gave Steven last week for his ML training dataset. This amounted to roughly a couple thousand data points and values for the modeling.
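The input-structure constraints can be sketched as a small validation step. The field names and limits here are placeholders for whatever the real globals-page form uses:

```python
# Validate a globals-page post before it touches the database.
MAX_CAPTION_LEN = 280
ALLOWED_IMAGE_TYPES = {"image/jpeg", "image/png"}

def validate_post(data):
    """Return a list of validation errors; an empty list means the post is OK."""
    errors = []
    caption = data.get("caption")
    if not isinstance(caption, str) or not caption.strip():
        errors.append("caption must be a non-empty string")
    elif len(caption) > MAX_CAPTION_LEN:
        errors.append("caption too long")
    if data.get("image_type") not in ALLOWED_IMAGE_TYPES:
        errors.append("unsupported image type")
    return errors

print(validate_post({"caption": "Lunch!", "image_type": "image/png"}))
# → []
```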

Likewise, I worked with Steven on integrating the ChatGPT API into our web framework. We made a dummy prototype page to test whether we could interface with it using basic inputs and outputs while also considering the user experience and interface design. There is now reasonably smooth communication between users and ChatGPT in our framework, and we are working toward more engaging interactions. The next step is enabling ChatGPT to recognize and analyze images, which we will experiment with next week. With our sights set on this experimentation, we believe it propels our project toward even more functionality, and we are excited to see how it pans out during our interim demo.

Team Status Report 03/23/24

As a team, we enjoyed the focus on ethics this week. We enjoyed the pod discussions and began to consider things that we would have never thought of. A big issue we identified with our product involved both body image concerns as well as privacy issues. We want our product to promote a healthy lifestyle, but we do not want our users to develop eating disorders and other personal issues. Likewise, we do not want private data on the consumption habits of our users to be leaked to other users as well as companies who participate in targeted advertising. As a team, we were able to discuss fixes to these issues including database security design and user-friendly/encouraging dialogue on the front end. Despite being done with discussions on ethics, we plan to keep all these principles in mind as we further our design.

In the pod discussion, we found it interesting that two other groups had products with functionality similar to ours. While one group plans to project ingredients and recipes onto a tabletop using a projector at a calculated angle, the other is using AI to generate recipes in a phone app from ingredients added to a food inventory. We were able to clarify that the MVP for our system will only classify canned foods and certain types of fruit, as opposed to cooked foods or ingredients in a bowl. The biggest difference is that our product is a calorie-tracking device focused on physical wellness, so more ethical concerns arise from that focal point. This will definitely be a greater consideration of ours while working on user interface integration and the user experience of our web application.

The parts for scale integration have steadily come in through the ECE delivery center, so Surya began working to understand the hardware layout of the scale and to assess the two main approaches to writing scale measurements to the database. He is waiting on the RPis to assess the camera approach for reading scale values. Another option would be to start with the Arduino chip that already arrived, but Arduinos are typically poor choices for image processing because of limited on-board RAM and limited camera support (RPis are fantastic for such applications, with many online resources for support). Additionally, he plans to work with Steven and Grace on sculpting a presentation strategy for the rapidly approaching interim demo.

 

In the meantime, he has also learned more about how the load cell amplifier works and about the wiring topology. When working on a functional scale, it is important to ensure the correct wires are snipped and soldered; a wiring schematic is included below for the reader's convenience:

Load cell wiring, Wheatstone bridge formation
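For background, one common labeling of the Wheatstone bridge gives the output voltage below: at rest the bridge is balanced and the output is near zero, and strain-induced resistance changes in the load cells produce the small signal the amplifier boosts. This is the textbook relation, not a measurement from our scale:

```latex
V_{\text{out}} = V_{\text{in}}\left(\frac{R_3}{R_3 + R_4} - \frac{R_2}{R_1 + R_2}\right)
% With all four arms nominally equal (R_i = R), V_out = 0 at rest;
% a strain-induced change in the sensing arms yields a millivolt-scale
% V_out, which is why an amplifier such as the HX711 is needed.
```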

 

Steven did a lot of work patching up the ML infrastructure of the project, optimizing the accuracy of its various components. The first optimization was to the classification algorithm for canned foods versus fruit, using the AdaBoost algorithm to combine multiple decision boundaries. The second involved classification within the fruit groups using k-nearest neighbors, combined with GoogLeNet and OpenCV to produce better results specific to our project. Lastly, while the ChatGPT API does not need optimization, Steven worked on integrating it into the front end. He plans to work alongside Grace in the upcoming weeks to test the basic functionality and syntax of the API for label reading and, if needed, classification.

Steven Zeng Status Report 03/23/24

This week I stayed on schedule and achieved results to analyze. First, I want to discuss the work on implementing a k-nearest neighbors algorithm. The highest-accuracy run turned out to be k = 5 with 500 training samples. The image below shows an example of the classification accuracy and results from our first tests using k = 3 and 100 training samples.

However, I was able to boost accuracy by introducing more samples and tuning k to 5. The resulting graph is below:

The accuracy, in combination with the GoogLeNet algorithm, was sufficient to produce results satisfying the ideal confusion matrix discussed in our design report. The next issue to patch is latency: this approach took a relatively long time when I ran it locally on my computer. I hope to remove redundancies in the code to speed up the process. It is a positive sign that the accuracy results were sufficient; now I need to focus on computational efficiency and look into optimizations that exploit matrix properties to speed up the algorithm.
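The k-NN step itself is simple enough to sketch in pure Python. The real pipeline runs on image feature vectors and far more samples, so the data here is made up for illustration:

```python
from collections import Counter

# Classify a query point by majority vote among its k nearest neighbors.
def knn_predict(train, query, k=5):
    """train: list of (feature_vector, label) pairs; query: feature vector."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1.0, 0.9), "banana"), ((0.9, 1.1), "banana"), ((1.1, 1.0), "banana"),
         ((5.0, 5.2), "orange"), ((5.1, 4.9), "orange")]
print(knn_predict(train, (1.05, 1.0), k=3))
# → banana
```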

 

The next area I worked on was the AdaBoost algorithm. The features I considered were size (total area), color (scaled RGB values), and character count (the number of characters of text on the product). This creates a relatively simple 3-D model. However, I still need to work on extracting these values from images; for the sake of the algorithm, I hard-coded values for various images to test. The algorithm improved accuracy over a single soft-margin SVM decision boundary. This is a good sign, and I hope to see it work on images taken from my MacBook camera, which is the next step. Extracting the features from the images will be the next challenge; I am reading articles on extracting features such as size, color, and character count, and I expect to use a Python library to compute these values.
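A toy version of AdaBoost over one-feature threshold stumps, mirroring the three-feature model above, might look like the following. The feature values and labels are hard-coded for illustration, just as in the local tests; +1 means "fruit" and -1 means "canned":

```python
import math

# Find the weighted-error-minimizing threshold stump over all features.
def train_stump(X, y, w):
    best = None
    for f in range(len(X[0])):
        for thresh in sorted({row[f] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[f] >= thresh else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thresh, sign)
    return best

# Classic AdaBoost loop: reweight samples toward the ones we got wrong.
def adaboost(X, y, rounds=5):
    n = len(X)
    w = [1.0 / n] * n
    stumps = []
    for _ in range(rounds):
        err, f, t, s = train_stump(X, y, w)
        err = max(err, 1e-10)                      # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # stump weight
        stumps.append((alpha, f, t, s))
        preds = [s if row[f] >= t else -s for row in X]
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return stumps

def predict(stumps, row):
    score = sum(a * (s if row[f] >= t else -s) for a, f, t, s in stumps)
    return 1 if score >= 0 else -1

# Features: (area, redness, character count) — made-up values.
X = [(3.0, 0.9, 2), (2.5, 0.8, 1), (3.2, 0.7, 3),    # fruit
     (6.0, 0.2, 40), (5.5, 0.3, 55), (6.5, 0.1, 35)]  # canned
y = [1, 1, 1, -1, -1, -1]
model = adaboost(X, y)
print([predict(model, row) for row in X])
# → [1, 1, 1, -1, -1, -1]
```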

 

The last portion of my work this week involved the ChatGPT API. I researched the pricing model to determine the most efficient plan that minimizes cost. Likewise, I am still working out how to incorporate the API into the website design. I watched several videos, and this one (https://www.youtube.com/watch?v=_gQITRGs4y0) provided especially good guidance for moving forward with the product. I wrote some code changes to the repository but have yet to test them; there are several syntax issues to patch up, but the overall design and structure are already laid out. I hope to test these prompts locally and compute their corresponding accuracy and latency values next week.

Status Report March 23, 2024 – Surya Chandramouleeswaran

With the parts needed for scale integration arriving this week, I had a good opportunity to get my hands dirty with this component of our design. The parts include the Arduino Mega, two ArduCam cameras, a physical scale, the ESP WiFi chip, and HX711 amplifiers. I am still waiting on the RPis to integrate with the ArduCam cameras; to recap from last week, I plan on trying two different implementations and seeing which method works better.

For now, I will emphasize our original implementation plan: remove the casing of the scale, disconnect the load cell wiring from the original Wheatstone arrangement, and run the four load cell wires into the amplifier, which then feeds into the Arduino and subsequently the ESP. From there, the ESP just needs to generate HTTP requests as scripted in the code I wrote last week. One thing I am wary of is damaging sensitive equipment while rewiring the scale. I plan on snipping the wires from the scale's PCB rather than desoldering them, because the last thing I want is frayed copper at the ends of these connections. I hope TechSpark has heat-shrink tubing I can keep around the connections I need to resolder so that the bare ends are protected. If any of these connections are compromised, the scale cannot be used, so that is a consideration I keep in mind while working on this. Here is an example of the material (in black):

Figuring out the wiring between the amplifier and the load cells was also a bit nonintuitive for me. Here is a sample diagram I plan to use to ensure the reconnections go to the right places. To keep things simple, red will match with red and blue with blue (these are the wires that “talk” to each other to aggregate the weight from all four load cells), and I will send the white wires to the amplifier. I adapted this from the load cell documentation:
Load Sensors Wired in Wheatstone Bridge Configuration
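On the software side, once the ESP generates its HTTP request, the web server has to parse and sanity-check the reading before storing it. A hypothetical sketch of that server-side step, where the JSON payload shape is an assumption rather than our final API:

```python
import json

# Parse a scale reading POSTed by the ESP, e.g. body = '{"grams": 453.6}'.
def parse_scale_reading(body):
    """Return grams as a float, or raise ValueError for a bad payload."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        raise ValueError("payload is not valid JSON")
    grams = data.get("grams")
    if not isinstance(grams, (int, float)) or isinstance(grams, bool) or grams < 0:
        raise ValueError("grams must be a non-negative number")
    return float(grams)

print(parse_scale_reading('{"grams": 453.6}'))
# → 453.6
```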

I have slowed down work on our web application; I think it is at a point where all the basic functionality is present, and as a group we will have to build the rest of the application around the hardware we implement. One theme readers may observe across the group this week is a shift toward hardware implementation as the backend frameworks fall into place.

In parallel, I will have conversations with the group to build an idea of what we want to show off at the upcoming interim demo. Overall, we are on track with our schedule, but we recognize that the next few weeks are essential to the completion of this project.

Grace Liu’s Status Report for March 16th, 2024

This week, our group really focused on understanding each component of the ethics assignment and how our project applies to real-world scenarios. Of the two case studies, I thought “Do Artifacts Have Politics?” by Langdon Winner was especially interesting, since it offers a perspective most people wouldn't consider when merely looking at a piece of technology. It is eye-opening to see how something like a bridge design on Long Island carried such symbolic meaning, truly reflecting the creator's political viewpoints and opinions. While this applies to inventions designed to settle societal affairs (keeping the lower class from using recreational resources), another category of inventions comprises those that are part of political relationships. I liked taking a technology that is hot right now and applying these concepts to it, since it helped me recognize the design and ethics components behind each step of the design process. In terms of public health, safety, and welfare, the added perspectives from the case studies will definitely inspire us to pay more attention to these details to ensure a safe and friendly product for our users.

In terms of the web application, we added public posts so users can interact with each other. This was inspired by public-welfare considerations and the negative effects a tracking tool may have on users' mental health and body image. The globals page attempts to promote an inclusive environment that encourages positive self-image, where users can voluntarily post their food consumption and add a caption. While for the MVP this feature is only usable by those sharing the product in the same household, where all the food inventory is gathered, we envision far more potential on a global scale, making our product something more impactful than a food-tracking tool.

Carrying over from last week, while OAuth was set up to work properly, some debugging on the registration page was needed to ensure all the information renders properly on the profile page. One issue that emerged when displaying the uploaded profile picture is that we must consider the file size and format compatibility of the image users choose to upload. A large image could consume too much bandwidth, so we either have to limit file sizes or use a content delivery network to improve the website's performance and speed. The CDN approach can be particularly beneficial since files can be uploaded quickly from anywhere in the world, which matters because we want our application used on a global scale for more user interaction and positivity. I would still like to research this approach a bit more, since it can be more costly than alternatives such as the web server's file system or cloud storage, which would also come at an additional cost.
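The file-size and format checks can be sketched as a small gate in front of the upload handler. The 2 MB limit and the extension list are placeholder values:

```python
# Reject profile-picture uploads that are too large or the wrong format.
MAX_UPLOAD_BYTES = 2 * 1024 * 1024   # placeholder 2 MB cap
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def check_profile_upload(filename, size_bytes):
    ext = ""
    if "." in filename:
        ext = "." + filename.rsplit(".", 1)[-1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return "unsupported file format"
    if size_bytes > MAX_UPLOAD_BYTES:
        return "file too large"
    return "ok"

print(check_profile_upload("avatar.PNG", 150_000))
# → ok
```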

I look forward to seeing more of our project flesh out as Surya gathers the physical components and more progress is made on the ML algorithm with the data Steven and I collected last week. I envision a user-friendly product that will truly be an application beyond mere calorie tracking and a food inventory.

Team Status Report — March 16th

With the interim demo coming up on April 1st, coming out of spring break with a clear plan of attack was important for our group. We spent this week planning the main features we would like to highlight for the interim demo; furthermore, recognizing that the interim demo presents an intentionally incomplete product, we spent some time deciding how the features we demonstrate will interact with one another.

Surya continued work on scale integration and helped Grace tidy up the web application. In addition, he developed potential approaches to scale integration through a block diagram as seen below:

Scale integration can be done either by hacking into the scale and forwarding values through a microcontroller, or by running OCR on the scale's digital reading panel. Evaluating the first option requires a strong understanding of the scale's hardware before it can be opened up. The main aspects he focused on were the difference between strain gauges and load cells, how and why they are configured in a Wheatstone arrangement, and the rewiring necessary to forward the load cell measurement through an amplifier and into a microcontroller such as an Arduino.

Given the delicacy of this process and the surprising nuance of the hardware behind these scales, he discerned that an OCR model reading the scale's segmented display panel may be cleaner to implement for an MVP. This, however, presents its own challenges. Images require more memory than a stream of bits representing scale readings, so latency becomes a pronounced issue. Furthermore, digit-classification accuracy is an additional concern that simply does not arise when forwarding bits between components. The counterargument in favor of this approach is the reduced potential for damage, and since an operational scale is the first priority for the MVP, that is a significant consideration to keep in mind.

In either case, because both options are largely cost-effective, Surya began ordering parts common between both approaches and plans to get his hands dirty with both approaches to see which method works better for the application over the next 2 weeks. He encourages readers to view his weekly status report for more details on the above design plan for scale integration.

Steven made significant progress in testing various ML methodologies. He determined that soft-margin SVMs were not effective enough to include in our implementation. However, the SVMs provided nice decision boundaries that we plan to use for our backup option, the AdaBoost algorithm, which assigns weights to varying decision boundaries so that multiple boundaries are taken into account. He researched the math and coded up the preliminary functions to compute the weights and train the model.

Steven also shifted a lot of focus to working with GoogLeNet and a k-nearest neighbors approach to boost the accuracy of fruit classification. He plans to work on testing and validation next week. We hope to have all of this tested, and the results analyzed, within two weeks. Another goal for next week is to integrate the GoogLeNet classification algorithm directly into the website, without modifications, to test compatibility as well as the preliminary accuracies and latencies.

Regarding integration progress, Steven researched ChatGPT-4. We are currently hesitant to purchase the API in order to save money; however, Steven wrote pseudocode for integrating it into the front end. Likewise, he looked into creating formatted responses given images and formatted inputs. Steven will also begin shifting focus away from local testing with his MacBook camera and work closely with Surya and Grace to take images from the Arduino or RPi and/or from the website.

Grace took this week's ethics assignment and applied it to public safety and welfare considerations. We realized food-tracking applications could induce negative side effects for users with high self-discipline, so an additional globals page could help promote a body-positive environment that shows users what others are up to. A caption can also be added so the web application doubles as a social medium; of course, this is optional, and users can always opt out. With this potentially operating on a global scale, she wants to consider larger options for file uploads. One option, instead of using the web application's file system, is a content delivery network, whose servers are scattered around the world; this would help improve the speed and performance of our web application in the long run.