Samuel’s Status Report – 30 Apr 22

Since I completed quantity detection last week, there was not much left to do on the CV side of the project besides testing and experimenting with model training/data collection, which I did. I also helped out with the installation of the system onto an actual fridge.

During testing, I found some edge cases and potential failure cases, and added more robustness checks. In particular, by using my white background detection, I fixed an issue where the FSM would move into the wrong state if the user tried to remove fruit one by one off the platform, or tried to add more fruit once a prediction was done. The final FSM is shown in the figure below:

I also retrained the model with some new fake fruits, but the validation accuracy was still poor (although training accuracy was good). From the graph shown below, it seemed that the CNN was not learning the underlying features of the fruits, but was instead overfitting to the training data. Most likely, this was the result of not having enough data to learn from for the new classes, thus creating confusion between fruits.

Next week, the focus will be on final testing of the integrated system (although we have actually tested quite a bit already, and the system seems fairly robust), and preparation for the final video and demo. CV-wise, we are definitely ahead of schedule (basically done), since increasing the number of known fruit/vegetable classes was somewhat of a reach goal anyway.

Samuel’s Status Report – 23 Apr 22

This week, we focused on integration and testing; I also made some minor improvements to the CV algorithm and attempted to collect more data for training.

Integration with Jetson

The Jetson was surprisingly annoying and difficult to set up, and I spent at least 10 hours just trying to get my CV code to run properly on it. In particular, installing various dependencies like PyTorch and OpenCV took a long time; we needed to compile many dependencies from source (which came with its own errors) because the Jetson is an ARM aarch64 system, incompatible with the x86_64 architecture that most packages are precompiled for. The various issues were compounded by the fact that the Jetson was slightly old (running an older version of Ubuntu, with low RAM and storage capacity).

Even after I/we* managed to get the code up and running on the Jetson, we had significant problems with its speed. I/we first tried various fixes, including turning off visual displays and killing processes. Eventually, we realized that the bottleneck was … RAM!!!

What we discovered was that the Jetson took 1–2 minutes to make its first prediction, but then ran relatively quickly (~135 ms) after that, compared with my computer, which runs a single prediction in ~30 ms. When Alex was debugging with a display of system resources, we pinpointed the issue to a lack of sufficient RAM when loading and using the model for the first time: the model was simply too big to fit into RAM, and a major bottleneck came from having to move some of that memory into swap. Once that initial paging was complete, the algorithm ran relatively quickly. However, because it still relies on swap accesses instead of the faster RAM, predictions on the Jetson remain slower than on my computer. Nonetheless, it runs fast enough (~135 ms) after this initial “booting” stage, which has now been integrated as part of the “loading” phase in my CV code.
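
To give a flavor of how that warm-up works, here is a minimal Python sketch (the file name and input size are placeholder assumptions, not our actual code): run one throwaway prediction at load time so the expensive first inference, with all its swap paging, is over before the user’s first scan.

```python
import torch

# Load the serialized classifier; file name and input size are placeholders.
model = torch.jit.load("fruit_classifier.pt")
model.eval()

# One throwaway forward pass so the slow first prediction (RAM-to-swap
# paging while the weights settle) happens during loading, not on the
# user's first scan.
with torch.no_grad():
    model(torch.zeros(1, 3, 224, 224))
```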

*While I was in charge of/did most of the debugging, my teammates were also instrumental in helping me get it up and running (it is, after all, Alex’s Jetson), so credit should be given where it is due 🙂

CV Training

While trying to fix/install dependencies on the Jetson, I had also been collecting more data in parallel with the new fake fruits that came in, including the addition of a new “Lemon” class. However, our model could not converge properly. I believe this was because some of the fake fruits/vegetables were not very high quality and looked fairly different from the ones in the original dataset (and in real life), like the peach and pear, so the model performed poorly when validated against our original test images. Next week, I aim to try training on only the fake fruits/vegetables that look realistic enough (like the apple, lemons and eggplant). That being said, the algorithm already performs very well on some of the semi-realistic fake fruit, like the starfruit and banana shown in Figure 1 below.

During testing, I was pleasantly surprised by the neural network’s ability to detect multiple fruits despite being a single-label classifier: the fruits present simply surface as the highest output probabilities. As can be seen in Figure 1 below, the fruits being captured are Apple, Banana and Starfruit, which appear as the top 3 probabilities on the screen, as detected by the network.

Figure 1: Multiple Fruits Detection
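
For the curious, the top-3 readout is conceptually just a softmax over the classifier’s outputs followed by a top-k; here is an illustrative sketch with dummy class names and logits (not our actual model outputs):

```python
import torch
import torch.nn.functional as F

# Dummy stand-ins for illustration; in the real app, `logits` comes from
# the trained network and `class_names` from the dataset.
class_names = ["Apple", "Banana", "Starfruit", "Lemon", "Eggplant"]
logits = torch.tensor([[3.1, 2.7, 2.5, 0.2, -1.0]])

probs = F.softmax(logits, dim=1)
top_probs, top_idx = probs.topk(3, dim=1)  # keep the 3 most likely classes
for p, i in zip(top_probs[0], top_idx[0]):
    print(f"{class_names[i.item()]}: {p.item():.2f}")
```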

Minor Improvements – White Background Detection

After spray-painting the platform with Alex, we now had a good white background to work with. Using this, I was able to write a simple (and efficient) piece of code that detects whether the background is mostly white using the HSL image representation, checking how many pixels have lightness above a certain threshold.
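
Here is a minimal sketch of that kind of check (OpenCV calls this color space HLS; the thresholds below are illustrative guesses rather than our tuned values):

```python
import cv2
import numpy as np

def is_mostly_white(frame_bgr, light_thresh=200, frac_thresh=0.9):
    """Return True if most pixels are bright enough to be the white platform.

    Thresholds here are illustrative guesses, not the project's tuned values.
    """
    hls = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HLS)
    lightness = hls[:, :, 1]  # L channel: 0 (black) .. 255 (white)
    white_frac = np.mean(lightness > light_thresh)
    return white_frac > frac_thresh
```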

Since my algorithm currently uses changes in motion (i.e., pixel changes between frames) to switch between states (background, prediction, waiting for the user to take their fruit off) in my internal FSM, this white background detection adds an important level of robustness against unforeseen changes, like an accidental hand swipe, lighting changes or extreme jerks to the camera. Without it, the CV system might accidentally end up in a state it is not supposed to be in, such as awaiting fruit removal when there is no fruit there, and confuse/frustrate the user.
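
To make this concrete, here is a rough Python paraphrase of the FSM logic; the state names and exact transition conditions are my simplification, not the actual implementation:

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()           # empty white platform
    PREDICT = auto()        # fruit detected, run the classifier
    AWAIT_REMOVAL = auto()  # prediction done, wait for fruit removal

def step(state, motion_detected, background_is_white):
    # Guard: if the platform reads as plain white, force back to IDLE
    # regardless of spurious motion (hand swipes, lighting changes, jolts).
    if background_is_white:
        return State.IDLE
    if state == State.IDLE and motion_detected:
        return State.PREDICT
    if state == State.PREDICT:
        return State.AWAIT_REMOVAL
    return state
```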

Future Work

We are currently far ahead of schedule in terms of what we originally wanted to do (robust CV algorithm, fruit + vegetable detection), but there are a few things left to do/try:

  1. Quantity detection: This can be done by using white background segmentation (since I already have a basic algorithm for that) plus floodfill to get a rough count of the number of fruits on the platform (see the sketch after this list). Right now, our algorithm is robust to multiple fruits, and there is already an API interface for quantity.
  2. Adding more classes/training: As mentioned above, I could try retraining the model on new classes using the fake fruits/vegetables + perhaps some actual ones from the supermarket. Sadly, my real bell peppers and apple are already inedible at this point 🙁
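
For item 1, here is a hedged sketch of the counting idea, using OpenCV’s connected-components pass (which flood-fills each blob internally); all thresholds are illustrative:

```python
import cv2
import numpy as np

def count_fruits(frame_bgr, light_thresh=200, min_area=500):
    """Rough fruit count: segment non-white regions, then count blobs.

    A sketch only; thresholds and morphology settings are illustrative.
    """
    hls = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HLS)
    # Foreground = anything that is not the bright white platform.
    fg = (hls[:, :, 1] < light_thresh).astype(np.uint8)
    # Clean up small specks before counting.
    fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    n, _, stats, _ = cv2.connectedComponentsWithStats(fg)
    # Label 0 is the background; ignore tiny components.
    return sum(1 for i in range(1, n) if stats[i, cv2.CC_STAT_AREA] >= min_area)
```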

Samuel’s Status Report – 16 Apr 22

This week, I mostly helped Alex build the platform for the camera setup, and then attempted some basic tests on it. I am also currently training a model based on the data collected in previous weeks (and some from this week, after getting the platform up).


Figure 1: Example of Fully-Constructed Platform

Figure 2: Example of self-collected data (here, of yellow squash) taken on the new platform


More notably, I have been trying to improve the robustness of the change detector, since I predict issues with the CV algorithm falsely detecting a change due to illumination shifts from the fridge door opening, a person walking around, etc.

To this end, I tried division normalization, where we divide the image by a heavily blurred version of itself, as per https://stackoverflow.com/a/64357685/3141253.
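
For reference, a minimal sketch of that technique (the Gaussian sigma is an illustrative guess):

```python
import cv2

def division_normalize(gray):
    # Divide the image by a heavily blurred copy of itself; smooth
    # illumination gradients cancel out, leaving mostly local structure.
    blurred = cv2.GaussianBlur(gray, (0, 0), sigmaX=33)
    return cv2.divide(gray, blurred, scale=255)
```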

While this made the algorithm very robust against illumination, it also reduced sensitivity significantly. One particularly problematic case occurred because my skin color was similar to the birch wood color, so my hand was “merged” into the background. A red apple, however, produced a better response, though not always enough to trigger a change. With this in mind, we plan to paint the platform white, in the hope that this creates a larger differential between natural colors, like those of skin and fruit, and the artificially white background.

Another alternative is to play around with the saturation channel of the HSV representation. Or, since the main failure mode is a false change detection that moves the FSM into the background state, we could check whether a real extra object is present using a Harris corner detector.
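
If we go down that route, a rough sketch might look like the following; the response and count thresholds are illustrative guesses:

```python
import cv2
import numpy as np

def has_object_corners(gray, corner_thresh=0.01, min_corners=20):
    """Heuristic: a flat white platform should yield few strong corners,
    while a real object should yield many. Thresholds are illustrative."""
    response = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
    strong = response > corner_thresh * max(response.max(), 1e-6)
    return strong.sum() >= min_corners
```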

Next week, in addition to trying the aforementioned items, I will also be helping to integrate the CV algorithm on the Jetson. We are currently on schedule but might need to speed up the integration component so we can spend more time on robustness testing.

Samuel’s Status Report – 26 Mar 22

During last week’s status report, I mentioned that we needed to find a dataset which exposed the network to a wider variety of fruit.

This week was a relatively productive one: I managed to train the network on the “silvertray” dataset (https://www.kaggle.com/datasets/chrisfilo/fruit-recognition), which produced relatively robust results on test data the network had never seen before (a green label indicates accurate detection; here we had 100% accuracy on test data).

Of course, the test data also involved the same silver trays that the algorithm trained on, so a high accuracy is expected.

I then moved on to making it perform detection on real-world data, in our C++ application with our webcam, and these are the results!

As visible in the images above, the NN is able to detect the fruits accurately in a real-world situation (including a noisy, non-white background WITHOUT segmentation applied). That being said, there are some inaccuracies/misdetections, such as with the orange above, despite the frames being very similar. I describe the needed improvements below.

With this, we are currently on track towards a working prototype, although we could probably train the network to handle more classes with self-collected or web-procured data.

Next week, we will begin integration of the various components together, and I will work on several improvements to the current CV algorithm/setup:

  1. Include a “change detection” algorithm that will detect when significant changes appear; this will allow us to tell when a fruit needs to be scanned (see the sketch after this list).
  2. Normalization of the image before processing; this will help reduce issues with random lighting changes, but might require that the network be retrained.
  3. Build the actual rig with a white background and test the algorithm on that
  4. If necessary, change to using a silver tray or silver-colored background similar to the network’s training set, and/or collect our own datasets.
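
For item 1, change detection could be as simple as frame differencing; here is an illustrative sketch (the thresholds are guesses, not tuned values):

```python
import cv2

def significant_change(prev_gray, curr_gray, pix_thresh=25, frac_thresh=0.02):
    # Flag a change when enough pixels differ noticeably between frames.
    diff = cv2.absdiff(prev_gray, curr_gray)
    changed_frac = (diff > pix_thresh).mean()
    return changed_frac > frac_thresh
```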

Samuel’s Status Report – 19 Mar 22

This week, I focused heavily on getting the neural network to work properly. In the beginning of the week, I successfully trained the neural network on the new ResNet18 architecture (as opposed to the old one that did not work). After I realized that it didn’t work as well as expected on real data, I swapped to the more advanced ResNet50 architecture, but that did not seem to help either.

It was then that I began to suspect something else was wrong besides the network itself, because the networks kept reporting 90+% validation accuracy, yet the predictions were wrong whenever I tested the code, even on training images. This hinted at a problem with my testing code/script. Eventually, I realized that during the network training process we were passing in normalized images, and the network was training on those; once I changed my test/evaluation script to feed normalized images into the network, everything worked very well!
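
The takeaway: the evaluation script must apply exactly the same preprocessing as training. A sketch of what that shared transform might look like (the mean/std here are the standard ImageNet statistics, an assumption rather than our exact values):

```python
from torchvision import transforms

# The same pipeline must be applied at training AND evaluation time.
normalize = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# At evaluation time, feed the network the same normalized view it saw
# during training:
#   input_tensor = normalize(pil_image).unsqueeze(0)
#   logits = model(input_tensor)
```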

However, as I began testing the network on various images, we realized that the network was not very robust on external data:

After scrutinizing the dataset, we realized that it was not good enough, and had some major flaws that made it susceptible to overfitting. Firstly, it consisted of 360-degree shots of a single fruit per category, so even though there were many images of fruit, the network was fed only one physical example per fruit category, making it hard for the network to generalize based on colour, shape, etc.

To resolve this problem, I would need to search for more datasets, parse them, and train our network on them. This will be my focus for next week. Currently, I have found several datasets; however, they each have their own issues. The most promising one I have found so far is very similar to our use-case, with images of fruits taken from a top-down view, but has a reflective silver tray background which is very hard to segment away. Some pictures also have groups of fruit:

I will first try training the network on center-cropped and resized images, and if that does not work, I will try algorithms like Otsu thresholding on the saturation channel, or GrabCut, to segment away the background.
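
A quick sketch of the Otsu variant (fruit tends to be saturated, while a grey/silver tray is not, so thresholding the saturation channel should separate them; this is an illustration, not tested on the dataset yet):

```python
import cv2

def segment_by_saturation(frame_bgr):
    # Otsu's method picks a threshold on the saturation channel
    # automatically, splitting saturated fruit from the grey tray.
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    sat = hsv[:, :, 1]
    _, mask = cv2.threshold(sat, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return cv2.bitwise_and(frame_bgr, frame_bgr, mask=mask)
```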

Samuel’s Status Report – 5 Mar 2022

This week, we spent most of our time writing up our design review report. Although a lot of the main content had been covered in our design review presentation, the devil was in the details for this report. In particular, we needed to make our block diagrams a lot more detailed since the ones we used in the slides were merely summaries.

We were very thankful that the deadline for the report got extended as that reduced the amount of stress we had, and allowed us to write a more polished report.

On the implementation side, I am slightly “behind schedule” in the sense that I was not able to get as much work done on the new neural network implementation as I had hoped, because I was focusing on the report instead. However, we are still ahead of schedule overall, since we already have an implementation going.

Next week, I will focus on implementing and training the ResNet18 network, and then testing the accuracy on a self-collected dataset of various fruits.

Samuel’s Status Report – 26 Feb 2022

The highlight of this week for me was delivering the Design Review presentation. I think I did quite well, with my teammates and professor commenting that the presentation was polished, with good graphics and content. We are currently working on the design report, and will use the content of the design presentation to write it.

On the technical side, I made significant progress on the PyTorch port of the CV application, which is coded in C++ so that it can run optimized on the Jetson for maximum speed. The C++ application is now able to run the trained PyTorch model and spit out predictions for the top 5 highest-probability classes.

There were several challenges with the porting process, including the lack of good PyTorch documentation, which made it very difficult to figure out how to properly convert between formats in C++ (cv2 to torch Tensor, for example), as well as important considerations in ensuring that the model can be serialized properly for running in C++ (in particular, no nested functions, etc.). This was a lesson on the importance of good documentation, and the pain of having to pore through various forums and articles as a result.
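
For illustration, here is roughly what the Python-side serialization can look like (the model, file name and input size are placeholders, not our actual setup); tracing produces a TorchScript module free of Python-only constructs like nested functions, which the C++ side can then load with torch::jit::load:

```python
import torch
from torchvision import models

# Placeholder model for illustration; the real model/weights differ.
model = models.resnet18(num_classes=5)
model.eval()

# Trace with an example input to produce a serializable TorchScript module.
example = torch.zeros(1, 3, 224, 224)
traced = torch.jit.trace(model, example)
traced.save("fruit_classifier.pt")  # loadable from C++ via torch::jit::load
```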

However, after training and testing the network, I began to realize big problems with the trained model. Most notably, the model failed to produce correct predictions. After consulting with Prof Mario’s PhD student, we realized that we were using a highly customized model that was not designed properly and was not even a proper “ResNet” (it lacked fully residual layers). To this end, he advised us to use preexisting models like ResNet18 or AlexNet. This was a lesson in not blindly copying code over from the internet.

Next week, I will focus on training either ResNet18 or AlexNet on our data, as well as testing it in the new C++ classifier. (There is also a Python one for quick testing, in case the C++ one still has bugs.) Hopefully I will be able to train a network that achieves our desired accuracy of 85% (the network itself should reach about 95% validation accuracy).

Fortunately, despite this setback, we are currently still on schedule, because we were previously ahead of schedule with the successful porting of the C++ application.

Samuel’s Status Report – 19 Feb 22

This week, we worked on the design review slides, and as part of the process, we finalized our designs for the attachment system, CV algorithms, UI interface and backend. Notably, I contributed a new scanner system design, suggesting that the camera scan the platform from overhead, as opposed to the front-facing camera originally designed. This will allow for a more intuitive and less intrusive scanning process.

As the one in charge of the CV algorithm, I also wrapped up research on the various algorithms to use for classification. In particular, I decided to go with a ResNet-based CNN instead of traditional SURF/SIFT methods because of its better accuracy and performance. I modified the code of this tutorial to train a classifier, and was able to successfully train a model that achieved 98% accuracy after 10 epochs.
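
As a hedged sketch of that kind of fine-tuning (the class count, hyperparameters and the dummy data below are placeholders, not our actual settings):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

num_classes = 131  # e.g., the Fruits360 label set
model = models.resnet18(pretrained=True)
# Swap the final layer for one matching our number of classes.
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Dummy data for illustration; in practice this is a DataLoader over the
# fruit image dataset.
train_loader = DataLoader(
    TensorDataset(torch.randn(8, 3, 224, 224),
                  torch.randint(0, num_classes, (8,))),
    batch_size=4)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

for epoch in range(10):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```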

However, it remains to be seen if the classifier will work well with validation data (i.e., checking for overfitting), and especially whether it will work with real-world data (our actual setup). Next week, I will be working on the C++ PyTorch code to run the trained network, meant for optimized running on the Jetson. I will also begin working on a basic webcam setup (the webcam just arrived this Wednesday!) and collect real-world images that I can use for testing.

Samuel’s Status Report – 12 Feb 22

This week, we finalized the idea for our project, and successfully ironed out some issues.

Notably, I am in charge of the CV system; we were able to find a dataset of many fruits and vegetables (the Fruits360 dataset) which we could possibly use to train our CNN classifier. It is a fairly extensive dataset, with 90,483 images and 131 classes of fruits and vegetables. Following a TA’s suggestion, we were also able to find a ResNet-based CNN classifier which we could potentially use for our project, and which I am currently trying to implement.
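
Since Fruits360 uses the standard one-folder-per-class layout, loading it should be straightforward; a sketch (the path is a placeholder):

```python
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((100, 100)),  # Fruits360 images are 100x100
    transforms.ToTensor(),
])
# One subfolder per class, e.g. fruits-360/Training/Apple Braeburn/...
dataset = datasets.ImageFolder("fruits-360/Training", transform=transform)
print(len(dataset), "images,", len(dataset.classes), "classes")
```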

However, Prof Mario observed that the dataset images have a plain white background, which implied that a classifier trained on these images might not be able to detect fruits against an arbitrary background. With this in mind, I came up with the idea of using a platform with a white screen that fruit can be put on, allowing us not only to easily detect and segment the fruit from the (white) background, but also to use the extensive dataset.

I am also fairly proud of my contribution to the “Use Case Requirements” part of the proposal presentation, where we considered our product’s speed, accuracy and cost metrics from the perspective of tangible monetary cost.