This week, I focused heavily on getting the neural network to work properly. At the beginning of the week, I successfully trained the network on the new ResNet18 architecture (as opposed to the old one, which did not work). After I realized that it did not work as well as expected on real data, I swapped to the deeper ResNet50 architecture, but that did not seem to help either.
It was then that I began to suspect that something besides the network itself was wrong: the networks kept reporting 90+% validation accuracy, but whenever I tested the code myself, even on training images, the predictions were poor. This hinted at a problem with my testing script. Eventually, I realized that during training we were passing normalized images into the network, so the network had learned on normalized inputs, while my test script was feeding it unnormalized images. Once I changed my test/evaluation script to feed normalized images into the network, everything worked very well!
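The fix can be illustrated with a minimal sketch in which training and evaluation share a single preprocessing function, so the two paths cannot drift apart. This is not our actual script: the function name `preprocess` is hypothetical, and the mean and standard deviation are the ImageNet channel statistics commonly used with ResNet models, which are an assumption here since the statistics used in our training run are not stated above.

```python
import numpy as np

# Assumed values: ImageNet channel statistics often used with ResNet models.
# Replace them with whatever statistics the training pipeline actually used.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(image_uint8):
    """Turn an HxWx3 uint8 RGB image into a normalized 3xHxW float32 array.

    Calling this one function from both the training loader and the
    test/evaluation script keeps the network's inputs consistent.
    """
    x = image_uint8.astype(np.float32) / 255.0   # scale to [0, 1]
    x = (x - MEAN) / STD                         # per-channel normalization
    return np.transpose(x, (2, 0, 1))            # HWC -> CHW layout


# Hypothetical usage in an evaluation script:
# batch = np.stack([preprocess(img) for img in images])
```

Keeping the normalization in one place means a change to the training statistics is picked up automatically at test time.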
However, as I began testing the network on various images, it became clear that it was not very robust on external data:
After scrutinizing the dataset, we realized that it was not good enough: it had major flaws that made the network prone to overfitting. In particular, each category consisted of a 360-degree shot of a single fruit, so even though there were many images per category, the network only ever saw one physical example of each fruit. This made it hard for the network to generalize based on colour, shape, etc.
To resolve this problem, I need to search for more datasets, parse them, and train our network on them. This will be my focus for next week. I have found several datasets so far; however, each has its own issues. The most promising one is very similar to our use case, with images of fruits taken from a top-down view, but it has a reflective silver tray background that is very hard to segment away. Some pictures also contain groups of fruit:
I will first try training the network on center-cropped and resized images. If that does not work, I will try algorithms such as Otsu thresholding on the saturation values, or GrabCut, to segment away the background.
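As a rough sketch of the first two ideas, the snippet below implements a center crop and Otsu thresholding on the HSV saturation channel using only NumPy. All function names are hypothetical, resizing and GrabCut are left out (they would come from an image library), and the mask assumes the fruit is more saturated than the background, which may not hold for reflections on the silver tray.

```python
import numpy as np


def center_crop(image, size):
    """Return the central size x size region of an HxWxC image."""
    h, w = image.shape[:2]
    if size > min(h, w):
        raise ValueError("crop size exceeds image dimensions")
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]


def saturation(image_rgb):
    """HSV saturation, scaled to 0..255, for an HxWx3 uint8 RGB image."""
    rgb = image_rgb.astype(np.float32)
    cmax, cmin = rgb.max(axis=2), rgb.min(axis=2)
    sat = np.zeros_like(cmax)
    nonzero = cmax > 0                      # avoid dividing by zero on black
    sat[nonzero] = (cmax[nonzero] - cmin[nonzero]) / cmax[nonzero]
    return np.round(sat * 255).astype(np.uint8)


def otsu_threshold(channel):
    """Otsu's method: the level t that maximizes between-class variance."""
    hist = np.bincount(channel.ravel(), minlength=256).astype(np.int64)
    levels = np.arange(256, dtype=np.int64)
    n0 = np.cumsum(hist)                    # pixels with value <= t
    n1 = hist.sum() - n0                    # pixels with value > t
    sum0 = np.cumsum(hist * levels)         # intensity sum of the lower class
    sum1 = sum0[-1] - sum0                  # intensity sum of the upper class
    between = np.zeros(256, dtype=np.float64)
    valid = (n0 > 0) & (n1 > 0)             # both classes must be non-empty
    mean0 = sum0[valid] / n0[valid]
    mean1 = sum1[valid] / n1[valid]
    between[valid] = n0[valid] * n1[valid] * (mean0 - mean1) ** 2
    return int(np.argmax(between))


def foreground_mask(image_rgb):
    """Boolean mask of pixels whose saturation exceeds the Otsu threshold."""
    sat = saturation(image_rgb)
    return sat > otsu_threshold(sat)
```

The mask can then be used to blank out the background before the crop is passed to the network; if the tray reflections turn out to be too colourful for this to work, GrabCut would be the next thing to try.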