During break, I managed to make some progress, but not as much as I ideally should have. I made progress in two areas: the first being pixel diff localization, and the second being camera setup/image comparison.
Using just subtraction for the pixel diff didn't really work, due to very minor changes in lighting conditions. To resolve this, I used a slightly more involved method that I got from Stack Overflow: calculate the Structural Similarity Index (SSIM) between the two frames, threshold the difference map, locate contours, and take their bounding boxes. This worked pretty well:
I initially figured that the largest contour region would likely always be the object we are looking for, but this proved to be false (the shadow was bigger both in the example with my hand and in the applesauce example farther down), so we may need some overhead lighting to prevent that from happening. Overall, I'm pretty happy with the localization code itself; with the correct lighting, it'll work fine.
The actual setup stuff was a mixed bag. I made an exceptionally high tech setup for the camera apparatus, shown below:
As you can see, it's the camera module, sticky putty'd to an empty hand sanitizer bottle, with some velcro strips attached. There are equally spaced velcro strips on the ceiling of my cabinet, so I can attach/rotate the angle of the camera as needed. This worked fairly well for the camera itself, but I had to manually hold the RPI in place, which led to a bit of shifting/jiggling, which screwed up the pixel diff. The RPI belongs to Harry, and I didn't want to attach the velcro strips (which are really hard to remove and leave a lot of sticky gunk behind) without his permission. I also wasn't certain that the velcro strips would hold the weight of the RPI, and I didn't want to break something that wasn't mine. Despite this, I got one decent photo of the applesauce where the pixel diff worked pretty well (omitting some changes in the RPI/background outside of the cabinet, and the shadow).
I manually cropped the image and did a quick pairwise comparison using only the applesauce image, and the results were REALLY bad. We got just 5 matches (on shredded cheese), out of several hundred possible features. So, it seems we'll likely need to either enforce facing the products so that the label points directly at the camera, or move the camera onto the door and prevent the user from placing large objects in front of smaller ones.
To summarize, while I made a fair bit of progress, I think I'm still behind where I probably should be at this point in time. Thankfully, most of the blockers are just design questions about how to handle label visibility, and the code used for the initial algorithm comparison was fairly easy to extend now that I know what I'm doing.