This week I focused heavily on testing and refining the computer vision contact detection subsystem. Specifically, I spent my time on three key areas:
- Writing a testing API for recording and analyzing the correctness of the subsystem.
- Training a custom detection model on a narrowed-down set of objects.
- Refining glove tracking.
Task (1) focused on setting up our team for thorough testing of the CV component. As outlined in our design report, this suite of tests will involve counting the number of frames in which the correct object is predicted as a “potential contact” while the user is actually touching an object. To streamline this process, I wrote a testing API that plugs into our larger program and lets us record each test session, generating a data file (.csv) that reports the predicted contacts at every frame. I plan to begin using this API for testing over the weekend.
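To make the API's shape concrete, here is a minimal sketch of the recording side (the class and method names are hypothetical placeholders for illustration, not the exact interface):

```python
import csv
import time

class ContactTestRecorder:
    """Minimal sketch of the testing API. All names here are hypothetical
    placeholders; the real interface is integrated into our larger program."""

    def __init__(self, session_name):
        self.session_name = session_name
        self.rows = []
        self.frame_index = 0

    def log_frame(self, predicted_contacts):
        """Record the object labels predicted as "potential contacts" for
        the current frame (an empty list means no contact was predicted)."""
        self.rows.append({
            "frame": self.frame_index,
            "timestamp": time.time(),
            "predicted_contacts": ";".join(predicted_contacts),
        })
        self.frame_index += 1

    def save(self):
        """Dump the recorded session to <session_name>.csv for offline
        comparison against the ground-truth touch events."""
        with open(f"{self.session_name}.csv", "w", newline="") as f:
            writer = csv.DictWriter(
                f, fieldnames=["frame", "timestamp", "predicted_contacts"])
            writer.writeheader()
            writer.writerows(self.rows)
```

In use, the larger program would call `log_frame()` once per processed frame and `save()` at the end of a session, after which the CSV can be scored frame-by-frame against the moments when the user actually touched the object.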
Task (2) focused on exploring a refinement strategy that we had outlined in our design review: training a new Single Shot Detector neural network on a specialized subset of objects via transfer learning. The goal was to see whether we could get more accurate and faster object detection using fewer, handpicked classes. I chose the following object classes for this new model: book, bottle, and mug. The training data came from the Open Images dataset, as planned in our design review. I used 6,000 total training images (chosen based on the Jetson’s storage constraints) and trained the model over 30 epochs. Training took about 90 minutes.
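For a rough idea of the setup, here is a sketch of SSD transfer learning in the style of torchvision's detection API. This is not our exact training script: the model variant, hyperparameters, and the `data_loader` argument (standing in for an Open Images loader) are illustrative assumptions.

```python
import torch
from torchvision.models.detection import ssd300_vgg16

NUM_CLASSES = 4  # book, bottle, mug + background

def finetune(data_loader, epochs=30):
    """Fine-tune an SSD on a handpicked subset of classes. data_loader is
    assumed to yield (images, targets) batches, where each target is a dict
    with "boxes" (N x 4) and "labels" (N,) from the Open Images subset."""
    # Pretrained backbone, freshly initialized detection heads sized
    # for our small class set.
    model = ssd300_vgg16(weights=None, weights_backbone="DEFAULT",
                         num_classes=NUM_CLASSES)
    model.train()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    for _ in range(epochs):
        for images, targets in data_loader:
            # In training mode the model returns a dict of losses.
            loss_dict = model(images, targets)
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```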
Unfortunately, this effort did not improve accuracy compared to the original model we have been using. The new model was unable to detect the book class at all, and it struggled to reliably detect the bottle and mug classes at the required distance from the camera. I suspect the point of failure was insufficient training time and/or too few, low-quality training images. For example, the bottle images were usually of bottles with a different shape and color from the one we brought in for testing (e.g., beer bottles vs. a Pepsi bottle).
I will try training this model again with more data and more epochs over the weekend. However, since training effectively makes the Jetson unusable for its duration, I may have to run it overnight during the week. Should this also fail, I will try collecting a much smaller custom dataset of the specific objects we plan to use in our demos and see how that fares.
Task (3) focused on refining the tracking of the glove. As a reminder, we track the glove by detecting fiducial markers attached to it. We then generate a circular region of interest (or a “hitbox”) around the glove that registers a contact when it overlaps with the corresponding region of a detected object. This week I attempted to move the hitbox to a position closer to the fingers, as opposed to centering it on the marker itself, which should more accurately track the actual position of the hand. I did this by displacing the hitbox by a vector in 3-space (i.e., the fiducial marker’s coordinate frame) and then projecting those coordinates to their corresponding values in 2-space (i.e., the camera’s image plane). The process is summarized on this Wikipedia page.
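Concretely, the displacement amounts to projecting a 3D offset defined in the marker's frame through the marker's estimated pose. A sketch, assuming OpenCV-style pose outputs (`rvec`, `tvec`) from the fiducial marker detection; the offset value and function name are illustrative, not our calibrated values:

```python
import cv2
import numpy as np

# Illustrative offset from the marker's center toward the fingers, expressed
# in the marker's own coordinate frame (made-up value, not calibrated).
FINGER_OFFSET = np.float32([[0.0, 0.08, 0.0]])  # meters

def project_hitbox_center(rvec, tvec, camera_matrix, dist_coeffs):
    """Displace the hitbox by a 3D vector in the marker's frame, then project
    it through the marker's estimated pose into 2D pixel coordinates."""
    image_points, _ = cv2.projectPoints(
        FINGER_OFFSET, rvec, tvec, camera_matrix, dist_coeffs)
    return image_points.reshape(2)  # (u, v) center of the displaced hitbox
```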
While I was able to get the fundamentals of this refinement working, I ran into issues with its reliability. The problem stems from the fact that there are ambiguities in determining the correct pose of the fiducial marker, especially as the marker becomes smaller in the image (i.e., farther from the camera). The problem is well documented in this GitHub thread. In our case, the ambiguity causes the detected fiducial marker’s coordinate system to flip between right-handed and left-handed across subsequent frames (the Z-axis flips direction). As a result, the displacement I experimented with also flipped between subsequent frames, making this refinement unusable in its current state. I spent some hours thinking through a solution to this problem, but without success. For now, I will revert to tracking the hitbox directly on the marker, for the sake of time.
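For reference, one crude way to observe the symptom (not fix it) is to watch the direction of the marker's Z-axis in camera coordinates across frames. A diagnostic sketch, again assuming OpenCV-style pose outputs:

```python
import cv2

def marker_faces_camera(rvec):
    """Diagnostic only: report which way the marker's Z-axis points in camera
    coordinates. Under the pose ambiguity, this sign flips between
    consecutive frames even though the marker has not actually moved."""
    rotation, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 matrix
    z_axis = rotation[:, 2]            # marker's Z-axis in camera coordinates
    return z_axis[2] < 0               # True when the axis points at the camera
```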