I spent this week preparing to begin training our gesture recognition model next week. As part of that preparation, I needed to apply some data transformations to the HANDS dataset we are using, which contains a couple hundred images of various hand gestures.
The dataset supplies images of 5 subjects, 4 male and 1 female, in various positions and lighting conditions, along with annotations containing a bounding box for each gesture performed in the image. These annotations are stored in large CSV text files, with a default value of [0 0 0 0] for a bounding box if the gesture does not appear in the image. For example:
image_name,left_fist,right_fist,left_one,right_one,…
./001_color.png,[0 0 0 0],[0 0 0 0],[143 76 50 50],[259 76 50 50],…
The above lines represent a subject holding up the number one on both hands within the specified regions of the image.
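Parsing these rows is straightforward. Below is a minimal sketch, assuming the column names and the [x y w h] box format shown above; the file name and helper names are just for illustration:

```python
import csv
import re

def parse_box(cell):
    """Turn a '[x y w h]' string into an (x, y, w, h) tuple of ints,
    or None for the all-zero default used when the gesture is absent."""
    nums = [int(n) for n in re.findall(r"\d+", cell)]
    if len(nums) != 4 or not any(nums):
        return None
    return tuple(nums)

with open("annotations.csv", newline="") as f:
    for row in csv.DictReader(f):
        image_name = row["image_name"]
        boxes = {}
        for label, cell in row.items():
            if label is None or label == "image_name":
                continue  # skip the file-name column and any trailing cells
            box = parse_box(cell)
            if box is not None:
                boxes[label] = box
        # boxes now maps gesture labels (e.g. 'left_one') to pixel bounding boxes
```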
This format cannot be used to train our model directly, since we are now using hand landmark coordinates as the features for training and inference. I spent some time writing a script that takes in the annotations, locates each hand within the image, and applies the hand pose estimation implemented by Andrew.
For example, we would start with the image below.
The bounding boxes are then used to crop out each gesture so it can be handled separately, and the pose estimation is applied to each crop to find the coordinates of the hand landmarks.
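A rough sketch of that crop-then-estimate step is below. The (x, y, w, h) pixel boxes come from the annotations; `estimate_landmarks` is a placeholder for Andrew's pose estimator, which I'm assuming returns a list of (x, y) landmark positions in crop coordinates:

```python
import cv2

def landmarks_for_gesture(image_path, box, estimate_landmarks):
    """Crop one annotated gesture out of the image and run pose estimation
    on just that region. `estimate_landmarks` stands in for Andrew's model."""
    x, y, w, h = box
    image = cv2.imread(image_path)
    crop = image[y:y + h, x:x + w]  # isolate a single hand
    return estimate_landmarks(crop)
```

Each gesture box collected from the annotations is fed through this one at a time, so the two hands in the example image are processed independently.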
The palm landmark is then treated as the origin (coordinates 0, 0), and every other landmark's location is expressed relative to it and saved to a new CSV file along with the corresponding gesture label.
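The re-centering itself is just a subtraction. Here is a sketch under the assumption that the palm is the first landmark the estimator returns:

```python
import csv

def recenter(landmarks, palm_index=0):
    """Express every landmark relative to the palm landmark, which becomes
    the origin (0, 0). Index 0 as the palm point is an assumption."""
    px, py = landmarks[palm_index]
    return [(x - px, y - py) for x, y in landmarks]

def append_row(csv_path, label, landmarks):
    """Append one training row: flattened relative coordinates
    (x0, y0, x1, y1, ...) followed by the gesture label, e.g. 'left_one'."""
    relative = recenter(landmarks)
    row = [coord for point in relative for coord in point] + [label]
    with open(csv_path, "a", newline="") as f:
        csv.writer(f).writerow(row)
```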