This past week, as outlined on the schedule, I focused primarily on processing reference video inputs with OpenCV. I spent time exploring both MediaPipe and OpenPose as candidate libraries for processing and labeling the reference input video. After a substantial amount of experimentation with both, we decided as a team that MediaPipe was the better fit for our needs. I then tested the MediaPipe pipeline on video inputs, starting with a simple recording of myself. This initial test yielded unsatisfactory results, which prompted me to continue fine-tuning the MediaPipe configuration and the OpenCV capture.
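As a rough illustration of the current pipeline, the sketch below shows OpenCV pulling frames from a reference video and handing them to a MediaPipe pose landmarker running in video mode. The model file name and video path are placeholders, and the option values are left at defaults here rather than our tuned settings.

```python
import cv2
import mediapipe as mp
from mediapipe.tasks import python as mp_tasks
from mediapipe.tasks.python import vision

# Placeholder paths: any of the downloadable pose_landmarker .task models works here.
MODEL_PATH = "pose_landmarker_full.task"
VIDEO_PATH = "reference_video.mp4"

options = vision.PoseLandmarkerOptions(
    base_options=mp_tasks.BaseOptions(model_asset_path=MODEL_PATH),
    running_mode=vision.RunningMode.VIDEO,  # video mode enables frame-to-frame tracking
)

cap = cv2.VideoCapture(VIDEO_PATH)
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0     # fall back if the container reports no FPS
frame_index = 0

with vision.PoseLandmarker.create_from_options(options) as landmarker:
    while cap.isOpened():
        ok, frame_bgr = cap.read()
        if not ok:
            break
        # OpenCV decodes frames as BGR; MediaPipe expects SRGB images.
        frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=frame_rgb)

        # Video mode requires a monotonically increasing timestamp in milliseconds.
        timestamp_ms = int(frame_index * 1000 / fps)
        result = landmarker.detect_for_video(mp_image, timestamp_ms)

        if result.pose_landmarks:
            # result.pose_landmarks[0] holds the 33 landmarks for the first detected person.
            pass  # label/store the landmarks for later comparison

        frame_index += 1

cap.release()
```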
The MediaPipe library comes with several base models (lite, full, and heavy). It also exposes a variety of configuration options, including the following (a short configuration sketch follows the list):
- min_pose_detection_confidence (0.0-1.0):
  - Controls how confident the model needs to be to report a pose detection
  - Higher values reduce false positives but might miss some poses
  - Lower values catch more poses but may include false detections
- min_pose_presence_confidence (0.0-1.0):
  - Threshold for considering a pose to be present
  - Affects how readily the model reports pose presence
- min_tracking_confidence (0.0-1.0):
  - For video mode, controls how confident the tracker needs to be to maintain tracking
  - Lower values make tracking more stable but might track incorrect poses
  - Higher values are more precise but might lose tracking more easily
- num_poses:
  - Maximum number of poses to detect in each frame
  - Increasing this will detect more poses but use more processing power
  - Default is 1
- output_segmentation_masks:
  - Boolean to enable/disable segmentation mask output
  - Disabling can improve performance if you don't need masks
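To make these options concrete, here is how they map onto PoseLandmarkerOptions in the Python Tasks API. The threshold values shown are placeholders rather than our final tuned numbers, and the model file name is again an assumption.

```python
from mediapipe.tasks import python as mp_tasks
from mediapipe.tasks.python import vision

# Example values only; our actual settings came out of the experiments described below.
options = vision.PoseLandmarkerOptions(
    base_options=mp_tasks.BaseOptions(model_asset_path="pose_landmarker_full.task"),
    running_mode=vision.RunningMode.VIDEO,
    min_pose_detection_confidence=0.5,  # confidence required to report a detection
    min_pose_presence_confidence=0.5,   # threshold for considering a pose present
    min_tracking_confidence=0.5,        # confidence required to keep tracking between frames
    num_poses=1,                        # maximum number of poses per frame (default 1)
    output_segmentation_masks=False,    # we don't need masks, so skip them for a small speedup
)
landmarker = vision.PoseLandmarker.create_from_options(options)
```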
After experimentation, I found that the parameters that affected our detection the most were min_pose_detection_confidence and min_pose_presence_confidence. After fine-tuning these parameters, I was able to achieve much better tracking not just on my own simple test video, but also on a relatively complex YouTube dancing short. As we continue to work on this algorithm and integrate the systems together, I will keep experimenting with these options to optimize performance while keeping tracking confidence as high as possible.
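To judge tracking quality on the test clips below, the detected landmarks can be overlaid on each frame. The helper below is a sketch adapted from MediaPipe's example code for converting Tasks API results into the format the legacy drawing utilities expect; the function name is my own.

```python
import numpy as np
from mediapipe import solutions
from mediapipe.framework.formats import landmark_pb2

def draw_landmarks_on_frame(rgb_frame, detection_result):
    """Overlay detected pose landmarks on an RGB frame for visual inspection."""
    annotated = np.copy(rgb_frame)
    for pose_landmarks in detection_result.pose_landmarks:
        # The Tasks API returns plain landmark objects; the drawing utilities
        # expect a NormalizedLandmarkList proto, so convert first.
        proto = landmark_pb2.NormalizedLandmarkList()
        proto.landmark.extend(
            landmark_pb2.NormalizedLandmark(x=lm.x, y=lm.y, z=lm.z)
            for lm in pose_landmarks
        )
        solutions.drawing_utils.draw_landmarks(
            annotated,
            proto,
            solutions.pose.POSE_CONNECTIONS,
            solutions.drawing_styles.get_default_pose_landmarks_style(),
        )
    return annotated
```

Calling this inside the capture loop and displaying the annotated frame (for example with cv2.imshow) makes it easy to spot frames where tracking drops out.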
Testing with recorded webcam footage:
Testing with a YouTube Shorts dancing video: