This week I mainly focused on finalising some of the object detection algorithms and figuring out what is possible given our limited compute power. I brushed up on CNNs, including some of the state-of-the-art object detection algorithms, and investigated the possibility of using RGB-D information as part of the detection itself. This would mean using depth as well as the image to make detection decisions, rather than simply using the depth information for planning.

This led me to read about a variant of Faster R-CNN that uses RGB-D information for state-of-the-art object detection, and how it can be paired with a VGG16 backbone. The authors report a processing rate of around 5 fps on the COCO dataset using a standard GPU. This is definitely one algorithm I will look into further; however, it depends on the quality of data we can get from the depth camera. Since they used data from a high-quality depth camera such as the Intel RealSense, the point clouds they can generate from the RGB-D information will be of much higher quality than what we can produce with a makeshift PS4 camera. More experimentation will be needed to see whether such algorithms still work when the RGB-D information isn't as rich.

Looking into VGG16 itself, it seems like a lightweight and accurate backbone for our purposes. To get around the problem of the network being trained on real objects, we plan to print out pictures and paste them onto our obstacles, so there is no need to generate a new dataset and retrain the network. We can simply freeze most of the weights and fine-tune the rest until we get the desired precision and recall; a rough sketch of that setup is shown below.
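To make the freeze-and-fine-tune idea concrete, here is a minimal sketch in PyTorch. It uses torchvision's off-the-shelf Faster R-CNN with a ResNet-50 FPN backbone as a stand-in (torchvision does not ship a VGG16-based Faster R-CNN, and we haven't settled on an implementation yet), and the number of classes is a placeholder. The point is just the recipe: freeze the pretrained backbone and train only the detection head.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Stand-in model: torchvision's Faster R-CNN (ResNet-50 FPN backbone).
# The paper we read uses VGG16, but the freeze-and-fine-tune recipe is the same.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

# Freeze the pretrained backbone so only the detection head gets updated.
for param in model.backbone.parameters():
    param.requires_grad = False

# Replace the box predictor with one sized for our classes
# (background + printed-picture obstacle classes; 3 here is a placeholder).
num_classes = 3
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Only the unfrozen parameters go to the optimiser.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.005, momentum=0.9)

# Training step sketch: the model expects a list of image tensors and a list
# of target dicts with 'boxes' (N x 4) and 'labels' (N,) per image.
model.train()
images = [torch.rand(3, 480, 640)]
targets = [{"boxes": torch.tensor([[50.0, 60.0, 200.0, 220.0]]),
            "labels": torch.tensor([1])}]
loss_dict = model(images, targets)
loss = sum(loss_dict.values())
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

With most weights frozen, only the small predictor head is trained, which should keep both the dataset size and the compute we need within reach.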
Moving forward, I want to start generating RGB-D data from the PS4 camera that we bought this week and begin testing object detection algorithms to see what works best and whether we need to rethink our approach. A first sketch of how we might get depth out of the camera is below.
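Since the PS4 camera is a stereo pair rather than an active depth sensor, a first pass at "RGB-D" data would likely mean computing a disparity-based depth map from the two views. Below is a minimal sketch using OpenCV's StereoSGBM matcher; the side-by-side frame layout, device index, focal length, and baseline are all assumptions/placeholders until we inspect and calibrate the camera.

```python
import cv2
import numpy as np

# Assumed setup: the PS4 camera exposes a side-by-side stereo frame over a
# standard video device. The frame split, focal length, and baseline below
# are placeholders until we calibrate the camera properly.
FOCAL_LENGTH_PX = 700.0   # placeholder, to come from calibration
BASELINE_M = 0.08         # approximate lens separation, placeholder

def rgbd_from_stereo(frame):
    """Return (rgb, depth) from a side-by-side stereo frame."""
    h, w, _ = frame.shape
    left, right = frame[:, : w // 2], frame[:, w // 2 :]

    gray_l = cv2.cvtColor(left, cv2.COLOR_BGR2GRAY)
    gray_r = cv2.cvtColor(right, cv2.COLOR_BGR2GRAY)

    # Semi-global block matching; parameters will need tuning for our scenes.
    matcher = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=64,   # must be divisible by 16
        blockSize=7,
    )
    disparity = matcher.compute(gray_l, gray_r).astype(np.float32) / 16.0

    # depth = f * B / disparity; mask out invalid (non-positive) disparities.
    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = FOCAL_LENGTH_PX * BASELINE_M / disparity[valid]
    return left, depth

cap = cv2.VideoCapture(0)   # device index is a guess
ok, frame = cap.read()
if ok:
    rgb, depth = rgbd_from_stereo(frame)
cap.release()
```

Even a rough depth map like this should be enough to tell us whether the RGB-D detection route is worth pursuing with our hardware, or whether we should fall back to RGB-only detection and keep depth for planning.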