This week, I focused on researching the CV libraries and frameworks that we can use to build our model. I explored multiple libraries (OpenCV, PyTorch) for handling image preprocessing and deep learning, and compared various object detection frameworks to determine which one best fits our requirements.
We decided to use OpenCV in our computer vision pipeline because it is an industry-standard library with a large, active user community and extensive documentation. OpenCV offers a wide variety of image transformations and works well alongside deep learning frameworks such as PyTorch, so it should be straightforward to integrate into our pipeline. Using OpenCV for data processing will be essential in integrating our CV model with our overall system. We have also decided to use a YOLOv5 model for object detection because of its extensive documentation, active community, and suitability for real-time applications such as ours. Based on its proven real-world performance, we are confident that YOLOv5 can achieve our accuracy goal of 90%.
The following is an initial outline of how the CV pipeline will operate:
- Image Capture and Basic Preprocessing (resizing, normalization with OpenCV)
- Inference using a YOLOv5 model
- Post-Processing to handle classifications and quantity counts
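As a rough illustration of the post-processing step, the sketch below filters detections by confidence and counts the remaining items per class. The `Detection` structure, the `count_items` helper, and the 0.5 threshold are all hypothetical placeholders; the real format will depend on how we read the YOLOv5 output once the prototype is running.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Detection(NamedTuple):
    """One detected object. Hypothetical structure, not the YOLOv5 output format."""
    label: str
    confidence: float
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def count_items(detections: Iterable[Detection], min_confidence: float = 0.5) -> Counter:
    """Drop low-confidence detections and count the rest per class label."""
    kept = [d for d in detections if d.confidence >= min_confidence]
    return Counter(d.label for d in kept)


if __name__ == "__main__":
    # Made-up detections, for illustration only.
    sample = [
        Detection("egg", 0.91, (10, 10, 40, 40)),
        Detection("egg", 0.84, (50, 10, 80, 40)),
        Detection("milk", 0.32, (100, 5, 160, 120)),
    ]
    print(dict(count_items(sample)))
```

The confidence threshold is an assumption we would tune against our dataset, since it trades missed items against false counts.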
For my next steps, I will gather datasets of representative fridge ingredients that vary in lighting, item arrangement, and packaging. A diverse dataset of relevant images is necessary to achieve our accuracy goals. I will also set up a basic prototype using YOLOv5 on a small subset of test images to evaluate initial detection accuracy, speed, and resource usage. I aim to run initial benchmarks on local versus cloud inference to measure latency and throughput, and to identify optimization techniques if they are needed to meet our time constraints.
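For the latency and throughput benchmarks, a minimal timing harness along these lines could be reused for both local and cloud inference. The `benchmark` function and its `infer` argument are hypothetical: `infer` stands in for whatever callable ends up wrapping the model (a local forward pass or a request to a cloud endpoint), and the warm-up count is an arbitrary choice.

```python
import statistics
import time
from typing import Any, Callable, Dict, Sequence


def benchmark(infer: Callable[[Any], Any], inputs: Sequence[Any], warmup: int = 2) -> Dict[str, float]:
    """Time `infer` on each input; return latency stats (ms) and throughput."""
    if not inputs:
        raise ValueError("inputs must contain at least one item")

    # Warm-up runs are not timed, so one-off setup costs do not skew results.
    for item in inputs[:warmup]:
        infer(item)

    latencies_ms = []
    for item in inputs:
        start = time.perf_counter()
        infer(item)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    total_s = sum(latencies_ms) / 1000.0
    return {
        "mean_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
        "images_per_s": len(latencies_ms) / total_s if total_s > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Stand-in for a real model call: sleep briefly to simulate work.
    stats = benchmark(lambda _: time.sleep(0.005), inputs=list(range(10)))
    print(stats)
```

Running the same harness against the local and cloud callables on identical images should give directly comparable numbers, though cloud timings would also include network overhead.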
Overall, I am on track with researching the relevant CV libraries, laying out the model architecture, and investigating how to integrate the detection pipeline within our system. Next week, I will focus on data gathering and prototyping of the CV model.