Nico’s Status Reports

Nicholas’ Status Report for 8th March 2025

I have started training the ML model, but the environment was having issues, so I worked on resolving them before Spring Break. I ironed out the problems by setting up a Conda environment and resolving the dependency conflicts needed to use YOLO. I also read more into TensorRT, and decided to start gathering the resources needed to download and run TensorRT on the Jetson Nano. We will report on training next week.
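As a quick sanity check that the environment is healthy, something like the following sketch works, assuming the Ultralytics package and its pretrained YOLOv11-Large weights (the weight file name is an assumption on my part):

```python
import torch
from ultralytics import YOLO

# Confirm the Conda environment resolved a CUDA-enabled PyTorch build.
print(f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")

# Loading the weights exercises the dependency chain end to end; the
# checkpoint is downloaded automatically on first use.
model = YOLO("yolo11l.pt")
model.info()
```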

Nicholas’ Status Report for 22nd February 2025

I have finalized the ML model, so now we are moving on to setting up the training environment and getting the data downloaded onto the ECE clusters so we can train the model over Spring Break. We are using YOLOv11-Large, so I set up a working environment on the ECE clusters with TensorFlow. I also downloaded the data we will fine-tune on. I am less familiar with TensorFlow than PyTorch, so resolving library conflicts in the virtual environment took longer than desired. Despite this setback, I got everything downloaded, loaded YOLOv11-Large onto one of the GPUs, and ran a sample detection on a static image. The next steps will be to put all the components together and actually fine-tune the model, which I will report on next week.
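The sample detection itself amounts to only a few lines with the Ultralytics API; a sketch of that kind of smoke test (the image path is hypothetical):

```python
from ultralytics import YOLO

model = YOLO("yolo11l.pt")             # pretrained YOLOv11-Large checkpoint
results = model("table_snapshot.jpg")  # hypothetical static test image

# Each result carries the detected boxes with class ids and confidences.
for box in results[0].boxes:
    print(f"{model.names[int(box.cls)]}: {float(box.conf):.2f}")
```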

Nicholas’ Status Report for 15th February 2025

This week has been devoted to cementing the choice of ML model for our project, as well as assessing the potential of a purely Computer Vision approach for our card detection module. Our advisors proposed the following pipeline:

  1. Use a traditional feature descriptor to identify cards.
  2. Take a snapshot of the table’s state.
  3. Wait 300 ms and perform steps 1-2 again.
  4. Given these two images, smooth each with a convolutional filter.
  5. Pass a gradient filter over both of the smoothed images.
  6. Subtract the gradients of the two images.

The idea is that this pipeline should let us cheaply and efficiently detect cards being placed on the table: once we are sure a new card has entered the frame, we can simply use the output of the feature detector to keep track of the cards. However, this algorithm makes too many simplifying assumptions to work in the context of Blackjack, and is most likely not a good fit, for the reasons below.
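To make the proposal concrete, here is a rough sketch of steps 4-6 in OpenCV; the specific kernels (Gaussian smoothing, Sobel gradients) and the threshold are my assumptions, not our advisors’ exact specification:

```python
import cv2

def table_changed(frame_a, frame_b, threshold=25.0):
    """Flag a change between two snapshots via gradient subtraction."""
    grads = []
    for frame in (frame_a, frame_b):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        smoothed = cv2.GaussianBlur(gray, (5, 5), 0)  # step 4: convolutional filter
        gx = cv2.Sobel(smoothed, cv2.CV_32F, 1, 0)    # step 5: gradient filter
        gy = cv2.Sobel(smoothed, cv2.CV_32F, 0, 1)
        grads.append(cv2.magnitude(gx, gy))
    diff = cv2.absdiff(grads[0], grads[1])            # step 6: subtract gradients
    return diff.mean() > threshold                    # crude "new card?" decision
```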

Firstly, we note that the best feature descriptor for the task is ORB (Oriented FAST and Rotated BRIEF), which provides scale- and rotation-invariant features for object detection and is fast. However, ORB is known to fail when the object is heavily occluded, and it is also not robust to lighting changes; both are real issues for us. For occlusions, since cards are dealt one over another, the first card dealt into a player’s hand will be largely occluded from the camera’s view. As for lighting, ORB is less robust to lighting changes than other, patented feature descriptors. One of our user requirements is that users face no arduous or rigid setup to use our product, so constraining camera position or lighting is not a trade-off we are willing to make. In short, given the nature of Blackjack, ORB will struggle as a feature descriptor, making step 1 of this pipeline already problematic.
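For context, ORB matching is only a few lines in OpenCV, which is exactly why it is attractive despite the weaknesses above (file names hypothetical):

```python
import cv2

card = cv2.imread("king_template.png", cv2.IMREAD_GRAYSCALE)
table = cv2.imread("table_frame.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(card, None)
kp2, des2 = orb.detectAndCompute(table, None)

# ORB produces binary descriptors, so we match with Hamming distance.
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)
print(f"{len(matches)} matches; best distance: {matches[0].distance}")
```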

There would also be noise issues with this pipeline that would go against our user requirements. We already mentioned that it would suffer shaky accuracy with ORB, which is a red flag, since we need at minimum 90% accuracy to guarantee a smaller deviation from the count than a typical professional card counter. We would also require a fixed camera angle if we used this pipeline: relying only on gradients and edge detection, we cannot discern a playing card from any other rectangular object with numbers or letters on it. ORB is a hand-crafted feature descriptor that undergoes no further processing, so it is more susceptible to adversarial inputs. We could easily place a book in frame, and its edges plus the letter “K” would be detected as a King! The only way around this would be to fix the camera angle so that we know what size the cards should be, but again, this goes against a user requirement.

Finally, we would spend a lot of time on this pipeline for little gain. Even with libraries that use BLAS under the hood, such as NumPy, we are CPU-bound on critical (and embarrassingly parallel!) operations such as the convolution. We would have to write our own CUDA kernel for convolutions, a large time expenditure to develop and debug for minimal savings in either time or memory, while still retaining the issues above. All in all, this Computer Vision pipeline seemed like a sensible idea, but at the end of the day the numbers don’t justify using it.
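As a back-of-the-envelope illustration (exact timings vary by machine), even a single 3x3 convolution over one 1080p frame is non-trivial on CPU, and the pipeline needs several per frame pair:

```python
import time
import numpy as np
from scipy.signal import convolve2d

frame = np.random.rand(1080, 1920).astype(np.float32)  # one 1080p grayscale frame
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)

start = time.perf_counter()
_ = convolve2d(frame, sobel_x, mode="same")
print(f"one 3x3 convolution: {time.perf_counter() - start:.3f} s on CPU")
```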

For the machine learning side of things, we finally settled on the exact YOLO model we want to use: YOLOv11-Large, for the foreseeable future. It offers a good accuracy-speed trade-off for us; we need accuracy far more than speed, and we can get 300 ms-500 ms inference times with this model, with far better guarantees on the ability to detect cards. This lines up well with our user requirements, since we will not be streaming data or performing real-time inference.
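Conveniently, Ultralytics reports per-stage timings with every result, so verifying the 300 ms-500 ms figure on our hardware is straightforward (image name hypothetical):

```python
from ultralytics import YOLO

model = YOLO("yolo11l.pt")
results = model("table_snapshot.jpg")

# A dict of preprocess / inference / postprocess times, in milliseconds.
print(results[0].speed)
```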

This keeps us on schedule to meet our requirements. By next week, we hope to have compiled a slightly larger dataset and begun setting up a training environment on the ECE clusters for fine-tuning.

Nicholas’ Status Report for 8th February 2025

I have been researching which ML model we should use for our project, as well as confirming my belief that there are no viable alternative Computer Vision techniques. In my research, I reviewed edge detection techniques and color analysis systems for detecting cards. However, as I initially believed, these systems are not robust to varying conditions, which is a user requirement for us. Furthermore, they do not perform the further feature processing and refinement that the ML models we will be using can. This lack of processing means a pure Computer Vision approach would not meet the user requirements. Once I had confirmed my suspicions, I researched ResNet-50 and the YOLO family of models further. We have decided to use a YOLO model: ResNet is a common backbone for many object detection pipelines, but it would require further scaffolding on our part, whereas YOLO streamlines this process. For next week, we are deciding which YOLO model to use, and we will begin preparing the model and dataset for training.
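For a sense of what those classical techniques look like, edge detection is a one-liner in OpenCV, but its thresholds must be hand-tuned per scene, which is exactly the robustness problem (image name hypothetical):

```python
import cv2

frame = cv2.imread("table_snapshot.jpg", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(frame, 100, 200)  # thresholds need re-tuning when lighting changes
cv2.imwrite("edges.png", edges)
```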