What did you personally accomplish this week on the project?
- Was finally able to run Grounding DINO on the Raspberry Pi, but it was far too slow: around 30 minutes for a run that takes YOLO around 3 seconds. Another solution had to be found: "the cloud".
- We did not use AWS because it added needless overhead. We are not serving hundreds of customers right now and only need a proof of concept. Instead, Gian and I set up a crude web server on a GPU machine (RTX 4080) that runs Grounding DINO in around 1-2 seconds, so Grounding DINO could finally be integrated into the codebase.
- Finally created end-to-end voice-to-speech functionality with vector cosine comparisons built in. One can now ask a question prefaced with "Hey Kira" (Kira is our capstone project's name). LangChain/prompt engineering extracts the specific word being asked about from the phrase, and that word is looked up in the database. If the word is NOT present in the database, an embedding model and a vector cosine comparison find the closest word, and if that word is close enough, it is looked up instead. Once the word and its corresponding picture are found, the picture is sent to GPT-4 via the OpenAI API, which returns a description of the image. The description is then played on the Raspberry Pi's/Mac's speaker via google speak. Every step of this process has been tested individually, and full integration tests have been run, including with Grounding DINO integrated.
- I have created a script that automatically generates different configs for YOLO training, runs them, and extracts their validation data. This lets me test many models per day without manual effort.
- I am in the process of creating full traces that measure what percentage of the right objects/images are returned to the user for various queries at various times. This should test all parts of the system, from the pruning functionality to how the vector embedding/LangChain models process input.
- I am working on the presentation/graphics and making sure there is meaningful testing data to present.
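The fallback lookup described in the list above (exact match first, then closest word by cosine similarity, accepted only if it is close enough) can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function names, the 0.8 threshold, and the three-dimensional toy vectors are all assumptions standing in for the real embedding model and database.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def resolve_label(query, query_vec, db_vectors, threshold=0.8):
    """Return the database label to search for, or None if nothing is close.

    db_vectors maps each known label to its embedding (hypothetical layout).
    threshold is an assumed cutoff for "close enough".
    """
    if query in db_vectors:
        return query  # exact match, no embedding comparison needed
    best_label, best_score = None, -1.0
    for label, vec in db_vectors.items():
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None

# Toy embeddings; a real embedding model would produce much longer vectors.
DB = {"mug": [0.9, 0.1, 0.0], "keys": [0.0, 0.2, 0.9]}

print(resolve_label("cup", [0.8, 0.2, 0.0], DB))
print(resolve_label("sofa", [0.0, 1.0, 0.0], DB))
```

With these toy vectors, "cup" resolves to the stored "mug" entry, while "sofa" is not close enough to anything and is rejected. The right threshold would need to be tuned against real embeddings.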
Is your progress on schedule or behind? If you are behind, what actions will be
taken to catch up to the project schedule?
Thankfully, the Grounding DINO issues were resolved, which allowed me to finally implement the audio/speaker/vector cosine functionality. Now it's just up to me to complete the traces implementation and possibly see how YOLO and Grounding DINO could be integrated more effectively, as measured by the results of the traces.
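As a rough idea of what the trace metric could look like, the sketch below computes the percentage of queries for which the system returns the expected image. Everything here is hypothetical: the trace format (query, expected image), the `search` callable, and the file names are placeholders, and the real traces may measure more than exact-match hits.

```python
def hit_rate(traces, search):
    """Percentage of traces for which search(query) returns the expected image.

    traces: list of (query, expected_image) pairs (assumed format).
    search: any callable mapping a query string to an image name or None.
    """
    if not traces:
        return 0.0
    hits = sum(1 for query, expected in traces if search(query) == expected)
    return 100.0 * hits / len(traces)

# Stand-in for the real pipeline: a fixed lookup table.
FAKE_RESULTS = {
    "where is my mug": "mug_01.jpg",
    "where are my keys": "keys_03.jpg",
    "where is my wallet": "phone_02.jpg",  # deliberately wrong answer
    "where is my phone": "phone_02.jpg",
}

TRACES = [
    ("where is my mug", "mug_01.jpg"),
    ("where are my keys", "keys_03.jpg"),
    ("where is my wallet", "wallet_01.jpg"),
    ("where is my phone", "phone_02.jpg"),
]

print(hit_rate(TRACES, FAKE_RESULTS.get))
```

Running the same traces against different YOLO/Grounding DINO combinations would give one comparable number per configuration.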
What deliverables do you hope to complete in the next week?
Traces will be completed.
More models will be trained, and various combinations of Grounding DINO/YOLO will be tested.
More sophisticated pruning algorithms will be considered and tested.