anirudhp status report 26th April
This week my main goal was to exhaustively test for correctness and functionality (which we discussed with Prof. Gloria as the main objective of our testing). To do this, I took two paths:
- Assuming that the system itself works, given that there are no underlying assumptions, the only location for bugs would be multi-client, so I tested multi-client timing.
- User testing with other ECE majors, both inside and outside capstone, aimed specifically at finding edge cases that we could have missed.
The objective of the first path was to move the flag reading and writing behaviour of the second client into specific blocks of time and see whether correctness was maintained.
I created a list of all the flags being set, then tested every intermediate time step by ramping up the refresh speed and using if statements to target those specific time windows.
This did not yield any errors, and it verified that the system produced the correct output in every time window I tested.
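The timing sweep described above can be sketched as a toy simulation. This is only an illustration, not the actual test harness: the `simulate` and `sweep` functions, the single `in_use` flag, the three-step processing time, and the `answer-for-X` strings are all assumptions made for the example. Client A always starts at step 0, and client B's first attempt is moved to every offset in turn.

```python
def simulate(b_start, process_steps=3, horizon=40):
    """Toy model: client A starts at step 0, client B at step b_start.

    Returns a dict mapping each client to the output it received.
    """
    in_use = False          # hypothetical server-side "in-use" flag
    owner = None            # client whose query is being processed
    remaining = 0           # processing steps left for the current query
    output = None           # (owner, text) once the answer is ready
    first_try = {"A": 0, "B": b_start}
    received = {}

    for step in range(horizon):
        # The server advances the current query by one step.
        if in_use and output is None:
            remaining -= 1
            if remaining == 0:
                output = (owner, f"answer-for-{owner}")

        # Each client then acts once per step.
        for client in ("A", "B"):
            if client in received:
                continue
            if owner == client:
                # Read only our own output, then release the flag.
                if output is not None:
                    received[client] = output[1]
                    in_use, owner, output = False, None, None
            elif step >= first_try[client] and not in_use:
                # Flag is off: send the query and set the flag.
                in_use, owner, remaining = True, client, process_steps
    return received


def sweep(max_offset=10):
    """Return every offset of client B at which correctness breaks."""
    expected = {"A": "answer-for-A", "B": "answer-for-B"}
    return [t for t in range(max_offset) if simulate(t) != expected]


if __name__ == "__main__":
    print("failing offsets:", sweep())
```

An empty list from `sweep()` means every tested offset gave each client its own answer, which mirrors the "no errors in any tested window" result for this simplified model only.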
Over the next week, we will be wrapping up the final documentation for the project and releasing the client-side code as a unified image. We are well ahead of schedule.
Team Status Report 26th April
By this week we had already wrapped up our unit testing and final presentations, so we mainly focussed on wrapping up the deliverables for the project.
We completed the following:
- Talked to Prof. Gloria about the ideal structure of user testing and what we aim to find from the testing:
- The only goal of user testing is to try and break the system.
- As a result, other ECE students that could devise edge cases were used for UI testing.
- Multi-client involves setting flags sequentially, while a third system keeps reading them asynchronously.
- We needed to test timing errors so we made a timing diagram and blocked the testing into different segments.
- Details are covered in the anirudhp status report.
- Wrapped up the project poster and most of the final report.
Next week we’ll just wrap up the final report and submit the remaining deliverables after presentation. Given that we are almost completely done with this and are ready for the demo, we are well ahead of schedule.
anirudhp_status_report_April19th
This week, I mainly focussed on testing as well as wrapping up the project. In the process I patched the following issues:
- Multi-client was reading the output too early because the flag was set well in advance.
- Output quality was ramped up using a better bitnet model.
- An interesting bug was identified during user testing: not printing the output in the autocomplete while waiting does not mean that the client is not reading it. Hence a client was capable of reading the other person’s output.
- This was fixed by making it read only during the non output ready time steps.
Currently, my goals for next week are:
- Complete the final report
- I will be presenting the final presentation, so I will need to rehearse for that.
Some things that I specifically learnt over the course of the project are:
- Access controls and the like on an FPGA hard-core. I had not learnt this before, and user testing allowed me to identify that this would be an issue.
- The software-defined interrupt interface on the Mac. This was the key technology that I used for the keyword prompting.
- Presentation skills, specifically not focusing on text that is already stated on the slides. I practised this while rehearsing the presentation yesterday.
Team Status report_Apr19th
This week we put the final touches on our overall project and completed the presentation for the coming week.
We accomplished the following:
- User testing with experienced, tech-savvy users to find potentially “broken” things. So far we have found a few minor fixes, which are covered in the individual status reports.
- Testing multi-client and ensuring functionality as well as security.
- Completing our presentation and report.
Over the next week, as we complete the presentation, we will also aim to do the following:
- Identify some way of bringing the FPGA to the demo (beyond requiring the Ethernet dock that we had requested earlier).
- Complete the report and poster for the final demo date.
Currently we are well ahead of schedule with only minimal work required next week.
anirudhp_status_report
This week, I wanted to verify that the 700M parameter model would run on the FPGA (given that our primary focus is to get the output quality up and running).
In order to verify this, I did a full analysis of the memory resources available on the FPGA and added a memory profiling system to the CPU-based runtime on my laptop.
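As a rough illustration of that kind of check, the sketch below measures peak memory for one call and compares it against a budget. It assumes the CPU runtime is a Python process; `profile_peak_memory`, `fits_on_board`, and the byte figures in the usage lines are hypothetical names and placeholders, not values from the project. Note that `tracemalloc` only sees allocations made through Python's allocator, so a native inference library would need an OS-level measurement instead.

```python
import tracemalloc


def profile_peak_memory(fn, *args, **kwargs):
    """Run fn and return (result, peak bytes allocated through Python)."""
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def fits_on_board(peak_bytes, board_bytes, reserved_bytes=0):
    """True if the measured peak plus any reserved memory fits the budget."""
    return peak_bytes + reserved_bytes <= board_bytes


if __name__ == "__main__":
    # Placeholder workload standing in for one inference call.
    _, peak = profile_peak_memory(lambda: bytearray(1_000_000))
    # Placeholder budget; the board's real figure is not stated here.
    print(peak, fits_on_board(peak, board_bytes=512 * 1024 * 1024))
```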
For now, I believe that the FPGA should not have trouble running the model once we delete PYNQ from the board. However, the always-on Python script will need to be adjusted.
In terms of the milestones, we are still on track, but I ran into a few other things that need to be changed in order to delete PYNQ, and that additional work might put us marginally behind schedule.
Over the next week, I want to work out exactly how the interfacing and inference system would work once I delete PYNQ from the board.
Team Status Report(12th April)
This week, we had the following goals:
- Reevaluate the 700M model and see if it would fit on the FPGA after deleting PYNQ, since we didn’t want any surprises.
- Identify further speedups that we could generate and pull the profiling data from the FPGA.
- Extend the UI script to a generic system that anyone connected to the CMU WiFi setup can use.
We did manage to achieve goal 1 (see anirudhp status report) but had some trouble with goals 2 and 3. We are almost done with them and as a result only slightly behind schedule, and we would like to finish them early next week.
So over the next week we have the following goals:
- Delete PYNQ and sub in the new model.
- Run the advanced UI script on the rest of our laptops and get the system working
- Pull power and profiling graphs on demand.
This would essentially get our entire system done and ready for the final presentation and demos.
anirudhp_status_report_March29th
This week, my goal was to implement the model inference system on the FPGA.
I ended up running into a large number of issues and was forced to switch models in order to fix them. For now, the model switch is temporary: I switched only because the dynamic memory usage exceeded the board’s resources. In theory this should not happen, so it must be due to a bug in the inference system.
We shifted to afrideva/llama-160m-GGUF llama-160m.q2_k.gguf along with the q2_k quantization system.
We are well ahead of schedule, given that this model produces decent output quality (37% hallucination rate), so changing from this model back to the original one is a much lower priority for now.
My goal for next week is to increase performance all the way to reading speed (8-10 tokens/sec). The system is currently a bit slower than reading speed, given that I notice a slight lag.
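A minimal sketch of how the tokens/sec figure could be measured is shown below. The `tokens_per_second` and `meets_reading_speed` helpers and the injectable `clock` parameter are assumptions for illustration, and any iterable of tokens stands in for the real model output. The 8-10 tokens/sec range is the reading-speed target quoted above.

```python
import time

# Reading-speed target quoted in the report, in tokens per second.
READING_SPEED_RANGE = (8.0, 10.0)


def tokens_per_second(token_stream, clock=time.perf_counter):
    """Consume a token iterator and return the average generation rate."""
    start = clock()
    count = sum(1 for _ in token_stream)
    elapsed = clock() - start
    return count / elapsed if elapsed > 0 else float("inf")


def meets_reading_speed(rate, target=READING_SPEED_RANGE[0]):
    """True if the measured rate reaches the lower end of reading speed."""
    return rate >= target


if __name__ == "__main__":
    def fake_model():
        # Placeholder generator standing in for the real inference loop.
        for i in range(20):
            time.sleep(0.01)
            yield f"tok{i}"

    rate = tokens_per_second(fake_model())
    print(f"{rate:.1f} tokens/sec, at reading speed: {meets_reading_speed(rate)}")
```

Passing the clock as a parameter keeps the measurement testable without real delays.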
Team_status_report_March29th
This week, we set things up for the interim demo. A couple of changes that we made:
- We moved from a 700 Million parameter model to a 165M parameter model.
- We swapped over our quantization techniques due to the custom kernel not being able to accept the smaller model for this quantization form.
- We used a client-based access control system and, as Prof. Theo said, we observe a starvation-based model in our resource requests.
The details will be analyzed further in individual status reports.
We are well on schedule and have actually hit our basic MVP. The only thing left to do now would be to iterate and try to improve the performance of our system. We are also exploring how to extract power and telemetry signals from the FPGA for simple visualizations.
Team_March22nd_Status_Report
This week, now that the FPGA has been both set up and connected to the campus WiFi network, we could easily complete all parts individually without having to pass the FPGA around.
We architected an alternate approach to multi-client based response that operates with the scheduler on the client side rather than the server side of things:
- When the FPGA receives a query, it sets an “in-use” flag on and starts operating on that query.
- Before the client sends a query to the FPGA, it checks the in-use flag.
- It waits until the in-use flag turns off before actually sending the query.
This approach leaves the system vulnerable to race conditions, since two clients can both see the flag off and send at the same time, but we have decided to accept that minor flaw.
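The check-then-send protocol and its race can be sketched as follows. This is a simplified model rather than the project's code: the `Board` class, `try_send`, and `racy_interleaving` are hypothetical names, and the network round trip is reduced to direct method calls.

```python
class Board:
    """Stand-in for the FPGA server: one in-use flag plus accepted queries."""

    def __init__(self):
        self.in_use = False
        self.active = []    # clients whose queries are currently accepted

    def receive(self, client):
        # On receiving a query, the board turns the in-use flag on.
        self.in_use = True
        self.active.append(client)

    def finish(self):
        # When the answer has been delivered, the flag turns off again.
        self.in_use = False
        self.active.clear()


def try_send(board, client):
    """Client-side scheduling: check the flag, send only if it is off."""
    if board.in_use:
        return False        # caller should wait and retry later
    board.receive(client)
    return True


def racy_interleaving(board):
    """Both clients check the flag before either query reaches the board."""
    a_sees_free = not board.in_use
    b_sees_free = not board.in_use   # B checks before A's query arrives
    if a_sees_free:
        board.receive("A")
    if b_sees_free:
        board.receive("B")
    return list(board.active)


if __name__ == "__main__":
    board = Board()
    print(try_send(board, "A"), try_send(board, "B"))  # second client must wait
    print(racy_interleaving(Board()))                   # both queries accepted
```

Because the check and the send are separate steps, nothing prevents the second interleaving. Closing the gap would need an atomic test-and-set on the server side, which is the trade-off accepted here.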
Andrew worked on running the model on the FPGA.
Anirudh set up a basic answer system and the in-use flag requirement.
Amelia refined the UI script so that it reads the flag and performs the wait before sending the query across.
For the time being, all individual components have been completed and at this stage we are moving on to the integration step. While we have tested everything, this can only be verified during integration. But it looks like we are well ahead of schedule given how easy integration should be.