anirudhp_status_report_April19th

This week, I mainly focused on testing and wrapping up the project. In the process, I patched the following issues:

  1. Multi-client was reading the output too early because the ready flag was set well in advance.
  2. Output quality was improved by switching to a better BitNet model.
  3. An interesting bug was identified during user testing: not printing the output in the autocomplete while waiting does not mean the client isn’t reading it, so one client could still read another client’s output.
  4. This was fixed by blocking reads during the time steps when the output is not ready.
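
The fix in items (3) and (4) can be sketched as follows. The real script isn't shown in this report, so `read_output`, the `output_ready` flag, and the `owner` field are all hypothetical names for illustration:

```python
def read_output(channel, client_id):
    """Return the model output only when it is ready *for this client*.

    While waiting (output not ready, or ready for a different client),
    return None instead of touching the shared buffer -- this is what
    prevents one client from reading another client's output.
    """
    if not channel.get("output_ready", False):
        return None  # still generating: do not read the buffer yet
    if channel.get("owner") != client_id:
        return None  # output is ready, but it belongs to someone else
    return channel.get("buffer")
```

The key design point is that the gate sits on the read path, not the print path: suppressing the display alone was exactly the bug described above.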

Currently, my goals for next week are:

  1. Complete the final report
  2. Rehearse for the final presentation, which I will be delivering.

Some things that I specifically learnt over the course of the project are:

  1. Access controls and the like on an FPGA hard core. I had not worked with this before, and user testing helped me identify that it would be an issue.
  2. The software-defined interrupt interface on the Mac, which was the key technology I used for the keyword prompting.
  3. Presentation skills, specifically not dwelling on text that is already on the slides; I picked this up while rehearsing the presentation yesterday.

Team Status report_Apr19th

This week we put the final touches on our overall project and completed the presentation for the coming week.

We accomplished the following:

  1. User testing with experienced, tech-savvy users to find potentially “broken” behavior; so far we have found a few minor issues, covered in the individual status reports.
  2. Testing multi-client operation and verifying both functionality and security.
  3. Completing our presentation and report.

Over the next week, as we complete the presentation, we will also aim to do the following:

  1. Identify some way of bringing the FPGA to the demo (beyond requiring the Ethernet dock that we had requested earlier).
  2. Complete the report and poster for the final demo date.

Currently we are well ahead of schedule with only minimal work required next week.

anirudhp_status_report

This week, I wanted to verify that the 700M parameter model would run on the FPGA (given that our primary focus is improving output quality).

In order to verify this, I performed a full analysis of the memory resources available on the FPGA and added a memory profiling system to the CPU-based runtime on my laptop.
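
The report doesn't show the profiling hooks themselves, but a minimal stdlib-only version of the idea, using Python's `tracemalloc`, might look like this (the real runtime presumably also needs to track native allocations, which `tracemalloc` does not see):

```python
import tracemalloc

def profile_memory(fn, *args, **kwargs):
    """Run fn and report its peak Python heap usage in bytes.

    A stand-in sketch for the memory profiling system mentioned above:
    useful for checking whether a model's working set fits within the
    board's RAM budget before deploying it.
    """
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak)
    finally:
        tracemalloc.stop()
    return result, peak
```

For example, `profile_memory(lambda: bytearray(1_000_000))` reports a peak of at least one megabyte.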

For now, I believe the FPGA should not have trouble running the model once we remove PYNQ from the board. However, the always-on Python script will need to be adjusted.

In terms of milestones, we are still on track, but I ran into a few other things that need to change in order to remove PYNQ, and that additional work might put us marginally behind schedule.

Over the next week, I want to work out exactly how the interfacing and inference system will work once PYNQ is removed from the board.

Team Status Report(12th April)

This week, we had the following goals:

  1. Reevaluate the 700M model and confirm it would fit on the FPGA after removing PYNQ; we didn’t want any surprises.
  2. Identify further speedups that we could generate and pull the profiling data from the FPGA.
  3. Extend the UI script to a generic system that anyone connected to the CMU Wi-Fi network can use.

We did manage to achieve goal 1 (see anirudhp status report) but had some trouble with points (2) and (3). We are almost done with them, and as a result only slightly behind schedule; we would like to finish them early next week.

So over the next week we have the following goals:

  1. Remove PYNQ and sub in the new model.
  2. Run the advanced UI script on the rest of our laptops and get the system working
  3. Pull power and profiling graphs on demand.

This would essentially get our entire system done and ready for the final presentation and demos.

anirudhp_status_report_March29th

This week, my goal was to implement the model inference system on the FPGA.

I ended up running into a large number of issues and was forced to switch models to fix them. For now, the model switch is temporary: I switched because the dynamic memory usage exceeded the board’s resources, which theoretically should not happen, so it is almost certainly a bug in the inference system.

We shifted to afrideva/llama-160m-GGUF (llama-160m.q2_k.gguf), which uses the q2_k quantization scheme.
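
One way to load this GGUF file on a CPU runtime is via the llama-cpp-python bindings. The report doesn't say which runtime is actually used, so treat this as an assumption rather than the project's implementation:

```python
# From afrideva/llama-160m-GGUF on Hugging Face; the local path is a placeholder.
MODEL_FILE = "llama-160m.q2_k.gguf"

def load_model(model_path=MODEL_FILE, n_ctx=512):
    """Load the q2_k-quantized model with llama-cpp-python.

    Requires `pip install llama-cpp-python`. The import is kept local so
    this sketch can be read and its constants checked without the
    bindings installed.
    """
    from llama_cpp import Llama
    return Llama(model_path=model_path, n_ctx=n_ctx)
```

The quantization itself is baked into the GGUF file, so nothing extra is needed at load time to get q2_k weights.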

For now, we are well ahead of schedule given that this model produces decent output quality (37% hallucination rate). Changing from this model back to the original one is now a much lower priority.

My goal for next week is to increase performance from its current level (8-10 tokens/sec) up to a comfortable reading level; it is currently a bit slower than reading speed, as I notice a slight lag.
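
To make the comparison against the 8-10 tokens/sec figure above concrete, throughput can be measured over the model's token stream. This is a generic sketch, not the project's actual benchmarking code; the injectable clock just makes the arithmetic checkable:

```python
import time

def measure_tokens_per_sec(token_iter, clock=time.perf_counter):
    """Consume a stream of generated tokens and return (count, tokens/sec).

    `token_iter` stands in for the model's streaming output. The clock is
    read once before and once after consuming the stream.
    """
    start = clock()
    count = 0
    for _ in token_iter:
        count += 1
    elapsed = clock() - start
    return count, count / elapsed if elapsed > 0 else float("inf")
```

With a fake clock that advances 2 seconds over 20 tokens, this reports 10 tokens/sec.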

Team_status_report_March29th

This week, we set things up for the interim demo. Couple of changes that we made:

  1. We moved from a 700M parameter model to a 165M parameter model.
  2. We swapped our quantization technique because the custom kernel could not accept the smaller model in the previous quantization form.
  3. We used a client-based access control system, and, as Prof. Theo noted, we observe starvation behavior in our resource requests.

The details will be analyzed further in individual status reports.

We are well on schedule and have actually hit our basic MVP. The only thing left to do now would be to iterate and try to improve the performance of our system. We are also exploring how to extract power and telemetry signals from the FPGA for simple visualizations.

Team_March22nd_Status_Report

This week, now that the FPGA has been set up and connected to the campus Wi-Fi network, we could each complete our parts individually without having to pass the FPGA around.

We architected an alternative approach to multi-client response that runs the scheduler on the client side rather than the server side:

  1. When the FPGA receives a query, it sets an “in-use” flag and starts operating on that query.
  2. Before the client sends a query to the FPGA, it checks the in-use flag.
  3. The client waits until the in-use flag turns off before actually sending the query.

This leaves the system vulnerable to race conditions (two clients can both see the flag clear before either sends, since the check and the send are not atomic), but we have decided to accept that minor flaw.
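
The client-side wait described in steps (2) and (3) can be sketched as follows; `check_in_use` and `send` stand in for however the real script polls the FPGA's flag and transmits the query, so the names here are assumptions:

```python
import time

def send_when_free(query, check_in_use, send, poll_interval=0.1, timeout=30.0):
    """Client-side scheduler: poll the FPGA's in-use flag and send the
    query only once the flag is clear.

    check_in_use() -> bool : True while the FPGA is servicing a query.
    send(query)            : actually transmits the query to the board.

    Note the accepted race: another client can send between our last
    check and our send.
    """
    deadline = time.monotonic() + timeout
    while check_in_use():
        if time.monotonic() > deadline:
            raise TimeoutError("FPGA stayed busy past the timeout")
        time.sleep(poll_interval)
    send(query)
```

The timeout is a small addition over the loop described above, so a wedged board does not hang every client forever.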

Andrew worked on running the model on the FPGA.

Anirudh set up a basic answer system and the in-use flag requirement.

Amelia refined the UI script so that it reads the flag and performs the wait before sending the query across.

For the time being, all individual components are complete, and we are moving on to the integration step. While we have tested everything individually, it can only be fully verified during integration. Still, it looks like we are well ahead of schedule given how easy integration should be.

Anirudhp_March22nd_status_report

This week, we built towards a basic prototype that interfaces with the FPGA for accelerated inference.

My personal goals were to get used to the PYNQ-based interface and identify how I could use it to do the following:

  1. Raise an “in use” flag, which the client uses to decide whether to send a query.
  2. Directly wire the input to the output (basically send the query back to the user).

The objective of this system is to build toward the multi-client FPGA response system described in our team report and design reports.

For the time being, the above goals appear to be accomplished, but we can’t really verify this until the complete system integration is done over the following week. Given that the blocks are relatively simple and have been well tested internally, we seem to be ahead of schedule on our project.

team_status_report_March_15th

This week, we have received our FPGA and the plug-in wiring needed to supply the power.

In order to move forward according to our plans, we began with the following tasks in parallel:

  1. Booting Linux on the FPGA in order to start running the model on the embedded core.
  2. Extending the UI script to a multi-client FPGA-based approach.
  3. Working out the UI testing system (developing the form and our exact method of analyzing the responses).

The details of how each task was accomplished are in the individual status reports.

At this time, we have accomplished the following:

  1. Sending a query onto the FPGA.
  2. Booting Linux on the board.
  3. Finalizing the UI script study questions.

Currently we are ahead of schedule on the technical front but probably a bit behind schedule on the user study. For now, this is not too concerning given that it is very easy to adjust the UI script to user feedback, and we are already improving it for the FPGA extension.

anirudhp_status_report_March_15th

This week, with the FPGA in place, I aimed to extend the UI and network script to the FPGA interface.

First, I extended our earlier hotkey interface to send the query to the board for inference completion. This was done via the scp protocol so the data moves securely from the laptop to the FPGA over the local network.
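
The transfer step might look like the sketch below. The host, user, and remote path are placeholders (the real values depend on the board's address on the local network), and the runner is injectable so the command can be inspected without actually invoking scp:

```python
import subprocess

def send_query_file(local_path, host="fpga.local", user="xilinx",
                    remote_path="/home/xilinx/queries/", run=subprocess.run):
    """Copy a query file to the board over scp.

    scp rides on SSH, which is what provides the in-transit encryption
    mentioned above. Raises CalledProcessError if the copy fails.
    """
    cmd = ["scp", local_path, f"{user}@{host}:{remote_path}"]
    run(cmd, check=True)
    return cmd
```

In the real script this would be triggered from the hotkey handler once the query text has been written to `local_path`.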

At the moment, we have not yet managed to perform inference on the FPGA (accelerated or otherwise), so I have not yet been able to test the return of the file. However, I did look into a multi-client approach that is managed entirely from the laptop.

What I did was pull a flag from the FPGA that indicates whether it is servicing a query, and wrap this in a loop to ensure that the laptop does not send a query until the FPGA is free. As for the authentication system, we simply start the query text with a passphrase embedded in the script, which the FPGA uses to verify the user.
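
The passphrase-prefix scheme can be sketched as a pair of pure functions, one per side of the link. The passphrase value and function names here are placeholders, not the project's actual secret or API:

```python
PASSPHRASE = "open-sesame"  # placeholder; the real phrase lives in the script

def wrap_query(query, passphrase=PASSPHRASE):
    """Client side: prepend the shared passphrase to the query text."""
    return passphrase + "\n" + query

def verify_and_strip(payload, passphrase=PASSPHRASE):
    """FPGA side: accept the payload only if it starts with the passphrase.

    Returns the bare query on success, or None if verification fails.
    """
    prefix = passphrase + "\n"
    if not payload.startswith(prefix):
        return None
    return payload[len(prefix):]
```

Since the transfer itself rides on scp, the passphrase is not exposed on the wire; it only gates which payloads the board will act on.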

So far, even though we haven’t performed an actual inference on the board, we still seem to be ahead of schedule given the significant progress on our multi-client approach.