Anirudhp_status_Feb8th2025

So this week while Andrew and Amelia were finalizing the model and setting up the FPGA, I dealt with the user interface and hotkey setup.

I used a Lua interface that sits above the macOS kernel to trigger software interrupts, and packaged the entire system into a single Python script that allows the hotkey “Cmd + G” to trigger our BitNet LLM of choice.
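
For reference, here is a minimal Python-only sketch of the hotkey trigger (the real setup routes through the Lua layer rather than pynput). It assumes the `pynput` package is installed, and the `./bitnet-cli` binary and its flags are hypothetical stand-ins for our actual inference entry point.

```python
# Minimal sketch (not the exact Lua-based setup): a global Cmd+G hotkey that
# shells out to a local BitNet inference binary and prints the result.
# `./bitnet-cli` and its flags are placeholders, not our real entry point.
import subprocess

from pynput import keyboard


def run_bitnet() -> None:
    # Placeholder prompt; in practice this would come from the clipboard
    # or the focused text field.
    prompt = "Summarize the selected text."
    result = subprocess.run(
        ["./bitnet-cli", "--prompt", prompt],  # hypothetical binary + flags
        capture_output=True,
        text=True,
    )
    print(result.stdout)  # the real script surfaces this in the UI instead


# "<cmd>" maps to the Command key on macOS under pynput; the listener needs
# Accessibility permissions to capture global hotkeys.
with keyboard.GlobalHotKeys({"<cmd>+g": run_bitnet}) as hotkeys:
    hotkeys.join()
```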

Currently, our BitNet model runs reasonably fast, taking around 4-5 seconds (measured with a manual stopwatch) to generate output. However, it does not stream the output token by token; instead it sends the entire output to the surface at once. This will have to be fixed over the next week.
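
The fix will likely look something like the sketch below: read the inference process's stdout incrementally and surface each chunk as it arrives, instead of waiting for the whole completion. The `./bitnet-cli` invocation is again a hypothetical stand-in, and the sketch assumes the binary flushes tokens to stdout as they are generated.

```python
# Sketch of the planned streaming fix: forward output chunks as they arrive
# rather than collecting the full completion first.
import subprocess
import sys

proc = subprocess.Popen(
    ["./bitnet-cli", "--prompt", "Hello"],  # hypothetical invocation
    stdout=subprocess.PIPE,
    text=True,
    bufsize=1,  # line-buffered; character-level streaming would read(1) instead
)

for line in proc.stdout:
    sys.stdout.write(line)  # surface each chunk immediately
    sys.stdout.flush()

proc.wait()
```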

While I have not taken any power measurements yet, I did notice that it turned my laptop’s fan on after I ran it 10-15 times in quick succession.

My goals for the next week are:

  1. Benchmark the model on purely macOS-based infrastructure.
  2. Allow the system to stream tokens rather than displaying the entire output at once.
  3. Figure out a way to take power measurements and benchmarks for the Mac-based runtime (see the sketch after this list).
  4. Benchmark the model for safety, and look into quantizing a DeepSeek-like system to reduce hallucinations and improve accuracy (reasoning-based models are inherently better in this regard).
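
For goal 3, one candidate approach is macOS's built-in `powermetrics` tool. The sketch below samples CPU package power while the model runs; it needs sudo, and the sampler names and output format vary across macOS versions and chips, so the parsing here is an assumption to verify rather than a settled method.

```python
# Hedged sketch for goal 3: sample CPU power with macOS's `powermetrics`
# while the model is running, then average the readings.
import re
import subprocess

SAMPLES = 5  # number of 1-second samples
result = subprocess.run(
    ["sudo", "powermetrics", "--samplers", "cpu_power",
     "-i", "1000", "-n", str(SAMPLES)],
    capture_output=True,
    text=True,
)

# Look for lines like "CPU Power: 1234 mW" in the sampler output
# (field names differ between Apple Silicon and Intel Macs).
readings = [int(m) for m in re.findall(r"CPU Power:\s+(\d+)\s*mW", result.stdout)]
if readings:
    avg = sum(readings) / len(readings)
    print(f"avg CPU power over {len(readings)} samples: {avg:.0f} mW")
else:
    print("No power readings parsed; check sampler names for this macOS version.")
```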

Anirudhp_29thJan2025

I am currently working on recreating the 1.58-bit FLUX model announced by ByteDance Research.

However, the model they have trained shows a 7.7x size reduction over the existing 23.5 GB FLUX model released by Black Forest Labs, which still leaves it in excess of 3 GB (23.5 GB / 7.7 ≈ 3 GB); it therefore cannot be accommodated on the FPGAs we have access to (max size 2 GB).

I have replicated the quantization process for the FLUX model; however, even though the model weights were open-sourced by Black Forest Labs, the training code and training data are not available. As a result, I am currently trying to adapt the quantization system to a fully open-source text-to-image system such as DALL-E Mini or the first FLUX.1 Dev model that was released.
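
For context, the 1.58-bit idea maps weights to the ternary set {-1, 0, +1} with a per-tensor scale. The sketch below shows a generic absmean-style ternary quantizer in the spirit of BitNet b1.58; it is not ByteDance's actual 1.58-bit FLUX quantizer, just an illustration of the rounding step.

```python
# Generic sketch of 1.58-bit (ternary) weight quantization with an absmean
# scale, in the style of BitNet b1.58. Illustrative only.
import torch


def quantize_ternary(w: torch.Tensor, eps: float = 1e-5) -> tuple[torch.Tensor, torch.Tensor]:
    """Map a weight tensor to {-1, 0, +1} plus a per-tensor scale."""
    scale = w.abs().mean().clamp(min=eps)   # absmean scale
    w_q = (w / scale).round().clamp(-1, 1)  # ternary codes
    return w_q, scale


def dequantize(w_q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return w_q * scale  # approximate reconstruction


w = torch.randn(4, 4)
w_q, s = quantize_ternary(w)
print(w_q)                                    # entries in {-1, 0, 1}
print((w - dequantize(w_q, s)).abs().mean())  # mean quantization error
```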

However, the FLUX model, when quantized to 1.58 bits, does produce excellent outputs that are almost on par with the original model.

E.g., the prompt “A man using a soldering iron to repair a broken electronic device” produces: [example output image]

My goal for the end of next week is to identify a way of using an FPGA that can accommodate the larger models (using either a DIMM slot or, in an extreme case, networking two FPGAs together).

If this is not possible, I will either distill the FLUX model or recreate the quantization code for DALL-E Mini.