So this week while Andrew and Amelia were finalizing the model and setting up the FPGA, I dealt with the user interface and hotkey setup.
I used a Lua automation layer that sits above the macOS event system to intercept key events, and packaged the whole flow into a single Python script so that the hotkey "CMD + G" triggers our BitNet LLM of choice.
Currently, our BitNet runs reasonably fast, taking around 4-5 seconds (timed with a manual stopwatch) to generate the output. However, it does not stream the output token by token; instead it sends the entire output to the surface at once, something that will have to be fixed over the next week.
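As a sketch of the streaming fix, assuming the model runtime is launched as a subprocess that writes tokens to stdout (the command itself is a placeholder here, not our actual invocation), the difference between the current all-at-once behavior and the streamed version is roughly:

```python
import subprocess

def run_batch(cmd, prompt):
    """Current behavior: block until the model finishes, then return everything."""
    result = subprocess.run(cmd, input=prompt, capture_output=True, text=True)
    return result.stdout

def run_streamed(cmd, prompt, on_token=print):
    """Goal: surface each chunk of output as soon as the subprocess emits it."""
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered
    )
    proc.stdin.write(prompt)
    proc.stdin.close()
    tokens = []
    for line in proc.stdout:  # yields lines as they arrive, not all at once
        token = line.rstrip("\n")
        tokens.append(token)
        on_token(token)  # hand each piece to the UI immediately
    proc.wait()
    return tokens
```

Here `cmd` would be swapped for whatever actually invokes the BitNet runtime, and `on_token` would be the surface-update callback instead of `print`.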
While I have not taken any power measurements yet, I did notice that running it 10-15 times in quick succession turned my laptop's fan on.
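Before the formal benchmarking next week, the manual stopwatch could be replaced with a small timing harness; a minimal sketch, where `generate` stands in for whatever callable invokes the model, might look like:

```python
import statistics
import time

def benchmark(generate, prompt, runs=10):
    """Time repeated generations and report latency stats in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(prompt)  # stand-in for the actual model call
        latencies.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(latencies),
        "stdev_s": statistics.stdev(latencies) if runs > 1 else 0.0,
        "min_s": min(latencies),
        "max_s": max(latencies),
    }
```

Running it 10-15 times like this would also reproduce the quick-succession load that spun the fan up.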
My goals for the next week are:
- Benchmark the model on a purely macOS-based infrastructure.
- Allow the system to stream tokens rather than displaying everything at once.
- Figure out some way to take power measurements and benchmarks for the macOS-based runtime.
- Benchmark the model for safety, and look into quantizing a DeepSeek-like system to reduce hallucinations and improve accuracy (reasoning-based models are inherently better in this regard).
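For the power-measurement goal, macOS ships a `powermetrics` CLI (requires sudo) that prints periodic text samples. Assuming its output contains lines of the form `CPU Power: 1234 mW` (this format is an assumption and varies across macOS versions and chips, so it should be verified against real output), a small parser for a captured run could look like:

```python
import re
import statistics

# Assumed sample-line format from `sudo powermetrics --samplers cpu_power`;
# verify against actual output on the target machine before relying on it.
CPU_POWER_RE = re.compile(r"CPU Power:\s*([\d.]+)\s*mW")

def parse_cpu_power(powermetrics_output):
    """Extract CPU power samples (mW) from powermetrics text output."""
    return [float(m.group(1)) for m in CPU_POWER_RE.finditer(powermetrics_output)]

def average_power_mw(powermetrics_output):
    """Average the extracted samples, or return None if none were found."""
    samples = parse_cpu_power(powermetrics_output)
    return statistics.mean(samples) if samples else None
```

The idea would be to start `powermetrics` around a generation, capture its stdout, and feed the text to these helpers to get an average draw per run.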