We aim to address two current challenges in ML applications:
- Models are too heavyweight to run on local machines.
- Models consume excessive energy, making them environmentally unsustainable.
To address these problems, we plan to develop an FPGA-based accelerator, as a precursor to an ASIC, capable of running smaller, lightweight “bitnets” locally.
Bitnets are highly quantized versions of larger base models, and recent research from Microsoft, Tsinghua University, and the Chinese Academy of Sciences has shown that such models can be trained with minimal loss in output quality.
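As a concrete illustration, the BitNet b1.58 work quantizes each weight matrix to the ternary set {-1, 0, +1} with an absmean scaling rule; below is a minimal numpy sketch of that rule (the function name is ours, not from the paper):

```python
import numpy as np

def ternary_quantize(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize a weight matrix to {-1, 0, +1} using the absmean rule
    from the BitNet b1.58 paper (function name is ours)."""
    scale = float(np.mean(np.abs(w))) + 1e-8   # per-tensor absmean scale
    w_q = np.clip(np.round(w / scale), -1, 1)  # round, then clip to ternary
    return w_q.astype(np.int8), scale          # dequantize as w_q * scale

# Example: a float32 matrix collapses to three possible weight values.
w = np.random.randn(4, 4).astype(np.float32)
w_q, s = ternary_quantize(w)
print(w_q)  # entries are -1, 0, or +1; s recovers the magnitude at inference
```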
Our proof of concept will demonstrate these architectural improvements by achieving faster text generation than CPU/GPU systems of a similar size and power class. We will validate our approach using a heavier text-completion model.
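The architectural win we are chasing comes from ternary weights removing multipliers from the datapath: a matrix-vector product reduces to masked adds and subtracts, which map cheaply onto FPGA logic. Here is a software sketch of that arithmetic, meant to illustrate the idea rather than describe our actual hardware design:

```python
import numpy as np

def ternary_matvec(w_q: np.ndarray, x: np.ndarray, scale: float) -> np.ndarray:
    """Matrix-vector product with ternary weights: no weight multiplies,
    only masked additions and subtractions plus one final scaling."""
    pos = np.where(w_q == 1, x, 0.0).sum(axis=1)   # accumulate where weight = +1
    neg = np.where(w_q == -1, x, 0.0).sum(axis=1)  # accumulate where weight = -1
    return scale * (pos - neg)

# Sanity check against an ordinary dequantized matmul.
rng = np.random.default_rng(0)
w_q = rng.integers(-1, 2, size=(8, 16)).astype(np.int8)
x = rng.standard_normal(16)
assert np.allclose(ternary_matvec(w_q, x, 0.5), 0.5 * (w_q @ x))
```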
Currently, we are identifying the ideal bitnet model to accelerate, evaluating candidates against the following criteria:
- The model must be small enough to fit within the FPGA’s limited hardware resources (see the sizing sketch after this list).
- The model must produce outputs of high enough quality for applications such as text or code completion, with a longer-term goal of predictive text completion.
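For the first criterion, a rough sizing rule is enough to screen candidates: ternary weights pack into about 2 bits each, so weight storage is easy to bound. A back-of-the-envelope sketch, with illustrative (not chosen) parameter counts:

```python
def bitnet_weight_mib(n_params: float, bits_per_weight: float = 2.0) -> float:
    """Approximate weight storage for a ternary model: 2 bits/weight is a
    simple packing; ~1.58 bits is the information-theoretic floor."""
    return n_params * bits_per_weight / 8 / 2**20

# Illustrative parameter counts only; what actually fits depends on the
# target FPGA's block RAM and any external DRAM we attach.
for n in (125e6, 700e6, 3e9):
    print(f"{n/1e6:6.0f}M params -> {bitnet_weight_mib(n):8.1f} MiB")
```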
Current work:
- Amelia is investigating potential text-to-text models we could use (based on Microsoft’s bitnet framework).
- Andrew is looking into retraining a Flux text-to-image model at a smaller size (based on the work of Black Forest Labs).
- Anirudh is building a quantization and training system for the Flux text-to-image models so that they can be compressed into bitnets (based on the work of TikTok Research); a sketch of the standard training recipe follows this list.
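The standard recipe in the bitnet literature for this kind of compression is quantization-aware training with a straight-through estimator: the forward pass uses quantized weights while gradients bypass the rounding step. A minimal PyTorch sketch under that assumption (not TikTok Research’s actual code):

```python
import torch

class TernaryLinear(torch.nn.Linear):
    """Linear layer with quantization-aware training: the forward pass
    uses ternary weights, while the straight-through estimator lets
    gradients update the latent full-precision weights."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scale = self.weight.abs().mean() + 1e-8
        w_q = (self.weight / scale).round().clamp(-1, 1) * scale
        # Straight-through estimator: forward sees w_q, backward sees the
        # identity, so self.weight keeps receiving useful gradients.
        w = self.weight + (w_q - self.weight).detach()
        return torch.nn.functional.linear(x, w, self.bias)
```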