Team’s status report – Team A3: N-body

Team Report:

September 23rd 2023

ABET #2 says … consideration of public health, safety, and welfare, as well as global, cultural, social, environmental, and economic factors…. Which of these factors did your team consider as you developed your project proposal? Identify at least one factor of importance to your project and write a sentence explaining why it is important to consider it.

In our project proposal for accelerating the N-body algorithm using an FPGA, we considered two main social and environmental goals: helping physicists speed up computation tasks and improving the energy efficiency of running the simulation. This is important because FPGAs are known for their potential to significantly reduce energy consumption compared to traditional CPUs or GPUs when performing parallel processing tasks. By utilizing FPGAs, we aim to minimize the carbon footprint of our computation process, contributing to a more environmentally sustainable approach to scientific research. The pursuit of power efficiency also aligns with the broader trend in computing technology, where hardware development is running into the power wall as a result of the breakdown of Dennard scaling.

As a team we spent the first half of the week on our project proposal, which had to be presented on Wednesday (9/20). We spent the rest of the week setting up our FPGA (Yuhe), exploring parallelism with HLS (Rene, Abhinav) and looking at how we would set up our graphical display (Abhinav).

September 30th 2023

This week as a team we finalized the size of the simulation we will be working with: 10000 particles, as we are limited by the LUTs available on our Ultra96v2 FPGA. We also refactored our CPU benchmark to reflect this. Moreover, we confirmed that we will transfer data in and out of the FPGA over Wi-Fi using scp. Lastly, as a group we spent some time working on our design presentation, which is due the following week.

October 7th 2023

This week as a team we all worked on slightly different things and are hoping to tie them all together by the end of next week. We finished our design presentation together. Abhinav and Rene worked on developing the C++ code for the naive implementation,

Please enumerate one or more principles of engineering, science and mathematics that your team used to develop the design solution for your project.

We are using parallel computing principles to develop our design solution and to achieve our target of a 10x speedup. We explored concepts such as Block RAM, loop unrolling and pipelining, which combine hardware-level and software-level parallelism. Because it relies on hardware-level parallelism, this project also draws on hardware engineering. Lastly, our computations themselves rely on Newton’s law of gravity, which is a principle of science.
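As a rough illustration of the science principle above, the following is a minimal sketch of one timestep of a naive all-pairs N-body update based on Newton's law of gravity. It is not our actual implementation: the `Particle` layout, the function name `step_naive`, the softening term `eps` and the simple Euler-style integration are all assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical particle layout, chosen only for this sketch.
struct Particle {
    float x, y, z;     // position
    float vx, vy, vz;  // velocity
    float mass;
};

// Advance every particle by one timestep with the naive O(N^2) all-pairs sum.
// G (gravitational constant), dt (timestep) and eps (softening length) are
// illustrative parameters, not values taken from the project.
void step_naive(std::vector<Particle>& p, float G, float dt, float eps) {
    assert(dt > 0.0f);
    const std::size_t n = p.size();
    std::vector<float> ax(n, 0.0f), ay(n, 0.0f), az(n, 0.0f);

    // Accumulate the acceleration on particle i from every other particle j:
    // a_i = sum_j G * m_j * (r_j - r_i) / |r_j - r_i|^3
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;
            const float dx = p[j].x - p[i].x;
            const float dy = p[j].y - p[i].y;
            const float dz = p[j].z - p[i].z;
            const float r2 = dx * dx + dy * dy + dz * dz + eps * eps;
            const float inv_r = 1.0f / std::sqrt(r2);
            const float s = G * p[j].mass * inv_r * inv_r * inv_r;
            ax[i] += s * dx;
            ay[i] += s * dy;
            az[i] += s * dz;
        }
    }

    // Update velocities first, then positions, using the new velocities.
    for (std::size_t i = 0; i < n; ++i) {
        p[i].vx += ax[i] * dt;
        p[i].vy += ay[i] * dt;
        p[i].vz += az[i] * dt;
        p[i].x += p[i].vx * dt;
        p[i].y += p[i].vy * dt;
        p[i].z += p[i].vz * dt;
    }
}
```

The two nested loops are what make the problem attractive for parallel hardware: every (i, j) interaction within a timestep is independent of the others, so the inner loop can in principle be unrolled or pipelined.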

October 23rd 2023

As a team, we set up our first naive N-body simulations on Vitis to generate a hardware emulation report and verify that we have chosen an appropriate simulation size. We spent a large chunk of our week before fall break finishing our almost 10-page design report. We are currently working on setting up our Vitis HLS environment to accept and run an N-body simulation given an input set of particles. We hope to have this ready, along with a non-zero speedup over our CPU benchmark, for our interim demo, as mentioned in our schedule.

28th October

As a team we completed our Vitis HLS environment setup. We can now run simulations that read data from memory. We have also begun experimenting with different optimizations and their impact on hardware utilization, using smaller-scale simulations as an initial test (to confirm we don’t run out of hardware when we try them on actual simulation sizes). We have also begun looking at writing programs for our graphical display. Each of us is on schedule with our Gantt chart, and we hope to present this in our interim demo in a couple of weeks.

4th November

As a team we finished setting up our Vitis project in addition to the Vitis HLS project from last week. This enabled us to synthesize our code onto the FPGA. We now have a complete pipeline between our user and the FPGA, where the user can scp some input data and get back our simulation results. We plan to show this at our demo next week. We have also started working on our graphical visualization. While we have not set this up on the Arm core, we have a standalone script that accepts a set of particles at different timesteps and displays how the input file indicates they move. We also continued our optimisations with fixed point types and got started with pipelining and HLS tasks.
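To give a feel for the precision trade-off that comes with fixed point types, here is a minimal sketch that emulates a signed 32-bit fixed point value with 16 fractional bits using plain integers. The Q16.16 format, the constant `FRAC_BITS` and the helper names are assumptions for illustration only; the formats we actually tried are not listed in this report.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical format: signed 32-bit value with 16 fractional bits (Q16.16).
constexpr int FRAC_BITS = 16;

// Convert a real value to fixed point, rounding to the nearest representable step.
std::int32_t to_fixed(double v) {
    assert(std::fabs(v) < 32767.0);  // must fit in the 16 integer bits
    return static_cast<std::int32_t>(std::lround(v * (1 << FRAC_BITS)));
}

// Convert a fixed point value back to a real value.
double to_double(std::int32_t f) {
    return static_cast<double>(f) / (1 << FRAC_BITS);
}

// Multiply two fixed point values. The 64-bit intermediate avoids overflow,
// and the shift drops the extra fractional bits (truncation). Right-shifting a
// negative value is arithmetic on common compilers and guaranteed from C++20.
std::int32_t fixed_mul(std::int32_t a, std::int32_t b) {
    return static_cast<std::int32_t>(
        (static_cast<std::int64_t>(a) * b) >> FRAC_BITS);
}
```

In this format the smallest non-zero step is 2^-16 (about 1.5e-5), so `to_fixed(1e-6)` is zero, and the product of two values near 0.001 also collapses to zero. Small quantities such as inverse-distance terms between far-apart particles can fall below that step, which is one way a fixed point N-body kernel can lose accuracy.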

11th November 2023

As a team we mainly focused on the demos on Monday and Wednesday. Beyond that, Abhinav has been working on the graphical simulation and the test verification code, Rene has been working on further parallelization optimizations to get us closer to our 10x speedup, and Yuhe worked on setting up the Vitis project and is looking at better ways to connect the board to her laptop. As a team we are on track, and next week we plan to spend time together sharing our discoveries while doing further integration.

18th November 2023

Abhinav worked on the graphical simulation and verification software, and Rene finalized the pipelining, unrolling and task optimizations. Yuhe worked on batching and BRAM optimizations to improve our memory read/write performance and to enable Rene’s optimisations mentioned above. As a team we want to try synthesizing our final optimized code base before the end of the coming break so we can focus on fine-tuning our accuracy and preparing for our final milestones.

Answer to this week’s questions

Over the semester, our team has honed its collaborative skills, employing strategies like SMART goal-setting and regular progress check-ins through Zoom meetings to ensure task alignment and accountability. Given that we are all studying different aspects of low-level software engineering and computer architecture, it has been refreshing to interact with each other and appreciate each other’s areas of expertise. We’ve also leveraged management tools and techniques like our Gantt chart and GitHub tickets for task delegation and milestone tracking, enhancing our workflow efficiency. Reflective practices, including retrospectives (like our status reports), have been instrumental in our growth, allowing us to continuously refine our processes and adapt to new challenges effectively. We have also tried to roughly follow our Gantt chart; although things have changed throughout the semester, we see it as a rough guide to what we are supposed to achieve.

2nd December 2023

As a team, this week was a bit alarming at first: we had to scrap our fixed point implementations due to their lack of precision and then start again from square one to achieve our speedup goal. Together we tried several builds and iterations and finally settled on an implementation that moved our entire data set onto BRAM, with some batching to enable loop unrolling, giving us a 25x speedup! We then spent the rest of the week successfully connecting the board to Rene’s laptop and interfacing it with a monitor, keyboard and mouse, while making some progress on running our graphical display. Our plan for next week is to finish our display while working on our final presentation and demo.
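The general idea of keeping the data set in on-chip memory and batching the inner loop so it can be unrolled can be sketched as follows. This is a simplified one-dimensional example under stated assumptions, not our synthesized kernel: the sizes `N` and `BATCH`, the softening constant `EPS2`, the function name and the exact pragma options are all chosen for illustration (the `type=` keyword in `ARRAY_PARTITION` follows recent Vitis HLS releases and may need adjusting). A regular C++ compiler ignores the `#pragma HLS` lines, so the code also runs on a CPU for checking.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr std::size_t N = 64;      // illustrative size, not our 10000 particles
constexpr std::size_t BATCH = 8;   // illustrative batch / unroll factor
constexpr float EPS2 = 1e-4f;      // softening term; also makes the i == j term zero
static_assert(N % BATCH == 0, "N must be a multiple of BATCH");

// Compute the x-acceleration on every particle (with G = 1) from 1-D positions.
void accel_x_batched(const float x_in[N], const float m_in[N], float ax_out[N]) {
    assert(x_in && m_in && ax_out);

    // Local copies: in HLS, arrays like these are typically placed in BRAM,
    // and cyclic partitioning lets BATCH elements be read in the same cycle.
    float x[N], m[N];
#pragma HLS ARRAY_PARTITION variable=x type=cyclic factor=8
#pragma HLS ARRAY_PARTITION variable=m type=cyclic factor=8
    for (std::size_t i = 0; i < N; ++i) {
        x[i] = x_in[i];
        m[i] = m_in[i];
    }

    for (std::size_t i = 0; i < N; ++i) {
        // One partial sum per lane so the unrolled copies do not depend on each other.
        float partial[BATCH] = {0.0f};
#pragma HLS ARRAY_PARTITION variable=partial type=complete
        for (std::size_t jb = 0; jb < N; jb += BATCH) {
#pragma HLS PIPELINE II=1
            for (std::size_t k = 0; k < BATCH; ++k) {
#pragma HLS UNROLL
                const std::size_t j = jb + k;
                const float dx = x[j] - x[i];
                const float r2 = dx * dx + EPS2;
                const float inv_r = 1.0f / std::sqrt(r2);
                partial[k] += m[j] * dx * inv_r * inv_r * inv_r;
            }
        }
        // Reduce the per-lane partial sums into the final result.
        float sum = 0.0f;
        for (std::size_t k = 0; k < BATCH; ++k) sum += partial[k];
        ax_out[i] = sum;
    }
}
```

Splitting the accumulation into `BATCH` independent partial sums is a common way to avoid a single loop-carried floating point addition limiting how far the inner loop can be unrolled; the summation order differs from a plain loop, so results can differ by small rounding amounts.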

9th December

As a team, we managed to increase our speedup to 40x since last week! Additionally, Yuhe prepared and presented our final presentation. Abhinav and Rene worked on finishing our graphical display implementation. Abhinav created a web server that accepts data from the FPGA via HTTP. It also offers options to interact with the received data and download it directly, instead of having to reconnect to the board, which suits our use better than the original design goal of having the FPGA present this on a display!

Team Question: List all unit tests and overall system test carried out for experimentation of the system. List any findings and design changes made from your analysis of test results and other data obtained from the experimentation.

Speedup Tests:

For CPU Benchmark: We ran the CPU implementation on a set of inputs several times (we chose 10 runs per test case) and took the average time.
For FPGA Time: We ran our code on the same inputs as the CPU, plus a few more randomly generated inputs (to ensure our speedup was not specific to any particular workload or data pattern), over the same number of iterations as above and then computed the speedup.
From these tests we were able to identify which implementations and optimisations worked. This led us to focus on our BRAM-based implementation and pivot completely away from using any DRAM, and also helped us identify why our fixed point implementations gave us so much speedup.
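The measurement procedure above can be sketched as a small timing helper. This is a minimal example, assuming the workload can be wrapped in a callable; the names `mean_seconds` and `speedup` are hypothetical and not taken from our benchmark code.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Run `run` `repeats` times and return the mean wall-clock time in seconds.
double mean_seconds(const std::function<void()>& run, int repeats) {
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;  // monotonic, suited to interval timing
    double total = 0.0;
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = clock::now();
        run();
        const auto t1 = clock::now();
        total += std::chrono::duration<double>(t1 - t0).count();
    }
    return total / repeats;
}

// Speedup of the accelerated run relative to the CPU baseline.
double speedup(double cpu_seconds, double fpga_seconds) {
    assert(fpga_seconds > 0.0);
    return cpu_seconds / fpga_seconds;
}
```

For example, with a mean CPU time of 10.0 s and a mean FPGA time of 0.25 s, `speedup(10.0, 0.25)` returns 40. Averaging over several runs, as in the 10 runs per test case described above, reduces the effect of run-to-run noise on the reported figure.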

Accuracy Tests:

We passed the same input files to both the CPU and FPGA implementations and compared the outputs (both at every iteration and in the final results) to evaluate the accuracy.
To further test our precision, we changed our CPU implementation to use double precision floating point types so that the reference maintained as much precision as possible (this became particularly important when verifying our fixed point implementations, which led us to focus on floating point).
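One way to express such a comparison is as a maximum relative error against the double precision reference. The sketch below is an assumption about how the check could be written; the function name, the `floor` guard against division by near-zero reference values and the choice of metric are illustrative and not taken from our verification code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest relative difference between single precision outputs under test and
// a double precision reference. `floor` keeps the denominator away from zero.
double max_relative_error(const std::vector<float>& test,
                          const std::vector<double>& reference,
                          double floor = 1e-12) {
    assert(test.size() == reference.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < test.size(); ++i) {
        const double denom = std::max(std::fabs(reference[i]), floor);
        const double err = std::fabs(static_cast<double>(test[i]) - reference[i]) / denom;
        worst = std::max(worst, err);
    }
    return worst;
}
```

Applying this to the particle coordinates at every iteration, as well as to the final state, shows whether error accumulates over time or stays bounded.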

Graphic Display Test:

Making sure the Flask app worked locally
Sending and receiving to a local app
Hosting webpage on AWS EC2 instance
Hosting Flask app on EC2 instance
Making sure gunicorn worked for app deployment
Making sure nginx worked as a reverse proxy
Putting all the pieces together and then hosting on AWS EC2 instance using gunicorn and using nginx to configure reverse proxy
Connecting board to a Wi-Fi
Sending curl requests from the board
Bash script which executes the run command and produces output files iteratively
Bash script which does the above and then also sends the new files to the webserver using a curl command.