Alice’s Status Report – 04/30/2022

Turns out the Chamfer distance numbers I had gotten earlier were off; the FPGA results were actually even closer to the reference than I expected :). I got support for multiple spheres working, and I came up with a new visualization to confirm that the fluid simulation traces matched as well.

My goal for this upcoming week is to re-generate all of the reference fluid simulations, record them and capture their log files, then get log files from the FPGA fluid simulations and compute Chamfer distance numbers. I also want to use the new visualization method to compare the fluid simulations as point clouds for easier visual verification. This will also serve as a nice graphic for all of our presentation materials.
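For anyone unfamiliar with the metric, the sketch below shows roughly what the point-cloud comparison boils down to: a symmetric Chamfer distance over two particle traces. The `Point` struct and the brute-force nearest-neighbor search here are illustrative assumptions, not our actual logging or comparison code.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical particle record parsed from a simulation log file.
struct Point { float x, y, z; };

static float sqDist(const Point &a, const Point &b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Symmetric Chamfer distance: the average nearest-neighbor distance from A
// to B, plus the same from B to A. Brute force, so O(|A| * |B|).
float chamferDistance(const std::vector<Point> &a, const std::vector<Point> &b) {
    auto oneWay = [](const std::vector<Point> &from, const std::vector<Point> &to) {
        float sum = 0.0f;
        for (const Point &p : from) {
            float best = std::numeric_limits<float>::max();
            for (const Point &q : to)
                best = std::min(best, sqDist(p, q));
            sum += std::sqrt(best);
        }
        return sum / static_cast<float>(from.size());
    };
    return oneWay(a, b) + oneWay(b, a);
}
```

A lower value means the two traces stay closer together over the run, which is why it works as a pass/fail-style number against the reference simulation.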

We’ve more or less locked down our final code version; we just need to verify everything and document all of our hardware improvements with timing numbers. Feeling great that our project finally has something visual for results 🙂

Jeremy’s Status Report – 04/30/2022

This week I primarily worked on integration and on the poster for our final presentation. I also worked a bit with Ziyi on debugging the AXI ports. We integrated the different optimizations together and collected more data on how each one performs.

I am happy with our progress in this project.

Team’s Status Report – 04/30/2022

With most of the baseline operations implemented and verified, most of this last week was spent finalizing integration and adding some extra touches to the algorithm and the different scenes we want to demo.

On the hardware side, we finalized how we would handle the interfacing between the host CPU and the FPGA fabric itself. After some testing, we realized that our port-widening scheme resulted in some faulty values being transferred. We believe this is because our datatype is only 24 bits wide, while the ports have to be multiples of 32 bits wide, and that the resulting offset is messing up our pointer-casting data-packing scheme. However, this is not that big of an issue, as we can just use a longer burst transaction length. Furthermore, testing the different kernels on the FPGA yielded similar timings as well; it seems that as long as the transaction was bursted, the specifics did not really matter. (Amdahl’s Law suggests that we should turn our attention elsewhere :^) ). Other than that, we decided to unroll a couple more things, and we have mostly locked in our final code.
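To illustrate the suspected misalignment (the values and buffer sizes below are made-up examples, not our actual kernel interface): if 24-bit values are packed back-to-back in a byte buffer and then read through a 32-bit view, every word after the first straddles an element boundary, so the reinterpreted words no longer line up with the original values.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    // Three hypothetical 24-bit values, packed back-to-back (3 bytes each).
    const uint32_t vals[3] = {0x00ABCDEF, 0x00123456, 0x00789ABC};
    uint8_t packed[12] = {0};
    for (int i = 0; i < 3; ++i)
        std::memcpy(&packed[i * 3], &vals[i], 3); // low 3 bytes, little-endian

    // Reading the same bytes back through a 32-bit view, as a widened port
    // effectively does, misaligns everything after the first element.
    uint32_t words[3];
    std::memcpy(words, packed, sizeof(words));
    for (int i = 0; i < 3; ++i)
        std::printf("word %d = 0x%08X (value %d = 0x%08X)\n", i, words[i], i, vals[i]);
    // word 0 ends up holding value 0's three bytes plus value 1's low byte,
    // and the mismatch only compounds from there.
    return 0;
}
```

Padding each 24-bit value out to a full 32-bit word (and eating the longer burst) sidesteps the problem entirely, which is the route we took.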

On the software side, we focused on adding support for displaying new scenes. There was some exploration into supporting other types of primitives such as quads and even meshes, but with only a week or two left we decided to just add support for more spheres (the alternative would have been looking into compiling OpenGL on the Ultra96 and/or a major refactor to support a general Shapes class, along with changing a lot of std::move operations). We also figured out a new visualization approach so that we can compare the fluid simulation traces directly as point clouds. We still need to time the data transfer between the Ultra96 and the host computer, but that should be trivial since it’s just an scp operation measured with a built-in timing tool in the terminal.

Overall we’re pretty excited that our project actually works and that we have cool things to show off now. We just need to nail down a lot of specific timing numbers to address our requirements, but we’re confident that we can get that done in the next couple of days, in time for the presentations.

Ziyi’s Status Report – 04/30/2022

During this final week, we pretty much just finished up the integration of the entire project. We merged Jeremy’s changes to the loop logic with my logic for the burst transfers and verified that the results made sense. After this, we investigated unrolling and pipelining a few more loops and managed to squeeze out a bit more performance. As shared in the presentation, here is a summary of some of the effects of the different optimizations.


As a note, these results are only estimates of the kernel operation itself; they do not reflect the combined cost of the kernel and its associated data transfer.
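For anyone curious what "unrolling and pipelining a few more loops" means in practice, here is a minimal Vitis HLS-style sketch; the kernel name, loop bodies, and buffer sizes are placeholders for illustration and are not our actual fluid kernel.

```cpp
// Illustrative only: names, sizes, and the math are placeholders.
extern "C" void scale_particles(float *data, int n, float k) {
#pragma HLS INTERFACE m_axi port=data bundle=gmem
    float buf[256]; // assumes n <= 256 for this sketch

read_loop: // burst-read into on-chip memory; II=1 issues one read per cycle
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        buf[i] = data[i];
    }

compute_loop: // partial unroll trades extra LUTs/DSPs for lower latency
    for (int i = 0; i < 256; ++i) {
#pragma HLS UNROLL factor=8
        buf[i] *= k;
    }

write_loop: // burst-write the results back out
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        data[i] = buf[i];
    }
}
```

The gains from each extra pragma shrink quickly once the memory transfers dominate, which is basically the Amdahl's Law point from the hardware section above.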

 

Other than integration, the rest of this week was pretty much spent on preparing presentation materials, including the final presentation, the poster, and the final video.