Jeremy’s Status Report for 02/12/2022

This week was a familiarization week for me. I do not have prior experience with HLS or Xilinx FPGAs, so I spent my time getting familiar with the platform we are using. To do so, I went through the tutorial documents at https://xilinx.github.io/Vitis-Tutorials/2021-2/build/html/docs/Getting_Started/Vitis/Getting_Started_Vitis.html to understand the general workflow and structure of Vitis applications. This gave me a better understanding of how we will need to restructure the Scotty3D library in order to accelerate it on the hardware.

I also began looking at using the DisplayPort interface on the board: since we want to display our rendered outputs, getting DisplayPort working will be necessary. I found several helpful documents, such as https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842318/ZynqMP+Standalone+DisplayPort+Driver, which links to several examples of how to use the DisplayPort drivers in applications on the Xilinx board. I anticipate that getting this to work correctly will be somewhat finicky, which is why I have started looking into it early.

Although starting any project is somewhat slow at first while we figure out workflows, I believe we are making steady progress and that we are on schedule. Next week I would like to make progress on getting data from the CPU on the board over to the FPGA fabric.
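To make that concrete for myself, below is a rough sketch of what I expect the host-side flow to look like in the Vitis OpenCL model, based on the Getting Started tutorial above. This is not our actual code: the kernel name fluid_step, its argument layout, and the .xclbin filename are placeholders I made up for illustration, and error handling is omitted.

// Hypothetical host-side sketch of moving particle data from the ARM core to an
// FPGA kernel and back, following the standard Vitis OpenCL host flow.
// "fluid_step", its arguments, and the xclbin filename are made-up placeholders.
#define CL_HPP_TARGET_OPENCL_VERSION 120
#define CL_HPP_MINIMUM_OPENCL_VERSION 120
#include <CL/cl2.hpp>
#include <fstream>
#include <vector>

int main() {
    constexpr size_t N = 512;                       // particles per dispatch
    std::vector<float> pos(3 * N), pos_out(3 * N);  // packed xyz positions

    // Pick the accelerator device and load the FPGA binary.
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);
    std::vector<cl::Device> devices;
    platforms[0].getDevices(CL_DEVICE_TYPE_ACCELERATOR, &devices);
    cl::Device device = devices[0];
    cl::Context context(device);
    cl::CommandQueue q(context, device);

    std::ifstream f("fluid_step.xclbin", std::ios::binary);
    std::vector<unsigned char> bin((std::istreambuf_iterator<char>(f)),
                                   std::istreambuf_iterator<char>());
    cl::Program::Binaries bins{bin};
    cl::Program program(context, {device}, bins);
    cl::Kernel kernel(program, "fluid_step");

    // Device buffers backed by the host vectors.
    cl::Buffer in_buf(context, CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY,
                      sizeof(float) * pos.size(), pos.data());
    cl::Buffer out_buf(context, CL_MEM_USE_HOST_PTR | CL_MEM_WRITE_ONLY,
                       sizeof(float) * pos_out.size(), pos_out.data());
    kernel.setArg(0, in_buf);
    kernel.setArg(1, out_buf);
    kernel.setArg(2, static_cast<int>(N));

    // Move inputs to the FPGA, run the kernel, and bring the results back.
    q.enqueueMigrateMemObjects({in_buf}, 0 /* host -> device */);
    q.enqueueTask(kernel);
    q.enqueueMigrateMemObjects({out_buf}, CL_MIGRATE_MEM_OBJECT_HOST);
    q.finish();
    return 0;
}

The actual argument layout will depend on how we end up restructuring the particle data, so this is just meant to capture the overall shape of the flow.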

Alice’s Status Report – 02/05/2022

This week I did some benchmarking on the different chunks of the fluid simulation algorithm (as implemented in the existing Scotty3D codebase).

Some notable observations I made this week:

  • gradWspiky and WPoly6 (see the paper) are distinct compute kernels that are used in both “Get Newton’s Method Scaling Factor” and “Particle Position Correction”.
  • The particle data structure is currently an unordered map from voxel indices to vectors of Particle objects, and it gets reconstructed after every position update. This means that neighbor searching is essentially an O(1) operation per particle, since each particle’s voxel index is predetermined and the search can be confined to the 3×3×3 block of voxels around the spatial voxel in question.
    • I think a new data structure (a hash grid) like the one in the paper would be more efficient, so I will potentially be working on this in the next week; a rough sketch of what I have in mind follows this list.
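The sketch below is not the current Scotty3D code, just the shape of the hash grid I am considering. The Particle struct, the cell size, and the 21-bit key packing are assumptions for illustration, and the returned candidates would still need to be filtered by an actual distance check against the kernel radius.

// Hypothetical sketch of a uniform hash grid for neighbor lookup, roughly in the
// spirit of the spatial hashing described in the paper. Not the current Scotty3D
// code; Particle, cell_size, and the key packing are placeholder assumptions.
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Vec3 { float x, y, z; };
struct Particle { Vec3 pos; /* velocity, lambda, ... */ };

struct HashGrid {
    float cell_size;  // typically the SPH kernel radius h
    std::unordered_map<int64_t, std::vector<int>> cells;  // cell key -> particle indices

    // Pack three signed cell coordinates into one 64-bit key (assumes each fits in 21 bits).
    static int64_t key(int ix, int iy, int iz) {
        return ((int64_t(ix) & 0x1FFFFF) << 42) |
               ((int64_t(iy) & 0x1FFFFF) << 21) |
                (int64_t(iz) & 0x1FFFFF);
    }

    int cell_coord(float v) const { return (int)std::floor(v / cell_size); }

    // Rebuild the grid after each position update.
    void build(const std::vector<Particle>& particles) {
        cells.clear();
        for (int i = 0; i < (int)particles.size(); i++) {
            const Vec3& p = particles[i].pos;
            cells[key(cell_coord(p.x), cell_coord(p.y), cell_coord(p.z))].push_back(i);
        }
    }

    // Gather candidate neighbors from the 3x3x3 block of cells around p; callers
    // still filter the candidates by actual distance against the kernel radius.
    std::vector<int> neighbors(const Vec3& p) const {
        std::vector<int> out;
        int cx = cell_coord(p.x), cy = cell_coord(p.y), cz = cell_coord(p.z);
        for (int dx = -1; dx <= 1; dx++)
            for (int dy = -1; dy <= 1; dy++)
                for (int dz = -1; dz <= 1; dz++) {
                    auto it = cells.find(key(cx + dx, cy + dy, cz + dz));
                    if (it != cells.end())
                        out.insert(out.end(), it->second.begin(), it->second.end());
                }
        return out;
    }
};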

A lot of my work this week went toward the neural rendering project we had previously, so this status update does not reflect 12 hours of work on the new fluid simulation project. Though this means we are *technically* somewhat behind, since we only recently pivoted from neural rendering to fluid simulation acceleration, I am feeling confident about our project: I am quite familiar with the current codebase and the paper, and the code has already been tested and run on an ARM chip (Jeremy’s MacBook with an M1 processor).

In the next week I hope to make a diagram of the important data structures and how the algorithm updates each of them (i.e. the scene object BVH, the vector of all particles, the vector of scaling factors, etc.). I also hope to start redesigning the UI to be more user-friendly, to work with Ziyi and Jeremy on figuring out how to get OpenGL working with the FPGA, and to look at the code to see what software changes would make it easier to port over to the FPGA.

Ziyi’s Status Report – 2/05/22

I spent this first week of the project analyzing the feasibility of accelerating the fluid simulation workload. My first step was to go through the fluid simulation algorithm and assess where we stand to benefit from increased parallelization.

From an initial reading, the “for all particles i do” loops are an obvious avenue of parallelization. For each request to the fluid simulator, we expect to process 512 particles at a time; we could attempt to fully unroll this into 512 separate threads, but that would take up a large amount of hardware. Instead, we would probably want a batched pipeline, where we dispatch some N-sized batch of particles into the pipeline at a time. The exact parameters of the pipeline (batch width, pipeline depth, and so on) will be determined at the low level by the HLS tool itself and at the high level by the relative importance of the different sections of the code. For instance, we might expect the loop on lines 20-23 of the algorithm to occupy much more of the runtime than lines 1-4; Amdahl’s law then tells us that we should first focus on deriving speedup for lines 20-23. To first order, performance is directly correlated with the amount of hardware we assign to a task (since we are just instantiating more parallel lanes), so concretely we might want a 16-wide pipeline for lines 1-4 and perhaps a 64-wide pipeline for lines 20-23. Of course, we will arrive at more exact figures once we fully crack open the code, map it to hardware resources, and determine how much we have to work with.
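To sketch what I mean by a batched pipeline, here is a made-up placeholder kernel, not our actual mapping: the flattened arrays, the batch width of 16, and the loop body are assumptions for illustration, and the interface pragmas and DDR burst details are omitted.

// Hypothetical HLS sketch of one batched "for all particles i" loop. The flattened
// arrays, the batch width of 16, and the placeholder body are assumptions; the
// real parameters will come from resource budgeting once we map the actual code.
#define MAX_PARTICLES 512
#define BATCH 16

extern "C" void correct_positions(const float* lambda_in, const float* pos_in,
                                  float* pos_out) {
    // Local copies so a whole batch can be read and written per cycle once partitioned.
    float lambda[MAX_PARTICLES];
    float px[MAX_PARTICLES], py[MAX_PARTICLES], pz[MAX_PARTICLES];
#pragma HLS ARRAY_PARTITION variable=lambda type=cyclic factor=16
#pragma HLS ARRAY_PARTITION variable=px type=cyclic factor=16
#pragma HLS ARRAY_PARTITION variable=py type=cyclic factor=16
#pragma HLS ARRAY_PARTITION variable=pz type=cyclic factor=16

read_in:
    for (int i = 0; i < MAX_PARTICLES; i++) {
#pragma HLS PIPELINE II=1
        lambda[i] = lambda_in[i];
        px[i] = pos_in[3 * i + 0];
        py[i] = pos_in[3 * i + 1];
        pz[i] = pos_in[3 * i + 2];
    }

batches:
    for (int b = 0; b < MAX_PARTICLES; b += BATCH) {
#pragma HLS PIPELINE II=1
    lanes:
        for (int j = 0; j < BATCH; j++) {
#pragma HLS UNROLL
            int i = b + j;
            // Placeholder per-particle work; the real body would be the position
            // correction derived from the scaling factors.
            px[i] += lambda[i];
            py[i] += lambda[i];
            pz[i] += lambda[i];
        }
    }

write_out:
    for (int i = 0; i < MAX_PARTICLES; i++) {
#pragma HLS PIPELINE II=1
        pos_out[3 * i + 0] = px[i];
        pos_out[3 * i + 1] = py[i];
        pos_out[3 * i + 2] = pz[i];
    }
}

The batch width shows up twice here: as the unroll factor of the inner lanes loop and as the cyclic partitioning factor on the local arrays, which is what lets all 16 lanes read their elements in the same cycle.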

In terms of progress, I would say that we are certainly a bit behind due to the pivot from the UNISURF project to this one. However, a lot of the investigation we performed for UNISURF regarding the hardware resources of the Ultra96 and the organization of data between the CPU and the FPGA fabric maps nicely onto this new project.

In terms of deliverables for next week, I would personally like to have the entire Vivado/Vitis project set up. This means first ensuring that the base program runs correctly on the Ultra96’s ARM core, and then designating the different parts of the Vivado project, such as the specific compute kernel. Since the fluid simulation library is only a portion of the Scotty3D program, I’ll have to investigate whether anything special needs to be set up to ensure that the different compute tasks actually run on the FPGA. As for whether Vitis can port in the code at all, everything is written locally in the Scotty3D library (no reliance on external libraries), so I don’t think we’ll run into trouble on that front. Nevertheless, I doubt getting this up and running will be as simple as tossing the code into Vitis and hitting build, so I will be reviewing the 643 documentation to see if I can set up a more streamlined compilation and testing flow.
