This first week of the project was anaylzing the feasibility of accelerating the fluid simulation workload. My first step was to analyze the fluid simulation algorithm to assess where we could stand to benefit from the increased parallelization.
From an initial viewing, we can obviously observe that the “for all particles i do” loops introduce an obvious avenue of parallelization. For each request of the fluid simulator, we expect to process 512 particles at a time; we could attempt to fully unroll into 512 separate threads, but this could take up a large amount of hardware. Instead, we’d probably want to do a batched pipeline, where we dispatch some N-sized batch of particles into the pipeline at a time. In terms of the exact parameters of the pipeline (parameters such as batch width, pipeline depth), these will be handled on the low-level by the HLS tool itself and on the high-level by the relative importance of different sections of the code. For instance, we might expect that the loop from lines 20-23 will occupy much of the runtime than lines 1-4. As such, Amdahl’s law tells us that we should first focus on deriving speedup for lines 20-23. On the first order, we may say that performance is directly correlated with the amount of hardware resources we assign to a task (as we are just instantiating more threads); so concretely, we may desire a 16-wide pipeline for lines 1-4 and perhaps a 64-wide pipeline for lines 20-23. Of course, we will arrive at some more exact figures once we fully crack open the code and perform some mappings to the hardware resources and determine how much we have to work with.
In terms of progress, I would say that we are certainly a bit behind, due to the pivot from the UNISURF project to this. However, I will say that a lot of the investigations we performed for UNISURF in regards to the hardware resources of the Ultra96 and the organization of data between the CPU and the FPGA fabric map nicely to this new project.
In terms of deliverables we would like to have completed by next week, I would personally like to have the entire Vivado/Vitis project set up. This would mean that we first have to ensure that the base program works nicely on the Ultra96’s ARM core, and then we’d have to designate the different parts of the Vivado project such as the specific compute kernel. Since the Fluid Simulation library is only a portion of the Scotty3D program, I’ll have to investigate to see if there is anything special I’ll need to set up in order to ensure that the different compute tasks are correctly running on the FPGA. In terms of whether Vitis can port in the code, since everything is written locally on the Scotty3D library (no reliance on external libraries), I don’t think that we’ll run into any troubles on that front. Nevertheless, in terms of getting this up an running, I doubt it’ll be as simple as tossing the code into Vitis and hitting build. I will be reviewing 643 documentation to see if I can set up a more streamlined compilation and testing platform.