Team’s Status Report 02/19/2022

If you’ve already read our individual reports, you know that much of our week went to tracking down the Y2K22 issue with Vitis, so we won’t go on too much about it here. Instead, we’ll focus on the positives. 😊

To start, we now have a local instance of Vitis running on Alice’s laptop! This means we are no longer restricted to the build tools present on the ECE machines. The current Scotty3D code requires a newer version of CMake than the ECE machines provide, so they were unable to compile the program. Things should be more straightforward on a local machine, where we can easily install whatever packages we need. In any case, it seems like our build platform woes are hopefully coming to an end…

In terms of algorithmic improvements, we met a couple times throughout the week to analyze the code in the fluid.cpp file itself, rather than just identifying the dependencies in the overall algorithm. The main goals of these meetings were to decide how to represent each of the datatypes, what data we could store on chip in the BRAMs, and how many copies of that data we could actually afford to store. One structure of particular importance was the neighbor map, and how we would actually implement it on the FPGA. In the regular implementation, the neighbor map is simply an unordered hash map that uses a nearest quantized point to index into a list of neighboring particles. While this implementation is fine in software, if we actually want a performant hardware implementation, we’ll likely need to manually implement the hashmap as a BRAM array of pointers into another BRAM array of particles. Aside from that, we also took a look at the looping structures in the code and assessed their ability to benefit from pipelining. In fact, we found that a lot of the steps in the code could be pipelined. Steps 2 and 3 seem to have no interdependencies, so we can probably unroll and pipeline those two steps. Other than that, we also took note of the instances of multithreading code that we would need to strip from the fluids.cpp file due to deprecation.
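To make the “BRAM array of pointers to another BRAM array of particles” idea more concrete, here is a minimal sketch of one way it could look. It is not the Scotty3D code: the grid size, the particle limit, and every name in it (NeighborGrid, cell_index, build_grid, cell_count) are hypothetical. It swaps the hash map for a dense grid laid out in fixed-size arrays, which is the kind of structure that maps onto on-chip memories, and fills it with a counting sort.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical sizes, chosen only for illustration.
constexpr int GRID_DIM = 8;  // cells per axis
constexpr int NUM_CELLS = GRID_DIM * GRID_DIM * GRID_DIM;
constexpr int MAX_PARTICLES = 512;

struct Vec3 { float x, y, z; };

struct NeighborGrid {
    // Particles of cell c are sorted_ids[cell_start[c]] .. sorted_ids[cell_start[c + 1] - 1].
    std::array<uint16_t, NUM_CELLS + 1> cell_start{};
    std::array<uint16_t, MAX_PARTICLES> sorted_ids{};
};

// Quantize a position to a flat cell index, clamping to the grid bounds.
inline int cell_index(const Vec3& p, float cell_size) {
    auto q = [&](float v) {
        int i = static_cast<int>(v / cell_size);
        return i < 0 ? 0 : (i >= GRID_DIM ? GRID_DIM - 1 : i);
    };
    return (q(p.z) * GRID_DIM + q(p.y)) * GRID_DIM + q(p.x);
}

// Counting sort: count particles per cell, prefix-sum into cell_start,
// then scatter particle ids into their cell's slice of sorted_ids.
inline void build_grid(const Vec3* pos, int n, float cell_size, NeighborGrid& g) {
    assert(n >= 0 && n <= MAX_PARTICLES);
    std::array<uint16_t, NUM_CELLS> count{};
    for (int i = 0; i < n; ++i) ++count[cell_index(pos[i], cell_size)];

    g.cell_start[0] = 0;
    for (int c = 0; c < NUM_CELLS; ++c)
        g.cell_start[c + 1] = static_cast<uint16_t>(g.cell_start[c] + count[c]);

    std::array<uint16_t, NUM_CELLS> cursor{};
    for (int i = 0; i < n; ++i) {
        int c = cell_index(pos[i], cell_size);
        g.sorted_ids[g.cell_start[c] + cursor[c]] = static_cast<uint16_t>(i);
        ++cursor[c];
    }
}

// Number of particles stored in cell c.
inline int cell_count(const NeighborGrid& g, int c) {
    return g.cell_start[c + 1] - g.cell_start[c];
}
```

A neighbor query would then visit the cells around a particle’s own cell and read each cell’s slice of sorted_ids. Both arrays have sizes known at compile time, which is what we would need in order to place them in BRAM.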

For the next week, our primary goal is to get a build of the fluids.cpp kernel working in Vitis HLS, as this will give us the baseline results we need.

Ziyi’s Status Report for 02/19/2022

Much of this week was spent realizing that Vitis has been broken on the ECE machines since the start of the new year. The very same Y2K22 bug that plagued Microsoft Exchange was affecting the build process for compiling to hardware on Vitis. Unfortunately, this took a bit of dredging to find on the Vitis forums, and so we lost a couple days of progress, as we thought that it was an issue with our personal configuration rather than an issue with the system itself. Thankfully, after we pointed out the issue to Professor James Hoe, he was able to quickly apply the patch that fixes the build tools. Finally, we were able to compile a project and generate a PetaLinux image to flash onto the FPGA. But then we needed to actually interface with the FPGA. Unfortunately, due to some weird configurations, the FPGA’s internal WiFi was not automatically set up, so we needed to interface with it through mouse and keyboard (we were also missing the mini DisplayPort cable, so we had to overnight that).

After finally gaining access to the FPGA interface, we were able to connect the Ultra96 to our local router. Now, we are able to remote into the FPGA whenever we are connected to that router. We’ll still need to do some poking around in order to gain access to the FPGA when on campus, as our apartment network does not play nicely with port forwarding, but I’m sure we can figure something out. Either way, this is a good start for having a more streamlined development platform. We might decide to set up the board in 1307, so we can just VPN in, but it’s flexible.

Other than this, we did some specific code analysis on the fluids.cpp file, but we’ll talk more about this in the team report.

Next week, my main goal is to compile a baseline kernel of the step2 function (which consists of the main body of the fluid simulation computational kernel) using Vitis HLS. This will involve significant tinkering with the code and perhaps refactoring into more friendly datatypes. The best case scenario is that everything just compiles, but that’s likely quite the pipe dream.

Jeremy Dropkin Status 02/19/2022

Unfortunately, we spent much of this week discovering that the version of Vitis installed on the CMU ECE number machines had a bug. For a long time we were trying to work out why we were hitting build errors on the ECE machines and could not compile any project in Vitis, even the basic examples. Eventually we determined that the cause was the way Vitis stored integers in a format that broke on Jan 1, 2022.
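As an illustration of how a date-derived integer can break on that exact day, here is a minimal sketch. It assumes the commonly described Y2K22 pattern, where a timestamp is packed into the decimal number YYMMDDhhmm and stored in a signed 32-bit integer; we have not inspected how Vitis does this internally, and the function names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Pack a timestamp into the decimal number YYMMDDhhmm. The result is
// computed in 64 bits so that the packing itself never overflows.
inline int64_t encode_yymmddhhmm(int yy, int mm, int dd, int hh, int min) {
    assert(yy >= 0 && yy <= 99 && mm >= 1 && mm <= 12 && dd >= 1 && dd <= 31);
    assert(hh >= 0 && hh <= 23 && min >= 0 && min <= 59);
    int64_t v = yy;
    v = v * 100 + mm;
    v = v * 100 + dd;
    v = v * 100 + hh;
    v = v * 100 + min;
    return v;
}

// True if the packed value can be stored in a signed 32-bit integer.
inline bool fits_in_int32(int64_t v) {
    return v >= std::numeric_limits<int32_t>::min() &&
           v <= std::numeric_limits<int32_t>::max();
}
```

Under this assumption, the last minute of 2021 packs to 2112312359, which is below the signed 32-bit maximum of 2147483647, while the first minute of 2022 packs to 2201010000, which is above it.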

In addition to dealing with compiler errors, we also figured out which portions of the fluid rendering algorithm we can pipeline and apply loop unrolling to. I personally gained a much deeper understanding of how the fluid rendering algorithm works internally and how it structures data. This helped me understand how we will structure the hardware so that we can massively accelerate the algorithm. I also began thinking about how to schedule requests, and worked on the slides for the design presentation.

I think that we are slightly behind on schedule due to the issue with Vitis, but due to the way we created our schedule, I think that we are in a very recoverable position.

Alice’s Status Report for 2/19/2022

This week I was able to get Vitis set up on my laptop. We are now using my laptop as the testing platform since getting Scotty3D to compile was near impossible on the Andrew Linux machines, so we decided to pivot to using a machine that already had Scotty3D working. This process unfortunately took about 12 hours this week due to trying out various machines and hard drives and having to clear up space for Vitis, Vitis HLS, and Vivado (the install required 200 GB).

I am currently working on getting Scotty3D to build in Vitis on my laptop. In particular, I am working on getting Vitis to respect the CMake build system of Scotty3D. Based on some reading, it seems promising that Vitis will be able to do this, since there is a section of the Xilinx documentation website on Vitis and Makefiles. In addition to this, I’m also working with Ziyi & Jeremy on rewriting the Scotty3D code to be more hardware friendly, for instance by getting rid of recursion in the collision detection function and getting rid of unnecessary member variables in the Particle object (among other things).
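To give a flavor of what removing recursion can look like, here is a minimal sketch of a tree traversal that uses an explicit fixed-size stack instead of recursive calls. It is not the Scotty3D collision code: the Box and Node layouts, the MAX_STACK bound, and the find_object function are all hypothetical, and the query is a simple point-in-box test rather than a real collision test.

```cpp
#include <cassert>
#include <vector>

struct Box { float lo[3], hi[3]; };

struct Node {
    Box box;
    int left, right;  // child indices into the node array; -1 marks a leaf
    int object;       // payload id, meaningful only for leaves
};

inline bool contains(const Box& b, const float p[3]) {
    for (int a = 0; a < 3; ++a)
        if (p[a] < b.lo[a] || p[a] > b.hi[a]) return false;
    return true;
}

// Hypothetical bound on pending nodes; a fixed size keeps storage static.
constexpr int MAX_STACK = 32;

// Return the object id of the first leaf whose box contains p, or -1.
// Node 0 is assumed to be the root.
inline int find_object(const std::vector<Node>& nodes, const float p[3]) {
    if (nodes.empty()) return -1;
    int stack[MAX_STACK];
    int top = 0;
    stack[top++] = 0;
    while (top > 0) {
        const Node& n = nodes[stack[--top]];
        if (!contains(n.box, p)) continue;  // prune this subtree
        if (n.left < 0) return n.object;    // reached a leaf
        assert(top + 2 <= MAX_STACK);
        stack[top++] = n.right;             // visited second
        stack[top++] = n.left;              // visited first
    }
    return -1;
}
```

The loop visits nodes in the same order a recursive depth-first traversal would, but its memory use is bounded by MAX_STACK, which is easier to reason about for a hardware-friendly rewrite.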

We realized that all the rendering should be done on the board’s CPU rather than on the FPGA fabric, so we don’t need OpenGL to work on the FPGA. On the CPU side, OpenGL should behave like any other C++ library.

Next week I definitely want to get Scotty3D built through Vitis. Among software rewrites, I will also start looking at making a lightweight version of Scotty3D so we can separate out only what we are aiming to accelerate from the rest of Scotty3D. I also want to follow through on my command-line interface for an easier workflow. I am *slightly* behind on my schedule, but not worried at all as I estimate it to be only 1 or 2 days of work behind.

Team Report – 2/12/2022

Currently the most significant risk that we are facing is that the C++ code isn’t compiling. The code uses a lot of C++17 features that we are not familiar with. We are making good progress on learning about these features, however, and are in communication with the main developer of Scotty3D about our various compilation issues.

The most important task for the project at the moment is getting OpenGL to work on the FPGA. Alice and Ziyi worked on it briefly this week, but more work is necessary.

No changes were made to the system design or schedule.

Alice’s Status Report for 02/12/2022

This week I worked on getting rid of dead code and re-working some of the imports so that fluid.cpp would be reliant on as few other libraries as possible, and started to make the flowchart for how the different data structures work together. I generated some dependency graphs in order to visualize which parts of Scotty3D are critical to the fluid simulation and UI. 

I also worked heavily with Ziyi this week. I helped Ziyi debug some compilation issues that arose when trying to compile Scotty3D in Vitis. There are various issues with the CMake and C++ versions, since a large portion of the codebase is implemented with C++17 features. We also took on getting OpenGL to work on the Ultra96 together.

The initial steps for this project are slow and a bit confusing, but we are still making progress at a rate that we expect. Next week I hope to complete the flowchart diagram and work with Ziyi to get Scotty3D compiled in Vitis, and make significant progress on getting OpenGL + simple graphics demo to run on the FPGA.

Jeremy’s Status Report for 02/12/2022

This week was a familiarization week for me. I personally do not have experience with HLS or Xilinx FPGAs, so I was trying to get familiar with the platform we are using. To do so I went through the tutorial documents at https://xilinx.github.io/Vitis-Tutorials/2021-2/build/html/docs/Getting_Started/Vitis/Getting_Started_Vitis.html in order to understand the general workflows and structures of applications. This allowed me to gain a better understanding of how we need to restructure the Scotty3D library in order to accelerate it on the hardware.

I also began looking at using the DisplayPort interface on the board, since we will need it to display our rendered outputs. I found several helpful documents, such as https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842318/ZynqMP+Standalone+DisplayPort+Driver. This page links to several examples of how to use DisplayPort drivers for applications on the Xilinx board. I anticipate that getting this to work correctly will be somewhat finicky, which is why I have started to look into how it works early on.

Although initially starting up any project will be somewhat slow due to figuring out workflows, I believe that we are making steady progress and I think that we are on schedule. Next week I would like to make progress on working on getting the data from the CPU on the board to the FPGA.

Alice’s Status Report – 02/05/2022

This week I did some benchmarking on the different chunks of the fluid simulation algorithm (implemented in the existing Scotty3D codebase). They are as follows:

Some notable observations I made this week:

  • gradWspiky and WPoly6 (see the paper) are distinct compute kernels which are used in both “Get Newton’s Method Scaling Factor” and “Particle Position Correction”.
  • The particle data structure is currently an unordered map of indices to a vector of Particle objects that gets reconstructed after every position update. This means that neighbor searching is essentially an O(1) operation, since the index is predetermined and the search can be confined to a 9×9 cube around the spatial voxel in question.
    • I think a new data structure (hash grid) like the paper would be more efficient, so I will potentially be working on this in the next week.
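For reference, the two kernels named in the first observation can be sketched as below. These are the standard Poly6 and Spiky forms from the SPH literature, written with a hypothetical Vec3 type and hypothetical function names; the constants and edge-case handling in Scotty3D’s WPoly6 and gradWspiky may differ.

```cpp
#include <cassert>
#include <cmath>

constexpr float PI = 3.14159265358979f;

struct Vec3 { float x, y, z; };

// Poly6 kernel: W(r, h) = 315 / (64 pi h^9) * (h^2 - |r|^2)^3 for |r| <= h, else 0.
inline float w_poly6(const Vec3& r, float h) {
    assert(h > 0.0f);
    const float r2 = r.x * r.x + r.y * r.y + r.z * r.z;
    const float h2 = h * h;
    if (r2 > h2) return 0.0f;
    const float d = h2 - r2;
    const float h9 = h2 * h2 * h2 * h2 * h;
    return 315.0f / (64.0f * PI * h9) * d * d * d;
}

// Gradient of the Spiky kernel:
// grad W(r, h) = -45 / (pi h^6) * (h - |r|)^2 * r / |r| for 0 < |r| <= h, else 0.
inline Vec3 grad_w_spiky(const Vec3& r, float h) {
    assert(h > 0.0f);
    const float len = std::sqrt(r.x * r.x + r.y * r.y + r.z * r.z);
    if (len <= 0.0f || len > h) return Vec3{0.0f, 0.0f, 0.0f};
    const float h6 = h * h * h * h * h * h;
    const float s = -45.0f / (PI * h6) * (h - len) * (h - len) / len;
    return Vec3{s * r.x, s * r.y, s * r.z};
}
```

Both functions are pure arithmetic on a single offset vector and a smoothing radius, which is part of why they look like good candidates for standalone compute kernels.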

A lot of work this week was done for the neural rendering project that we had previously, so this status update does not reflect 12 hours of work for this new fluid simulation project. On the other hand, though this means we are *technically* somewhat behind since we recently pivoted from neural rendering to fluid simulation acceleration, I personally am feeling confident about our project since I am quite familiar with the current codebase and the paper, and the code has already been tested and run on an ARM chip (Jeremy’s Macbook with an M1 processor).

In the next week I hope to make a diagram on what data structures are important and how the algorithm updates each of them (i.e. scene object BVH, vector of total particles, vector of scaling factors, etc.). I also hope to start work on redesigning the UI to be more user-friendly and work with Ziyi & Jeremy on figuring out how to get OpenGL working on an FPGA, and also looking at the code to see what changes can be made in the software to make it easier to port over to the FPGA.

Ziyi’s Status Report – 2/05/22

This first week of the project was spent analyzing the feasibility of accelerating the fluid simulation workload. My first step was to analyze the fluid simulation algorithm to assess where we could stand to benefit from increased parallelization.

From an initial viewing, the “for all particles i do” loops are an obvious avenue for parallelization. For each request of the fluid simulator, we expect to process 512 particles at a time; we could attempt to fully unroll into 512 separate threads, but this could take up a large amount of hardware. Instead, we’d probably want to do a batched pipeline, where we dispatch some N-sized batch of particles into the pipeline at a time. The exact parameters of the pipeline (such as batch width and pipeline depth) will be handled at the low level by the HLS tool itself and at the high level by the relative importance of different sections of the code. For instance, we might expect that the loop from lines 20-23 will occupy far more of the runtime than lines 1-4. As such, Amdahl’s law tells us that we should first focus on deriving speedup for lines 20-23. To first order, we may say that performance is directly correlated with the amount of hardware resources we assign to a task (as we are just instantiating more threads); so concretely, we may desire a 16-wide pipeline for lines 1-4 and perhaps a 64-wide pipeline for lines 20-23. Of course, we will arrive at more exact figures once we fully crack open the code, map it onto the hardware resources, and determine how much we have to work with.
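The Amdahl’s law argument can be made concrete with a small sketch. Amdahl’s law says that if a fraction p of the runtime is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The function name below is hypothetical, and the runtime fractions in the note that follows are made-up placeholders, not measurements of the Scotty3D code.

```cpp
#include <cassert>

// Overall speedup when a fraction p of the runtime (0 <= p <= 1)
// is accelerated by a factor s (s > 0) and the rest is unchanged.
inline double amdahl_speedup(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

Suppose, purely hypothetically, that lines 20-23 account for 60% of the runtime and lines 1-4 for 10%. Speeding up lines 20-23 by 16x gives amdahl_speedup(0.6, 16), which is about 2.29x overall, while giving the same 16x to lines 1-4 yields amdahl_speedup(0.1, 16), only about 1.10x. This is why the larger loop should get hardware resources first.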

In terms of progress, I would say that we are certainly a bit behind, due to the pivot from the UNISURF project to this. However, I will say that a lot of the investigations we performed for UNISURF in regards to the hardware resources of the Ultra96 and the organization of data between the CPU and the FPGA fabric map nicely to this new project.

In terms of deliverables we would like to have completed by next week, I would personally like to have the entire Vivado/Vitis project set up. This would mean that we first have to ensure that the base program works nicely on the Ultra96’s ARM core, and then we’d have to designate the different parts of the Vivado project, such as the specific compute kernel. Since the Fluid Simulation library is only a portion of the Scotty3D program, I’ll have to investigate whether there is anything special I need to set up in order to ensure that the different compute tasks are correctly running on the FPGA. As for whether Vitis can port in the code, since everything is written locally in the Scotty3D library (no reliance on external libraries), I don’t think that we’ll run into any troubles on that front. Nevertheless, in terms of getting this up and running, I doubt it’ll be as simple as tossing the code into Vitis and hitting build. I will be reviewing 643 documentation to see if I can set up a more streamlined compilation and testing platform.
