Team’s Status Report for 04/27/2024

This past week was spent again on integration, as well as on the final presentation slides. We hope to be done with integration in the next two days, leaving us some time to do testing and benchmarking before the poster deadline.

Testing & Validation

The highest-priority tests concern correctness (finding a valid and optimal path) and performance (achieving some speedup). Correctness is the simpler of the two to check: we compare the FPGA-generated path against the one produced by our correct C implementation of RRT + A*. Performance will be benchmarked by timing the RRT kernel and comparing it against the software version. We will also measure end-to-end performance to see how our improvements affect the system overall.
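A minimal sketch of the timing side of this plan (the helper names here are illustrative, not from our codebase): wrap each RRT invocation with a monotonic clock, then divide the elapsed times to get speedup.

```c
#include <time.h>

/* Hypothetical timing helpers for benchmarking RRT.
 * Usage sketch:
 *   struct timespec t0, t1;
 *   clock_gettime(CLOCK_MONOTONIC, &t0);
 *   ... run RRT (SW or HW version) ...
 *   clock_gettime(CLOCK_MONOTONIC, &t1);
 *   double secs = elapsed_sec(t0, t1);
 */
double elapsed_sec(struct timespec a, struct timespec b) {
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Speedup of the hardware version relative to software. */
double speedup(double sw_sec, double hw_sec) {
    return sw_sec / hw_sec; /* > 1.0 means the FPGA version is faster */
}
```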

Matt’s Status Report for 04/27/2024

This past week I worked on 1) debugging the HLS version of RRT, 2) system integration, mainly the communication between the Ultra96 and the laptop, and 3) the final presentation slides.

I was able to compile using HLS last week, but the tree that RRT generated, while it looked right at first glance, turned out to have errors. Namely, RRT converged towards a configuration of the state space that was not even a valid RRT tree usable for a motion plan. After K iterations of RRT in an empty state space (no obstacles), I saw two subtrees being grown from the initial start and end points, but the two subtrees did not connect. For some reason, further RRT iterations past K did not change the tree. I suspect one of two possible causes: either I made an error when refactoring the code from being modular with many functions into one main function, or I made an error when altering the code to work in hardware (e.g. replacing uses of the C rand() function with an LFSR for random numbers). I discussed this problem with Chris, and we plan on meeting tomorrow to debug further.
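For reference, the rand() replacement is the kind of thing a Galois LFSR handles well in hardware. The sketch below uses the taps 32/22/2/1 (a known maximal-length choice); these are not necessarily the taps in our kernel, so a bug like mine could hide in the tap constant or in seeding with zero.

```c
#include <stdint.h>

/* 32-bit Galois LFSR step. Each call produces the next pseudo-random
 * state. Taps 32,22,2,1 (mask 0x80200003) give a maximal-length
 * sequence; the state must be seeded nonzero or it stays stuck at 0. */
uint32_t lfsr_next(uint32_t state) {
    uint32_t lsb = state & 1u;  /* bit shifted out this step */
    state >>= 1;
    if (lsb)
        state ^= 0x80200003u;   /* apply feedback taps */
    return state;
}
```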

While the UART library I found last week worked for transferring a small number of bytes, I was not able to transfer the RRT data without losing information. Since I could not find a fix, I decided to move away from UART and swapped to a more reliable but slower method of transferring the data: sending the file over the network via scp.

Team’s Status Report for 04/20/2024

This past week was spent continuing the integration effort, this time with the Ultra96 instead of the Kria. After discussing with Professor Kim, we decided to pivot to the Ultra96 after it became clear through discussions with AMD that setup for the Kria would require many more steps. Thus, we decided to stick with what we know better given the short timeline that we have left.

Pivoting to the Ultra96 means we will no longer be doing perception nor kinematics on the same device as motion planning/RRT. Perception will be done on a laptop, and perception data will be sent to the Ultra96 over UART. Only RRT will be done on the Ultra96, and the tree data will be sent back to the laptop, which will develop a motion plan using A* (our prior troubles with A* will not be an issue here since it will be done in software). The motion plan will then be passed to our kinematics module.

Matt’s Status Report for 04/20/2024

The week prior to last week, I had been working with an AMD representative, Andrew, and an ECE PhD student at CMU, Shashank, to get the Kria board working. Andrew had sent me three tutorials to work through, all of which I did, and I ran into errors when running two of them. After discussing with Shashank and sharing the logs with him, we determined a few things wrong with how I was using Vitis. First, the scripts that I was using to run Vitis 2022.2 had some minor bugs that Shashank fixed. He also pointed out that I cannot work on Vitis projects on AFS, and so I moved my work to the local scratch directory of the ECE machine that I was working on. After this I was able to run Vitis and all three of the tutorials without failures.

At this stage, Andrew sent me a tutorial on how to specify and build a platform for the Kria. However, after discussing the pacing of our project with Professor Kim, we decided to fall back to the Ultra96, which had more knowns and a smoother development path than the Kria; the main remaining unknown for the Kria was exactly which of its provided modules we wanted to use. The tutorial that Andrew had sent was required to create a platform file that would specify to Vitis what resources were available to the accelerator we were building. Doing this would require Vivado, and while I was able to follow the tutorial, I was not confident in my ability to adapt it and develop my own hardware platform suited to our project. I did not originally expect to have to do this step when planning to use the Kria: I had taken a lot of setup steps for granted, all of which were done by Shashank and the 18-643 staff for the course labs. Thus, that week we decided to move away from the Kria, which sadly tosses out a lot of work that my partners did to set up the Kria as an end-to-end robotics system.

This past week I finally got a working hardware version of RRT built. Due to the complications with A* search that we experienced right before the interim demo, we have separated RRT and A* so that RRT alone will be done on the FPGA. Adapting the C version of RRT into a synthesizable hardware version that the HLS compiler could understand was difficult: I kept running into a cryptic and completely undocumented Vitis error, “naming conflict of rtl elements”. Even after thoroughly searching Google I could not find anything, so I resorted to reshaping my HLS code in different ways. Namely, I refactored our RRT code so that it lives essentially entirely in one function (this is better for HLS anyway), and I forced many automatic optimizations off (namely loop pipelining). Eventually I got a version that compiles and gives correct results. What is left for me now is figuring out which optimizations I can turn back on so that our accelerator can be as performant as possible.
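The flat-kernel shape that finally compiled looks roughly like this. This is an illustrative sketch, not our exact kernel: the buffer size is made up, the grow loop body is elided, and here the function just seeds the node array with the start and goal points.

```c
#define MAX_NODES 4096  /* illustrative fixed capacity, not our real value */

/* Illustrative flat kernel: one top-level function, fixed-size buffers,
 * and loop pipelining forced off while debugging. The real GROW loop
 * (sample, nearest, steer, collision check, insert) is elided. */
int rrt_top(const float start[3], const float goal[3],
            float nodes[MAX_NODES][3]) {
INIT:
    for (int d = 0; d < 3; d++) {
#pragma HLS PIPELINE off
        nodes[0][d] = start[d];  /* node 0: start configuration */
        nodes[1][d] = goal[d];   /* node 1: goal configuration */
    }
    /* GROW: for (int i = 2; i < MAX_NODES; i++) { ... } */
    return 2; /* number of nodes currently in the tree */
}
```

The `#pragma HLS PIPELINE off` directive is ignored as an unknown pragma by a regular C compiler, so the same source builds for both software testing and synthesis.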

Lastly, on top of working on the RRT kernel itself, I also worked on defining how the FPGA board/SoC would communicate with the laptop (which we are using in place of the Kria board for perception + kinematics). After trying some libraries out, I settled on a very simple UART library that suits our needs: with it we can send bytes over UART and read/write them into/from C buffers. More importantly, it is very easy to use, consisting of only a .c and .h file pair, which means I can simply drop it into my Vitis project and compile the host code with the UART library.

Learning Reflection

During this project, I experienced for the first time what it was like to learn how to use a new tool and device (Vitis and the Kria board) by walking through online tutorials as well as through guidance from an expert (Andrew). I had prior experience with Vitis and the Ultra96 through a well-written step-by-step document given by the 643 course staff, but the online tutorials are not written with the same clarity and thoroughness. Thus, I found it useful to ask Andrew many questions, which he was more than happy to answer.

Matt’s Status Report for 04/06/2024

This week I wrapped up implementing RRT for the FPGA. Most of this was simply porting the C code we wrote and adapting it slightly for HLS, but we needed big changes wherever dynamic memory allocation was done (since memory cannot be dynamically allocated in hardware). The main change had to do with A*, since A* uses a queue to keep track of a frontier of nodes to search next. For the FPGA we swapped out A* for SMA* (Simplified Memory-Bounded A*), which requires a fixed-size queue. To support this, I implemented a circular, bounded, generic queue.
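The bounded queue is the standard circular-buffer construction. The sketch below stores ints for brevity where our version is generic, and the capacity is illustrative; a full queue returns failure so the caller (SMA*) can decide what to evict.

```c
#include <stdbool.h>

#define QCAP 8  /* illustrative capacity; the real bound is a tuning knob */

/* Circular bounded queue: head is the next element to pop,
 * tail is the next free slot, count tracks occupancy. */
typedef struct {
    int buf[QCAP];
    int head, tail, count;
} bqueue;

void q_init(bqueue *q) { q->head = q->tail = q->count = 0; }

bool q_push(bqueue *q, int v) {
    if (q->count == QCAP) return false;  /* full: caller must evict */
    q->buf[q->tail] = v;
    q->tail = (q->tail + 1) % QCAP;      /* wrap around */
    q->count++;
    return true;
}

bool q_pop(bqueue *q, int *out) {
    if (q->count == 0) return false;     /* empty */
    *out = q->buf[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return true;
}
```

Because both the buffer and the indices have fixed size, this maps directly to BRAM and registers in hardware, unlike a malloc-backed queue.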

However, I was not able to get the SMA* implementation done before the interim demo, as the documentation for the algorithm is poor. In fact, I am not sure it is a good idea to pursue this algorithm given how small its presence is on the internet (the algorithm comes from one paper, and the Wikipedia page has that paper as its only reference).

Regarding the Kria board, I also met with Andrew this past week to debug why I was not able to build with Vitis for the Kria. He gave me some tutorials to try, and I connected him with ECE IT to debug the issue further.

Verification & Validation

During our interim demo, running our software RRT* on a real-sized test case (roughly the size of a cube with sides 4 feet long) took 20 minutes. Our current RRT* is heavily unoptimized, and we expect it to run much faster after some changes.

We don’t want to over-optimize because a) the focus of our project is the FPGA version and b) a slower SW version means more speedup from the FPGA version. We will have to verify the correctness of our paths, i.e. that the path generated on the FPGA is close enough to the one generated by our SW version. This will be done by calculating deltas between the two paths and making sure they stay within a small tolerance. For validation, we will measure the time elapsed for the full robotic arm pipeline, as well as the time elapsed for running just RRT on each system. We can then compute speedup from these measurements.

Matt’s Status Report for 03/30/2024

This past week I continued porting RRT to HLS. There are some constructs in our dense RRT that are not portable—namely the use of dynamic memory allocation in our implementation of A* search. Since all buffers need to have known size at compile time for hardware, we cannot have unbounded queues/stacks that one would normally use in conventional search algorithms. Thus, we have been trying to implement SMA*, a memory-bounded version of A*. We are doing this in software, and once that is done I will port it to HLS.

Again this week I was not able to meet with the AMD rep, so our plan for the interim demo is to either integrate everything with the Ultra96, or simply demonstrate the stages of our system separately if we cannot integrate in time.

If we cannot get integration done, then at least for the accelerator we can get it to take in a file containing perception data, process it, and then have it write a file with a motion plan; this would be a very rough demo, showing off only the individual components. We would then have to manually transfer the motion plan and do inverse kinematics on it separately (on our laptops for example) and then manually send the commands to the robot. For interim demo at least, this should suffice.

Matt’s Status Report for 03/23/2024

This past week I had planned on meeting with the AMD representative to get help setting up the AMD Kria board. We were not able to schedule a meeting this week, but we did get one scheduled for the upcoming Monday. This means, however, that no progress was made on setting up the Kria board.

Thus we decided that it would be best to move forward with HLS development by starting to code on the Ultra96v2. The code we write for the Ultra96 should be mostly the same as the code for the Kria (just some parameters changed for the Kria's different board size and increased power). I have gotten a vector addition example to run on the Ultra96, and I plan on modifying it so that it can run our application (RRT) instead.

Team’s Status Report for 03/16/2024

Over spring break we were able to finish most of our baseline motion planning module. Our current system is capable of accepting perception data and generating motion plans for the arm to follow. Our focus has now shifted to optimizing this implementation and porting it onto the FPGA.

During our lecture times, we focused on integration, getting our environments set up and able to communicate with each other. With perception and motion planning substantially underway, all that’s left is inverse kinematics and system integration. These will be our main focus as we approach the interim demo. Our goal is to have the full system functioning in some capacity by then.

Matt’s Status Report for 03/16/2024

This past week I spent most of my time trying to get a Vitis HLS environment set up for our new AMD Kria KR260 board. While I have a working environment set up for the Ultra96v2 (our backup FPGA board), I was not able to get it working on the Kria. We want to use the Kria because it is more powerful, and it was gifted to us by AMD to use for robotics-related experiments. The Ultra96v2 was already set up by 18-643 staff for use in the labs, but since the Kria is a new board, I have to configure the environment myself, a process that I am not familiar with. To get help with the setup of the Kria, I was able to get in touch with someone from AMD who will guide me on the setup. We plan on meeting sometime early next week.

Our RRT implementation is done for the most part, so once the HLS environment is set up, we should be able to start writing HLS code to build the accelerated version of RRT. Next week I plan on doing HLS development in preparation for the interim demo. This will probably be my largest task for our project this semester. I am aiming for at least a >1× speedup even with no/few optimizations (i.e. hopefully not a slowdown).

Matt’s Status Report for 03/02/2024

On Sunday, Monday, and Tuesday, I spent some time trying to get our FPGA working with the HLS development environment. At first I was working with the Ultra96v1, and I soon realized that much of the configuration for developing on our FPGAs (Ultra96v2) in 18-643 was already done by the TAs. Thus, for all other FPGAs, I would have to figure out how to set up the board for development myself. To temporarily remedy this, I asked Professor Hoe (who taught 18-643) for permission to borrow an Ultra96v2 kit from his class. He agreed, and so we will be using this board as our backup. We are still waiting for the Kria KR260 from AMD. I hope we will be given guidance on how to set up that board (and boards in general), because I’ve been struggling to follow the online guides.

Aside from environment setup, I also worked on our RRT implementation. While we are almost done with a sparse version built on an octree, I identified a problem with using this sparse representation, and thus decided that we should also have a dense version, represented simply as a 3D array/matrix. We will be implementing this in the coming week.
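The dense version amounts to a flat 3D occupancy array with index helpers. The sketch below is illustrative (the resolution and names are made up, not from our code), but it shows why this representation is attractive for hardware: fixed size, O(1) lookups, no pointers.

```c
#include <stdbool.h>

#define GRID_N 64  /* illustrative cells per axis, not our actual resolution */

/* Dense occupancy grid stored as one flat array: 1 = occupied, 0 = free. */
static unsigned char grid[GRID_N * GRID_N * GRID_N];

/* Row-major flattening of (x, y, z) into the 1-D array. */
static int grid_idx(int x, int y, int z) {
    return (x * GRID_N + y) * GRID_N + z;
}

void set_occupied(int x, int y, int z) { grid[grid_idx(x, y, z)] = 1; }

bool is_occupied(int x, int y, int z) { return grid[grid_idx(x, y, z)] != 0; }
```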