Status Report (4/29 – 5/4)

Team Status Report

Changes to schedule:

No changes; the demo is on Monday.

Major project changes:

We’ve run into issues streaming JPEG frames into the ARM core; the bottleneck appears to be partly network overhead and partly excessive chunking of the data, so we’re planning to look into H.264 streams one last time, since our current end-to-end framerate is around 1 FPS. On the hardware side, we have the final implementation for all of the modules, but there is a stall around the gauss -> sobel or sobel -> NMS handshake. Ilan is debugging this with ILAs, and Edric is debugging it with the testbenches. We’ll be working through the night and all of tomorrow to see if we can improve the software-limited framerate and get the full pipeline working.

Brandon

We spent a lot of this week on the final presentation and the poster. In the time I did spend on the project, I tried to increase the FPS going into the demo; that improvement came mainly from Ilan’s private network, and we also added a second Pi for concurrent functionality. Since my part is essentially done, not much is left for me beyond end-to-end testing, integration, and demo prep. Looking forward to finishing up with the demo and the final report and being done with the course!

Edric

Full pipeline implementation is done, and testing shows that everything works. Now it’s just a matter of hooking it up to the rest of the system and making further tweaks to the HLS pragmas to squeeze out some extra performance.

As stated in Wednesday’s presentation, some further optimizations involve making better use of DSP slices and BRAM, since the reports show that utilization of both is extremely low. I’m still unsure about upping DSP usage, but I should be able to experiment with BRAM a bit.

Ilan

Personal accomplishments this week:

  • Tested end-to-end with one Pi with Brandon. The framerate was low and we didn’t have time to diagnose the issue, but we were sitting in the back of the 240 lab, so our Wi-Fi signal likely wasn’t as strong as it could have been.
  • Tested a private Wi-Fi network at home and found that, with some configuration, updates, and tweaks, I could get the framerate from the Pi directly to the monitor laptop up to ~18 FPS. I’m trying with the ARM core and the FPGA logic in the middle tonight (Saturday night) to see if we get a better end-to-end framerate (a rough measurement sketch follows this list).
  • Tested the FPGA with a 333 MHz clock; it fails a few timing paths, but the fabric still works without errors. 300 MHz meets timing, so I’ll see whether we need the slightly higher clock once I integrate all of Edric’s IP blocks.
  • Creating the full FPGA pipeline on Sunday now that Edric has finished the full HLS implementation.
  • Tweaked the Python script that will interface with the PL.
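
For reference, here is roughly how the end-to-end FPS measurement on the monitor laptop could look (a minimal sketch; the stream URL and frame count are placeholders, not our exact setup):

```python
# Rough end-to-end FPS measurement: time how fast decoded frames actually
# arrive at the monitor laptop. The UDP URL below is a placeholder.
import time
import cv2

cap = cv2.VideoCapture("udp://0.0.0.0:5000")   # placeholder stream endpoint
count, start = 0, time.time()
while count < 120:                             # sample ~120 frames
    ok, _frame = cap.read()                    # blocks until the next frame
    if not ok:
        break
    count += 1
print(f"~{count / (time.time() - start):.1f} FPS end-to-end")
```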

Progress on schedule:

  • No updates; the schedule is wrapping up.

Deliverables next week:

Final demo.

Status Report (3/31 – 4/06)

Team Status Report

Changes to schedule:

No major changes at this time.

Major project changes:

As Edric and Ilan realized with the later steps of Canny edge detection, there are numerous parameters and slight implementation details that affect the overall result. As such, comparing against a reference implementation pixel-by-pixel is likely infeasible, since even a small deviation produces a different output. We plan to eyeball the result to judge how close it is to a reference implementation. We’ve also ordered Wi-Fi adapters and will test with them on Monday.
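
To illustrate why exact-match comparison is brittle, even OpenCV disagrees with itself under a tiny parameter tweak (a minimal sketch; the image path and thresholds are arbitrary placeholders):

```python
# Why exact-match comparison against a reference Canny is brittle: a tiny
# threshold change already flips a noticeable fraction of edge pixels.
import cv2
import numpy as np

img = cv2.imread("test_frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder image
ref_a = cv2.Canny(img, 100, 200)
ref_b = cv2.Canny(img, 100, 205)   # small deviation in the high threshold

print(f"{np.mean(ref_a != ref_b):.2%} of pixels differ")
```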

Brandon

For the seventh week of work on the project, I spent a lot of time working on sending video across the Pis through the ARM core on the FPGA. As I mentioned in my previous status report, we originally intended to send the video as raw grayscale arrays, but the bandwidth we were achieving didn’t allow for that, so I spent a decent amount of time figuring out how to send the feed as an H.264 compressed stream. Fortunately, I was able to get it somewhat functional by the demo on Monday, and we were able to stream video from one Pi to another with some delay. We were also able to send the video through the ARM core, but in doing so we experienced significant packet loss. The remaining struggle is to fix the lag and to convert the H.264 stream into parseable arrays, so that I can store pixel values into memory on the FPGA, convert those arrays back into an H.264 stream, and send it to the monitor-room Pi; this step is extremely unclear, and I haven’t been able to find any material to help me solve it. After talking to the other security-camera group about their implementation, I’ve decided to try yet another approach that uses OpenCV to extract the arrays, send them to the FPGA, store the data in memory, receive the results, and send them to the monitor-room Pi to be displayed. The biggest issue I expect with this method is, again, the delay from recording to viewing, but hopefully the Wi-Fi antennas we ordered will help with the bandwidth issues.
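
The capture side of that plan might look something like this (a minimal sketch; send_to_fpga is a placeholder for whatever transport we end up using):

```python
# Hedged sketch of the OpenCV-based extraction path: grab a frame as a
# numpy array, grayscale it, and hand the raw bytes to the ARM-core link.
import cv2

def send_to_fpga(buf: bytes) -> None:
    """Placeholder for the actual PS/PL transport (not written yet)."""
    print(f"would send {len(buf)} bytes")

cap = cv2.VideoCapture(0)                  # Pi camera via V4L2
ok, frame = cap.read()                     # frame: H x W x 3 BGR array
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # 2D array of bytes
    send_to_fpga(gray.tobytes())
```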

Edric

This past week we made a good deal of headway with HLS. We know that our implementations of the Gaussian blur and Sobel filter are 1:1 with OpenCV’s. Unfortunately, we do not yet meet our performance specification, so work remains on that front. Per HLS’s synthesis report, the main bottlenecks are memory reads and, to some extent, floating-point operations. The latter is hard to get around, but there is room for improvement in the former. Ilan looked into HLS’s Window object, which apparently plays more nicely with memory accesses than our current random-ish access pattern. We’ll experiment with windows and see if we get a performance boost.

This week we’ll be moving forward with the rest of the algorithm’s steps. One challenge we foresee is testing: previously we could do a pixel-by-pixel comparison against OpenCV’s function, but because there is room for modification in the remaining Canny steps, it’s going to be difficult to have a clear-cut reference image, so we’ll likely have to go by eye from here. Apart from this, we’ll also play with the aforementioned HLS windowing to squeeze out some performance.

Ilan

Personal accomplishments this week:

  • Had the demo on Monday. We got the Sobel filter step working just before the demo, which was great for showing more progress. Edric and I worked a bit on performance, but at this point we’re going to push forward with the final steps of the implementation before trying to optimize and hit the numbers we need. I looked into HLS Windows, which map extremely well to image processing and should help us; HLS LineBuffers will also likely help improve performance.
  • Continued working with Edric on the compute pipeline and figured out how to implement the rest of the steps of the algorithm. We determined that using HLS Windows also makes everything much more understandable, so we started using them for the non-max suppression step and will likely go back and convert the earlier steps to use Windows once we finish the pipeline.
  • The ethics discussion and Eberly Center reflection took away some of our scheduled lab time this week.

Progress on schedule:

  • Since I’ve been working with Edric, I’m still behind where I would like to be on the memory interface. I plan to get back to it on Monday, though I’ll likely still support Edric as necessary. I will be out on Wednesday for a follow-up with a doctor, so I anticipate having the memory interface done by the 17th.

Deliverables next week:

A memory interface prototype with a unit test to verify functionality (if possible), and implementation of the NMS and thresholding steps (mostly Edric, but I will support as necessary).

Status Report (3/10 – 3/23)

Team Status Report

Changes to schedule:

We anticipate shifting our software-side timeline back a bit, since we were not able to finish setup this week after receiving the Pis we ordered. Because we ordered the incorrect USB cables, we will have to push the Pi setup back by about a week. Hopefully we can move quickly through the camera setup to make up for this; regardless, we have some slack in our schedule for problems like this.

Major project changes:

We don’t have any major project changes at this point.

Brandon

3/17-3/23

For the fifth week of work on the project, I tried to get the Raspberry Pis booted and configured. After a lot of research about setting up Wi-Fi on the Pis, I determined that the best approach would be to SSH in over USB to obtain each Pi’s MAC address and register it with the CMU-DEVICE network. Once I figured this out, though, I realized that we had actually ordered the wrong USB cables (male micro-USB to female USB instead of male to male), so I had to place another order for the correct cables, which will hopefully arrive this coming week. For the second half of the week, I was traveling to Seattle for family reasons, so I wasn’t able to work much on the project.

This ordering mistake has set me back slightly in terms of schedule, but hopefully I’ll be able to move quickly through the rest of the Pi setup once I’m able to SSH in. I hope to have basic camera functionality on the Pi next week.

Edric

Over break, no work was done. This week, we began looking into the tools for implementing the edge detection pipeline. At the moment, Vivado’s High-Level Synthesis (HLS) tool is very enticing, as a lot of the complex mathematical functions are available should we go down that route. Unfortunately, setting up and configuring HLS is proving to be quite difficult. I’m not entirely sure it will pan out, so next week I’d like to start developing Plan B, which is to just crank out the Verilog for the pipeline. If HLS works, fantastic: the algorithm can be done in only a few lines of code. If it doesn’t, hopefully Plan B will be an adequate substitute.

Ilan

Personal accomplishments this week:

  • Switched to the Xilinx PYNQ boot image and got FPGA programming working successfully, using a simple setup with starter scripts as a base.
    • This will allow Edric and me to program both the ARM core and the FPGA fabric very easily.
    • I mostly tested FPGA programming interactively, so I will need to write a script that automates it for us to prevent issues in the future.
  • Experimented with HLS and decided to use it for memory interface verification.
    • HLS interacts very easily with AXI, which is the memory-interfacing method we’ll be using to connect the PS and PL. HLS will also reduce total verification time, since I’m very familiar with C and won’t have to implement RTL for AXI.
  • Started working on memory interfacing between the PS and PL. I did some research, started putting together the block design for the memory interface, and plan to finish it over the next week. I’ll also implement an interface-level test that instantiates a mock image with random data in the PS, DMAs the memory into the PL, has the PL increment each element by 1 using an HLS module that I will write (and unit test), and then DMAs the result back into the PS, which can compare and verify the returned result (a sketch of the PS-side test follows this list). This will give us a good amount of confidence in the implementation, since it accurately represents the interface-level communication in our final design. I’ll also target a 375 MHz clock to start; I don’t think the memory interface will be the limiting factor, but this is already around the frequency we want for the whole design. I’d rather push the frequency a bit higher than the overall design at first so that we’re aware of its limitations in case we need to clock the design higher to meet latency requirements or to reduce overall DSP usage.
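
A minimal sketch of what the PS side of that test could look like with PYNQ (the bitstream name, the DMA instance name axi_dma_0, and the buffer size are illustrative assumptions, not our final design):

```python
# PS-side DMA loopback test sketch: stream a random "image" into the PL,
# let the HLS block increment each byte, and check the result on the way
# back. Overlay/DMA names below are placeholders for our eventual design.
import numpy as np
from pynq import Overlay, allocate

ol = Overlay("increment.bit")              # program the PL with the test design
dma = ol.axi_dma_0                         # AXI DMA bridging PS <-> PL

src = allocate(shape=(1024,), dtype=np.uint8)   # physically contiguous buffers
dst = allocate(shape=(1024,), dtype=np.uint8)
src[:] = np.random.randint(0, 255, size=1024, dtype=np.uint8)

dma.sendchannel.transfer(src)              # mock image: PS -> PL
dma.recvchannel.transfer(dst)              # incremented result: PL -> PS
dma.sendchannel.wait()
dma.recvchannel.wait()

assert np.array_equal(dst, src + 1), "PL increment mismatch"
print("DMA loopback test passed")
```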

Progress on schedule:

  • I wasn’t able to do any work over spring break other than reading about HLS, since I had surgery and came back to Pittsburgh late to allow more time for recovery. I am slightly behind where I’d like to be, but I will try to catch up during the second half of next week.

Deliverables next week:

  • Memory interface prototype using unit test to verify functionality.
  • Continue improving the toolchain and infrastructure as necessary (mainly scripting the FPGA programming).

Status Report (3/3 – 3/9)

Team Status Report

Changes to schedule:

We don’t have any current changes to our schedule.

Major project changes:

We don’t have any major project changes at this point.

Brandon

For the fourth week of work on the project, I focused on the video streaming/sending functionality. Unfortunately, I had to redo my order form and add a bunch of other items (power cables, SD cards, etc.), so we didn’t get our materials this week. This pushed back a lot of what I had planned, since I didn’t have access to the Pis. Regardless, I was able to work on sending 2D arrays, along with converting each frame from a 3D RGB array of pixels to a 2D grayscale array using numpy tools. Here is the plan from our design document:

We plan on sending 1280×720 grayscale frames over UDP. The Raspberry Pi will capture each frame as an RGB array, which we will convert into a grayscale array. Each frame then contains 921,600 pixels, each represented by one byte, since a grayscale image can store pixels at 8 bpp (bits per pixel), for a total of 921,600 bytes. These bytes will be stored in a 2-dimensional array indexed by the row and column of the pixel. Since we can’t send the entire array in one UDP packet, we tentatively plan to send each row separately, resulting in 720 packets of 1,280 bytes each, and to reconstruct the array on the receiving end.
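
A minimal sketch of that row-per-packet scheme (the destination address and the 2-byte row-index header are illustrative assumptions, not a finalized protocol):

```python
# Send one 1280x720 grayscale frame as 720 UDP packets, one row per packet,
# each prefixed with its row index so the receiver can rebuild the frame.
import socket
import cv2

DEST = ("192.168.1.42", 5005)              # placeholder receiver address
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

frame = cv2.imread("test_frame.png")               # stand-in for a captured frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)     # 3D RGB -> 2D grayscale
gray = cv2.resize(gray, (1280, 720))               # 720 rows x 1,280 bytes

for row in range(gray.shape[0]):
    payload = row.to_bytes(2, "big") + gray[row].tobytes()  # 2 + 1280 bytes
    sock.sendto(payload, DEST)
```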

Once I pick up the Pis from Quinn, I’ll be able to truly start on the video-capture part of the project: constructing the camera packs and using the Pis to record video, which will be converted to grayscale and hopefully displayed on a monitor. Once that works, I can begin sending these arrays over to the FPGA’s ARM core for processing.

I’m still slightly behind schedule, but again, I plan on working a bit over spring break this coming week (even though I wasn’t originally planning to) in order to catch up. Once we get our materials from Quinn, everything should accelerate nicely. The deliverables I hope to hit over the next two weeks include actually demonstrating the RGB-to-grayscale conversion on a visual output, along with acquiring and getting oriented with the materials.

Edric

This week, we finished our design report document. It was a good opportunity to see where our planning was lacking. As a result, the higher-level decisions for our system architecture are now settled.

I worked with Ilan to get some preliminary numbers down. We now have a reasonable estimate of how many resources (in terms of DSP slices) a compute block will take, based on the number of multiplies and adds each phase of the Canny edge detection algorithm requires. Using an approximate pipeline design and a clock frequency from the Ultra96’s datasheet, we now have an estimate of how long a frame will take to process, which came out to about 15 ms. The next step is to start on the Gaussian blur implementation.
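
As a rough sanity check on that figure (with purely illustrative assumptions: one pixel per cycle per stage, five sequential Canny stages, and a clock around 300 MHz):

\[
\frac{1280 \times 720 \ \text{pixels}}{300\ \text{MHz}} \approx 3.1\ \text{ms per stage}, \qquad 5 \times 3.1\ \text{ms} \approx 15\ \text{ms per frame}.
\]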

As for Ultra96 things: now that we have a power supply, we can start playing with the board. We’ve been using their guide for getting a Vivado project running on the U96, and Ilan is going to try to get some lights blinking on the board over break.

One concern I have at the moment is flashing designs onto the board. To flash over USB we need an adapter, but apparently it may be possible to do so via the ARM core. More investigation is warranted.

I think we’re decently on schedule. Once we get back from break we should be able to begin actual implementation.

Ilan

Personal accomplishments this week:

  • Finalized the target clock frequency and DSP slice allocation.
    • This was tricky since we didn’t fully understand all of the phases of the algorithm at the outset; it just required more research to understand the computation pattern and how many DSP slices are necessary.
    • As future work, if we find that we need more DSP slices, we’ll pull from the reserve (52 of the 180 per stream).
  • Finished the design review documentation.
    • Edric and I focused heavily on getting quantifiable numbers for everything, including the target clock frequency and DSP slice allocation above.
    • Improved our diagrams of the different parts of the system, making sure the design is fully understood by both of us.
  • Continued working on bringing up the FPGA and ARM core. Still working on finalizing the infrastructure and toolchain so they work for all three of us.
    • Part of this will be seeing how HLS fits in and how easy it is for us to use.
      • I’ll be looking into this over spring break.

Progress on schedule:

  • Schedule is on target, and I will try to do a bit of work over spring break to buy us some more slack during the second half of the semester.

Deliverables next week:

  • Finish enabling ARM core and FPGA functionality, push the toolchain up to GitHub, and document the setup.
  • Get infrastructure working.