Status Report (4/21 – 4/28)

Team Status Report

Changes to schedule:

No major changes at this time.

Major project changes:

To simplify our demo setup, we will use a laptop as the end device for displaying the results of our system.

Brandon

For this week on the project, I made solid progress overall. I found a resource on threading video capture and JPEG frame sending, and implemented it to the point where it works reasonably well. Unfortunately, we were still having bandwidth issues and were getting very low FPS numbers for the in-lab demo on Wednesday. Thankfully, we figured out the issue: after disabling the GUI, we were able to achieve ~10 FPS for the JPEG transmission. While this isn’t the 30 we set out to achieve, with the FPGA demonstrating a cap of 7 FPS, it should be fine at this point in our project. Additionally, Ilan implemented the memory storage function on the ARM core, so I’m just calling his function to store the pixel data into memory. With that, my portion of the project is nearly finished; I just have to make sure the matplotlib method I’m using to display the video works, and refine it a bit. We spent the latter part of the week working on the final presentation, so I plan on finishing the display this upcoming week leading into the demo.
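
For reference, here is a minimal sketch of the threaded capture + JPEG send pattern described above. This is an illustrative reconstruction, not our exact code; the monitor address, port, and JPEG quality are placeholder assumptions:

    import socket, struct, threading
    import cv2

    class ThreadedCapture:
        """Grab frames on a background thread so capture never blocks sending."""
        def __init__(self, src=0):
            self.cap = cv2.VideoCapture(src)
            self.frame = None
            self.lock = threading.Lock()
            threading.Thread(target=self._reader, daemon=True).start()

        def _reader(self):
            while True:
                ok, f = self.cap.read()
                if ok:
                    with self.lock:
                        self.frame = f

        def latest(self):
            with self.lock:
                return None if self.frame is None else self.frame.copy()

    cam = ThreadedCapture()
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(("MONITOR_IP", 9000))   # placeholder address/port

    while True:
        frame = cam.latest()
        if frame is None:
            continue
        ok, jpg = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 80])
        if ok:
            data = jpg.tobytes()
            sock.sendall(struct.pack("!I", len(data)) + data)  # length-prefixed frame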

Ilan

Personal accomplishments this week:

  • Got full memory interfacing working with a sample Vivado IP block.
  • Worked with Edric to get Gauss and Sobel fully synthesized and ready to integrate. Took the Gauss IP block and put it in our design on Wednesday, but the result of the compute pipeline was an array of 0s instead of the expected data, so we determined there were a few possible causes:
    • Our compute block is not piping through control signals to its downstream consumers
    • Data is not streaming in/out of our compute block properly

Both of these required an ILA and JTAG to diagnose, so I tried connecting the JTAG adapter to the Ultra96, but my SSH session hung. David Gronlund then mentioned to me that the Ultra96 PYNQ image is not properly configured for JTAG, so we’ve been working since Thursday afternoon to create an image that supports JTAG debugging. This is still a work in progress and is where almost all of my focus is devoted.

  • Finished up the Python script that will interface with the PL and actually run images through the compute pipeline. I talked with Brandon and we have everything set up to combine those two scripts (a rough sketch of the flow appears after this list).
  • Worked with Brandon to get software end-to-end FPS up to 10, which is significantly higher than the ½ FPS we were getting before!
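
To make the PL-interfacing flow concrete, here is a hypothetical sketch of pushing one frame through the pipeline with PYNQ’s DMA driver. The bitstream name, DMA instance name, and 640x480 frame size are assumptions, not our actual design:

    import numpy as np
    from pynq import Overlay, allocate  # on older PYNQ images, use Xlnk().cma_array

    overlay = Overlay("canny.bit")      # placeholder bitstream name
    dma = overlay.axi_dma_0             # placeholder AXI DMA instance name

    in_buf = allocate(shape=(480, 640), dtype=np.uint8)   # physically contiguous
    out_buf = allocate(shape=(480, 640), dtype=np.uint8)
    in_buf[:] = np.random.randint(0, 256, (480, 640), dtype=np.uint8)  # test frame

    dma.sendchannel.transfer(in_buf)    # stream pixels into the compute pipeline
    dma.recvchannel.transfer(out_buf)   # stream results back out
    dma.sendchannel.wait()
    dma.recvchannel.wait()

    if not out_buf.any():               # the all-zeros symptom described above
        print("Pipeline returned all zeros -- check TLAST/control signal handling")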

Progress on schedule:

  • No updates; the schedule is ending.

Deliverables next week:

Final demo.

Status Report (4/14 – 4/20)

Team Status Report

Changes to schedule:

No major changes at this time.

Major project changes:

No major project changes at this time.

Brandon

For this week on the project, I was so busy with outside commitments that I wasn’t able to work as much as I had hoped on the project. I’m still in the process of refining and visualizing my array transmission, and I plan to essentially limit our project to one camera Pi: it will send the array to the ARM core, which will insert the data into memory, extract the analyzed data from memory, and send it to the monitoring room Pi, which will display it using matplotlib’s imshow command (sketched below). Hopefully I can get everything fully working except for inserting/extracting data from memory by the demo on Wednesday. Ilan said he figured out a good way to interact with memory on the FPGA, so later this week/next week, we should be able to finish integration.
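
A minimal sketch of the imshow-based display on the monitoring side, assuming frames arrive as 480x640 grayscale numpy arrays (the size is a placeholder):

    import numpy as np
    import matplotlib.pyplot as plt

    plt.ion()                            # interactive mode for live updates
    fig, ax = plt.subplots()
    im = ax.imshow(np.zeros((480, 640), dtype=np.uint8),
                   cmap="gray", vmin=0, vmax=255)

    def show(frame):
        """Update the existing image rather than redrawing from scratch."""
        im.set_data(frame)
        fig.canvas.draw_idle()
        plt.pause(0.001)                 # let the GUI event loop run

Updating with set_data instead of calling imshow for every frame should keep the display loop fast enough for video.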

Ilan

Personal accomplishments this week:

  • Worked on getting memory interfacing working, but ran into segfaults when trying to access the VDMA or other IP blocks. Found an example project whose DMA (not VDMA) I was able to access and run fully, which is good. I’m going to compile this from scratch, ensure that it still works without any modifications, and then most likely modify it to use a VDMA, ensure that it still works, and so on, until I have the memory interface that we need.
  • Figured out how to easily access and run IP core-related functionality in Python and create contiguous arrays in Python that are suitable for DMA. Started creating the Python script that will do all of the memory interfacing for the accelerated system (see the loopback-style sketch after this list).
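
As a unit-test-style check of the memory interface, something like the following DMA loopback is what I have in mind. The overlay and DMA instance names are placeholders from the example design, not ours:

    import numpy as np
    from pynq import Overlay, allocate  # contiguous, DMA-safe buffers

    overlay = Overlay("loopback.bit")   # placeholder bitstream name
    dma = overlay.axi_dma_0             # placeholder DMA instance name

    src = allocate(shape=(1024,), dtype=np.uint32)
    dst = allocate(shape=(1024,), dtype=np.uint32)
    src[:] = np.arange(1024, dtype=np.uint32)

    dma.sendchannel.transfer(src)       # send the pattern out through the DMA
    dma.recvchannel.transfer(dst)       # receive it back into the second buffer
    dma.sendchannel.wait()
    dma.recvchannel.wait()

    assert (src == dst).all(), "DMA loopback mismatch"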

Progress on schedule:

  • No major updates; things are getting tight and it’s crunch time, so there’s really no room for schedule changes at this point.

Deliverables next week:

Full memory interface, ready for the compute pipeline to be plugged in.

Status Report (4/7 – 4/13)

Team Status Report

Changes to schedule:

No major changes at this time.

Major project changes:

No major project changes at this time.

Brandon

For the eighth week of work on the project, I wasn’t able to work on the project much due to Carnival. I ran into a wall regarding the bandwidth issues from last week. We received the Wi-Fi antennas that we were hoping would fix the issues, but in initial tests, we were strangely still getting the same bandwidth as before. I brought the Pis home to test on a different network and ended up with the same results. Thus, without really knowing what to do, I decided to turn my attention to the array transmission portion of the project. I pivoted away from the H264 streams that we used in the interim demo and updated my code for sending arrays of pixel values across a socket. Based on the packet loss we experienced in the demo, I’ve thought about using TCP as the transmission protocol, but for now, I’ve implemented both TCP and UDP (sketched below), and we’ll see how it goes. Essentially, where we are right now is that, with time running out, we might just have to live with the bandwidth issues and focus on integration so that we have a completed product by the final deadlines. I plan to continue troubleshooting the bandwidth issues this week along with fully testing my array transmission.
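
A rough sketch of the two transports for a single grayscale frame; the receiver address, ports, and frame size are placeholder assumptions:

    import socket, struct
    import numpy as np

    frame = np.zeros((480, 640), dtype=np.uint8)   # stand-in for a camera frame
    payload = frame.tobytes()

    # TCP: reliable and ordered, so no packet loss, at the cost of some latency.
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcp.connect(("RECEIVER_IP", 9000))
    tcp.sendall(struct.pack("!I", len(payload)) + payload)  # length-prefixed

    # UDP: lower latency, but datagrams can drop; stay under the ~64 KB limit.
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    CHUNK = 60000
    for i in range(0, len(payload), CHUNK):
        udp.sendto(payload[i:i + CHUNK], ("RECEIVER_IP", 9001))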

Ilan

Personal accomplishments this week:

  • Continued working on the compute pipeline and implemented most of non-max suppression using HLS Windows. Had a bug that resulted in more suppressed pixels than expected (a reference-style sketch of the NMS logic appears after this list).
  • Looked into HLS streams and VDMA for higher performance since using regular DMA adds more work.
  • Made some progress on memory interfacing, but still need to implement unit test and software side of interface.
  • Carnival – less work than expected during the 2nd half of the week.
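
For context on the NMS bug, here is a plain numpy description of the step (a software reference sketch, not our HLS code); the >= versus > choices in the neighbor comparisons are exactly the kind of detail that can over-suppress pixels:

    import numpy as np

    def nms(mag, angle):
        """mag: gradient magnitude; angle: gradient direction in degrees."""
        h, w = mag.shape
        out = np.zeros_like(mag)
        ang = np.mod(angle, 180.0)            # direction is symmetric mod 180
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                a = ang[y, x]
                if a < 22.5 or a >= 157.5:    # horizontal gradient
                    n1, n2 = mag[y, x - 1], mag[y, x + 1]
                elif a < 67.5:                # 45-degree diagonal
                    n1, n2 = mag[y - 1, x + 1], mag[y + 1, x - 1]
                elif a < 112.5:               # vertical gradient
                    n1, n2 = mag[y - 1, x], mag[y + 1, x]
                else:                         # 135-degree diagonal
                    n1, n2 = mag[y - 1, x - 1], mag[y + 1, x + 1]
                if mag[y, x] >= n1 and mag[y, x] >= n2:
                    out[y, x] = mag[y, x]     # keep local maxima only
        return out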

Progress on schedule:

  • Since I’ve been working with Edric, I’m still behind where I would like to be on the memory interface. I’m planning on going back to the memory interface on Monday, but I’ll likely still support Edric as necessary. I will be out on Wednesday to have a follow-up with a doctor, so I anticipate having the memory interface done on the 17th.

Deliverables next week:

Memory interface prototype using unit test to verify functionality (if possible), bug-free implementation of NMS.

Status Report (3/31 – 4/06)

Team Status Report

Changes to schedule:

No major changes at this time.

Major project changes:

As Edric and Ilan realized with the later steps of Canny edge detection, there are numerous parameters and slight implementation details that affect the overall result. As such, comparing against a reference implementation is likely infeasible, since even a small deviation will produce a different output. We will likely plan on eyeballing the result to determine how good it is compared to a reference implementation. We’ve also ordered Wi-Fi adapters and will test with these adapters on Monday.

Brandon

For the seventh week of work on the project, I spent a lot of time working through sending video across the Pis through the ARM core on the FPGA. As I mentioned in my previous status report, we originally intended to send the video as raw grayscale arrays, but the bandwidth we were achieving didn’t allow for that. Thus, I spent a decent amount of time figuring out how to send the feed using an H264-compressed stream. Fortunately, I was able to get it somewhat functional by the demo on Monday, and we were able to stream video from one Pi to another Pi with some delay. We were also able to send the video through the ARM core, but in doing so, we experienced significant packet loss. The struggle, then, is to both fix the lag/delay and convert the H264 stream into parseable arrays, such that I can store pixel values into memory on the FPGA, convert those arrays back to an H264 stream, and send that to the monitor room Pi. This step is extremely unclear, and I can’t really find any material to help me solve the problem. Thus, after talking to the other security camera group about their implementation, I’ve decided to try yet another approach that utilizes OpenCV to extract the arrays, send them to the FPGA, store the data in memory, receive the results, and send them to the monitor room Pi to be displayed. The biggest issue I think we’ll run into with this method is again the delay/lag from actual video recording to viewing, but hopefully the Wi-Fi antennas we ordered will help with the bandwidth issues.

Edric

This past week we made a good deal of headway on HLS. We know that our implementations of the Gaussian blur and Sobel filter are 1:1 with OpenCV’s. Unfortunately, we do not meet our performance specification, so work remains on that front. After analyzing HLS’s synthesis report, the main bottlenecks are memory reads and, to some extent, floating-point operations. The latter is hard to get around, but there is room for improvement in the former. Ilan looked into HLS’s Window object, which apparently plays more nicely with memory accesses than our current random-ish access pattern. We’ll play around with windows and see if we get a performance boost.

This week we’ll be moving forward with the rest of the algorithm’s steps. One challenge we foresee is testing. Previously we would do a pixel-by-pixel comparison with OpenCV’s function; however, because there is room for modification in the rest of Canny, it’s going to be difficult to have a clear-cut reference image, so we’ll likely have to go by eye from here. Apart from this, we’ll also play with the aforementioned HLS windowing to squeeze out some performance.

Ilan

Personal accomplishments this week:

  • Had the demo on Monday. Got the Sobel filter step working just before the demo, which was very good for showing more progress. Edric and I worked a little bit on performance, but at this point we’re going to push forward with the final steps of the implementation before trying to optimize and achieve the numbers we need. I looked into HLS Windows, which map extremely well to image processing, and this should help us. HLS LineBuffers will also likely help improve performance.
  • Continued to work with Edric on the compute pipeline and figured out how to implement the rest of the steps of the algorithm. Determined that using HLS Windows will make everything much more understandable as well, so we started using that for the non-max suppression step and will likely go back and convert the previous steps to use Windows once we finish the pipeline.
  • Ethics discussion and Eberly Center reflection took away some of our scheduled lab time this week.

Progress on schedule:

  • Since I’ve been working with Edric, I’m still behind where I would like to be on the memory interface. I’m planning on going back to the memory interface on Monday, but I’ll likely still support Edric as necessary. I will be out on Wednesday to have a follow-up with a doctor, so I anticipate having the memory interface done on the 17th.

Deliverables next week:

Memory interface prototype using unit test to verify functionality (if possible), implementation of NMS and thresholding steps (mostly Edric, but I will support as necessary).