Status Report (4/21 – 4/28)

Team Status Report

Changes to schedule:

No major changes at this time.

Major project changes:

To simplify our demo setup, we'll use a laptop as the end device for displaying our system's results.

Brandon

For this week on the project, I made solid progress overall. I found a resource on threading video capture and JPEG frame sending, and implemented it to the point where it works reasonably well. Unfortunately, we were still having bandwidth issues and getting very low FPS numbers for the in-lab demo on Wednesday. Thankfully, we figured out the cause: after disabling the GUI, we were able to achieve ~10 FPS for the JPEG transmission. While this isn't the 30 we set out to achieve, with the FPGA demonstrating a cap of 7 FPS, it should be fine at this point in the project. Additionally, Ilan implemented the memory storage function on the ARM core, so I'm just calling his function to store the pixel data into memory. With that, my portion of the project is nearly finished; I just have to make sure the matplotlib method I'm using to display the video works, and refine it a bit. We spent the latter part of the week working on the final presentation, so I plan on finishing the display this upcoming week leading into the demo.
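
For reference, here is a minimal sketch of the threaded capture-and-send approach, assuming OpenCV for capture/encoding and a plain TCP socket; the host/port values and the small drop-oldest queue are illustrative placeholders, not our exact code:

    import queue
    import socket
    import struct
    import threading

    import cv2  # OpenCV handles both capture and JPEG encoding

    HOST, PORT = "192.168.1.100", 9000   # placeholder address of the display end
    frames = queue.Queue(maxsize=2)      # small buffer: drop stale frames, not latency

    def capture_loop():
        cap = cv2.VideoCapture(0)
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            try:
                frames.put_nowait(frame)
            except queue.Full:
                pass                     # sender is behind; drop this frame

    def send_loop(sock):
        while True:
            ok, jpg = cv2.imencode(".jpg", frames.get(),
                                   [cv2.IMWRITE_JPEG_QUALITY, 80])
            if not ok:
                continue
            data = jpg.tobytes()
            sock.sendall(struct.pack("!I", len(data)) + data)  # length-prefixed frame

    sock = socket.create_connection((HOST, PORT))
    threading.Thread(target=capture_loop, daemon=True).start()
    send_loop(sock)

Decoupling capture from sending is what keeps the camera from stalling on the network; the FPS gain from disabling the GUI came on top of that.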

Ilan

Personal accomplishments this week:

  • Got full memory interfacing working with a sample Vivado IP block.
  • Worked with Edric to get Gauss and Sobel fully synthesized and ready to integrate. Took the Gauss IP block and put it in our design on Wednesday, but the compute pipeline produced an array of zeros instead of the expected data, so we narrowed it down to a few possible causes:
    • Our compute block is not piping through control signals to its downstream consumers
    • Data is not streaming in/out of our compute block properly

Diagnosing either of these requires an ILA and JTAG, so I tried connecting the JTAG adapter to the Ultra96, but my SSH session hung. David Gronlund then mentioned to me that the Ultra96 PYNQ image is not properly configured for JTAG, so since Thursday afternoon we've been working to create an image that supports JTAG debugging. This is still a work in progress and is where almost all of my focus is currently devoted.

  • Finished up the Python script that will interface with the PL and actually run images through the compute pipeline (a rough sketch follows this list). I talked with Brandon, and we have everything set up to combine our two scripts.
  • Worked with Brandon to get software end-to-end FPS up to 10, which is significantly higher than the ½ FPS we were getting before!
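
For reference, a rough sketch of what the PL-interfacing script looks like, assuming the pynq DMA driver; the bitstream name (edge.bit), DMA instance name (axi_dma_0), and frame size are placeholders for our actual design:

    import numpy as np
    from pynq import Overlay, allocate

    overlay = Overlay("edge.bit")   # placeholder bitstream name
    dma = overlay.axi_dma_0         # placeholder DMA instance name

    H, W = 480, 640                 # assumed frame dimensions

    in_buf = allocate(shape=(H, W), dtype=np.uint8)   # physically contiguous buffers
    out_buf = allocate(shape=(H, W), dtype=np.uint8)

    def run_pipeline(frame):
        # Push one grayscale frame through the PL compute pipeline; in the
        # combined script, frame comes from Brandon's socket receiver.
        in_buf[:] = frame
        dma.sendchannel.transfer(in_buf)
        dma.recvchannel.transfer(out_buf)
        dma.sendchannel.wait()
        dma.recvchannel.wait()
        return np.array(out_buf)    # copy out so the buffers can be reused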

Progress on schedule:

  • No updates; the schedule is wrapping up.

Deliverables next week:

Final demo.

Status Report (4/14 – 4/20)

Team Status Report

Changes to schedule:

No major changes at this time.

Major project changes:

No major project changes at this time.

Brandon

For this week on the project, I was so busy with outside commitments that I wasn't able to work on it as much as I'd hoped. I'm still in the process of refining and visualizing my array transmission, and I plan to essentially limit our project to one camera Pi: it sends the frame array to the ARM core, which inserts the data into memory, extracts the analyzed data from memory, and sends it to the monitoring room Pi, which displays it using matplotlib's imshow command. Hopefully I can get everything fully working, except for the inserting/extracting of data from memory, by the demo on Wednesday. Ilan said he figured out a good way to interact with memory in the FPGA, so later this week/next week, we should be able to finish integration.
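
As a sketch of the display end, here is roughly what I have in mind, assuming frames arrive as numpy arrays (recv_frame() below is a hypothetical stand-in for the socket-receive logic):

    import matplotlib.pyplot as plt
    import numpy as np

    def recv_frame():
        # Hypothetical stand-in: the real version reassembles a frame's
        # pixel bytes from the socket into a numpy array.
        return np.random.randint(0, 256, (480, 640), dtype=np.uint8)

    plt.ion()                          # interactive mode: update without blocking
    fig, ax = plt.subplots()
    img = ax.imshow(np.zeros((480, 640), dtype=np.uint8),
                    cmap="gray", vmin=0, vmax=255)

    while True:
        img.set_data(recv_frame())     # reuse one image artist per frame
        fig.canvas.draw_idle()
        plt.pause(0.001)               # let the GUI event loop run

Reusing a single imshow artist with set_data should be much cheaper than calling imshow for every frame.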

Ilan

Personal accomplishments this week:

  • Worked on getting memory interfacing working, but ran into segfaults when trying to access the VDMA or other IP blocks. Found an example project whose DMA (not VDMA) I was able to access and run fully, which is good. I'm going to compile it from scratch, ensure that it still works without any modifications, then most likely modify it to use a VDMA, ensure that it still works, and so on until I have the memory interface that we need (a minimal sketch of this check follows this list).
  • Figured out how to easily access and run IP core-related functionality in Python, and how to create contiguous arrays in Python that are suitable for DMA. Started creating the Python script that will do all of the memory interfacing for the accelerated system.
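
The check I have in mind looks roughly like this, assuming pynq's allocate API and the example project's AXI DMA (the bitstream and IP instance names are placeholders):

    import numpy as np
    from pynq import Overlay, allocate

    overlay = Overlay("example.bit")   # placeholder: the example project's bitstream
    dma = overlay.axi_dma_0            # placeholder IP instance name

    # allocate() returns physically contiguous buffers the DMA engine can address.
    src = allocate(shape=(1024,), dtype=np.uint32)
    dst = allocate(shape=(1024,), dtype=np.uint32)
    src[:] = np.arange(1024, dtype=np.uint32)

    dma.sendchannel.transfer(src)
    dma.recvchannel.transfer(dst)
    dma.sendchannel.wait()
    dma.recvchannel.wait()

    # With a pass-through (loopback) design, the data should come back unchanged.
    assert np.array_equal(src, dst)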

Progress on schedule:

  • No major updates. Things are getting tight and it's crunch time, so there's really no room for schedule changes at this point.

Deliverables next week:

Full memory interface, ready for the compute pipeline to be plugged in.

Status Report (4/7 – 4/13)

Team Status Report

Changes to schedule:

No major changes at this time.

Major project changes:

No major project changes at this time.

Brandon

For the eighth week of work on the project, I didn't get much done due to Carnival. I also ran into a wall with the bandwidth issues from last week: we received the Wi-Fi antennas that we hoped would fix them, but in initial tests we were strangely still getting the same bandwidth as before. I brought the Pis home to test on a different network and ended up with the same results. Without really knowing what else to try, I turned my attention to the array transmission portion of the project. I pivoted away from the H264 streams we used in the interim demo and updated my code for sending arrays of pixel values across a socket. Based on the packet loss we experienced in the demo, I've considered using TCP as the transmission protocol, but for now I've implemented both TCP and UDP, and we'll see how it goes. Essentially, with time running out, we might just have to live with the bandwidth issues and focus on integration so that we have a completed product by the final deadlines. I plan to continue troubleshooting the bandwidth issues this week along with fully testing my array transmission.
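
For reference, a condensed sketch of the two send paths, assuming numpy pixel arrays; the receiver address is a placeholder, and a real UDP version would tag each chunk with a frame number and offset for reassembly:

    import socket
    import struct

    import numpy as np

    ADDR = ("192.168.1.100", 9000)     # placeholder monitoring-room address

    def send_tcp(sock, frame):
        # Reliable, ordered delivery: length-prefix the raw pixel bytes.
        data = frame.tobytes()
        sock.sendall(struct.pack("!I", len(data)) + data)

    def send_udp(sock, frame, chunk=1400):
        # Low overhead but lossy: split frames to stay under a typical MTU.
        data = frame.tobytes()
        for off in range(0, len(data), chunk):
            sock.sendto(data[off:off + chunk], ADDR)

    frame = np.zeros((480, 640), dtype=np.uint8)   # stand-in frame
    tcp_sock = socket.create_connection(ADDR)
    send_tcp(tcp_sock, frame)
    udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    send_udp(udp_sock, frame)

TCP trades some throughput and latency for not having to deal with lost or reordered chunks, which is the trade-off behind the packet loss we saw in the demo.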

Ilan

Personal accomplishments this week:

  • Continued working on the compute pipeline and implemented most of non-max suppression using HLS windows. Had a bug that resulted in more suppressed pixels than expected (a software reference of the intended behavior follows this list).
  • Looked into HLS streams and VDMA for higher performance, since using regular DMA adds more work.
  • Made some progress on memory interfacing, but still need to implement the unit test and the software side of the interface.
  • Carnival: less work than expected during the second half of the week.
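
For reference, here is a small numpy version of the behavior the HLS block should match; it quantizes the gradient direction into four bins and keeps a pixel only if it is a maximum along that direction (the magnitude/angle inputs would come from the Sobel stage):

    import numpy as np

    def nms(mag, theta):
        # Non-max suppression reference: keep a pixel only if its gradient
        # magnitude is >= both neighbors along the quantized gradient direction.
        H, W = mag.shape
        out = np.zeros_like(mag)
        angle = np.rad2deg(theta) % 180        # fold angles into [0, 180)
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                a = angle[y, x]
                if a < 22.5 or a >= 157.5:     # ~0 deg: horizontal gradient
                    n1, n2 = mag[y, x - 1], mag[y, x + 1]
                elif a < 67.5:                 # ~45 deg
                    n1, n2 = mag[y - 1, x + 1], mag[y + 1, x - 1]
                elif a < 112.5:                # ~90 deg: vertical gradient
                    n1, n2 = mag[y - 1, x], mag[y + 1, x]
                else:                          # ~135 deg
                    n1, n2 = mag[y - 1, x - 1], mag[y + 1, x + 1]
                if mag[y, x] >= n1 and mag[y, x] >= n2:
                    out[y, x] = mag[y, x]
        return out

One classic cause of over-suppression is comparing with a strict > instead of >=, which kills plateau pixels, so that is worth checking against the HLS version.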

Progress on schedule:

  • Since I’ve been working with Edric, I’m still behind where I would like to be on the memory interface. I’m planning on going back to the memory interface on Monday, but I’ll likely still support Edric as necessary. I will be out on Wednesday to have a follow-up with a doctor, so I anticipate having the memory interface done on the 17th.

Deliverables next week:

A memory interface prototype verified with a unit test (if possible), and a bug-free implementation of NMS.

Status Report (2/10 – 2/16)

Team Status Report

The most significant risk right now that could jeopardize the success of the project is a lack of clarity regarding design and requirements. Since our project has changed significantly over the past couple of weeks, we are currently trying to re-establish clear requirements and design specs with the help of our TA, Zilei. We have other project ideas as contingency plans, but at this point a pivot is pretty much out of the question, so we really have to make this idea work. Many changes were made to the existing design of the system, which were necessary to establish a use case for the project, but these changes do not significantly increase our bill of materials, other than adding another Pi to our parts order. We are working on defining more requirements specific to our finalized use case of a scalable security camera system.

The other main risk is FPGA bring-up, which needs to be taken care of as soon as possible so we are confident that our platform is set up and our toolchain works. We will most likely start working on this at the end of this week, and our goal is to avoid getting stuck by using some of the Altera/Xilinx demos and application notes to get over any bumps in the process. If we do get stuck, some TAs have experience with both platforms and could offer guidance on how to get everything set up efficiently.

To more concretely define what our system will do, we've spent the majority of this week researching algorithms that we feel comfortable implementing in an additive manner, so we can visually confirm our progress, and we've found a good candidate in Canny edge detection. The algorithm is nontrivial, but it is also a well-defined, step-by-step algorithm, so we can implement each piece on top of the previous one after confirming the functionality of the current pipeline. We are doing final research this weekend to finalize our algorithm so that by Monday we can update our current design documentation. This will likely impact our schedule a bit, so we have adjusted it as necessary to accommodate the timeline for the algorithm. We will also be finalizing our decision between Wi-Fi and Ethernet based on how complex and troublesome getting Wi-Fi could be.

Once these two decisions are finalized, we'll move on to refining the block diagrams from our proposal presentation and have a much more fully defined project so we can begin the initial work. With the video algorithm decision made, we can start properly designing our FPGA implementation, estimating development time for each part of the algorithm, determining how we will communicate data and control between the programmable logic and the core on the board, and so on. These are many of the unknowns that we were aware of ourselves and that the TAs and professors brought up when discussing our project. We have updated our schedule accordingly after this week's progress. Most tasks have been pushed out a bit due to the focus this week being on solidifying our project's use case and processing functionality. Starting this week, we hope to focus on the actual tasks we have laid out for this project.

Brandon

For the first week of work on the project, our team was very lost as to what direction to take. After presenting our project proposal, we were met with pretty intense pushback, and several questions arose that we were unable to answer. I believe the biggest issue was that we had misunderstood the intentions of this project: while we thought it would be adequate to conduct an exploration of FPGA computational power, in reality we're required to have a clear use case that we can demonstrate. Thus, instead of working on the project itself this week, we spent all of our time discussing and brainstorming the issues with our current idea, and considered pivoting to a different project. After talking to various TAs and professors, we finally nailed down our project idea, but since it has changed, we haven't completed any tangible work on the actual project yet. To be completely honest, this means I have no progress on my part of the project; we're still in the refinement stage.

Obviously, this means that I'm significantly behind schedule, as is the team as a whole. I'm aware of this and am prepared to invest a significant amount of time this upcoming week to catch up. I have to implement basic UDP functionality along with full video-frame send functionality. These are the deliverables I hope to complete in the next week.

Ilan

This week my team and I went over feedback from the proposal presentation, and ultimately we decided to stick with our current system architecture and refine the use case to target security camera systems. To make the video processing portion more concrete, I spent quite a bit of time researching different algorithms, analyzing their feasibility, and seeing what numbers are reasonable for different algorithms and platforms. Based on this research, most machine learning algorithms are likely too complex to fit with the rest of our project and our skillset, but a more tangible computer vision algorithm is feasible. The main candidate I've found so far is Canny edge detection, which is used as a step in more complex CV/ML algorithms like object detection. It seems like a good candidate since it's not extremely simple, but it also doesn't bring the complexity of a neural network or another heavyweight algorithm. Additionally, the algorithm has 5 main steps, which we could implement one on top of another as we progress with the rest of the project. This is good since none of us has a strong background in machine learning, so having an algorithm that we can visually inspect for correctness is beneficial.
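
For reference (and as a software baseline to eventually check the hardware against), the five stages compose roughly as below. This is a sketch that leans on OpenCV purely as a reference implementation; cv2.Canny bundles the last three stages internally:

    import cv2
    import numpy as np

    def canny_baseline(gray, low=50, high=150):
        # 1. Gaussian smoothing to suppress noise
        blurred = cv2.GaussianBlur(gray, (5, 5), sigmaX=1.4)
        # 2. Gradient computation (Sobel) -> magnitude and direction;
        #    our own stages 3-5 would consume mag/theta directly
        gx = cv2.Sobel(blurred, cv2.CV_32F, 1, 0, ksize=3)
        gy = cv2.Sobel(blurred, cv2.CV_32F, 0, 1, ksize=3)
        mag, theta = cv2.cartToPolar(gx, gy)
        # 3. Non-max suppression, 4. double thresholding, 5. hysteresis:
        #    cv2.Canny performs these three steps internally
        return cv2.Canny(blurred, low, high)

Building the stages in this order means each one can be visually checked on real frames before the next is added, which is exactly the additive property we want.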

In addition to researching the specific computer vision algorithm we will implement, I also looked at the FPGA boards we have available and did some research to determine which board would be the best choice to work with. To make it easier to implement the software baseline, it might make more sense for us to use an SoC board rather than the Virtex 7 + MicroBlaze architecture we originally intended, in an effort to reduce bring-up time and unblock Brandon on the server-side implementation. I worked on comparing the Zynq board and the DE10-Standard, but still haven't found a clear reason to choose one over the other.

Because the proposal presentation feedback caused us to reconsider and brainstorm a bit, we are slightly behind schedule, but I plan on acquiring hardware as soon as possible this week and finalizing the algorithm decision. Once that is done, I'll move immediately to bring-up, running through some of the demos and documentation to get things working. That will be my focus for this week, so that we are sure our hardware works and we have all of the toolchain and infrastructure set up when we need it in a few weeks. Hopefully bring-up will go relatively smoothly, and Edric and I can take a few days to properly design the programmable logic portion and the interfacing between the logic and the rest of the system.

Over the next week, I plan on finalizing the video processing algorithm we will implement and updating our documentation to reflect this change. Additionally, I'll be submitting a request for hardware and trying to set everything up (less a deliverable than a prerequisite for future deliverables).

Edric

<insert here>