Status Report (2/17 – 2/23)

Team report:

Changes to schedule:

We're slightly behind on the hardware side of things because we only acquired our board late this week, but our design has been simplified by the use of an ARM core over a MicroBlaze and by the Wi-Fi module being supported by the board rather than by us. This should cut some time out of interface bring-up and allow Ilan to help Edric a bit more with the pipeline design and implementation details.

As for the software side, Brandon is slowly catching up after falling very behind last week. He's added benchmarking to the Gantt chart for this week, along with starting video frame/feed functionality.

Major project changes:

There is a possibility that full edge detection will not be implemented in hardware due to the limited number of DSP slices on the board. This would most likely mean implementing the first few stages, which are also the more computationally intense ones, in hardware, and then moving the data back to software. Once we implement the first stage and see how many DSP slices are actually used (as opposed to our theoretical calculations), we'll know whether this change will happen or not.

 

Brandon:

For the second week of work on the project, we were able to clarify our project significantly from the prior week. We've settled on a security system implementation with Canny edge detection, which means that most of our communication protocol design will stay the same. Thus, I was able to spend my time this week actually working on relevant code for the UDP protocol. I drafted a test server/client to ensure that I could get basic UDP functionality working, which I was able to do, as shown in the pictures below. Technical challenges I overcame include bugs in the server code involving MSG flags that I wasn't setting properly, along with sizing issues in the char array I was trying to send; getting past these took a lot of online research and talking with peers. Once I got this working, I was supposed to reconfigure my code to accommodate video streams, but since we have our design presentation this coming week, I'm instead trying to benchmark latency numbers for sending a 1280×720 video frame, so I'm designing a chunking/packing algorithm and timing it.
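
To make the benchmarking plan concrete, below is a minimal sketch of the kind of chunking/timing loop I have in mind. The header layout, the 1400-byte payload size (picked to fit a typical 1500-byte Ethernet MTU), and the 8-bit grayscale frame format are all assumptions for illustration, not final design decisions:

```c
/* Hypothetical benchmarking sketch (not final code): chunk one
 * 1280x720 grayscale frame into UDP-safe payloads and time the send
 * loop. Assumes an already-connected UDP socket `sock`. */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

#define FRAME_W  1280
#define FRAME_H  720
#define CHUNK_SZ 1400  /* payload bytes per datagram (assumed) */

struct chunk_hdr {          /* prepended to every datagram */
    uint32_t frame_id;      /* which frame this chunk belongs to */
    uint32_t chunk_idx;     /* position of the chunk within the frame */
    uint32_t total_chunks;  /* so the receiver knows when it's done */
};

/* Send one frame in chunks; return elapsed milliseconds. */
double send_frame(int sock, const uint8_t *frame, uint32_t frame_id)
{
    uint8_t pkt[sizeof(struct chunk_hdr) + CHUNK_SZ];
    size_t frame_len = FRAME_W * FRAME_H;  /* 8-bit grayscale assumed */
    uint32_t total = (frame_len + CHUNK_SZ - 1) / CHUNK_SZ;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (uint32_t i = 0; i < total; i++) {
        size_t off = (size_t)i * CHUNK_SZ;
        size_t len = frame_len - off < CHUNK_SZ ? frame_len - off : CHUNK_SZ;
        struct chunk_hdr hdr = { frame_id, i, total };
        memcpy(pkt, &hdr, sizeof hdr);
        memcpy(pkt + sizeof hdr, frame + off, len);
        send(sock, pkt, sizeof hdr + len, 0);  /* error handling omitted */
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) * 1e3 +
           (t1.tv_nsec - t0.tv_nsec) / 1e6;    /* elapsed ms */
}
```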

With this new task, I am now slightly behind schedule, but not as much as I was last week. I've caught up on the UDP functionality, but haven't started the video streaming functionality I was supposed to do this week. To catch up, I'm going to try to finish benchmarking and start on video streaming functionality this week. These are the deliverables I hope to complete in the next week.

 

Ilan

Personal accomplishments this week:

  • Decided on and acquired FPGA hardware. We'll be using the Ultra96, which has an ARM core alongside the FPGA fabric.
    • No significant challenges here
    • Steps mainly involved narrowing the choice down to the Zynq and Ultra96 due to their ARM cores; the Zynq was unavailable, so we went with the Ultra96
    • Future work involves board bring-up and interface development
  • Researched the steps of Canny edge detection in depth with Edric. We determined that it may not be computationally feasible to fully compute 2 simultaneous 720p streams on the FPGA due to the limited number of DSP slices (360 slices total, so 2 separate compute blocks would mean 180 slices/stream), or that it may be quite tight against the 100 ms latency target. Back-of-the-envelope math shows it would take ~27 ms to do 3 convolutions (covering just the first 2 of the 5 stages of the edge detection algorithm) in a pixel-by-pixel fashion, with each convolution using 9 DSP slices. In a pipelined design, each DSP slice is dedicated to a stage, so that alone allocates 27 of the 180 slices for a single stream (see the back-of-the-envelope sketch after this list). We'll nail this down once we implement the Gaussian filter, since that requires a convolution and will heavily inform how we implement the intensity gradient calculation (another 2 convolutions). At that point, we'll have a definite answer as to the timing conditions under which we can fit the whole algorithm in the FPGA fabric.
    • Technical challenges met were lack of familiarity with the algorithm, and some gaps in understanding specific stages
    • Steps involved focusing first on the initial 2 stages, since these seem to account for a significant portion of the algorithm's computation time. We broke down the computation performed by a convolution in terms of the FPGA's DSP slices and conservatively determined how we would use the slices to estimate frequency, computation time, etc.
    • Future work will come when Edric implements the first stage and we see how the convolution ultimately consumes DSP slices
  • Finalized the interfacing design between the ARM core and FPGA with Edric. We discussed all of the interfaces we'll need and how they will work to let computation be offloaded from the ARM core to the FPGA and read back once it has finished. We'll section off a portion of DRAM for each stream, and use GPIO pins between the ARM core and PL to communicate status and control signals for the edge detection start/end. Since computing a matrix convolution efficiently means not overwriting the current data, we came up with 2 strategies for moving data between stages of the edge detection pipeline, 1 of which is our main strategy: allocating a separate chunk of DRAM for each stage, so we can pipeline the design. This incurs more memory overhead, but based on our calculations it is feasible.
    • No significant technical challenges here
    • Steps were determining what interfaces we could use and which suited the application best
    • Future work will be me implementing these interfaces
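
The back-of-the-envelope numbers above can be reproduced with a few lines of arithmetic. Note that the 100 MHz fabric clock and the one-output-pixel-per-cycle assumption (all 9 multiply-accumulates of a 3×3 convolution done in parallel by 9 DSP slices) are my assumptions, not measurements; the real numbers will come from implementing the Gaussian filter:

```c
/* Back-of-the-envelope DSP/latency estimate for the first Canny stages
 * on a 720p frame. Assumptions (not measured): 100 MHz fabric clock,
 * one 3x3 convolution output per cycle using 9 DSP slices in parallel. */
#include <stdio.h>

int main(void)
{
    const double pixels       = 1280.0 * 720.0; /* one 720p frame */
    const double clock_hz     = 100e6;          /* assumed fabric clock */
    const int    convs        = 3;   /* Gaussian + 2 gradient convolutions */
    const int    dsp_per_conv = 9;   /* 3x3 kernel -> 9 parallel MACs */
    const int    dsp_total    = 360; /* DSP slices on the board */
    const int    streams      = 2;

    double ms_per_conv = pixels / clock_hz * 1e3;               /* ~9.2 ms */
    printf("per convolution: %.1f ms\n", ms_per_conv);
    printf("3 convolutions:  %.1f ms\n", convs * ms_per_conv);  /* ~27.6 ms */
    printf("DSP slices per stream (pipelined): %d of %d\n",
           convs * dsp_per_conv, dsp_total / streams);          /* 27 of 180 */
    return 0;
}
```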

Progress on schedule:

  • Edric and I made good progress on the edge detection pipeline design and interfacing, which keeps us approximately on schedule.
  • We didn't have the power cable for the Ultra96 and couldn't find a matching one in Hamerschlag, so I couldn't do any quick testing of the board, which puts me slightly behind where I wanted to be. However, our previous schedule and architecture were based on using a MicroBlaze core, and after adjusting the schedule to reflect our finalized board decision, the overall timeline hasn't been affected.

Deliverables next week:

  • Enabling basic ARM core and FPGA functionality
    • Unblocks Brandon’s development and testing on the server-side
    • Unblocks Edric to start flashing bitstreams onto the FPGA (not necessary for a while, though; most designs will be simulated and only synthesized to check timing and resource usage)
    • No expected risks/challenges here; these tasks mainly focus on getting very basic functionality working and making sure everything is usable and set up for later, when things become more fast-paced

Edric

This week, because we managed to decide on and get ahold of our FPGA, we could begin making some estimates. On the hardware end, no code has been written yet, but we've managed to flesh out a few aspects of our design:

  • Data coming from video streams will be placed in DRAM by the ARM core at a specified address
    • DRAM address space is split into segments, where each stream is allocated a chunk of the space
  • Once in memory, the ARM core will communicate to the fabric (compute blocks) that there is a frame ready for processing
    • Simple producer-consumer protocol: the core will ping the fabric that a frame is ready, along with the address where the frame starts
  • When the frame has been processed (and put back into DRAM), the fabric will ping the ARM core that it is ready (a rough sketch of this handshake is below)
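
Here's a rough sketch of how the ARM-core side of that handshake could look. The addresses, register layout, and names here are all placeholders for illustration; the real memory map and control signals are still to be finalized:

```c
/* Sketch of the ARM-core side of the frame handshake. Addresses,
 * register layout, and names are made up for illustration; the real
 * memory map is TBD. */
#include <stdint.h>

#define FRAME_BYTES  (1280u * 720u)  /* one 8-bit 720p frame (assumed) */
#define STREAM0_BASE 0x10000000u     /* placeholder DRAM segment, stream 0 */
#define STREAM1_BASE 0x10400000u     /* placeholder DRAM segment, stream 1 */

/* Placeholder GPIO registers between the ARM core (PS) and fabric (PL). */
static volatile uint32_t *const GPIO_FRAME_READY = (uint32_t *)0x41200000u; /* PS -> PL */
static volatile uint32_t *const GPIO_FRAME_ADDR  = (uint32_t *)0x41200008u; /* PS -> PL */
static volatile uint32_t *const GPIO_FRAME_DONE  = (uint32_t *)0x41210000u; /* PL -> PS */

/* Hand one raw frame to the fabric and wait for the processed result. */
void process_frame(int stream, const uint8_t *frame)
{
    uint32_t base = (stream == 0) ? STREAM0_BASE : STREAM1_BASE;
    uint8_t *dst = (uint8_t *)(uintptr_t)base;

    /* 1. Copy the raw frame into this stream's DRAM segment. */
    for (uint32_t i = 0; i < FRAME_BYTES; i++)
        dst[i] = frame[i];

    /* 2. Tell the fabric where the frame is and that it's ready. */
    *GPIO_FRAME_ADDR  = base;
    *GPIO_FRAME_READY = 1u << stream;

    /* 3. Poll until the fabric signals that processing finished. */
    while ((*GPIO_FRAME_DONE & (1u << stream)) == 0)
        ;  /* busy-wait; an interrupt would likely be nicer */

    *GPIO_FRAME_READY = 0;
    /* The processed frame now sits in its stage-output chunk of DRAM. */
}
```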

Regarding the Canny algorithm itself, we've come across a few issues with respect to the actual implementation. It seems we'll need to do more work to understand exactly what operations are necessary, although focusing on the Gaussian filter for the time being seems reasonable.
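
Since the Gaussian filter is our starting point, a software golden model of it will also double as a way to check the FPGA's output later. Below is a rough sketch using the common 3×3 integer approximation of the Gaussian kernel; the kernel size and weights are assumptions until we settle on the real filter parameters:

```c
/* Rough software golden model of the Gaussian-blur stage, for checking
 * FPGA output later. The 3x3 kernel (1 2 1 / 2 4 2 / 1 2 1, divided by
 * 16) is the common small integer approximation; the real design may
 * use a larger kernel. Border pixels are simply copied for simplicity. */
#include <stdint.h>

#define W 1280
#define H 720

static const int K[3][3] = {
    {1, 2, 1},
    {2, 4, 2},
    {1, 2, 1},
};

void gaussian_blur(const uint8_t in[H][W], uint8_t out[H][W])
{
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            if (y == 0 || y == H - 1 || x == 0 || x == W - 1) {
                out[y][x] = in[y][x];  /* copy border pixels */
                continue;
            }
            int acc = 0;  /* the same 9 MACs the DSP slices would do */
            for (int ky = -1; ky <= 1; ky++)
                for (int kx = -1; kx <= 1; kx++)
                    acc += K[ky + 1][kx + 1] * in[y + ky][x + kx];
            out[y][x] = (uint8_t)(acc / 16);  /* kernel weights sum to 16 */
        }
    }
}
```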

We have, however, designed the basic flow of a frame being processed (and how each step translates to its Canny algorithm step). This can be illustrated with the following diagram:

Each block represents a chunk of DRAM where a copy of the frame (at each step) is located. Unfortunately, we can't really edit the frame in-place, so this is how we'll do it (for now).

Some foreseen challenges:

  • Still need to figure out how the Canny algorithm works
  • When there is both a frame done and a frame pending for processing, we’ll need to figure out a way to prevent deadlock between the producer (FPGA fabric) and consumer (ARM core)
    • Perhaps a FIFO is enough (a rough sketch is below); will need to give it more thought.
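
For reference, here's the kind of simple FIFO of pending frames I have in mind. A single-producer/single-consumer ring buffer like this avoids locks as long as each side only writes its own index; the depth and element type are placeholders:

```c
/* Sketch of a single-producer/single-consumer FIFO of pending frame
 * addresses. With one producer (fabric) and one consumer (ARM core),
 * each side writes only its own index, so no lock is needed. On real
 * hardware we'd still need memory barriers or a hardware FIFO to
 * guarantee ordering between the fabric and the core. */
#include <stdint.h>
#include <stdbool.h>

#define FIFO_DEPTH 8  /* placeholder; must be a power of two */

struct frame_fifo {
    uint32_t addr[FIFO_DEPTH];  /* DRAM addresses of ready frames */
    volatile uint32_t head;     /* written only by the producer */
    volatile uint32_t tail;     /* written only by the consumer */
};

bool fifo_push(struct frame_fifo *f, uint32_t frame_addr)
{
    if (f->head - f->tail == FIFO_DEPTH)
        return false;                          /* full: drop or stall */
    f->addr[f->head % FIFO_DEPTH] = frame_addr;
    f->head++;                                 /* publish after writing */
    return true;
}

bool fifo_pop(struct frame_fifo *f, uint32_t *frame_addr)
{
    if (f->head == f->tail)
        return false;                          /* empty */
    *frame_addr = f->addr[f->tail % FIFO_DEPTH];
    f->tail++;
    return true;
}
```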

For the most part, we're a bit behind schedule, but definitely in a better place than last week. The next steps are to get flashing lights on the Ultra96, look into the Canny algorithm more, and perhaps further solidify our testing suite for examining the output from the FPGA.

Status Report (2/10 – 2/16)

Team report:

The most significant risks right now that could jeopardize the success of the project are a lack of clarity regarding design and requirements. Since our project has changed significantly over the past couple weeks, we are currently trying to re-establish clear requirements and design specs with the help of our TA, Zilei. We have other project ideas as contingency plans, but at this point, a pivot is pretty much out of the question, so we really have to make this idea work. Many changes were made to the existing design of the system, which were necessary to establish a use case for the project, but these changes do not significantly increase our bill of materials for the project, other than adding another Pi to our parts order. We are working on defining more requirements specific to our finalized use case of a scalable security camera system.

The other main risk is FPGA bring-up, which needs to be taken care of as soon as possible so we are confident that our platform is set up and our toolchain works. We will be starting to work on this most likely at the end of this week, and our goal is to avoid getting stuck by using some of the Altera/Xilinx demos and application notes to get over any bumps in the process. If we get stuck, some TAs have experience with both platforms and could offer some guidance as to how to get everything set up efficiently.

To more concretely define what our system will do, we've spent the majority of this week researching algorithms that we feel comfortable implementing in an additive manner, so we can visually confirm our progress, and we've found Canny edge detection to be a good candidate. The algorithm is nontrivial, but it is also a well-defined, step-by-step algorithm in which we can implement each piece on top of the previous one after confirming the functionality of the current pipeline. We are doing final research this weekend to finalize our algorithm choice, so that by Monday we can update our current design documentation. This will likely impact our schedule a bit, so we have adjusted it as necessary to accommodate the timeline for the algorithm. We will also be finalizing our decision between Wi-Fi and Ethernet based on how complex and troublesome getting Wi-Fi working could be. Once these two decisions are finalized, we'll move on to refining the block diagrams from our proposal presentation and have a much more fully defined project so we can begin the initial work. With the video algorithm decision made, we can start properly designing our FPGA implementation, estimating development time for each part of the algorithm, determining how we will communicate data and control between the programmable logic and the core on the board, etc. These are a lot of the unknowns that we were aware of ourselves and that the TAs and professors brought up when discussing our project. We have updated our schedule accordingly after this week's progress: most tasks have been pushed out a bit due to the focus this week being on solidifying our project's use case and processing functionality. Starting this week, we hope to focus on the actual tasks we have laid out for this project.

 

Brandon:

For the first week of work on the project, our team was very lost as to what direction to take. After presenting our project proposal, we were met with pretty intense pushback, and several questions arose that we were unable to answer. I believe the biggest issue was that we had misunderstood the intentions of the project: while we thought it would be adequate to conduct an exploration of FPGA computational power, in reality we're required to have a clear use case that we are able to demonstrate. Thus, instead of working on our project this week, we spent all of our time discussing and brainstorming the issues with our current idea, and considered pivoting to a different project. After talking to various TAs and professors, we finally nailed down our project idea, but since it's changed, we haven't completed any tangible work yet on the actual project. To be completely honest, this means I don't have any progress on my part of the project; we're still in the refinement stage.

Obviously, this means that both I and the rest of the team are significantly behind schedule. I'm aware of this and am prepared to invest a significant amount of time this upcoming week to catch up. I have to implement basic UDP functionality along with full video frame send functionality. These are the deliverables I hope to complete in the next week.

 

Ilan

This week my team and I went over feedback from the proposal presentation, and ultimately we decided to stick with our current system architecture and refine the use case to target security camera systems. To make the video processing portion more concrete, I spent quite a bit of time researching different algorithms, analyzing their feasibility, and seeing what numbers are reasonable for different algorithms and platforms. Based on this research, most machine learning algorithms are likely too complex to fit in with the rest of our project and our skill set, but a more tangible computer vision algorithm is likely feasible. The main candidate I've found so far is Canny edge detection, which is used as a step in more complex CV/ML algorithms like object detection. I thought this would be a good candidate since it's not extremely simple, but also doesn't bring the complexity of a neural network or another complex algorithm. Additionally, the algorithm has 5 main steps, which we could implement one on top of another as we progress with the rest of the project. Since none of us have strong backgrounds in machine learning, having an algorithm that we can visually inspect for correctness is beneficial.

In addition to researching the specific computer vision algorithm we will implement, I also looked at the FPGA boards we have available and did some research to determine which board would be the best choice for us to work with. To make it easier to implement the software baseline, it might make more sense for us to use an SoC board rather than the Virtex-7 + MicroBlaze architecture we originally intended, in an effort to reduce bring-up time and unblock Brandon on the server-side implementation. I compared the Zynq board and the DE10-Standard, but still haven't found a clear reason to choose one over the other.

Because our proposal presentation feedback caused us to reconsider and brainstorm a bit, we are slightly behind schedule, but I plan on acquiring hardware as soon as possible this week and finalizing the algorithm decision. Once that is done, I'll immediately move to bring-up by running through some of the demos and documentation to get things working. That will be my focus for this week, so that we are sure our hardware works and we have all of the toolchain and infrastructure set up for when we need it in a few weeks. Hopefully bring-up will go relatively smoothly, and Edric and I can take a few days to properly design the programmable logic portion and the interfacing between the logic and the rest of the system.

Over the course of the next week, I plan on finalizing the video processing algorithm we will implement and updating our documentation to reflect this change accordingly. Additionally, I’ll be submitting a request for hardware and trying to set up everything (less of a deliverable, but more of a prerequisite for future deliverables).

 

Edric

<insert here>