Status Report (3/24 – 3/30)

Team Report

Changes to schedule:

Ilan is extending his memory interface work by at least 1 week.

Brandon is extending his work on sending the video stream over sockets by at least 1 week.

Edric is pushing back the edge detection implementation by at least 1 week.

Major project changes:

Our Wi-Fi quality issues have posed a problem that we intend to circumvent temporarily by lowering the video stream bitrate. Once more of the project’s functionality is working, we’ll revisit the Wi-Fi quality issues so we can raise the bitrate again.

On the compute side, we have essentially decided to move forward with Vivado’s HLS tool.

Brandon

For the sixth week of work on the project, I was able to successfully boot up and configure the Pis! It took a decent amount of extra work outside of lab, but after receiving the correct USB cables, I was able to boot the Pis and connect them to the CMU DEVICE network. To simplify usage, we’re navigating the Pis with a monitor over mini HDMI rather than SSHing in. After finishing initial setup, I moved on to camera functionality and networking over UDP. I was able to display a video feed from the camera and convert a video frame to an RGB array and then to a grayscale array (a sketch of the conversion is below), but I ran into issues when I began the networking portion of the project. The biggest issue is that we are achieving significantly lower bandwidth than expected on both the Pis (~5 Mbps) and the ARM core (~20 Mbps). Thus, we decided to revert to my original plan of using H264 compression to fit within the Pis’ available bandwidth. Unfortunately, we haven’t yet been able to send the video over the network using UDP, but we plan on working through the weekend to hopefully be ready for our interim demo on Monday.
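
As a reference for the frame-conversion step above, here is a minimal sketch of the RGB-to-grayscale conversion, assuming OpenCV and numpy on the Pi (the capture device index and the BT.601 luma weights are assumptions; OpenCV’s built-in conversion does the same thing):

    import cv2
    import numpy as np

    cap = cv2.VideoCapture(0)  # assumes the camera shows up as device 0
    ok, frame = cap.read()     # OpenCV returns an HxWx3 BGR array
    if ok:
        # Standard BT.601 luma weights applied per channel.
        b, g, r = frame[..., 0], frame[..., 1], frame[..., 2]
        gray = (0.114 * b + 0.587 * g + 0.299 * r).astype(np.uint8)
        # Equivalent one-liner using OpenCV:
        gray2 = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    cap.release()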

The Pi setup completion was a big step in the right direction for our schedule, but the new bandwidth issue that’s preventing us from sending the video is worrying. If we’re able to successfully send the video stream over the network by the demo, we will be right on schedule, if not ahead of it.

Ilan

Personal accomplishments this week:

  • Did some testing of Wi-Fi on the ARM core
    • Had to configure Wi-Fi to re-enable on boot, since it kept turning off. Also saw some slowness and freezing over SSH, which is a concern once we start using the ARM core for more intense processing.
    • Found that current bandwidth is ~20 Mbps, which is too low for what we need. Initially we’re going to lower the bitrate as a temporary way to keep moving forward; later we’ll try changing the driver, looking into other tweaks, or possibly ordering an antenna to get better performance.
  • Continued to work on the memory interface, but wasn’t able to get the full setup finalized. I’m going to work on this more tomorrow (3/31) to have something for the demo, but starting Wednesday I focused on helping Brandon and Edric so we have more tangible and visual results for our demo.
    • Brandon and I worked on getting the Pis up and running, and I helped him with some of the initial camera setup. I also looked into how to get a lower bitrate out of the camera so we can still send video over Wi-Fi, and set him up with a start on piping it into UDP connections in Python (see the sketch after this list).
    • I helped Edric set up HLS and get started on the actual implementation of the Gaussian filter. We were able to get an implementation working, and Edric is going to do more tweaking to improve performance.
    • Tomorrow (3/31), he and I are going to try to connect the Gaussian and intensity-gradient blocks (we’re going to try to implement the latter beforehand), and then I’ll continue working on the memory interface. The memory interface’s PL input is defined by the final input needs of Edric’s Gaussian filter, so my work will change a bit; that’s why I’ve reprioritized to help him finalize first.
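
For reference, raw video at this resolution is far beyond what the link can carry: 1280 × 720 pixels × 8 bits × 30 fps ≈ 221 Mbps, versus the ~5 Mbps we measured on the Pis, which is why capping the H264 bitrate is the first lever to pull. Below is a minimal sketch of the approach I set Brandon up with, assuming the picamera library; the address, port, and bitrate are placeholders (note that picamera’s documented streaming recipe uses TCP, so over UDP the buffered writes may need to be chunked below the MTU):

    import socket
    import picamera

    # "Connected" UDP socket to the receiver; address/port are placeholders.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.connect(('192.168.1.2', 9000))
    stream = sock.makefile('wb')  # file-like wrapper the camera can write into

    with picamera.PiCamera(resolution=(1280, 720), framerate=30) as camera:
        # Cap the encoder bitrate well under the ~5 Mbps we measured.
        camera.start_recording(stream, format='h264', bitrate=2000000)
        camera.wait_recording(30)  # stream for 30 seconds
        camera.stop_recording()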

Progress on schedule:

  • I’m a little behind where I would like to be, and the Wi-Fi issues we’ve experienced on both the Pis and the ARM core have been a bit of a setback. My goal for the second half of the week was to help Brandon and Edric so we can have more of the functional part of the system ready for the demo. I’ll likely be extending my schedule by at least 1 week to finalize the memory interface between PS and PL.

Deliverables next week:

  • Memory interface prototype using unit test to verify functionality.


Edric

After researching different possibilities for implementing the Canny algorithm, I’ve decided to go forward with Vivado’s High Level Synthesis (HLS) tools. The motivation for this decision is that while the initial stages (simple 2D convolution for the Gaussian filter) aren’t particularly intense in hand-written Verilog, the later steps involving trigonometry will be more complex. HLS will let us keep the actual algorithm code simple, yet customizable enough via HLS’s pragmas.

So far I have an implementation of the Gaussian blur which both simulates and synthesizes to a Zynq IP block. Preliminary analysis shows that the latency is quite high, but DSP slice usage is minimal. More tweaking will be needed to lower the latency; however, since current testing is done on 1080p images, dropping to the target 720p resolution should account for a large share of the required speedup.

For the demo, I aim to implement the next two stages of Canny (applying the Sobel filter in both the X and Y directions, then combining the two). Along with this, I’d like to see if I can put together a software benchmark to compare the HLS output against, ideally using a library like OpenCV (a rough sketch is below). Thankfully, HLS gives us access to a simulator which we can use to compare images.
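
A rough sketch of what that benchmark could look like, assuming Python with OpenCV (the file names, kernel size, and sigma are placeholders until we match the HLS block’s configuration):

    import time
    import cv2

    img = cv2.imread('test_1080p.png', cv2.IMREAD_GRAYSCALE)  # placeholder input

    start = time.perf_counter()
    blurred = cv2.GaussianBlur(img, (5, 5), 1.0)  # kernel/sigma must match the HLS block
    gx = cv2.Sobel(blurred, cv2.CV_16S, 1, 0)     # gradient in the X direction
    gy = cv2.Sobel(blurred, cv2.CV_16S, 0, 1)     # gradient in the Y direction
    print('software pipeline: %.2f ms' % ((time.perf_counter() - start) * 1e3))

    # For correctness, diff against the image produced by the HLS simulator:
    # hls_out = cv2.imread('hls_output.png', cv2.IMREAD_GRAYSCALE)
    # print('max abs diff:', cv2.absdiff(blurred, hls_out).max())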

I’m a little behind with regard to the actual implementation of Canny, but now that HLS is (kind of) working, the remaining implementation should be quite easy in terms of code. The difficult part will be configuring the pragmas to get the compute blocks to meet our requirements.

Status Report (3/10 – 3/23)

Team Report

Changes to schedule:

We anticipate shifting our software-side timeline back a bit, since we were not able to get all of the setup taken care of this week after receiving the Pis we ordered. Because we ordered the incorrect USB cables, we will have to push back the Pi setup by about a week. Hopefully we can move quickly through the camera setup to make up for this, but regardless, we have some slack in our schedule for problems like this.

Major project changes:

We don’t have any major project changes at this point.

Brandon

3/17-3/23

For the fifth week of work on the project, I tried to get the Raspberry Pis booted and configured. After a lot of research about setting up Wi-Fi on the Pis, I determined that the best way to set them up and boot them would be to SSH over USB to obtain each Pi’s MAC address and register it with the CMU DEVICE network (the usual boot-partition configuration for this is sketched below). Once I figured this out, though, I realized that we had actually ordered the wrong USB cables (male micro-USB to female USB instead of male to male). Thus, I had to place another order for the correct USB cables, which will hopefully arrive this coming week. For the second half of the week I was traveling to Seattle for family reasons, so I wasn’t able to work much on the project.
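
For reference, the standard Raspbian recipe for SSH over USB on a Pi Zero W is USB Ethernet gadget mode; a sketch of the boot-partition edits (exact file contents vary by image version):

    # In config.txt on the boot partition, append:
    dtoverlay=dwc2

    # In cmdline.txt, insert after "rootwait" (keeping everything on one line):
    modules-load=dwc2,g_ether

    # Finally, create an empty file named "ssh" on the boot partition to enable SSH.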

This ordering mistake has set me back slightly in terms of schedule, but hopefully I’ll be able to move quickly through the rest of the Pi setup once I’m able to SSH in. I hope to be able to achieve basic camera functionality on the Pi next week.

Edric

Over break, no work was done. This week, we’ve begun looking into the tools for implementing the edge detection pipeline. At the moment, Vivado’s High Level Synthesis (HLS) tool is very enticing, as a lot of the complex mathematical functions are available should we decide to go down this route. Unfortunately, setting up and configuring HLS is proving to be quite difficult. I’m not entirely sure if it will pan out, so next week I’d like to start developing Plan B, which is to just crank out the Verilog for the pipeline. If HLS works, fantastic. The algorithm can be done with only a few lines of code. If it doesn’t, hopefully Plan B will be an adequate substitute.


Ilan

Personal accomplishments this week:

  • Switched to the Xilinx PYNQ boot image and got FPGA programming working successfully, using a simple setup with starter scripts as a base.
    • This will allow Edric and me to very easily program both the ARM core and the FPGA fabric.
    • Mostly tested FPGA programming interactively, so I will need to create a script that automates it for us to prevent any issues in the future.
  • Experimented with HLS, and decided to use HLS for memory interface verification
    • HLS interacts very easily with AXI, which is the memory interfacing method we’ll be using to connect PS and PL. HLS will also reduce total verification time since I’m very familiar with C and do not have to worry about implementing RTL for AXI.
  • Started working on memory interfacing between PS and PL. I did some research and started putting together the block design for the memory interface between PS and PL, and plan on finishing this up over the course of the next week.
    • I’ll also be implementing an interface-level test that will instantiate a mock image with random data in PS, DMA the memory into PL, have PL increment each element by 1 using an HLS module that I will write (and unit test), and then DMA the result back into PS. PS will then compare and verify the returned result (a sketch of this flow follows this list). This will give us a good amount of confidence in the implementation, considering that it accurately represents the interface-level communication that will occur in our final implementation.
    • I’ll be targeting a 375 MHz clock to start – I don’t think the memory interface will be the limiting factor, but this is already around the frequency that we want to target for our whole design. I’d rather push the frequency a bit higher than the overall design to start, so that we are aware of its limitations in case we need to clock the design higher to meet latency requirements or to reduce overall DSP usage.
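
Since we’re on the PYNQ image, the PS side of this test can likely be driven from Python. A minimal sketch under those assumptions (the bitstream name and DMA instance name are placeholders, and buffer allocation may differ by PYNQ version):

    import numpy as np
    from pynq import Overlay, allocate

    ol = Overlay('increment.bit')  # placeholder bitstream: DMA + HLS add-one block
    dma = ol.axi_dma_0             # instance name depends on the block design

    src = allocate(shape=(1280 * 720,), dtype=np.uint8)  # mock image in PS memory
    dst = allocate(shape=(1280 * 720,), dtype=np.uint8)
    src[:] = np.random.randint(0, 255, size=src.shape, dtype=np.uint8)

    dma.sendchannel.transfer(src)  # DMA the mock image from PS into PL
    dma.recvchannel.transfer(dst)  # DMA the incremented result back into PS
    dma.sendchannel.wait()
    dma.recvchannel.wait()

    # PL increments each element by 1; verify the returned result in PS.
    assert np.array_equal(dst, src + 1), 'memory interface test failed'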

Progress on schedule:

  • I wasn’t able to do any work over spring break other than reading about HLS, since I had surgery and came back to Pittsburgh late to allow more time for my recovery. I am slightly behind where I’d like to be, but I will be trying to catch up during the second half of next week.

Deliverables next week:

  • Memory interface prototype using unit test to verify functionality.
  • Continue improving toolchain and infrastructure as necessary (mainly scripting FPGA programming).

Status Report (3/3 – 3/9)

Team Report

Changes to schedule:

We don’t have any current changes to our schedule.

Major project changes:

We don’t have any major project changes at this point.

Brandon

For the fourth week of work on the project, I focused on the video streaming/sending functionality. Unfortunately, I had to redo my order form and add in a bunch of other items (power cables, SD cards, etc.), so we didn’t get our materials this week. This pushed back a lot of what I had planned to do, since I didn’t have access to the Pis. Regardless, I was able to work on sending 2D arrays, along with converting each frame from a 3D RGB array of pixels to a 2D grayscale array using numpy tools. Here is the process from our design document:

We plan on sending 1280×720 grayscale frames across UDP. The Raspberry Pi will capture the frame as an RGB array, which we will convert into a grayscale array. Each frame then contains 921,600 pixels, each represented by one byte, since a grayscale image can represent pixels at 8 bpp (bits per pixel). This results in 921,600 bytes per frame, stored in a 2-dimensional array indexed by the row and column of the pixel. Since we can’t send the entire array in one packet over UDP, we tentatively plan to send each row separately, resulting in 1,280 bytes per packet (720 packets per frame), and reconstruct the array on the receiving end.
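
A minimal sketch of this packetization scheme, assuming Python sockets and numpy (the receiver address and the 2-byte row-index header are placeholders we haven’t finalized):

    import socket
    import struct
    import numpy as np

    FRAME_H, FRAME_W = 720, 1280
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    dest = ('192.168.1.2', 9000)  # placeholder receiver address

    def send_frame(gray):
        # gray is a 720x1280 uint8 array; one packet per row, prefixed with
        # the row index so the receiver can reassemble out-of-order rows.
        for row in range(FRAME_H):
            sock.sendto(struct.pack('!H', row) + gray[row].tobytes(), dest)

    def recv_frame(bound_sock):
        # Receiver side: bound_sock must already be bound to the port.
        frame = np.zeros((FRAME_H, FRAME_W), dtype=np.uint8)
        for _ in range(FRAME_H):
            pkt, _ = bound_sock.recvfrom(2 + FRAME_W)
            (row,) = struct.unpack('!H', pkt[:2])
            frame[row] = np.frombuffer(pkt[2:], dtype=np.uint8)
        return frame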

Once I pick up the Pis from Quinn, I’ll be able to truly start working on the video capture part of the project, constructing the camera packs and using the Pis to record video, which will be converted to grayscale and hopefully displayed on a monitor. Once I get that working, I can then begin sending these arrays over to the FPGA ARM core for processing.

I’m still slightly behind schedule, but I plan on working a bit over spring break this upcoming week (even though I wasn’t originally planning to) in order to catch up. Once we get our materials from Quinn, everything should accelerate nicely. The deliverables I hope to achieve over the next two weeks are demonstrating the RGB → grayscale conversion on a visual output, along with acquiring the materials and getting oriented with them.

Edric

This week, we got our design report document finished. It was a good opportunity to see where our planning was lacking. As a result, the higher-level decisions for our system architecture are now finished.

I worked with Ilan to get some preliminary numbers down. We now have a reasonable estimate of how many resources (in terms of DSP slices) a compute block will take, based on the number of multiplications and additions each phase of the Canny edge detection algorithm requires. Using an approximate pipeline design and a clock frequency from the Ultra96’s datasheet, we now have an estimate of how long a frame will take to process, which came out to about 15 ms. The next step is to start on the Gaussian blur implementation.
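
As a rough sanity check on that estimate (our own back-of-the-envelope arithmetic, not a measurement, using for illustration the ~375 MHz clock discussed elsewhere in these reports): a fully pipelined block that accepts one pixel per cycle covers a 1280 × 720 frame in 921,600 cycles, so

    frame_time ≈ (1280 × 720 pixels) / (375 MHz × 1 pixel/cycle) ≈ 2.5 ms per pass

and a handful of sequential passes of that order lands in the ~15 ms range.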

As for Ultra96 things, now that we have a power supply we can start playing with the board. We’ve been using the vendor’s guide on getting a Vivado project running for the U96, and Ilan is going to try to get some lights flashing on the board over break.

One concern I have at the moment is flashing designs onto the board. To flash over USB we need an adapter, but apparently there is a possibility of doing so via the ARM core. More investigation is warranted.

I think we’re decently on schedule. Once we get back from break we should be able to begin actual implementation.

Ilan

Personal accomplishments this week:

  • Finalized target clock frequency and DSP slice allocation.
    • This was tricky since we didn’t fully understand all of the phases of the algorithm at the beginning, but it just required more research to better understand the computation pattern and how many DSP slices are necessary.
    • Future work: if we find that we need more DSP slices, we’ll pull from the reserve of 52 (out of 180) per stream.
  • Finished design review documentation
    • A big focus for Edric and me was getting quantifiable numbers around everything, including the target clock frequency and DSP slice allocation above.
    • Improved our diagrams of different parts of the system and made sure the design is fully understood by both of us.
  • Continued working on bringing up FPGA and ARM core. Still working on finalizing infrastructure and toolchain so it works for all 3 of us.
    • Part of this will be seeing how HLS fits in and how easy it is for us to use.
      • I’ll be looking into this over spring break.

Progress on schedule:

  • Schedule is on target, and I will be trying to do a little bit of work over spring break to get us some more slack during the second half of the semester.

Deliverables next week:

  • Finish enabling ARM core and FPGA functionality, push the toolchain up to GitHub, and document the setup.
  • Get infrastructure working.

Status Report (2/24 – 3/2)

Team Report

Changes to schedule:

We’re catching up and for the most part maintaining the pace that we set over the past few weeks. We accounted for a reasonable amount of time spent this week towards the design reviews, so we don’t have any current changes to our schedule.


Major project changes:

At this point we don’t have any major project changes since we’ve just finalized our project design for the most part. We still have some concerns about the DSP pipeline mapping correctly onto the DSP slices, and that’s something we’ll keep in mind and re-evaluate after implementing the first stage of the pipeline.


Brandon

2/24-3/2

For the third week of work on the project, we mainly focused on the design aspect of the project, as we had to give a design presentation as well as write a design document. Since I was presenting, I mainly had to focus on that process rather than spend a lot of time working on the actual project. Thus, I didn’t make as much progress as I was hoping to this week on video streaming functionality. However, I was able to get OpenCV working, so I’m now at about 50% completion on the video streaming tests we want done before we get the actual hardware. Speaking of hardware, I also submitted the order form for three Raspberry Pi W with Camera Packs (see below), which we will be able to start working with once we receive them. Some technical challenges I overcame included some weird UDP behavior across multiple machines, and simply installing and working with OpenCV; the steps I took to accomplish this were, again, a lot of online research and various forms of testing.

I’m still behind schedule, since I devoted most of my time this week to the design aspect of the class, but I should be okay: I’m planning on staying in Pittsburgh over spring break, so I’ll be able to catch up on anything I don’t finish (currently, I don’t have anything scheduled on the Gantt chart then, so it’ll be an opportunity to catch up). The deliverable I hope to achieve this next week is still getting the video streaming/sending functionality working completely.

Ilan

Personal accomplishments this week:

  • Started working on bringing up FPGA and ARM core. Still working on finalizing infrastructure and toolchain so it works for all 3 of us.
    • Had to work through temporary obstacle of powering board since we didn’t have a power supply, so we ordered one for ourselves as well as one for another team that wanted one.
    • Future work involves finishing bring-up, pushing infrastructure up to GitHub, and documenting toolchain for Brandon and Edric.
  • Continued researching the steps of Canny edge detection in more depth with Edric to prepare for the design review, but we weren’t able to finalize the DSP slice allocation for each stage. This was brought up as a flaw in our design review documentation, so we put some time toward it during the second half of the week and will be finalizing it, along with (hopefully) a target clock frequency for the design, in our design report. We’re still trying to work through the algorithm and improve our understanding, which has been a bit of a challenge.
    • Future work will be finalizing the DSP slice allocation and determining target clock frequency.
  • No progress yet on implementing interface functionality, but that’s scheduled for the upcoming 2 or so weeks, so that’s fine.

Progress on schedule:

  • Edric and I continued to make progress on understanding the algorithm and designing the pipeline. We’ll be finalizing this over the rest of the weekend and the implementation will start over the next week or so.

Deliverables next week:

  • Finish enabling ARM core and FPGA functionality, push the toolchain up to GitHub, and document the setup.
  • Finalize DSP slice allocation and target clock frequency.