# **Real-Time Video Upscaling**

Joshua Lau, James Garcia, Kunal Barde (B0)

## **Use Case/Application Area**

#### **Problem:**

- People want to watch <u>old home</u> <u>videos</u> and <u>movies</u>
- Don't want to plan for upscaling videos ahead of time
- Don't want/know how to do it online or on a computer

#### Solution:

<u>Plug-and-play</u>, <u>real-time</u> video super-resolution device that enhances <u>240p</u> video to <u>1080p</u>.

#### **Requirements:**

- Scaling Factor of 4.5x
- SSIM > 0.66
- Latency < 60ms
- Throughput $\geq$ 30 FPS
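
To make the SSIM requirement concrete, here is a minimal sketch of the metric. It computes a single-window (global) SSIM over two equal-length grayscale pixel sequences with the standard constants $K_1 = 0.01$ and $K_2 = 0.03$. The usual metric averages this quantity over local windows, so treat this as a simplification. `global_ssim` and the sample pixel values are illustrative and not taken from the project code.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equal-length grayscale pixel sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    # Stabilising constants from the standard SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical pixel rows: a reference and a slightly degraded copy.
reference = [10, 50, 90, 130, 170, 210]
degraded = [12, 47, 95, 126, 175, 205]

score = global_ssim(reference, degraded)
print(f"SSIM = {score:.4f}, meets requirement: {score > 0.66}")
```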



# **Solution Approach**

- SRCNN-Ex
  - Software profiling
- Ultra96v2 FPGA
  - Hardware acceleration
- Ultra96v2 ARM Core
  - I/O and profiling
- Meet specifications



## Pitfalls, aka Hardware can be Hard

- Model too large for an optimal solution on the Ultra96v2
  - Our SRCNN-Ex is mathematically impossible to implement on this board while meeting specifications
- As Vitis giveth, Vitis taketh away
  - Chosen because a good solution is easy to make with Vitis
  - Tool is still rather opaque in its implementation, optimisation, etc.
  - Leads to longer iteration at tail-end; fine-tuning less obvious/methodical
- PetaLinux for Embedded ARM is fragile
  - Many one-off oddities
  - Video I/O requires manual patches of version-dependent known issues in the OS; documentation tends to be version-independent or less granular than is relevant

## **Real-Time I/O Functionality**

- Iterative mechanism to stream frames into CNN kernel on FPGA
- Batching functionality present to consume multiple frames at a time
- Experimenting with interpolation on the host program vs. larger CNN architectures
- Critical part of our real-time system; it works but needs finer tuning for batch-based workloads
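
The streaming and batching behaviour described above could be sketched roughly as follows. This is only an illustration of the control flow: `batch_frames`, `run_kernel`, and `stream` are hypothetical names, and `run_kernel` is a stand-in for whatever call actually hands a batch to the CNN kernel on the FPGA.

```python
from itertools import islice


def batch_frames(frames, batch_size):
    """Yield lists of up to batch_size frames from any iterable of frames."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(frames)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def run_kernel(batch):
    """Placeholder for the FPGA kernel call (assumption): tags each frame."""
    return [f"upscaled({frame})" for frame in batch]


def stream(frames, batch_size=1):
    """Feed frames to the kernel batch by batch and yield results in order."""
    for batch in batch_frames(frames, batch_size):
        yield from run_kernel(batch)


frames = [f"frame{i}" for i in range(5)]
print(list(batch_frames(frames, 2)))
print(list(stream(frames, batch_size=2)))
```

With `batch_size=1` this reduces to the purely iterative, frame-at-a-time mode; larger batches trade added per-frame latency for fewer kernel invocations.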



## **Image Quality**

- SSIM requirement met using SRCNN-Ex implementation
- GPU testing suggested the CNN would be fast enough; however, the timings did not carry over to the FPGA.





## **Latency and Throughput**

- Timing requirements are much more difficult to meet
  - Impossible in some cases
- Iterative host-side frame processing causes latency-bound throughput
- Forced us to make overarching design trade-offs

| Model                              | Single-Frame<br>End-to-End Latency    |
|------------------------------------|---------------------------------------|
| SRCNN-Ex (U96)                     | 115.401s<br>(5.92s theoretical bound) |
| FSRCNN (U96)                       | 2.989s<br>(64ms theoretical bound)    |
| SRCNN-Ex<br>(embedded device avg.) | ~200s                                 |
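
The link between latency and throughput can be checked with a few lines of arithmetic. The sketch below assumes frames are processed strictly one after another with no overlap (no pipelining), so throughput is simply the reciprocal of single-frame latency. The latency figures are the ones in the table above; `sequential_fps` is an illustrative helper name.

```python
REQUIRED_FPS = 30.0

# Single-frame end-to-end latencies in seconds, from the table above.
measured = {"SRCNN-Ex (U96)": 115.401, "FSRCNN (U96)": 2.989}
theoretical_bound = {"SRCNN-Ex (U96)": 5.92, "FSRCNN (U96)": 0.064}


def sequential_fps(latency_s):
    """Throughput when each frame must finish before the next one starts."""
    return 1.0 / latency_s


for model in measured:
    fps_now = sequential_fps(measured[model])
    fps_best = sequential_fps(theoretical_bound[model])
    print(
        f"{model}: measured {fps_now:.4f} FPS, "
        f"bound {fps_best:.2f} FPS, "
        f"meets {REQUIRED_FPS:.0f} FPS at bound: {fps_best >= REQUIRED_FPS}"
    )
```

Under this assumption, even FSRCNN's 64 ms theoretical bound corresponds to roughly 15.6 FPS, which is why purely iterative processing cannot reach 30 FPS without overlapping work across frames.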

# Tradeoff

- Loose coupling of HW and algorithm design led to a "pick two" triangle trade-off:
  1. Fixing the HW and the algorithm leads to missing specs
  2. Fixing the algorithm and the specs requires different HW
  3. Fixing the HW and the specs requires a different algorithm



# **Change in Approach**

#### Main change: Going from model based on <u>SRCNN-Ex</u> to <u>FSRCNN-s</u>.

#### SRCNN-Ex

- Shallow, but operates on the pre-upscaled frame
- More robust: less variance in SSIM
- Highest SSIM out of implementations considered

#### FSRCNN-s

- Deeper, but operates on the native (low-resolution) frame: less data processed, higher throughput
- 2 orders of magnitude fewer computations than SRCNN-Ex
- Decreased SSIM compared to other implementations
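
The "two orders of magnitude" figure can be sanity-checked with a rough multiply-accumulate (MAC) count. The sketch below assumes the layer shapes from the published SRCNN (9-5-5 kernels with 64 and 32 filters) and FSRCNN-s (d=32, s=5, m=1) architectures, a 426×240 input, and a 1920×1080 output; the models actually deployed in this project may differ, and biases and activations are ignored.

```python
def macs_per_pixel(layers):
    """Sum kernel*kernel*in_channels*out_channels over (kernel, in, out) layers."""
    return sum(k * k * c_in * c_out for k, c_in, c_out in layers)


# Assumed layer shapes (kernel size, input channels, output channels).
SRCNN_EX_LAYERS = [(9, 1, 64), (5, 64, 32), (5, 32, 1)]
FSRCNN_S_LAYERS = [
    (5, 1, 32),   # feature extraction
    (1, 32, 5),   # shrinking
    (3, 5, 5),    # mapping (m = 1)
    (1, 5, 32),   # expanding
    (9, 32, 1),   # deconvolution, counted per low-resolution input pixel
]

# Assumed frame sizes for 240p input and 1080p output.
LR_PIXELS = 426 * 240
HR_PIXELS = 1920 * 1080

# SRCNN-Ex runs every layer on the pre-upscaled (output-sized) frame.
srcnn_ex_macs = macs_per_pixel(SRCNN_EX_LAYERS) * HR_PIXELS
# FSRCNN-s runs on the native low-resolution frame.
fsrcnn_s_macs = macs_per_pixel(FSRCNN_S_LAYERS) * LR_PIXELS

ratio = srcnn_ex_macs / fsrcnn_s_macs
print(f"SRCNN-Ex: {srcnn_ex_macs:.3e} MACs/frame")
print(f"FSRCNN-s: {fsrcnn_s_macs:.3e} MACs/frame")
print(f"Ratio:    {ratio:.0f}x")
```

Most of the gap comes from where the network runs: SRCNN-Ex pays its per-pixel cost on roughly twenty times as many pixels, on top of a larger per-pixel cost.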

## **Complete Solution - Specifications Met**

- Scaling Factor and SSIM have never been the problem for our implementations
- Both FPGA implementations (SRCNN-Ex, FSRCNN) failed to meet either timing requirement

| Requirement            | SRCNN-Ex    | FSRCNN    | FSRCNN-s            |
|------------------------|-------------|-----------|---------------------|
| Scaling Factor<br>= 4.5 | 4.5         | 4         | 4                   |
| SSIM<br>> 0.66         | ~0.751      | ~0.734    | ~0.715              |
| Latency<br>< 60 ms     | 115401 ms   | 2989 ms   | 20 ms<br>(ideal)    |
| Throughput<br>≥ 30 FPS | ~0.0087 FPS | ~0.33 FPS | 49.7 FPS<br>(ideal) |
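
The pass/fail reading of the table can be expressed as a small check. This is a sketch using the table's own figures (the FSRCNN-s timings are the "ideal" values, not measurements); `REQUIREMENTS`, `RESULTS`, and `failed_requirements` are illustrative names. The scaling factor is left out because it is a target value rather than a threshold.

```python
# Threshold requirements from the Requirements slide.
REQUIREMENTS = {
    "ssim": lambda v: v > 0.66,
    "latency_ms": lambda v: v < 60,
    "fps": lambda v: v >= 30,
}

# Figures from the table above; FSRCNN-s timings are ideal, not measured.
RESULTS = {
    "SRCNN-Ex": {"ssim": 0.751, "latency_ms": 115401, "fps": 0.0087},
    "FSRCNN": {"ssim": 0.734, "latency_ms": 2989, "fps": 0.33},
    "FSRCNN-s": {"ssim": 0.715, "latency_ms": 20, "fps": 49.7},
}


def failed_requirements(result):
    """Return the names of the requirements this result does not satisfy."""
    return [name for name, ok in REQUIREMENTS.items() if not ok(result[name])]


for model, result in RESULTS.items():
    failures = failed_requirements(result)
    print(f"{model}: {'all met' if not failures else 'fails ' + ', '.join(failures)}")
```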

# **Complete Solution - User Experience**

#### Initialisation:

- Plug in the Ultra96
- Power up the Ultra96
- Connect the Ultra96 to a compatible monitor / display

#### <u>User Flow</u>:

- Plug a USB drive containing a single video file into the Ultra96
- Launch upscaling program
- Upscaled video is displayed on external monitor / display

#### **Revised Schedule**

Gantt chart of tasks by owner (week-by-week columns not recoverable).

| Task                                                 | Owner(s) |
|------------------------------------------------------|----------|
| **Hardware**                                         |          |
| Acquire Ultra96                                      |          |
| Acquire Peripherals                                  |          |
| Research I/O                                         |          |
| Implement I/O                                        | KB + JSG |
| Test I/O                                             | JSG + KB |
| Get Comms between ARM Core and FPGA                  |          |
| Write Math Functions for CNN in Vitis HLS            |          |
| Get full-sized model to synth on the board properly  | JSG      |
| Optimize the sizes                                   | JSG      |
| Speed up w/ HLS pragmas                              | JSG      |
| Validate HW                                          | KB + JSG |
| Port SW model onto FPGA                              |          |
| Validating FPGA model against SW model               | KB       |
| Time Benchmarking                                    | KB       |
| LV                                                   | KB       |
| Screen display                                       | KB + JSG |
| Video routing                                        | KB       |
| **Software**                                         |          |
| Research DSP vs CNN models                           | ALL      |
| Acquire AWS Credits                                  | KB       |
| Setup AWS                                            | KB + JL  |
| Acquire Dataset                                      |          |
| Familiarize VMAF Documentation                       |          |
| Research specific CNN models                         | KB + JL  |
| Benchmark VMAF                                       |          |
| Research SSIM                                        |          |
| Benchmark SSIM                                       | KB + JL  |
| Benchmark CNN Models                                 |          |
| Develop Python Code for Training                     |          |
| Train Model                                          |          |
| Test/Evaluate Model                                  | JL       |
| Further optimizing weights (keeping hyperparameters const so HW model stays const) | JL |
| **Misc**                                             |          |
| Slack/Stop                                           | ALL      |
| **Milestones**                                       |          |
| Proposal Presentation                                |          |
| Design Presentation                                  | KB       |
| Design Review Report                                 | ALL      |
| Interim Demo                                         | ALL      |
| Final Presentation                                   |          |
