Midterm report for 18-796 project: fast motion estimation

18-796 Multimedia communication and networking Project: Fast motion estimation based on pixel skipping in MAD calculation Midterm report background:
In video coding, full search in motion estimation take a long time. Generally speaking, to obtain a motion vector in the range of [-15,15], 512*31*31=492032 times plus and 256*31*31+31*31=246977 times comparison is needed. Although there is some method to improve it, but to find the best motion vector is always time wasting.

One solution is in stead of finding the best motion vector, searching for a good motion vector. It is possible to skip some pixels during the MAD value calculation. In this project, this method will be evaluated in the following aspects:
1. How much time is saved by skipping a number of pixels?
2. How different between the motion vectors with/without pixel skipping?
3. How long the bit stream (without header) will be?
...

Work finished
In the midterm report, two ways are selected to fulfill the method.

In the first method, only the pixels at (x*step,y*step) is compared, here x and y are integers and step is a preselected number. step=1 means all the pixels are concerned. In step 2, only 1/4 are concerned,... When step=16, only 1 pixel in one MOB is concerned. the following picture shows the case step=3. (black is the pixel concerned)

To evaluate this method, a program is made. In this program, a source file steven.yuv is chosen. Time spent in motion estimation and final bit stream size is calculated.

The following table shows the result of this method. Column 1 shows the step size chosen. Column 2 is the time spend during motion estimation. The latter 3 column indicate the bit stream size (the number of bits) for the sample with different quantizer.

# steps / # pixels ME time/rate total bits (quant=2) total bits (quant=4) total bits (quant=8)

1/256 25,201/100.00% 1,981,961/100.0% 1,104,193 /100.0% 570,448/100.0%

2/ 64 6,708/ 26.62% 2,015,764/101.7% 1,131,707/102.5% 588,983/103.2%

3/ 36 3,966/ 15.73% 2,030,387/102.4% 1,137,925/103.1% 591,917/103.8%

4/ 16 1,787/ 7.09% 2,152,607/108.6% 1,226,859/111.1% 652,514/114.4%

5/ 16 1,793/ 7.11% 2,127,913/107.4% 1,205,008/109.1% 635,777/111.5%

6/   9 1,085/ 4.31% 2,315,308/116.8% 1,342,507/121.6% 725,770/127.2%

7/   9 1,081/ 4.29% 2,307,809/116.4% 1,336,227/121.0% 720,879/126.4%

8/   4    598/ 2.37% 3,046,340/153.7% 1,893,120/171.4% 1,094,247/191.8%

9/   4    595/ 2.36% 3,059,645/154.4% 1,900,877/172.2% 1,097,841/192.5%

10/ 4    600/ 2.38% 3,027,932/152.8% 1,878,703/170.1% 1,079,756/189.3%

15/ 4    604/ 2.40% 3,127,448/157.8% 1,947,291/176.4% 1,122,828/196.8%

16/ 1    433/ 1.72% 3,737,555/188.6% 2,424,257/219.6% 1,451,751/254.5%

no ME /0        0/ 0.00% 3,600,990/181.7% 2,310,801/209.3% 1,354,662/237.5%

In the above plot, the time saved is proportional to the pixel skipped. The following plot shows the bit sizes of different quantizer.

The results shows that decreasing pixels in evaluation can save a lot of time without increase bits too much. However, if the frame is of thin texture and each frame moves at the rate of texture, using regular pixel skipping may not help to find a good motion vector. a synthetic video code is made to prove it.

To avoid this case happen, Another ways to select random pixels in MAD calculation. To do it, first assign numbers 0~255 randomly to the 16X16 pixels in a Mblock like the following way:

   82    123    105    144    115      15    103    193    131    198    122      90    157     84       14    173
   59    159    172    177      26      86    134    130      40      53      29      95      12    108    165      63
   77    202    162      28      39    168      80      51    152    215    182      64    179    250      61    232
135        4      32    241      83      35      76    238    151    206      56    210    221    133    197      20
192    201    178    205    109        2    248      97      45    237      25        6    246    219    243      70
126    114      22    254      37      78    253    174      21    240    153    120        5    117    255    213
   16    194    112      18    170    110      75        1      88      23      55    154    113    132      93    167
125      99    185        3      65      17      71    233    118    160      66    214    147      31      52    203
195    220    138      46    186      27      33      19    223    236    128    251      60        7      72    136
   30    140    104      94    156    164    189    100    247    208      67        8    231    252    217      92
143    211    204    142    180      48    224    242    183    188    129      91    218    187    169    249
   96    101      44    191      34    155    234    121      50    229    228    230      49      47    116      13
149      74    124      73    244    200      57      69    137      85      24    176    127      98    207      81
184    111      89    171    107      42      79      11    163      38    196    102    216    227    119        9
     0    239    199    146    222    212    235      10      58    148      43    161    158    175    209      41
141    166    181    190      87    139    245      62      36      54    226    145    225    150    106      68

Then the bits can be selected as the following way: suppose p pixels should be selected, we select the pixels with number less than p. For example, let p=16. The following figure marks the pixels selected black:

It is totally random. Using the synthetic video, the different of bit size by regular pixels selecting and random one is shown in the following: (quant=2)
regular: 4,045,852
random: 482,988
The bits saving represents good motion vectors are found under the texture condition.

Still using steven.yuv, selecting different number of pixels, we have the following table:

# pixels (p) ME time/rate total bits (quant=2) total bits (quant=4) total bits (quant=8)

256 151,850/100.0% 1,981,961/100.0% 1,104,193/100.0% 570,448/100.0%

224 133,060/ 87.6% 1,983,777/100.1% 1,105,016/100.07% 571,748/100.2%

192 115,310/ 75.9% 1,982,530/100.03% 1,104,705/100.05% 570,986/100.1%

160 95,920/ 63.2% 1,982,926/100.05% 1,105,834/100.15% 572,689/100.4%

128 76,800/ 50.6% 1,991,221/100.5% 1,109,013/100.4% 573,701/100.6%

96 56,720/ 37.4% 1,994,570/100.6% 1,112,309/100.7% 576,097/101.0%

64 38,420/ 25.3% 2,005,371/101.2% 1,121,133/101.5% 581,331/101.9%

32 19,370/ 12.8% 2,053,705/103.6% 1,157,016/104.8% 605,711/106.2%

16 10,440/ 6.9% 2,117,932/106.9% 1,201,757/108.8% 633,484/111.1%

We can also show the time and bits in plots:

Random selection does not shows advantages in sample steven.yuv may because in this sample the texture case does not exist. Several data is tried in motion.yuv, and the results is similar. More samples is needed to compare the two ways.

Conclusion:
Neglect a part of pixels in MAD calculation DO help to accelerate the coding process in motion estimation. The time spent is almost proportional to the number of pixels considered. With no less than 25% pixels considered, the total bit size enlarges no more than 2%.
The two ways to fulfill the method (regular and random selection) works similar in many cases. However, here are some cases that random selecting takes advantages.
Random selection can change the number of pixels one by one, so it is more flexible than the regular selection.

Further works:
In the following time the work will be focused in the following (depending on time, may not both of them can be finished)
1. From the value of mad to judge in which case a full pixel consideration is necessary (then redo it to find better motion vector) and in which case INTRA should be used.
2. Considering the motion vectors already know, search motion vectors in a smaller region.
A faster and accurate motion estimation program is expected to be shown at the final.

# steps / # pixels	ME time/rate	total bits (quant=2)	total bits (quant=4)	total bits (quant=8)
1/256	25,201/100.00%	1,981,961/100.0%	1,104,193 /100.0%	570,448/100.0%
2/ 64	6,708/ 26.62%	2,015,764/101.7%	1,131,707/102.5%	588,983/103.2%
3/ 36	3,966/ 15.73%	2,030,387/102.4%	1,137,925/103.1%	591,917/103.8%
4/ 16	1,787/ 7.09%	2,152,607/108.6%	1,226,859/111.1%	652,514/114.4%
5/ 16	1,793/ 7.11%	2,127,913/107.4%	1,205,008/109.1%	635,777/111.5%
6/ 9	1,085/ 4.31%	2,315,308/116.8%	1,342,507/121.6%	725,770/127.2%
7/ 9	1,081/ 4.29%	2,307,809/116.4%	1,336,227/121.0%	720,879/126.4%
8/ 4	598/ 2.37%	3,046,340/153.7%	1,893,120/171.4%	1,094,247/191.8%
9/ 4	595/ 2.36%	3,059,645/154.4%	1,900,877/172.2%	1,097,841/192.5%
10/ 4	600/ 2.38%	3,027,932/152.8%	1,878,703/170.1%	1,079,756/189.3%
15/ 4	604/ 2.40%	3,127,448/157.8%	1,947,291/176.4%	1,122,828/196.8%
16/ 1	433/ 1.72%	3,737,555/188.6%	2,424,257/219.6%	1,451,751/254.5%
no ME /0	0/ 0.00%	3,600,990/181.7%	2,310,801/209.3%	1,354,662/237.5%

# pixels (p)	ME time/rate	total bits (quant=2)	total bits (quant=4)	total bits (quant=8)
256	151,850/100.0%	1,981,961/100.0%	1,104,193/100.0%	570,448/100.0%
224	133,060/ 87.6%	1,983,777/100.1%	1,105,016/100.07%	571,748/100.2%
192	115,310/ 75.9%	1,982,530/100.03%	1,104,705/100.05%	570,986/100.1%
160	95,920/ 63.2%	1,982,926/100.05%	1,105,834/100.15%	572,689/100.4%
128	76,800/ 50.6%	1,991,221/100.5%	1,109,013/100.4%	573,701/100.6%
96	56,720/ 37.4%	1,994,570/100.6%	1,112,309/100.7%	576,097/101.0%
64	38,420/ 25.3%	2,005,371/101.2%	1,121,133/101.5%	581,331/101.9%
32	19,370/ 12.8%	2,053,705/103.6%	1,157,016/104.8%	605,711/106.2%
16	10,440/ 6.9%	2,117,932/106.9%	1,201,757/108.8%	633,484/111.1%