Background
The concept of pixel selection can be found in mid-term report, where
the relation of number of selected pixels and bit stream size is considered.
The clip "steven.yuv" are used as sample. Also it has been found that generally
random pixel selection has little difference to regular selection. However,
some cases do exist to make random selection much better in coding bit
stream size.
Adjustment after ME by pixels selection
After Examining the differences between the motion vectors estimated
with full 256 pixels search and 16 pixels selection search, we found that
they are distributed as a cross: Most differences are either within 5 steps
in x-direction OR y-direction, or only 1 step in both of the two direction.
With adjustments, the motion vector can be moved to a better place. In
other words, the bit stream can be shortened because better motion estimation
with adjustments. To adjust the motion vector, a cross-shaped mask was
developed, shown in Figure one.
After the motion estimation by pixel selection, one block is selected, called potential block. Put the center of the mask (marked with '0' ) at the potential block. The blocks masked with number ( 'T' means '10' ) are among the candidates being considered for MAD distortion, depending on the total number of blocks for adjustment. Those with smaller number have higher priority. For example, only 13 blocks are considered for MAD distortion, then the blocks marked with 0-4 are chosen. MADs are calculated and the one with least distortion is selected for motion compensation. (or potential block for further adjustments)
Three-step ME -- fast scan with coarse and find adjustments
First, the whole blocks are scanned with MADa (We call MAD calculated
by n pixels MADn. Here n can be 1~256. MAD256 is the
full pixel MAD), a potential block is selected out. Here a is a
small number for fast scan. Then put the mask center at the potential block
to choose its neighbor blocks for MADb calculation. Here b
is larger than a (in fact, better let b*b=a*256)
and the number of blocks chosen t1 is optional. A new potential
block is selected with least distortion. Last, repeat step 2 with MAD256
and select the block for motion compensation. The number of neighbor blocks
chosen t2 is not necessarily the same as t1.
Theoretical calculation cost
If only the MAD calculation dominates (this assumption is proven to
be good in the mid-term report), the calculation cost in one block (normalized
as two pixel comparison) should be: 31^2*a+t1*b+t2*256.
In unit of full MAD search, it is (31^2*a+t1*b+t2*256)/256
MAD/block. If the block is at the boarder of one frame, it should be smaller.
A full search with MAD16 costs 31^31=961MAD/block. TSS costs (9+8+8+8)=33MAD/block. FSS costs 17MAD/block and 50MAD/block in very bad case.
Implementation methods
Method1: This is developed to achieve very good bit rate quality. The
algorithm is modified a little: Not all the blocks are scanned in step
one, but all of those in center--within 12 steps from the center. This
modification is to decrease the time in scan. a=16 and b=80;
t1=41 and t2=21. The cost is [(12*2+1)^2*16+41*80+21*256]/256=72MAD/block.
Method2: It is developed to speed up ME further. In step one, not only the blocks far from the center are ignored, some others are also skipped considering their position. Only those have distances that can be divided by 3 in both x-direction and y-direction are chosen. Totally there are (25/3)^2=81 blocks are chosen. Other numbers are a=16 and b=64; t1=49 and t2=13. The cost is [81*16+49*64+13*256]/256=29MAD/block.
Method3: Similar to Method2, but t1=24. The cost is 23MAD/block.
Motion prediction
Motion vectors at the corresponded position of the previous frame can
be referenced for motion prediction. Because of this, the fast scan step
can be removed. However, if the reference motion vector is bad, no good
motion vector can be expected. Therefore, the motion vectors in the first
frame of a sequence should be carefully selected.
Implementation methods with motion prediction
Method4: With motion prediction, this method is expected as the best
one. For the first frame, 25^2 blocks are scanned with MAD16 (a=16).
b=64 and t1=21 for step two and t2=17 for step three.
The cost is 56 MAD in this frame. For other frames, step one is removed
and motion vector in corresponded position in the previous frame are selected
as current potential motion vector; b=64 and t1=49 for step
two and t2=5 for step three. The cost is 17 MAD/block.
Method5: It is to compare the effect of our new method with TSS under the condition of motion prediction. The motion vectors of the first frame are searched by the same method mentioned in method4. As to other frames, same potential motion vectors are selected from previous frame. TSS is used from step size equals to 2 to 1. The cost is 16 MAD/block.
Results:
Quantizer is chosen to be 8 for all the following coding process. There
are three samples: 'akiyo.yuv', 'mobile.yuv' and 'steven.yuv'. each o f
those are 10-frame CIF clip. Theoretical cost MAD/block, real time spent
in ME, total MADa, MADb, MAD256 and total cost, cost of MAD/block,
bit size of the nine frames and normalized size are listed in tables. For
the method with motion estimation, first frame is separated from others
for better understanding.
Akiyo.yuv
Full search | Three SS | Four SS | Method1 | Method2 | Method3 | |
Theoretical MAD/block |
961
|
33
|
17~50
|
72
|
29
|
23
|
Real time cost in ME |
3170
|
271
|
509
|
945
|
488
|
444
|
MAD16 |
0
|
0
|
0
|
2050488
|
246312
|
246312
|
MAD64 or MAD80 |
0
|
0
|
0
|
136459
|
162159
|
83642
|
MAD256 |
3152115
|
109116
|
56450
|
70798
|
43691
|
43694
|
Total cost in MAD |
3152115
|
109116
|
56450
|
241597
|
145808
|
79999
|
MAD/block |
884
|
31
|
15.8
|
68
|
41
|
22.4
|
Bit frame size |
17902
|
17902
|
18833
|
17908
|
17910
|
17910
|
Normalized size |
100%
|
100%
|
105.2%
|
100%
|
100%
|
100%
|
|
|
||||
Frame |
|
|
|
|
|
Theoretical MAD/block |
60
|
17
|
21.7
|
16
|
20.9
|
Real time cost in ME |
39
|
129
|
168
|
73
|
112
|
MAD16 |
227832
|
0
|
227832
|
0
|
227832
|
MAD64 or MAD80 |
7873
|
144084
|
151957
|
0
|
7873
|
MAD256 |
6365
|
15203
|
21568
|
50083
|
56448
|
Total cost in MAD |
22573
|
51224
|
74121
|
50083
|
72656
|
MAD/block |
57
|
16.2
|
20.8
|
15.8
|
20.4
|
Bit frame size |
6679
|
11347
|
18026
|
11215
|
17894
|
Normalized size |
|
|
Mobile.yuv
Full search | Three SS | Four SS | Method1 | Method2 | Method3 | |
Theoretical MAD/block |
961
|
33
|
17~50
|
72
|
29
|
23
|
Real time cost in ME |
23457
|
1328
|
1067
|
2579
|
488
|
444
|
MAD16 |
0
|
0
|
0
|
2050488
|
246312
|
246312
|
MAD64 or MAD80 |
0
|
0
|
0
|
134061
|
162159
|
83642
|
MAD256 |
3152115
|
109161
|
57911
|
69439
|
43691
|
43694
|
Total cost in MAD |
3152115
|
109161
|
57911
|
231110
|
145808
|
79999
|
MAD/block |
884
|
31
|
16.2
|
64.8
|
41
|
22.4
|
Bit frame size |
1124247
|
1146346
|
1143609
|
1132475
|
1205603
|
1230794
|
Normalized size |
100%
|
102.0%
|
101.7%
|
100.7%
|
107.2%
|
109.5%
|
|
|
||||
Frame |
|
|
|
|
|
Theoretical MAD/block |
60
|
17
|
21.7
|
16
|
20.9
|
Real time cost in ME |
96
|
282
|
378
|
334
|
430
|
MAD16 |
227832
|
0
|
227832
|
0
|
227832
|
MAD64 or MAD80 |
7645
|
142292
|
149937
|
0
|
7645
|
MAD256 |
6232
|
14927
|
21159
|
50160
|
56392
|
Total cost in MAD |
22381
|
50502
|
72883
|
50160
|
72541
|
MAD/block |
57
|
15.9
|
20.4
|
15.8
|
20.4
|
Bit frame size |
118064
|
1008064
|
1126128
|
1119601
|
1237665
|
Normalized size |
|
|
Steven.yuv
Full search | Three SS | Four SS | Method1 | Method2 | Method3 | |
Theoretical MAD/block |
961
|
33
|
17~50
|
72
|
29
|
23
|
Real time cost in ME |
30502
|
1595
|
1532
|
2975
|
1512
|
1318
|
MAD16 |
0
|
0
|
0
|
2050488
|
246312
|
246312
|
MAD64 or MAD80 |
0
|
0
|
0
|
136994
|
164010
|
84457
|
MAD256 |
3152115
|
110277
|
79887
|
71214
|
44063
|
44131
|
Total cost in MAD |
3152115
|
110277
|
79887
|
233618
|
100460
|
80640
|
MAD/block |
884
|
31
|
22.4
|
65.5
|
28.2
|
22.6
|
Bit frame size |
568326
|
616772
|
844312
|
577952
|
647299
|
658789
|
Normalized size |
100%
|
108.5%
|
148.6%
|
101.7%
|
113.9%
|
115.9%
|
|
|
||||
Frame |
|
|
|
|
|
Theoretical MAD/block |
60
|
17
|
21.7
|
16
|
20.9
|
Time used |
114
|
313
|
427
|
370
|
484
|
MAD in 16 |
227832
|
0
|
227832
|
0
|
227832
|
MAD in 64 |
7884
|
145339
|
153283
|
0
|
7884
|
MAD in 256 |
6411
|
15309
|
21720
|
50822
|
57233
|
Total MAD |
22622
|
51644
|
74280
|
50822
|
73444
|
MAD/block |
57
|
16.3
|
20.8
|
16.0
|
20.6
|
size |
62038
|
523186
|
585224
|
604749
|
666787
|
Normalized size |
|
|
Method1 is proven to be a very good method in ME. Compared to the full search algorithm, it saved more than 90% time. However, the bit stream is only about 1% longer. When the speed is not a critical problem, it is a recommended method. Method4 is another good one because its speed. For long sequence, it is not slower than TSS or FSS and the bit stream is shorter. Compared with method4 and method5, the adjustment by pixel selection is better than the adjustment from TSS.
Option I: the neighbor mask
Another mask is made as shown in Figure two.
Some evaluation is made (result will not be shown here but can be obtained
by running the source code). It is found this mask is not better than the
one we mentioned before.
Option II: The rate between the number of MAD64 and MAD256
One MAD256 costs about the same time as four MAD64. With motion prediction
(and ME for the first frame mentioned before), there are several options
to choose to keep the MAD/block the same. For example, we can use no MAD64
but 17 instead and the MAD cost is same. The following is a comparison
of bit stream among the methods with same cost in other frames. Except
method5, others are related to the cross-shaped mask in Figure
one.
|
|
|||||
s teven | # MAD by 64 pixels |
|
|
|
|
|
# MAD by all pixels |
|
|
|
|
|
|
MAD in 64 |
7884
|
7884
|
59231
|
106582
|
153283
|
|
MAD in 256 |
57233
|
57805
|
45672
|
33529
|
21720
|
|
Total MAD |
73444
|
72168
|
74720
|
74414
|
74280
|
|
MAD/block |
20.6
|
20.2
|
21.0
|
20.9
|
20.8
|
|
Bit stream size |
666787
|
609013
|
596060
|
586886
|
585224
|
|
Percent |
117.3%
|
115.3%
|
104.9%
|
103.3%
|
103.0%
|
|
M | Bit stream size |
1237665
|
1135993
|
1133209
|
1130357
|
1126128
|
Percent |
110.1%
|
101.0
|
100.8%
|
100.5%
|
100.2%
|
|
A | Bit stream size |
17941
|
18035
|
17894
|
17902
|
18026
|
Percent |
102.2%
|
100.7%
|
100%
|
100%
|
100.7%
|
From the first two columns, cross-shaped mask is better than TSS for adjustment, especially for simple clips. Also we found more MAD64 do help to shorten the bit stream. It proves pixel selection is a very good algorithm in adjustment.
Option III: random pixel selection
As stated in mid-term report, the pixels are selection by a pattern.
If many pixels are selected, the coding results may have little relation
with the pattern. However, if the number of pixels selection is small,
the pattern may affect the bit stream size. Here three patterns of 16 pixels
are shown in Figure 3:
|
|
|
||
|
|
|
|
|
akiyo.yuv |
18026
|
18066
|
18180
|
101.6%
|
mobile.yuv |
1126128
|
1134329
|
1120830
|
99.7%*
|
Steven.yuv |
585224
|
592520
|
584403
|
101.6%
|
Through the result, we concluded that more balanced pixel distribution pattern expects shorter coding bit stream size. However, to find out the best pattern for many more samples needs more experiments and analysis.
Conclusion:
A new algorithm for ME is developed. Some implementations are developed
and compared with existing algorithms. This new algorithm really shows
its advantages in time saving and bit rate quality obtaining. Furthermore,
it is flexible. Through this concept, different methods can be developed
for different sequence cases and different requirements.
Appendix:
source code
syncfile: gray scale, color
scale