motion estimation

A new search algorithm by pixels selection and adjustment for block motion estimation
Motion estimation (ME) is the critical part of video coding. Good ME shorten the coding bit stream by a factor of two or more. However, it also takes a lot of time during the whole coding process. One algorithm, which finds good motion vector quickly, is demanded. Based on the concept of random select some of the pixels to speed up MAD calculation, some methods are developed and compared with current fast algorithms such as Three Step Search (TSS) and Four Step Search (FSS).

Background
The concept of pixel selection can be found in mid-term report, where the relation of number of selected pixels and bit stream size is considered. The clip "steven.yuv" are used as sample. Also it has been found that generally random pixel selection has little difference to regular selection. However, some cases do exist to make random selection much better in coding bit stream size.

Adjustment after ME by pixels selection
After Examining the differences between the motion vectors estimated with full 256 pixels search and 16 pixels selection search, we found that they are distributed as a cross: Most differences are either within 5 steps in x-direction OR y-direction, or only 1 step in both of the two direction. With adjustments, the motion vector can be moved to a better place. In other words, the bit stream can be shortened because better motion estimation with adjustments. To adjust the motion vector, a cross-shaped mask was developed, shown in Figure one.

Figure 1: Mask of neighborhoods for adjustment

After the motion estimation by pixel selection, one block is selected, called potential block. Put the center of the mask (marked with '0' ) at the potential block. The blocks masked with number ( 'T' means '10' ) are among the candidates being considered for MAD distortion, depending on the total number of blocks for adjustment. Those with smaller number have higher priority. For example, only 13 blocks are considered for MAD distortion, then the blocks marked with 0-4 are chosen. MADs are calculated and the one with least distortion is selected for motion compensation. (or potential block for further adjustments)

Three-step ME -- fast scan with coarse and find adjustments
First, the whole blocks are scanned with MADa (We call MAD calculated by n pixels MADn. Here n can be 1~256. MAD256 is the full pixel MAD), a potential block is selected out. Here a is a small number for fast scan. Then put the mask center at the potential block to choose its neighbor blocks for MADb calculation. Here b is larger than a (in fact, better let b*b=a*256) and the number of blocks chosen t1 is optional. A new potential block is selected with least distortion. Last, repeat step 2 with MAD256 and select the block for motion compensation. The number of neighbor blocks chosen t2 is not necessarily the same as t1.

Theoretical calculation cost
If only the MAD calculation dominates (this assumption is proven to be good in the mid-term report), the calculation cost in one block (normalized as two pixel comparison) should be: 31^2*a+t1*b+t2*256. In unit of full MAD search, it is (31^2*a+t1*b+t2*256)/256 MAD/block. If the block is at the boarder of one frame, it should be smaller.

A full search with MAD16 costs 31^31=961MAD/block. TSS costs (9+8+8+8)=33MAD/block. FSS costs 17MAD/block and 50MAD/block in very bad case.

Implementation methods
Method1: This is developed to achieve very good bit rate quality. The algorithm is modified a little: Not all the blocks are scanned in step one, but all of those in center--within 12 steps from the center. This modification is to decrease the time in scan. a=16 and b=80; t1=41 and t2=21. The cost is [(12*2+1)^2*16+41*80+21*256]/256=72MAD/block.

Method2: It is developed to speed up ME further. In step one, not only the blocks far from the center are ignored, some others are also skipped considering their position. Only those have distances that can be divided by 3 in both x-direction and y-direction are chosen. Totally there are (25/3)^2=81 blocks are chosen. Other numbers are a=16 and b=64; t1=49 and t2=13. The cost is [81*16+49*64+13*256]/256=29MAD/block.

Method3: Similar to Method2, but t1=24. The cost is 23MAD/block.

Motion prediction
Motion vectors at the corresponded position of the previous frame can be referenced for motion prediction. Because of this, the fast scan step can be removed. However, if the reference motion vector is bad, no good motion vector can be expected. Therefore, the motion vectors in the first frame of a sequence should be carefully selected.

Implementation methods with motion prediction
Method4: With motion prediction, this method is expected as the best one. For the first frame, 25^2 blocks are scanned with MAD16 (a=16). b=64 and t1=21 for step two and t2=17 for step three. The cost is 56 MAD in this frame. For other frames, step one is removed and motion vector in corresponded position in the previous frame are selected as current potential motion vector; b=64 and t1=49 for step two and t2=5 for step three. The cost is 17 MAD/block.

Method5: It is to compare the effect of our new method with TSS under the condition of motion prediction. The motion vectors of the first frame are searched by the same method mentioned in method4. As to other frames, same potential motion vectors are selected from previous frame. TSS is used from step size equals to 2 to 1. The cost is 16 MAD/block.

Results:
Quantizer is chosen to be 8 for all the following coding process. There are three samples: 'akiyo.yuv', 'mobile.yuv' and 'steven.yuv'. each o f those are 10-frame CIF clip. Theoretical cost MAD/block, real time spent in ME, total MADa, MADb, MAD256 and total cost, cost of MAD/block, bit size of the nine frames and normalized size are listed in tables. For the method with motion estimation, first frame is separated from others for better understanding.

Akiyo.yuv

Full search Three SS Four SS Method1 Method2 Method3

Theoretical MAD/block
961

33

17~50

72

29

23

Real time cost in ME
3170

271

509

945

488

444

MAD16
0

0

0

2050488

246312

246312

MAD64 or MAD80
0

0

0

136459

162159

83642

MAD256
3152115

109116

56450

70798

43691

43694

Total cost in MAD
3152115

109116

56450

241597

145808

79999

MAD/block
884

31

15.8

68

41

22.4

Bit frame size
17902

17902

18833

17908

17910

17910

Normalized size
100%

100%

105.2%

100%

100%

100%

Method4 Method5

Frame First Other eight Total Other Total

Theoretical MAD/block
60

17

21.7

16

20.9

Real time cost in ME
39

129

168

73

112

MAD16
227832

0

227832

0

227832

MAD64 or MAD80
7873

144084

151957

0

7873

MAD256
6365

15203

21568

50083

56448

Total cost in MAD
22573

51224

74121

50083

72656

MAD/block
57

16.2

20.8

15.8

20.4

Bit frame size
6679

11347

18026

11215

17894

Normalized size 100.7% 100%

Mobile.yuv

Full search Three SS Four SS Method1 Method2 Method3

Theoretical MAD/block
961

33

17~50

72

29

23

Real time cost in ME
23457

1328

1067

2579

488

444

MAD16
0

0

0

2050488

246312

246312

MAD64 or MAD80
0

0

0

134061

162159

83642

MAD256
3152115

109161

57911

69439

43691

43694

Total cost in MAD
3152115

109161

57911

231110

145808

79999

MAD/block
884

31

16.2

64.8

41

22.4

Bit frame size
1124247

1146346

1143609

1132475

1205603

1230794

Normalized size
100%

102.0%

101.7%

100.7%

107.2%

109.5%

Method4 Method5

Frame First Other eight Total Other Total

Theoretical MAD/block
60

17

21.7

16

20.9

Real time cost in ME
96

282

378

334

430

MAD16
227832

0

227832

0

227832

MAD64 or MAD80
7645

142292

149937

0

7645

MAD256
6232

14927

21159

50160

56392

Total cost in MAD
22381

50502

72883

50160

72541

MAD/block
57

15.9

20.4

15.8

20.4

Bit frame size
118064

1008064

1126128

1119601

1237665

Normalized size 100.2% 110.1%

Steven.yuv

Full search Three SS Four SS Method1 Method2 Method3

Theoretical MAD/block
961

33

17~50

72

29

23

Real time cost in ME
30502

1595

1532

2975

1512

1318

MAD16
0

0

0

2050488

246312

246312

MAD64 or MAD80
0

0

0

136994

164010

84457

MAD256
3152115

110277

79887

71214

44063

44131

Total cost in MAD
3152115

110277

79887

233618

100460

80640

MAD/block
884

31

22.4

65.5

28.2

22.6

Bit frame size
568326

616772

844312

577952

647299

658789

Normalized size
100%

108.5%

148.6%

101.7%

113.9%

115.9%

Method4 Method5

Frame First Other eight Total Other Total

Theoretical MAD/block
60

17

21.7

16

20.9

Time used
114

313

427

370

484

MAD in 16
227832

0

227832

0

227832

MAD in 64
7884

145339

153283

0

7884

MAD in 256
6411

15309

21720

50822

57233

Total MAD
22622

51644

74280

50822

73444

MAD/block
57

16.3

20.8

16.0

20.6

size
62038

523186

585224

604749

666787

Normalized size 103.0% 117.3%

Bit stream performance and coding time cost are also ploted out.

Method1 is proven to be a very good method in ME. Compared to the full search algorithm, it saved more than 90% time. However, the bit stream is only about 1% longer. When the speed is not a critical problem, it is a recommended method. Method4 is another good one because its speed. For long sequence, it is not slower than TSS or FSS and the bit stream is shorter. Compared with method4 and method5, the adjustment by pixel selection is better than the adjustment from TSS.

Option I: the neighbor mask
Another mask is made as shown in Figure two. Some evaluation is made (result will not be shown here but can be obtained by running the source code). It is found this mask is not better than the one we mentioned before.

Figure two: Another neighbor mask

Option II: The rate between the number of MAD64 and MAD256
One MAD256 costs about the same time as four MAD64. With motion prediction (and ME for the first frame mentioned before), there are several options to choose to keep the MAD/block the same. For example, we can use no MAD64 but 17 instead and the MAD cost is same. The following is a comparison of bit stream among the methods with same cost in other frames. Except method5, others are related to the cross-shaped mask in Figure one.

Method5 Method4

s teven # MAD by 64 pixels 0 (step 3) 0 16 32 48

# MAD by all pixels 16 (step 3) 16 12 8 4

MAD in 64
7884

7884

59231

106582

153283

MAD in 256
57233

57805

45672

33529

21720

Total MAD
73444

72168

74720

74414

74280

MAD/block
20.6

20.2

21.0

20.9

20.8

Bit stream size
666787

609013

596060

586886

585224

Percent
117.3%

115.3%

104.9%

103.3%

103.0%

M Bit stream size
1237665

1135993

1133209

1130357

1126128

Percent
110.1%

101.0

100.8%

100.5%

100.2%

A Bit stream size
17941

18035

17894

17902

18026

Percent
102.2%

100.7%

100%

100%

100.7%

'M' represents the clip mobile.yuv and 'A' represents the clip akiyo.yuv

From the first two columns, cross-shaped mask is better than TSS for adjustment, especially for simple clips. Also we found more MAD64 do help to shorten the bit stream. It proves pixel selection is a very good algorithm in adjustment.

Option III: random pixel selection
As stated in mid-term report, the pixels are selection by a pattern. If many pixels are selected, the coding results may have little relation with the pattern. However, if the number of pixels selection is small, the pattern may affect the bit stream size. Here three patterns of 16 pixels are shown in Figure 3:

Figure 3: Three pattern for pixel selection The left pattern is the one used above and also appeared in the mid-term report. The center one looks more unbalanced. is used for comparison. Both of them are constructed from computer program. The right pattern is made by hand and looks more balanced.

	Left pattern	Center pattern	Right pattern
	Total bit size	Total bit size	Total bit size	percent
akiyo.yuv	18026	18066	18180	101.6%
mobile.yuv	1126128	1134329	1120830	99.7%*
Steven.yuv	585224	592520	584403	101.6%

*although the bit size made from full search motion vectors is not necessarily be the shortest, it is still a surprise to see the bit stream size made from the right pattern are shorter. Further investigation may needed in the following research.

Through the result, we concluded that more balanced pixel distribution pattern expects shorter coding bit stream size. However, to find out the best pattern for many more samples needs more experiments and analysis.

Conclusion:
A new algorithm for ME is developed. Some implementations are developed and compared with existing algorithms. This new algorithm really shows its advantages in time saving and bit rate quality obtaining. Furthermore, it is flexible. Through this concept, different methods can be developed for different sequence cases and different requirements.

Appendix:
source code
syncfile: gray scale, color scale

	Full search	Three SS	Four SS	Method1	Method2	Method3
Theoretical MAD/block	961	33	17~50	72	29	23
Real time cost in ME	3170	271	509	945	488	444
MAD16	0	0	0	2050488	246312	246312
MAD64 or MAD80	0	0	0	136459	162159	83642
MAD256	3152115	109116	56450	70798	43691	43694
Total cost in MAD	3152115	109116	56450	241597	145808	79999
MAD/block	884	31	15.8	68	41	22.4
Bit frame size	17902	17902	18833	17908	17910	17910
Normalized size	100%	100%	105.2%	100%	100%	100%

		Method4		Method5
Frame	First	Other eight	Total	Other	Total
Theoretical MAD/block	60	17	21.7	16	20.9
Real time cost in ME	39	129	168	73	112
MAD16	227832	0	227832	0	227832
MAD64 or MAD80	7873	144084	151957	0	7873
MAD256	6365	15203	21568	50083	56448
Total cost in MAD	22573	51224	74121	50083	72656
MAD/block	57	16.2	20.8	15.8	20.4
Bit frame size	6679	11347	18026	11215	17894
Normalized size		100.7%		100%

		Method5				Method4
s teven	# MAD by 64 pixels	0 (step 3)	0	16	32	48
	# MAD by all pixels	16 (step 3)	16	12	8	4
	MAD in 64	7884	7884	59231	106582	153283
	MAD in 256	57233	57805	45672	33529	21720
	Total MAD	73444	72168	74720	74414	74280
	MAD/block	20.6	20.2	21.0	20.9	20.8
	Bit stream size	666787	609013	596060	586886	585224
	Percent	117.3%	115.3%	104.9%	103.3%	103.0%
M	Bit stream size	1237665	1135993	1133209	1130357	1126128
M	Percent	110.1%	101.0	100.8%	100.5%	100.2%
A	Bit stream size	17941	18035	17894	17902	18026
A	Percent	102.2%	100.7%	100%	100%	100.7%