18-796 project end report

Shape coding for MPEG-4






For the second part of our project, we were concerned with the inter-coding of shape information. To do so, we started from the context-based shape intra-codec we built for the first part and added motion estimation/compensation as well as context-based inter-coding.
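As a rough illustration, the motion estimation for the binary alpha plane can be done with simple block matching. The following C sketch is only illustrative: the helper names (bab_sad, bab_motion_search), the exhaustive search strategy and the handling of out-of-frame pixels are assumptions and do not necessarily correspond to our actual implementation; only the 16x16 binary alpha block (BAB) size is taken from MPEG-4.

#define BAB_SIZE 16   /* MPEG-4 codes binary shape in 16x16 binary alpha blocks */

/* Number of differing pixels between the current BAB at (bx, by) and the
 * reference (previous) alpha plane displaced by (dx, dy).  For binary
 * data this "SAD" is simply a count between 0 and 256. */
int bab_sad(const unsigned char *cur, const unsigned char *ref,
            int width, int height, int bx, int by, int dx, int dy)
{
    int x, y, sad = 0;
    for (y = 0; y < BAB_SIZE; y++) {
        for (x = 0; x < BAB_SIZE; x++) {
            int rx = bx + x + dx, ry = by + y + dy;
            int c = cur[(by + y) * width + (bx + x)] ? 1 : 0;
            int r = (rx >= 0 && rx < width && ry >= 0 && ry < height)
                    ? (ref[ry * width + rx] ? 1 : 0)
                    : 0;   /* pixels outside the frame count as transparent */
            sad += (c != r);
        }
    }
    return sad;
}

/* Exhaustive search in a +/- 'range' window around the zero vector;
 * the best displacement is returned through *best_dx / *best_dy. */
int bab_motion_search(const unsigned char *cur, const unsigned char *ref,
                      int width, int height, int bx, int by, int range,
                      int *best_dx, int *best_dy)
{
    int dx, dy, best = BAB_SIZE * BAB_SIZE + 1;
    *best_dx = 0;
    *best_dy = 0;
    for (dy = -range; dy <= range; dy++) {
        for (dx = -range; dx <= range; dx++) {
            int sad = bab_sad(cur, ref, width, height, bx, by, dx, dy);
            if (sad < best) {
                best = sad;
                *best_dx = dx;
                *best_dy = dy;
            }
        }
    }
    return best;
}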
 
 

Encoder:


The decoder mirrors the functionality of the encoder.

The syntax of the encoded bitstream is as follows:






In order to (try to) improve the compression ratio or the speed of the codec, we implemented two independent control parameters:

Motion threshold    Compression time    Compression ratio
0                   78 s (100%)         46
64                  45 s (60%)          35
128                 39 s (50%)          31

 

        
[Images: reconstructed shape for Threshold=0, Threshold=128 and Threshold=256]

We measured the following values:

Alpha threshold    Compression ratio (children.seg)    Compression ratio (stefan.seg)
0                  46.4                                59.7
16                 46.3                                59.6
32                 46.1                                59.7 (!)
64                 46.0                                58.9
128                45.2                                56.2
256                29.5                                22.1
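The alpha threshold is our lossy-coding control. The following sketch illustrates one plausible use of it, namely flattening a binary alpha block to all-black or all-white whenever that changes no more pixels than the threshold allows; the function name and the exact acceptance rule are assumptions and may differ from our actual codec.

#define BAB_PIXELS (16 * 16)   /* pixels in one 16x16 binary alpha block */

/* Lossy simplification of one BAB before coding: if replacing the block
 * by an all-transparent (black) or all-opaque (white) block changes at
 * most alpha_threshold pixels, code the flat block instead. */
void lossy_simplify_bab(const unsigned char *in, unsigned char *out,
                        int alpha_threshold)
{
    int i, opaque = 0;

    for (i = 0; i < BAB_PIXELS; i++)
        opaque += (in[i] != 0);

    if (opaque <= alpha_threshold) {
        for (i = 0; i < BAB_PIXELS; i++)
            out[i] = 0;            /* flatten to all-black (transparent) */
    } else if (BAB_PIXELS - opaque <= alpha_threshold) {
        for (i = 0; i < BAB_PIXELS; i++)
            out[i] = 255;          /* flatten to all-white (opaque) */
    } else {
        for (i = 0; i < BAB_PIXELS; i++)
            out[i] = in[i];        /* keep the block, code it losslessly */
    }
}

With a threshold as high as 256 (the full block size) this particular rule would flatten every block, so the real codec presumably applies a less drastic criterion; the sketch only shows the principle behind the sudden all-black or all-white blocks discussed below.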


 

We notice two facts. First, the compression ratio is not much better than that of a succession of intra-coded frames (which would give a compression ratio of about 35). This is because we always send all the motion vectors, even when, as in the children sequence, there is only little movement. MPEG-4, however, clearly specifies NOT to send the motion vectors in such cases. We did not implement this decision because it requires a more sophisticated syntax.

But we also see that the compression ratio gets worse with lossy compression! We are not certain about the reason, but it could be that the sudden creation of all-black or all-white blocks between two frames results in a big "residue" (a suddenly black block means a big change between two frames). It is less probable that the MPEG group just wants to show us how nasty lossy coding is.
 
 

Pasting:

The use of shape coding can be demonstrated with the little "paste.c" program. It takes a shape, the corresponding video file and a background file, and then shows the background where the shape is black and the texture where the shape is white. Compositing can therefore be done without any chroma keying if we have the shape information.
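A minimal sketch of the per-pixel compositing that paste.c performs (the real program additionally handles file I/O and frame looping; the function name and signature here are illustrative):

/* Composite one frame: take the background where the shape is black
 * (transparent) and the foreground texture where it is white (opaque). */
void paste_frame(const unsigned char *shape,      /* decoded binary alpha plane */
                 const unsigned char *texture,    /* foreground video frame     */
                 const unsigned char *background, /* background image           */
                 unsigned char *out,
                 int width, int height)
{
    int i, n = width * height;
    for (i = 0; i < n; i++)
        out[i] = shape[i] ? texture[i] : background[i];
}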

And the chain coder?

For the midterm part, we implemented an intra chain coder. While we did not extend it to inter-coding, this could be done with some kind of "vector tracking", where the movement of the individual vectors would be encoded in an efficient manner. However, this implies pattern-recognition as well as syntax issues that are far beyond the scope of our project (and of MPEG-4).
 
 

Conclusion: further steps

While this project gave us good insight into the methods used by the MPEG group to achieve more flexibility and higher efficiency, the compression ratios of our implementation could probably be improved by supporting all the standard modes (which would, however, require a more complex syntax). It could also be of interest to try out other methods of lossy compression and to look for speed/efficiency trade-offs based on our control parameters. Different sequences also give different results. But despite these limitations, the project provided us with a good first step into coding technology!
 
 

Source code (.zip):

C-code
Test files
Utilities
 

References:
 

Chain coding: