How Video Compression Works

August 23, 2018


Have you ever thought about how video streaming is possible? Let's think about how big a typical 1080p video is: 1920x1080 pixels, 24 bits each, 30 frames per second... That's almost 1.5 gigabits per second of raw data. How can you transmit that much data, over the air, in real time? The answer is video compression.
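If you want to check that number yourself, here is the arithmetic as a tiny Python sketch. The function name `raw_bitrate_bps` is just something I made up for illustration.

```python
def raw_bitrate_bps(width, height, bits_per_pixel, fps):
    """Bits per second needed for uncompressed video."""
    return width * height * bits_per_pixel * fps

bitrate = raw_bitrate_bps(1920, 1080, 24, 30)
print(f"{bitrate / 1e9:.2f} Gbit/s")  # 1.49 Gbit/s
```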

You might have heard of codecs. A codec is a piece of software that encodes and decodes data. The encoding part compresses the data, making it easier to store and transmit. And the decoding part reverses this process, recreating the original data as closely as possible. Use of codecs is not limited to video; they can be used to encode and decode many types of signals. But for now, let's focus on how video codecs work.

In the previous video, we talked about how still images are compressed. In short, images are compressed by throwing out the information that is less visible to the human eye and storing redundant data more efficiently.

We can easily extend image compression to video compression by compressing a video frame by frame. This approach is called spatial or intra-frame coding. Even doing that alone would significantly reduce the file size, but we can actually do much more than that. In a typical video, many consecutive frames tend to be nearly identical. We can make use of this temporal, inter-frame redundancy to further compress a video.

First, let's think of an extreme case where nothing moves in a video. Instead of storing each one of these identical frames we can simply tell our encoder to keep the first frame and repeat it N times. That would save us a lot of space.
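Here is a minimal sketch of that idea in Python, assuming frames are simple nested lists of pixel values. The function names `encode_repeats` and `decode_repeats` are hypothetical, and real codecs don't store frames this way; it's only meant to show the "keep one frame and a repeat count" trick.

```python
def encode_repeats(frames):
    """Collapse runs of identical consecutive frames into [frame, count] pairs."""
    encoded = []
    for frame in frames:
        if encoded and encoded[-1][0] == frame:
            encoded[-1][1] += 1          # same as the previous frame: bump the count
        else:
            encoded.append([frame, 1])   # a new frame starts a new run
    return encoded

def decode_repeats(encoded):
    """Expand [frame, count] pairs back into the full frame sequence."""
    return [frame for frame, count in encoded for _ in range(count)]

still = [[0, 0], [0, 0]]
other = [[0, 0], [0, 1]]
frames = [still] * 5 + [other]

encoded = encode_repeats(frames)
print(len(frames), "frames stored as", len(encoded), "runs")  # 6 frames stored as 2 runs
```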

Now let's think of a more realistic case where only some parts in the video don't change. This time we can do the same thing, but more locally, by dividing the frames into blocks and repeating only the blocks that don't change.
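The block-level version might look something like the sketch below. I'm again assuming frames are nested lists, and the helper names (`get_block`, `changed_blocks`, `apply_changes`) and the block size of 2 are made up for the example.

```python
def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame."""
    return [row[left:left + size] for row in frame[top:top + size]]

def changed_blocks(prev, curr, size):
    """Return {(top, left): block} for the blocks of curr that differ from prev."""
    changed = {}
    for top in range(0, len(curr), size):
        for left in range(0, len(curr[0]), size):
            block = get_block(curr, top, left, size)
            if block != get_block(prev, top, left, size):
                changed[(top, left)] = block
    return changed

def apply_changes(prev, changed):
    """Rebuild the current frame: repeat prev everywhere except the changed blocks."""
    frame = [row[:] for row in prev]
    for (top, left), block in changed.items():
        for dy, row in enumerate(block):
            frame[top + dy][left:left + len(row)] = row
    return frame

prev = [[0] * 4 for _ in range(4)]
curr = [row[:] for row in prev]
curr[3][3] = 7                        # only the bottom-right block changes

changed = changed_blocks(prev, curr, size=2)
print(list(changed))                  # [(2, 2)]
```

Only one of the four blocks needs to be stored; the other three are simply repeated from the previous frame.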

What if all blocks change between consecutive frames, but some change a lot and some change only a little? Instead of just checking whether a block has changed, we can search the next frame for the best match to a given block within a neighborhood of its original position. This process is called block motion estimation. How does this help with compression?
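To make the search concrete, here is a rough sketch of an exhaustive block search. I'm assuming square blocks and using the sum of absolute differences (SAD) as the matching cost, which is a common choice but not the only one. The names `sad`, `estimate_motion`, and `make_frame` are hypothetical, and real encoders use much faster search strategies than trying every offset.

```python
def sad(frame, top, left, block):
    """Sum of absolute differences between block and the same-sized patch of frame."""
    return sum(
        abs(frame[top + dy][left + dx] - value)
        for dy, row in enumerate(block)
        for dx, value in enumerate(row)
    )

def estimate_motion(block, target, top, left, radius):
    """Try every offset within radius of (top, left); return the best (dy, dx) and its cost."""
    size = len(block)  # assumes a square block
    height, width = len(target), len(target[0])
    best_vector, best_cost = None, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate patch would fall outside the frame
            cost = sad(target, y, x, block)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (dy, dx), cost
    return best_vector, best_cost

def make_frame(top, left):
    """A 6x6 black frame with a bright 2x2 square at (top, left)."""
    frame = [[0] * 6 for _ in range(6)]
    for y in range(top, top + 2):
        for x in range(left, left + 2):
            frame[y][x] = 9
    return frame

first, second = make_frame(1, 1), make_frame(2, 3)
block = [row[1:3] for row in first[1:3]]
vector, cost = estimate_motion(block, second, top=1, left=1, radius=2)
print(vector, cost)  # (1, 2) 0
```

The square moved one row down and two columns right, and that offset is the motion vector the search finds.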

Well, instead of saving every frame, we can save a reference frame and the motion vectors for the blocks. The motion vectors tell us how we should move the blocks to closely match the next frames. This is called motion compensation. Although motion compensation can greatly reduce the difference between two consecutive frames, it is usually not enough by itself to fully recreate the next frame. So, in addition to the motion vectors, we should also save the differences between the actual and motion-compensated frames. These differences are known as residual frames.
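Here is a small sketch of those two steps. One assumption to flag: I store each motion vector as the offset into the reference frame that a block should be copied from, which is one common convention; you could just as well flip the direction. The names `motion_compensate` and `residual`, the 4x4 frames, and the block size of 2 are all made up for the example.

```python
def motion_compensate(reference, vectors, size):
    """Predict a frame by copying, for each block position, the reference patch at its offset."""
    height, width = len(reference), len(reference[0])
    predicted = [[0] * width for _ in range(height)]
    for (top, left), (dy, dx) in vectors.items():
        for y in range(size):
            for x in range(size):
                predicted[top + y][left + x] = reference[top + dy + y][left + dx + x]
    return predicted

def residual(actual, predicted):
    """Per-pixel difference between the actual frame and its prediction."""
    return [[a - p for a, p in zip(ra, rp)] for ra, rp in zip(actual, predicted)]

reference = [[1, 2, 0, 0],
             [3, 4, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]
# The textured block moved to the bottom-right corner and one pixel got brighter.
actual = [[0, 0, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 1, 2],
          [0, 0, 3, 5]]
# Hypothetical motion vectors: block position -> offset into the reference frame.
vectors = {(0, 0): (0, 2), (0, 2): (0, 0), (2, 0): (0, 0), (2, 2): (-2, -2)}

predicted = motion_compensate(reference, vectors, size=2)
diff = residual(actual, predicted)
print(diff[3])  # [0, 0, 0, 1]
```

Motion compensation gets almost everything right, so the residual is zero everywhere except for the one pixel whose brightness changed.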

When it's time to play this video, the decoder reconstructs the current frame by taking the previous reference frame, compensating for the motion using the motion vectors, and adding the residual frame. You might ask: couldn't we just save the original frames instead of saving the residual frames? We could, but residual frames carry much less information than the full original frames. Therefore, they are highly compressible.
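You can get a feel for this with a quick experiment. In the sketch below I fake a motion-compensated prediction with a noisy 64x64 frame, change a handful of pixels to get the "actual" frame, and compare how well the full frame and the residual compress. Python's `zlib` is only a stand-in here; real codecs use their own transform and entropy coding stages. The helper names `to_bytes` and `reconstruct` are made up.

```python
import random
import zlib

def to_bytes(frame):
    """Flatten a frame into bytes, wrapping values into the 0..255 range."""
    return bytes(value % 256 for row in frame for value in row)

def reconstruct(predicted, residual):
    """Decoder side: prediction plus residual gives back the actual frame."""
    return [[p + r for p, r in zip(rp, rr)] for rp, rr in zip(predicted, residual)]

rng = random.Random(0)
# Pretend this is the motion-compensated prediction of the current frame.
predicted = [[rng.randrange(200) for _ in range(64)] for _ in range(64)]

# The actual frame matches the prediction except for a few brighter pixels.
actual = [row[:] for row in predicted]
for y in range(10, 14):
    actual[y][20] += 5

residual = [[a - p for a, p in zip(ra, rp)] for ra, rp in zip(actual, predicted)]

frame_size = len(zlib.compress(to_bytes(actual)))
residual_size = len(zlib.compress(to_bytes(residual)))
print("full frame:", frame_size, "bytes, residual:", residual_size, "bytes")
```

The residual is almost all zeros, so it compresses to a small fraction of the full frame, and `reconstruct(predicted, residual)` still gives back the actual frame exactly.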

Let's review the entire process. Traditional video compression algorithms represent a video as a sequence of reference frames followed by residual frames. There are two types of compression done here: intra-frame coding and inter-frame coding. In this video, I focused mostly on the inter-frame coding part, which achieves a high compression efficiency by exploiting the similarities between consecutive frames. Intra-frame coding, on the other hand, compresses a frame by throwing out visually redundant information within the frame and storing the rest more efficiently. Check out my previous video to learn more about how this is done.

The methods I covered here are the very basics that are used by many codecs including the mainstream H.264 codec, which is also known as MPEG-4 AVC. Modern video codecs, including H.264, H.265, and VP9, use sophisticated methods to balance the level of compression and perceptual image quality without introducing too much computational complexity.

Although the video compression algorithms we use today are pretty mature, video compression is still an active area of research. Researchers have already been experimenting with machine learning models that have the potential to perform better than today's block-based hybrid encoding standards. It's not easy to beat today's encoding standards, since they have had decades to mature and to be tuned in many possible ways. I still think that an end-to-end trainable codec will eventually outperform the traditional compression methods by optimizing perceptual image quality while minimizing the file size.

That's all for today! I hope you liked it. If you have any comments or questions, let me know in the comments section below. Subscribe for more videos. As always, thanks for watching, stay tuned, and see you next time.