How Digital Images are Represented, Compressed, and Stored

July 31, 2018


What's up, everybody! Today we're talking about how digital images are represented, compressed, and stored on your devices. Let's get started!

A typical image is represented as a matrix. The values of this matrix correspond to pixel intensities: a larger number means a brighter pixel, and a smaller number means a darker pixel. Color images have a separate channel for each color component, such as red, green, and blue. Although this is probably the most common way to represent an image, it's not how images are typically stored on disk. Why not? Let's take a look at what happens when we do. Say we have a 12-megapixel color picture, which means we have 12 million values to store for each color channel, leading to a total of 36 million values. If we assume that these values are stored as 8-bit, single-byte integers, we should end up with a 36-megabyte file. I have a 12-megapixel image here. Let's see how big it is. Wait, what? That's not even 2 megabytes. How is this possible? The answer is image compression! In this case, it's JPEG compression.
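The back-of-the-envelope arithmetic above can be written out as a few lines of Python:

```python
# Size of an uncompressed 12-megapixel RGB image,
# assuming one byte (8 bits) per channel per pixel.
pixels = 12_000_000          # 12 megapixels
channels = 3                 # red, green, blue
bytes_per_value = 1          # 8-bit integers

raw_bytes = pixels * channels * bytes_per_value
print(raw_bytes / 1_000_000)  # 36.0 megabytes
```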

You've probably seen the extension ".jpg" at the end of your image file names. JPEG is not the only compressed image format, but it's probably the most common one. JPEG is a lossy compression format, meaning that some of the information in the original image is actually thrown out. The more information you discard, the worse the image quality gets, so there is a tradeoff between image quality and file size. But JPEG makes a profitable trade, reducing the file size significantly while preserving the perceived image quality, because the parts it throws out are chosen to be the parts we wouldn't easily notice. Let's see how this is possible.

The first step is color space conversion. Instead of representing an image with its red, green, and blue color intensities, we convert it into a color space where one channel represents light intensity and the other two represent color. This is a linear transform that can be expressed as a matrix multiplication. The conversion separates the luminance component from the chrominance components. Since our visual system is much more sensitive to changes in brightness than to changes in color, we can safely downsample the chroma channels to save some space. This strategy is called chroma subsampling and is used in many image and video processing pipelines.
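Here is a minimal sketch of that conversion in Python with NumPy, assuming 8-bit RGB input. The matrix coefficients are the standard ITU-R BT.601 values used in JPEG files; the subsampling function just keeps every other chroma sample in each direction, as in 4:2:0 subsampling.

```python
import numpy as np

# RGB -> YCbCr conversion matrix (ITU-R BT.601 coefficients).
RGB_TO_YCBCR = np.array([
    [ 0.299,     0.587,     0.114   ],  # luma (Y)
    [-0.168736, -0.331264,  0.5     ],  # blue-difference chroma (Cb)
    [ 0.5,      -0.418688, -0.081312],  # red-difference chroma (Cr)
])

def rgb_to_ycbcr(rgb):
    """Convert an (H, W, 3) RGB array to YCbCr via a matrix multiplication."""
    ycbcr = rgb.astype(np.float64) @ RGB_TO_YCBCR.T
    ycbcr[..., 1:] += 128.0  # center the chroma channels around 128
    return ycbcr

def subsample_chroma(channel, factor=2):
    """4:2:0-style chroma subsampling: keep every `factor`-th sample."""
    return channel[::factor, ::factor]
```

Subsampling each chroma channel by 2 in both directions keeps only a quarter of its samples, so the three channels together shrink to half the original data before any further compression happens.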

Another characteristic of the human visual system we can take advantage of is frequency-dependent contrast sensitivity. What this means is that it's easier to miss small objects or fine details in a picture than large ones, which is kind of obvious. In this figure, the spatial frequency of the bars increases from left to right, and the contrast decreases from bottom to top. The exact shape varies from person to person, but as you can see, the bars under the curve are more visible than the rest. This is because our visual system is most sensitive to brightness variations in that range of spatial frequencies. Look at the bars in the low-contrast, high-frequency part of the figure: they are barely visible. This phenomenon gives us some room for compression in those less visible frequencies.

JPEG compression does that by dividing the image into 8x8 blocks and quantizing them in a frequency-domain representation. It gets there by comparing each 8x8 block with 64 frequency patterns whose spatial frequency increases from left to right and from top to bottom. This process decomposes the image into its frequency components, converting an 8x8 block where each element represents a brightness level into another 8x8 block where each element represents the strength of a particular frequency component. This transform is called the discrete cosine transform (DCT).
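The block transform can be sketched in a few lines of NumPy. This builds the orthonormal DCT-II basis directly and assumes, as the JPEG pipeline does, that pixel values have been level-shifted from [0, 255] to [-128, 127] first:

```python
import numpy as np

N = 8
# Orthonormal DCT-II basis matrix: row k holds the k-th cosine pattern.
k = np.arange(N).reshape(-1, 1)
n = np.arange(N).reshape(1, -1)
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
C[0, :] = np.sqrt(1.0 / N)  # the constant (DC) basis row

def dct2(block):
    """2D DCT of an 8x8 block: transform the rows, then the columns."""
    return C @ block @ C.T

def idct2(coeffs):
    """Inverse 2D DCT, recovering the pixel block."""
    return C.T @ coeffs @ C

block = np.full((N, N), 200.0) - 128.0   # a flat gray block, level-shifted
coeffs = dct2(block)
# A flat block has all of its energy in the top-left (DC) coefficient;
# every other frequency component comes out zero.
```

In practice you would call a library routine (for example `scipy.fft.dct`) rather than multiply basis matrices by hand, but the result is the same 8x8 grid of frequency strengths described above.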

In this representation, we can easily compress the frequencies that are less visible to us by dividing the frequency components by constants and then quantizing them. The components we are less sensitive to are divided by larger constants than the ones we are more sensitive to. Quantization in this context simply means rounding the result to the nearest integer. Using larger divisors leads to more coefficients being rounded to zero. This results in higher compression rates, but it also lowers the image quality.
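As a concrete sketch, the table below is the example luminance quantization table from Annex K of the JPEG standard; notice how the divisors grow toward the high frequencies at the bottom right:

```python
import numpy as np

# Example luminance quantization table from Annex K of the JPEG standard.
LUMA_QUANT = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize(coeffs, table=LUMA_QUANT):
    """Divide each DCT coefficient by its table entry and round. Lossy."""
    return np.round(coeffs / table).astype(np.int32)

def dequantize(quantized, table=LUMA_QUANT):
    """Undo the scaling; the rounding error is the information thrown away."""
    return quantized * table
```

Encoders scale this table up or down to trade quality against file size: the "quality" slider in an image editor is essentially choosing how large these divisors get.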

After quantization, we end up with a lot of zeros in the high frequencies. We can store this information more efficiently by rearranging the elements: if we read the coefficients in a zig-zag order from the top left to the bottom right, the zeros end up grouped together. Once they are grouped, instead of storing each zero separately, we can store the value and the number of times it occurs consecutively as a tuple. This technique is called run-length encoding and is used in many other algorithms as well.

Finally, we can further compress what's left by encoding more frequent values with fewer bits and less frequent values with more bits. Doing so reduces the average number of bits per symbol. This process is called entropy coding, and JPEG uses Huffman coding for this step.
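To make that concrete, here is a minimal Huffman coder built from the standard library. It repeatedly merges the two least frequent subtrees, so frequent symbols end up near the root with short codes. (JPEG itself uses predefined, table-driven Huffman codes; this just illustrates the principle.)

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Map each symbol to a prefix-free bit string based on its frequency."""
    freq = Counter(symbols)
    # Heap entries: (total frequency, tie-breaker, {symbol: code so far}).
    heap = [(f, i, {s: ''}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)   # the two least frequent subtrees
        fb, _, b = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in a.items()}
        merged.update({s: '1' + c for s, c in b.items()})
        heapq.heappush(heap, (fa + fb, tie, merged))
        tie += 1
    return heap[0][2]
```

Because the codes are prefix-free, no code is the start of another, so the decoder can read the bit stream unambiguously without any separators between symbols.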

Both run-length encoding and Huffman coding are lossless compression methods. No information is thrown out in these steps. The compression is achieved solely by storing redundant data more efficiently. This type of compression is used to compress, transmit, and store many types of data, including images, audio, and documents.

When it's time to decode an image, all these steps are reversed. Since some information is lost during the subsampling and quantization steps, the decoded image won't be identical to the original. However, a compressed image should look almost as good as the original when a reasonable compression rate is used. Compression artifacts become more visible as the compression rate increases. It's hard to show an uncompressed image here and compare it to a compressed one, because the video you are watching now is compressed as well.

One type of image where JPEG particularly falls short is synthetic images, such as web graphics. Sharp edges are rare in natural images but common in synthetic ones. The high-frequency components that make up a strong edge in a synthetic image get compressed harshly, leading to visible compression artifacts near the edges.

So what should we do with synthetic images? Use another image format, such as PNG or WebP, or better yet, use vector graphics when possible. Vector graphics are stored as mathematical expressions rather than pixel values. They are lossless and can be scaled to any size without losing quality. Vector graphics are not feasible for photographs, but they are perfect for graphics like logos, illustrations, and diagrams.

That's all for today! I hope you liked it. If you have any comments or questions, let me know in the comments section below. Subscribe for more videos. As always, thanks for watching, stay tuned, and see you next time.