Home / Problem Set 12 / JPEG Image Compression

JPEG Image Compression

The questions below are due on Monday May 17, 2021; 10:00:00 PM.
 
You are not logged in.

If you are a current student, please Log In for full access to the web site.
Note that this link will take you to an external site (https://shimmer.csail.mit.edu) to authenticate, and then you will be redirected back to this page.

In this problem, we will consider the problem of lossy image compression by implementing a JPEG encoder. That is, we will be writing a program that takes as input an image represented in the spatial domain (as a NumPy array of brightness values) and outputs a JPEG image (consisting of encoded frequency domain information).

Our encoder will operate by chopping our image into 8\times8 blocks, computing the DCT of each, quantizing and rounding those values (based on the value of a desired "quality" setting), and applying additional lossless compression to the result. Note that in JPEG, the representation we store on disk contains DCT coefficients rather than spatial information.

This page will guide you through the process step-by-step, and you'll have an opportunity to check intermediate pieces of your code throughout. We've also provided a number of helper functions to handle converting from a Python representation to the actual binary data stored in the JPEG file (so you only have to worry about the actual compression, not the on-disk representation).

This exercise is new this semester, so if there are things that are unclear, or if you have any questions, please don't hesitate to ask (or to share feedback)!

Template Code and Overview

We have provided some template code and helper functions, as well as a few test images, in this exercise's code distribution.

The main code you will have to edit is contained in jpeg.py, and it exists for the purpose of converting images from arbitrary format into JPEG images. From a high-level perspective, our main program is implemented as a function called write_encoded, which is already defined for you in jpeg.py. However, it is missing several important pieces, which you will implement over the course of this assignment!

Note that this program takes as input a filename representing an image to be treated as the input, a filename representing the filename to which the output should be written, and a number (0-100) representing the desired quality of the output image (higher quality -> larger files).

For example, we could convert the stronger.bmp file to JPEG using something like the following:

write_encoded('stronger.bmp', 'stronger.jpg', quality=80)

1) Encoder

Our ultimate goal for the remainder of this assignment will be to implement the encode function (used by write_encoded), which takes in an image (in our familiar format, as a NumPy array of brightness values ranging from 0 to 1, with r=0, c=0 in the upper left corner) and ultimately returns a list containing a representation of the DCT coefficients after quantization.

Note that JPEG operates on 8\times8 blocks one at a time. In the following sections, we'll talk about the process we'll need to apply to each block. Then, compressing the whole image simply involves applying the process to each block in the image in turn.

In the following sections, we'll talk through the process as a whole, as well as some useful helper functions. Then we'll show an example of the whole process, which you can use to help to test your code as you work through the actual implementation. As such, we encourage you to read through the following sections (and to write the helper functions) before diving too far into the implementation.

1.1) Range Conversion

The JPEG encoding scheme actually operates on a slightly different numeric range than the one we're accustomed to. Rather than representing our pixel brightnesses as floating point numbers in the range [0, 1], we'll instead represent them as integers in the range [-127, 128].

We do this for a couple of reasons:

  • symmetry about 0 will make things more pleasant for our DCT (reducing the DC component)
  • representing things as integers will make the rounding step easier, and it will means that things take up less room on disk
  • the expanded range will give us nicer numbers to work with
  • the JPEG standard quantization table was chosen specifically to work with integers in this range

We have provided a function called util.adjust_range_256 that takes an image in our usual format as input, and returns a new NumPy array with the values converted to this new range.

1.2) Block-wise Orthonormal DCT

To make things scale out nicely, we'll be using a slight variant of the DCT we discussed in lecture, the orthonormal version of the DCT. It differs from our definition only by a scaling factor. Paste your dct and dct2 functions from the previous exercise into your file, and note the odct2 function, which scales those values. For the remainder of the exercise, when we refer to the DCT, you should use the odct2 function.

For each block, we would like to:

  • compute its DCT using the odct2 function,
  • quantize (by dividing through element-wise by the quantization table discussed below),
  • round and convert to integers (note the provided util.round_to_ints function, which performs this step on an array)

Note that you do not need to write any code to do this quite yet; we'll implement that a little bit later.

1.3) Quantization

For our purposes, we will be quantizing values using a standard JPEG quantization table. Here, we'll implement some of the necessary pieces for that step, and we'll manually run through a small test case to make sure things are working as expected.

Our base quantization table (call it M_b) is stored as a variable base_quantization_table in the provided Python file. Given a "quality" parameter q (an integer s.t. 1 \leq q \leq 100, we can derive an appropriate quantization table as follows:

M_q[k_r, k_c] = {\rm min}\left(255, {\rm max}\left(1, \left\lfloor \frac{S_q\times M_b[k_r, k_c] + 50}{100}\right\rfloor\right)\right)

where S_q is given by:

S_q = \begin{cases} \left\lfloor \frac{5000}{q}\right\rfloor & \text{if}~q<50\\ 200 - 2\times q & \text{otherwise} \end{cases}

Write a function called make_quantization_table that takes the quality parameter q as input and returns an appropriate quantization table. You can use the box below to test your function, and then you can paste it into your file.

You may assume that base_quantization_table is available for you in the box below (you do not need to copy/paste it from your file).

Check Yourself 1:

Note that a quality of 100 should result in no change to the underlying image. What is the quantization table in that case? Does this make sense?

Check Yourself 2:

Once you have make_quantization_table working, copy/paste it into your Python file so that you'll have access to it later on!

1.4) Flattening and Run-length Encoding

A lot of JPEG's compression comes from anticipating some of the structure of these blocks of quantized DCT coefficients. In particular, we can assume that a large number of the elements in each block will be 0 after quantization, and we expect that the nonzero values will usually be clustered around the DC component.

1.4.1) Flattening

In order to exploit these properties, JPEG proceeds in two steps: firstly, we "flatten" this 2-dimensional structure into a 1-dimensional array of values, arranged in a "zig-zag" pattern shown below:

Note that we have provided a list called zigzag_pattern in jpeg.py, which contains the coordinates that should be associated with each element in the flattened list. Concretely, the i^\text{th} element in our flattened list should be the element at the coordinates given by zigzag_pattern[i].

We encourage you to write a helper function flatten_block to accomplish this flattening operation. You can use the box below to test your implementation before incorporating it into your jpeg.py file. You may also assume that zigzag_pattern has been defined for you in the box below (i.e., you do not have to copy/paste it from your jpeg.py file).

Check Yourself 3:

Once you have flatten_block working, copy/paste it into your jpeg.py file so that you have it available to you later on.

1.4.2) Run-length Encoding

This flattened structure (and the zig-zag pattern) were chosen so that we often end up with large sequences of zeros in the flattened structure. JPEG exploits this by using a variant of Run-length encoding.

Specifically, in anticipation of long sequences of zeros, JPEG only explicitly represents the non-zero values in each block, identifying each by the number of zeros that preceed it, as well as its value. For example, consider the following sequence:

[0, 7, 2, 0, 0, 0, 0, 9, 4, 0, 0, 0, 0]

We could encode this by saying that the nonzero values are, in order:

  • 1 zero, followed by a 7, then
  • 0 zeros, followed by a 2, then
  • 4 zeros, followed by a 9, then
  • 0 zeros, followed by a 4, then
  • no more non-zero values

which we could represent more compactly as [(1, 7), (0, 2), (4, 9), (0, 4)].

Write a helper function run_length_encode that takes in a single flat list (the result of the flatten_block process described above) and returns a list of tuples as described above.

Check Yourself 4:

Once you have flatten_block working, copy/paste it into your jpeg.py file so that you have it available to you later on.

1.5) Block-wise Process

Note that we will need to apply the compression process to each 8\times8 block in the image. It will also be important that we work through the blocks in a given order, as demonstrated in the loop below:

for r in range(im.shape[0]//8):  # loop vertically
    for c in range(im.shape[1]//8):  # loop horizontally
        block = im[r*8:(r+1)*8, c*8:(c+1)*8]

(feel free to copy/paste this into your encode function and use it as a basis for your code)

1.6) Example

Now let's take a look at the whole process from end-to-end, for a single block of an image. As our example, here we'll walk through the process for the first 8x8 block in the sparrow.bmp file included in the distribution, with a quality setting of 60. Below, we include all intermediate results (feel free to use this example when testing, and print out your intermediate results to verify that your representation matches).

When using a quality of 60, the quantization table should be given by:

[[13  9  8 13 19 32 41 49] # quantization table
 [10 10 11 15 21 46 48 44]
 [11 10 13 19 39 46 55 45]
 [11 14 18 23 41 70 64 50]
 [14 18 30 45 54 87 82 62]
 [19 28 44 51 65 83 90 74]
 [39 51 62 70 82 97 96 81]
 [58 74 76 78 90 80 82 79]]

And, after converting to the right range, the first block of (pixel values) from sparrow.bmp is:

[[103 115 103  62  43  29  43  71] # first block pixels
 [115 125 118  65  33  36  51  76]
 [122 127 125  88  61  70  73  81]
 [120 127 127 109  92 100  95  80]
 [125 127 126 112 102 111 107  78]
 [122 117 111  98  96 109 110  84]
 [ 95  92  92  85  82  98 108  98]
 [ 90  89  92  80  67  93 112 108]]

We then compute its DCT (from odct2), which should produce:

[[ 7.50125000e+02  9.83851516e+01  5.31295306e+01 -3.09168869e+01  # first block DCT
  -3.93750000e+01  9.70213628e+00  7.08231826e+00  9.69077938e-01]
 [-5.84742289e+01  8.84997339e+01  2.45338524e+01 -4.21228140e+01
   4.18434248e+00 -1.83547589e+01  5.21459446e+00  2.83455483e+00]
 [-8.67554626e+01 -1.60613098e+01  3.80034947e+01 -2.70144272e+01
   9.45135927e+00 -7.50280596e+00  1.17049513e+00  2.22158842e+00]
 [ 1.77642051e+00  6.93128152e+00 -9.49428794e+00  1.40506222e+01
   3.50610401e+00 -4.05816749e+00 -6.38058027e+00  4.80287410e+00]
 [ 9.37500000e+00 -7.01329060e+00 -1.32521637e+01  2.17489030e+00
   8.75000000e-01  3.90177620e-01 -3.57580876e+00 -4.93830459e-01]
 [-2.23169679e+00 -1.20516114e+01 -9.84580671e+00  4.88637616e+00
   5.87339227e+00 -2.76471842e+00 -1.84127564e+00 -1.61864234e-01]
 [ 7.11659690e+00  2.26117439e+00  4.20495129e-01  4.27121954e+00
   2.79388584e-01  2.07503310e-01 -3.49474439e-03  5.67234457e-02]
 [-8.31213252e-01 -2.22090152e+00 -2.40841949e+00 -5.20953956e-02
   1.66261116e+00  1.25923390e-01  1.97714980e-01  2.14362384e-01]]

After dividing through by the quantization table, we have:

[[ 5.77019231e+01  1.09316835e+01  6.64119132e+00 -2.37822207e+00  # DCT after quantization
  -2.07236842e+00  3.03191759e-01  1.72739470e-01  1.97771008e-02]
 [-5.84742289e+00  8.84997339e+00  2.23035022e+00 -2.80818760e+00
   1.99254404e-01 -3.99016497e-01  1.08637385e-01  6.44217008e-02]
 [-7.88686024e+00 -1.60613098e+00  2.92334575e+00 -1.42181196e+00
   2.42342545e-01 -1.63104477e-01  2.12817296e-02  4.93686315e-02]
 [ 1.61492773e-01  4.95091537e-01 -5.27460441e-01  6.10896615e-01
   8.55147321e-02 -5.79738212e-02 -9.96965668e-02  9.60574821e-02]
 [ 6.69642857e-01 -3.89627256e-01 -4.41738789e-01  4.83308956e-02
   1.62037037e-02  4.48480023e-03 -4.36074239e-02 -7.96500741e-03]
 [-1.17457726e-01 -4.30414694e-01 -2.23768334e-01  9.58112973e-02
   9.03598811e-02 -3.33098605e-02 -2.04586183e-02 -2.18735452e-03]
 [ 1.82476844e-01  4.43367528e-02  6.78217950e-03  6.10174219e-02
   3.40717786e-03  2.13920938e-03 -3.64035874e-05  7.00289454e-04]
 [-1.43312630e-02 -3.00121827e-02 -3.16897301e-02 -6.67889687e-04
   1.84734573e-02  1.57404237e-03  2.41115829e-03  2.71344790e-03]]

After rounding and converting to integers (using `util.round_to_ints), we have:

[[58 11  7 -2 -2  0  0  0]  # DCT, quantized and rounded
 [-6  9  2 -3  0  0  0  0]
 [-8 -2  3 -1  0  0  0  0]
 [ 0  0 -1  1  0  0  0  0]
 [ 1  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]]

We then "flatten" the block according to the zigzag pattern, resulting in:

[58, 11, -6, -8, 9, 7, -2, 2, -2, 0, 1, 0, 3, -3, -2, 0, 0, -1, -1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Finally, we separate out the DC component, and we apply run-length encoding to the remaining 63 components (in order). Ultimately, we represent this block as a tuple whose first element is the DC value, and whose second value is the run-length-encoding of the remaining coefficients, shown below:

(58, [(0, 11), (0, -6), (0, -8), (0, 9), (0, 7), (0, -2), (0, 2), (0, -2), (1, 1), (1, 3), (0, -3), (0, -2), (2, -1), (0, -1), (5, 1)])

Note that this last format (a tuple containing the DC component and the run-length-encoded AC components) is how we will represent each block internally.

For each block in the input image (in order described above), you should build this representation. The final output of your encode function should be a list containing one of these tuples for each block in the input image (in order).

Check Yourself 5:

Implement this process in your encode function, and make sure your function's behavior matches to example above.

The write_encoded function, which can be used to save an image to disk, makes use of this representation1 when figuring out what bits to store on disk.

Once everything is working, you should be able to call your write_encoded function to produce a JPEG file, which you can open in an image editor of your choice to check the results!

1.7) Testing the Compression

Now we'll try to get a sense of how much space we're saving, compared against our original image. Note that we have provided the util.file_size function, which takes a filename as argument, to help you with this (it returns an integer representing the number of bytes in the given file).

Below, we'll run some tests using the stronger.bmp2 file contained in the distribution.

How many bytes were in the original image stronger.bmp?

Now try using write_encoded to save multiple compressed versions of this image to disk. For each of the quality values below, how many bytes are in the resulting file?

When quality = 99, how big is the compressed image as a fraction of the original image's size?
When quality = 90, how big is the compressed image as a fraction of the original image's size?
When quality = 80, how big is the compressed image as a fraction of the original image's size?
When quality = 70, how big is the compressed image as a fraction of the original image's size?
When quality = 50, how big is the compressed image as a fraction of the original image's size?
When quality = 20, how big is the compressed image as a fraction of the original image's size?
When quality = 10, how big is the compressed image as a fraction of the original image's size?
When quality = 1, how big is the compressed image as a fraction of the original image's size?

2) Decoding

While it's useful to look at those filesizes (some of the compression ratios are impressive!), it doesn't really tell the whole story. In order to figure out how well we're doing, we'll also have to look at the quality of the output.

Check Yourself 6:

Take a look at the outputs for the quality values mentioned above (or other numbers you want to try). How does the image quality relate to the filesize?

Check Yourself 7:

At what quality value are you happy with the output without zooming in on the image? What compression ratio is associated with that quality?

Zooming in, you can see pretty noticeable artifacts even at fairly high quality numbers. At what quality / compression ratio are you happy with the output even when zoomed in?

Use the write_encoded function to compress bluegill.bmp to, for example, bluegill.jpeg, using a quality setting of 40.

What is the compression ratio? Enter your answer as a decimal number representing the file size of our compressed version, divided by the file size of the original bluegill.bmp, accurate to 3 decimal places:

Compare the result against the original image, and upload the JPEG file below:
 No file selected

Check Yourself 8:

Open the original and your compressed version in an image viewing/editing program. How do they compare? Not too bad for a file with around 5% of the size of the original!


 
Footnotes

1You do not need to understand exactly what is going on inside of that function, but if you're interested, just ask! (click to return to text)

2The file is called stronger.bmp because it is a photo of Adam's cat, whose name is Stronger. (click to return to text)