Reading this article requires basic knowledge of convolutional neural networks.
The best way to encode a binary string is Huffman coding. The entropy of two different binary strings differs, so their encoded lengths also differ; that is, they compress by different ratios. During data transmission, data must be encoded into binary format with values 0 and 1. If every character were converted to a binary code of the same length regardless of its frequency, the length of the entire code would be large. Huffman coding therefore follows the rule of assigning short codes to the most frequently used symbols and longer codes to less frequently used ones, which reduces the length of the entire code, as the sketch below illustrates.
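As an illustration of that rule, here is a minimal Huffman coder (a sketch in Python, not code from the article): it repeatedly merges the two least frequent subtrees, so the most frequent symbols end up with the shortest codewords.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Map each symbol to a prefix-free codeword; frequent symbols get short codes."""
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}        # left branch gets a 0
        merged.update({s: "1" + c for s, c in c2.items()})  # right branch gets a 1
        heapq.heappush(heap, (f1 + f2, i, merged))
        i += 1
    return heap[0][2]

print(huffman_code("abracadabra"))  # 'a', the most frequent symbol, gets a 1-bit code
```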
Here are two binary strings: 100001001000000000 and 10010110101101010001.
The first string is easier to compress and ends up shorter after compression than the second.
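You can check this claim with the empirical Shannon entropy, which lower-bounds the achievable bits per symbol for a memoryless coder:

```python
from collections import Counter
from math import log2

def entropy(s):
    """Empirical Shannon entropy in bits per symbol."""
    n = len(s)
    return -sum(c / n * log2(c / n) for c in Counter(s).values())

print(entropy("100001001000000000"))    # ~0.65 bits/symbol: compressible
print(entropy("10010110101101010001"))  # 1.0 bits/symbol: incompressible
```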
Lossy compression, as the name implies, loses some data during the process; JPEG, described below, is the standard example.
To learn more about the JPEG algorithm and how it works, read the paper Image Compression Algorithm and JPEG Standard.
The Joint Photographic Experts Group (JPEG) has developed an international standard for general-purpose, color, still-image compression. There are numerous compression standards in this family, such as JPEG, JPEG-LS, and JPEG 2000. JPEG-LS is a lossless standard: what you see is exactly what you compressed, with no information discarded. The block diagram of the baseline JPEG transformation is shown below.
JPEG goes through the following process:
1. Convert the image from RGB to the YCbCr color space and, optionally, subsample the chroma channels.
2. Split each channel into 8×8 blocks and apply the discrete cosine transform (DCT) to each block.
3. Quantize the DCT coefficients using a quantization table; this is the step where information is lost.
4. Reorder the quantized coefficients in zigzag order and entropy-code them (e.g. with Huffman coding).
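To make steps 2 and 3 concrete, here is a minimal sketch of the transform-and-quantize core on a single 8×8 block (SciPy's DCT; Q is the standard JPEG luminance quantization table):

```python
import numpy as np
from scipy.fft import dctn, idctn

# Standard JPEG luminance quantization table.
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

block = np.random.randint(0, 256, (8, 8)).astype(float)  # stand-in pixel block

coeffs = dctn(block - 128, norm="ortho")             # level shift, then 2-D DCT
quantized = np.round(coeffs / Q)                     # the lossy step
restored = idctn(quantized * Q, norm="ortho") + 128  # what the decoder recovers

print(np.abs(block - restored).max())  # nonzero: some information is gone for good
```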
This method has several disadvantages: the fixed 8×8 block transform produces visible blocking artifacts at high compression ratios, and the hand-designed pipeline cannot adapt to the content of a particular image.
Well, you can try it with a Convolutional Neural Network (CNN).
Convolutional Neural Networks are very similar to ordinary Neural Networks: they are made up of neurons that have learnable weights and biases. Each neuron receives some inputs, performs a dot product, and optionally follows it with a non-linearity. The whole network still expresses a single differentiable score function, from the raw image pixels on one end to class scores at the other. And they still have a loss function (e.g. SVM/Softmax) on the last (fully-connected) layer, and all the tips and tricks we developed for learning regular Neural Networks still apply.
To convert an image to a binary string and then convert it back, two CNNs are needed: one responsible for encoding (image -> 0-1 bitmap) and one for decoding (0-1 bitmap -> image).
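The article's repository bases its structure on VGG16; the exact layers aren't given here, so the following is a minimal sketch with assumed image and code shapes (tf.keras, a 32×32 RGB image in, an 8×8×16 code out):

```python
import tensorflow as tf

# Encoder: image -> raw logits for the binary code (the sigmoid is applied
# later, together with the noise trick described below).
encoder = tf.keras.Sequential([
    tf.keras.layers.Input((32, 32, 3)),
    tf.keras.layers.Conv2D(64, 3, strides=2, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(128, 3, strides=2, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(16, 3, padding="same"),
])

# Decoder: 0-1 bitmap -> reconstructed image.
decoder = tf.keras.Sequential([
    tf.keras.layers.Input((8, 8, 16)),
    tf.keras.layers.Conv2DTranspose(128, 3, strides=2, padding="same", activation="relu"),
    tf.keras.layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(3, 3, padding="same", activation="sigmoid"),
])
```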
Training: the training architecture is shown below (green arrows indicate the error to be reduced, which is the target for updating the parameters):
To reduce the reconstruction loss, the encoder's best strategy is to drive its outputs ("features") to large magnitudes, which reduces the artifacts caused by the Gaussian noise. By increasing the magnitude of the Gaussian noise, the code is therefore eventually saturated to 0 or 1. The sigmoid function limits the encoder's output to values between 0 and 1. To make the network output more 0s and fewer 1s (allowing further compression), I added a penalty term, mean(code**2) * 0.01, on the generated binary code, after which the code tends to contain more zeros and fewer ones:

loss = tf.reduce_mean((y-x)**2) + tf.reduce_mean(binary_code**2) * 0.01
This structure becomes:
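In code, one training step over this structure might look like the following (a sketch: the optimizer, learning rate, and noise_std default are assumptions; encoder and decoder are the sketches from above):

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(x, noise_std=0.0):
    with tf.GradientTape() as tape:
        logits = encoder(x)
        # noise_std = 0 gives the plain structure shown here; a positive value
        # enables the Gaussian-noise trick introduced below.
        noisy = logits + tf.random.normal(tf.shape(logits), stddev=noise_std)
        binary_code = tf.sigmoid(noisy)
        y = decoder(binary_code)
        # Reconstruction error plus the sparsity penalty from the article.
        loss = tf.reduce_mean((y - x) ** 2) + tf.reduce_mean(binary_code ** 2) * 0.01
    variables = encoder.trainable_variables + decoder.trainable_variables
    optimizer.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss
```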
After training begins, the situation looks like this:
From left to right:
On the right, from top to bottom:
A closer look shows that the sigmoid output does not actually use 0 and 1; instead it stores information in various shades of gray, and the decoder recovers the image from those shades of gray. If we binarize the sigmoid output, the information is completely lost. So how can we force the neural network to represent information with 0 and 1 rather than with values in between?
The answer is to add noise!
If we add Gaussian noise before the sigmoid function, the encoder and decoder will discover that gray values are unreliable: only information encoded as 0 and 1 can resist the noise.
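To see numerically why this works: a pre-activation near zero yields a sigmoid output that the noise scrambles completely, while a strongly saturated one barely moves (a small sketch; the standard deviation of 15 matches the setting used below):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

noise = rng.normal(0, 15, size=10_000)
print(sigmoid(0.5 + noise).std())   # large: the output is a coin flip, the gray value is lost
print(sigmoid(50.0 + noise).std())  # tiny: the saturated bit survives the noise
```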
After adding Gaussian noise, the architecture becomes like this:
With the noise standard deviation set to 15, the training results are as follows:
You can see that the network has learned to use binary dithering to encode the image content; binarizing the sigmoid output now makes little difference.
Testing:
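At test time, the dithered sigmoid output can simply be thresholded to hard bits before decoding; per the training results above, this now costs almost nothing (a sketch, again assuming the encoder/decoder from above):

```python
import tensorflow as tf

def compress(x):
    """Image -> hard 0/1 code."""
    code = tf.sigmoid(encoder(x))           # soft values in (0, 1)
    return tf.cast(code > 0.5, tf.float32)  # binarize

def decompress(bits):
    """Hard 0/1 code -> reconstructed image."""
    return decoder(bits)
```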
References:
Image Compression Algorithm and JPEG Standard by Suman Kunwar: http://www.ijsrp.org/research-paper-1217/ijsrp-p7224.pdf
CNN structure based on VGG16: https://github.com/ry/tensorflow-vgg16/blob/master/vgg16.py
NPEG by Qin Yongliang: https://github.com/ctmakro/npeg
JPEG Image Compression Algorithm by Suman Kunwar: https://medium.com/@sumn2u/jpeg-image-compression-algorithm-979de35c808d
Image Compression with Neural Networks (Google Research Blog): https://research.googleblog.com/2016/09/image-compression-with-neural-networks.html