# Image Classification With TensorFlow for Beginners

## Image classification

The image classification problem is the task of assigning an input image one label from a fixed set of categories. This is one of the core problems in computer vision and it has a large variety of practical applications. Moreover, many other seemingly distinct computer vision tasks (such as object detection and segmentation) can be reduced to image classification.

Recent advances in deep learning have improved the quality of image processing considerably. One of those advances was the Convolutional Neural Network (CNN). CNNs use relatively little pre-processing compared to other image classification algorithms: **the network learns the filters that in traditional algorithms were hand-engineered**. This independence from prior knowledge and human effort in feature design was a huge boost for the field, and it suddenly made many image classification tasks solvable with sufficient accuracy.

## Convolutional Neural Network

A CNN consists of an input and an output layer, as well as multiple hidden layers. The hidden layers of a CNN typically consist of convolutional layers, pooling layers and fully connected layers.

### Convolutional layer

Convolutional layers apply a convolution operation to the input, passing the result to the next layer.

Let's see how the convolution operation works. I will show it on a 2D matrix; the operation extends easily to a 3D matrix.

During convolution we apply a **filter** (or **kernel**) to an input matrix in the following way:

- put the filter on top of a region of the input matrix
- multiply the cells that lie on top of each other
- sum the results of the multiplications

Let's look at an example:

In this example, the result 4 was obtained like this:

`1*1 + 0*0 + 0*1 + 1*0 + 1*1 + 0*0 + 1*1 + 1*0 + 1*1 = 4`

Then we slide the kernel to the right and down to convolve the whole input matrix with the given kernel (see the NumPy sketch below).
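
To make this concrete, here is a minimal NumPy sketch of the operation. The input and kernel below are my reconstruction of a classic example whose top-left output cell is the 4 computed above:

```
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over `image` and sum the elementwise products.
    (No kernel flipping, i.e. cross-correlation, which is what CNN
    'convolution' layers actually compute.)"""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
print(conv2d(image, kernel))  # top-left entry is 4.0, as in the example above
```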

Convolution with hand-designed kernels was used extensively in image processing before CNNs became popular. For example, one can use convolution to detect edges in an image, as sketched below; many more hand-designed kernels exist. The purpose of a convolution layer is to find appropriate kernels automatically, without human effort.
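
For instance, the following well-known 3×3 edge-detection kernel responds strongly wherever pixel intensity changes sharply. Here `gray_image` is a hypothetical 2D array of grayscale intensities, and `conv2d` is the helper defined above:

```
edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])
edges = conv2d(gray_image, edge_kernel)  # large magnitude where intensity changes sharply
```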

### Pooling layer

Pooling combines a region of the matrix into a single number. For example, **max pooling** takes the maximum value of each sub-matrix.

Another example is **average pooling**, which takes the average value of each sub-matrix. Both are sketched below.
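
A minimal NumPy sketch of both variants on non-overlapping 2×2 blocks (my own helper, for illustration):

```
import numpy as np

def pool2d(x, size=2, mode='max'):
    """Reduce each non-overlapping `size` x `size` block of `x` to one number."""
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]  # trim so dimensions divide evenly
    blocks = x.reshape(h // size, size, w // size, size).swapaxes(1, 2)
    return blocks.max(axis=(2, 3)) if mode == 'max' else blocks.mean(axis=(2, 3))

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 0],
              [1, 4, 3, 8]])
print(pool2d(x, 2, 'max'))   # [[6 4]
                             #  [7 9]]
print(pool2d(x, 2, 'mean'))  # [[3.75 2.25]
                             #  [3.5  5.  ]]
```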

### Fully connected

Fully connected layers connect every neuron in one layer to every neuron in the next layer. This is in principle the same as the traditional multi-layer perceptron, as the sketch below illustrates.
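
In matrix form, a fully connected layer is just an affine map z = Wx + b, optionally followed by an activation. A tiny NumPy sketch, where the sizes 4096 and 10 are illustrative (they happen to match the network built later):

```
import numpy as np

x = np.random.randn(4096)      # flattened input, e.g. an 8*8*64 feature volume
W = np.random.randn(10, 4096)  # one row of weights per output neuron
b = np.random.randn(10)
z = W @ x + b                  # every output is connected to every input
```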

### ReLU layer

**ReLU** is the abbreviation of **Rectified Linear Unit**. This layer applies the non-saturating activation function *f(x) = max(0, x)*. It increases the nonlinear properties of the decision function and of the overall network without negatively affecting the performance of the convolution layer.

Other functions are also used to increase nonlinearity, for example the saturating hyperbolic tangent *f(x) = tanh(x)* (or *f(x) = |tanh(x)|*) and the sigmoid function *f(x) = 1 / (1 + e^(-x))*. ReLU is often preferred to other functions, because it trains the neural network several times faster without a significant penalty to generalisation accuracy.
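
In NumPy these activations are one-liners, which makes the difference easy to see:

```
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5])
relu    = np.maximum(0, x)      # [0.  0.  0.  1.5] -- does not saturate for x > 0
tanh    = np.tanh(x)            # saturates at -1 and 1
sigmoid = 1 / (1 + np.exp(-x))  # saturates at 0 and 1
```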

If you would like to learn more about CNNs, I highly recommend taking the Convolutional Neural Networks course, which is taught by Andrew Ng.

## Image classification with TensorFlow

TensorFlow is a well-known open source library for high-performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), from desktops to clusters of servers and mobile devices. It was developed by researchers and engineers from the Google Brain team within Google's AI organization.

Now I am going to create and run a Convolutional Neural Network using TensorFlow.

### Load data and libraries

I first load the libraries which I'll use to accomplish this task:

```
import math
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.python.framework import ops  # used to reset the default graph between runs
from keras.datasets import cifar10           # CIFAR10 ships with Keras
%matplotlib inline
np.random.seed(1)
```

Let's now load the CIFAR10 dataset: 50,000 32×32 color training images, labeled over 10 categories, and 10,000 test images.

`(x_train, y_train), (x_test, y_test) = cifar10.load_data()`

The image at index 4 of the training data looks like this:

```
plt.imshow(x_train[4,:,:,:])
print('y =', y_train[4])
```

Let's check the shapes of the training and test set variables.

```
print('x_train.shape =', x_train.shape)
print('x_test.shape =', x_test.shape)
print('y_train.shape =', y_train.shape)
print('y_test.shape =', y_test.shape)
```

x_train.shape = (50000, 32, 32, 3)

x_test.shape = (10000, 32, 32, 3)

y_train.shape = (50000, 1)

y_test.shape = (10000, 1)

As we see, we have 50,000 training and 10,000 test images of size 32×32. All images are in RGB format, i.e. they have 3 channels.

The training and test labels are integers in the range [0, 9]. I need to convert them to **one-hot format** before using them: label 3, for example, becomes the vector [0, 0, 0, 1, 0, 0, 0, 0, 0, 0].

```
n_classes = 10
im_width, im_height, n_chan = x_train.shape[1:]

def to_one_hot_vector(x, n_classes):
    tf.reset_default_graph()
    with tf.Session() as sess:
        x1 = tf.one_hot(x, n_classes)       # shape (m, 1, n_classes)
        x2 = tf.contrib.layers.flatten(x1)  # shape (m, n_classes)
        return sess.run(x2)

y_train_onehot = to_one_hot_vector(y_train, n_classes)
y_test_onehot = to_one_hot_vector(y_test, n_classes)
print('y_train_onehot.shape =', y_train_onehot.shape)
print('y_test_onehot.shape =', y_test_onehot.shape)
```

y_train_onehot.shape = (50000, 10)

y_test_onehot.shape = (10000, 10)
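
As an aside, the same conversion is a one-liner with the Keras utilities, with no TensorFlow session needed (the result is equivalent):

```
from keras.utils import to_categorical

y_train_onehot = to_categorical(y_train, n_classes)  # shape (50000, 10)
y_test_onehot = to_categorical(y_test, n_classes)    # shape (10000, 10)
```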

### Create placeholders and initialize parameters

TensorFlow requires that you create placeholders for the input data that will be fed into the model when running the session.

```
def create_placeholders(n_H0, n_W0, n_C0, n_y):
    """
    Creates the placeholders for the tensorflow session.
    Arguments:
    n_H0 -- scalar, height of an input image
    n_W0 -- scalar, width of an input image
    n_C0 -- scalar, number of channels of the input
    n_y -- scalar, number of classes
    Returns:
    X -- placeholder for the data input, of shape [None, n_H0, n_W0, n_C0]
    Y -- placeholder for the input labels, of shape [None, n_y]
    """
    X = tf.placeholder(tf.float32, shape=(None, n_H0, n_W0, n_C0))
    Y = tf.placeholder(tf.float32, shape=(None, n_y))
    return X, Y
```
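
A quick sanity check of the shapes; the exact tensor names may differ depending on what else is in the graph:

```
X, Y = create_placeholders(32, 32, 3, 10)
print(X)  # e.g. Tensor("Placeholder:0", shape=(?, 32, 32, 3), dtype=float32)
print(Y)  # e.g. Tensor("Placeholder_1:0", shape=(?, 10), dtype=float32)
```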

I will initialize the weights W1 and W2 using *tf.contrib.layers.xavier_initializer(seed = 0)*. W1 has shape [4, 4, 3, 64]: 4×4 kernels over 3 input channels, producing 64 feature maps. W2 has shape [2, 2, 64, 64]: 2×2 kernels over those 64 feature maps, producing 64 more.

```
def initialize_parameters():
    """
    Initializes weight parameters to build a neural network with tensorflow.
    Returns:
    parameters -- a dictionary of tensors containing W1, W2
    """
    # set random seed for reproducibility
    tf.set_random_seed(1)
    init = tf.contrib.layers.xavier_initializer(seed=0)
    W1 = tf.get_variable("W1", [4, 4, 3, 64], initializer=init)
    W2 = tf.get_variable("W2", [2, 2, 64, 64], initializer=init)
    parameters = {"W1": W1,
                  "W2": W2}
    return parameters
```

### Create model

I will implement the *forward_propagation* function below to build the network. In detail, it performs the following steps:

- Conv2D: stride 1, padding 'SAME'
- ReLU
- Max pool: 3 by 3 window, 2 by 2 stride, padding 'SAME'
- Dropout
- Conv2D: stride 1, padding 'SAME'
- ReLU
- Max pool: 3 by 3 window, 2 by 2 stride, padding 'SAME'
- Dropout
- Flatten the previous output.
- FULLYCONNECTED (FC) layer: apply a fully connected layer without a non-linear activation function.

```
def forward_propagation(X, parameters, keep_prob):
    """
    Implements the forward propagation for the model:
    CONV2D -> RELU -> MAXPOOL -> DROPOUT -> CONV2D -> RELU -> MAXPOOL -> DROPOUT -> FLATTEN -> FULLYCONNECTED
    Arguments:
    X -- input dataset placeholder, of shape (number of examples, im_height, im_width, n_chan)
    parameters -- python dictionary containing your parameters "W1", "W2"
                  the shapes are given in initialize_parameters
    Returns:
    Z3 -- the output of the last LINEAR unit
    """
    # Retrieve the parameters from the dictionary "parameters"
    W1 = parameters['W1']
    W2 = parameters['W2']
    # CONV2D: stride of 1, padding 'SAME'
    Z1 = tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME')
    # RELU
    A1 = tf.nn.relu(Z1)
    # MAXPOOL: window 3x3, stride 2, padding 'SAME'
    P1 = tf.nn.max_pool(A1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')
    # DROPOUT: in tf.layers.dropout, `rate` is the fraction of units to DROP
    # (not to keep), so we pass 1 - keep_prob. Note it is only active when
    # training=True is passed to the call.
    P1 = tf.layers.dropout(P1, rate=1 - keep_prob)
    # CONV2D: filters W2, stride 1, padding 'SAME'
    Z2 = tf.nn.conv2d(P1, W2, strides=[1, 1, 1, 1], padding='SAME')
    # RELU
    A2 = tf.nn.relu(Z2)
    # MAXPOOL: window 3x3, stride 2, padding 'SAME'
    P2 = tf.nn.max_pool(A2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')
    # DROPOUT
    P2 = tf.layers.dropout(P2, rate=1 - keep_prob)
    # FLATTEN
    P2 = tf.contrib.layers.flatten(P2)
    # FULLY-CONNECTED without non-linear activation function;
    # n_classes (the global defined earlier) neurons in the output layer
    Z3 = tf.contrib.layers.fully_connected(P2, num_outputs=n_classes, activation_fn=None)
    return Z3
```
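
As a sanity check, here is how the shapes evolve for a batch of m CIFAR10 images; these follow from the 'SAME' padding and the stride-2 pooling:

```
# X    : (m, 32, 32, 3)   input batch
# Z1   : (m, 32, 32, 64)  conv 4x4x3x64, stride 1, 'SAME'
# P1   : (m, 16, 16, 64)  3x3 max pool, stride 2, 'SAME'
# Z2   : (m, 16, 16, 64)  conv 2x2x64x64, stride 1, 'SAME'
# P2   : (m, 8, 8, 64)    3x3 max pool, stride 2, 'SAME'
# flat : (m, 4096)        8 * 8 * 64 = 4096
# Z3   : (m, 10)          one logit per class
```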

### Compute cost

I need a function that computes the cost given the output of forward propagation and the "true" labels. The computed cost will be used to optimize the weights of the network.

```
def compute_cost(Z3, Y):
    """
    Computes the cost
    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit),
          of shape (number of examples, n_classes)
    Y -- "true" labels vector placeholder, same shape as Z3
    Returns:
    cost -- Tensor of the cost function
    """
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=Z3, labels=Y))
    return cost
```
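
For intuition, here is a minimal NumPy version of what this computes for a single example with three hypothetical classes:

```
import numpy as np

z = np.array([2.0, 1.0, 0.1])      # logits (the Z3 values for one example)
y = np.array([1.0, 0.0, 0.0])      # one-hot "true" label
p = np.exp(z) / np.sum(np.exp(z))  # softmax turns logits into probabilities
loss = -np.sum(y * np.log(p))      # cross-entropy, ~0.417 here
print(loss)
```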

### Mini-batch learning

I will train the network on batches randomly drawn from the training set (this is mini-batch learning; strictly online learning would use one sample at a time). That's why I implement a function which creates batches by randomly shuffling the training set and partitioning it.

```
def random_mini_batches(X, Y, mini_batch_size=64, seed=0):
    """
    Creates a list of random minibatches from (X, Y)
    Arguments:
    X -- input data, of shape (number of examples, im_height, im_width, n_chan)
    Y -- true "label" vector, of shape (number of examples, n_classes)
    mini_batch_size -- size of the mini-batches, integer
    Returns:
    mini_batches -- list of synchronous (mini_batch_X, mini_batch_Y)
    """
    # for reproducibility of the "random" minibatches
    np.random.seed(seed)
    # number of training examples
    m = X.shape[0]
    mini_batches = []
    # Step 1: Shuffle (X, Y)
    permutation = list(np.random.permutation(m))
    shuffled_X = X[permutation, :, :, :]
    shuffled_Y = Y[permutation, :]
    # Step 2: Partition (shuffled_X, shuffled_Y), minus the end case.
    # number of mini batches of size mini_batch_size in the partitioning
    num_complete_minibatches = math.floor(m / mini_batch_size)
    for k in range(0, num_complete_minibatches):
        mini_batch_X = shuffled_X[k * mini_batch_size : (k + 1) * mini_batch_size, :, :, :]
        mini_batch_Y = shuffled_Y[k * mini_batch_size : (k + 1) * mini_batch_size, :]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)
    # Handle the end case (last mini-batch < mini_batch_size)
    if m % mini_batch_size != 0:
        mini_batch_X = shuffled_X[num_complete_minibatches * mini_batch_size : m, :, :, :]
        mini_batch_Y = shuffled_Y[num_complete_minibatches * mini_batch_size : m, :]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)
    return mini_batches
```
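
For example, with the CIFAR10 training set and a batch size of 128 we expect 390 complete mini-batches plus one partial batch of the remaining 80 images:

```
batches = random_mini_batches(x_train, y_train_onehot, mini_batch_size=128, seed=0)
print(len(batches))          # 391
print(batches[0][0].shape)   # (128, 32, 32, 3)
print(batches[0][1].shape)   # (128, 10)
print(batches[-1][0].shape)  # (80, 32, 32, 3)
```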

### Model

Finally, I put all my helper functions together to create my model.

```
def model(X_train, Y_train, X_test, Y_test, learning_rate=0.001,
          num_epochs=101, minibatch_size=128, keep_probability=0.75, print_cost=True):
    """
    Implements a three-layer ConvNet in Tensorflow:
    CONV2D -> RELU -> MAXPOOL -> DROPOUT -> CONV2D -> RELU -> MAXPOOL -> DROPOUT -> FLATTEN -> FULLYCONNECTED
    Arguments:
    X_train -- training set, of shape (None, 32, 32, 3)
    Y_train -- training labels, of shape (None, n_y = 10)
    X_test -- test set, of shape (None, 32, 32, 3)
    Y_test -- test labels, of shape (None, n_y = 10)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    keep_probability -- probability of keeping a unit in dropout
    print_cost -- True to print the cost every 10 epochs
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    # to be able to rerun the model without overwriting tf variables
    ops.reset_default_graph()
    # to keep results consistent (tensorflow seed)
    tf.set_random_seed(1)
    seed = 3
    (m, n_H0, n_W0, n_C0) = X_train.shape
    n_y = Y_train.shape[1]
    # To keep track of the cost
    costs = []
    # Create Placeholders of the correct shape
    X, Y = create_placeholders(n_H0, n_W0, n_C0, n_y)
    # Initialize parameters
    parameters = initialize_parameters()
    # Build the forward propagation in the tensorflow graph
    Z3 = forward_propagation(X, parameters, keep_prob=keep_probability)
    # Cost function: add cost function to tensorflow graph
    cost = compute_cost(Z3, Y)
    # Backpropagation: define the tensorflow optimizer.
    # Use an AdamOptimizer that minimizes the cost.
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
    # Count the number of correct predictions (for accuracy calculation)
    predict_op = tf.argmax(Z3, 1)
    correct_prediction = tf.equal(predict_op, tf.argmax(Y, 1))
    total_correct_prediction = tf.reduce_sum(tf.cast(correct_prediction, "float"))
    # Initialize all the variables globally
    init = tf.global_variables_initializer()
    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:
        # Run the initialization
        sess.run(init)
        # Do the training loop
        for epoch in range(num_epochs):
            minibatch_cost = 0.
            # number of minibatches of size minibatch_size in the train set
            num_minibatches = int(m / minibatch_size)
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)
            for minibatch in minibatches:
                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch
                # Run the session to execute the optimizer and the cost;
                # the feed_dict should contain a minibatch for (X, Y).
                feed_dict = {X: minibatch_X, Y: minibatch_Y}
                _, temp_cost = sess.run([optimizer, cost], feed_dict)
                minibatch_cost += temp_cost / num_minibatches
            # Print the cost, train and test accuracy every 10th epoch
            if print_cost == True and epoch % 10 == 0:
                train_accuracy = calc_accuracy(X, Y, total_correct_prediction, X_train, Y_train)
                test_accuracy = calc_accuracy(X, Y, total_correct_prediction, X_test, Y_test)
                print("Cost after epoch %i: %f, train accuracy: %f, validation accuracy: %f" % (epoch, minibatch_cost, train_accuracy, test_accuracy))
            if print_cost == True and epoch % 1 == 0:
                costs.append(minibatch_cost)
        # plot the cost (one entry per epoch)
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('epochs')
        plt.title("Learning rate = " + str(learning_rate))
        plt.show()
        print("Train Accuracy:", calc_accuracy(X, Y, total_correct_prediction, X_train, Y_train))
        print("Test Accuracy:", calc_accuracy(X, Y, total_correct_prediction, X_test, Y_test))
        return parameters
```
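
One caveat: the model above calls a *calc_accuracy* helper that does not appear in the listing, so it must be defined before running the model. Here is a minimal sketch of what it needs to do, given the placeholders and the *total_correct_prediction* op defined above; evaluating in batches is my own choice, to avoid feeding all 50,000 images at once:

```
def calc_accuracy(X, Y, total_correct_prediction, X_data, Y_data, batch_size=500):
    """Fraction of correctly classified examples, evaluated batch by batch.
    Assumes it is called inside the open tf.Session in model()."""
    sess = tf.get_default_session()
    m = X_data.shape[0]
    correct = 0.0
    for start in range(0, m, batch_size):
        feed = {X: X_data[start:start + batch_size],
                Y: Y_data[start:start + batch_size]}
        correct += sess.run(total_correct_prediction, feed_dict=feed)
    return correct / m
```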

Now I run my model on the CIFAR10 dataset.

`parameters = model(x_train, y_train_onehot, x_test, y_test_onehot, num_epochs=51, keep_probability = 0.3)`

Cost after epoch 10: 1.055999, train accuracy: 0.644500, validation accuracy: 0.594900

Cost after epoch 20: 0.861374, train accuracy: 0.702080, validation accuracy: 0.611600

Cost after epoch 30: 0.749422, train accuracy: 0.772540, validation accuracy: 0.624900

Cost after epoch 40: 0.669292, train accuracy: 0.772300, validation accuracy: 0.612900

Cost after epoch 50: 0.605028, train accuracy: 0.803860, validation accuracy: 0.620600

Train Accuracy: 0.80386

Test Accuracy: 0.6206

## Summary

I got 62% accuracy on the test set. Note that classifying CIFAR10 by random guessing would give only ~10% accuracy, so the network clearly learned something. On the other hand, one can surely improve this result: the best classifiers for the CIFAR10 dataset attain more than 90% test set accuracy.

## Acknowledgements

That was a lot of work. I would like to thank Andrew Ng for teaching this course series in Deep Learning. I've learned most of the things in this article from him.