Huseyn Gasimov
Creator of this blog.
Jul 16, 2018 11 min read

Image Classification With Tensorflow for Beginners


Image classification

The image classification problem is the task of assigning an input image one label from a fixed set of categories. It is one of the core problems in Computer Vision and has a large variety of practical applications. Moreover, many other seemingly distinct Computer Vision tasks (such as object detection and segmentation) can be reduced to image classification.

Recent advances in deep learning have improved the quality of image processing tasks considerably. One of those advances is the Convolutional Neural Network (CNN). CNNs use relatively little pre-processing compared to other image classification algorithms: the network learns the filters that in traditional algorithms were hand-engineered. This independence from prior knowledge and human effort in feature design was a huge boost for the field, and it suddenly made many image classification tasks solvable with sufficient accuracy.

Convolutional Neural Network

A CNN consists of an input and an output layer, as well as multiple hidden layers. The hidden layers of a CNN typically consist of convolutional layers, pooling layers and fully connected layers.

Convolutional layer

Convolutional layers apply a convolution operation to the input, passing the result to the next layer.

Image as matrix

Let's see how the convolution operation works. I will show it on a 2D matrix; the operation can easily be extended to a 3D matrix.

During convolution we apply a filter (or kernel) to an input matrix the following way:

  1. put the filter on top of the input matrix
  2. multiply the cells which are on top of each other
  3. sum the results of the multiplications

Let's look at an example:

Example convolution

In this example, the result 4 was obtained like this:

1*1 + 0*0 + 0*1 + 1*0 + 1*1 + 0*0 + 1*1 + 1*0 + 1*1 = 4

Then we slide the kernel matrix to the right and down to convolve the whole input matrix with the given kernel.
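The sliding-window computation above can be sketched in a few lines of NumPy. The input and kernel values below are my own illustrative choice (the classic 5×5 input with a 3×3 kernel) and are not necessarily the ones shown in the figure, but the top-left result is the same 4 as in the example:

```python
import numpy as np

def convolve2d(image, kernel):
    """2D convolution with stride 1 and no padding, as in the example above."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # multiply the overlapping cells element-wise, then sum
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
print(convolve2d(image, kernel))  # top-left entry is 4
```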

Convolution with hand-designed kernels was used extensively in image processing before CNNs became popular. For example, one can use the convolution operation to detect edges in an image:

Edge detection with help of convolution

You can find more hand-designed kernels here. The purpose of a convolution layer is to find appropriate kernels automatically, without human effort.

Pooling layer

Pooling combines a region of the matrix into a single number. For example, max pooling takes the maximum value of each sub-matrix.

Max pooling

Another example is average pooling, which uses the average value from each sub-matrix.
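Both pooling variants can be sketched in NumPy. The 4×4 input and the 2×2 window below are my own illustrative choices:

```python
import numpy as np

def pool2d(x, size=2, mode='max'):
    """Non-overlapping pooling: reduce each size x size block to one number."""
    h, w = x.shape
    # group the matrix into (size x size) blocks, cropping any ragged edge
    blocks = x[:h - h % size, :w - w % size].reshape(h // size, size, w // size, size)
    if mode == 'max':
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

x = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
print(pool2d(x))                # max pooling
print(pool2d(x, mode='mean'))   # average pooling
```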

Fully connected

Fully connected layers connect every neuron in one layer to every neuron in the next layer. In principle this is the same as the traditional multi-layer perceptron.

Fully connected layer
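In matrix form, a fully connected layer is just y = Wx + b. A minimal sketch (the layer sizes and random weights here are my own choice):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)        # activations of 5 input neurons
W = rng.standard_normal((3, 5))   # each of 3 output neurons connects to all 5 inputs
b = np.zeros(3)                   # one bias per output neuron
y = W @ x + b                     # activations of the 3 output neurons
print(y.shape)                    # (3,)
```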

ReLU layer

ReLU is the abbreviation of Rectified Linear Unit. This layer applies the non-saturating activation function f(x) = max(0, x). It increases the nonlinear properties of the decision function and of the overall network without negatively affecting the performance of the convolution layer.

ReLU function

Other functions are also used to increase nonlinearity, for example the saturating hyperbolic tangent f(x) = tanh(x), f(x) = |tanh(x)|, and the sigmoid function f(x) = (1 + e^(-x))^(-1). ReLU is often preferred to other functions because it trains the neural network several times faster without a significant penalty to generalization accuracy.
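These activation functions are one-liners in NumPy; the sample inputs below are my own:

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5])
relu = np.maximum(0, x)           # f(x) = max(0, x)
tanh = np.tanh(x)                 # f(x) = tanh(x)
sigmoid = 1 / (1 + np.exp(-x))    # f(x) = (1 + e^-x)^-1
print(relu)                       # negative inputs are clipped to 0
```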

If you would like to learn more about CNNs, I highly recommend taking the Convolutional Neural Networks course, which is taught by Andrew Ng.

Image classification with Tensorflow

TensorFlow is a well-known open source library for high-performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), from desktops to clusters of servers and mobile devices. It was developed by researchers and engineers from the Google Brain team within Google's AI organization.

Now I am going to create and run a Convolutional Neural Network using Tensorflow.

Load data and libraries

I first load the libraries, which I'll use to accomplish this task:

import math
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.python.framework import ops
from keras.datasets import cifar10

%matplotlib inline

Let's now load the CIFAR10 dataset: 50,000 32×32 color training images, labeled over 10 categories, plus 10,000 test images.

CIFAR10 dataset

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

The image at index 4 in the training data looks like this:

print('y =', y_train[4])
plt.imshow(x_train[4])

Image from CIFAR10 dataset

Let's check the shape of the training and test set variables.

print('x_train.shape =', x_train.shape)
print('x_test.shape =', x_test.shape)
print('y_train.shape =', y_train.shape)
print('y_test.shape =', y_test.shape)
x_train.shape = (50000, 32, 32, 3)
x_test.shape = (10000, 32, 32, 3)
y_train.shape = (50000, 1)
y_test.shape = (10000, 1)

As we see, we have 50,000 training and 10,000 test images of size 32×32. All images are in RGB format, i.e. have 3 channels.

The training and test labels are numbers in the range [0, 9]. I need to convert them to one-hot format before using them.

n_classes = 10
im_width, im_height, n_chan = x_train.shape[1:]

def to_one_hot_vector(x, n_classes):
  with tf.Session() as sess:
    x1 = tf.one_hot(x, n_classes)          # shape (m, 1, n_classes)
    x2 = tf.contrib.layers.flatten(x1)     # shape (m, n_classes)
    return sess.run(x2)

y_train_onehot = to_one_hot_vector(y_train, n_classes)
y_test_onehot = to_one_hot_vector(y_test, n_classes)
print('y_train_onehot.shape =', y_train_onehot.shape)
print('y_test_onehot.shape =', y_test_onehot.shape)
y_train_onehot.shape = (50000, 10)
y_test_onehot.shape = (10000, 10)
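Spinning up a TensorFlow session just for this conversion is rather heavyweight. A pure-NumPy alternative (the helper name to_one_hot_np is mine) produces the same result:

```python
import numpy as np

def to_one_hot_np(y, n_classes):
    # row i of the identity matrix is the one-hot vector for class i,
    # so indexing eye(n_classes) with the labels does the whole conversion
    return np.eye(n_classes)[y.reshape(-1)]

y = np.array([[3], [0], [9]])         # labels shaped like CIFAR10's (m, 1)
print(to_one_hot_np(y, 10).shape)     # (3, 10)
```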

Create placeholders and initialize parameters

TensorFlow requires that you create placeholders for the input data that will be fed into the model when running the session.

def create_placeholders(n_H0, n_W0, n_C0, n_y):
    """
    Creates the placeholders for the tensorflow session.
    Arguments:
    n_H0 -- scalar, height of an input image
    n_W0 -- scalar, width of an input image
    n_C0 -- scalar, number of channels of the input
    n_y -- scalar, number of classes
    Returns:
    X -- placeholder for the data input, of shape [None, n_H0, n_W0, n_C0]
    Y -- placeholder for the input labels, of shape [None, n_y]
    """
    X = tf.placeholder(tf.float32, shape=(None, n_H0, n_W0, n_C0))
    Y = tf.placeholder(tf.float32, shape=(None, n_y))
    return X, Y

I will initialize weights W1 and W2 using tf.contrib.layers.xavier_initializer(seed = 0).

def initialize_parameters():
    """
    Initializes weight parameters to build a neural network with tensorflow.
    Returns:
    parameters -- a dictionary of tensors containing W1, W2
    """
    # set random seed for reproducibility of results
    init = tf.contrib.layers.xavier_initializer(seed = 0)
    W1 = tf.get_variable("W1", [4, 4, 3, 64], initializer = init)
    W2 = tf.get_variable("W2", [2, 2, 64, 64], initializer = init)
    parameters = {"W1": W1,
                  "W2": W2}
    return parameters

Create model

I will implement the forward_propagation function below to build this network:


In detail, I will use the following parameters for all the steps:

  • Conv2D: stride 1, padding is “SAME”
  • ReLU
  • Max pool: use a 3 by 3 window and a 2 by 2 stride, padding is “SAME”
  • Dropout
  • Conv2D: stride 1, padding is “SAME”
  • ReLU
  • Max pool: use a 3 by 3 window and a 2 by 2 stride, padding is “SAME”
  • Dropout
  • Flatten the previous output.
  • FULLYCONNECTED (FC) layer: apply a fully connected layer without a non-linear activation function.
def forward_propagation(X, parameters, keep_prob):
    """
    Implements the forward propagation for the model.
    Arguments:
    X -- input dataset placeholder, of shape (number of examples, height, width, channels)
    parameters -- python dictionary containing your parameters "W1", "W2"
                  the shapes are given in initialize_parameters
    Returns:
    Z3 -- the output of the last LINEAR unit
    """
    # Retrieve the parameters from the dictionary "parameters"
    W1 = parameters['W1']
    W2 = parameters['W2']
    # CONV2D: stride of 1, padding 'SAME'
    Z1 = tf.nn.conv2d(X, W1, strides = [1,1,1,1], padding = 'SAME')
    # RELU
    A1 = tf.nn.relu(Z1)
    # MAXPOOL: window 3x3, stride 2, padding 'SAME'
    P1 = tf.nn.max_pool(A1, ksize = [1,3,3,1], strides = [1,2,2,1], padding = 'SAME')
    # DROPOUT: note that `rate` is the fraction of units to drop
    P1 = tf.layers.dropout(P1, rate=keep_prob)
    # CONV2D: filters W2, stride 1, padding 'SAME'
    Z2 = tf.nn.conv2d(P1, W2, strides = [1,1,1,1], padding = 'SAME')
    # RELU
    A2 = tf.nn.relu(Z2)
    # MAXPOOL: window 3x3, stride 2, padding 'SAME'
    P2 = tf.nn.max_pool(A2, ksize = [1,3,3,1], strides = [1,2,2,1], padding = 'SAME')
    P2 = tf.layers.dropout(P2, rate=keep_prob)
    # FLATTEN
    P2 = tf.contrib.layers.flatten(P2)
    # FULLY-CONNECTED without non-linear activation function,
    # n_classes (= 10) neurons in the output layer
    Z3 = tf.contrib.layers.fully_connected(P2, num_outputs = n_classes, activation_fn = None)
    return Z3

Compute cost

I need a function which computes cost given the output of forward propagation and โ€œtrueโ€ labels. The computed cost will be used to optimize the weights in the network.

def compute_cost(Z3, Y):
    """
    Computes the cost.
    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (number of examples, n_classes)
    Y -- "true" labels vector placeholder, same shape as Z3
    Returns:
    cost -- Tensor of the cost function
    """
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits = Z3, labels = Y))
    return cost
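What this TF op computes can be mirrored in plain NumPy: a numerically stable log-softmax followed by the mean cross-entropy over the batch. The function name and the sample values below are mine:

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # subtract the row max for numerical stability before exponentiating
    z = logits - logits.max(axis=1, keepdims=True)
    # log-softmax: z - log(sum(exp(z)))
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # mean cross-entropy over the batch
    return -np.mean((labels * log_probs).sum(axis=1))

logits = np.array([[2.0, 2.0], [5.0, 0.0]])
labels = np.array([[1.0, 0.0], [1.0, 0.0]])
print(softmax_cross_entropy(logits, labels))
```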

Mini-batch learning

I will train the network with mini-batch gradient descent, i.e. on randomly chosen batches from the training set. That's why I implement a function which creates batches by randomly choosing samples from the training set.

def random_mini_batches(X, Y, mini_batch_size = 64, seed = 0):
    """
    Creates a list of random minibatches from (X, Y).
    Arguments:
    X -- input data, of shape (number of examples, height, width, channels)
    Y -- true "label" vector in one-hot format, of shape (number of examples, n_classes)
    mini_batch_size -- size of the mini-batches, integer
    Returns:
    mini_batches -- list of synchronous (mini_batch_X, mini_batch_Y)
    """
    # seed the generator to make the "random" minibatches reproducible
    np.random.seed(seed)
    # number of training examples
    m = X.shape[0]
    mini_batches = []
    # Step 1: Shuffle (X, Y)
    permutation = list(np.random.permutation(m))
    shuffled_X = X[permutation, :, :, :]
    shuffled_Y = Y[permutation, :]
    # Step 2: Partition (shuffled_X, shuffled_Y). Minus the end case.
    # number of mini batches of size mini_batch_size in your partitioning
    num_complete_minibatches = math.floor(m/mini_batch_size)
    for k in range(0, num_complete_minibatches):
        mini_batch_X = shuffled_X[k*mini_batch_size : (k+1)*mini_batch_size, :, :, :]
        mini_batch_Y = shuffled_Y[k*mini_batch_size : (k+1)*mini_batch_size, :]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)
    # Handling the end case (last mini-batch < mini_batch_size)
    if m % mini_batch_size != 0:
        mini_batch_X = shuffled_X[num_complete_minibatches*mini_batch_size : m, :, :, :]
        mini_batch_Y = shuffled_Y[num_complete_minibatches*mini_batch_size : m, :]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)
    return mini_batches


Finally, I put all my helper functions together to create my model.
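One piece is still needed: the model below calls a calc_accuracy helper. A plausible implementation, which evaluates inside the open session in chunks so that 50,000 images never have to pass through the graph at once (the chunk size of 512 is my own choice), could look like this:

```python
def calc_accuracy(X, Y, total_correct_prediction, x_data, y_data, batch_size=512):
    """Fraction of correctly classified samples, computed chunk by chunk.

    X, Y -- the model's placeholders
    total_correct_prediction -- tensor counting correct predictions in a feed
    x_data, y_data -- numpy arrays with images and one-hot labels
    """
    m = x_data.shape[0]
    correct = 0.0
    for start in range(0, m, batch_size):
        feed_dict = {X: x_data[start:start + batch_size],
                     Y: y_data[start:start + batch_size]}
        # eval() runs in whatever session is currently the default one
        correct += total_correct_prediction.eval(feed_dict=feed_dict)
    return correct / m
```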

def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.001,
          num_epochs = 101, minibatch_size = 128, keep_probability = 0.75, print_cost = True):
    """
    Implements a three-layer ConvNet in Tensorflow.
    Arguments:
    X_train -- training set images, of shape (None, 32, 32, 3)
    Y_train -- training set labels, of shape (None, n_y = 10)
    X_test -- test set images, of shape (None, 32, 32, 3)
    Y_test -- test set labels, of shape (None, n_y = 10)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 10 epochs
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    # to be able to rerun the model without overwriting tf variables
    ops.reset_default_graph()
    # to keep results consistent (tensorflow seed)
    tf.set_random_seed(1)
    seed = 3
    (m, n_H0, n_W0, n_C0) = X_train.shape
    n_y = Y_train.shape[1]
    # To keep track of the cost
    costs = []
    # Create Placeholders of the correct shape
    X, Y = create_placeholders(n_H0, n_W0, n_C0, n_y)
    # Initialize parameters
    parameters = initialize_parameters()
    # Build the forward propagation in the tensorflow graph
    Z3 = forward_propagation(X, parameters, keep_prob = keep_probability)
    # Cost function: Add cost function to tensorflow graph
    cost = compute_cost(Z3, Y)
    # Backpropagation: Define the tensorflow optimizer.
    # Use an AdamOptimizer that minimizes the cost.
    optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)
    # Calculate number of correct predictions (for accuracy calculation)
    predict_op = tf.argmax(Z3, 1)
    correct_prediction = tf.equal(predict_op, tf.argmax(Y, 1))
    total_correct_prediction = tf.reduce_sum(tf.cast(correct_prediction, "float"))
    # Initialize all the variables globally
    init = tf.global_variables_initializer()
    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:
        # Run the initialization
        sess.run(init)
        # Do the training loop
        for epoch in range(num_epochs):
            minibatch_cost = 0.
            # number of minibatches of size minibatch_size in the train set
            num_minibatches = int(m / minibatch_size)
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)
            for minibatch in minibatches:
                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch
                # Run the session to execute the optimizer and the cost,
                #   the feed dict should contain a minibatch for (X,Y).
                feed_dict = {X: minibatch_X, Y: minibatch_Y}
                _ , temp_cost = sess.run([optimizer, cost], feed_dict=feed_dict)
                minibatch_cost += temp_cost / num_minibatches
            # Print the cost, train and test accuracy every 10th epoch
            if print_cost == True and epoch % 10 == 0:
                train_accuracy = calc_accuracy(X, Y, total_correct_prediction, X_train, Y_train)
                test_accuracy = calc_accuracy(X, Y, total_correct_prediction, X_test, Y_test)
                print ("Cost after epoch %i: %f, train accuracy: %f, validation accuracy: %f" % (epoch, minibatch_cost, train_accuracy, test_accuracy))
            if print_cost == True and epoch % 1 == 0:
                costs.append(minibatch_cost)
        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('epochs')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()
        print("Train Accuracy:", calc_accuracy(X, Y, total_correct_prediction, X_train, Y_train))
        print("Test Accuracy:", calc_accuracy(X, Y, total_correct_prediction, X_test, Y_test))
        return parameters

Now I run my model on the CIFAR10 dataset.

parameters = model(x_train, y_train_onehot, x_test, y_test_onehot, num_epochs=51, keep_probability = 0.3)
Cost after epoch 0: 3.290655, train accuracy: 0.481740, validation accuracy: 0.469100
Cost after epoch 10: 1.055999, train accuracy: 0.644500, validation accuracy: 0.594900
Cost after epoch 20: 0.861374, train accuracy: 0.702080, validation accuracy: 0.611600
Cost after epoch 30: 0.749422, train accuracy: 0.772540, validation accuracy: 0.624900
Cost after epoch 40: 0.669292, train accuracy: 0.772300, validation accuracy: 0.612900
Cost after epoch 50: 0.605028, train accuracy: 0.803860, validation accuracy: 0.620600

Costs over time

Train Accuracy: 0.80386
Test Accuracy: 0.6206


I got 62% test set accuracy. Note that classifying CIFAR10 with a random guess would give us ~10% accuracy. On the other hand, one can surely improve this result: the best classifiers for the CIFAR10 dataset attain more than 90% test set accuracy.


That was a lot of work. I would like to thank Andrew Ng for teaching this course series in Deep Learning. I've learned most of the things in this article from him.