A Convolutional Neural Network Tutorial in Keras and TensorFlow 2

For Computer Vision and Object Detection problems, Convolutional Neural Networks provide exceptional classification accuracy

In some cases, CNNs have proven more accurate than humans at image classification, while requiring less pre-processing than classical machine learning approaches

CNNs have also proven useful in other domains, such as recommender systems and natural language processing

What is a Convolutional Neural Network?

A Convolutional Neural Network, often abbreviated to CNN or ConvNet, is a type of artificial neural network used to solve supervised machine learning problems.

Supervised machine learning is commonly divided into two subfields. The first is regression, where the model's output is continuous. The second is classification, where the output is discrete and categorically defined, for example, a cat or a dog.

In this article we will unpack what a CNN is, look at what it does and where it is applied in the real world, and finally walk through a practical example of how to implement a world-class CNN using TensorFlow 2, which ships with Keras as its default high-level API.

What on earth is a convolution?

To understand what a Convolutional Neural Network is, we first need to understand a convolution. In this context, a convolution is essentially a small sliding window, also called a filter or kernel, that is usually square (for example 3 x 3), although it does not have to be.

This sliding filter moves across the input image looking for activations, that is, regions that respond strongly to the features the filter encodes. The image below shows a simplified version of what a convolutional layer does when you provide an image.

The sliding window moves along the x-axis until the end and then drops down on the y-axis to cover the entire image

The sliding window acts as a filter on the image, responding to the pixels or features it considers relevant. What counts as relevant is encoded in the filter's weights, which are learned from the training data.

A single convolutional layer does this over and over with different filters until it has a stack of feature maps as its output. Below are the original input image and the outputs of some of the filters.
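
To make this concrete, below is a minimal NumPy sketch of a single 3 x 3 filter sliding over a grayscale image. (Strictly speaking this computes cross-correlation, which is what deep learning libraries implement under the name convolution; the filter values here are purely illustrative.)

import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image and record the response at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    output = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(output.shape[0]):          # drop down the y-axis
        for x in range(output.shape[1]):      # slide along the x-axis
            window = image[y:y + kh, x:x + kw]
            output[y, x] = np.sum(window * kernel)
    return output

image = np.random.rand(28, 28)                # a dummy 28 x 28 grayscale image
vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]])        # a simple vertical-edge filter
print(convolve2d(image, vertical_edge).shape) # (26, 26)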

Pooling to prevent overfitting

Another key component of convolutional neural network architecture is a pooling layer. This layer typically sits between two sequential convolutional layers.

A pooling layer is responsible for dimensionality reduction, which ultimately helps prevent overfitting. By reducing the number of computations and parameters in the network, it allows the network to scale better while also providing regularization.

Regularization helps the network generalize, which ultimately improves its performance on unseen data.

The most common pooling layer types are Max Pooling and Average Pooling. In the practical CNN example later in the article, we will look at how the Max Pooling layer is used. Max pooling is by far the most common choice, as it tends to produce better results in practice.

The max pooling calculation slides a small window over the input and keeps only the maximum value in each window. The stride parameter controls how far the window moves each step, so a 2 x 2 window with a stride of 2 halves the W and H of a W x H x D volume while leaving the depth D unchanged.
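
As a rough illustration of that calculation, here is max pooling in plain NumPy with a 2 x 2 window and a stride of 2 (a sketch for intuition only; Keras provides this as a ready-made layer):

import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Keep only the maximum value inside each size x size window."""
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    output = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            window = feature_map[y * stride:y * stride + size,
                                 x * stride:x * stride + size]
            output[y, x] = window.max()
    return output

feature_map = np.random.rand(26, 26)
print(max_pool(feature_map).shape)  # (13, 13): width and height are halved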

Dropout for smaller networks

While pooling helps to avoid overfitting by reducing the dimensionality of the data flowing through the network, dropout extends regularization further.

Dropout works particularly well with fully connected layers (which we will discuss next). During training, dropout randomly drops units in the hidden layers, which adds noise and prevents units from co-adapting.

It is surprisingly effective on smaller neural networks and is not limited to CNNs. For larger networks, Batch Normalization appears to be more popular, and personally I have achieved far better results with it.
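
As a quick illustration of the behaviour, here is a Keras Dropout layer applied to a small tensor. During training roughly half the values are zeroed and the survivors are scaled up by 1 / (1 - rate) to compensate; at inference the layer does nothing:

import tensorflow as tf

dropout = tf.keras.layers.Dropout(0.5)
x = tf.ones((1, 10))
print(dropout(x, training=True))   # roughly half zeros, the rest scaled to 2.0
print(dropout(x, training=False))  # a no-op at inference: all values stay 1.0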

Fully Connected Layer

A central part of a Convolutional Neural Network is its fully connected (FC) layers, which sit near the end of the network. As in most neural networks, this means every activated output neuron is connected to every input of the next layer.

The fully connected layers in a CNN differ little from those in a regular network, but bear in mind when designing your architecture that a fully connected layer expects a flat, one-dimensional input, so the feature maps from the previous layer must be flattened first.
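
A minimal sketch of this with toy shapes: the Flatten layer turns the 3D feature maps from the convolutional layer into the 1D vector the Dense layer expects:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu',
                           input_shape=(28, 28, 1)),  # output: (26, 26, 32)
    tf.keras.layers.Flatten(),                        # output: (21632,)
    tf.keras.layers.Dense(10, activation='softmax'),  # output: (10,)
])
print(model.output_shape)  # (None, 10)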

What can Convolutional Neural Networks Do?

As with many machine learning problems, the solution is often to break the overall problem into smaller subproblems. When identifying images or objects, a great approach is to look for recurring pixel arrangements or patterns (features).

But image recognition or object classifications are not the only uses for CNNs. They have proven useful in many general classification problems.

By using small regions or filters with shared weights, a Convolutional Neural Network scales far better than a regular fully connected network, which makes it a great starting point for many classification problems

With TensorFlow and Keras, it has never been easier to design a very accurate ConvNet for either binary or multi-class classification problems.
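
Concretely, the two setups usually differ only in the output layer and loss function. A minimal sketch of the two common output heads (the rest of the network is assumed identical):

import tensorflow as tf

# Multi-class head: one unit per class with softmax,
# paired with a categorical cross-entropy loss
multi_class_head = tf.keras.layers.Dense(10, activation='softmax')

# Binary head: a single unit with sigmoid,
# paired with a binary cross-entropy loss
binary_head = tf.keras.layers.Dense(1, activation='sigmoid')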

Building a convolutional neural network using Python, TensorFlow 2, and Keras

Now that we know what Convolutional Neural Networks are and what they can do, it's time to start building our own.

For this tutorial, we will use the recently released TensorFlow 2 API, which integrates Keras natively into the TensorFlow library.

At the time of writing, the TensorFlow 2.0 library is still in alpha release, which means you will need to install it by running the following command:

pip install tensorflow==2.0.0-alpha0

For this tutorial, we will be using the famous MNIST dataset, a well-known set of handwritten digits. With only a few lines of code, we can achieve an accuracy of over 99%.

Let's get started designing our first Convolutional Neural Network. First, make sure you import the necessary dependencies now that TensorFlow 2 is installed.

import tensorflow as tf 

Next we download our dataset, split into training and test sets. The training data is what we will use to train our CNN; the test set is used to measure our accuracy.

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
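
One detail worth spelling out: the images load as 28 x 28 integer arrays, but the first convolutional layer below expects an explicit channels dimension, and scaled inputs train better. This standard preprocessing step reshapes and normalizes the data, and defines the input_shape and num_classes variables used in the rest of the code:

# Add a channels dimension and scale pixel values from [0, 255] to [0, 1]
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0

input_shape = (28, 28, 1)  # height x width x channels for the first layer
num_classes = 10           # the digits 0 through 9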

In order to provide our CNN with the correct classification targets, we convert our class vectors into binary class matrices (one-hot encoding):

y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)
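
For example, the label 3 becomes the one-hot vector [0, 0, 0, 1, 0, 0, 0, 0, 0, 0], which lines up with the 10-unit softmax output layer we add below.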

Now we finally get to the fun part: creating our Convolutional Neural Network model using the Keras API.

# We use the Sequential model in Keras, which covers the vast majority of use cases
model = tf.keras.Sequential()

# Our first convolutional layer: 32 filters with a 3 x 3 kernel
model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))

# A second convolutional layer with 64 filters
model.add(tf.keras.layers.Conv2D(64, (3, 3), activation='relu'))

# We add our max pooling layer
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))

# A dropout layer to regularize the convolutional features
model.add(tf.keras.layers.Dropout(0.25))

# We flatten the 3D feature maps into a 1D vector for the fully connected layers
model.add(tf.keras.layers.Flatten())

# A fully connected layer
model.add(tf.keras.layers.Dense(128, activation='relu'))

# Another dropout layer with a higher dropout rate
model.add(tf.keras.layers.Dropout(0.5))

# The output layer uses softmax activation for the 10 classes
model.add(tf.keras.layers.Dense(num_classes, activation='softmax'))
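
Before training, it is worth printing the shapes flowing through the network; if a layer receives a shape it does not expect, this is where you will spot it:

# Print each layer's output shape and parameter count
model.summary()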

Next we compile our model, specifying a loss function and an optimization function. Detailing these two choices is beyond the scope of this article, but they are worth studying further.

model.compile(loss=tf.keras.losses.categorical_crossentropy,
              optimizer=tf.keras.optimizers.Adadelta(),
              metrics=['accuracy'])

Next we train our model on the training set, using the test set as validation data:

model.fit(x_train, y_train,
          batch_size=128,
          epochs=12,
          verbose=1,
          validation_data=(x_test, y_test))

This is the output I see from training. As you can see, the training accuracy and validation accuracy stay close to each other, which suggests we are neither overfitting nor underfitting.

Train on 60000 samples, validate on 10000 samples
Epoch 1/12
60000/60000 [==============================] - 5s 88us/sample - loss: 0.2183 - acc: 0.9439 - val_loss: 0.1160 - val_acc: 0.9655
Epoch 2/12
60000/60000 [==============================] - 5s 83us/sample - loss: 0.0565 - acc: 0.9827 - val_loss: 0.0597 - val_acc: 0.9822
Epoch 3/12
60000/60000 [==============================] - 5s 84us/sample - loss: 0.0391 - acc: 0.9880 - val_loss: 0.0419 - val_acc: 0.9865
Epoch 4/12
60000/60000 [==============================] - 5s 81us/sample - loss: 0.0303 - acc: 0.9905 - val_loss: 0.0411 - val_acc: 0.9872
Epoch 5/12
60000/60000 [==============================] - 5s 78us/sample - loss: 0.0241 - acc: 0.9923 - val_loss: 0.0335 - val_acc: 0.9892
Epoch 6/12
60000/60000 [==============================] - 5s 78us/sample - loss: 0.0206 - acc: 0.9936 - val_loss: 0.0448 - val_acc: 0.9870
Epoch 7/12
60000/60000 [==============================] - 5s 78us/sample - loss: 0.0163 - acc: 0.9946 - val_loss: 0.0411 - val_acc: 0.9886
Epoch 8/12
60000/60000 [==============================] - 5s 78us/sample - loss: 0.0134 - acc: 0.9957 - val_loss: 0.0473 - val_acc: 0.9881
Epoch 9/12
60000/60000 [==============================] - 5s 79us/sample - loss: 0.0115 - acc: 0.9961 - val_loss: 0.0396 - val_acc: 0.9891
Epoch 10/12
60000/60000 [==============================] - 5s 79us/sample - loss: 0.0103 - acc: 0.9965 - val_loss: 0.0409 - val_acc: 0.9888
Epoch 11/12
60000/60000 [==============================] - 5s 78us/sample - loss: 0.0082 - acc: 0.9974 - val_loss: 0.0407 - val_acc: 0.9888
Epoch 12/12
60000/60000 [==============================] - 5s 79us/sample - loss: 0.0080 - acc: 0.9973 - val_loss: 0.0405 - val_acc: 0.9902
<tensorflow.python.keras.callbacks.History at 0x7fe474f92358>

As a final step, we evaluate the model against the test set to measure its accuracy on data it was never trained on:

score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Here is the output I see when I run the above command

Test loss: 0.03787359103212468
Test accuracy: 0.9923
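
With the trained model you can also classify individual images. A minimal sketch using the first test image:

import numpy as np

# model.predict returns one row of 10 class probabilities per image
probabilities = model.predict(x_test[:1])
print('Predicted digit:', np.argmax(probabilities))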

Next Steps

Now you have an example of an exceptional image classification solution using a Convolutional Neural Network.

See if you can tune the hyperparameters to make it even more accurate.

Try running the same neural network on a different dataset.
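
For example, Fashion-MNIST ships in the same 28 x 28, 10-class format, so only the load call needs to change and the rest of this article's code should run unmodified:

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()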

Conclusion

In this article we discovered the components that make up a Convolutional Neural Network and detailed the inner workings of the various layers, regularization techniques, and when CNNs are a great choice.

Finally, we looked at how to use TensorFlow and Keras to build an exceptional digit recognizer.

Please let me know if this was helpful by leaving me a comment. Also, reach out if you have questions or run into issues.

References

Convolution

A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction

CS231n Convolutional Neural Networks for Visual Recognition

Keras API Documentation

Dropout: A Simple Way to Prevent Neural Networks from Overfitting