Transfer Learning – Exceptional accuracy with small data sets

Wouldn’t it be great if you could have a marketplace where you can shop for all the brilliant minds of the past and download all their learning in an instant?

In some ways, that is precisely what Transfer Learning is. Many believe that Transfer Learning is the next revolution in Machine Learning. Since the success of Deep Learning implementations, researchers have sought ways to reduce both the training time and the amount of data required to train state-of-the-art models.

Transfer Learning is essentially the practice of taking a pre-trained neural network (model) and repurposing it for a new target task, without losing the feature-extraction ability the model learned on its original task, so that it can classify the new targets accurately.

Let's simplify:

Transfer Learning allows you to take a state-of-the-art model that would typically take weeks, if not months, to train on your own and, with a few lines of code, apply it to your domain-specific problem.

Here’s the best part: it works well even if you have a small data set, and it can significantly shorten the architecture-engineering phase.

How does it work?

Transfer Learning is achieved by popping (removing) the output layer of a neural network and repurposing the network for a new classification domain. In practice (depending on the model's architecture), it may be necessary to remove an additional layer directly before the output layer as well.
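As a minimal sketch of this "popping" step using tf.keras (TensorFlow 2.x): the Keras Applications API exposes `include_top=False`, which loads a pre-trained model with its original classification head removed.

```python
import tensorflow as tf

# Load InceptionV3 with its ImageNet output layer "popped" off.
base_model = tf.keras.applications.InceptionV3(
    include_top=False,          # drop the original classification head
    weights="imagenet",         # keep the learned feature-extraction weights
    input_shape=(299, 299, 3),  # InceptionV3's standard input size
    pooling="avg",              # global average pooling where the head used to be
)

# What remains outputs a feature vector, not class probabilities.
print(base_model.output_shape)  # (None, 2048)
```

The remaining network acts as a general-purpose feature extractor: each image is mapped to a 2048-dimensional vector onto which a new output layer can be attached.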

The reason is that the last layers of a neural network are the most specialized to the original classification labels and have lost most of their ability to generalize to other tasks.

As a final step, the network is trained on the new dataset to learn the new classification. However, this training is much faster than retraining the entire model, because only the final two layers are updated.

A practical example of Transfer Learning

Let’s consider the neural network below that represents a binary classification for whether an image is a cat (a1) or a dog (a2).

Pre-Trained Neural Network

Let’s choose a pre-trained model: for argument’s sake, we decide on the InceptionV3 image classification model (created by Google and trained on the ImageNet dataset).

But our problem is not to detect whether an image is a cat or a dog. We have a multi-class classification problem: we want to detect whether an image is a lion, a tiger, or a bear.

In that case, we pop the last two layers off the neural network and retrain it on a small sample of images. This is quick, because only the final two layers are trained. The resulting neural network looks like this.
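The steps above can be sketched in tf.keras as follows. The class names (lion/tiger/bear) come from the example; the training data variables are placeholders you would supply from your own dataset.

```python
import tensorflow as tf

# Pre-trained feature extractor with the original head removed.
base = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet",
    input_shape=(299, 299, 3), pooling="avg",
)
base.trainable = False  # freeze the pre-trained layers

# Attach two fresh layers: a new penultimate layer and a 3-class output.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(128, activation="relu"),   # new penultimate layer
    tf.keras.layers.Dense(3, activation="softmax"),  # lion / tiger / bear
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Only the two new layers are trained, so this fits quickly even on a
# small sample of images (train_images / train_labels are placeholders):
# model.fit(train_images, train_labels, epochs=5)
```

Because the base is frozen, the only trainable parameters are the weights and biases of the two new layers, which is why retraining takes minutes instead of weeks.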

Transfer Learning Neural Network

The result is state-of-the-art accuracy with a small data set and very little training time. By comparison, InceptionV3 and ResNet models typically take weeks to train from scratch on the ImageNet dataset.

Free Transfer Learning Code Workbook

Tutorial and running example of Transfer Learning using Python + Tensorflow 2.0


When should I use it?

Transfer Learning works well when there are models trained on a large dataset related to your specific problem domain. Pre-trained image and language models have proven especially useful, with new problem domains appearing every day.

Use transfer learning when you have a small data set or limited computational resources. I always encourage experimenting with both small and large datasets, as I have achieved exceptional results in both scenarios.

MachineisLearning Expert Tip

When using Deep Learning, the feature-engineering stage is significantly reduced; however, engineering your model architecture can prove just as cumbersome. A quick way to get an idea of what works well is to apply transfer learning with several model architectures and see which one performs best.


Transfer learning is a great technique for getting state-of-the-art results with relatively little work: by leveraging pre-trained models (with some caveats), you avoid retraining them from scratch, and evaluating several pre-trained models can even speed up the model-architecture phase.