You look like my best friend

I Lucas
7 min read · Mar 20, 2021

Convolutional Neural Networks used for Dog Identification

Output example of the project

Introduction

To complete my Udacity Data Scientist Capstone, I chose the Dog Breed Classifier project.

This project uses Convolutional Neural Networks (CNNs) to train a model that can identify 133 dog breeds.

Given an image of a dog, the algorithm produces an estimate of its breed.

If supplied an image of a human, the code will identify the resembling dog breed.

I must thank Udacity for the data and the base Jupyter notebook, available here.

Strategy to solve the problem

The Latin phrase “Divide et impera” (“divide and conquer”) is attributed to Julius Caesar, and it is the strategy followed in this project.

The idea is to create a single output that tells us:

  • whether there is a human in the image;
  • whether there is a dog in the image;
  • the dog breed.

So I have used 3 different models:

  • A model to identify if there is a human face in the image.
  • A model to identify if there is a dog in the image.
  • A model to identify the dog breed.

Metrics

All the models are evaluated using accuracy.

For the human model, accuracy is: (times a human is detected in an image that contains a human) / (total human test images).

For the dog model, accuracy is: (times a dog is detected in an image that contains a dog) / (total dog test images).

For the dog breed model, accuracy is: (times the breed is correctly assigned) / (number of test images).
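All three metrics reduce to the same ratio of correct predictions to test images; a minimal sketch (the numbers in the comments are purely illustrative):

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of test images the model got right."""
    return correct / total

# Hypothetical illustration: a detector right on 100 of 100 images,
# and a breed model right on 8 of 100 images.
print(accuracy(100, 100))  # 1.0
print(accuracy(8, 100))    # 0.08
```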

EDA

I worked with the following datasets:

  • The dog dataset: 8,351 dog images, each labeled with its breed, across 133 different categories.
  • The human dataset: 13,233 human images.

Let’s look at a histogram of images per breed across the whole dog dataset.

Every category has more than 20 dog images

To develop the different models, the dog dataset is divided into 3 parts.

  • Training
  • Validation
  • Test

Each division of the dataset must preserve all 133 categories and have a sufficient sample to build, validate, and test the model.
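A sketch of such a split using scikit-learn's train_test_split with stratification, which keeps every category present in each part. The file names, class count, and split sizes below are made up for illustration, not the project's real values:

```python
from sklearn.model_selection import train_test_split

# Toy stand-ins for the real image paths and 133 breed labels.
paths = [f"img_{i}.jpg" for i in range(100)]
labels = [i % 5 for i in range(100)]  # 5 toy classes instead of 133

# stratify=labels preserves the class proportions in every split.
train_p, rest_p, train_y, rest_y = train_test_split(
    paths, labels, test_size=0.4, stratify=labels, random_state=0)
valid_p, test_p, valid_y, test_y = train_test_split(
    rest_p, rest_y, test_size=0.5, stratify=rest_y, random_state=0)

print(len(train_p), len(valid_p), len(test_p))  # 60 20 20
```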

Training Dataset

Validation Dataset

Test Dataset

Modelling

Models used and trained

1. Detecting human faces

I’ve used OpenCV’s implementation of Haar feature-based cascade classifiers to detect human faces in images. OpenCV provides many pre-trained face detectors, stored as XML files on GitHub. The Udacity project includes one of these detectors, stored in the haarcascades directory.

Once we provide a photo to the model we will have an output like this:

The model has been tested with a sample of 100 human images and 100 dog images with the following results:

  • 100.0% of the first 100 images in human_files have a detected human face.
  • 11.0% of the first 100 images in dog_files have a detected human face.

The model detects human faces well, but it also runs some risk of detecting a dog as a human.

2. Detecting Dogs

I’ve used a pre-trained ResNet-50 model to detect dogs in images. The model has been trained on ImageNet, a very large, very popular dataset used for image classification and other vision tasks. ImageNet contains over 10 million URLs, each linking to an image containing an object from one of 1000 categories. Given an image, this pre-trained ResNet-50 model returns a prediction (derived from the available categories in ImageNet) for the object that is contained in the image.

The categories corresponding to dogs appear in an uninterrupted sequence and correspond to dictionary keys 151–268, inclusive, to include all categories from 'Chihuahua' to 'Mexican hairless'. Thus, in order to check to see if an image is predicted to contain a dog by the pre-trained ResNet-50 model, we need only check if the function returns a value between 151 and 268 (inclusive).
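A sketch of that check. The function names are my own, and TensorFlow is imported lazily inside the detector so the index check itself can be tried without it:

```python
import numpy as np

def is_dog_index(pred_index):
    """ImageNet classes 151-268 (inclusive) are all dog breeds."""
    return 151 <= pred_index <= 268

def dog_detector(img_path):
    # Assumption: a Keras ResNet-50 with ImageNet weights, as in the notebook.
    from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
    from tensorflow.keras.preprocessing import image

    model = ResNet50(weights='imagenet')
    img = image.load_img(img_path, target_size=(224, 224))
    x = np.expand_dims(image.img_to_array(img), axis=0)
    pred = np.argmax(model.predict(preprocess_input(x)))
    return is_dog_index(pred)
```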

The model has been tested with a sample of 100 human images and 100 dog images with the following results:

  • 0.0% of the first 100 images in human_files_short have a detected dog.
  • 100.0% of the first 100 images in dog_files_short have a detected dog.

With this sample it seems we have a perfect model. In real life that doesn’t exist: with a larger sample we would surely find cases where the model fails.

Hyperparameter tuning

So far, the two models used have been provided to me by Udacity.

For the dog breed detection model I had to create a CNN and determine its hyperparameters and architecture.

3. Detecting the dog breed

a. Create a CNN to Classify Dog Breeds (from Scratch)

In this project we will see the power of using a pre-trained model through transfer learning. But first, the challenge is to create a model from scratch that achieves a test accuracy of at least 1%.

To overcome this challenge I used the following architecture:

I added three 2D convolution layers, three max pooling layers, a flatten layer, dense layers, and two dropout layers.

This trains relatively fast on CPU and attains >1% test accuracy.

I used 20 epochs to train my model and I got an accuracy of 8.01%.
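A Keras sketch of an architecture along those lines. The exact filter counts, dense sizes, and dropout rates here are assumptions for illustration, not necessarily the notebook's values:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten,
                                     Dense, Dropout)

# Three conv + pool blocks, then dense layers with two dropouts,
# ending in a 133-way softmax (one unit per breed).
model = Sequential([
    Conv2D(16, 3, activation='relu', input_shape=(224, 224, 3)),
    MaxPooling2D(2),
    Conv2D(32, 3, activation='relu'),
    MaxPooling2D(2),
    Conv2D(64, 3, activation='relu'),
    MaxPooling2D(2),
    Flatten(),
    Dense(512, activation='relu'),
    Dropout(0.4),
    Dense(256, activation='relu'),
    Dropout(0.4),
    Dense(133, activation='softmax'),
])
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```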

b. Create a CNN to Classify Dog Breeds (using Transfer Learning)

Transfer learning involves taking a pre-trained neural network and adapting the neural network to a new, different data set.

For this project I used the Inception V3 CNN model.

https://cloud.google.com/tpu/docs/inception-v3-advanced?hl=en

With transfer learning we keep the entire Inception V3 architecture and only modify the last layer to suit our image classification project.

Here we can see it in a diagram:

In Jupyter’s notebook, this looks like this:
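A sketch of that final-layer swap in Keras. The Udacity notebook supplies pre-computed Inception V3 bottleneck features, so only this small head is trained; the (5, 5, 2048) feature shape and the optimizer here are assumptions:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense

# Only the new head is trained; the Inception V3 body stays frozen
# (its outputs arrive as pre-computed bottleneck features).
model = Sequential([
    GlobalAveragePooling2D(input_shape=(5, 5, 2048)),
    Dense(133, activation='softmax'),
])
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```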

Does this improve the accuracy? Let’s look at the accuracy on our training and validation data, again using 20 epochs to train the model:

Wow! Our model now has an accuracy over 80%, on both the training data and the validation data.

The accuracy on the test data (data that was not used to train the model) is 80.5%.

Results

I wrote an algorithm that accepts a file path to an image and first determines whether the image contains a human, a dog, or neither. Then:

  • If a dog is detected in the image, it returns the predicted breed.
  • If a human is detected in the image, it returns the resembling dog breed.
  • If neither is detected, it outputs that no human or dog was detected, but still returns the closest-resembling dog breed.
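The steps above can be sketched as a single function. The detector and breed-prediction callables are passed in as arguments, and all names here are my own:

```python
def classify_image(img_path, face_detector, dog_detector, predict_breed):
    """Combine the three models into one output string.

    face_detector / dog_detector: callables returning True or False.
    predict_breed: callable returning a breed name string.
    """
    breed = predict_breed(img_path)
    if dog_detector(img_path):
        return f"Dog detected! It looks like a {breed}."
    if face_detector(img_path):
        return f"Human detected! You look like a {breed}."
    return f"No dog or human detected, but the closest breed is a {breed}."
```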

Let’s see some examples with dogs:

And finally the funny part, let’s use some human photos and see what the model predicts.

I took some photos from the Friends TV show.

Same predictions for Ross and Monica … makes sense since they were siblings

We can’t say why the CNN chose each breed, but I tested whether the results are the same with different photos of the same person.

Conclusions

Thanks to this project I have learned to use CNNs in a project from scratch.

Now I can process images and train a model that has high accuracy.

Transfer learning is a really useful and powerful tool for training a CNN: it improves accuracy and saves time.

I think the final output is quite good for photos with a dog in them. Also, it is really fun to use a human photo and see which dog breed the model chooses!

Improvements

To improve the results, techniques such as data augmentation, different model architectures, or more training data could be used.

Do YOU want to know more?

To see more about this project, check out my GitHub repository, available here.
