Dog Breed Classification Using Flask

Dog Breed Classifier web app demonstrating the use of Transfer Learning with ResNet-50

Prateek Sawhney
14 min read · Sep 9, 2021

This is a simple dog breed classification web application that detects the breed of a dog among 133 categories. If the uploaded image is that of a human, the algorithm outputs the dog breed the human most resembles.

Dog breed application using Deep Learning (Image by author)

Overview

This is the dog breed classification project of the Data Scientist Nanodegree by Udacity. A simple web application, developed using Flask, lets the user check whether an uploaded image shows a dog or a human. If the uploaded image shows a human, the algorithm tells the user which dog breed the human most resembles. The Deep Learning model distinguishes between the 133 classes of dogs with a test accuracy of 82.89%.

Domain Background

The project involves image processing, classification, and detection using Deep Learning. Convolutional Neural Networks (CNNs) along with Transfer Learning are used to deliver the results. Given an image of a dog, the CNN must output the correct breed out of 133 classes of dogs. All this functionality is provided through a web application developed using Flask.

Problem Statement

The aim of this project is to create a classifier that can identify the breed of a dog given a photo or image as input. If the photo or image contains a human face, the application returns the breed of dog that most resembles the person in the image. I chose this as the Capstone project of the Udacity Data Scientist Nanodegree because I found the topic of Deep Neural Networks fascinating and wanted to dive deeper into it with some practical work.

Dataset Exploration

Two datasets provided by Udacity are used: the Dog Dataset and the Human Dataset. The Dog Dataset contains images of 133 classes of dogs, stored in 133 folders (one per class). The links to both datasets are given below:

  1. Dog Dataset
  2. Human Dataset

While exploring the dataset, some useful insights that I found are given below:

  1. There are 133 total dog categories.
  2. There are 8351 total dog images.
  3. There are 6680 training dog images.
  4. There are 835 validation dog images.
  5. There are 836 test dog images.
  6. There are in total 13233 human images.
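
As a quick sanity check on the figures above, the three dog splits should add up to the total number of dog images. The sketch below only does that arithmetic; the variable names are my own and are not taken from the project code.

```python
# Split sizes quoted in the list above.
splits = {"train": 6680, "valid": 835, "test": 836}
total_dog_images = 8351
num_breeds = 133

# The three splits should account for every dog image.
assert sum(splits.values()) == total_dog_images

# Share of the data in each split, as a percentage.
shares = {name: round(100 * n / total_dog_images, 1) for name, n in splits.items()}
print(shares)

# Average number of training images per breed.
avg_train_per_breed = splits["train"] / num_breeds
print(round(avg_train_per_breed, 1))
```

The split is roughly 80/10/10, and each breed has only about 50 training images on average, which helps explain why training a CNN from scratch is hard here.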

Solution Statement

The steps that I used in the solution approach are given below:

  1. Importing the datasets.
  2. Writing a function to detect humans in the input image.
  3. Detecting dogs in the input image.
  4. Pre-processing the Input Data.
  5. Creating a CNN to classify dog breeds (from scratch).
  6. Using Transfer Learning to create a CNN using ResNet-50 bottleneck features.
  7. Writing the algorithm.
  8. Testing the Pipeline.
  9. Using the final model to make predictions from a web application using Flask.
  10. Letting the user select any image to upload, after which the backend makes the prediction and displays the result on the next page.

Libraries Used

The libraries and packages, along with the versions I used in this project, are listed below. I recommend creating a separate environment for these packages, as installing them alongside other pre-installed packages on your system might cause version conflicts. I used Anaconda on my Ubuntu desktop to create a new virtual environment called “dog-breed”.

conda create --name dog-breed
  1. flask==1.1.0
  2. h5py==2.10.0
  3. keras==2.0.9
  4. numpy==1.18.4
  5. pandas==0.23.3
  6. pillow==5.2.0
  7. python==3.6.3
  8. scipy==1.4.1
  9. tensorflow==1.3.0
  10. tqdm==4.11.2
  11. Matplotlib==2.1.0

Detecting Humans

I used OpenCV’s implementation of feature-based Haar Cascade Classifiers to detect humans in an image as the first major step of the pipeline. OpenCV provides many pre-trained face detectors, stored as XML files on GitHub. I downloaded one of these detectors and stored it in the haarcascades directory.

import cv2
import matplotlib.pyplot as plt
%matplotlib inline

# extracting the pre-trained face detector
face_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_alt.xml')

# loading a color (BGR) image; human_files holds the paths of the human images
img = cv2.imread(human_files[3])
# converting the BGR image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# finding faces in the image
faces = face_cascade.detectMultiScale(gray)
print('Number of faces detected:', len(faces))

for (x, y, w, h) in faces:
    # add a bounding box to color image; (255, 0, 0) is blue in BGR order
    cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 0), 2)

# convert BGR image to RGB for plotting
cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(cv_rgb)
plt.show()

I converted the input image to grayscale because this is standard procedure before using any of the face detectors. The detectMultiScale function executes the classifier stored in face_cascade and takes the grayscale image as a parameter. I also added a bounding box around each detected face in the image. The output of the above step is given below:

Number of faces detected: 1
Human image (Image Source: Human Dataset)
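
To make the grayscale step concrete, here is a minimal sketch of the weighted sum that OpenCV documents for COLOR_BGR2GRAY (Y = 0.299 R + 0.587 G + 0.114 B). The function name bgr_pixel_to_gray is hypothetical and it works on a single pixel; in the project, cv2.cvtColor does this for the whole image at once.

```python
def bgr_pixel_to_gray(b, g, r):
    """Luma of one 8-bit pixel given in OpenCV's blue-green-red order."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    # Round to the nearest integer and clamp to the 8-bit range.
    return max(0, min(255, int(round(y))))

print(bgr_pixel_to_gray(255, 255, 255))  # white pixel
print(bgr_pixel_to_gray(255, 0, 0))      # pure blue pixel (blue comes first in BGR)
```

Note the channel order: OpenCV loads images as BGR, which is why the code above converts to RGB before plotting with matplotlib.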

Writing a human face detector

I used the same procedure to write a function that detects humans in an image and returns True if a human face is detected and False otherwise. This function, aptly named face_detector, takes a single string-valued file path to an image as input and appears in the code block below.

def face_detector(img_path):
    # returns True if at least one face is detected in the image at img_path
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray)
    return len(faces) > 0

Detecting Dogs

In this step, I used a pre-trained ResNet-50 model to detect dogs in the input images. The first line of code below downloads the ResNet-50 model, along with weights that have been trained on ImageNet, a very large and popular dataset used for image classification and other computer vision tasks. ImageNet contains over 10 million URLs, each linking to an image containing an object from one of 1000 categories. Given an image, this pre-trained ResNet-50 model returns a prediction (derived from the available categories in ImageNet) for the object that is contained in the image.

from keras.applications.resnet50 import ResNet50

# defining ResNet50 model
ResNet50_model = ResNet50(weights='imagenet')

Pre-processing the Input Data

When using TensorFlow as the backend, Keras CNNs require a 4D array as input, with shape (nb_samples, rows, columns, channels), where nb_samples corresponds to the total number of images (or samples), and rows, columns, and channels correspond to the number of rows, columns, and channels for each image, respectively.

The path_to_tensor function below takes a string-valued file path to a color image as input and returns a 4D tensor suitable for supplying to a Keras CNN. The function first loads the image and resizes it to a square image that is 224x224 pixels.

from keras.preprocessing import image
from tqdm import tqdm
import numpy as np

def path_to_tensor(img_path):
    # loading RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(224, 224))
    # converting PIL.Image.Image type to 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    # converting 3D tensor to 4D tensor with shape (1, 224, 224, 3) and return 4D tensor
    return np.expand_dims(x, axis=0)

Next, the image is converted to an array, which is then expanded to a 4D tensor by adding a leading sample dimension. In this case, since we are working with color images of dogs, each image has three channels. Likewise, since we are processing a single image (or sample), the returned tensor will always have shape (1, 224, 224, 3).

The paths_to_tensor function given below takes a numpy array of string-valued image paths as input and returns a 4D tensor with shape (nb_samples, 224, 224, 3).

def paths_to_tensor(img_paths):
    # building one (1, 224, 224, 3) tensor per path and stacking them along the first axis
    list_of_tensors = [path_to_tensor(img_path) for img_path in tqdm(img_paths)]
    return np.vstack(list_of_tensors)

Here, nb_samples is the number of samples, or number of images, in the supplied array of image paths. It is best to think of nb_samples as the number of 3D tensors (where each 3D tensor corresponds to a different image) in our dataset!

Making Predictions with ResNet-50

Now that we have a way to format an image as a 4D tensor for ResNet-50, we are ready to use the model to extract predictions. This is accomplished with the predict method, which returns an array whose i-th entry is the model’s predicted probability that the image belongs to the i-th ImageNet category. Taking the argmax of this array gives the index of the most likely category. This is implemented in the ResNet50_predict_labels function below.

from keras.applications.resnet50 import preprocess_input, decode_predictions

def ResNet50_predict_labels(img_path):
    # returns the index of the most likely ImageNet category for the image located at img_path
    img = preprocess_input(path_to_tensor(img_path))
    return np.argmax(ResNet50_model.predict(img))

Writing a Dog Detector

While looking at the dictionary of the ImageNet labels, I noticed that the categories corresponding to dogs appear in an uninterrupted sequence and correspond to dictionary keys 151–268, inclusive, to include all categories from 'Chihuahua' to 'Mexican hairless'. Thus, in order to check to see if an image is predicted to contain a dog by the pre-trained ResNet-50 model implemented above, we need to only check if the ResNet50_predict_labels function above returns a value between 151 and 268 (inclusive).

I used these ideas to make the dog_detector function below, which returns True if a dog is detected in an image (and False if not).

def dog_detector(img_path):
    # returns True if the predicted ImageNet category is one of the dog categories (151 to 268)
    prediction = ResNet50_predict_labels(img_path)
    return ((prediction <= 268) & (prediction >= 151))
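
One way to check the two detectors is to measure how often each one fires on a batch of known human images and known dog images. The helper below is a sketch of that idea, not code from the project: detection_rate is a hypothetical name, and the stub detector stands in for face_detector or dog_detector so that the example runs without any model files.

```python
def detection_rate(detector, img_paths):
    """Percentage of paths for which detector(path) is truthy."""
    if not img_paths:
        return 0.0
    hits = sum(1 for path in img_paths if detector(path))
    return 100.0 * hits / len(img_paths)

# Stand-in detector for illustration only: it "detects" a dog from the file name.
def stub_dog_detector(img_path):
    return "dog" in img_path

sample_paths = ["dog_01.jpg", "dog_02.jpg", "human_01.jpg", "dog_03.jpg"]
print(detection_rate(stub_dog_detector, sample_paths))
```

With the real functions, one would expect detection_rate(dog_detector, dog_paths) to be high and detection_rate(dog_detector, human_paths) to be low, where dog_paths and human_paths are placeholders for lists of file paths.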

Creating a CNN to Classify Dog Breeds (from Scratch)

Now that we have functions for detecting humans and dogs in images, we need a way to predict the dog breed from an image. This is a hard task: even a human would have great difficulty distinguishing between a Brittany and a Welsh Springer Spaniel, both shown below.

Dog breed — Brittany (Image Source: Dog Dataset)
Dog breed — Welsh Springer Spaniel (Image Source: Dog Dataset)

Other categories are challenging in a different way, because a single breed can vary widely in appearance, as the Labrador images below show.

Dog breed — Yellow Labrador, Chocolate Labrador and Black Labrador (Image Source: Dog Dataset)

Pre-processing the Data

I rescaled the images by dividing every pixel in every image by 255. The corresponding code is given below for reference.

from PIL import ImageFile                            
ImageFile.LOAD_TRUNCATED_IMAGES = True
# pre-processing the data for Keras
train_tensors = paths_to_tensor(train_files).astype('float32')/255
valid_tensors = paths_to_tensor(valid_files).astype('float32')/255
test_tensors = paths_to_tensor(test_files).astype('float32')/255
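
It is worth estimating how much memory these tensors need before loading them. The sketch below assumes 4 bytes per value (float32) and uses the image counts from the dataset exploration; tensor_gib is a hypothetical helper name.

```python
def tensor_gib(num_images, height=224, width=224, channels=3, bytes_per_value=4):
    """Approximate size in GiB of a (num_images, height, width, channels) tensor."""
    total_bytes = num_images * height * width * channels * bytes_per_value
    return total_bytes / 2**30

for name, count in [("train", 6680), ("valid", 835), ("test", 836)]:
    print(name, round(tensor_gib(count), 2), "GiB")
```

Under these assumptions the training tensor alone comes to roughly 3.75 GiB, so the three tensors together need well over 4 GiB of RAM.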

Model Architecture used

After many attempts, I settled on the following model architecture: six convolutional layers with “relu” activation, each followed by max pooling, then a flattening layer and two dense layers with dropout between them. The exact code snippet is provided below.

from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.models import Sequential

model = Sequential()
model.add(Conv2D(16, (3, 3), padding='same', activation='relu', input_shape=(224, 224, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(133, activation='softmax'))

# printing the summary of the model
model.summary()

The model summary is given below for reference. The total number of parameters, all of which are trainable, comes out to be 258,981.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_13 (Conv2D)           (None, 224, 224, 16)      448
_________________________________________________________________
max_pooling2d_14 (MaxPooling (None, 112, 112, 16)      0
_________________________________________________________________
conv2d_14 (Conv2D)           (None, 110, 110, 32)      4640
_________________________________________________________________
max_pooling2d_15 (MaxPooling (None, 55, 55, 32)        0
_________________________________________________________________
conv2d_15 (Conv2D)           (None, 53, 53, 64)        18496
_________________________________________________________________
max_pooling2d_16 (MaxPooling (None, 26, 26, 64)        0
_________________________________________________________________
conv2d_16 (Conv2D)           (None, 24, 24, 128)       73856
_________________________________________________________________
max_pooling2d_17 (MaxPooling (None, 12, 12, 128)       0
_________________________________________________________________
conv2d_17 (Conv2D)           (None, 10, 10, 64)        73792
_________________________________________________________________
max_pooling2d_18 (MaxPooling (None, 5, 5, 64)          0
_________________________________________________________________
conv2d_18 (Conv2D)           (None, 3, 3, 64)          36928
_________________________________________________________________
max_pooling2d_19 (MaxPooling (None, 1, 1, 64)          0
_________________________________________________________________
flatten_4 (Flatten)          (None, 64)                0
_________________________________________________________________
dense_5 (Dense)              (None, 256)               16640
_________________________________________________________________
dropout_3 (Dropout)          (None, 256)               0
_________________________________________________________________
dense_6 (Dense)              (None, 133)               34181
=================================================================
Total params: 258,981
Trainable params: 258,981
Non-trainable params: 0
_________________________________________________________________
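
The parameter counts and output shapes in the summary can be reproduced by hand. A Conv2D layer has (kernel height × kernel width × input channels + 1) × filters parameters, and a Dense layer has (inputs + 1) × units. The sketch below recomputes the totals from the layer definitions above; the helper names are my own.

```python
def conv_params(in_ch, filters, k=3):
    # One k x k kernel per input channel plus one bias, for every filter.
    return (k * k * in_ch + 1) * filters

def dense_params(inputs, units):
    # One weight per input plus one bias, for every unit.
    return (inputs + 1) * units

filters = [16, 32, 64, 128, 64, 64]
size, in_ch, total = 224, 3, 0
for i, f in enumerate(filters):
    if i > 0:
        size -= 2          # 3x3 'valid' convolution trims one pixel per side
    # (the first layer uses padding='same', so it keeps the 224 x 224 size)
    total += conv_params(in_ch, f)
    size //= 2             # 2x2 max pooling halves the spatial size, rounding down
    in_ch = f

flat = size * size * in_ch             # length of the Flatten output
total += dense_params(flat, 256)       # first dense layer
total += dense_params(256, 133)        # softmax output layer
print(size, flat, total)
```

The spatial size shrinks all the way from 224 to 1, so the Flatten layer passes on only 64 values, which is a very narrow bottleneck for separating 133 breeds.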

Compiling and Training the Model

I used the “rmsprop” optimizer with the “categorical_crossentropy” loss. I tracked accuracy as the metric when compiling the model, as it is suitable for this kind of multi-class problem. The F1 score would also be worth analysing, because the dataset is imbalanced and, for minority classes, the F1 score would give more realistic results.

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
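
Since the F1 score is mentioned as a possible additional metric, here is a minimal pure-Python sketch of a macro-averaged F1 over integer class labels. It is not part of the project code; macro_f1 is a hypothetical helper and the labels are made-up toy values.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over the classes present in y_true."""
    classes = sorted(set(y_true))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # F1 = 2TP / (2TP + FP + FN); the guard avoids a division by zero.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy example: class 0 is the majority, class 1 the minority.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(accuracy, 3), round(macro_f1(y_true, y_pred), 3))
```

In this toy case the accuracy is higher than the macro F1, because the missed minority-class example weighs more heavily when every class counts equally.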

I used epochs=25 and batch_size=20. I also used model checkpointing to save the model that attains the best validation loss.

from keras.callbacks import ModelCheckpoint
epochs = 25
checkpointer = ModelCheckpoint(filepath='saved_models/weights.best.from_scratch.hdf5', verbose=1, save_best_only=True)
model.fit(train_tensors, train_targets,
          validation_data=(valid_tensors, valid_targets),
          epochs=epochs, batch_size=20, callbacks=[checkpointer], verbose=1)
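
For a sense of scale, the number of weight updates follows from the batch size and the size of the training set. This is simple arithmetic using the 6680 training images reported in the dataset exploration.

```python
import math

train_images = 6680
batch_size = 20
epochs = 25

# Number of batches (weight updates) in one pass over the training set.
steps_per_epoch = math.ceil(train_images / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)
```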

Testing the Model

# getting the index of the predicted dog breed for each image in the test set
dog_breed_predictions = [np.argmax(model.predict(np.expand_dims(tensor, axis=0))) for tensor in test_tensors]

# reporting test accuracy
test_accuracy = 100*np.sum(np.array(dog_breed_predictions)==np.argmax(test_targets, axis=1))/len(dog_breed_predictions)
print('Test accuracy: %.4f%%' % test_accuracy)

I tested the model on the test set and got a test accuracy of 14.2344%. This led me to use transfer learning with the ResNet-50 model.

Creating a CNN to Classify Dog Breeds (using Transfer Learning)

I used transfer learning to create a CNN using ResNet-50 bottleneck features. After downloading the pre-computed bottleneck features for the ResNet-50 model, I loaded the features corresponding to the train, validation, and test sets with the code below:

bottleneck_features = np.load('bottleneck_features/DogResnet50Data.npz')
train_Resnet = bottleneck_features['train']
valid_Resnet = bottleneck_features['valid']
test_Resnet = bottleneck_features['test']

Model Architecture

The new model, “Resnet_Model”, takes the pre-computed ResNet-50 “bottleneck_features” as its input. It consists of a GlobalAveragePooling2D layer followed by a dense layer with “softmax” activation. The final layer contains 133 nodes, as there are 133 different dog breed classes in the dataset. The code of the model is given below:

from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Sequential

Resnet_Model = Sequential()
Resnet_Model.add(GlobalAveragePooling2D(input_shape=train_Resnet.shape[1:]))
Resnet_Model.add(Dense(133, activation='softmax'))
Resnet_Model.summary()

In effect, the ResNet-50 model is reused with its last output layer replaced so as to predict 133 different classes of dogs. The summary of the model is shown below; the total number of parameters is 272,517.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
global_average_pooling2d_2 ( (None, 2048)              0
_________________________________________________________________
dense_8 (Dense)              (None, 133)               272517
=================================================================
Total params: 272,517
Trainable params: 272,517
Non-trainable params: 0
_________________________________________________________________
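
To illustrate what this small head does, the sketch below implements global average pooling over a tiny (height, width, channels) feature map in plain Python and recomputes the dense layer's parameter count. The nested-list layout and the function name are illustrative assumptions; Keras performs the same reduction on whole batches of tensors.

```python
def global_average_pool(feature_map):
    """Average each channel over all spatial positions of an H x W x C nested list."""
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    sums = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                sums[c] += value
    return [s / (height * width) for s in sums]

# A 2 x 2 feature map with 3 channels.
fmap = [[[1, 10, 0], [3, 10, 0]],
        [[5, 10, 0], [7, 10, 4]]]
print(global_average_pool(fmap))

# Dense(133) on a pooled vector of length 2048: weights plus biases.
dense_param_count = 2048 * 133 + 133
print(dense_param_count)
```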

Training and Testing the Model

I used categorical crossentropy loss in addition to the “adam” optimizer in this transfer learning approach.

Resnet_Model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

I trained the model in the code cell below, using model checkpointing to save the model that attains the best validation loss. The batch size is set to 20 and the number of epochs to 25.

from keras.callbacks import ModelCheckpoint

checkpointer = ModelCheckpoint(filepath='saved_models/weights.best.Resnet.hdf5',
                               verbose=1, save_best_only=True)
Resnet_Model.fit(train_Resnet, train_targets,
                 validation_data=(valid_Resnet, valid_targets),
                 epochs=25, batch_size=20, callbacks=[checkpointer], verbose=1)
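
With save_best_only=True, the checkpoint file is only overwritten when the monitored quantity improves; by default, Keras monitors the validation loss. The sketch below mimics that bookkeeping in plain Python with made-up loss values, so it illustrates the idea rather than reproducing Keras's implementation.

```python
def epochs_that_save(val_losses):
    """Return the 1-based epochs at which a best-only checkpoint would be written."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:        # strictly better than anything seen so far
            best = loss
            saved.append(epoch)
    return saved

# Hypothetical validation losses for five epochs.
losses = [1.90, 1.20, 1.35, 0.95, 1.10]
print(epochs_that_save(losses))
```

In this example the file on disk after training would hold the weights from epoch 4 rather than from the final epoch, which is why the saved weights are loaded back before testing.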

After loading the weights with the best validation loss using the “load_weights” method, I evaluated the model on the test set of dog images. The code is given below for reference.

# loading the weights that attained the best validation loss
Resnet_Model.load_weights('saved_models/weights.best.Resnet.hdf5')
Resnet_Predictions = [np.argmax(Resnet_Model.predict(np.expand_dims(feature, axis=0))) for feature in test_Resnet]
# reporting test accuracy
test_accuracy = 100*np.sum(np.array(Resnet_Predictions)==np.argmax(test_targets, axis=1))/len(Resnet_Predictions)
print('Test accuracy: %.4f%%' % test_accuracy)

The architecture above is suitable because it achieves a high accuracy: the test accuracy comes out to be 82.8947%. By comparison, the model that I developed from scratch, with 6 convolutional and 2 dense layers, had a test accuracy of 14.2344%.
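
To put both accuracies in context, it helps to compare them with random guessing over 133 classes and to translate them into image counts. The sketch assumes the 836 test images reported in the dataset exploration; the counts are derived from the percentages and are not taken from the project output.

```python
num_classes = 133
test_images = 836

# Expected accuracy of a uniform random guess, in percent.
chance = 100 / num_classes

results = {}
for name, accuracy in [("from scratch", 14.2344), ("ResNet-50 transfer", 82.8947)]:
    # Implied number of correctly classified test images.
    correct = round(accuracy / 100 * test_images)
    results[name] = correct
    print(name, correct, "correct,", round(accuracy / chance, 1), "x chance")
```

Even the from-scratch model is far better than the roughly 0.75% expected from guessing, but the transfer learning model gets several hundred more test images right.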

Predicting Dog Breed with ResNet-50 Model

Next, I wrote a function that takes an image path as input and returns the dog breed (Affenpinscher, Afghan_hound, etc.) that is predicted by our model. It works in two steps:

  1. Extracting the bottleneck features corresponding to the ResNet-50 CNN model.
  2. Supplying the bottleneck features as input to the model to return the predicted vector. (Noting that the argmax of this prediction vector gives the index of the predicted dog breed).

def extract_Resnet50(tensor):
    # running the image through ResNet-50 without its top classification layers
    return ResNet50(weights='imagenet', include_top=False).predict(preprocess_input(tensor))

def Resnet_predict_breed(img_path):
    # extracting the bottleneck features
    bottleneck_feature = extract_Resnet50(path_to_tensor(img_path))
    # obtaining the predicted vector
    predicted_vector = Resnet_Model.predict(bottleneck_feature)
    # returning the dog breed that is predicted by the model
    return dog_names[np.argmax(predicted_vector)]

The Algorithm

I wrote an algorithm that accepts a file path to an image and first determines whether the image contains a human, a dog, or neither. Then,

  • if a dog is detected in the image, the predicted breed is returned.
  • if a human is detected in the image, the resembling dog breed is returned.
  • if neither is detected in the input image, an output indicating an error is returned.

def get_correct_prenom(word, vowels):
    # choosing the article ("a" or "an") from the first letter of the breed name
    if word[0].lower() in vowels:
        return "an"
    else:
        return "a"

def predict_image(img_path):
    vowels = ["a", "e", "i", "o", "u"]
    # displaying the input image (show_img is a helper that is not shown in this article)
    show_img(img_path)
    # if a dog is detected in the image, returning the predicted breed.
    if dog_detector(img_path):
        predicted_breed = Resnet_predict_breed(img_path).rsplit('.', 1)[1].replace("_", " ")
        prenom = get_correct_prenom(predicted_breed, vowels)
        return "The predicted dog breed is " + prenom + " " + str(predicted_breed) + "."
    # if a human is detected in the image, returning the resembling dog breed.
    elif face_detector(img_path):
        predicted_breed = Resnet_predict_breed(img_path).rsplit('.', 1)[1].replace("_", " ")
        prenom = get_correct_prenom(predicted_breed, vowels)
        return "This photo looks like " + prenom + " " + str(predicted_breed) + "."
    # if neither is detected in the image, provide output that indicates an error.
    else:
        return "No human or dog could be detected, please provide another picture."
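
The label clean-up and the choice of article can be tried in isolation. The sketch below assumes a raw label of the form '002.Afghan_hound' (a numeric prefix, a dot, and an underscore-separated name); that format is inferred from the rsplit and replace calls above and is not confirmed by the article, and the numeric prefixes are made up. format_breed is a hypothetical helper.

```python
def format_breed(raw_label):
    """Turn a raw label such as '002.Afghan_hound' into 'an Afghan hound'."""
    # Keep the part after the last dot, then make underscores readable.
    breed = raw_label.rsplit(".", 1)[1].replace("_", " ")
    article = "an" if breed[0].lower() in "aeiou" else "a"
    return article + " " + breed

print(format_breed("002.Afghan_hound"))
print(format_breed("037.Brittany"))
```

Note that rsplit('.', 1)[1] raises an IndexError if the label contains no dot, so this approach relies on every label having the assumed prefix.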

Testing the Algorithm on sample images

The output is better than I expected :) If the human face in the picture has distinct features, the predicted dog breed matches very well. However, for very similar-looking dog breeds, it is still tricky to get the prediction right. To improve the algorithm, we could enlarge the training data by including more pictures per dog breed. We could also add more and deeper layers to our CNN, although this would slow down processing or require better hardware. Finally, we could compare different CNNs against each other to see which one performs best for this task.

Sample image 1 for testing the pipeline:

predict_image('./images/sample_human_2.png')
sample_human_2.png (Image Source: Human Dataset)
'This photo looks like an Afghan hound.'

Sample image 2 for testing the pipeline:

predict_image('./images/Brittany_02625.jpg')
Brittany_02625.jpg (Image Source: Dog Dataset)
'The predicted dog breed is a Brittany.'

Sample image 3 for testing the pipeline:

predict_image('./images/Labrador_retriever_06449.jpg')
Labrador_retriever_06449.jpg (Image Source: Dog Dataset)
'The predicted dog breed is a Labrador retriever.'

Web Application Using Flask

I integrated the above code into a simple web application using Flask to carry out the dog breed predictions. For this, I converted the “dog_names” array to a JSON file and imported it into my Flask application with the code below.

import json

dog_names = []
with open('data/dog_names.json') as json_file:
    dog_names = json.load(json_file)
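
The conversion to JSON itself is not shown in the article; a minimal round-trip sketch could look like the following. It writes to a temporary directory instead of data/dog_names.json, and the three entries are placeholders for the full list of 133 labels.

```python
import json
import os
import tempfile

# Placeholder entries standing in for the full list of 133 labels.
dog_names = ["Affenpinscher", "Afghan_hound", "Brittany"]

with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = os.path.join(tmp_dir, "dog_names.json")

    # Writing the list once, for example from the training notebook.
    with open(json_path, "w") as json_file:
        json.dump(dog_names, json_file)

    # Reading it back, as the web application does at start-up.
    with open(json_path) as json_file:
        loaded_names = json.load(json_file)

print(loaded_names == dog_names)
```

If dog_names is a NumPy array rather than a list, it would need to be converted first (for example with list(dog_names)), since json.dump cannot serialize arrays directly.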

Steps for running the web application

To run the web application on a local machine, follow these instructions:

  1. Make sure you have all the necessary packages installed.
  2. Git clone this repository.
  3. On the command line, cd into the cloned repository.
  4. Run the following command in the main directory of the repository to start the web application.
python main.py

Go to http://0.0.0.0:8080/ to view the web app and upload new pictures of dogs or humans. The app will report the predicted breed for a dog, or the most resembling breed for a human.
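
Before an uploaded file reaches the model, it is common for an upload handler to check the file extension. The sketch below shows one such check in plain Python; allowed_file and ALLOWED_EXTENSIONS are assumed names and are not taken from the project's code.

```python
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg"}

def allowed_file(filename):
    """True if filename has an extension from ALLOWED_EXTENSIONS (case-insensitive)."""
    if "." not in filename:
        return False
    extension = filename.rsplit(".", 1)[1].lower()
    return extension in ALLOWED_EXTENSIONS

print(allowed_file("Brittany_02625.jpg"))
print(allowed_file("notes.txt"))
```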

Prediction Using Flask

Below, I have attached some screenshots that show the web app at work. The user can upload an image on the homepage and click the “Upload and Detect Dog” button. The next page displays the detected breed.

Home page — running on localhost:8080 (Image by author)

Results about the prediction are shown on the next page.

Input Image uploaded (Image by author)
The predicted dog breed is a Labrador retriever. (Image by author)

Conclusion

I was surprised by the good results of the algorithm and the ResNet-50 model. Without much fine-tuning, the model reaches a test accuracy of 82.8947%, and the predictions were mostly correct. For human faces, the match seems more convincing when the face has distinct features that resemble a certain dog breed; otherwise, the model guesses from a few features and the results vary. For higher accuracy, the parameters could be optimized further, perhaps also by adding more layers to the model. The number of epochs could also be increased to 40 to lower the loss, and an even bigger training set could improve the classification accuracy further. Another improvement could be made to the UI.

The project can be found on my GitHub Repository with detailed instructions. The main.py file contains the code for the web application and the deep learning model used.

And with that, we have come to the end of this article. Bundle of thanks for reading it!

My Portfolio and Linkedin :)
