Deep Learning

Traffic Sign Recognition Using Convolutional Neural Networks (CNN)

German Traffic Sign classification project from Term 1 of the Self-Driving Car Engineer Nanodegree, demonstrating the use of CNNs in classification tasks

Prateek Sawhney

--

A Convolutional Neural Network is designed and trained to classify traffic signs using the German Traffic Sign Dataset. The trained model is then tested on new German traffic sign images to measure its performance.

Image by Devinder Singh on Unsplash

The detailed instructions along with the project code can be found on my GitHub repository referenced below.

Outlines of the Project

The goals of the project are given below:

  1. Loading the dataset i.e. the German Traffic Sign Dataset.
  2. Exploring, summarizing, and visualizing the dataset.
  3. Designing and training the model architecture.
  4. Evaluating the performance of the model on the test dataset.
  5. Using the trained model to make predictions on new images.

Introduction

Deep Neural Networks (DNNs) have strong capabilities for image pattern recognition and are widely used in computer vision. A Convolutional Neural Network (CNN, or ConvNet) is a class of DNN most commonly applied to analyzing visual imagery. Traffic sign classification and detection is one of the major tasks in self-driving, as it tells the decision-making system which sign appears in the image.

Different traffic signs (Image by author)

Traffic-sign recognition (TSR) is a technology by which a vehicle is able to recognize the traffic signs posted on the road, e.g. “speed limit”, “children”, or “turn ahead”. It is part of the set of features collectively called advanced driver-assistance systems (ADAS). The technology is being developed by a variety of automotive suppliers and uses image processing techniques to detect traffic signs. Detection methods can generally be divided into color-based, shape-based, and learning-based approaches.

Dataset Used

The dataset used is the German Traffic Sign Dataset, which contains RGB images of shape (32x32x3). I used the NumPy library to calculate the summary statistics of the traffic sign dataset, given below (a short loading-and-statistics sketch follows the list):

  • The size of the training set is 34799
  • The size of the validation set is 4410
  • The size of the test set is 12630
  • The shape of a traffic sign image is (32, 32, 3)
  • The number of unique classes/labels in the data set is 43
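
For reference, here is a minimal sketch of how the data can be loaded and these statistics computed with NumPy. The pickle file names and the 'features'/'labels' keys are assumptions about how the dataset is stored on disk; adjust them to match your copy of the data.

import pickle
import numpy as np

# Load the pickled dataset (file names are assumptions).
with open('train.p', 'rb') as f:
    train = pickle.load(f)
with open('valid.p', 'rb') as f:
    valid = pickle.load(f)
with open('test.p', 'rb') as f:
    test = pickle.load(f)

X_train, y_train = train['features'], train['labels']
X_valid, y_valid = valid['features'], valid['labels']
X_test, y_test = test['features'], test['labels']

# Summary statistics computed with NumPy.
print("Training set size:", X_train.shape[0])
print("Validation set size:", X_valid.shape[0])
print("Test set size:", X_test.shape[0])
print("Image shape:", X_train.shape[1:])
print("Number of classes:", np.unique(y_train).size)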

The dataset used is referenced below:

Exploratory Visualization of the Dataset

The visualization of the dataset is done in two parts. First, a very simple approach is taken to display a single image from the dataset. After that, there is an exploratory visualization of the dataset, drawing the first image of 35 of the 43 classes.

Visualization of the Dataset — 35 classes (Image by author)
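
The exploratory visualization itself can be produced with a few lines of matplotlib. The sketch below plots the first training image found for each class on a grid; the 5x9 grid size is an arbitrary choice that fits all 43 classes.

import numpy as np
import matplotlib.pyplot as plt

n_classes = np.unique(y_train).size  # 43
fig, axes = plt.subplots(5, 9, figsize=(15, 9))
for class_id, ax in enumerate(axes.flatten()):
    if class_id < n_classes:
        # Index of the first training image belonging to this class.
        first_idx = np.argwhere(y_train == class_id)[0][0]
        ax.imshow(X_train[first_idx])
        ax.set_title(str(class_id), fontsize=8)
    ax.axis('off')
plt.tight_layout()
plt.show()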

Model Architecture

As a first step, I decided to shuffle X_train and y_train. Then I used normalization as a preprocessing technique: each dataset (X_train, X_test, X_valid) is fed into the normalization(x_label) function, which maps the pixel values from [0, 255] into the range [0.1, 0.9] and returns the normalized data. The Python code that performs this step is given below:

import tensorflow as tf
from sklearn.utils import shuffle

X_train, y_train = shuffle(X_train, y_train)

EPOCHS = 20
BATCH_SIZE = 128

def normalization(x_label):
    # Scale pixel values from [0, 255] into [0.1, 0.9].
    return x_label / 255 * 0.8 + 0.1

X_train = normalization(X_train)
X_test = normalization(X_test)
X_valid = normalization(X_valid)

To train the model, I used EPOCHS = 20, BATCH_SIZE = 128, rate = 0.001, mu = 0, and sigma = 0.1. I used the LeNet model architecture, which consists of two convolutional layers and three fully connected layers. The input is an image of size (32x32x3) and the output has 43 units, i.e. the total number of distinct classes. In between, I used the ReLU activation function after each convolutional layer as well as after the first two fully connected layers.

A flatten operation converts the output of the 2nd convolutional layer after pooling, i.e. 5x5x16, into a vector of 400 values. Max pooling is applied after both the 1st and the 2nd convolutional layers. My final model consisted of the following layers:

Model Architecture used (Image by author)

The LeNet model, developed and introduced by LeCun et al. in 1998, was primarily used for optical character recognition, e.g. on the MNIST dataset. More information about the LeNet model can be found below:

Below is a snapshot of the actual code that describes my model architecture.

# flatten comes from tf.contrib.layers in TensorFlow 1.x.
from tensorflow.contrib.layers import flatten

def LeNet(x):
    mu = 0
    sigma = 0.1

    # Layer 1: Convolutional. Input = 32x32x3. Output = 28x28x6.
    weights_layer_1 = tf.Variable(tf.truncated_normal(shape=(5, 5, 3, 6), mean=mu, stddev=sigma))
    bias_layer_1 = tf.Variable(tf.zeros(6))
    output_layer_1 = tf.nn.conv2d(x, weights_layer_1, strides=[1, 1, 1, 1], padding='VALID')
    output_layer_1 = tf.nn.bias_add(output_layer_1, bias_layer_1)
    # Activation.
    output_layer_1 = tf.nn.relu(output_layer_1)
    # Pooling. Input = 28x28x6. Output = 14x14x6.
    output_layer_1 = tf.nn.max_pool(output_layer_1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')

    # Layer 2: Convolutional. Output = 10x10x16.
    weights_layer_2 = tf.Variable(tf.truncated_normal(shape=(5, 5, 6, 16), mean=mu, stddev=sigma))
    bias_layer_2 = tf.Variable(tf.zeros(16))
    output_layer_2 = tf.nn.conv2d(output_layer_1, weights_layer_2, strides=[1, 1, 1, 1], padding='VALID')
    output_layer_2 = tf.nn.bias_add(output_layer_2, bias_layer_2)
    # Activation.
    output_layer_2 = tf.nn.relu(output_layer_2)
    # Pooling. Input = 10x10x16. Output = 5x5x16.
    output_layer_2 = tf.nn.max_pool(output_layer_2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    # Flatten. Input = 5x5x16. Output = 400.
    output_layer_2 = flatten(output_layer_2)

    # Layer 3: Fully Connected. Input = 400. Output = 120.
    weights_fully_3 = tf.Variable(tf.truncated_normal(shape=(400, 120), mean=mu, stddev=sigma))
    bias_fully_3 = tf.Variable(tf.zeros(120))
    output_layer_3 = tf.add(tf.matmul(output_layer_2, weights_fully_3), bias_fully_3)
    # Activation.
    output_layer_3 = tf.nn.relu(output_layer_3)

    # Layer 4: Fully Connected. Input = 120. Output = 84.
    weights_fully_4 = tf.Variable(tf.truncated_normal(shape=(120, 84), mean=mu, stddev=sigma))
    bias_fully_4 = tf.Variable(tf.zeros(84))
    output_layer_4 = tf.add(tf.matmul(output_layer_3, weights_fully_4), bias_fully_4)
    # Activation.
    output_layer_4 = tf.nn.relu(output_layer_4)

    # Layer 5: Fully Connected. Input = 84. Output = 43.
    weights_fully_5 = tf.Variable(tf.truncated_normal(shape=(84, 43), mean=mu, stddev=sigma))
    bias_fully_5 = tf.Variable(tf.zeros(43))
    logits = tf.add(tf.matmul(output_layer_4, weights_fully_5), bias_fully_5)

    return logits

Training, Validating, and Testing the Model

Training is the stage of machine learning when the model is gradually optimized, or the model learns the dataset. The goal is to learn enough about the structure of the training dataset to make predictions about unseen data.

Image by Riva Ferdian on Unsplash

If you learn too much about the training dataset, then the predictions only work for the data it has seen and will not be generalizable. This problem is called overfitting — it’s like memorizing the answers instead of understanding how to solve a problem.

x = tf.placeholder(tf.float32, (None, 32, 32, 3))
y = tf.placeholder(tf.int32, (None))
one_hot_y = tf.one_hot(y, 43)

rate = 0.001

logits = LeNet(x)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_y, logits=logits)
loss_operation = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate=rate)
training_operation = optimizer.minimize(loss_operation)

Traffic sign detection is an example of supervised machine learning: the model is trained from examples that contain labels. In unsupervised machine learning, the examples don’t contain labels; instead, the model typically finds patterns among the features. The LeNet model produces the logits, from which the cross-entropy loss is computed. Finally, the Adam optimizer is used to minimize that loss.

correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_y, 1))
accuracy_operation = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
saver = tf.train.Saver()

def evaluate(X_data, y_data):
    num_examples = len(X_data)
    total_accuracy = 0
    sess = tf.get_default_session()
    for offset in range(0, num_examples, BATCH_SIZE):
        batch_x, batch_y = X_data[offset:offset+BATCH_SIZE], y_data[offset:offset+BATCH_SIZE]
        accuracy = sess.run(accuracy_operation, feed_dict={x: batch_x, y: batch_y})
        total_accuracy += (accuracy * len(batch_x))
    return total_accuracy / num_examples

A validation set can be used to assess how well the model is performing. Low accuracy on both the training and validation sets implies underfitting. High accuracy on the training set but low accuracy on the validation set implies overfitting.
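
For instance, since evaluate() works on any dataset, the same helper can also be run on the training set once a session with trained weights is active, and the two numbers compared. The thresholds below are purely illustrative, not part of the original project code.

# Compare training and validation accuracy to diagnose the fit.
# (Assumes an active session with trained weights, e.g. inside the
# training loop shown below or after restoring a checkpoint.)
train_accuracy = evaluate(X_train, y_train)
validation_accuracy = evaluate(X_valid, y_valid)
if train_accuracy - validation_accuracy > 0.05:
    print("Gap between training and validation accuracy suggests overfitting.")
elif train_accuracy < 0.90:
    print("Low training accuracy suggests underfitting.")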

The code below is the typical way of running the training and validation process in TensorFlow. The validation accuracy after 20 epochs comes out to be 0.975.

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    num_examples = len(X_train)

    print("Training...")
    print()
    for i in range(EPOCHS):
        X_train, y_train = shuffle(X_train, y_train)
        for offset in range(0, num_examples, BATCH_SIZE):
            end = offset + BATCH_SIZE
            batch_x, batch_y = X_train[offset:end], y_train[offset:end]
            sess.run(training_operation, feed_dict={x: batch_x, y: batch_y})

        validation_accuracy = evaluate(X_valid, y_valid)
        print("EPOCH {} ...".format(i+1))
        print("Validation Accuracy = {:.3f}".format(validation_accuracy))
        print()

    saver.save(sess, './lenet')
    print("Model saved")

Testing the Model on new images

Given below are the eight German traffic signs, downloaded from the Internet, that I used to test the model on fresh data outside the test set. Each image has a different resolution and size, so each one is resized to 32x32 before being passed to the model. The German traffic signs after resizing to 32x32:

German Traffic Signs after resizing (Images by author)
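
A sketch of how these new images can be prepared and classified is given below. The folder name and the use of OpenCV for reading and resizing are my assumptions; any image library that produces 32x32x3 RGB arrays works the same way.

import glob
import cv2
import numpy as np

# Read, convert to RGB, and resize each downloaded image to 32x32.
new_images = []
for path in sorted(glob.glob('new_signs/*.jpg')):  # hypothetical folder of 8 images
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)     # OpenCV reads images as BGR
    img = cv2.resize(img, (32, 32))
    new_images.append(img)

# Apply the same normalization used for training.
new_images = normalization(np.array(new_images, dtype=np.float32))

# Restore the trained model and predict a class id for each image.
with tf.Session() as sess:
    saver.restore(sess, './lenet')
    predictions = sess.run(tf.argmax(logits, 1), feed_dict={x: new_images})
    print("Predicted class ids:", predictions)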

The model was able to correctly classify 5 of the 8 traffic signs, giving an accuracy of 0.625 on the newly loaded images.

Results

My final model had a validation set accuracy of 0.975.

I trained the model on normalized images with EPOCHS = 20 and BATCH_SIZE = 128. With these hyperparameters, the validation set accuracy is 0.975, which is above my previous benchmark of 0.93. Further, the model has an accuracy of 0.625 on the eight German traffic sign images downloaded from the web.
