Introduction to Keras & Transfer Learning for Self Driving Cars

Introduction to Keras and the use of Transfer Learning in the development of Deep Learning architectures

Prateek Sawhney
10 min read · Sep 12, 2022

In this medium article, I’m going to explain the basic concepts behind Keras, transfer learning and multilayer convolutional neural networks. I’ll be introducing an interface that sits on top of TensorFlow and allows us to draw on the power of TensorFlow with far more concise code.

Photo by Vincent Ghilione on Unsplash

That’s right. In this medium article, we’ll be building a deep neural network using a new set of tools. We’ll still have TensorFlow under the hood, but with an interface that makes testing and prototyping much faster.
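To make that concreteness tangible, here is a minimal sketch of the Keras Sequential API: a small image classifier in a handful of lines, with TensorFlow doing the work underneath. The input shape and layer sizes are illustrative choices, not from any particular project.

```python
from tensorflow import keras
from tensorflow.keras import layers

# A tiny classifier: a few lines of Keras instead of raw TensorFlow ops.
model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),            # e.g. small RGB images
    layers.Flatten(),                          # unroll each image into a vector
    layers.Dense(128, activation="relu"),      # one hidden layer
    layers.Dense(5, activation="softmax"),     # e.g. five output classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

From here, training is a single `model.fit(x_train, y_train)` call, which is exactly the kind of fast prototyping loop this article is about.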

Deep Learning Framework

Deep neural networks have been a big focus of work in autonomous driving. We’re exploring whether we can get a car to drive itself using only deep neural networks and nothing else. Sometimes we call that behavioral cloning, because we’re training the network to clone human driving behavior. It’s also called end-to-end learning, because the network learns to predict the correct steering angle and speed using only the inputs from the sensors. Deep learning isn’t the only approach to building a self-driving car, though. For a number of years, people have been working on a more traditional robotics approach. So why go with deep learning over the robotics approach?

Difference between the deep learning approach and the robotics approach to developing a self-driving car

The robotics approach to building self-driving cars involves a lot of detailed knowledge about sensors, controls and planning.

With the deep learning approach, we don’t have to program all that detailed knowledge into the vehicle. We simply feed all the information we have into the network, and then let the network figure out on its own what’s important. Deep learning also allows us to build a feedback loop: the more we drive, the more data we collect, which in turn allows us to learn to drive even better.

Photo by Yanshu Lee on Unsplash

Now the question arises: can we actually drive a real vehicle with just a deep neural network? We can. A startup in the US recently drove 20 miles down the highway in San Francisco to the airport and back without touching the steering wheel once. Wow. So it really works. Well then, let’s get started building a deep neural network.
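A behavioral-cloning network of the kind described above can be sketched in Keras as a regression model that maps a camera image directly to one steering-angle value. The image size and layer sizes here are illustrative, loosely in the spirit of published end-to-end driving models, not the exact architecture any team used.

```python
from tensorflow import keras
from tensorflow.keras import layers

# End-to-end steering: camera image in, one steering angle out.
model = keras.Sequential([
    keras.Input(shape=(66, 200, 3)),                       # cropped camera frame
    layers.Conv2D(24, 5, strides=2, activation="relu"),
    layers.Conv2D(36, 5, strides=2, activation="relu"),
    layers.Conv2D(48, 5, strides=2, activation="relu"),
    layers.Flatten(),
    layers.Dense(100, activation="relu"),
    layers.Dense(1),            # steering angle: regression, so no activation
])
model.compile(optimizer="adam", loss="mse")   # mean squared error for regression
```

Training data would be pairs of camera frames and the human driver’s recorded steering angles, which is what makes this “cloning.”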

Introduction to Transfer Learning

Deep learning and computer vision form an amazing set of skills with a wide range of applications. However, deep learning engineers often don’t actually start with a blank slate when building neural networks. Starting from scratch can be time-consuming: it’s not just architecting the network, but also experimenting with it, training it and adjusting it, which can take days or even weeks.

To accelerate the process, engineers often begin with a pre-trained network and then modify it. Fine-tuning an existing network is a powerful technique because improving a network takes much less effort than creating one from scratch. Going even further, we can take an existing network and re-purpose it for a related but different task. Re-purposing a network is called Transfer Learning, because we’re transferring the learning from an existing network to a new one.
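A minimal transfer-learning sketch in Keras looks like this: take an existing convolutional base, freeze its weights, and train only a new head for our own task. `weights=None` keeps the example self-contained and download-free; in practice you would load pre-trained weights (for example `weights="imagenet"`). The three-class head is an illustrative choice.

```python
from tensorflow import keras
from tensorflow.keras import layers

# An existing convolutional base (here VGG16 without its classifier head).
base = keras.applications.VGG16(weights=None, include_top=False,
                                input_shape=(224, 224, 3))
base.trainable = False          # freeze the transferred layers

# A new head for our own, different task.
model = keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(3, activation="softmax"),   # e.g. three new classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

Only the small new head trains, which is why fine-tuning takes so much less effort than training from scratch.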

Role of GPUs

GPUs have also become extraordinarily important for deep learning. GPUs are optimized for high throughput computation. Whereas CPUs are mostly optimized for latency, running a single thread of instructions as quickly as possible, GPUs are optimized for throughput, running as many simultaneous computations as possible. Throughput computing is important for computer graphics because we want to update lots of pixels on the screen at the same time. And it turns out that throughput computing is also important for deep learning because the computations fundamental to deep learning have a lot of parallelism. What level of acceleration do you typically see when you move from training a network on a CPU to a GPU?

It depends on a lot of factors, including how the software we’re running has been designed and the precise CPU and GPU being compared. For example, the low-power processor in a laptop is going to be much slower than a big server processor. But a rule of thumb is that networks train about five times faster on a GPU than on a CPU. That matches my own experience: it’s just so much faster and easier to make progress when a network trains five or ten times faster, leaving all that extra time to experiment with new approaches and get fast feedback on how they work. Another good way to get fast feedback is transfer learning: taking advantage of networks that have already been trained, instead of starting from scratch every time.

Deep Learning History

Neural networks go back all the way to the 1950s. One of the fascinating things about them, at least to me, is how long they’ve taken to become an overnight success. As far back as the late 90s there was a network called LeNet, built by Yann LeCun, that the post office and banks used to read digits on mail and checks, and it was incredibly important. But even after LeNet, interest in neural networks didn’t really pick up. Deep learning has only taken off in the last five years or so. It’s kind of a puzzle why it took so long.

The answer boils down to the increased availability of labeled data, along with the greatly increased computational throughput of modern processors. For a long time, we didn’t have the huge labeled data sets needed to make deep learning work. Those data sets only became widely available with the rise of the Internet, which made collecting and labeling huge data sets feasible. But even once we had big data sets, we often didn’t have enough computational power to make use of them. It’s only in the last few years that processors have become big enough and fast enough to train large-scale neural networks.

Pre-trained Networks

When we’re tackling a new problem with a neural network, it might help to start with an existing network that was built for a similar task and then fine-tune it for our own problem. There are a couple of good reasons to do this. First, existing neural networks can be really useful: if somebody has already taken days or weeks to train a network, a lot of intelligence is stored in that network, and taking advantage of that work can accelerate our own progress. Second, sometimes the data set for our problem is small. In those cases, it helps to look for an existing network designed for a problem similar to our own. If that network has already been trained on a larger data set, we can use it as a starting point to help our own network generalize better. To do this, it makes sense to learn a little about the most prominent pre-trained networks that already exist.

ImageNet

Over the past several years, the Internet has made it easier to generate and collect images, storage costs have dropped so that it is cheap to save large collections of images, and services like Amazon’s Mechanical Turk have made it more cost-effective to label images. That confluence of factors gave rise to ImageNet, a huge database of hand-labeled images. And the ImageNet database in turn gave rise to the ImageNet Large Scale Visual Recognition Challenge.

The ImageNet Large Scale Visual Recognition Challenge is most famous as an annual competition where teams from industry and academia try to build the best networks for object detection and localization. Kind of like this image: is this a dog or a horse?

Photo by Helena Lopes on Unsplash

It’s a horse.

Anyway, this spurred really intense competition between teams in industry and academia to produce the best image classification network. And as these teams published their approaches, we learned a lot about the best ways to build neural networks. The first big breakthrough was in 2012, when the winning submission really changed the field. It was called AlexNet, and it looked a lot like Yann LeCun’s neural network from way back in 1998.

AlexNet

AlexNet was developed at the University of Toronto. Although its fundamental architecture resembled LeNet from 1998, AlexNet was a breakthrough in several respects. First and foremost, AlexNet used the massive parallelism afforded by GPUs to accelerate training; using the best GPUs available in 2012, the AlexNet team was able to train the network in about a week. AlexNet also pioneered the use of rectified linear units (ReLUs) as an activation function and of dropout, a technique for avoiding overfitting. In 2011, the year before AlexNet was developed, the winner of the ImageNet competition successfully classified 74% of images, or, in the terminology of the competition, its error was 26%. The next year, AlexNet lowered the error to 15%. This was a huge leap forward.
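The two AlexNet innovations just mentioned, ReLU activations and dropout, are one-liners in Keras. The sketch below is a shortened AlexNet-style stack with illustrative sizes, not the network’s actual dimensions.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(227, 227, 3)),
    # ReLU instead of the sigmoid/tanh activations of earlier networks:
    layers.Conv2D(96, 11, strides=4, activation="relu"),
    layers.MaxPooling2D(3, strides=2),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    # Dropout randomly zeroes half the units during training to fight overfitting:
    layers.Dropout(0.5),
    layers.Dense(1000, activation="softmax"),   # ImageNet's 1000 classes
])
```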

AlexNet (Image by author)

AlexNet is still used today as a starting point for building and training neural networks. Actually, engineers typically use a simplified version of AlexNet, because in recent years we’ve discovered that some of AlexNet’s features aren’t really necessary, and they’ve been removed. Most implementations of AlexNet that we’ll find online reflect these changes. Since AlexNet, a number of newer networks have been developed with even higher accuracy, but AlexNet is still one of the best-understood and most widely used starting points for computer vision.

VGG

In 2014, two different groups nearly tied in the ImageNet competition with around a seven percent classification error. One of those networks is called VGGNet, or sometimes just VGG, and it came from the Visual Geometry Group at Oxford University. VGG has a simple and elegant architecture, which makes it great for transfer learning.

VGG Network (Image by author)

The VGG architecture is just a long sequence of three-by-three convolutions, broken up by two-by-two pooling layers, and finished by a trio of fully-connected layers at the end. Lots of engineers use VGG as a starting point for working on other image classification tasks, and it works really well. The flexibility of VGG is one of its great strengths.
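That pattern translates almost directly into Keras: stacks of three-by-three convolutions separated by two-by-two pooling, ending in fully-connected layers. This is a shortened illustration with a small input and small dense layers, not the full 16- or 19-layer network.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),
    # Blocks of 3x3 convolutions...
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),          # ...broken up by 2x2 pooling
    layers.Conv2D(128, 3, padding="same", activation="relu"),
    layers.Conv2D(128, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    # ...finished by a trio of fully-connected layers:
    layers.Dense(256, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(1000, activation="softmax"),
])
```

The uniformity is the point: one convolution size, one pooling size, repeated, which is what makes VGG so easy to adapt.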

GoogLeNet

In 2014, Google entered its own network in the ImageNet competition and, in homage to Yann LeCun and LeNet, named it GoogLeNet, with a capital L. In the competition, GoogLeNet performed even a little better than VGG: 6.7% compared to 7.3%, although at that level it kind of feels like we’re splitting hairs. GoogLeNet’s great advantage is that it runs really fast. The team that developed it came up with a clever building block called an Inception module, which trains really well and is efficiently deployable.

Do you remember inception?

It’s called an Inception module, and it looks a little more complicated than a plain layer. The idea is that at each layer of a ConvNet we face a choice: a pooling operation or a convolution, and if a convolution, a one-by-one, a three-by-three, or a five-by-five. All of these are beneficial to the modeling power of the network. So why choose? Let’s use them all. Here’s what an Inception module looks like.

The Naive Inception Module. (Source: Inception v1)

Instead of a single convolution, we have a composition: an average pooling followed by a one-by-one convolution, a one-by-one convolution on its own, a one-by-one followed by a three-by-three, and a one-by-one followed by a five-by-five, and at the top we simply concatenate the output of each branch. It looks complicated, but what’s interesting is that we can choose these parameters in such a way that the total number of parameters in the model is very small, yet the model performs better than if we had a simple convolution. That small parameter count is why GoogLeNet runs almost as fast as AlexNet. And of course GoogLeNet has great accuracy; as mentioned earlier, its ImageNet error was only 6.7%. GoogLeNet is a great choice to investigate if we need to run our network in real time, like maybe in a self-driving car.
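The module described above can be sketched with the Keras functional API: parallel branches over the same input, concatenated along the channel axis. The input size and filter counts are illustrative choices.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(28, 28, 192))

# The cheap 1x1 convolutions in front of the 3x3 and 5x5 branches are
# what keep the total parameter count small.
b1 = layers.Conv2D(64, 1, padding="same", activation="relu")(inputs)
b2 = layers.Conv2D(96, 1, padding="same", activation="relu")(inputs)
b2 = layers.Conv2D(128, 3, padding="same", activation="relu")(b2)
b3 = layers.Conv2D(16, 1, padding="same", activation="relu")(inputs)
b3 = layers.Conv2D(32, 5, padding="same", activation="relu")(b3)
b4 = layers.AveragePooling2D(3, strides=1, padding="same")(inputs)
b4 = layers.Conv2D(32, 1, padding="same", activation="relu")(b4)

# Concatenate the branch outputs channel-wise: 64 + 128 + 32 + 32 = 256.
outputs = layers.Concatenate()([b1, b2, b3, b4])
module = keras.Model(inputs, outputs)
```

Because every branch uses `padding="same"`, all outputs share the 28×28 spatial size, so stacking them channel-wise is well-defined.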

ResNet

The 2015 ImageNet winner was a network from Microsoft Research called ResNet. ResNet’s claim to fame is that it has a massive 152 layers. For contrast, AlexNet has eight layers, VGG has 19 layers, and GoogLeNet has 22 layers. ResNet is a bit like VGG in that the same structure is repeated again and again, layer after layer. The main idea was to add connections to the neural network that skip layers, so that very deep neural networks could practically be trained. ResNet achieves an error of only about three percent on ImageNet, which is actually better than typical human accuracy.
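The skip connection that defines ResNet is easy to sketch in Keras: the block’s input is added back onto its output, so gradients can flow around the convolutions. The filter count and input size below are illustrative.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(56, 56, 64))
x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
x = layers.Conv2D(64, 3, padding="same")(x)   # no activation yet
x = layers.Add()([x, inputs])                 # the skip connection
outputs = layers.Activation("relu")(x)        # activate after the addition
block = keras.Model(inputs, outputs)
```

Stacking many such blocks is what lets a 152-layer network train in practice: each block only has to learn a small residual correction on top of its input.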

Conclusion

AlexNet, VGG, GoogLeNet and ResNet are important neural network architectures we might want to use in future projects. And even if we build our own neural network from scratch, we’ll still want to fine-tune it. We can save our network as we train, and then come back later to experiment with new data, additional layers, or different hyperparameters.
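Saving the network as we train is itself a one-liner in Keras, via a checkpoint callback. The file path here is an illustrative choice.

```python
from tensorflow import keras

# Save the model during training so we can come back and experiment later.
checkpoint = keras.callbacks.ModelCheckpoint(
    "model.keras",            # illustrative path for the saved model
    monitor="val_loss",       # watch validation loss...
    save_best_only=True,      # ...and keep only the best result so far
)
# Passed to training like:
# model.fit(x_train, y_train, validation_split=0.2, callbacks=[checkpoint])
# and restored later with keras.models.load_model("model.keras")
```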

