Note: This article is part of CodeProject's Image Classification Challenge.
Part 1: Introduction
We’ll be building a neural network-based image classifier using Python, Keras, and Tensorflow. Using an existing data set, we’ll be teaching our neural network to determine whether or not an image contains a cat.
This concept will sound familiar if you are a fan of HBO’s Silicon Valley. In one of the show’s most popular episodes, a character created an app called Not Hotdog - which, when supplied with an image, was able to determine if the image was a picture of a hot dog. The show’s producers used Python, Keras, and Tensorflow to create the app - exactly the same tech stack we’re going to use.
Our Approach
This is going to be a code-first tutorial. When starting to work with neural networks and deep learning, it can be tempting to want to learn all of the theory before trying to create anything. We’re going to take a different approach.
We are going to dive head-first into the code, learning about concepts as we encounter them. This won’t be a deep, technical dive into the nuances of neural networks and deep learning. Instead, it will be a hands-on, developer-centric look into AI. As a developer, I've found that this is usually the best way to learn: first and foremost, start by making something that works. Seeing something in action first makes it easier to understand the theory behind it later on.
There are already great resources available for free on all of the topics we will be covering, and I’ll link to them throughout the article for anyone who wants to explore in more depth. The bulk of this article, though, will be focused on writing the code that makes our image classifier work.
What this won't be is a comprehensive introduction to neural networks, deep learning, or image classification. I'll show you a solution that gets results, which I hope will serve as a good starting point for your journey into AI. For a thorough but approachable introduction to neural networks and deep learning, I recommend Stanford's CS231n course notes.
Setup
Installing Python
Getting started is easy - we’ll need to install Python, and then install a few packages using pip. Note that recent Tensorflow releases require a 64-bit build of Python, so make sure you download one of those.
I recommend installing Python 3.7 - although 3.6 should work as well. If you’re running Windows or MacOS, you can download an installer package from the Python website.
When you run the installer, be sure to select the option that adds Python to your system’s path. This will make it much easier to work with Python from the terminal - which we’ll be doing frequently during this tutorial.
If you’re running Linux, there’s a good chance you already have a recent version of Python installed. If not, I suggest updating it via your system’s package manager. There are too many Linux distributions to go through all of the install instructions here, so Linux users may have to resort to a bit of Googling or reading documentation to get the correct version of Python installed.
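If you're not sure whether the Python you ended up with is a 64-bit build, here's a small sketch that checks using only the standard library. The function names are my own, made up for this illustration.

```python
import struct
import sys


def pointer_bits():
    """Return the pointer width of the running interpreter (32 or 64)."""
    # struct.calcsize("P") is the size in bytes of a C pointer.
    return struct.calcsize("P") * 8


def describe_interpreter():
    """Build a one-line summary of the Python version and bitness."""
    version = "{0}.{1}.{2}".format(*sys.version_info[:3])
    return "Python {0}, {1}-bit".format(version, pointer_bits())


if __name__ == "__main__":
    print(describe_interpreter())
```

If the output says 32-bit, install a 64-bit build before moving on to the package installation step.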
Installing Packages
Once Python is installed, open a terminal window (on MacOS and Linux) or a command prompt window (on Windows). Powershell on Windows will work, too. Next, run the following command:
pip install tensorflow keras numpy pillow
This will install all of the packages needed to complete this tutorial. Experienced Python users will notice that we’re installing the packages globally instead of using a virtual environment. If you know how to set up a virtual environment, feel free to do so.
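To confirm that the install worked, a quick check along these lines can help. This is only a sketch: the helper name and the dictionary are my own, and the one thing to watch for is that the pillow package is imported under the name PIL.

```python
import importlib.util

# Maps each pip package name to the module name you import.
# Pillow is the odd one out: it installs as "pillow" but imports as "PIL".
PIP_TO_MODULE = {
    "tensorflow": "tensorflow",
    "keras": "keras",
    "numpy": "numpy",
    "pillow": "PIL",
}


def missing_modules(module_names):
    """Return the module names that the import system cannot find."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(PIP_TO_MODULE.values())
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All tutorial packages were found.")
```

This only checks that each module can be located. It doesn't import Tensorflow, so it runs quickly even on a slow machine.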
Note for Readers with Nvidia GPUs
If you have a recent Nvidia GPU, you can likely use the GPU-accelerated version of Tensorflow. This will make the training of our neural network much faster. If you have a GPU that's in the GTX 9-series or higher, I highly recommend taking advantage of it. Even the modest GTX 1050 in my laptop is able to train the neural network about 50x faster than a mid-range i5 CPU.
To take advantage of GPU support, you’ll need to install some extra software from Nvidia, as well as the tensorflow-gpu pip package. The instructions are too complex to cover here, but the Tensorflow site has a great explanation of the steps needed to use GPU acceleration with Tensorflow.
Libraries
Let’s take a quick look at the Python libraries we’ll be using.
Tensorflow
Tensorflow is an open source machine learning library created by Google. We’re not going to be using it directly. Instead, we’ll be using Keras, which uses Tensorflow behind the scenes. You can also use Keras with other back-ends like Microsoft's Cognitive Toolkit. I decided to go with Tensorflow because the Tensorflow-Keras combination is what's most commonly used and easiest to find help for if you run into trouble.
Pillow
Pillow is an easy-to-use image manipulation library. We’ll be using it to resize the images that we’ll be using to train our neural network. Pillow is a fork of PIL - the Python Imaging Library. PIL was great, but it stopped receiving updates. The Pillow project picked up the PIL torch and continues to improve it. It still aims for backwards compatibility with PIL - which is why when we use Pillow, we'll be importing PIL.
NumPy
NumPy is a scientific computing library that makes it easy to efficiently perform calculations on large arrays and matrices.
Keras
Keras makes it easy to build and train many types of neural networks. Instead of performing manual calculations in a lower-level library like Tensorflow, with Keras we just define our network architecture using a friendly API and then feed training data into it.
Entry Code
Thanks for reading this far! The entry code for part 1 of the contest is simply your CodeProject member number. You can find it on your CodeProject member page. Please click here to submit your entry code. In the dropdown box, be sure to choose Round 1.
Part 2: The Code
To begin with, you can find a copy of all of this article's code on GitHub at https://github.com/rpeden/cat-or-not. I've also included a pre-trained model, in case you'd like to try it out without having to train it yourself. This might be a good idea if you're not using a GPU and you find that training just takes too long on your CPU.
You'll also need the image dataset to train and test the neural network. You can also find that on GitHub: https://github.com/rpeden/cat-or-not/releases. The .zip contains a data directory that you must unzip to the directory where you cloned cat-or-not - or the directory where you're creating it on your own.
With all of that out of the way, let's dive into the code. In a new directory, create a file named train.py. This will be where we train our neural network.
We'll start by adding all of the imports we'll need:
from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.layers.normalization import BatchNormalization
from PIL import Image
from random import shuffle, choice
import numpy as np
import os
Next, we define some constants and a couple of helper functions that will help us import images:
IMAGE_SIZE = 256
IMAGE_DIRECTORY = './data/test_set'
def label_img(name):
    if name == 'cats': return np.array([1, 0])
    elif name == 'notcats' : return np.array([0, 1])

def load_data():
    print("Loading images...")
    train_data = []
    directories = next(os.walk(IMAGE_DIRECTORY))[1]

    for dirname in directories:
        print("Loading {0}".format(dirname))
        file_names = next(os.walk(os.path.join(IMAGE_DIRECTORY, dirname)))[2]
        for i in range(200):
            image_name = choice(file_names)
            image_path = os.path.join(IMAGE_DIRECTORY, dirname, image_name)
            label = label_img(dirname)
            if "DS_Store" not in image_path:
                img = Image.open(image_path)
                img = img.convert('L')
                img = img.resize((IMAGE_SIZE, IMAGE_SIZE), Image.ANTIALIAS)
                train_data.append([np.array(img), label])

    return train_data
Let's unpack this a little. We start by defining constants representing image size and the directory where we keep our images. The directory is self-explanatory, but why do we need an image size? As it turns out, the inputs to a neural network must all be the same length. So we'll be resizing all of our images to a common size.
In general, the larger your image size, the more accurate your network will be. There's a downside, though - the larger the image size you use, the longer it will take to train your model. An image size of 256 x 256 will already cause slow training times if you're using the CPU-only build of Tensorflow. If you have a beefy GPU, feel free to experiment with image sizes to see what impact it has on the accuracy of your model.
Next, we create a label_img function. This function assigns a label of [1, 0] to images of cats, and [0, 1] to images that are not cats. Encoding our image classes as binary vectors in this way is called one-hot encoding. It's necessary because neural networks work by using vectors of numbers, and lots of vector multiplication. They wouldn't know what to do with a text label like "cat" or "hot dog".
After that, we define a function that walks through our image directories and loads 200 images from each directory. We're only loading 200 instead of the whole data set because most of us aren't using computers and GPUs that can handle all of the images at once. Don't worry - later on, we'll take a look at how to do additional training runs with more images to improve the accuracy of our model.
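One detail worth knowing: because load_data picks each file with choice, the same image can be selected more than once within a run. That's fine for a first pass. If you'd rather draw 200 distinct files, a possible alternative (not what train.py does) is random.sample. The sketch below compares the two using made-up file names and helper names of my own.

```python
from random import Random


def pick_with_replacement(file_names, count, rng):
    """Mirror the tutorial's loop: each pick is independent, so repeats can occur."""
    return [rng.choice(file_names) for _ in range(count)]


def pick_without_replacement(file_names, count, rng):
    """Alternative: draw distinct files, capped at the number available."""
    return rng.sample(file_names, min(count, len(file_names)))


if __name__ == "__main__":
    rng = Random(0)  # fixed seed so the run is repeatable
    names = ["cat_{0}.jpg".format(i) for i in range(300)]  # hypothetical file names
    with_repeats = pick_with_replacement(names, 200, rng)
    distinct = pick_without_replacement(names, 200, rng)
    print(len(set(with_repeats)), "distinct files out of", len(with_repeats))
    print(len(set(distinct)), "distinct files out of", len(distinct))
```

Either approach gives the network a random subset of the data. The second just avoids spending training time on duplicates.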
For each image, we open it, convert it to grayscale, and then resize it to our desired image size. Converting to grayscale is another way of decreasing the amount of data we have to process. Reducing image size and converting to grayscale are two techniques that are part of a larger technique known as dimensionality reduction.
In this case, converting to grayscale was based on a guess that the shapes and pixel intensities that appear in an image would provide more meaningful information for a cat vs. not cat decision than would the colors in the image. After processing, each image is added to an array. The array is returned after all images have been loaded.
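To get a feel for how much data resizing and grayscale conversion save, here's a bit of arithmetic as code. The function name is my own. It simply counts how many numbers the network receives per image.

```python
def values_per_image(side, channels):
    """Number of pixel values in a square image with the given side length."""
    return side * side * channels


GRAYSCALE = 1  # one intensity value per pixel, as produced by convert('L')
RGB = 3        # three color values per pixel

if __name__ == "__main__":
    for side in (64, 128, 256):
        print(side, values_per_image(side, GRAYSCALE), values_per_image(side, RGB))
```

At 256 x 256, a grayscale image has 65,536 values while a color image has 196,608, so dropping color cuts the input to a third. Halving the side length cuts it to a quarter.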
Next, we define a function that creates our neural network model:
def create_model():
    model = Sequential()
    model.add(Conv2D(32, kernel_size = (3, 3), activation='relu',
                     input_shape=(IMAGE_SIZE, IMAGE_SIZE, 1)))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(BatchNormalization())
    model.add(Conv2D(64, kernel_size=(3,3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(BatchNormalization())
    model.add(Conv2D(128, kernel_size=(3,3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(BatchNormalization())
    model.add(Conv2D(128, kernel_size=(3,3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(BatchNormalization())
    model.add(Conv2D(64, kernel_size=(3,3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(BatchNormalization())
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(2, activation = 'softmax'))

    return model
Let's walk through what's happening here. We start by creating an instance of Keras' Sequential class. A sequential model is exactly what it sounds like - a neural network through which data will be passed sequentially, in the order in which the layers were added to the network.
Next, you'll see a pattern - we repeatedly add a Conv2D layer to the network, followed by a MaxPooling2D layer, followed by a BatchNormalization layer. This is a very common pattern when building neural networks to classify images. Although a deep explanation of these layer types is beyond the scope of this code-first introduction, let's briefly take a look at each:
Convolutional layers - represented in this case by Keras' Conv2D class - adjust the weights of each neuron based not only on each data point in the input vector, but also on the surrounding data points. This makes some intuitive sense; features in an image tend to be defined by lines and edges, which only have meaning when you look at a pixel in relation to the other pixels that are near it.
Pooling layers - represented here by Keras' MaxPooling2D layers - reduce the overall computational power required to train and use a model, and help the model generalize to learn about features without depending on those features always being at a certain location within an image. This is handy when building a cat detector, because ideally we'd like our classifier to recognize a cat no matter where in the image the cat appears.
These explanations don't give these layers the full coverage they deserve. If you're interested in learning more, I recommend this great article.
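If it helps to see the two ideas in miniature, here is a NumPy-only sketch of what a single convolution filter and a 2 x 2 max pool do to a tiny image. This is not how Keras implements these layers internally, and the function names are mine. The kernel is a hand-picked vertical edge detector, whereas a real Conv2D layer learns its kernel values during training.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with no padding and a stride of 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            # Each output value depends on a pixel and its neighbors.
            # As in deep learning libraries, the kernel is not flipped.
            patch = image[row:row + kh, col:col + kw]
            out[row, col] = np.sum(patch * kernel)
    return out


def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block."""
    h = feature_map.shape[0] // 2 * 2  # ignore an odd trailing row or column
    w = feature_map.shape[1] // 2 * 2
    blocks = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))


# A 6x6 image that is dark on the left half and bright on the right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked kernel that responds to dark-to-bright vertical edges.
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

features = conv2d_valid(image, edge_kernel)  # 4x4 map, large where the edge is
pooled = max_pool_2x2(features)              # 2x2 summary of the 4x4 map

print(features)
print(pooled)
```

The convolution output is only large near the edge in the middle of the image, and pooling keeps that strong response while shrinking the map to a quarter of its size.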
Next, we see a BatchNormalization layer. Batch Normalization is a technique that can dramatically reduce the time required to train a deep neural network. Rather than trying to explain it in a paragraph, I'll instead refer you to this article that does a great job of explaining batch normalization - complete with Keras sample code.
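For the curious, the core arithmetic of batch normalization is short enough to sketch in NumPy. This is a simplified illustration of the training-time calculation only, with a function name and default values of my own choosing. The real Keras layer also learns the scale and shift values and keeps running averages to use at prediction time.

```python
import numpy as np


def batch_normalize(batch, gamma=1.0, beta=0.0, epsilon=1e-3):
    """Normalize each feature (column) across the batch, then scale and shift."""
    mean = batch.mean(axis=0)
    variance = batch.var(axis=0)
    # epsilon keeps the division safe when a feature has zero variance.
    normalized = (batch - mean) / np.sqrt(variance + epsilon)
    return gamma * normalized + beta


# Three samples with two features on very different scales.
batch = np.array([[1.0, 200.0],
                  [2.0, 400.0],
                  [3.0, 600.0]])

print(batch_normalize(batch))
```

After normalization, both columns are centered on zero with roughly unit spread, which keeps the values flowing between layers in a consistent range.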
Then, we see a Dropout layer. Dropout layers take a certain percentage of the input data they see and set those values to zero. This seems counterintuitive - after all, we just spent time training this network. Why get rid of some of the data? In practice, dropout layers prevent overfitting, which occurs when a neural network learns its input data a little too well. When overfitting occurs, a network will be very good at classifying the images it has been trained on, but this accuracy doesn't generalize, causing subpar performance when classifying never-before-seen images.
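Here's a rough NumPy sketch of the idea behind a dropout layer with a rate of 0.2, like the ones in create_model. The function name is mine and this isn't Keras' actual implementation. It does mirror two behaviors worth knowing: values that survive are scaled up by 1 / (1 - rate) so the overall magnitude stays about the same, and dropout is only applied during training, not when making predictions.

```python
import numpy as np


def dropout(values, rate, rng, training=True):
    """Zero out roughly `rate` of the values and rescale the ones that remain."""
    if not training:
        return values  # dropout is switched off when making predictions
    keep_mask = rng.random(values.shape) >= rate
    return values * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
activations = np.ones(10000)

dropped = dropout(activations, 0.2, rng)
print("fraction zeroed:", np.mean(dropped == 0))
print("mean after dropout:", dropped.mean())
```

Because a different random set of values is zeroed on every batch, the network can't lean too heavily on any single feature.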
The Flatten layer, as its name implies, flattens our previous multi-dimensional layers into a single-dimensional vector. This is done because we finish up by adding several Dense layers, which take a single-dimensional vector as input, and output a single-dimensional vector. Dense layers are typically used to create traditional, non-convolutional neural networks. For a good discussion of how the different layer types compare, I like this fast.ai discussion thread.
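To see what Flatten actually receives, we can trace how the image shrinks as it passes through the five Conv2D and MaxPooling2D pairs in create_model. This sketch assumes the Keras defaults that our code relies on: 3 x 3 convolutions with no padding and a stride of 1, and 2 x 2 pooling that moves two pixels at a time. The helper names are my own.

```python
def conv_output(size, kernel=3):
    """Side length after an unpadded convolution with a stride of 1."""
    return size - kernel + 1


def pool_output(size, pool=2):
    """Side length after non-overlapping pooling (any odd remainder is dropped)."""
    return size // pool


def trace_shapes(image_size, filters):
    """Return (height, width, channels) after each Conv2D + MaxPooling2D pair."""
    size = image_size
    shapes = []
    for channels in filters:
        size = pool_output(conv_output(size))
        shapes.append((size, size, channels))
    return shapes


FILTERS = [32, 64, 128, 128, 64]  # filter counts from create_model

shapes = trace_shapes(256, FILTERS)
height, width, channels = shapes[-1]
flattened_length = height * width * channels

for shape in shapes:
    print(shape)
print("Flatten output length:", flattened_length)
```

BatchNormalization and Dropout don't change the shape, so they're left out of the trace. Starting from 256 x 256, the last pooling layer produces a 6 x 6 x 64 block, which Flatten turns into a vector of 2,304 values for the first Dense layer.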
Next, we'll use the functions we've defined to load images and train our neural network:
training_data = load_data()
training_images = np.array([i[0] for i in training_data]).reshape(-1, IMAGE_SIZE, IMAGE_SIZE, 1)
training_labels = np.array([i[1] for i in training_data])
print('creating model')
model = create_model()
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print('training model')
model.fit(training_images, training_labels, batch_size=50, epochs=10, verbose=1)
model.save("model.h5")
We start by loading our training data. We then split it into two arrays. First, we pull each image into a NumPy array, and then reshape the array to match the shape of the data the first layer of our network is expecting.
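If the reshape call looks mysterious, this small sketch runs the same two lines on made-up data. I've shrunk IMAGE_SIZE to 4 and faked three "images" so the shapes are easy to read. The labels use the same [1, 0] and [0, 1] encoding that label_img produces.

```python
import numpy as np

IMAGE_SIZE = 4  # tiny stand-in for the tutorial's 256

# Fake data in the same [image, label] layout that load_data returns.
fake_data = [
    [np.zeros((IMAGE_SIZE, IMAGE_SIZE)), np.array([1, 0])],
    [np.ones((IMAGE_SIZE, IMAGE_SIZE)), np.array([0, 1])],
    [np.ones((IMAGE_SIZE, IMAGE_SIZE)), np.array([1, 0])],
]

# -1 tells NumPy to work out the number of images on its own.
# The trailing 1 is the single grayscale channel.
images = np.array([item[0] for item in fake_data]).reshape(-1, IMAGE_SIZE, IMAGE_SIZE, 1)
labels = np.array([item[1] for item in fake_data])

print(images.shape)  # (3, 4, 4, 1)
print(labels.shape)  # (3, 2)
```

The four dimensions are (number of images, height, width, channels), which lines up with the input_shape of (IMAGE_SIZE, IMAGE_SIZE, 1) given to the first Conv2D layer.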
Next, we pull out our one-hot-encoded labels into a NumPy array. We then call our create_model function and compile the model to prepare it for use. Loss functions and optimizers are important parts of deep learning, but to get us up and running quickly, I'm outsourcing the explanation of them to this article. You can also read the Keras documentation on loss functions and optimizers to see what it supports.
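For a little intuition about what the loss measures, here's a plain-Python sketch of a softmax output and a binary cross-entropy calculation for a single image. It's a simplified illustration with function names of my own, not Keras' implementation, but it shows the key property: confident correct predictions get a small loss, and confident wrong ones get a large loss.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def binary_crossentropy(label, prediction, epsilon=1e-7):
    """Average the per-output cross-entropy between a label and a prediction."""
    losses = []
    for y, p in zip(label, prediction):
        p = min(max(p, epsilon), 1.0 - epsilon)  # avoid taking log(0)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return sum(losses) / len(losses)


cat_label = [1, 0]  # the one-hot label for a cat

print(softmax([2.0, 0.0]))                         # probabilities for [cat, not cat]
print(binary_crossentropy(cat_label, [0.9, 0.1]))  # confident and right: small loss
print(binary_crossentropy(cat_label, [0.1, 0.9]))  # confident and wrong: large loss
```

During training, the optimizer nudges the network's weights in whichever direction makes this number smaller across the batch.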
Finally, we call model.fit to train our model using the images we loaded. We pass in the training images and labels we loaded, and feed them to our model in batches of 50. We run our training for 10 epochs, which means the set of training images is fed to the neural network 10 times.
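To make batches and epochs concrete, here's the arithmetic for our run as a sketch. I'm assuming the data directory holds two folders (cats and notcats), so load_data returns at most 2 x 200 = 400 images. The function names are mine.

```python
import math


def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to show every image once."""
    return math.ceil(num_images / batch_size)


def total_steps(num_images, batch_size, epochs):
    """Number of batches processed over the whole training run."""
    return steps_per_epoch(num_images, batch_size) * epochs


NUM_IMAGES = 400  # assumption: 200 images from each of two directories
BATCH_SIZE = 50
EPOCHS = 10

print("batches per epoch:", steps_per_epoch(NUM_IMAGES, BATCH_SIZE))
print("batches in total:", total_steps(NUM_IMAGES, BATCH_SIZE, EPOCHS))
```

With these numbers the network sees 8 batches per epoch and 80 batches over the full run, adjusting its weights after each one.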
I'll warn you in advance - a training run of 10 epochs on a subset of our training image set will not result in a great image classifier. I chose a small number to start because I want CPU-only users to be able to watch their training complete in a reasonable amount of time. If you're using a GPU, feel free to increase the number of epochs and the number of images you load at once. Each run will then cover more of the data, so you should need fewer runs to reach a given accuracy.
To do additional training runs, see retrain.py in the GitHub repo linked above. Instead of creating a new network from scratch, it loads the existing network from disk, trains it using a new set of randomly selected images, and then saves the result to disk. In general, each time you run retrain.py, your network will become a bit more accurate. You'll see the occasional training run where accuracy decreases, but don't let this deter you from doing more training runs. Your model accuracy will trend upward for quite a while before it levels off. Be careful not to re-run train.py, though, unless your intent is to overwrite your existing model with a new one because that's what will happen.
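If you're worried about clobbering a model you've spent a while training, one option is to pick a fresh file name whenever the target already exists. The helper below is hypothetical - it isn't part of the cat-or-not repo - and is only a sketch of the idea using the standard library.

```python
import os
import tempfile


def next_free_path(path):
    """Return `path` if unused, else the first 'name-N.ext' that does not exist."""
    if not os.path.exists(path):
        return path
    base, ext = os.path.splitext(path)
    counter = 1
    while os.path.exists("{0}-{1}{2}".format(base, counter, ext)):
        counter += 1
    return "{0}-{1}{2}".format(base, counter, ext)


if __name__ == "__main__":
    # Demonstrate in a throwaway directory so no real model is touched.
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "model.h5")
        print(os.path.basename(next_free_path(target)))  # nothing saved yet
        open(target, "w").close()                        # pretend a model was saved
        print(os.path.basename(next_free_path(target)))  # avoids the existing file
```

In train.py, you could then pass next_free_path("model.h5") to model.save instead of the fixed file name.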
Finally, test.py loads the test images - which the neural network has never seen before - and uses them to test the accuracy of the network.
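I won't reproduce test.py here, but the heart of an accuracy check can be sketched in a few lines of NumPy. The predictions below are invented numbers standing in for what a two-output softmax model returns, and the function name is my own.

```python
import numpy as np


def accuracy(predictions, labels):
    """Fraction of images where the most likely class matches the label."""
    predicted_classes = np.argmax(predictions, axis=1)
    true_classes = np.argmax(labels, axis=1)
    return float(np.mean(predicted_classes == true_classes))


# Made-up softmax outputs for four images: [P(cat), P(not cat)].
predictions = np.array([[0.9, 0.1],
                        [0.3, 0.7],
                        [0.6, 0.4],
                        [0.2, 0.8]])

# One-hot labels in the same format label_img produces.
labels = np.array([[1, 0],
                   [0, 1],
                   [0, 1],
                   [0, 1]])

print(accuracy(predictions, labels))  # 0.75
```

The third image is the only miss: the model leaned toward "cat" but the label says otherwise, so three of the four predictions are correct.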
Entry Code
To generate your entry code for this part of the contest, you'll need to use the entry.py file in the GitHub repository. To generate the code, run python entry.py 12345678, replacing 12345678 with your CodeProject member number. Click here to submit your entry code. Be sure to select Round 2 in the dropdown box.
Conclusion
We’ve come a long way in a short time!
Starting from scratch, we have built an image classifier using Python, Keras, and Tensorflow. We’ve also trained it to determine whether or not an image is a cat.
We’re only scratching the surface of what’s possible. For a few ideas of what to try next, consider training a network on a famous dataset like CIFAR-10, or try constructing an application using a model pre-trained on ImageNet.
If you're curious, you can try adding layers to the network we just built to see what happens. You can even try removing layers, or altering the parameters of the layers to see how it changes the results you get. You might notice that the 'not cat' data set isn't very big, and it's mostly made up of dogs, flowers, and household objects. Those are only a subset of things that aren't cats. There are many image datasets freely available on the web. Another good experiment would be to use these to augment your 'not cat' data set, and then do additional training on your model to see if its accuracy improves.