Image Classification with Keras


This notebook is intended for use in Google Colab (or another GPU-based environment) because it is extremely slow to run on CPU only

Image Processing with Keras

The notes here are based on this YouTube series from Jeff Heaton; further information on CNNs can also be found in this series from Stanford

# Set Colab TF to 2.x

try:
    %tensorflow_version 2.x
    COLAB = True
    print("Note: using Google CoLab")
except:
    print("Note: not using Google CoLab")
    COLAB = False
TensorFlow 2.x selected.
Note: using Google CoLab

Overview

For processing images we'll use PIL (the Pillow package), which lets us work with images directly in Python

To install PIL use:

pip install pillow

We'll also use the requests package to get images from the internet via an HTTP request

IMAGE_URL = 'https://upload.wikimedia.org/wikipedia/commons/9/92/Brookings.jpg'
%matplotlib inline

import pandas as pd
import numpy as np
import requests

from io import BytesIO
from matplotlib.pyplot import imshow

from PIL import Image

We can import an image with:

response = requests.get(IMAGE_URL)

img = Image.open(BytesIO(response.content))
img.load()

img

Each image that we import has its pixels in an array grouped by position, with each position holding a colour list. So for an image that's 3x3 pixels, we have an array like so:

[
  [[r, g, b], [r, g, b], [r, g, b]],
  [[r, g, b], [r, g, b], [r, g, b]],
  [[r, g, b], [r, g, b], [r, g, b]],
]

We can see our data set by converting the image to an array:

np.array(img)
array([[[ 86, 133, 177],
        [ 85, 132, 176],
        [ 84, 133, 176],
        ...,
        [ 94, 128, 153],
        [ 91, 128, 155],
        [ 94, 129, 169]],

       [[ 86, 133, 177],
        [ 88, 135, 179],
        [ 88, 137, 180],
        ...,
        [ 96, 133, 159],
        [ 92, 136, 165],
        [ 99, 141, 183]],

       [[ 83, 130, 174],
        [ 87, 134, 178],
        [ 89, 138, 181],
        ...,
        [108, 150, 175],
        [100, 149, 179],
        [ 97, 144, 186]],

       ...,

       [[127,  77,  76],
        [131,  81,  80],
        [128,  80,  76],
        ...,
        [  4,  10,  10],
        [  2,  11,  10],
        [  2,  11,  10]],

       [[132,  81,  77],
        [129,  80,  75],
        [124,  75,  70],
        ...,
        [  4,  10,  10],
        [  3,  12,  11],
        [  3,  12,  11]],

       [[140,  90,  83],
        [137,  87,  80],
        [130,  81,  74],
        ...,
        [ 11,  17,  17],
        [ 10,  19,  18],
        [ 10,  19,  18]]], dtype=uint8)

Using the PIL library we can also generate images from an array of data; for example, a 64x64 pixel image can be created with the following:

w, h = 64,  64
data = np.zeros((h, w, 3), dtype=np.uint8)

def assign_pixels(rgb, row_start, col_start):
  for row in range(32):
    for col in range(32):
      data[row + row_start, col + col_start] = rgb

# yellow
assign_pixels([255, 255, 0], 0, 0)
# red
assign_pixels([255, 0, 0], 32, 0)
# blue
assign_pixels([0, 0, 255], 0, 32)
#green
assign_pixels([0, 255, 0], 32, 32)

img = Image.fromarray(data, 'RGB')

img

Reading, writing, and processing images can be combined using PIL and numpy. When working with images, some preprocessing tasks we may want to do are listed below (a rough sketch follows the list):

  1. Size and shape normalization
  2. Greyscaling
  3. Flattening of image data to a 1D array
  4. Normalizing pixel values from 0 -> 255 to -126 -> 126
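
This is a minimal sketch of those steps, assuming img is the PIL image loaded earlier and that 128x128 is our (arbitrary) target size:

img_small = img.resize((128, 128))       # 1. size/shape normalization
img_grey = img_small.convert('L')        # 2. greyscale
pixels = np.array(img_grey, dtype='float32')
flat = pixels.flatten()                  # 3. flatten to a 1D array
centred = pixels - 127.5                 # 4. roughly centre the 0 -> 255 values around 0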

Computer Vision

For computer vision work we can make use of something like Colab to ensure that we have a GPU to run on, otherwise these tasks can take a very long time. When setting up, we'll use the Python with GPU configuration on Colab
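
To confirm that a GPU is actually visible to TensorFlow we can ask it directly; this is a quick check, assuming a recent TF 2.x runtime:

import tensorflow as tf

# an empty list here means we're running CPU-only
print(tf.config.list_physical_devices('GPU'))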

When using image type data there are some distinctions to when we use NN's for other tasks:

  • Usually classification
  • Input is now 3 dimensional - height, width, colour
  • Data is not transformed, no Z-scores or Dummy Variables
  • Processing is much slower
  • Different Layer types such as Dense, Convolutional, and Max Pooling
  • Data will be in the form of image files and not CSV (TF provides some mechanisms to support with this)

Some common ML datasets are MNIST Digits and MNIST Fashion, which share the same data structure, as well as the CIFAR data, which is commonly used for ResNet training

Convolutional Neural Networks

A Convolution Layer is a layer type that scans across the previous layer; this allows it to identify features that are positioned relative to other features

In a Convolution Layer some of the things we need to specify are listed below (an example layer follows the list):

  • Number of filters
  • Filter size
  • Stride
  • Padding
  • Activation Function
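
As a standalone illustration (not part of the models built later in these notes), a single Conv2D layer with each of these options set explicitly might look like:

from tensorflow.keras.layers import Conv2D

conv = Conv2D(
    filters=32,          # number of filters
    kernel_size=(3, 3),  # filter size
    strides=(1, 1),      # stride
    padding='same',      # padding ('same' keeps the spatial size, 'valid' does not pad)
    activation='relu'    # activation function
)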

Max Pooling

After a convolution we may want to subsample the previous Convolution Layer in order to either connect to an output based on a Dense Layer or pass it into another Convolution Layer to identify even higher order features

Max pooling layers help us to decrease resolution
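
As a small standalone illustration, a 2x2 max pool with the default stride keeps only the largest value in each 2x2 block, which halves the width and height:

import numpy as np
import tensorflow as tf

# a single 4x4, 1-channel "image" shaped (batch, height, width, channels)
x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 6, 8]], dtype='float32').reshape(1, 4, 4, 1)

pooled = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
print(pooled.numpy().reshape(2, 2))
# [[6. 4.]
#  [7. 9.]]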

MNIST Digit Dataset

Importing Data

We can import the MNIST dataset from TF to use like so:

%matplotlib inline

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
from IPython.display import display

import tensorflow.keras 
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Conv2D, MaxPooling2D, Flatten
from tensorflow.keras import backend as K
from tensorflow.keras import regularizers
from tensorflow.keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()

print(f"Training: X {X_train.shape} Y {y_train.shape}")
print(f"Testing : X {X_test.shape} Y {y_test.shape}")
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step
Training: X (60000, 28, 28) Y (60000,)
Testing : X (10000, 28, 28) Y (10000,)

Based on the above we can see that we have a set of images with a size of 28x28. We can view the raw data for one of these with:

pd.DataFrame(X_train[0])
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0 0 3 18 18 18 126 136 175 26 166 255 247 127 0 0 0 0
6 0 0 0 0 0 0 0 0 30 36 94 154 170 253 253 253 253 253 225 172 253 242 195 64 0 0 0 0
7 0 0 0 0 0 0 0 49 238 253 253 253 253 253 253 253 253 251 93 82 82 56 39 0 0 0 0 0
8 0 0 0 0 0 0 0 18 219 253 253 253 253 253 198 182 247 241 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 80 156 107 253 253 205 11 0 43 154 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 14 1 154 253 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0 0 139 253 190 2 0 0 0 0 0 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0 0 0 0 0 11 190 253 70 0 0 0 0 0 0 0 0 0 0 0 0 0
13 0 0 0 0 0 0 0 0 0 0 0 0 35 241 225 160 108 1 0 0 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0 0 0 0 0 81 240 253 253 119 25 0 0 0 0 0 0 0 0 0
15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 45 186 253 253 150 27 0 0 0 0 0 0 0 0
16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 93 252 253 187 0 0 0 0 0 0 0 0
17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 249 253 249 64 0 0 0 0 0 0 0
18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 46 130 183 253 253 207 2 0 0 0 0 0 0 0
19 0 0 0 0 0 0 0 0 0 0 0 0 39 148 229 253 253 253 250 182 0 0 0 0 0 0 0 0
20 0 0 0 0 0 0 0 0 0 0 24 114 221 253 253 253 253 201 78 0 0 0 0 0 0 0 0 0
21 0 0 0 0 0 0 0 0 23 66 213 253 253 253 253 198 81 2 0 0 0 0 0 0 0 0 0 0
22 0 0 0 0 0 0 18 171 219 253 253 253 253 195 80 9 0 0 0 0 0 0 0 0 0 0 0 0
23 0 0 0 0 55 172 226 253 253 253 253 244 133 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24 0 0 0 0 136 253 253 253 212 135 132 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
26 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Or as an image using plt.imshow:

plt.imshow(X_train[0], cmap='gray', interpolation='nearest')
<matplotlib.image.AxesImage at 0x7fc62a5c62b0>

Training a Network

Preprocessing

Before training a network we'll do some preprocessing to format the data into something we can use directly in our network:

batch_size = 128
num_classes = 10
epochs = 12
img_rows, img_cols = 28, 28
# the below may be necessary to reshape the data based on the Keras backend
# for example there could be different image format requirements for TF vs
# another library that Keras is compatible with

if K.image_data_format() == 'channels_first':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

# normalize the X
X_train_norm = X_train.astype('float32') / 255
X_test_norm = X_test.astype('float32') / 255

# categorize the y
y_train_cat = to_categorical(y_train, num_classes)
y_test_cat = to_categorical(y_test, num_classes)
input_shape
(28, 28, 1)
X_train_norm.shape, X_test_norm.shape
((60000, 28, 28, 1), (10000, 28, 28, 1))
y_train_cat.shape, y_test_cat.shape
((60000, 10), (10000, 10))

Train Model

  1. Define Sequential Model
  2. Create a few Conv2D layers
  3. Use a MaxPooling2D layer to reduce the resolution
  4. Flatten the data to pass to a Dense Layer
  5. Use a Dense Layer
  6. Add some Dropout
  7. Add the output Dense Layer
model = Sequential()

model.add(Conv2D(
    64,                     # number of filters
    (3, 3),                 # kernel size
    activation='relu',      # activation function
    input_shape=input_shape # input shape
))

model.add(Conv2D(
    64,
    (3, 3),
    activation='relu'
))

model.add(MaxPooling2D(
    pool_size=(2, 2)
))

model.add(Dropout(0.25))

model.add(Flatten())       # always need to flatten when moving from Conv layers to a Dense layer

model.add(Dense(
    num_classes, 
    activation='softmax'
))

model.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 26, 26, 64)        640       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 24, 24, 64)        36928     
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 12, 12, 64)        0         
_________________________________________________________________
dropout (Dropout)            (None, 12, 12, 64)        0         
_________________________________________________________________
flatten (Flatten)            (None, 9216)              0         
_________________________________________________________________
dense (Dense)                (None, 10)                92170     
=================================================================
Total params: 129,738
Trainable params: 129,738
Non-trainable params: 0
_________________________________________________________________

Next, we can fit the model. We also include some code to see how long the overall training run takes

import time

print(f"Start: {time.ctime()}")

model.fit(
    X_train_norm, y_train_cat,
    batch_size=batch_size,
    epochs=epochs,
    verbose=2,
    validation_data=(X_test_norm, y_test_cat)
)

print(f"End: {time.ctime()}")
Start: Sun Mar 22 15:28:46 2020
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
60000/60000 - 14s - loss: 0.2013 - accuracy: 0.9408 - val_loss: 0.0561 - val_accuracy: 0.9831
Epoch 2/12
60000/60000 - 8s - loss: 0.0622 - accuracy: 0.9809 - val_loss: 0.0469 - val_accuracy: 0.9849
Epoch 3/12
60000/60000 - 8s - loss: 0.0470 - accuracy: 0.9856 - val_loss: 0.0423 - val_accuracy: 0.9862
Epoch 4/12
60000/60000 - 8s - loss: 0.0363 - accuracy: 0.9888 - val_loss: 0.0361 - val_accuracy: 0.9885
Epoch 5/12
60000/60000 - 8s - loss: 0.0313 - accuracy: 0.9900 - val_loss: 0.0373 - val_accuracy: 0.9874
Epoch 6/12
60000/60000 - 8s - loss: 0.0273 - accuracy: 0.9912 - val_loss: 0.0344 - val_accuracy: 0.9882
Epoch 7/12
60000/60000 - 8s - loss: 0.0226 - accuracy: 0.9927 - val_loss: 0.0340 - val_accuracy: 0.9889
Epoch 8/12
60000/60000 - 8s - loss: 0.0194 - accuracy: 0.9934 - val_loss: 0.0410 - val_accuracy: 0.9887
Epoch 9/12
60000/60000 - 8s - loss: 0.0168 - accuracy: 0.9943 - val_loss: 0.0355 - val_accuracy: 0.9894
Epoch 10/12
60000/60000 - 8s - loss: 0.0146 - accuracy: 0.9949 - val_loss: 0.0378 - val_accuracy: 0.9885
Epoch 11/12
60000/60000 - 8s - loss: 0.0133 - accuracy: 0.9952 - val_loss: 0.0400 - val_accuracy: 0.9884
Epoch 12/12
60000/60000 - 8s - loss: 0.0123 - accuracy: 0.9960 - val_loss: 0.0352 - val_accuracy: 0.9902
End: Sun Mar 22 15:30:29 2020

Evaluate Accuracy

Next we'll evaluate the accuracy of the model using our usual method:

score = model.evaluate(
    X_test_norm, 
    y_test_cat, 
    verbose=0
)

print(f"Loss    : {score[0]}")
print(f"Accuracy: {score[1]}")
Loss    : 0.03515712368493842
Accuracy: 0.9901999831199646

MNIST Fashion Dataset

Import Data

from tensorflow.keras.datasets import fashion_mnist

(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

print(f"Training: X {X_train.shape} Y {y_train.shape}")
print(f"Testing : X {X_test.shape} Y {y_test.shape}")
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
32768/29515 [=================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26427392/26421880 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
8192/5148 [===============================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4423680/4422102 [==============================] - 0s 0us/step
Training: X (60000, 28, 28) Y (60000,)
Testing : X (10000, 28, 28) Y (10000,)

The Fashion dataset works pretty much as a drop-in replacement for the Digits dataset; we can copy all of the code from above as-is and we should be able to train the model

View Data

pd.DataFrame(X_train[0])
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 13 73 0 0 1 4 0 0 0 0 1 1 0
4 0 0 0 0 0 0 0 0 0 0 0 0 3 0 36 136 127 62 54 0 0 0 1 3 4 0 0 3
5 0 0 0 0 0 0 0 0 0 0 0 0 6 0 102 204 176 134 144 123 23 0 0 0 0 12 10 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 155 236 207 178 107 156 161 109 64 23 77 130 72 15
7 0 0 0 0 0 0 0 0 0 0 0 1 0 69 207 223 218 216 216 163 127 121 122 146 141 88 172 66
8 0 0 0 0 0 0 0 0 0 1 1 1 0 200 232 232 233 229 223 223 215 213 164 127 123 196 229 0
9 0 0 0 0 0 0 0 0 0 0 0 0 0 183 225 216 223 228 235 227 224 222 224 221 223 245 173 0
10 0 0 0 0 0 0 0 0 0 0 0 0 0 193 228 218 213 198 180 212 210 211 213 223 220 243 202 0
11 0 0 0 0 0 0 0 0 0 1 3 0 12 219 220 212 218 192 169 227 208 218 224 212 226 197 209 52
12 0 0 0 0 0 0 0 0 0 0 6 0 99 244 222 220 218 203 198 221 215 213 222 220 245 119 167 56
13 0 0 0 0 0 0 0 0 0 4 0 0 55 236 228 230 228 240 232 213 218 223 234 217 217 209 92 0
14 0 0 1 4 6 7 2 0 0 0 0 0 237 226 217 223 222 219 222 221 216 223 229 215 218 255 77 0
15 0 3 0 0 0 0 0 0 0 62 145 204 228 207 213 221 218 208 211 218 224 223 219 215 224 244 159 0
16 0 0 0 0 18 44 82 107 189 228 220 222 217 226 200 205 211 230 224 234 176 188 250 248 233 238 215 0
17 0 57 187 208 224 221 224 208 204 214 208 209 200 159 245 193 206 223 255 255 221 234 221 211 220 232 246 0
18 3 202 228 224 221 211 211 214 205 205 205 220 240 80 150 255 229 221 188 154 191 210 204 209 222 228 225 0
19 98 233 198 210 222 229 229 234 249 220 194 215 217 241 65 73 106 117 168 219 221 215 217 223 223 224 229 29
20 75 204 212 204 193 205 211 225 216 185 197 206 198 213 240 195 227 245 239 223 218 212 209 222 220 221 230 67
21 48 203 183 194 213 197 185 190 194 192 202 214 219 221 220 236 225 216 199 206 186 181 177 172 181 205 206 115
22 0 122 219 193 179 171 183 196 204 210 213 207 211 210 200 196 194 191 195 191 198 192 176 156 167 177 210 92
23 0 0 74 189 212 191 175 172 175 181 185 188 189 188 193 198 204 209 210 210 211 188 188 194 192 216 170 0
24 2 0 0 0 66 200 222 237 239 242 246 243 244 221 220 193 191 179 182 182 181 176 166 168 99 58 0 0
25 0 0 0 0 0 0 0 40 61 44 72 41 35 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
26 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
plt.imshow(X_train[0], cmap='gray', interpolation='nearest')
<matplotlib.image.AxesImage at 0x7fc607fb3ef0>

Preprocess Data

batch_size = 128
num_classes = 10
epochs = 12
img_rows, img_cols = 28, 28
# the below may be necessary to reshape the data based on the Keras backend
# for example there could be different image format requirements for TF vs
# another library that Keras is compatible with

if K.image_data_format() == 'channels_first':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

# normalize the X
X_train_norm = X_train.astype('float32') / 255
X_test_norm = X_test.astype('float32') / 255

# categorize the y
y_train_cat = to_categorical(y_train, num_classes)
y_test_cat = to_categorical(y_test, num_classes)
input_shape
(28, 28, 1)
X_train_norm.shape, X_test_norm.shape
((60000, 28, 28, 1), (10000, 28, 28, 1))
y_train_cat.shape, y_test_cat.shape
((60000, 10), (10000, 10))

Define Model

model = Sequential()

model.add(Conv2D(
    64,                     # number of filters
    (3, 3),                 # kernel size
    activation='relu',      # activation function
    input_shape=input_shape # input shape
))

model.add(Conv2D(
    64,
    (3, 3),
    activation='relu'
))

model.add(MaxPooling2D(
    pool_size=(2, 2)
))

model.add(Dropout(0.25))

model.add(Flatten())       # always need to flatten when moving from Conv layers to a Dense layer

model.add(Dense(
    num_classes, 
    activation='softmax'
))

model.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_2 (Conv2D)            (None, 26, 26, 64)        640       
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 24, 24, 64)        36928     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 64)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 12, 12, 64)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 9216)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                92170     
=================================================================
Total params: 129,738
Trainable params: 129,738
Non-trainable params: 0
_________________________________________________________________

Train Model

import time

print(f"Start: {time.ctime()}")

model.fit(
    X_train_norm, y_train_cat,
    batch_size=batch_size,
    epochs=epochs,
    verbose=2,
    validation_data=(X_test_norm, y_test_cat)
)

print(f"End: {time.ctime()}")
Start: Sun Mar 22 15:30:32 2020
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
60000/60000 - 9s - loss: 0.4766 - accuracy: 0.8304 - val_loss: 0.3773 - val_accuracy: 0.8646
Epoch 2/12
60000/60000 - 8s - loss: 0.3117 - accuracy: 0.8907 - val_loss: 0.3111 - val_accuracy: 0.8879
Epoch 3/12
60000/60000 - 8s - loss: 0.2711 - accuracy: 0.9044 - val_loss: 0.2719 - val_accuracy: 0.9026
Epoch 4/12
60000/60000 - 8s - loss: 0.2426 - accuracy: 0.9133 - val_loss: 0.2580 - val_accuracy: 0.9098
Epoch 5/12
60000/60000 - 8s - loss: 0.2197 - accuracy: 0.9201 - val_loss: 0.2515 - val_accuracy: 0.9089
Epoch 6/12
60000/60000 - 8s - loss: 0.2031 - accuracy: 0.9262 - val_loss: 0.2401 - val_accuracy: 0.9152
Epoch 7/12
60000/60000 - 8s - loss: 0.1898 - accuracy: 0.9316 - val_loss: 0.2320 - val_accuracy: 0.9170
Epoch 8/12
60000/60000 - 8s - loss: 0.1785 - accuracy: 0.9348 - val_loss: 0.2334 - val_accuracy: 0.9182
Epoch 9/12
60000/60000 - 8s - loss: 0.1670 - accuracy: 0.9391 - val_loss: 0.2293 - val_accuracy: 0.9207
Epoch 10/12
60000/60000 - 8s - loss: 0.1547 - accuracy: 0.9437 - val_loss: 0.2274 - val_accuracy: 0.9230
Epoch 11/12
60000/60000 - 8s - loss: 0.1469 - accuracy: 0.9459 - val_loss: 0.2404 - val_accuracy: 0.9177
Epoch 12/12
60000/60000 - 8s - loss: 0.1389 - accuracy: 0.9491 - val_loss: 0.2251 - val_accuracy: 0.9229
End: Sun Mar 22 15:32:09 2020

Evaluate Model

score = model.evaluate(
    X_test_norm, 
    y_test_cat, 
    verbose=0
)

print(f"Loss    : {score[0]}")
print(f"Accuracy: {score[1]}")
Loss    : 0.22508650785684586
Accuracy: 0.9229000210762024

ResNets in Keras

A Residual Layer, also known as a skip layer, allows us to add some previous output as an additional input to another layer. This enables our networks to go deeper than they would under normal circumstances while still showing a potential improvement in the output

We can look at implementing a ResNet using the CIFAR Dataset
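
Before the full implementation, here is a minimal sketch of a single skip connection using the Keras functional API (standalone, and much simpler than the resnet_layer helpers defined below):

from tensorflow.keras.layers import Input, Conv2D, Activation, add
from tensorflow.keras.models import Model

inputs = Input(shape=(32, 32, 16))

# main path: two convolutions that keep the same shape
y = Conv2D(16, (3, 3), padding='same', activation='relu')(inputs)
y = Conv2D(16, (3, 3), padding='same')(y)

# skip connection: add the block's input back onto its output
x = add([inputs, y])
x = Activation('relu')(x)

block = Model(inputs=inputs, outputs=x)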

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import os

from six.moves import cPickle 

import tensorflow.keras
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import Dense, Conv2D, BatchNormalization, Activation
from tensorflow.keras.layers import AveragePooling2D, Input, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, LearningRateScheduler
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.regularizers import l2
from tensorflow.keras import backend as K
from tensorflow.keras.models import Model

Import the Data

from tensorflow.keras.datasets import cifar10

(X_train, y_train), (X_test, y_test) = cifar10.load_data()
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 6s 0us/step
(X_train.shape, X_test.shape), (y_train.shape, y_test.shape)
(((50000, 32, 32, 3), (10000, 32, 32, 3)), ((50000, 1), (10000, 1)))
plt.imshow(X_train[0], cmap='gray', interpolation='nearest')
<matplotlib.image.AxesImage at 0x7fc60719bdd8>

Constants for Training

# Training parameters
BATCH_SIZE = 32  # orig paper trained all networks with batch_size=128
EPOCHS = 200 # 200
USE_AUGMENTATION = True
NUM_CLASSES = np.unique(y_train).shape[0] # 10
COLORS = X_train.shape[3]

# Subtracting pixel mean improves accuracy
# This centers the pixel values around 0
SUBTRACT_PIXEL_MEAN = True

# Model version
# Orig paper: version = 1 (ResNet v1), Improved ResNet: version = 2 (ResNet v2)
VERSION = 1

# Computed depth from supplied model parameter n
if VERSION == 1:
    DEPTH = COLORS * 6 + 2
elif VERSION == 2:
    DEPTH = COLORS * 9 + 2

Defining the ResNet Functions

The different ResNet functions based on the two papers can be seen defined below. They both make use of the common resnet_layer function definition

The papers are:

  1. ResNet v1: K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385,2015.
  2. ResNet v2: He, K., Zhang, X., Ren, S., & Sun, J. (2016, October). Identity mappings in deep residual networks. In European conference on computer vision (pp. 630-645). Springer, Cham.

The difference between the two is that V2 makes use of batch normalization before each weight layer

ResNet Layer Definition

def lr_schedule(epoch):
    """Learning Rate Schedule

    Learning rate is scheduled to be reduced after 80, 120, 160, 180 epochs.
    Called automatically every epoch as part of callbacks during training.

    # Arguments
        epoch (int): the current epoch number (used to pick the rate)

    # Returns
        lr (float32): learning rate
    """
    lr = 1e-3
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    print('Learning rate: ', lr)
    return lr
def resnet_layer(inputs,
                 num_filters=16,
                 kernel_size=3,
                 strides=1,
                 activation='relu',
                 batch_normalization=True,
                 conv_first=True):
    """2D Convolution-Batch Normalization-Activation stack builder

    # Arguments
        inputs (tensor): input tensor from input image or previous layer
        num_filters (int): Conv2D number of filters
        kernel_size (int): Conv2D square kernel dimensions
        strides (int): Conv2D square stride dimensions
        activation (string): activation name
        batch_normalization (bool): whether to include batch normalization
        conv_first (bool): conv-bn-activation (True) or
            bn-activation-conv (False)

    # Returns
        x (tensor): tensor as input to the next layer
    """
    conv = Conv2D(num_filters,
                  kernel_size=kernel_size,
                  strides=strides,
                  padding='same',
                  kernel_initializer='he_normal',
                  kernel_regularizer=l2(1e-4))

    x = inputs
    if conv_first:
        x = conv(x)
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
    else:
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
        x = conv(x)
    return x

ResNet v1

def resnet_v1(input_shape, depth, num_classes=10):
    """ResNet Version 1 Model builder [a]

    Stacks of 2 x (3 x 3) Conv2D-BN-ReLU
    Last ReLU is after the shortcut connection.
    At the beginning of each stage, the feature map size is halved (downsampled)
    by a convolutional layer with strides=2, while the number of filters is
    doubled. Within each stage, the layers have the same number of filters and
    the same feature map sizes.
    Features maps sizes:
    stage 0: 32x32, 16
    stage 1: 16x16, 32
    stage 2:  8x8,  64
    The Number of parameters is approx the same as Table 6 of [a]:
    ResNet20 0.27M
    ResNet32 0.46M
    ResNet44 0.66M
    ResNet56 0.85M
    ResNet110 1.7M

    # Arguments
        input_shape (tensor): shape of input image tensor
        depth (int): number of core convolutional layers
        num_classes (int): number of classes (CIFAR10 has 10)

    # Returns
        model (Model): Keras model instance
    """
    if (depth - 2) % 6 != 0:
        raise ValueError('depth should be 6n+2 (eg 20, 32, 44 in [a])')
    # Start model definition.
    num_filters = 16
    num_res_blocks = int((depth - 2) / 6)

    inputs = Input(shape=input_shape)
    x = resnet_layer(inputs=inputs)
    # Instantiate the stack of residual units
    for stack in range(3):
        for res_block in range(num_res_blocks):
            strides = 1
            if stack > 0 and res_block == 0:  # first layer but not first stack
                strides = 2  # downsample
            y = resnet_layer(inputs=x,
                             num_filters=num_filters,
                             strides=strides)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters,
                             activation=None)
            if stack > 0 and res_block == 0:  # first layer but not first stack
                # linear projection residual shortcut connection to match
                # changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = tensorflow.keras.layers.add([x, y])
            x = Activation('relu')(x)
        num_filters *= 2

    # Add classifier on top.
    # v1 does not use BN after last shortcut connection-ReLU
    x = AveragePooling2D(pool_size=8)(x)
    y = Flatten()(x)
    outputs = Dense(num_classes,
                    activation='softmax',
                    kernel_initializer='he_normal')(y)

    # Instantiate model.
    model = Model(inputs=inputs, outputs=outputs)
    return model

ResNet v2

def resnet_v2(input_shape, depth, num_classes=10):
    """ResNet Version 2 Model builder [b]

    Stacks of (1 x 1)-(3 x 3)-(1 x 1) BN-ReLU-Conv2D or also known as
    bottleneck layer
    First shortcut connection per layer is 1 x 1 Conv2D.
    Second and onwards shortcut connection is identity.
    At the beginning of each stage, the feature map size is halved (downsampled)
    by a convolutional layer with strides=2, while the number of filter maps is
    doubled. Within each stage, the layers have the same number of filters and the
    same filter map sizes.
    Features maps sizes:
    conv1  : 32x32,  16
    stage 0: 32x32,  64
    stage 1: 16x16, 128
    stage 2:  8x8,  256

    # Arguments
        input_shape (tensor): shape of input image tensor
        depth (int): number of core convolutional layers
        num_classes (int): number of classes (CIFAR10 has 10)

    # Returns
        model (Model): Keras model instance
    """
    if (depth - 2) % 9 != 0:
        raise ValueError('depth should be 9n+2 (eg 56 or 110 in [b])')
    # Start model definition.
    num_filters_in = 16
    num_res_blocks = int((depth - 2) / 9)

    inputs = Input(shape=input_shape)
    # v2 performs Conv2D with BN-ReLU on input before splitting into 2 paths
    x = resnet_layer(inputs=inputs,
                     num_filters=num_filters_in,
                     conv_first=True)

    # Instantiate the stack of residual units
    for stage in range(3):
        for res_block in range(num_res_blocks):
            activation = 'relu'
            batch_normalization = True
            strides = 1
            if stage == 0:
                num_filters_out = num_filters_in * 4
                if res_block == 0:  # first layer and first stage
                    activation = None
                    batch_normalization = False
            else:
                num_filters_out = num_filters_in * 2
                if res_block == 0:  # first layer but not first stage
                    strides = 2    # downsample

            # bottleneck residual unit
            y = resnet_layer(inputs=x,
                             num_filters=num_filters_in,
                             kernel_size=1,
                             strides=strides,
                             activation=activation,
                             batch_normalization=batch_normalization,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_in,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_out,
                             kernel_size=1,
                             conv_first=False)
            if res_block == 0:
                # linear projection residual shortcut connection to match
                # changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters_out,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = tensorflow.keras.layers.add([x, y])

        num_filters_in = num_filters_out

    # Add classifier on top.
    # v2 has BN-ReLU before Pooling
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = AveragePooling2D(pool_size=8)(x)
    y = Flatten()(x)
    outputs = Dense(num_classes,
                    activation='softmax',
                    kernel_initializer='he_normal')(y)

    # Instantiate model.
    model = Model(inputs=inputs, outputs=outputs)
    return model

Normalize Data

# Input image dimensions
input_shape = X_train.shape[1:]

# Normalize data
X_train_norm = X_train.astype('float32') / 255
X_test_norm = X_test.astype('float32') / 255

if SUBTRACT_PIXEL_MEAN:
    # compute the per-pixel mean on the normalized training data so the
    # subtraction happens on the same 0-1 scale
    X_train_mean = np.mean(X_train_norm, axis=0)
    X_train_norm -= X_train_mean
    X_test_norm -= X_train_mean

# Categorize target
y_train_cat = to_categorical(y_train, NUM_CLASSES)
y_test_cat = to_categorical(y_test, NUM_CLASSES)

Define Model Based on Version

if VERSION == 2:
    model = resnet_v2(input_shape=input_shape, depth=DEPTH)
else:
    model = resnet_v1(input_shape=input_shape, depth=DEPTH)

model.compile(
    loss='categorical_crossentropy',
    optimizer=Adam(lr=lr_schedule(0)),
    metrics=['accuracy']
)

model.summary()
Learning rate:  0.001
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 32, 32, 3)]  0                                            
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 32, 32, 16)   448         input_1[0][0]                    
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 32, 32, 16)   64          conv2d_4[0][0]                   
__________________________________________________________________________________________________
activation (Activation)         (None, 32, 32, 16)   0           batch_normalization[0][0]        
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 32, 32, 16)   2320        activation[0][0]                 
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 32, 32, 16)   64          conv2d_5[0][0]                   
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 32, 32, 16)   0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               (None, 32, 32, 16)   2320        activation_1[0][0]               
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 32, 32, 16)   64          conv2d_6[0][0]                   
__________________________________________________________________________________________________
add (Add)                       (None, 32, 32, 16)   0           activation[0][0]                 
                                                                 batch_normalization_2[0][0]      
__________________________________________________________________________________________________
activation_2 (Activation)       (None, 32, 32, 16)   0           add[0][0]                        
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 32, 32, 16)   2320        activation_2[0][0]               
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 32, 32, 16)   64          conv2d_7[0][0]                   
__________________________________________________________________________________________________
activation_3 (Activation)       (None, 32, 32, 16)   0           batch_normalization_3[0][0]      
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 32, 32, 16)   2320        activation_3[0][0]               
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 32, 32, 16)   64          conv2d_8[0][0]                   
__________________________________________________________________________________________________
add_1 (Add)                     (None, 32, 32, 16)   0           activation_2[0][0]               
                                                                 batch_normalization_4[0][0]      
__________________________________________________________________________________________________
activation_4 (Activation)       (None, 32, 32, 16)   0           add_1[0][0]                      
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 32, 32, 16)   2320        activation_4[0][0]               
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 32, 32, 16)   64          conv2d_9[0][0]                   
__________________________________________________________________________________________________
activation_5 (Activation)       (None, 32, 32, 16)   0           batch_normalization_5[0][0]      
__________________________________________________________________________________________________
conv2d_10 (Conv2D)              (None, 32, 32, 16)   2320        activation_5[0][0]               
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 32, 32, 16)   64          conv2d_10[0][0]                  
__________________________________________________________________________________________________
add_2 (Add)                     (None, 32, 32, 16)   0           activation_4[0][0]               
                                                                 batch_normalization_6[0][0]      
__________________________________________________________________________________________________
activation_6 (Activation)       (None, 32, 32, 16)   0           add_2[0][0]                      
__________________________________________________________________________________________________
conv2d_11 (Conv2D)              (None, 16, 16, 32)   4640        activation_6[0][0]               
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 16, 16, 32)   128         conv2d_11[0][0]                  
__________________________________________________________________________________________________
activation_7 (Activation)       (None, 16, 16, 32)   0           batch_normalization_7[0][0]      
__________________________________________________________________________________________________
conv2d_12 (Conv2D)              (None, 16, 16, 32)   9248        activation_7[0][0]               
__________________________________________________________________________________________________
conv2d_13 (Conv2D)              (None, 16, 16, 32)   544         activation_6[0][0]               
__________________________________________________________________________________________________
batch_normalization_8 (BatchNor (None, 16, 16, 32)   128         conv2d_12[0][0]                  
__________________________________________________________________________________________________
add_3 (Add)                     (None, 16, 16, 32)   0           conv2d_13[0][0]                  
                                                                 batch_normalization_8[0][0]      
__________________________________________________________________________________________________
activation_8 (Activation)       (None, 16, 16, 32)   0           add_3[0][0]                      
__________________________________________________________________________________________________
conv2d_14 (Conv2D)              (None, 16, 16, 32)   9248        activation_8[0][0]               
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, 16, 16, 32)   128         conv2d_14[0][0]                  
__________________________________________________________________________________________________
activation_9 (Activation)       (None, 16, 16, 32)   0           batch_normalization_9[0][0]      
__________________________________________________________________________________________________
conv2d_15 (Conv2D)              (None, 16, 16, 32)   9248        activation_9[0][0]               
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, 16, 16, 32)   128         conv2d_15[0][0]                  
__________________________________________________________________________________________________
add_4 (Add)                     (None, 16, 16, 32)   0           activation_8[0][0]               
                                                                 batch_normalization_10[0][0]     
__________________________________________________________________________________________________
activation_10 (Activation)      (None, 16, 16, 32)   0           add_4[0][0]                      
__________________________________________________________________________________________________
conv2d_16 (Conv2D)              (None, 16, 16, 32)   9248        activation_10[0][0]              
__________________________________________________________________________________________________
batch_normalization_11 (BatchNo (None, 16, 16, 32)   128         conv2d_16[0][0]                  
__________________________________________________________________________________________________
activation_11 (Activation)      (None, 16, 16, 32)   0           batch_normalization_11[0][0]     
__________________________________________________________________________________________________
conv2d_17 (Conv2D)              (None, 16, 16, 32)   9248        activation_11[0][0]              
__________________________________________________________________________________________________
batch_normalization_12 (BatchNo (None, 16, 16, 32)   128         conv2d_17[0][0]                  
__________________________________________________________________________________________________
add_5 (Add)                     (None, 16, 16, 32)   0           activation_10[0][0]              
                                                                 batch_normalization_12[0][0]     
__________________________________________________________________________________________________
activation_12 (Activation)      (None, 16, 16, 32)   0           add_5[0][0]                      
__________________________________________________________________________________________________
conv2d_18 (Conv2D)              (None, 8, 8, 64)     18496       activation_12[0][0]              
__________________________________________________________________________________________________
batch_normalization_13 (BatchNo (None, 8, 8, 64)     256         conv2d_18[0][0]                  
__________________________________________________________________________________________________
activation_13 (Activation)      (None, 8, 8, 64)     0           batch_normalization_13[0][0]     
__________________________________________________________________________________________________
conv2d_19 (Conv2D)              (None, 8, 8, 64)     36928       activation_13[0][0]              
__________________________________________________________________________________________________
conv2d_20 (Conv2D)              (None, 8, 8, 64)     2112        activation_12[0][0]              
__________________________________________________________________________________________________
batch_normalization_14 (BatchNo (None, 8, 8, 64)     256         conv2d_19[0][0]                  
__________________________________________________________________________________________________
add_6 (Add)                     (None, 8, 8, 64)     0           conv2d_20[0][0]                  
                                                                 batch_normalization_14[0][0]     
__________________________________________________________________________________________________
activation_14 (Activation)      (None, 8, 8, 64)     0           add_6[0][0]                      
__________________________________________________________________________________________________
conv2d_21 (Conv2D)              (None, 8, 8, 64)     36928       activation_14[0][0]              
__________________________________________________________________________________________________
batch_normalization_15 (BatchNo (None, 8, 8, 64)     256         conv2d_21[0][0]                  
__________________________________________________________________________________________________
activation_15 (Activation)      (None, 8, 8, 64)     0           batch_normalization_15[0][0]     
__________________________________________________________________________________________________
conv2d_22 (Conv2D)              (None, 8, 8, 64)     36928       activation_15[0][0]              
__________________________________________________________________________________________________
batch_normalization_16 (BatchNo (None, 8, 8, 64)     256         conv2d_22[0][0]                  
__________________________________________________________________________________________________
add_7 (Add)                     (None, 8, 8, 64)     0           activation_14[0][0]              
                                                                 batch_normalization_16[0][0]     
__________________________________________________________________________________________________
activation_16 (Activation)      (None, 8, 8, 64)     0           add_7[0][0]                      
__________________________________________________________________________________________________
conv2d_23 (Conv2D)              (None, 8, 8, 64)     36928       activation_16[0][0]              
__________________________________________________________________________________________________
batch_normalization_17 (BatchNo (None, 8, 8, 64)     256         conv2d_23[0][0]                  
__________________________________________________________________________________________________
activation_17 (Activation)      (None, 8, 8, 64)     0           batch_normalization_17[0][0]     
__________________________________________________________________________________________________
conv2d_24 (Conv2D)              (None, 8, 8, 64)     36928       activation_17[0][0]              
__________________________________________________________________________________________________
batch_normalization_18 (BatchNo (None, 8, 8, 64)     256         conv2d_24[0][0]                  
__________________________________________________________________________________________________
add_8 (Add)                     (None, 8, 8, 64)     0           activation_16[0][0]              
                                                                 batch_normalization_18[0][0]     
__________________________________________________________________________________________________
activation_18 (Activation)      (None, 8, 8, 64)     0           add_8[0][0]                      
__________________________________________________________________________________________________
average_pooling2d (AveragePooli (None, 1, 1, 64)     0           activation_18[0][0]              
__________________________________________________________________________________________________
flatten_2 (Flatten)             (None, 64)           0           average_pooling2d[0][0]          
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 10)           650         flatten_2[0][0]                  
==================================================================================================
Total params: 274,442
Trainable params: 273,066
Non-trainable params: 1,376
__________________________________________________________________________________________________

Train Model

# Prepare callbacks for model saving and for learning rate adjustment.
lr_scheduler = LearningRateScheduler(lr_schedule)

lr_reducer = ReduceLROnPlateau(
    factor=np.sqrt(0.1),
    cooldown=0,
    patience=5,
    min_lr=0.5e-6
)

callbacks = [lr_reducer, lr_scheduler]

In the section below we have the choice of using image augmentation, which applies random transformations like shifting and flipping the image so that the model does not overfit; it's not doing anything more complicated than that

import time

print(f"Start: {time.ctime()}")

# Run training, with or without data augmentation.
if not USE_AUGMENTATION:
    print('Not using data augmentation.')
    model.fit(
        X_train_norm, y_train_cat,
        batch_size=BATCH_SIZE,
        epochs=EPOCHS,
        validation_data=(X_test_norm, y_test_cat),
        shuffle=True,
        callbacks=callbacks
    )
else:
    print('Using real-time data augmentation.')
    # This will do preprocessing and realtime data augmentation:
    datagen = ImageDataGenerator(
        # set input mean to 0 over the dataset
        featurewise_center=False,
        # set each sample mean to 0
        samplewise_center=False,
        # divide inputs by std of dataset
        featurewise_std_normalization=False,
        # divide each input by its std
        samplewise_std_normalization=False,
        # apply ZCA whitening
        zca_whitening=False,
        # epsilon for ZCA whitening
        zca_epsilon=1e-06,
        # randomly rotate images in the range (deg 0 to 180)
        rotation_range=0,
        # randomly shift images horizontally
        width_shift_range=0.1,
        # randomly shift images vertically
        height_shift_range=0.1,
        # set range for random shear
        shear_range=0.,
        # set range for random zoom
        zoom_range=0.,
        # set range for random channel shifts
        channel_shift_range=0.,
        # set mode for filling points outside the input boundaries
        fill_mode='nearest',
        # value used for fill_mode = "constant"
        cval=0.,
        # randomly flip images
        horizontal_flip=True,
        # randomly flip images
        vertical_flip=False,
        # set rescaling factor (applied before any other transformation)
        rescale=None,
        # set function that will be applied on each input
        preprocessing_function=None,
        # image data format, either "channels_first" or "channels_last"
        data_format=None,
        # fraction of images reserved for validation (strictly between 0 and 1)
        validation_split=0.0)

    # Compute quantities required for featurewise normalization
    # (std, mean, and principal components if ZCA whitening is applied).
    datagen.fit(X_train_norm)

    model.fit_generator(
        datagen.flow(
          X_train_norm,
          y_train_cat, 
          batch_size=BATCH_SIZE
        ),
        validation_data=(X_test_norm, y_test_cat),
        epochs=EPOCHS, 
        verbose=0, 
        workers=1,
        callbacks=callbacks, 
        use_multiprocessing=False
    )


print(f"End: {time.ctime()}")
Start: Sun Mar 22 15:32:22 2020
Using real-time data augmentation.
WARNING:tensorflow:From <ipython-input-42-28046b8d6223>:76: Model.fit_generator (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
Please use Model.fit, which supports generators.
WARNING:tensorflow:sample_weight modes were coerced from
  ...
    to  
  ['...']
Learning rate:  0.001
(printed once per epoch; the schedule keeps 1e-3 for the first ~80 epochs)
Learning rate:  0.0001
(printed once per epoch while the rate is 1e-4)
Learning rate:  1e-05
(printed once per epoch; output truncated here)

Evaluate the Model

scores = model.evaluate(X_test_norm, y_test_cat, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

Using your Own Images with Keras

When we're using common datasets, e.g. from Keras, we have certain convenience methods for accessing and working with the data, like we see below

from tensorflow.keras.datasets import cifar10
import numpy as np

# Load the CIFAR10 data.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train.shape
(50000, 32, 32, 3)
x_train[0]
array([[[ 59,  62,  63],
        [ 43,  46,  45],
        [ 50,  48,  43],
        ...,
        [158, 132, 108],
        [152, 125, 102],
        [148, 124, 103]],

       [[ 16,  20,  20],
        [  0,   0,   0],
        [ 18,   8,   0],
        ...,
        [123,  88,  55],
        [119,  83,  50],
        [122,  87,  57]],

       [[ 25,  24,  21],
        [ 16,   7,   0],
        [ 49,  27,   8],
        ...,
        [118,  84,  50],
        [120,  84,  50],
        [109,  73,  42]],

       ...,

       [[208, 170,  96],
        [201, 153,  34],
        [198, 161,  26],
        ...,
        [160, 133,  70],
        [ 56,  31,   7],
        [ 53,  34,  20]],

       [[180, 139,  96],
        [173, 123,  42],
        [186, 144,  30],
        ...,
        [184, 148,  94],
        [ 97,  62,  34],
        [ 83,  53,  34]],

       [[177, 144, 116],
        [168, 129,  94],
        [179, 142,  87],
        ...,
        [216, 184, 140],
        [151, 118,  84],
        [123,  92,  72]]], dtype=uint8)

We can see that these are 32x32 images with a colour depth of 3 (values 0 - 255)

Usually when training on images we try to resize/structure them to a standard size so that we can handle the data consistently

Sometimes we also want to rescale the RGB values to be between 0 and 1, or between -1 and 1, as a preprocessing step
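
For example, with the x_train array loaded above we could rescale the pixel values like this (a minimal sketch):

x_train_01 = x_train.astype(np.float32) / 255.0         # values in [0, 1]
x_train_pm1 = x_train.astype(np.float32) / 127.5 - 1.0  # values in [-1, 1]

print(x_train_01.min(), x_train_01.max())
print(x_train_pm1.min(), x_train_pm1.max())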

Transforming Images

We can make use of the make_square function below to convert an image to a square. The version below simply crops off part of the image along whichever dimension is longer

%matplotlib inline
from PIL import Image, ImageFile
from matplotlib.pyplot import imshow
import requests
import numpy as np
from io import BytesIO
from IPython.display import display, HTML

IMAGE_WIDTH = 200
IMAGE_HEIGHT = 200
IMAGE_CHANNELS = 3
images = [
    "https://upload.wikimedia.org/wikipedia/commons/9/92/Brookings.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/f/ff/"\
    "WashU_Graham_Chapel.JPG",
    "https://upload.wikimedia.org/wikipedia/commons/9/9e/SeigleHall.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/a/aa/WUSTLKnight.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/3/32/WashUABhall.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/c/c0/Brown_Hall.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/f/f4/South40.jpg"    
]
"""
Trim an image's edges in the longer direction to convert it to a square
"""
def make_square(img):
    cols,rows = img.size
    
    if rows>cols:
        pad = (rows-cols)/2
        img = img.crop((pad,0,cols,cols))
    else:
        pad = (cols-rows)/2
        img = img.crop((0,pad,rows,rows))
    
    return img
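
As a quick sanity check we can run make_square on a non-square image (a hypothetical solid-grey 300x200 example) and confirm the result is square:

test_img = Image.new('RGB', (300, 200), (128, 128, 128))
print(make_square(test_img).size)  # (200, 200)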

Next we will download all the images, convert them to squares, and resize them to our set IMAGE_HEIGHT and IMAGE_WIDTH

training_data = []

for url in images:
    ImageFile.LOAD_TRUNCATED_IMAGES = False
    response = requests.get(url)
    img = Image.open(BytesIO(response.content))
    img.load()
    img = make_square(img)
    img = img.resize((IMAGE_WIDTH,IMAGE_HEIGHT),Image.ANTIALIAS)
    training_data.append(np.asarray(img))

Once we've resized the images we will have a list of arrays. Next we transform the list into a single numpy array using the np.array function. The training data is then divided by 127.5 and has 1 subtracted from it to normalize the values to between -1 and 1

training_data = np.array(training_data) / 127.5 - 1.
training_data.shape
(7, 200, 200, 3)
training_data[0]
array([[[-0.12156863,  0.2627451 ,  0.59215686],
        [-0.12156863,  0.2627451 ,  0.59215686],
        [-0.10588235,  0.27058824,  0.59215686],
        ...,
        [-0.3254902 , -0.05098039,  0.27058824],
        [-0.61568627, -0.33333333,  0.01960784],
        [-0.40392157, -0.11372549,  0.20784314]],

       [[-0.16862745,  0.23921569,  0.59215686],
        [-0.14509804,  0.2627451 ,  0.61568627],
        [-0.11372549,  0.27058824,  0.59215686],
        ...,
        [-0.41176471, -0.25490196, -0.01960784],
        [-0.4745098 , -0.31764706, -0.02745098],
        [-0.81176471, -0.70196078, -0.42745098]],

       [[-0.15294118,  0.24705882,  0.60784314],
        [-0.1372549 ,  0.2627451 ,  0.62352941],
        [-0.10588235,  0.27058824,  0.6       ],
        ...,
        [-0.35686275, -0.15294118,  0.06666667],
        [-0.60784314, -0.37254902, -0.09803922],
        [-0.05882353,  0.18431373,  0.42745098]],

       ...,

       [[-0.00392157, -0.39607843, -0.43529412],
        [-0.01960784, -0.37254902, -0.45882353],
        [-0.05882353, -0.37254902, -0.49019608],
        ...,
        [-0.4745098 , -0.78039216, -0.7254902 ],
        [-0.56078431, -0.77254902, -0.77254902],
        [-0.56078431, -0.76470588, -0.73333333]],

       [[ 0.05098039, -0.33333333, -0.34117647],
        [ 0.01960784, -0.30980392, -0.38823529],
        [-0.05098039, -0.31764706, -0.42745098],
        ...,
        [-0.64705882, -0.81960784, -0.74901961],
        [-0.70980392, -0.85882353, -0.78039216],
        [-0.79607843, -0.81960784, -0.75686275]],

       [[ 0.00392157, -0.38039216, -0.41960784],
        [-0.00392157, -0.34117647, -0.39607843],
        [-0.05098039, -0.34901961, -0.41176471],
        ...,
        [-0.81960784, -0.89019608, -0.75686275],
        [-0.74901961, -0.84313725, -0.71764706],
        [-0.85098039, -0.86666667, -0.74901961]]])
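
To preview one of the normalized images we can map the values back to the 0 - 255 range (a minimal sketch, assuming training_data holds float values in -1 to 1 as produced above):

img_uint8 = ((training_data[0] + 1.0) * 127.5).astype(np.uint8)
Image.fromarray(img_uint8, 'RGB')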

It can also be useful to save the data object for future use. For high dimensional data CSVs don't work, and for large datasets pickle can be problematic. numpy has a way to save binary data to disk with the np.save method:

print("Saving training image binary...")
np.save("training",training_data) # Saves as "training_data.npy"
print("Done.")
Saving training image binary...
Done.
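
The saved binary can be read back later with np.load, for example:

training_data = np.load("training.npy")
training_data.shape  # (7, 200, 200, 3)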

DarkNet and YOLO

YOLO = You Only Look Once

YOLO allows us to recognize multiple objects in a single image using one CNN. The network is trained to output bounding boxes and class labels from several different output layers, but it only looks at the input image once (hence the name)

DarkNet is the original implementation of YOLO, written in C; DarkFlow is the version that can be used from Python

Installing DarkFlow

Because we are using Google CoLabs we need to first mount a folder to our Google Drive:

try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    COLAB = True
    print("Note: using Google CoLab")
    %tensorflow_version 2.x
except:
    print("Note: not using Google CoLab")
    COLAB = False
Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive
Note: using Google CoLab

Next, install the dependency via pip

import sys

!{sys.executable} -m pip install git+https://github.com/zzh8829/yolov3-tf2.git@master
Collecting git+https://github.com/zzh8829/yolov3-tf2.git@master
  Cloning https://github.com/zzh8829/yolov3-tf2.git (to revision master) to /tmp/pip-req-build-w3slnstj
  Running command git clone -q https://github.com/zzh8829/yolov3-tf2.git /tmp/pip-req-build-w3slnstj
Building wheels for collected packages: yolov3-tf2
  Building wheel for yolov3-tf2 (setup.py) ... done
  Created wheel for yolov3-tf2: filename=yolov3_tf2-0.1-cp36-none-any.whl size=8852 sha256=45f24d4bb6037ad7e561fa4252a74a9110be04812c6a38d81e8325c7631a9a99
  Stored in directory: /tmp/pip-ephem-wheel-cache-3u7xk07g/wheels/59/1b/97/905ab51e9c0330efe8c3c518aff17de4ee91100412cd6dd553
Successfully built yolov3-tf2
Installing collected packages: yolov3-tf2
Successfully installed yolov3-tf2-0.1

Import the Weights

Since we aren't trying to retrain the YOLO model we can just import the pretrained weights from the following files:

import tensorflow as tf
import os

if COLAB:
  ROOT = '/content/drive/My Drive/Colab Notebooks'
else:
  ROOT = os.path.join(os.getcwd(),'data')

filename_darknet_weights = tf.keras.utils.get_file(
    os.path.join(ROOT,'yolov3.weights'),
    origin='https://pjreddie.com/media/files/yolov3.weights')
TINY = False

filename_convert_script = tf.keras.utils.get_file(
    os.path.join(os.getcwd(),'convert.py'),
    origin='https://raw.githubusercontent.com/zzh8829/yolov3-tf2/master/convert.py')

filename_classes = tf.keras.utils.get_file(
    os.path.join(ROOT,'coco.names'),
    origin='https://raw.githubusercontent.com/zzh8829/yolov3-tf2/master/data/coco.names')

filename_converted_weights = os.path.join(ROOT,'yolov3.tf')
Downloading data from https://pjreddie.com/media/files/yolov3.weights
248012800/248007048 [==============================] - 83s 0us/step
Downloading data from https://raw.githubusercontent.com/zzh8829/yolov3-tf2/master/convert.py
8192/1277 [==============================] - 0s 0us/step
Downloading data from https://raw.githubusercontent.com/zzh8829/yolov3-tf2/master/data/coco.names
8192/625 [==============================] - 0s 0us/step

Once we have downloaded the weights we need to transform them into a version that can be used by tensorflow:

import sys
!{sys.executable} "{filename_convert_script}" --weights "{filename_darknet_weights}" --output "{filename_converted_weights}"
2020-06-27 11:45:43.342763: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
...
2020-06-27 11:45:46.030601: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:00:04.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
...
2020-06-27 11:45:53.081906: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10634 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
Model: "yolov3"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input (InputLayer)              [(None, None, None,  0                                            
__________________________________________________________________________________________________
yolo_darknet (Model)            ((None, None, None,  40620640    input[0][0]                      
__________________________________________________________________________________________________
yolo_conv_0 (Model)             (None, None, None, 5 11024384    yolo_darknet[1][2]               
__________________________________________________________________________________________________
yolo_conv_1 (Model)             (None, None, None, 2 2957312     yolo_conv_0[1][0]                
                                                                 yolo_darknet[1][1]               
__________________________________________________________________________________________________
yolo_conv_2 (Model)             (None, None, None, 1 741376      yolo_conv_1[1][0]                
                                                                 yolo_darknet[1][0]               
__________________________________________________________________________________________________
yolo_output_0 (Model)           (None, None, None, 3 4984063     yolo_conv_0[1][0]                
__________________________________________________________________________________________________
yolo_output_1 (Model)           (None, None, None, 3 1312511     yolo_conv_1[1][0]                
__________________________________________________________________________________________________
yolo_output_2 (Model)           (None, None, None, 3 361471      yolo_conv_2[1][0]                
__________________________________________________________________________________________________
yolo_boxes_0 (Lambda)           ((None, None, None,  0           yolo_output_0[1][0]              
__________________________________________________________________________________________________
yolo_boxes_1 (Lambda)           ((None, None, None,  0           yolo_output_1[1][0]              
__________________________________________________________________________________________________
yolo_boxes_2 (Lambda)           ((None, None, None,  0           yolo_output_2[1][0]              
__________________________________________________________________________________________________
yolo_nms (Lambda)               ((None, 100, 4), (No 0           yolo_boxes_0[0][0]               
                                                                 yolo_boxes_0[0][1]               
                                                                 yolo_boxes_0[0][2]               
                                                                 yolo_boxes_1[0][0]               
                                                                 yolo_boxes_1[0][1]               
                                                                 yolo_boxes_1[0][2]               
                                                                 yolo_boxes_2[0][0]               
                                                                 yolo_boxes_2[0][1]               
                                                                 yolo_boxes_2[0][2]               
==================================================================================================
Total params: 62,001,757
Trainable params: 61,949,149
Non-trainable params: 52,608
__________________________________________________________________________________________________
I0627 11:46:00.293676 140682570753920 convert.py:24] model created
I0627 11:46:00.313270 140682570753920 utils.py:45] yolo_darknet/conv2d bn
I0627 11:46:00.317567 140682570753920 utils.py:45] yolo_darknet/conv2d_1 bn
...
I0627 11:46:01.240791 140682570753920 utils.py:45] yolo_output_2/conv2d_73 bn
I0627 11:46:01.245957 140682570753920 utils.py:45] yolo_output_2/conv2d_74 bias
I0627 11:46:01.247878 140682570753920 convert.py:27] weights loaded
2020-06-27 11:46:01.285885: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-06-27 11:46:05.320297: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
I0627 11:46:07.463657 140682570753920 convert.py:31] sanity check passed
I0627 11:46:09.284397 140682570753920 convert.py:34] weights saved

Delete the Conversion Script

Since we no longer need the conversion script, it can be deleted:

import os
os.remove(filename_convert_script)

Running DarkFlow

Prereqs: cython and opencv
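
On Colab these are usually already installed; elsewhere something like the following should work (opencv-python is the pip package that provides the cv2 import used below):

import sys

!{sys.executable} -m pip install cython opencv-python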

To use the DarkFlow library we need to do the following:

  1. Import all needed packages
  2. Define the YOLO configuration using absl flags
  3. Scan for available devices to selectively use GPU

Import Packages

import time
from absl import app, flags, logging
from absl.flags import FLAGS
import cv2
import numpy as np
import tensorflow as tf
from yolov3_tf2.models import (YoloV3, YoloV3Tiny)
from yolov3_tf2.dataset import transform_images, load_tfrecord_dataset
from yolov3_tf2.utils import draw_outputs
import sys
from PIL import Image, ImageFile
import requests

Set the YOLO Flags

# Flags are used to define several options for YOLO.
flags.DEFINE_string('classes', filename_classes, 'path to classes file')
flags.DEFINE_string('weights', filename_converted_weights, 'path to weights file')
flags.DEFINE_boolean('tiny', False, 'yolov3 or yolov3-tiny')
flags.DEFINE_integer('size', 416, 'resize images to')
flags.DEFINE_string('tfrecord', None, 'tfrecord instead of image')
flags.DEFINE_integer('num_classes', 80, 'number of classes in the model')
FLAGS([sys.argv[0]])
['/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py']

Scan for Device with GPU

physical_devices = tf.config.experimental.list_physical_devices('GPU')
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

Making Predictions

To make a prediction we can do the following:

  1. Create an instance of YoloV3
  2. Load the weights and classes
  3. Get an image to predict
  4. Preprocess image
  5. Make a Prediction
  6. Preview the Output over the Image

Create Yolo Instance

if FLAGS.tiny:
    yolo = YoloV3Tiny(classes=FLAGS.num_classes)
else:
    yolo = YoloV3(classes=FLAGS.num_classes)

FLAGS.yolo_score_threshold = 0.5

Load Weights

yolo.load_weights(FLAGS.weights).expect_partial()
class_names = [c.strip() for c in open(FLAGS.classes).readlines()]

Download Image

url = "https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/cook.jpg"
response = requests.get(url)
img_raw = tf.image.decode_image(response.content, channels=3)

Preprocess Image

img = tf.expand_dims(img_raw, 0)
img = transform_images(img, FLAGS.size)

Make Prediction

boxes, scores, classes, nums = yolo(img)
print('detections:')
for i in range(nums[0]):
    cls = class_names[int(classes[0][i])]
    score = np.array(scores[0][i])
    box = np.array(boxes[0][i])
    print(f"\t{cls}, {score}, {box}")
detections:
	person, 0.9995919466018677, [0.31659657 0.10725167 0.68426734 0.74258983]
	dog, 0.9896982312202454, [0.51111    0.557695   0.9339741  0.81879824]
	microwave, 0.9839580059051514, [0.00695175 0.08101549 0.2790975  0.2882001 ]
	oven, 0.9383127093315125, [0.00773551 0.32521233 0.42321444 0.83368266]
	bottle, 0.8538914322853088, [0.73093545 0.23399046 0.76463544 0.32874534]
	bottle, 0.5538200736045837, [0.790116   0.26327905 0.8189085  0.3274593 ]

Overlay Predictions

img = img_raw.numpy()
img = draw_outputs(img, (boxes, scores, classes, nums), class_names)
#cv2.imwrite(FLAGS.output, img) # Save the image
display(Image.fromarray(img, 'RGB')) # Display the image
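
If we also wanted to save the annotated image to disk we could use OpenCV; note that cv2 expects BGR channel order while the array here is RGB (a minimal sketch, with 'detections.jpg' as an example output path):

cv2.imwrite('detections.jpg', cv2.cvtColor(img, cv2.COLOR_RGB2BGR))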

Generative Adversarial Networks (GANs)

GANs are pairs of neural networks in which:

  • Generator - one network that generates new data, starting from random seed (noise) data
  • Discriminator - another network that tries to guess whether the data it is given is real or generated; it is trained on real data

The Generator tries to create data that fools the Discriminator

In general it is easier to train the Generator than the Discriminator

We have to train the two networks somewhat independently of each other rather than simply modifying one straight after the other. We should not train them as one combined model, as this would lead to each just trying to fool the other without actually giving us anything usable

We pass random seeds into the generator and it outputs images. These images are passed to the Discriminator during new rounds of training as the fake images. When training the Discriminator we pass in images from the training set (real) and images from the generator (fake), and the role of the discriminator is to correctly and confidently differentiate between the real and fake images

The ideal training case is where our generator creates images that are so realistic that our discriminator can no longer figure out what's real or fake

Over time, the distribution of the data the generator produces will begin to resemble the distribution of the actual training data
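
Formally, this adversarial setup corresponds to the standard GAN minimax objective (Goodfellow et al., 2014):

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]

The discriminator loss used in the code below is the binary cross-entropy form of the inner maximization, and the generator is trained with the non-saturating variant, maximizing log D(G(z)) rather than minimizing log(1 - D(G(z)))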

Implementing a Simple GAN with Keras

Import Packages

import tensorflow as tf
from tensorflow.keras.layers import Input, Reshape, Dropout, Dense 
from tensorflow.keras.layers import Flatten, BatchNormalization
from tensorflow.keras.layers import Activation, ZeroPadding2D
from tensorflow.keras.layers import LeakyReLU
from tensorflow.keras.layers import UpSampling2D, Conv2D
from tensorflow.keras.models import Sequential, Model, load_model
from tensorflow.keras.optimizers import Adam
import numpy as np
from PIL import Image
from tqdm import tqdm
import os 
import time
import matplotlib.pyplot as plt

Init Constants

Some constants that we're using to train the GAN are GENERATE_RES which is the resolution factor, and DATA_PATH which is where the files are stored

# Generation resolution - Must be square 
# Training data is also scaled to this.
# Note GENERATE_RES 4 or higher
# will blow Google CoLab's memory and has not
# been tested extensively.
GENERATE_RES = 3 # Generation resolution factor 
# (1=32, 2=64, 3=96, 4=128, etc.)
GENERATE_SQUARE = 32 * GENERATE_RES # rows/cols (should be square)
IMAGE_CHANNELS = 3

# Preview image 
PREVIEW_ROWS = 4
PREVIEW_COLS = 7
PREVIEW_MARGIN = 16

# Size vector to generate images from
SEED_SIZE = 100

# Configuration
DATA_PATH = '/content/drive/My Drive/Colab Notebooks'
EPOCHS = 50
BATCH_SIZE = 32
BUFFER_SIZE = 60000
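
With GENERATE_RES = 3 this works out to GENERATE_SQUARE = 32 * 3 = 96, so each training (and generated) image will have shape (96, 96, 3), which matches the training_data_96_96.npy filename used below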

Download the Files

Download the files from Kaggle and save them to the DATA_PATH/face_images directory

Import the Downloaded Files

def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60
    return "{}:{:>02}:{:>05.2f}".format(h, m, s)
training_binary_path = os.path.join(DATA_PATH,
        f'training_data_{GENERATE_SQUARE}_{GENERATE_SQUARE}.npy')

print(f"Looking for file: {training_binary_path}")

if not os.path.isfile(training_binary_path):
  start = time.time()
  print("Loading training images...")

  training_data = []
  
  faces_path = os.path.join(DATA_PATH, 'face_images')

  if not os.path.exists(faces_path):
    os.mkdir(faces_path)

  for filename in tqdm(os.listdir(faces_path)):
      path = os.path.join(faces_path,filename)
      image = Image.open(path).resize((GENERATE_SQUARE,
            GENERATE_SQUARE),Image.ANTIALIAS)
      training_data.append(np.asarray(image))
  training_data = np.reshape(training_data,(-1,GENERATE_SQUARE,
            GENERATE_SQUARE,IMAGE_CHANNELS))
  training_data = training_data.astype(np.float32)
  training_data = training_data / 127.5 - 1.

  print("Saving training image binary...")
  np.save(training_binary_path,training_data)
  elapsed = time.time()-start
  print (f'Image preprocess time: {hms_string(elapsed)}')
else:
  print("Loading previous training pickle...")
  training_data = np.load(training_binary_path)
  0%|          | 0/20 [00:00<?, ?it/s]
Looking for file: /content/drive/My Drive/Colab Notebooks/training_data_96_96.npy
Loading training images...
100%|██████████| 20/20 [00:11<00:00,  1.74it/s]
Saving training image binary...
Image preprocess time: 0:00:11.54
training_data
array([[[[-0.19999999,  0.23921573, -0.5294118 ],
         [-0.20784312,  0.22352946, -0.5294118 ],
         [-0.21568626,  0.21568632, -0.52156866],
         ...,
         [-0.10588235,  0.23921573, -0.5058824 ],
         [-0.09019607,  0.2313726 , -0.5058824 ],
         [-0.09803921,  0.2313726 , -0.5058824 ]],

        [[-0.20784312,  0.21568632, -0.5372549 ],
         [-0.20784312,  0.21568632, -0.5294118 ],
         [-0.19999999,  0.22352946, -0.5137255 ],
         ...,
         [-0.11372548,  0.2313726 , -0.5137255 ],
         [-0.12941176,  0.19215691, -0.54509807],
         [-0.12941176,  0.19215691, -0.54509807]],

        [[-0.2235294 ,  0.20000005, -0.5529412 ],
         [-0.20784312,  0.21568632, -0.5294118 ],
         [-0.19215685,  0.22352946, -0.5137255 ],
         ...,
         [-0.12941176,  0.21568632, -0.5294118 ],
         [-0.10588235,  0.21568632, -0.52156866],
         [-0.10588235,  0.21568632, -0.52156866]],

        ...,

        [[-0.3333333 , -0.00392157, -0.5921569 ],
         [-0.3490196 , -0.01960784, -0.5921569 ],
         [-0.35686272, -0.02745098, -0.5921569 ],
         ...,
         [-0.372549  , -0.35686272, -0.4588235 ],
         [-0.23137254, -0.2235294 , -0.3490196 ],
         [-0.4588235 , -0.45098037, -0.5764706 ]],

        [[-0.35686272, -0.02745098, -0.60784316],
         [-0.3333333 , -0.00392157, -0.5764706 ],
         [-0.3333333 , -0.00392157, -0.5686275 ],
         ...,
         [-0.4823529 , -0.47450978, -0.5764706 ],
         [-0.31764704, -0.32549018, -0.44313723],
         [-0.3098039 , -0.3098039 , -0.42745095]],

        [[-0.3490196 , -0.01960784, -0.6       ],
         [-0.35686272, -0.02745098, -0.6       ],
         [-0.36470586, -0.03529412, -0.6       ],
         ...,
         [-0.5764706 , -0.5686275 , -0.67058825],
         [-0.4823529 , -0.4980392 , -0.6156863 ],
         [-0.29411763, -0.3098039 , -0.42745095]]],


       [[[-0.19999999,  0.19215691, -0.5058824 ],
         [-0.19215685,  0.20000005, -0.4980392 ],
         [-0.19215685,  0.20000005, -0.4980392 ],
         ...,
         [-0.10588235,  0.254902  , -0.52156866],
         [-0.11372548,  0.24705887, -0.5372549 ],
         [-0.12156862,  0.23921573, -0.54509807]],

        [[-0.19215685,  0.20000005, -0.4980392 ],
         [-0.18431371,  0.20784318, -0.49019605],
         [-0.19215685,  0.20000005, -0.4980392 ],
         ...,
         [-0.11372548,  0.24705887, -0.5294118 ],
         [-0.1372549 ,  0.22352946, -0.54509807],
         [-0.1372549 ,  0.22352946, -0.54509807]],

        [[-0.1607843 ,  0.2313726 , -0.46666664],
         [-0.16862744,  0.22352946, -0.47450978],
         [-0.18431371,  0.20784318, -0.49019605],
         ...,
         [-0.1372549 ,  0.22352946, -0.5529412 ],
         [-0.12941176,  0.22352946, -0.5137255 ],
         [-0.12156862,  0.2313726 , -0.5058824 ]],

        ...,

        [[-0.32549018, -0.02745098, -0.58431375],
         [-0.32549018, -0.01176471, -0.5921569 ],
         [-0.3333333 , -0.00392157, -0.6156863 ],
         ...,
         [-0.6627451 , -0.6862745 , -0.7254902 ],
         [-0.21568626, -0.21568626, -0.3098039 ],
         [-0.30196077, -0.30196077, -0.38823527]],

        [[-0.32549018, -0.03529412, -0.5764706 ],
         [-0.3098039 , -0.02745098, -0.5686275 ],
         [-0.32549018, -0.01960784, -0.5921569 ],
         ...,
         [-0.6627451 , -0.6862745 , -0.7254902 ],
         [-0.40392154, -0.40392154, -0.4980392 ],
         [-0.29411763, -0.29411763, -0.38823527]],

        [[-0.30196077, -0.02745098, -0.54509807],
         [-0.32549018, -0.04313725, -0.5764706 ],
         [-0.34117645, -0.05098039, -0.5921569 ],
         ...,
         [-0.5294118 , -0.56078434, -0.5921569 ],
         [-0.58431375, -0.58431375, -0.6784314 ],
         [-0.42745095, -0.42745095, -0.52156866]]],


       [[[-0.19215685,  0.20000005, -0.4980392 ],
         [-0.18431371,  0.20784318, -0.49019605],
         [-0.20784312,  0.18431377, -0.5137255 ],
         ...,
         [-0.09019607,  0.2313726 , -0.4980392 ],
         [-0.10588235,  0.24705887, -0.47450978],
         [-0.12941176,  0.22352946, -0.4980392 ]],

        [[-0.1607843 ,  0.2313726 , -0.46666664],
         [-0.1607843 ,  0.2313726 , -0.46666664],
         [-0.18431371,  0.20784318, -0.49019605],
         ...,
         [-0.08235294,  0.24705887, -0.4980392 ],
         [-0.12941176,  0.2313726 , -0.5137255 ],
         [-0.1372549 ,  0.21568632, -0.52156866]],

        [[-0.16862744,  0.22352946, -0.47450978],
         [-0.16862744,  0.22352946, -0.47450978],
         [-0.19215685,  0.20000005, -0.4980392 ],
         ...,
         [-0.09803921,  0.2313726 , -0.54509807],
         [-0.12941176,  0.2313726 , -0.54509807],
         [-0.12156862,  0.23921573, -0.5294118 ]],

        ...,

        [[-0.3098039 ,  0.00392163, -0.54509807],
         [-0.32549018, -0.01176471, -0.5764706 ],
         [-0.34117645, -0.02745098, -0.60784316],
         ...,
         [-0.45098037, -0.46666664, -0.56078434],
         [-0.20784312, -0.2235294 , -0.3490196 ],
         [-0.31764704, -0.3333333 , -0.45098037]],

        [[-0.3098039 ,  0.01176476, -0.54509807],
         [-0.32549018,  0.00392163, -0.5686275 ],
         [-0.3490196 , -0.01960784, -0.6       ],
         ...,
         [-0.56078434, -0.5764706 , -0.67058825],
         [-0.4352941 , -0.45098037, -0.56078434],
         [-0.27843136, -0.29411763, -0.40392154]],

        [[-0.34117645, -0.01176471, -0.5764706 ],
         [-0.3490196 , -0.01176471, -0.5921569 ],
         [-0.3490196 , -0.01960784, -0.6156863 ],
         ...,
         [-0.54509807, -0.56078434, -0.654902  ],
         [-0.6156863 , -0.6313726 , -0.7254902 ],
         [-0.41176468, -0.42745095, -0.5294118 ]]],


       ...,


       [[[-0.19215685,  0.19215691, -0.5294118 ],
         [-0.20784312,  0.17647064, -0.54509807],
         [-0.21568626,  0.1686275 , -0.5529412 ],
         ...,
         [-0.1372549 ,  0.20784318, -0.5294118 ],
         [-0.12941176,  0.20000005, -0.47450978],
         [-0.12941176,  0.20000005, -0.47450978]],

        [[-0.20784312,  0.17647064, -0.5294118 ],
         [-0.19999999,  0.18431377, -0.52156866],
         [-0.20784312,  0.17647064, -0.5294118 ],
         ...,
         [-0.1372549 ,  0.20784318, -0.5294118 ],
         [-0.11372548,  0.20000005, -0.4980392 ],
         [-0.11372548,  0.20000005, -0.4980392 ]],

        [[-0.20784312,  0.1686275 , -0.5137255 ],
         [-0.19215685,  0.18431377, -0.4980392 ],
         [-0.19215685,  0.18431377, -0.49019605],
         ...,
         [-0.10588235,  0.23921573, -0.4980392 ],
         [-0.09803921,  0.20784318, -0.52156866],
         [-0.09803921,  0.20000005, -0.52156866]],

        ...,

        [[-0.34117645, -0.01960784, -0.5372549 ],
         [-0.3490196 , -0.02745098, -0.56078434],
         [-0.3490196 , -0.02745098, -0.5764706 ],
         ...,
         [-0.5921569 , -0.5921569 , -0.6627451 ],
         [-0.21568626, -0.25490195, -0.38039213],
         [-0.26274508, -0.30196077, -0.4352941 ]],

        [[-0.35686272, -0.05098039, -0.5686275 ],
         [-0.35686272, -0.05098039, -0.58431375],
         [-0.36470586, -0.05098039, -0.6       ],
         ...,
         [-0.60784316, -0.60784316, -0.6784314 ],
         [-0.40392154, -0.42745095, -0.56078434],
         [-0.29411763, -0.31764704, -0.45098037]],

        [[-0.3490196 , -0.04313725, -0.5686275 ],
         [-0.3490196 , -0.04313725, -0.5764706 ],
         [-0.35686272, -0.04313725, -0.6       ],
         ...,
         [-0.58431375, -0.58431375, -0.654902  ],
         [-0.6       , -0.62352943, -0.75686276],
         [-0.41176468, -0.42745095, -0.5686275 ]]],


       [[[-0.20784312,  0.18431377, -0.5137255 ],
         [-0.20784312,  0.18431377, -0.5137255 ],
         [-0.20784312,  0.18431377, -0.5137255 ],
         ...,
         [-0.12941176,  0.2313726 , -0.56078434],
         [-0.11372548,  0.20784318, -0.56078434],
         [-0.11372548,  0.20784318, -0.56078434]],

        [[-0.20784312,  0.18431377, -0.5137255 ],
         [-0.20784312,  0.18431377, -0.5137255 ],
         [-0.20784312,  0.18431377, -0.5137255 ],
         ...,
         [-0.10588235,  0.254902  , -0.5372549 ],
         [-0.12156862,  0.21568632, -0.5529412 ],
         [-0.12156862,  0.21568632, -0.5529412 ]],

        [[-0.19215685,  0.20000005, -0.4980392 ],
         [-0.19999999,  0.19215691, -0.4980392 ],
         [-0.20784312,  0.18431377, -0.5137255 ],
         ...,
         [-0.12156862,  0.24705887, -0.5529412 ],
         [-0.12941176,  0.23921573, -0.5294118 ],
         [-0.12941176,  0.23921573, -0.5372549 ]],

        ...,

        [[-0.32549018, -0.01176471, -0.56078434],
         [-0.3490196 , -0.03529412, -0.5921569 ],
         [-0.35686272, -0.04313725, -0.62352943],
         ...,
         [-0.84313726, -0.827451  , -0.9372549 ],
         [-0.372549  , -0.38823527, -0.4980392 ],
         [-0.19999999, -0.21568626, -0.32549018]],

        [[-0.3490196 , -0.02745098, -0.5921569 ],
         [-0.3490196 , -0.03529412, -0.6       ],
         [-0.35686272, -0.03529412, -0.6313726 ],
         ...,
         [-0.79607844, -0.78039217, -0.8901961 ],
         [-0.47450978, -0.49019605, -0.6156863 ],
         [-0.34117645, -0.35686272, -0.47450978]],

        [[-0.34117645, -0.01176471, -0.5921569 ],
         [-0.3333333 , -0.00392157, -0.5921569 ],
         [-0.34117645, -0.01960784, -0.62352943],
         ...,
         [-0.5921569 , -0.58431375, -0.69411767],
         [-0.56078434, -0.5686275 , -0.70980394],
         [-0.5529412 , -0.56078434, -0.7019608 ]]],


       [[[-0.23137254,  0.15294123, -0.54509807],
         [-0.19999999,  0.18431377, -0.5137255 ],
         [-0.18431371,  0.20000005, -0.4980392 ],
         ...,
         [-0.12941176,  0.20784318, -0.5529412 ],
         [-0.12941176,  0.22352946, -0.4823529 ],
         [-0.12941176,  0.22352946, -0.4823529 ]],

        [[-0.17647058,  0.20784318, -0.49019605],
         [-0.18431371,  0.20000005, -0.5058824 ],
         [-0.19215685,  0.19215691, -0.5058824 ],
         ...,
         [-0.12156862,  0.21568632, -0.5529412 ],
         [-0.1372549 ,  0.21568632, -0.5137255 ],
         [-0.1372549 ,  0.21568632, -0.5137255 ]],

        [[-0.16862744,  0.21568632, -0.4823529 ],
         [-0.17647058,  0.20784318, -0.49019605],
         [-0.18431371,  0.20000005, -0.4980392 ],
         ...,
         [-0.12156862,  0.20784318, -0.56078434],
         [-0.15294117,  0.20784318, -0.5529412 ],
         [-0.15294117,  0.20784318, -0.5529412 ]],

        ...,

        [[-0.3490196 , -0.03529412, -0.58431375],
         [-0.3490196 , -0.03529412, -0.58431375],
         [-0.3490196 , -0.03529412, -0.5764706 ],
         ...,
         [-0.58431375, -0.58431375, -0.6313726 ],
         [-0.25490195, -0.30196077, -0.38823527],
         [-0.24705881, -0.29411763, -0.372549  ]],

        [[-0.3490196 , -0.03529412, -0.5921569 ],
         [-0.35686272, -0.04313725, -0.5921569 ],
         [-0.372549  , -0.05098039, -0.6       ],
         ...,
         [-0.60784316, -0.6       , -0.64705884],
         [-0.47450978, -0.5058824 , -0.5764706 ],
         [-0.31764704, -0.3490196 , -0.42745095]],

        [[-0.35686272, -0.03529412, -0.5921569 ],
         [-0.36470586, -0.04313725, -0.60784316],
         [-0.38823527, -0.06666666, -0.60784316],
         ...,
         [-0.5294118 , -0.52156866, -0.5686275 ],
         [-0.60784316, -0.6392157 , -0.70980394],
         [-0.4823529 , -0.5058824 , -0.5764706 ]]]], dtype=float32)
train_dataset = tf.data.Dataset.from_tensor_slices(training_data).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
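
As a quick check we can pull one batch from the dataset to confirm the expected shape (a minimal sketch; with only the 20 sample images loaded above the single batch will be smaller than BATCH_SIZE):

for image_batch in train_dataset.take(1):
    print(image_batch.shape)  # e.g. (20, 96, 96, 3)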

Define a function to Build the Generator and Discriminator

def build_generator(seed_size, channels):
    model = Sequential()

    model.add(Dense(4*4*256,activation="relu",input_dim=seed_size))
    model.add(Reshape((4,4,256)))

    model.add(UpSampling2D())
    model.add(Conv2D(256,kernel_size=3,padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Activation("relu"))

    model.add(UpSampling2D())
    model.add(Conv2D(256,kernel_size=3,padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Activation("relu"))
   
    # Output resolution, additional upsampling
    model.add(UpSampling2D())
    model.add(Conv2D(128,kernel_size=3,padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Activation("relu"))

    if GENERATE_RES>1:
      model.add(UpSampling2D(size=(GENERATE_RES,GENERATE_RES)))
      model.add(Conv2D(128,kernel_size=3,padding="same"))
      model.add(BatchNormalization(momentum=0.8))
      model.add(Activation("relu"))

    # Final CNN layer
    model.add(Conv2D(channels,kernel_size=3,padding="same"))
    model.add(Activation("tanh"))

    return model
def build_discriminator(image_shape):
    model = Sequential()

    model.add(Conv2D(32, kernel_size=3, strides=2, input_shape=image_shape, 
                     padding="same"))
    model.add(LeakyReLU(alpha=0.2))

    model.add(Dropout(0.25))
    model.add(Conv2D(64, kernel_size=3, strides=2, padding="same"))
    model.add(ZeroPadding2D(padding=((0,1),(0,1))))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))

    model.add(Dropout(0.25))
    model.add(Conv2D(128, kernel_size=3, strides=2, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))

    model.add(Dropout(0.25))
    model.add(Conv2D(256, kernel_size=3, strides=1, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))

    model.add(Dropout(0.25))
    model.add(Conv2D(512, kernel_size=3, strides=1, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))

    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))

    return model

Also define a function for saving the images that are generated

def save_images(cnt,noise):
  image_array = np.full(( 
      PREVIEW_MARGIN + (PREVIEW_ROWS * (GENERATE_SQUARE+PREVIEW_MARGIN)), 
      PREVIEW_MARGIN + (PREVIEW_COLS * (GENERATE_SQUARE+PREVIEW_MARGIN)), 3), 
      255, dtype=np.uint8)
  
  generated_images = generator.predict(noise)

  generated_images = 0.5 * generated_images + 0.5

  image_count = 0
  for row in range(PREVIEW_ROWS):
      for col in range(PREVIEW_COLS):
        r = row * (GENERATE_SQUARE+PREVIEW_MARGIN) + PREVIEW_MARGIN
        c = col * (GENERATE_SQUARE+PREVIEW_MARGIN) + PREVIEW_MARGIN
        image_array[r:r+GENERATE_SQUARE,c:c+GENERATE_SQUARE] = generated_images[image_count] * 255
        image_count += 1

          
  output_path = os.path.join(DATA_PATH,'output')
  if not os.path.exists(output_path):
    os.makedirs(output_path)
  
  filename = os.path.join(output_path,f"train-{cnt}.png")
  im = Image.fromarray(image_array)
  im.save(filename)

Generate a Test Image using the Noise

generator = build_generator(SEED_SIZE, IMAGE_CHANNELS)

noise = tf.random.normal([1, SEED_SIZE])
generated_image = generator(noise, training=False)

plt.imshow(generated_image[0, :, :, 0])
<matplotlib.image.AxesImage at 0x7f5f6934fe10>
image_shape = (GENERATE_SQUARE,GENERATE_SQUARE,IMAGE_CHANNELS)

discriminator = build_discriminator(image_shape)
decision = discriminator(generated_image)
print (decision)
tf.Tensor([[0.49988297]], shape=(1, 1), dtype=float32)
# This method returns a helper function to compute cross entropy loss
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_output, fake_output):
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    total_loss = real_loss + fake_loss
    return total_loss

def generator_loss(fake_output):
    return cross_entropy(tf.ones_like(fake_output), fake_output)

Define Optimizers for the two networks

generator_optimizer = tf.keras.optimizers.Adam(1.5e-4,0.5)
discriminator_optimizer = tf.keras.optimizers.Adam(1.5e-4,0.5)

Define a Train Step

Based on the GAN in the Keras Documentation

The train step uses GradientTape so that the two networks can be trained at the same time but separately. This allows us to compute and apply the weight updates ourselves, instead of having TF apply them to the network automatically

# Notice the use of `tf.function`
# This annotation causes the function to be "compiled".
@tf.function
def train_step(images):
  seed = tf.random.normal([BATCH_SIZE, SEED_SIZE])

  with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
    generated_images = generator(seed, training=True)

    real_output = discriminator(images, training=True)
    fake_output = discriminator(generated_images, training=True)

    gen_loss = generator_loss(fake_output)
    disc_loss = discriminator_loss(real_output, fake_output)
    

    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))
  return gen_loss,disc_loss

Define the Training Process

def train(dataset, epochs):
  fixed_seed = np.random.normal(0, 1, (PREVIEW_ROWS * PREVIEW_COLS, SEED_SIZE))
  start = time.time()

  for epoch in range(epochs):
    epoch_start = time.time()

    gen_loss_list = []
    disc_loss_list = []

    for image_batch in dataset:
      t = train_step(image_batch)
      gen_loss_list.append(t[0])
      disc_loss_list.append(t[1])

    g_loss = sum(gen_loss_list) / len(gen_loss_list)
    d_loss = sum(disc_loss_list) / len(disc_loss_list)

    epoch_elapsed = time.time()-epoch_start
    print (f'Epoch {epoch+1}, gen loss={g_loss},disc loss={d_loss},'\
           f' {hms_string(epoch_elapsed)}')
    save_images(epoch,fixed_seed)

  elapsed = time.time()-start
  print (f'Training time: {hms_string(elapsed)}')

Train the Model

train(train_dataset, EPOCHS)
Epoch 1, gen loss=0.6930996179580688,disc loss=1.0064759254455566, {hms_string(epoch_elapsed)}
Epoch 2, gen loss=0.6931149959564209,disc loss=1.0064730644226074, {hms_string(epoch_elapsed)}
Epoch 3, gen loss=0.6930887699127197,disc loss=1.0064938068389893, {hms_string(epoch_elapsed)}
Epoch 4, gen loss=0.6931302547454834,disc loss=1.0064555406570435, {hms_string(epoch_elapsed)}
Epoch 5, gen loss=0.6930830478668213,disc loss=1.0065312385559082, {hms_string(epoch_elapsed)}
Epoch 6, gen loss=0.6931189298629761,disc loss=1.0064769983291626, {hms_string(epoch_elapsed)}
Epoch 7, gen loss=0.6931207180023193,disc loss=1.0064527988433838, {hms_string(epoch_elapsed)}
Epoch 8, gen loss=0.693107545375824,disc loss=1.0064951181411743, {hms_string(epoch_elapsed)}
Epoch 9, gen loss=0.6931174993515015,disc loss=1.006460428237915, {hms_string(epoch_elapsed)}
Epoch 10, gen loss=0.6931171417236328,disc loss=1.0064548254013062, {hms_string(epoch_elapsed)}
Epoch 11, gen loss=0.69312983751297,disc loss=1.0064491033554077, {hms_string(epoch_elapsed)}
Epoch 12, gen loss=0.6931082010269165,disc loss=1.006463885307312, {hms_string(epoch_elapsed)}
Epoch 13, gen loss=0.6931336522102356,disc loss=1.0064470767974854, {hms_string(epoch_elapsed)}
Epoch 14, gen loss=0.6931320428848267,disc loss=1.0064570903778076, {hms_string(epoch_elapsed)}
Epoch 15, gen loss=0.693084716796875,disc loss=1.0065093040466309, {hms_string(epoch_elapsed)}
Epoch 16, gen loss=0.6930662393569946,disc loss=1.0065112113952637, {hms_string(epoch_elapsed)}
Epoch 17, gen loss=0.6931113004684448,disc loss=1.0064845085144043, {hms_string(epoch_elapsed)}
Epoch 18, gen loss=0.6931309700012207,disc loss=1.00644850730896, {hms_string(epoch_elapsed)}
Epoch 19, gen loss=0.6931254267692566,disc loss=1.0064489841461182, {hms_string(epoch_elapsed)}
Epoch 20, gen loss=0.6931127309799194,disc loss=1.0064712762832642, {hms_string(epoch_elapsed)}
Epoch 21, gen loss=0.6930199265480042,disc loss=1.0065534114837646, {hms_string(epoch_elapsed)}
Epoch 22, gen loss=0.6930903196334839,disc loss=1.0064971446990967, {hms_string(epoch_elapsed)}
Epoch 23, gen loss=0.6931049823760986,disc loss=1.0064624547958374, {hms_string(epoch_elapsed)}
Epoch 24, gen loss=0.6930536031723022,disc loss=1.006533145904541, {hms_string(epoch_elapsed)}
Epoch 25, gen loss=0.6930797100067139,disc loss=1.006493091583252, {hms_string(epoch_elapsed)}
Epoch 26, gen loss=0.6931148171424866,disc loss=1.0064830780029297, {hms_string(epoch_elapsed)}
Epoch 27, gen loss=0.693126916885376,disc loss=1.0064473152160645, {hms_string(epoch_elapsed)}
Epoch 28, gen loss=0.6931012868881226,disc loss=1.0065083503723145, {hms_string(epoch_elapsed)}
Epoch 29, gen loss=0.6930173635482788,disc loss=1.0065717697143555, {hms_string(epoch_elapsed)}
Epoch 30, gen loss=0.6931239366531372,disc loss=1.0064634084701538, {hms_string(epoch_elapsed)}
Epoch 31, gen loss=0.6930766701698303,disc loss=1.0064959526062012, {hms_string(epoch_elapsed)}
Epoch 32, gen loss=0.6930689215660095,disc loss=1.0065072774887085, {hms_string(epoch_elapsed)}
Epoch 33, gen loss=0.6931317448616028,disc loss=1.0064733028411865, {hms_string(epoch_elapsed)}
Epoch 34, gen loss=0.6931313872337341,disc loss=1.0064648389816284, {hms_string(epoch_elapsed)}
Epoch 35, gen loss=0.6931263208389282,disc loss=1.006460189819336, {hms_string(epoch_elapsed)}
Epoch 36, gen loss=0.6931310892105103,disc loss=1.00645112991333, {hms_string(epoch_elapsed)}
Epoch 37, gen loss=0.6931243538856506,disc loss=1.0064570903778076, {hms_string(epoch_elapsed)}
Epoch 38, gen loss=0.6930460333824158,disc loss=1.0065319538116455, {hms_string(epoch_elapsed)}
Epoch 39, gen loss=0.6931352019309998,disc loss=1.0064371824264526, {hms_string(epoch_elapsed)}
Epoch 40, gen loss=0.6931366920471191,disc loss=1.0064396858215332, {hms_string(epoch_elapsed)}
Epoch 41, gen loss=0.6931322813034058,disc loss=1.0064457654953003, {hms_string(epoch_elapsed)}
Epoch 42, gen loss=0.6930727958679199,disc loss=1.0065124034881592, {hms_string(epoch_elapsed)}
Epoch 43, gen loss=0.6931239366531372,disc loss=1.0064616203308105, {hms_string(epoch_elapsed)}
Epoch 44, gen loss=0.6931028962135315,disc loss=1.0064678192138672, {hms_string(epoch_elapsed)}
Epoch 45, gen loss=0.69310462474823,disc loss=1.006474256515503, {hms_string(epoch_elapsed)}
Epoch 46, gen loss=0.6931239366531372,disc loss=1.0064617395401, {hms_string(epoch_elapsed)}
Epoch 47, gen loss=0.6931269764900208,disc loss=1.0064493417739868, {hms_string(epoch_elapsed)}
Epoch 48, gen loss=0.6931357383728027,disc loss=1.0064547061920166, {hms_string(epoch_elapsed)}
Epoch 49, gen loss=0.6931393146514893,disc loss=1.006434679031372, {hms_string(epoch_elapsed)}
Epoch 50, gen loss=0.6931280493736267,disc loss=1.0064525604248047, {hms_string(epoch_elapsed)}
Training time: 0:00:19.70
generator.save(os.path.join(DATA_PATH,"face_generator.h5"))
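
Later, the saved generator can be loaded again to produce new images without retraining (a minimal sketch, assuming the face_generator.h5 file saved above):

from tensorflow.keras.models import load_model

generator = load_model(os.path.join(DATA_PATH, "face_generator.h5"))
noise = tf.random.normal([1, SEED_SIZE])
generated_image = generator(noise, training=False)

# map the tanh output from [-1, 1] back to [0, 1] for display
plt.imshow(generated_image[0] * 0.5 + 0.5)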