import torch
import torchvision
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
# Some functions we'll need later
import modules.functions as functions
Gender Classification using PyTorch
Introduction
Neural networks are cool: they can take complex tasks that are usually pretty easy for humans and automate them, provided you have sufficient training data and computing power. In this project, we will explore how to build our own neural network and attempt to predict the gender of faces.
To get a basic understanding of how neural networks work, I would recommend watching 3Blue1Brown’s YouTube playlist on neural networks. As neural networks are slightly more complicated than most common machine learning algorithms, I won’t go through the basics in much detail here.
Objective
For this project, we’ll aim to successfully classify the following images of Freya, Kratos, and me using a neural network model.
We’ll achieve this using publicly available training and testing datasets. But first, we need to load these images as a data type that can be fed into a PyTorch neural network: a torch tensor. We can do this using a couple of modules from torchvision.
We can then use our imported modules to create Dataset and DataLoader objects. The Dataset represents our image data after applying a transformation that resizes our images to 128 by 128 pixels, converts them to grayscale (this saves us some computational power; hopefully colour isn’t an important feature), and then converts each image to a tensor. The DataLoader then creates an iterable over our Dataset, which is useful for accessing our data in batches. This will help us later when we train our model.
# Set device for GPU acceleration, if available.
device = functions.set_device()

loader = transforms.Compose([
    transforms.Resize([128, 128]),
    transforms.Grayscale(1),
    transforms.ToTensor()
])

my_dataset = datasets.ImageFolder(
    root='test_images/',
    transform=loader
)

my_dataset_loader = DataLoader(
    my_dataset,
    batch_size=len(my_dataset),
    generator=torch.Generator(device=device)
)
Let’s set images and labels as the image and label tensors in the first and only batch in our DataLoader.
data = iter(my_dataset_loader)
images, labels = next(data)
We can then display the image tensors using a simple function that uses matplotlib.pyplot under the hood.
functions.imshow(torchvision.utils.make_grid(images))
Datasets
As with any supervised machine learning algorithm, neural networks require a training and testing dataset for the model to learn and evaluate out of sample performance. In this section we’ll explore what this looks like for a neural network.
Training
For this project, we’ll require a dataset containing a large number of labelled images of faces, which as you can imagine isn’t all that common. Luckily for us, CelebA is a publicly available labelled dataset of around 200k faces. As it’s a well known dataset, there is a function in torchvision that automatically downloads the required files (sometimes; the Google Drive link is often down) and creates a dataset object for CelebA.
imsize = int(128/0.8)
batch_size = 10
classes = ('Female', 'Male')

fivecrop_transform = transforms.Compose([
    transforms.Resize([imsize, imsize]),
    transforms.Grayscale(1),
    transforms.FiveCrop(int(imsize*0.8)),
    transforms.Lambda(lambda crops: torch.stack([transforms.ToTensor()(crop) for crop in crops]))
])

train_dataset = datasets.CelebA(
    root='./',
    split='all',
    target_type='attr',
    transform=fivecrop_transform,
    download=True
)

train_loader = DataLoader(
    train_dataset,
    batch_size=batch_size,
    shuffle=True,
    generator=torch.Generator(device=device)
)
We can verify the number of training images using len.
len(train_dataset)
202599
Note that the set of transformations applied to the training dataset contains FiveCrop in addition to the standard resize and grayscale transformations. FiveCrop makes 5 cropped versions of each image (who would have guessed), one for each corner plus the centre. This improves model performance and reduces overfitting to the training dataset. However, it also increases the computational resources required to train the model on this dataset by a factor of 5.
There is also a TenCrop function which produces the five crops from FiveCrop plus a flipped copy of each (a horizontal flip by default, or a vertical flip if requested). I would have liked to use TenCrop, but my old MacBook did not agree with that decision.
We can then access a few sample training images and their labels as we did previously.
train_data = iter(train_loader)
train_images, train_labels = next(train_data)

# Index of Male label, as CelebA contains multiple labels.
factor = functions.attributes.index('Male')

functions.imshow(torchvision.utils.make_grid(
    torch.cat((
        train_images[0],
        train_images[1],
        train_images[2]
    )),
    nrow=5
))

for i in range(3):
    print(classes[train_labels[:, factor][i]])
Male
Male
Female
Testing
Next, we need a dataset to test the performance of our model on unseen data. The simple option would be to split CelebA into train and test partitions. However, I found that achieving high test accuracy under this setup was fairly simple, and that the resulting models performed poorly on other image datasets.
Thus, we’ll use a Kaggle dataset of AI generated faces as the test dataset. I found it required a significantly more complicated model to achieve high accuracy on, but it produced models with better performance when given a random selection of my own images.
test_transform = transforms.Compose([
    transforms.Resize([int(imsize*0.8), int(imsize*0.8)]),
    transforms.Grayscale(1),
    transforms.ToTensor()
])

test_dataset = datasets.ImageFolder(
    root='ThisPersonDoesNotExist_resize/',
    transform=test_transform
)

test_loader = DataLoader(
    test_dataset,
    batch_size=batch_size,
    shuffle=True,
    generator=torch.Generator(device=device)
)
Once again, we can get the number of images in the test dataset.
len(test_dataset)
6873
This dataset was originally the training dataset. Given its significantly smaller number of images compared to CelebA, it’s unsurprising that the initial models did not perform well.
We can then show a few images from the test dataset, along with their labels.
test_data = iter(test_loader)
test_images, test_labels = next(test_data)

functions.imshow(torchvision.utils.make_grid(test_images, nrow=5))

for i in range(batch_size):
    print(classes[test_labels[i]])
Female
Female
Male
Female
Female
Female
Female
Female
Female
Male
Model Architecture
Next, we’ll need to determine the architecture, or combination of layers and activation functions, that our neural network will use. I’ll skip the experimentation and failed models part of this project, but I found that a scaled up version of the model used in this repository by CallenL worked the best (of the models I tried). This model seemed to perform better due to a combination of having residual layers (enabling backpropagation to work better) and more convolution layers (allowing more features to be detected).
import torch.nn as nn
import torch.nn.functional as F
# Define recurring sequence of convolution, batch normalisation, and rectified linear activation function layers.
def conv_block(in_channels, out_channels, pool=False):
    layers = [
        nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=3,
            padding=1
        ),
        nn.BatchNorm2d(out_channels),
        nn.ReLU()
    ]
    if pool:
        layers.append(
            nn.MaxPool2d(4)
        )
    return nn.Sequential(*layers)

class resnetModel_128(nn.Module):
    def __init__(self):
        super().__init__()
        # Define convolution and residual layers based on conv_block function.
        self.conv_1 = conv_block(1, 64)
        self.res_1 = nn.Sequential(
            conv_block(64, 64),
            conv_block(64, 64)
        )
        self.conv_2 = conv_block(64, 256, pool=True)
        self.res_2 = nn.Sequential(
            conv_block(256, 256),
            conv_block(256, 256)
        )
        self.conv_3 = conv_block(256, 512, pool=True)
        self.res_3 = nn.Sequential(
            conv_block(512, 512),
            conv_block(512, 512)
        )
        self.conv_4 = conv_block(512, 1024, pool=True)
        self.res_4 = nn.Sequential(
            conv_block(1024, 1024),
            conv_block(1024, 1024)
        )

        # Define classifier function using fully connected, dropout, and rectified linear activation function.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(2*2*1024, 2048),
            nn.Dropout(0.5),
            nn.ReLU(),
            nn.Linear(2048, 1024),
            nn.Dropout(0.5),
            nn.ReLU(),
            nn.Linear(1024, 2)
        )

    # Define forward function using functions initialised earlier, which outputs predictions.
    def forward(self, x):
        x = self.conv_1(x)
        x = self.res_1(x) + x
        x = self.conv_2(x)
        x = self.res_2(x) + x
        x = self.conv_3(x)
        x = self.res_3(x) + x
        x = self.conv_4(x)
        x = self.res_4(x) + x
        x = self.classifier(x)
        x = F.softmax(x, dim=1)
        return x
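The 2*2*1024 input size of the first fully connected layer follows from the layer shapes. A short sketch of that arithmetic, using a zero tensor as a stand-in for a real image (an assumption for illustration): a 3 by 3 convolution with padding 1 keeps the spatial size, and each of the three MaxPool2d(4) layers divides it by 4.

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 1, 128, 128)  # one grayscale 128x128 image (stand-in)

# A 3x3 convolution with padding=1 changes the channel count but not the size.
conv = nn.Conv2d(1, 64, kernel_size=3, padding=1)
conv_shape = tuple(conv(x).shape)
print(conv_shape)  # (1, 64, 128, 128)

# conv_2, conv_3 and conv_4 each end in MaxPool2d(4).
pool = nn.MaxPool2d(4)
sizes = [tuple(x.shape[-2:])]
for _ in range(3):
    x = pool(x)
    sizes.append(tuple(x.shape[-2:]))

print(sizes)  # [(128, 128), (32, 32), (8, 8), (2, 2)]
```

With 1024 channels left after conv_4, flattening gives 2*2*1024 = 4096 features per image.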
We can now create a variable using our neural network class.
# Set seed for reproducibility.
torch.manual_seed(2687)
resnet = resnetModel_128()
Now is also a good time to check how many parameters (individual weights and biases) our model contains.
total_params, trainable_params = functions.n_parameters(resnet)
print(f'Total Parameters: {total_params}')
print(f'Trainable Parameters: {trainable_params}')
Total Parameters: 41400194
Trainable Parameters: 41400194
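The helper functions.n_parameters isn’t shown on this page. One plausible way to write such a helper is sketched below; the name count_parameters and the toy layer are hypothetical, and the real helper may differ.

```python
import torch.nn as nn

def count_parameters(model):
    """Hypothetical stand-in for functions.n_parameters: (total, trainable)."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable

# A Linear(10, 2) layer has 10*2 weights plus 2 biases.
toy_layer = nn.Linear(10, 2)
counts = count_parameters(toy_layer)
print(counts)  # (22, 22)
```

The two numbers only differ when some parameters have been frozen with requires_grad set to False.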
The variable resnet is our model initialised with completely random parameters. For fun, let’s make a prediction based on the untrained model.
resnet.eval()
with torch.no_grad():
    output = resnet.forward(images.to(device))
    predicted = torch.max(output.data, 1)[1]

for i in range(len(predicted)):
    print(f'Image: {my_dataset.imgs[i][0]}')
    print(f'Prediction: {classes[predicted[i]]}')
    print(f'Actual: {classes[labels[i]]}')
    print(f'{classes[0]} weight: {output[i][0]}')
    print(f'{classes[1]} weight: {output[i][1]}\n')
Image: test_images/Female/freya.png
Prediction: Female
Actual: Female
Female weight: 0.5020168423652649
Male weight: 0.4979831576347351
Image: test_images/Male/kratos.png
Prediction: Female
Actual: Male
Female weight: 0.5018221139907837
Male weight: 0.4981779158115387
Image: test_images/Male/me.png
Prediction: Female
Actual: Male
Female weight: 0.5015270113945007
Male weight: 0.49847298860549927
As expected, the weights are about 50-50 which indicates the model isn’t doing much predicting.
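For intuition on why an untrained model sits near 50-50, here is a small sketch with made-up outputs from the final layer. The values 0.01 and -0.01 are assumptions, chosen only to be close to each other.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw outputs for one image: two nearly equal values.
logits = torch.tensor([[0.01, -0.01]])

# Softmax turns each row into weights that sum to 1.
probs = F.softmax(logits, dim=1)

# torch.max(..., 1)[1] returns the index of the largest weight in each row.
predicted = torch.max(probs, 1)[1]

print(probs)      # both weights are close to 0.5
print(predicted)  # tensor([0])
```

Tiny differences in the raw outputs are enough to decide the predicted class, even though the weights themselves are almost identical.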
Training
So, how do we change the parameters of the model such that it generates more accurate outputs? Basically, through an iterative process built on backpropagation, which computes the partial derivatives of the loss function with respect to each model parameter. The parameters are then changed incrementally in the direction that reduces the loss, which minimises error and thus makes the model more accurate. This YouTube video by Artem Kirsanov provides a more detailed explanation of backpropagation.
In the code below, criterion specifies the loss function and optimizer specifies the optimisation algorithm used, which in this case is stochastic gradient descent. The additional optional variable scheduler specifies how the learning rate changes. Here I have used torch.optim.lr_scheduler.StepLR to multiply the learning rate by 0.1 after every step, with a step being defined as an epoch in the training loop.
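resnet.train()

# Note: nn.CrossEntropyLoss applies log-softmax internally, and forward() already
# ends in softmax, so the loss is computed on softmax outputs rather than raw logits.
criterion = nn.CrossEntropyLoss()

optimizer = torch.optim.SGD(
    resnet.parameters(),
    lr=0.01,
    momentum=0.9,
    weight_decay=0.001
)

scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer=optimizer,
    step_size=1,
    gamma=0.1
)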
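To make the update rule and the learning rate schedule concrete, here is a toy sketch with a single parameter. The loss function and starting value are made up for illustration, and momentum and weight decay are left out so the arithmetic stays easy to follow by hand.

```python
import torch

# One trainable parameter, starting at 2.0 (hypothetical toy problem).
w = torch.nn.Parameter(torch.tensor(2.0))

# Plain SGD: no momentum or weight decay, to keep the arithmetic simple.
optimizer = torch.optim.SGD([w], lr=0.01)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)

# Toy loss (3w - 12)^2, whose derivative is 6 * (3w - 12) = -36 at w = 2.
loss = (3 * w - 12) ** 2

optimizer.zero_grad()
loss.backward()           # backpropagation fills in w.grad
grad = w.grad.item()
optimizer.step()          # w <- w - lr * grad = 2 + 0.01 * 36

print(grad)               # -36.0
print(w.item())           # approximately 2.36

scheduler.step()          # end of an "epoch": multiply the learning rate by 0.1
new_lr = scheduler.get_last_lr()[0]
print(new_lr)             # approximately 0.001
```

The real training setup follows the same pattern, just with about 41 million parameters instead of one.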
Now we can train our model. For each batch in our training data, we need to:
- Resize the input tensors such that resnet.forward can take all cropped image tensors as inputs.
- Average the outputs of each group of cropped image tensors, so each distinct image only gets one final prediction.
- Calculate loss based on predicted and actual labels.
- Update parameters using backpropagation.
- Record loss and accuracy.
Steps 1 and 2 would be unnecessary if we didn’t use FiveCrop.
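The reshaping in the first two steps can be sketched in isolation. The batch below is random data and toy_model is a hypothetical one-layer stand-in for the real network, used only to show the shapes.

```python
import torch
import torch.nn as nn

# A batch of 4 images, each with 5 single-channel 128x128 crops (random stand-in data).
bs, ncrops, c, h, w = 4, 5, 1, 128, 128
X = torch.rand(bs, ncrops, c, h, w)

# Hypothetical toy model that maps one crop to 2 outputs.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(c * h * w, 2))

with torch.no_grad():
    # Step 1: fold the crop axis into the batch axis, giving 20 separate inputs.
    crop_preds = toy_model(X.view(-1, c, h, w))
    # Step 2: unfold again and average over the 5 crops of each image.
    image_preds = crop_preds.view(bs, ncrops, -1).mean(1)

print(crop_preds.shape)   # torch.Size([20, 2])
print(image_preds.shape)  # torch.Size([4, 2])
```

Because view keeps the original ordering, the first 5 rows of crop_preds belong to the first image, the next 5 to the second, and so on.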
When we have completed this loop for the entire training dataset, we then make predictions on the entire test dataset via its batches and record the loss and accuracy for each batch.
This entire process is called an epoch, and we specify how many epochs to train for. I chose 2 epochs for this model due to time constraints (this exact training setup took 20 hours on my 2020 MacBook Pro), and because it resulted in satisfactory performance anyway.
epochs = 2
train_losses = []
test_losses = []
train_accuracy = []
test_accuracy = []

for i in range(epochs):
    epoch_time = 0

    for j, (X_train, y_train) in enumerate(train_loader):
        X_train = X_train.to(device)
        y_train = y_train[:, factor]

        # Input all crops as separate images.
        bs, ncrops, c, h, w = X_train.size()
        y_pred_crops = resnet.forward(X_train.view(-1, c, h, w))
        # Let image prediction be the mean of crop predictions.
        y_pred = y_pred_crops.view(bs, ncrops, -1).mean(1)

        loss = criterion(y_pred, y_train)

        predicted = torch.max(y_pred.data, 1)[1]
        train_batch_accuracy = (predicted == y_train).sum()/len(X_train)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        train_losses.append(loss.item())
        train_accuracy.append(train_batch_accuracy.item())

        print(f'\nEpoch: {i+1}/{epochs} | Train Batch: {j+1}/{len(train_loader)}')
        print(f'Train Loss: {loss}')
        print(f'Train Accuracy: {train_batch_accuracy}')
        break

    with torch.no_grad():
        for j, (X_test, y_test) in enumerate(test_loader):
            X_test = X_test.to(device)
            y_val = resnet.forward(X_test)

            loss = criterion(y_val, y_test)

            predicted = torch.max(y_val.data, 1)[1]
            test_batch_accuracy = (predicted == y_test).sum()/len(X_test)

            test_losses.append(loss.item())
            test_accuracy.append(test_batch_accuracy.item())

            print(f'\nEpoch: {i+1}/{epochs} | Test Batch: {j+1}/{len(test_loader)}')
            print(f'Test Loss: {loss}')
            print(f'Test Accuracy: {test_batch_accuracy}')
            break

    scheduler.step()
    break
Epoch: 1/2 | Train Batch: 1/20260
Train Loss: 0.7186497449874878
Train Accuracy: 0.5
Epoch: 1/2 | Test Batch: 1/688
Test Loss: 0.746019721031189
Test Accuracy: 0.30000001192092896
Since the training loop takes a significant amount of time, I’ve broken the loop to demonstrate how it works on this page. However, the loss and accuracy data produced by the training loop was conveniently saved last time the model was trained, so we’ll load it in with pandas and plot it.
import pandas as pd
train_plot_data = pd.read_csv('training_plot_data/train_data.csv')
test_plot_data = pd.read_csv('training_plot_data/test_data.csv')

train_plot_data = train_plot_data.drop(columns='Unnamed: 0')
test_plot_data = test_plot_data.drop(columns='Unnamed: 0')

train_plot_MA_data = train_plot_data.rolling(500).mean()
train_plot_MA_data.dropna(inplace=True)

test_plot_MA_data = test_plot_data.rolling(50).mean()
test_plot_MA_data.dropna(inplace=True)

train_plot_MA_data.plot(
    use_index=True,
    title='Training Loss/Accuracy (2 Epochs)',
    xlabel='Batch'
)

test_plot_MA_data.plot(
    use_index=True,
    title='Test Loss/Accuracy (2 Epochs)',
    xlabel='Batch'
)
Notice that the model converges fairly quickly, with improvements in performance plateauing after about 5000 batches. Testing performance stays relatively consistent, as it’s only evaluated after a full epoch of training is completed, and the model converges before a full epoch is complete.
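The plots use a moving average because per-batch loss and accuracy are noisy. Here is a tiny sketch of what rolling followed by dropna does, using made-up numbers and a window of 3 rather than the 500 and 50 used for the real data.

```python
import pandas as pd

# Made-up per-batch losses (assumption, for illustration only).
toy = pd.DataFrame({'loss': [1.0, 2.0, 3.0, 4.0, 5.0]})

# Each value becomes the mean of itself and the two values before it.
# The first two rows have no full window, so they are NaN and get dropped.
smoothed = toy.rolling(3).mean().dropna()

print(smoothed['loss'].tolist())  # [2.0, 3.0, 4.0]
print(smoothed.index.tolist())    # [2, 3, 4]
```

The original index is preserved, so the x-axis of the plots still lines up with the batch number.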
Results
Going back to our original objective, we will once again attempt to classify Freya, Kratos, and me. But this time, we will use our trained model parameters, which can be loaded in using torch.load.
resnet.load_state_dict(torch.load('trained_models/resnetModel_128_epoch_2.pt', map_location=device))

resnet.eval()
with torch.no_grad():
    output = resnet.forward(images.to(device))
    predicted = torch.max(output.data, 1)[1]

for i in range(len(predicted)):
    print(f'Image: {my_dataset.imgs[i][0]}')
    print(f'Prediction: {classes[predicted[i]]}')
    print(f'Actual: {classes[labels[i]]}')
    print(f'{classes[0]} weight: {output[i][0]}')
    print(f'{classes[1]} weight: {output[i][1]}\n')
Image: test_images/Female/freya.png
Prediction: Female
Actual: Female
Female weight: 0.9999755620956421
Male weight: 2.4402881535934284e-05
Image: test_images/Male/kratos.png
Prediction: Male
Actual: Male
Female weight: 0.019050072878599167
Male weight: 0.9809499382972717
Image: test_images/Male/me.png
Prediction: Male
Actual: Male
Female weight: 0.19523011147975922
Male weight: 0.8047698736190796
Success! The neural network successfully classified all 3 images. It even picked up that Kratos has a higher likelihood of being a man compared to me, which is fairly reasonable.
To get a bit of insight into what the neural network is doing, we can visualise what happens to an image as it passes through the convolutional layers. This shows us how the convolutional layers choose features, and the weight of each feature, for our chosen image.
# Choose image of Kratos.
image = images[1]

with torch.no_grad():
    layer_1 = resnet.conv_1(image.to(device).unsqueeze(0))
    layer_2 = resnet.res_1(layer_1) + layer_1
    layer_3 = resnet.conv_2(layer_2)
    layer_4 = resnet.res_2(layer_3) + layer_3
    layer_5 = resnet.conv_3(layer_4)
    layer_6 = resnet.res_3(layer_5) + layer_5
    layer_7 = resnet.conv_4(layer_6)
    layer_8 = resnet.res_4(layer_7) + layer_7

functions.imshow(
    torchvision.utils.make_grid(
        layer_1.squeeze(0).unsqueeze(1),
        nrow=int(64**0.5)
    )
)
functions.imshow(
    torchvision.utils.make_grid(
        layer_2.squeeze(0).unsqueeze(1),
        nrow=int(64**0.5)
    )
)
functions.imshow(
    torchvision.utils.make_grid(
        layer_3.squeeze(0).unsqueeze(1),
        nrow=int(256**0.5)
    )
)
functions.imshow(
    torchvision.utils.make_grid(
        layer_4.squeeze(0).unsqueeze(1),
        nrow=int(256**0.5)
    )
)
functions.imshow(
    torchvision.utils.make_grid(
        layer_5.squeeze(0).unsqueeze(1),
        nrow=int(512**0.5)
    )
)
functions.imshow(
    torchvision.utils.make_grid(
        layer_6.squeeze(0).unsqueeze(1),
        nrow=int(512**0.5)
    )
)
functions.imshow(
    torchvision.utils.make_grid(
        layer_7.squeeze(0).unsqueeze(1),
        nrow=int(1024**0.5)
    )
)
functions.imshow(
    torchvision.utils.make_grid(
        layer_8.squeeze(0).unsqueeze(1),
        nrow=int(1024**0.5)
    )
)
The code to train and use this model to make predictions on custom images is available in this GitHub repo.