This notebook presents a case study on classifying defects on metal surfaces. The dataset contains 1800 images of steel surfaces covering 6 possible defects, i.e. 300 images per defect. We started with a sub-dataset of only 90 images (15 per defect), divided into two parts: 60 images for training and 30 for testing.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import torch
import os
from torchvision import datasets, transforms, models
from skimage import io, transform
import matplotlib.pyplot as plt
from torch import nn, optim
from time import time
train_trainsforms = transforms.Compose([transforms.Resize((200,200)),
                                        transforms.ToTensor(),
                                        transforms.Normalize((0.5,), (0.5,)),])
test_trainsforms = transforms.Compose([transforms.Resize((200,200)),
                                       transforms.ToTensor(),
                                       transforms.Normalize((0.5,), (0.5,)),])
train_data = datasets.ImageFolder('Data_train/',transform=train_trainsforms)
test_data = datasets.ImageFolder('Data_test/',transform=test_trainsforms)
num_train = len(train_data)
indices = list(range(num_train))
num_test = len(test_data)
indices = list(range(num_test))
The loader yields 32 images per training iteration, and we set a seed so that the random batches are always the same.
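The effect of fixing the seed can be illustrated without PyTorch. The sketch below uses Python's standard `random` module as a stand-in for `torch.manual_seed` and the loader's shuffling; the helper name `shuffled_batches` and the size of 60 items are assumptions made for illustration only.

```python
import random


def shuffled_batches(n_items, batch_size, seed):
    """Shuffle indices 0..n_items-1 with a fixed seed and cut them into batches."""
    rng = random.Random(seed)          # independent generator with a fixed seed
    indices = list(range(n_items))
    rng.shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, n_items, batch_size)]


first_run = shuffled_batches(60, 32, seed=2)
second_run = shuffled_batches(60, 32, seed=2)

print(first_run == second_run)          # same seed, same batches
print([len(b) for b in first_run])      # batch sizes
```

Re-running with the same seed reproduces the batches exactly, which is what makes the losses and accuracies in this notebook repeatable.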
torch.manual_seed(2)
train_loader = torch.utils.data.DataLoader(train_data,batch_size=32,shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data,batch_size=32,shuffle=True)
dataiter = iter(train_loader)
images, labels = next(dataiter)
images = images[:,0,:,:]
print(images.shape)
print(labels.shape)
torch.Size([32, 200, 200]) torch.Size([32])
All images are square and consist of 40000 pixels (200 x 200).
Printing the labels of a batch gives integers between 0 and 5, since each of the 6 defects is identified by a number. The defects corresponding to the various numbers are:
print(labels)
tensor([3, 3, 2, 3, 3, 1, 4, 5, 3, 0, 2, 2, 4, 4, 1, 0, 0, 2, 1, 5, 0, 3, 4, 2, 5, 5, 3, 1, 0, 4, 1, 1])
Here is the first image of the batch.
plt.imshow(images[0], cmap='gray_r');
Here is a plot of most of the batch.
figure = plt.figure()
num_of_images = 30
for index in range(1, num_of_images+1):
    plt.subplot(5, 6, index)
    plt.axis('off')
    plt.imshow(images[index].numpy().squeeze(), cmap='gray_r')
We create our own neural network model. Converting an image to a tensor does not flatten it: each image is a tensor of shape 3 x 200 x 200, which we flatten into a vector before feeding it to the network. A single channel would give 40000 inputs (200 x 200), but since our images are loaded in colour they have 3 channels (RGB), and therefore the input layer is composed of 120,000 neurons (40000 x 3). The output layer must be made up of 6 neurons because there are 6 possible metal defects. We decided to use 2 hidden layers with 128 and 64 neurons respectively.
Each neuron in the hidden layers uses a ReLU activation function, and the output layer uses a LogSoftmax. Note that a Convolutional Neural Network, which is the reference for image processing, would be better suited to this task than this fully connected architecture.
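The input size can be checked with a little arithmetic. This sketch, in plain Python with a hypothetical helper `flat_index`, shows how a (channel, row, column) position maps to a position in the flattened vector under row-major order, which is the layout `view` produces for a contiguous tensor.

```python
CHANNELS, HEIGHT, WIDTH = 3, 200, 200

# Length of the flattened vector fed to the first linear layer
input_size = CHANNELS * HEIGHT * WIDTH


def flat_index(c, h, w):
    """Row-major position of pixel (c, h, w) in the flattened vector."""
    return c * HEIGHT * WIDTH + h * WIDTH + w


print(input_size)                 # 120000
print(flat_index(0, 0, 0))        # first value of the red channel: 0
print(flat_index(1, 0, 0))        # first value of the green channel: 40000
print(flat_index(2, 199, 199))    # last value of the blue channel: 119999
```

Each block of 40000 consecutive inputs therefore corresponds to one colour channel, and the network sees no explicit information about which pixels are neighbours.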
input_size = 120000
hidden_sizes = [128,64]
output_size = 6
model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[1], output_size),
                      nn.LogSoftmax(dim=1))
print(model)
Sequential( (0): Linear(in_features=120000, out_features=128, bias=True) (1): ReLU() (2): Linear(in_features=128, out_features=64, bias=True) (3): ReLU() (4): Linear(in_features=64, out_features=6, bias=True) (5): LogSoftmax(dim=1) )
We opt for the negative log likelihood loss, which is suited to training a classification problem with C classes.
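To make the loss concrete, here is a minimal sketch in plain Python of what LogSoftmax followed by the negative log likelihood computes for a single sample. The helper names `log_softmax` and `nll_loss` and the example scores are illustrative assumptions; `nn.NLLLoss` additionally averages this value over the batch by default.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of a list of raw scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def nll_loss(log_probs, target):
    """Negative log likelihood of the true class for one sample."""
    return -log_probs[target]


# An untrained model that scores all 6 classes equally
uniform = log_softmax([0.0] * 6)
print(nll_loss(uniform, target=3))       # equals ln(6), about 1.79

# A model that is confident in the correct class gets a much lower loss
confident = log_softmax([0.1, 0.2, 0.0, 5.0, 0.3, 0.1])
print(nll_loss(confident, target=3))
```

The value ln(6) ≈ 1.79 is the loss of a model that guesses uniformly among 6 classes, which is close to the first-epoch training losses printed in this notebook.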
criterion = nn.NLLLoss()
images, labels = next(iter(train_loader))
images = images.view(images.shape[0], -1)
logps = model(images)
loss = criterion(logps, labels)
The image tensor consists of 32 images (the batch size) and therefore has 32 rows and 120000 columns (the number of pixels in each image multiplied by 3 because of the 3 RGB channels).
images.shape
torch.Size([32, 120000])
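We now apply backward propagation to compute the gradients of the loss with respect to the weights; the optimiser later uses these gradients to update the weights and lower the cost function.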
print('Before backward pass: \n', model[0].weight.grad)
loss.backward()
print('After backward pass: \n', model[0].weight.grad)
Before backward pass: None After backward pass: tensor([[-8.9428e-04, -9.7488e-04, -1.0046e-03, ..., 1.3797e-03, 1.3326e-03, 1.4158e-03], [-9.0823e-04, -8.5888e-04, -1.2532e-03, ..., -1.7898e-03, -1.5444e-03, -1.7550e-03], [ 7.5068e-04, 7.4561e-04, 6.9309e-04, ..., -6.4397e-05, 1.0643e-04, 1.5402e-04], ..., [ 1.4001e-03, 1.6305e-03, 1.5908e-03, ..., -4.4090e-04, -3.2167e-04, -1.7612e-04], [ 3.2741e-04, 3.4318e-04, 2.7764e-04, ..., -1.6735e-03, -1.5686e-03, -1.6354e-03], [ 2.2885e-03, 2.2349e-03, 2.1885e-03, ..., -1.7438e-03, -1.4838e-03, -1.2358e-03]])
Using the stochastic gradient descent optimiser, we try to achieve a good accuracy score by training for 20 epochs (gradient descent is an iterative process, and updating the weights in a single pass, i.e. one epoch, is not enough). We set the seed again so that the results are always the same.
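The role of momentum can be sketched on a toy problem. The code below is a plain-Python illustration, not the notebook's training code: it applies the update `v = momentum * v + grad; w = w - lr * v` (the form documented for `optim.SGD` without dampening or Nesterov) to the hypothetical one-dimensional loss (w - 3)^2, using the same `lr` and `momentum` values as in this notebook.

```python
def sgd_minimise(grad_fn, w0, lr, momentum, steps):
    """Minimise a 1-D function with SGD plus momentum.

    Update rule: v = momentum * v + grad; w = w - lr * v
    """
    w, v = w0, 0.0
    for _ in range(steps):
        v = momentum * v + grad_fn(w)
        w = w - lr * v
    return w


def grad(w):
    """Gradient of the toy loss (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)


w_plain = sgd_minimise(grad, w0=0.0, lr=0.003, momentum=0.0, steps=1000)
w_momentum = sgd_minimise(grad, w0=0.0, lr=0.003, momentum=0.9, steps=1000)

print(abs(w_plain - 3.0))      # remaining error without momentum
print(abs(w_momentum - 3.0))   # remaining error with momentum
```

With the same learning rate and number of steps, the run with momentum ends much closer to the minimum, because the velocity term accumulates past gradients that point in a consistent direction.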
torch.manual_seed(2)
# The stochastic gradient descent optimizer
# momentum (a value between 0 and 1) helps avoid getting stuck in a local minimum
optimizer = optim.SGD(model.parameters(), lr=0.003, momentum=0.9)
time0 = time()
# Number of feedforward and backpropagation passes over the training set
epochs = 20
for e in range(epochs):
    # We start each epoch with a loss of zero
    running_loss = 0
    # Iterate through the batches of the image dataset
    for images, labels in train_loader:
        # Flatten images into a long vector
        images = images.view(images.shape[0], -1)
        # Training pass
        optimizer.zero_grad()
        output = model(images)
        loss = criterion(output, labels)
        # This is where the model learns by backpropagating
        loss.backward()
        # And optimizes its weights here
        optimizer.step()
        running_loss += loss.item()
    # Average loss over the batches of this epoch
    print("Epoch {} - Training loss: {}".format(e, running_loss/len(train_loader)))
print("\nTraining Time (in minutes) =",(time()-time0)/60)
Epoch 0 - Training loss: 1.780898928642273
Epoch 1 - Training loss: 1.7081470489501953
Epoch 2 - Training loss: 1.5955978035926819
Epoch 3 - Training loss: 1.4502174258232117
Epoch 4 - Training loss: 1.2909554243087769
Epoch 5 - Training loss: 1.1432413458824158
Epoch 6 - Training loss: 1.0294532179832458
Epoch 7 - Training loss: 0.9218446314334869
Epoch 8 - Training loss: 0.8360144197940826
Epoch 9 - Training loss: 0.7500104308128357
Epoch 10 - Training loss: 0.6797953844070435
Epoch 11 - Training loss: 0.6182784736156464
Epoch 12 - Training loss: 0.5592231750488281
Epoch 13 - Training loss: 0.5171482861042023
Epoch 14 - Training loss: 0.45815621316432953
Epoch 15 - Training loss: 0.4129913002252579
Epoch 16 - Training loss: 0.3811275511980057
Epoch 17 - Training loss: 0.33993464708328247
Epoch 18 - Training loss: 0.319619320333004
Epoch 19 - Training loss: 0.29341550171375275
Training Time (in minutes) = 0.07853331963221231
The training time is very short (about 5 seconds) despite the large number of neurons, because in this case the dataset contains only 90 images.
for images, labels in train_loader:
    print(images.shape)
torch.Size([32, 3, 200, 200]) torch.Size([28, 3, 200, 200])
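The two shapes printed above follow from the batch size: 32 + 28 = 60 training images, split into batches of at most 32. A minimal sketch in plain Python (the helper `batch_sizes` is hypothetical) reproduces this, along with the number of batches that `len(train_loader)` returns when, as here, the last incomplete batch is kept.

```python
import math


def batch_sizes(n_items, batch_size):
    """Sizes of the batches when the last incomplete batch is kept."""
    full, rest = divmod(n_items, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


sizes = batch_sizes(60, 32)
print(sizes)                          # [32, 28]
print(len(sizes))                     # 2 batches per epoch
print(math.ceil(60 / 32))             # same count via ceiling division
```

This is why the training loop divides `running_loss` by `len(train_loader)`: it averages the loss over the 2 batches of each epoch.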
torch.manual_seed(2)
correct_count, all_count = 0, 0
for images,labels in test_loader:
    for i in range(len(labels)):
        img = images[i].view(1, 120000)
        # torch.no_grad(): context manager that disables gradient calculation
        with torch.no_grad():
            logps = model(img)
        # Get the scores obtained for each class
        ps = torch.exp(logps)
        probab = list(ps.numpy()[0])
        # Choose the class with the maximum score and compare it to the real label
        pred_label = probab.index(max(probab))
        true_label = labels.numpy()[i]
        if(true_label == pred_label):
            correct_count += 1
        all_count += 1
print("Number Of Images Tested =", all_count)
print("\nNumber of images predicted correctly =", correct_count)
print("\nModel Accuracy =", (correct_count/all_count))
Number Of Images Tested = 30 Number of images predicted correctly = 17 Model Accuracy = 0.5666666666666667
After training, our neural network reached an accuracy of 57%: out of the 30 images in our test set, the model correctly classified the defect in 17.
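For reference, the accuracy computation can be written compactly. The sketch below uses a hypothetical helper `accuracy` and made-up label lists, then compares the reported score with the level expected from random guessing among 6 classes (assuming balanced classes).

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Illustrative labels only, not the notebook's actual predictions
true_labels = [0, 1, 2, 3, 4, 5]
predicted_labels = [0, 1, 2, 0, 4, 1]
print(accuracy(true_labels, predicted_labels))   # 4 of 6 correct

# The reported score and the level expected from random guessing
reported = 17 / 30
chance = 1 / 6
print(round(reported, 3), round(chance, 3))
```

Seen this way, 57% is well above the roughly 17% expected by chance, even though it is far from a usable classifier.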
Since this is not a very high score, in the second part of the notebook we enlarge the dataset from 90 images to 1800. Moreover, in the previous dataset the test set was 33.3% of the total, while the dataset of 1800 images is split in this way:
train_data= datasets.ImageFolder('New_data_train/',transform=train_trainsforms)
test_data = datasets.ImageFolder('New_data_test/',transform=test_trainsforms)
num_train = len(train_data)
indices = list(range(num_train))
num_test = len(test_data)
indices = list(range(num_test))
Again, the batch size chosen is 32 images.
torch.manual_seed(9)
train_loader = torch.utils.data.DataLoader(train_data,batch_size=32,shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data,batch_size=32,shuffle=True)
#print the size of the data
#the first line prints the size of the images tensor: there are 32 images in a batch and each image has 200 x 200 pixels
#as we have RGB images, the channel dimension of each image is 3 and not 1; only the first channel is kept below
dataiter = iter(train_loader)
images, labels = next(dataiter)
images = images[:,0,:,:]
print(images.shape)
print(labels.shape)
torch.Size([32, 200, 200]) torch.Size([32])
#we have 6 classes corresponding to 6 types of defects
#the loader will code these classes from 0 to 5
print(labels)
tensor([1, 4, 2, 5, 3, 5, 5, 2, 2, 0, 5, 5, 1, 5, 5, 3, 2, 3, 4, 0, 3, 5, 3, 3, 3, 1, 4, 3, 5, 1, 2, 0])
#height, width (the channel dimension was dropped when selecting the first channel)
images[1].numpy().shape
(200, 200)
To increase the final accuracy of the model, we tried a heavier NN architecture with 4 hidden layers (512, 256, 128 and 64 neurons respectively).
input_size = 120000
hidden_sizes = [512,256,128,64]
output_size = 6
model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[1], hidden_sizes[2]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[2], hidden_sizes[3]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[3], output_size),
                      nn.LogSoftmax(dim=1))
print(model)
Sequential( (0): Linear(in_features=120000, out_features=512, bias=True) (1): ReLU() (2): Linear(in_features=512, out_features=256, bias=True) (3): ReLU() (4): Linear(in_features=256, out_features=128, bias=True) (5): ReLU() (6): Linear(in_features=128, out_features=64, bias=True) (7): ReLU() (8): Linear(in_features=64, out_features=6, bias=True) (9): LogSoftmax(dim=1) )
criterion = nn.NLLLoss()
images, labels = next(iter(train_loader))
# `view(a, b)` returns a new tensor with the same data as `images` with size `(a, b)`.
images = images.view(images.shape[0], -1)
logps = model(images) #log probabilities, the output of our neural network model
loss = criterion(logps, labels) #calculate the NLL loss
print('Before backward pass: \n', model[0].weight.grad)
loss.backward()
print('After backward pass: \n', model[0].weight.grad)
Before backward pass: None After backward pass: tensor([[ 6.6058e-06, 7.2482e-06, 7.0832e-06, ..., -1.3033e-04, -1.3199e-04, -1.3523e-04], [-1.8447e-04, -1.8739e-04, -1.5799e-04, ..., -1.5068e-04, -1.4811e-04, -1.5261e-04], [ 9.0736e-05, 1.1525e-04, 1.1755e-04, ..., -1.6380e-05, -2.2267e-05, -2.0374e-05], ..., [ 1.3173e-04, 1.4631e-04, 1.6834e-04, ..., 8.6974e-06, -3.3343e-06, -1.0064e-05], [-5.4521e-05, -5.7422e-05, -7.8607e-05, ..., 8.4750e-06, -7.4097e-06, -4.7359e-06], [-1.7587e-04, -2.1240e-04, -2.3074e-04, ..., -1.8522e-04, -1.9259e-04, -1.7729e-04]])
# The stochastic gradient descent optimizer
# momentum (a value between 0 and 1) helps avoid getting stuck in a local minimum
optimizer = optim.SGD(model.parameters(), lr=0.003, momentum=0.9)
time0 = time()
torch.manual_seed(9)
# Number of feedforward and backpropagation passes over the training set
epochs = 20
for e in range(epochs):
    # We start each epoch with a loss of zero
    running_loss = 0
    # Iterate through the batches of the image dataset
    for images, labels in train_loader:
        # Flatten images into a long vector
        images = images.view(images.shape[0], -1)
        # Training pass
        optimizer.zero_grad()
        output = model(images)
        loss = criterion(output, labels)
        # This is where the model learns by backpropagating
        loss.backward()
        # And optimizes its weights here
        optimizer.step()
        running_loss += loss.item()
    # Average loss over the batches of this epoch
    print("Epoch {} - Training loss: {}".format(e, running_loss/len(train_loader)))
print("\nTraining Time (in minutes) =",(time()-time0)/60)
Epoch 0 - Training loss: 1.7266536997813804
Epoch 1 - Training loss: 1.5533046979530185
Epoch 2 - Training loss: 1.4189983863456577
Epoch 3 - Training loss: 1.2786789244296504
Epoch 4 - Training loss: 1.1757741874339533
Epoch 5 - Training loss: 1.101893326815437
Epoch 6 - Training loss: 1.0372385686519099
Epoch 7 - Training loss: 0.953442283705169
Epoch 8 - Training loss: 0.8605675387616251
Epoch 9 - Training loss: 0.8166301928314508
Epoch 10 - Training loss: 0.7603713139599445
Epoch 11 - Training loss: 0.6924999367957022
Epoch 12 - Training loss: 0.6302560994438097
Epoch 13 - Training loss: 0.5399501352917915
Epoch 14 - Training loss: 0.5207440043781318
Epoch 15 - Training loss: 0.5892245062426025
Epoch 16 - Training loss: 0.4836716944096135
Epoch 17 - Training loss: 0.4731340124910953
Epoch 18 - Training loss: 0.4347659627012178
Epoch 19 - Training loss: 0.40920596905783113
Training Time (in minutes) = 6.060018372535706
The training time in this case is much longer (about 6 minutes) due to both the much larger dataset and the 2 additional hidden layers compared to the previous model.
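Part of the extra cost can be quantified by counting trainable parameters. The sketch below is plain Python with a hypothetical helper `count_parameters`; it counts weights plus biases for the two fully connected architectures used in this notebook.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


small = count_parameters([120000, 128, 64, 6])
large = count_parameters([120000, 512, 256, 128, 64, 6])

print(small)            # 15368774
print(large)            # 61613382
print(large / small)    # roughly 4 times as many parameters
```

Almost all of the parameters sit in the first layer (120000 x 512 weights), so widening that layer from 128 to 512 neurons matters far more for the cost than the two extra hidden layers do.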
torch.manual_seed(9)
correct_count, all_count = 0, 0
for images,labels in test_loader:
    for i in range(len(labels)):
        img = images[i].view(1, 120000)
        # torch.no_grad(): context manager that disables gradient calculation
        with torch.no_grad():
            logps = model(img)
        # Get the scores obtained for each class
        ps = torch.exp(logps)
        probab = list(ps.numpy()[0])
        # Choose the class with the maximum score and compare it to the real label
        pred_label = probab.index(max(probab))
        true_label = labels.numpy()[i]
        if(true_label == pred_label):
            correct_count += 1
        all_count += 1
print("Number Of Images Tested =", all_count)
print("\nNumber of images predicted correctly =", correct_count)
print("\nModel Accuracy =", (correct_count/all_count))
Number Of Images Tested = 180 Number of images predicted correctly = 115 Model Accuracy = 0.6388888888888888
With the larger dataset we obtained a higher accuracy score: 64%. This means that out of the 180 images in the test set, the model identified the metal defect correctly in 115 images.
Let's test the prediction for one image from the test set.
torch.manual_seed(1)
images, labels = next(iter(test_loader))
image = images[0]
label = labels[0]
img = image.view(1, 120000)
with torch.no_grad():
    logps = model(img)
ps = torch.exp(logps)
probab = list(ps.numpy()[0])
print("Predicted metal surface defect =", probab.index(max(probab)))
print("Label =", labels[0])
Predicted metal surface defect = 1 Label = tensor(1)
This image was correctly predicted: in fact it is an inclusion defect (type 1) and our neural network gave the correct defect as a prediction.
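The prediction step used above, turning log-probabilities into probabilities and taking the most likely class, can be sketched in plain Python. The log-probabilities below are hypothetical rounded values chosen for illustration, not the model's actual output.

```python
import math

# Hypothetical log-probabilities for one image (rounded, so the
# probabilities sum to roughly 1 rather than exactly 1)
log_probs = [-3.2, -0.25, -2.9, -3.5, -4.0, -2.6]

# exp() turns log-probabilities back into probabilities
probs = [math.exp(lp) for lp in log_probs]

# The predicted class is the index of the largest probability
predicted = max(range(len(probs)), key=lambda i: probs[i])

print(predicted)                    # index of the most likely class
print(round(probs[predicted], 3))   # its probability
```

Because `exp` is monotonic, taking the index of the largest log-probability directly would give the same predicted class; the exponential is only needed when the probabilities themselves are of interest.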
image = image[0]
plt.imshow(image, cmap='gray_r');
In this last part we take an image from the validation set, i.e. an unlabelled image that our model has never seen, to see whether it can correctly predict the type of defect.
val_trainsforms = transforms.Compose([transforms.Resize((200,200)),
                                      transforms.ToTensor(),
                                      transforms.Normalize((0.5,), (0.5,)),])
val_data = datasets.ImageFolder('Data_validation/',transform=val_trainsforms)
val_loader = torch.utils.data.DataLoader(val_data,batch_size=32,shuffle=True)
torch.manual_seed(2)
images, labels = next(iter(val_loader))
image = images[0]
img = image.view(1, 120000)
with torch.no_grad():
    logps = model(img)
ps = torch.exp(logps)
probab = list(ps.numpy()[0])
print("Predicted Defect =", probab.index(max(probab)))
Predicted Defect = 5
image = image[0]
plt.imshow(image, cmap='gray_r');
We do not have the label for this image, but comparing it by eye with images showing a scratches defect (type 5), it seems that our model also correctly predicted this image, which comes from outside the labelled dataset.
This algorithm can be used to improve the production part of a company's value chain. Combined with a visual sensor, this model could help improve quality on the production line of a steel company: the system detects an anomaly on the steel layer, and an operator or a robot can then remove that layer from the line. Such systems already exist in the automotive and aeronautic industries, where robots empowered by computer vision inspect parts in order to validate their quality and lower the error rate.