Design your First Custom Neural Network from scratch Using PyTorch

Posted by VJ on June 12, 2020


Okay, here's an original one: So, they ran a deep neural network to predict the hottest technological trend of 2014. Surprisingly, it predicted the answer to be deep neural networks! People accused it of being biased. NN coders assumed it was probably due to the large initial bias. 😅


If you have already used TensorFlow or Keras to create neural network architectures, this is going to be a cakewalk for you. TensorFlow is great for creating models, but it doesn't support GPU computation as seamlessly as PyTorch does.

The main purpose of PyTorch is to replace NumPy arrays with tensors, which support GPU computation, and unlike Keras it gives you maximum flexibility to customize your network architecture.
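To see what that means in practice, here's a tiny illustration (my addition, assuming a CUDA GPU may or may not be present on your machine):

import torch

# Tensors behave much like NumPy arrays...
x = torch.ones(3, 3)
y = x * 2                       # elementwise ops, NumPy-style
# ...but can be moved to a GPU when one is available
if torch.cuda.is_available():
    y = y.to('cuda')
print(y)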

Let's not waste time and get started!

Install PyTorch: Copy the command from the official install page at pytorch.org and run it in your terminal, choosing the options that match your system.

Import Libraries:

import torch
import torch.nn as nn                         # Contains required functions and layers
import torch.nn.functional as F               # Functional versions of neural network ops
from torchvision import datasets, transforms  # Open ML datasets available in PyTorch
import torch.optim as optim                   # Optimization functions available in PyTorch

Let's try to build a 4-layer network with the "hello world" dataset of machine learning: MNIST.

Import Dataset: MNIST Handwritten Digits

The data consists of a series of images (containing handwritten digits) of size 28 x 28. We will discuss the images shortly, but our plan is to load the data in batches of 64.

The PyTorch datasets library has all the popular datasets required to get you started (Available Datasets).

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,)),
                              ])
trainset = datasets.MNIST('~/.pytorch/MNIST_data/', download=True, train=True, transform=transform)

# trainloader is what holds the data loader object which takes care of shuffling the data and constructing the batches
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

testset = datasets.MNIST('~/.pytorch/MNIST_data/', download=True, train=False, transform=transform)

testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False) # No need to shuffle test data.

The transform above converts each image to a tensor and normalizes it with mean = 0.5 and SD = 0.5, which maps pixel values from [0, 1] to [-1, 1].

Load both train and test sets with the same batch size (it doesn't need to be 64). Each iteration over the loader yields 64 images and their labels, as the quick check below shows.
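A minimal sanity check (my addition) to confirm what one batch looks like:

# Grab one batch from the loader and inspect its shapes
images, labels = next(iter(trainloader))
print(images.shape)  # torch.Size([64, 1, 28, 28]) -> 64 grayscale 28x28 images
print(labels.shape)  # torch.Size([64])            -> one digit label per image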

The basic steps of a neural network are:

Forward Pass -> Loss calculation -> Backward Pass to optimize weights

Forward Pass:

class CustomNeuralNetwork(nn.Module):

    def __init__(self):
        super().__init__()

        # Define layers:
        self.l1 = nn.Linear(784, 256) # layer 1 (784 = 28*28, one flattened image)
        self.l2 = nn.Linear(256, 128) # layer 2
        self.l3 = nn.Linear(128, 64)  # layer 3
        self.l4 = nn.Linear(64, 10)   # layer 4 (10 digit classes)

        # Define activation functions:
        self.sigmoid = nn.Sigmoid()
        self.relu = nn.ReLU()
        self.softmax = nn.LogSoftmax(dim=1)


    def forward(self, x):
        """
        Layers: 4
        Activation functions:
        ReLU for first two layers
        Sigmoid for third layer
        Log softmax for last layer
        """
        x = self.l1(x)
        x = self.relu(x)
        x = self.l2(x)
        x = self.relu(x)
        x = self.l3(x)
        x = self.sigmoid(x)
        x = self.l4(x)
        x = self.softmax(x)

        return x


NN = CustomNeuralNetwork()  # Initialize your network

Subclassing nn.Module lets you override forward and create your own network architectures. You can explore further by defining your own weights, biases, backward passes, and so on.
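As a quick check (my addition, not part of the original walkthrough), you can push a random batch through the untrained network to confirm the shapes line up:

# Push a random "batch" of 64 flattened images through the network
dummy = torch.randn(64, 784)
out = NN(dummy)
print(out.shape)  # torch.Size([64, 10]) -> one log-probability per digit class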

Loss Calculation:

criterion = nn.NLLLoss() # Initialize loss function

Here I am using negative log likelihood loss, since ours is a multi-class problem and the network ends with a log-softmax activation. (Read Here)
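If you're wondering how NLLLoss relates to the more familiar cross-entropy: NLLLoss applied to log-softmax outputs is equivalent to CrossEntropyLoss applied to raw logits. A quick check (illustrative, with made-up logits):

logits = torch.randn(4, 10)            # fake raw scores for 4 samples
targets = torch.tensor([1, 0, 4, 9])   # fake class labels
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)
ce = nn.CrossEntropyLoss()(logits, targets)
print(torch.allclose(nll, ce))         # True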

Optimizer:

optimizer = optim.Adam(NN.parameters(), lr = 0.001)

Adam is a stochastic gradient descent method based on adaptive estimation of first-order and second-order moments; the sketch below illustrates the update rule. (Read Here)
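For intuition, here is a minimal sketch of the Adam update for a single parameter (my addition; simplified, with no weight decay and default-style hyperparameters; optim.Adam handles all of this for you):

param = torch.tensor([1.0])        # one parameter
grad = torch.tensor([0.5])         # pretend its gradient is constant
m = torch.zeros_like(param)        # first-moment (mean of gradients) estimate
v = torch.zeros_like(param)        # second-moment (mean of squared gradients) estimate
beta1, beta2, eps, lr = 0.9, 0.999, 1e-8, 0.001
for t in range(1, 4):              # a few update steps
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (v_hat.sqrt() + eps)
    print(param.item())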

Train The Model (Backpropagation to Update Weights):

# Number of times to iterate over the training data
epochs = 5
for e in range(epochs):
    for images, labels in trainloader:
        # Flatten the images from 28x28 to a 784-long vector
        images = images.view(images.shape[0], -1)

        # Set optimizer gradients to zero:
        optimizer.zero_grad()

        output = NN(images)              # Forward pass
        loss = criterion(output, labels) # Loss calculation
        loss.backward()                  # Backpropagate gradients to previous layers
        optimizer.step()                 # Update weights

    print(loss.item()) # Print the last batch's loss for each epoch

Sample Output:

0.047625500708818436
0.0713285580277443
0.018748214468359947
0.023736275732517242
0.032160669565200806

Observe that the loss (printed for the last batch of each epoch) trends downward. (Don't forget to call optimizer.zero_grad() before each backward pass; otherwise gradients accumulate across batches.)

A best practice is to save your model for further use so that you don't need to train it again. Using PyTorch you can save either the whole model or just its state_dict(), which requires less space.

# Save your model
PATH = './NeuralNet.pth'
torch.save(NN.state_dict(), PATH) 
# Load your model 
NN = CustomNeuralNetwork()
NN.load_state_dict(torch.load(PATH))
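One habit worth adopting after loading a model for inference (not strictly needed here, since this network has no dropout or batch-norm layers) is switching to evaluation mode:

NN.eval()  # disables training-only behavior such as dropout, when present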

Predict on Test Data:

# Accuracy on Test Data
correct = 0
total = 0
with torch.no_grad():
    for images, labels in testloader:
        images = images.view(images.shape[0], -1)
        output = NN(images)
        _, predicted = torch.max(output, 1)  # index of the highest log-probability
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

Sample Output:

Accuracy of the network on the 10000 test images: 97 %

Accuracy of each label:

classes = ('0','1','2','3','4','5','6','7','8','9')
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images = images.view(images.shape[0], -1)
        outputs = NN(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(labels.size(0)):  # count every image in the batch
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1

# Accuracy of each class:
for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))

Sample Output:

Accuracy of     0 : 100 %
Accuracy of     1 : 100 %
Accuracy of     2 : 92 %
Accuracy of     3 : 98 %
Accuracy of     4 : 98 %
Accuracy of     5 : 90 %
Accuracy of     6 : 96 %
Accuracy of     7 : 98 %
Accuracy of     8 : 98 %
Accuracy of     9 : 96 %

We can observe that the model confuses 2s and 5s most often. A simple solution would be to collect more data for 2s and 5s, or you could explore options like data augmentation, or centering the images before passing them in to train your model. A sketch of one augmentation option follows.
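As one example of the augmentation idea, here's a minimal sketch (my addition; the rotation and shift amounts are arbitrary choices) using torchvision transforms:

# Augmented training transform: small random rotations and shifts before normalizing
aug_transform = transforms.Compose([
    transforms.RandomRotation(10),                     # rotate up to +/- 10 degrees
    transforms.RandomAffine(0, translate=(0.1, 0.1)),  # shift up to 10% horizontally/vertically
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])
trainset = datasets.MNIST('~/.pytorch/MNIST_data/', download=True, train=True, transform=aug_transform)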

For more advanced practice, refer to Elvis.

If you are still stuck deciding whether to learn TensorFlow or PyTorch, like I was at the start, Read This.