## Building a Simple Neural Network From Scratch in PyTorch

# Background: Structure and Training Data

In this blog post, we will walk you through the process of creating a simple neural network from scratch in PyTorch for binary classification. We will implement a neural network with one hidden layer containing multiple neurons followed by a single output neuron. We will also discuss the design choices made for this network, including the use of ReLU activation in the hidden layer and sigmoid activation in the output layer.

**Neural Network Architecture**

The architecture of our simple neural network can be summarized as follows:

- Input Layer
- Hidden Layer with `n` neurons and ReLU activation.
- Output Layer with a single neuron and sigmoid activation.

This structure lets us demonstrate gradient descent in PyTorch, which repeats two steps over many iterations:

- Forward-propagate the inputs to generate outputs and compute the loss.
- Backward-propagate the loss to compute gradients, then apply them to update the model parameters.

We also show how PyTorch tensors let us process all training examples at once with matrix operations, instead of looping over them one at a time.
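To make the point about tensors concrete, here is a small sketch with made-up sizes. It shows that one matrix multiplication computes the hidden-layer pre-activations for every example at once and gives the same result as an explicit Python loop over the examples.

```python
import torch

torch.manual_seed(0)
X = torch.randn(5, 3)   # 5 examples, 3 input features (illustrative sizes)
W = torch.randn(3, 4)   # weights for 4 hidden neurons

# Loop version: process one example (row) at a time
looped = torch.stack([X[i] @ W for i in range(X.shape[0])])

# Vectorized version: all examples in a single matrix multiplication
vectorized = X @ W

print(torch.allclose(looped, vectorized, atol=1e-6))  # True
```

The vectorized form is what the forward pass in this post relies on: `indeps @ wt_hid` handles the whole dataset in one call.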

**Training Data**

It is customary to split the available data into three distinct sets: training, validation, and testing. These sets serve specific roles in the model development process.

- **Training Data**: The training set is the largest portion of the data and is used to fit the model. During training, gradients are computed on this data to update the weights and biases iteratively, allowing the model to learn from the provided examples.
- **Validation Data**: The validation set is used to assess the model’s performance during training. It is not used for gradient computation; we only measure the loss on it. This monitoring helps detect overfitting, a scenario where the model memorizes the training data rather than generalizing from it. Adjustments to the model and its hyperparameters can be made based on the validation loss.
- **Test Data**: The test set is a reserved subset and should be used sparingly. It comes into play only after the model has completed its training phase, to evaluate how well the model generalizes to unseen data and to report the final results. Because the model has never seen these examples, the test score is a reliable measure of its effectiveness on new data.
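A minimal sketch of one way to produce such a partition: shuffle the example indices and cut them into three slices. The helper name `split_indices` and the 60/20/20 proportions are illustrative assumptions, not part of the original post.

```python
import torch

def split_indices(n_examples, train_frac=0.6, valid_frac=0.2, seed=0):
    """Randomly partition example indices into train/validation/test sets.

    The 60/20/20 default is an illustrative choice, not a rule.
    """
    gen = torch.Generator().manual_seed(seed)            # reproducible shuffle
    perm = torch.randperm(n_examples, generator=gen)      # shuffled 0..n_examples-1
    n_train = round(n_examples * train_frac)
    n_valid = round(n_examples * valid_frac)
    train_idx = perm[:n_train]
    valid_idx = perm[n_train:n_train + n_valid]
    test_idx = perm[n_train + n_valid:]                   # the remainder goes to test
    return train_idx, valid_idx, test_idx

train_idx, valid_idx, test_idx = split_indices(100)
print(len(train_idx), len(valid_idx), len(test_idx))  # 60 20 20
```

The three index tensors are disjoint and together cover every example, so rows can be selected with, for example, `X[train_idx]`.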

This partitioning strategy allows for rigorous model assessment and ensures that the model’s performance is estimated on data it has not encountered during training or validation.

Before running the code, ensure that `trainingIn` and `trainingOut` are defined as global variables. These are tables (tensors) where each row corresponds to an individual example and each column represents a specific field or feature.

`trainingIn` contains the independent variables and has the shape (#examples x #variables), where `#examples` is the number of data points in our training dataset and `#variables` is the number of independent variables or features. `trainingOut` contains the dependent variable and has the shape (#examples x 1), where `#examples` is the same as in `trainingIn`.

Likewise, we need `validationIn` and `validationOut` defined as global variables for the validation set.
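If you just want to try the code without a real dataset, here is one way to create stand-ins with the right shapes. The data is entirely synthetic and only for illustration: two random features, with a label that is 1 when their sum is positive.

```python
import torch

torch.manual_seed(0)
n_examples, n_variables = 200, 2  # illustrative sizes

X = torch.randn(n_examples, n_variables)
# Synthetic binary target: 1.0 when the features sum to a positive number, else 0.0
y = (X.sum(dim=1, keepdim=True) > 0).float()  # shape (#examples x 1)

# First 160 rows for training, remaining 40 for validation
trainingIn, trainingOut = X[:160], y[:160]
validationIn, validationOut = X[160:], y[160:]

print(trainingIn.shape, trainingOut.shape)      # torch.Size([160, 2]) torch.Size([160, 1])
print(validationIn.shape, validationOut.shape)  # torch.Size([40, 2]) torch.Size([40, 1])
```

Keeping the target as a float column of shape (#examples x 1) matches the shape of the network’s output, so the loss can subtract the two directly.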

# Backpropagation: Applying Gradient Descent for Training

**Initializing Weights and Biases**

We start by defining the initialization function `init_coeffs` to set up the initial weights and biases for the neural network:

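```python
import torch

def init_coeffs(n_hidden=20):
    # Hidden-layer weights (#variables x n_hidden), scaled down by 1/n_hidden
    wt_hidden = (torch.rand(trainingIn.shape[1], n_hidden) - 0.5) / n_hidden
    # Output-layer weights (n_hidden x 1), drawn from [-0.3, 0.7)
    wt_output = torch.rand(n_hidden, 1) - 0.3
    # Scalar bias for the single output neuron
    bias_output = torch.rand(1)[0]
    # Ask autograd to track gradients for all three parameter tensors
    return wt_hidden.requires_grad_(), wt_output.requires_grad_(), bias_output.requires_grad_()
```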

The key points in this initialization are:

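- We divide the hidden-layer weights by the number of hidden neurons. The output neuron sums over all `n_hidden` hidden activations, so this scaling keeps that sum from growing with `n_hidden`, which helps training converge.
- We introduce a bias for the output layer only; the hidden layer has no bias term in this simple version.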
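The effect of the `/ n_hidden` scaling can be checked directly. The sketch below is a hypothetical variant, `init_coeffs_sketch`, that takes the number of input features as an explicit argument `n_in` (the original reads it from `trainingIn`), so it runs without any data.

```python
import torch

def init_coeffs_sketch(n_in, n_hidden=20):
    # Same recipe as init_coeffs, but n_in is passed in (hypothetical) instead of read from trainingIn
    wt_hidden = (torch.rand(n_in, n_hidden) - 0.5) / n_hidden
    wt_output = torch.rand(n_hidden, 1) - 0.3
    bias_output = torch.rand(1)[0]
    return wt_hidden.requires_grad_(), wt_output.requires_grad_(), bias_output.requires_grad_()

torch.manual_seed(442)
wt_hidden, wt_output, bias_output = init_coeffs_sketch(n_in=7, n_hidden=20)
print(wt_hidden.shape, wt_output.shape, bias_output.shape)
# torch.Size([7, 20]) torch.Size([20, 1]) torch.Size([])

# Every hidden weight lies in [-0.5/n_hidden, 0.5/n_hidden), i.e. within about 0.025 of zero
print(bool(wt_hidden.abs().max() <= 0.5 / 20 + 1e-6))  # True
```

Doubling `n_hidden` halves the range of the hidden weights, which offsets the doubled number of terms the output neuron adds up.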

Note that we set `requires_grad` on the weights and biases during initialization. This is a crucial step: it tells PyTorch to record the operations performed on these parameters during the forward pass. When we later call `backward()` on a loss computed from them, PyTorch computes the gradients of the loss with respect to each parameter and stores them in its `grad` attribute, ready for the gradient descent update.
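Here is a minimal, self-contained illustration of what `requires_grad` provides. It uses a toy expression instead of the network: for a loss equal to the sum of `w[i] * x[i]`, the gradient with respect to `w` is simply `x`.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([0.5, -1.0, 2.0]).requires_grad_()  # track gradients for w

loss = (w * x).sum()   # 0.5*1 - 1*2 + 2*3 = 4.5
loss.backward()        # compute d(loss)/dw and store it in w.grad

print(loss.item())  # 4.5
print(w.grad)       # tensor([1., 2., 3.]), equal to x
```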

**Forward Pass**

Next, we define the function `calc_preds` to perform the forward pass of the neural network:

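```python
import torch
import torch.nn.functional as F

def calc_preds(coeffs, indeps):
    wt_hid, wt_out, bias = coeffs
    # Hidden layer: (#examples x #variables) @ (#variables x n_hidden), then ReLU
    hidden_layer_output = F.relu(indeps @ wt_hid)
    # Output layer: (#examples x n_hidden) @ (n_hidden x 1) + bias, then sigmoid -> (0, 1)
    output = torch.sigmoid(hidden_layer_output @ wt_out + bias)
    return output
```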

In this function:

- We use the ReLU activation in the hidden layer.
- We use the sigmoid activation in the output layer.

The non-linearity is key: without it, a stack of linear layers is equivalent to a single linear layer. More importantly, composing linear layers with non-linear activations is what makes a neural network a universal function approximator. We use ReLU in the hidden layer and sigmoid in the output layer. The sigmoid squashes the output into the range (0, 1), so it can be interpreted as a likelihood score for the positive class.
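The claim that stacked linear layers collapse into one is easy to verify numerically. Without an activation in between, `(X @ W1) @ W2` equals `X @ (W1 @ W2)`, which is a single linear layer with weight matrix `W1 @ W2`. The sizes below are arbitrary.

```python
import torch

torch.manual_seed(0)
X = torch.randn(8, 3)    # 8 examples, 3 features
W1 = torch.randn(3, 5)   # "hidden" layer weights, no activation applied
W2 = torch.randn(5, 1)   # output layer weights

two_linear_layers = (X @ W1) @ W2
one_linear_layer = X @ (W1 @ W2)   # a single 3 x 1 weight matrix does the same job

print(torch.allclose(two_linear_layers, one_linear_layer, atol=1e-4))  # True
```

Inserting `F.relu` between the two products breaks this identity, which is why the hidden layer in `calc_preds` applies ReLU before the output layer.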

**Loss Calculation**

We calculate the loss using the mean absolute error (MAE) in the `calc_loss` function:

```python
def calc_loss(coeffs, indeps, deps):
    predictions = calc_preds(coeffs, indeps)
    # Mean absolute error between predicted probabilities and 0/1 targets
    loss = torch.abs(predictions - deps).mean()
    return loss
```

Notice that the loss is a function of the weights and biases. By setting `requires_grad` on these parameters, we inform PyTorch that we are interested in computing the gradients of the loss with respect to these parameters for the purpose of optimization.
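A small worked example of the MAE computation, using made-up predictions and targets in the same (#examples x 1) layout as `trainingOut`:

```python
import torch

predictions = torch.tensor([[0.9], [0.2], [0.6]])  # sigmoid outputs (made up)
targets = torch.tensor([[1.0], [0.0], [0.0]])      # true labels (made up)

# |0.9 - 1| = 0.1, |0.2 - 0| = 0.2, |0.6 - 0| = 0.6  ->  mean = 0.9 / 3 = 0.3
loss = torch.abs(predictions - targets).mean()
print(round(loss.item(), 4))  # 0.3
```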

**Training the Model**

To train the neural network, we define the training process using the `train_model` function:

```python
def train_model(epochs=30, lr=0.1):
    torch.manual_seed(442)  # fixed seed so the random initialization is reproducible
    coeffs = init_coeffs()
    for _ in range(epochs):
        execute_epoch(coeffs=coeffs, lr=lr)
    return coeffs
```

The `train_model` function:

- Sets a fixed random seed so that runs are reproducible.
- Initializes the coefficients.
- Iterates through a specified number of epochs.
- Calls `execute_epoch` for each epoch to update the coefficients.

**Executing an Epoch**

The `execute_epoch` function calculates the loss using `calc_loss`, backpropagates it with `loss.backward()` to compute the gradients, and then applies those gradients with `update_coeffs`:

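```python
def execute_epoch(coeffs, lr):
    loss = calc_loss(coeffs, trainingIn, trainingOut)
    loss.backward()           # fills in .grad for every coefficient tensor
    with torch.no_grad():     # the update itself must not be tracked by autograd
        update_coeffs(coeffs, lr)
    print(f'{loss:.3f}', end='; ')
```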

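When we call `backward` on the loss, PyTorch automatically calculates gradients for all the parameters that contribute to the loss and have `requires_grad` set. These gradients are stored with the respective parameters and can be accessed through the `grad` attribute. The update step is wrapped in `torch.no_grad()` because the update itself must not be recorded by autograd; outside such a block, PyTorch refuses an in-place operation on a leaf tensor that requires gradients.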
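To see why the update in `execute_epoch` sits inside `torch.no_grad()`, the sketch below tries the same in-place update on a toy parameter, first without and then with `no_grad`.

```python
import torch

w = torch.tensor([1.0, 2.0]).requires_grad_()
(w ** 2).sum().backward()   # w.grad is now 2 * w = [2., 4.]

try:
    w.sub_(w.grad * 0.1)    # in-place update while autograd is still tracking w
    update_allowed = True
except RuntimeError as err:
    update_allowed = False
    print("RuntimeError:", err)

with torch.no_grad():
    w.sub_(w.grad * 0.1)    # allowed: this update is not recorded by autograd
print(w)  # tensor([0.8000, 1.6000], requires_grad=True)
```

After the update, `w` still has `requires_grad=True`, so the next forward and backward pass can track it as before.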

**Updating Coefficients**

The `update_coeffs` function updates the coefficients using gradient descent as follows:

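```python
def update_coeffs(coeffs, lr):
    for layer in coeffs:
        layer.sub_(layer.grad * lr)  # gradient descent step, applied in place
        layer.grad.zero_()           # reset so the next backward() starts from zero
```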
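Written as a formula, the step applied to each coefficient tensor $\theta$ (the hidden weights, the output weights, and the bias), with learning rate $\eta$ = `lr` and loss $L$, is the standard gradient descent update:

$$
\theta \leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta}
$$

For example, with $\theta = 0.5$, $\partial L / \partial \theta = 2$ and $\eta = 0.1$, the update gives $\theta = 0.5 - 0.1 \times 2 = 0.3$.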

Note that PyTorch accumulates gradients unless they are reset to zero between successive steps. That is why we call `zero_` once the gradients have been used to update the weights and biases.
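The accumulation behavior is easy to observe on a single toy parameter: two backward passes without zeroing add their gradients together, and `zero_` resets them.

```python
import torch

w = torch.tensor([3.0]).requires_grad_()

for _ in range(2):
    (w ** 2).sum().backward()   # each pass contributes d/dw (w^2) = 2 * 3 = 6
accumulated = w.grad.clone()
print(accumulated)  # tensor([12.])

w.grad.zero_()                  # what update_coeffs does after each step
(w ** 2).sum().backward()
print(w.grad)  # tensor([6.])
```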

**Running the Training**

Finally, we run the training with different learning rates and for varying numbers of epochs:

```python
coeffs = train_model(lr=1.4)              # Example 1
coeffs = train_model(lr=20)               # Example 2
coeffs = train_model(epochs=100, lr=10)   # Example 3
```

You can observe how the loss changes during training and evaluate the model’s accuracy based on your dataset.

**Model Accuracy**

Optionally, we can implement a function `model_accuracy(coeffs)` to evaluate the accuracy of the trained model on the validation dataset.

```python
def model_accuracy(coeffs):
    # Threshold the predicted probabilities at 0.5 and compare with the 0/1 labels
    return (validationOut.bool() == (calc_preds(coeffs, validationIn) > 0.5)).float().mean()
```
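A worked example of the thresholding logic in `model_accuracy`, with made-up probabilities and labels:

```python
import torch

probs = torch.tensor([[0.8], [0.3], [0.6], [0.1]])   # model outputs (made up)
labels = torch.tensor([[1.0], [0.0], [0.0], [0.0]])  # true classes (made up)

predicted_class = probs > 0.5                         # [[True], [False], [True], [False]]
correct = (labels.bool() == predicted_class).float()  # [[1.], [1.], [0.], [1.]]
print(correct.mean().item())  # 0.75
```

Three of the four predictions match their labels, so the accuracy is 3/4.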

That’s it! We now have a simple neural network implemented from scratch in PyTorch for binary classification. We can customize the architecture, hyperparameters, and activation functions to suit our specific problem.
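Putting the pieces together, here is a self-contained sketch that runs the post’s functions end to end. The synthetic dataset at the top is an assumption made only so the script runs; replace it with your own `trainingIn`, `trainingOut`, `validationIn`, and `validationOut`. The learning rate of 1.4 is taken from Example 1.

```python
import torch
import torch.nn.functional as F

# --- Synthetic stand-in data (an assumption; replace with your own tensors) ---
torch.manual_seed(0)
X = torch.randn(200, 2)
y = (X.sum(dim=1, keepdim=True) > 0).float()
trainingIn, trainingOut = X[:160], y[:160]
validationIn, validationOut = X[160:], y[160:]

# --- The functions from this post ---
def init_coeffs(n_hidden=20):
    wt_hidden = (torch.rand(trainingIn.shape[1], n_hidden) - 0.5) / n_hidden
    wt_output = torch.rand(n_hidden, 1) - 0.3
    bias_output = torch.rand(1)[0]
    return wt_hidden.requires_grad_(), wt_output.requires_grad_(), bias_output.requires_grad_()

def calc_preds(coeffs, indeps):
    wt_hid, wt_out, bias = coeffs
    hidden_layer_output = F.relu(indeps @ wt_hid)
    return torch.sigmoid(hidden_layer_output @ wt_out + bias)

def calc_loss(coeffs, indeps, deps):
    return torch.abs(calc_preds(coeffs, indeps) - deps).mean()

def update_coeffs(coeffs, lr):
    for layer in coeffs:
        layer.sub_(layer.grad * lr)
        layer.grad.zero_()

def execute_epoch(coeffs, lr):
    loss = calc_loss(coeffs, trainingIn, trainingOut)
    loss.backward()
    with torch.no_grad():
        update_coeffs(coeffs, lr)
    print(f'{loss:.3f}', end='; ')

def train_model(epochs=30, lr=0.1):
    torch.manual_seed(442)
    coeffs = init_coeffs()
    for _ in range(epochs):
        execute_epoch(coeffs=coeffs, lr=lr)
    return coeffs

def model_accuracy(coeffs):
    return (validationOut.bool() == (calc_preds(coeffs, validationIn) > 0.5)).float().mean()

# --- Train and evaluate ---
coeffs = train_model(epochs=30, lr=1.4)
print()  # end the line of per-epoch losses
print(f'validation accuracy: {model_accuracy(coeffs):.3f}')
```

The printed losses and accuracy depend on the data, so treat the numbers from the synthetic set as a smoke test rather than a benchmark.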

# Summary: Conclusion and Takeaways

The `train_model()` wrapper requires the data splits `trainingIn` and `trainingOut` to be defined in the environment. The functions fit together as follows:

- `train_model(epochs=30, lr=0.1)`: The outer wrapper of the training process. It requires access to the training data, `trainingIn` and `trainingOut`, which should be defined in the environment. `train_model` initializes the coefficients and orchestrates training by calling `execute_epoch` for the specified number of epochs.
- `execute_epoch(coeffs, lr)`: The inner wrapper, which carries out one complete training epoch. It takes the current coefficients (weights and biases) and a learning rate as input. It first estimates the loss by calling `calc_loss`, which compares the output predicted by `calc_preds` with the target output. It then performs a backward pass to compute the gradients of the loss, which are stored in the `grad` attribute of each coefficient tensor, and finally updates the coefficients with `update_coeffs`.
- `calc_loss(coeffs, indeps, deps)`: Calculates the loss for the given coefficients, input predictors `indeps`, and target output `deps`. It relies on `calc_preds` to obtain the predicted output, which is then compared with the target output. The returned loss is the tensor on which `execute_epoch` calls `backward()`.
- `calc_preds(coeffs, indeps)`: Computes the predicted output from the given coefficients and input predictors `indeps`. It implements the forward pass, applying ReLU in the hidden layer and sigmoid in the output layer.
- `update_coeffs(coeffs, lr)`: Updates the coefficients. It iterates through the coefficient tensors and applies a gradient descent step with the learning rate `lr`. After each update, it resets the gradients to zero with the `zero_` method, so that the next backward pass does not add onto stale gradients.
- `init_coeffs(n_hidden=20)`: Sets up the initial coefficients. It shapes each coefficient tensor based on the number of input features and the number of neurons in the single hidden layer.
- `model_accuracy(coeffs)`: An optional function that evaluates prediction accuracy on the validation set, indicating how well the trained model generalizes to data it was not trained on.

While we have demonstrated gradient descent on a simple neural network, we can extend this implementation to a deeper model by adding more hidden layers. Most of the changes are confined to `init_coeffs` and `calc_preds`; the same six core functions and their interfaces can stay in place. Following the approach presented here, we can build a flexible neural network architecture tailored to specific requirements.
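As a hedged sketch of that refactor, `init_coeffs` and `calc_preds` can be generalized to any number of hidden layers while keeping the coefficients as a flat list, so `update_coeffs` and the training loop work unchanged. The names `init_coeffs_deep`, `calc_preds_deep`, and `hidden_sizes` are hypothetical. The 1/fan_in weight scaling and the zero-initialized biases are illustrative choices, not the post’s exact initialization scheme.

```python
import torch
import torch.nn.functional as F

def init_coeffs_deep(n_in, hidden_sizes=(20, 10)):
    # One weight matrix and one bias vector per layer; the last layer has a single output
    sizes = [n_in, *hidden_sizes, 1]
    coeffs = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        coeffs.append(((torch.rand(fan_in, fan_out) - 0.5) / fan_in).requires_grad_())
        coeffs.append(torch.zeros(fan_out).requires_grad_())
    return coeffs

def calc_preds_deep(coeffs, indeps):
    res = indeps
    n_layers = len(coeffs) // 2
    for i in range(n_layers):
        wt, bias = coeffs[2 * i], coeffs[2 * i + 1]
        res = res @ wt + bias
        if i < n_layers - 1:
            res = F.relu(res)   # ReLU on every hidden layer
    return torch.sigmoid(res)   # sigmoid on the output layer only

torch.manual_seed(0)
coeffs = init_coeffs_deep(n_in=4)
preds = calc_preds_deep(coeffs, torch.randn(6, 4))
print(preds.shape)  # torch.Size([6, 1])
```

To train it, `calc_loss` would call `calc_preds_deep` instead of `calc_preds`; since `update_coeffs` simply iterates over whatever tensors are in `coeffs`, it needs no change.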

Are you eager to dive deeper into the world of deep learning and further enhance your skills? Consider joining our coaching class in deep learning with FastAI. Our class is designed to provide hands-on experience and in-depth knowledge of cutting-edge deep learning techniques. Whether you’re a beginner or an experienced practitioner, we offer tailored guidance to help you master the intricacies of deep learning and empower you to tackle complex projects with confidence. Join us on this exciting journey to unlock the full potential of artificial intelligence and neural networks.