Background A neural network in 6 steps
In Building a Simple Neural Network From Scratch in PyTorch, we described a recipe with 6 core functions, plus an optional accuracy check, as follows:
train_model(epochs=30, lr=0.1): This function acts as the outer wrapper of our training process. It requires access to the training data, trainingOut, which should be defined in the environment. train_model orchestrates the training process by calling the execute_epoch function for a specified number of epochs.
execute_epoch(coeffs, lr): Serving as the inner wrapper, this function carries out one complete training epoch. It takes the current coefficients (weights and biases) and a learning rate as input. Within an epoch, it calculates the loss and updates the coefficients. To estimate the loss, it calls calc_loss, which compares the predicted output generated by calc_preds with the target output. After this, execute_epoch performs a backward pass to compute the gradients of the loss, storing these gradients in the grad attribute of each coefficient tensor.
calc_loss(coeffs, indeps, deps): This function calculates the loss using the given coefficients, input predictors indeps, and target output deps. It relies on calc_preds to obtain the predicted output, which is then compared to the target output to compute the loss. The backward pass is subsequently invoked to compute the gradients, which are stored within the grad attribute of the coefficient tensors for further optimization.
calc_preds(coeffs, indeps): Responsible for computing the predicted output based on the given coefficients and input predictors indeps. This function follows the forward pass logic and applies activation functions where necessary to produce the output.
update_coeffs(coeffs, lr): This function plays a pivotal role in updating the coefficients. It iterates through the coefficient tensors, applying gradient descent with the specified learning rate lr. After each update, it resets the gradients to zero using the zero_ function, ensuring the gradients are fresh for the next iteration.
init_coeffs(n_hidden=20): The initialization function is responsible for setting up the initial coefficients. It shapes each coefficient tensor based on the number of neurons specified for the sole hidden layer.
model_accuracy(coeffs): An optional function that evaluates the prediction accuracy on the validation set, providing insights into how well the trained model generalizes to unseen data.
In this blog post, we’ll take a deep dive into constructing a powerful deep learning neural network from the ground up using PyTorch. Building upon the foundations of the previous simple neural network, we’ll refactor some of these functions for deep learning.
Deep Learning Refactor code for multiple hidden layers
Initializing Weights and Biases
To prepare our neural network for deep learning, we've revamped the weight and bias initialization process. The init_coeffs function now takes the number of neurons in each hidden layer, making it flexible for different network configurations. It generates one weight matrix and one bias term per layer, from the input layer through to the output layer.
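```python
import torch

def init_coeffs(hiddens=[10, 10]):
    # Layer sizes: number of input features (columns of trainingIn),
    # then each hidden layer, then a single output unit.
    sizes = [trainingIn.shape[1]] + hiddens + [1]
    n = len(sizes)
    # Weight initialization: one (sizes[i] x sizes[i+1]) matrix per layer
    weights = [(torch.rand(sizes[i], sizes[i+1]) - 0.3) / sizes[i+1] * 4 for i in range(n-1)]
    # Bias initialization: one small bias term per layer
    biases = [(torch.rand(1) - 0.5) * 0.1 for i in range(n-1)]
    # Track gradients for every coefficient tensor
    for wt in weights: wt.requires_grad_()
    for bs in biases: bs.requires_grad_()
    return weights, biases
```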
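We define the architecture's structure through hiddens, which specifies the number of neurons in each hidden layer. Weights are drawn uniformly, shifted by 0.3, and scaled by 4 / sizes[i+1], so wider layers start with smaller weights; each bias starts as a small value in [-0.05, 0.05).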
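To make the effect of hiddens concrete, here is a small sketch that lists the weight matrix shapes it implies. The helper name layer_shapes, the 12 input features, and the single output unit are assumptions for illustration and are not part of the original recipe.

```python
def layer_shapes(n_inputs, hiddens, n_outputs=1):
    """Return the (rows, cols) shape of each weight matrix, input to output.

    Hypothetical helper: n_outputs=1 assumes a single sigmoid output unit.
    """
    sizes = [n_inputs] + list(hiddens) + [n_outputs]
    return [(sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1)]


# Assumed example: 12 input features and two hidden layers of 10 neurons each.
shapes = layer_shapes(n_inputs=12, hiddens=[10, 10])
print(shapes)  # [(12, 10), (10, 10), (10, 1)]
```

Each matrix has as many rows as the previous layer has neurons and as many columns as the next layer, which is what lets the layers be chained with matrix multiplication.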
Forward Propagation With Multiple Hidden Layers
The calc_preds function now accommodates multiple hidden layers in the network. It iterates through the layers, applying the weight matrix and bias at each step, and introduces non-linearity with the ReLU activation function in the hidden layers and the sigmoid activation in the output layer. This enables our deep learning network to capture complex patterns in the data.
```python
import torch
import torch.nn.functional as F

def calc_preds(coeffs, indeps):
    weights, biases = coeffs
    res = indeps
    n = len(weights)
    for i, wt in enumerate(weights):
        res = res @ wt + biases[i]  # linear step for layer i
        if i != n-1:
            res = F.relu(res)       # Apply ReLU activation in hidden layers
    return torch.sigmoid(res)       # Sigmoid activation in the output layer
```
Note that weights is now a list of tensors containing layer-wise weights and, correspondingly, biases is the list of tensors containing layer-wise biases.
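The following sketch traces tensor shapes through such a forward pass. The toy data (4 samples, 3 features) and the layer sizes 3, 5, 2, 1 are assumptions chosen only to keep the output easy to read.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy input: 4 samples with 3 features each.
indeps = torch.rand(4, 3)

# Assumed layer sizes 3 -> 5 -> 2 -> 1, with one bias term per layer.
sizes = [3, 5, 2, 1]
weights = [torch.rand(sizes[i], sizes[i + 1]) - 0.5 for i in range(len(sizes) - 1)]
biases = [torch.zeros(1) for _ in weights]

res = indeps
shapes = [tuple(res.shape)]
for i, wt in enumerate(weights):
    res = res @ wt + biases[i]      # the bias broadcasts across the layer
    if i != len(weights) - 1:
        res = F.relu(res)           # ReLU on hidden layers only
    shapes.append(tuple(res.shape))
preds = torch.sigmoid(res)          # one value in (0, 1) per sample

print(shapes)  # [(4, 3), (4, 5), (4, 2), (4, 1)]
```

The batch dimension stays fixed at 4 while the feature dimension follows the layer sizes, ending in a single prediction per sample.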
Backward Propagation With Multiple Hidden Layers
Loss calculation and gradient descent remain consistent with the simple neural network implementation. We use the mean absolute error (MAE) for loss as before and tweak the update_coeffs function so that gradient descent updates the weights and biases of every layer.
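```python
def update_coeffs(coeffs, lr):
    weights, biases = coeffs
    # Call this inside torch.no_grad(): autograd rejects in-place updates
    # on leaf tensors that require gradients.
    for layer in weights + biases:
        layer.sub_(layer.grad * lr)  # gradient descent step
        layer.grad.zero_()           # reset gradients for the next epoch
```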
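As a minimal sketch of one gradient descent step, the example below computes an MAE loss, runs the backward pass, and applies update_coeffs inside torch.no_grad(). The single-layer coefficients and random data are assumptions for illustration. The no_grad context matters because PyTorch raises a RuntimeError when a leaf tensor that requires gradients is modified in place outside it.

```python
import torch

torch.manual_seed(0)

def update_coeffs(coeffs, lr):
    weights, biases = coeffs
    for layer in weights + biases:
        layer.sub_(layer.grad * lr)  # gradient descent step
        layer.grad.zero_()           # clear gradients after use

# Hypothetical single-layer coefficients and toy data, for illustration only.
weights = [torch.rand(3, 1).requires_grad_()]
biases = [torch.rand(1).requires_grad_()]
coeffs = (weights, biases)
indeps = torch.rand(8, 3)
deps = torch.rand(8, 1)

preds = torch.sigmoid(indeps @ weights[0] + biases[0])
loss = torch.abs(preds - deps).mean()  # mean absolute error
loss.backward()                        # fills .grad on each coefficient tensor

before = weights[0].detach().clone()
with torch.no_grad():                  # in-place update must not be tracked
    update_coeffs(coeffs, lr=0.1)
```

After the step, the coefficients have moved and their grad tensors are zeroed, ready for the next backward pass.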
Putting It All Together in Wrapper Functions
The train_model function can be used as is to orchestrate the training process, with the execute_epoch wrapper function helping as before. The model_accuracy function also does not change.
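Since the wrappers are not repeated in this post, here is one possible sketch of how they might fit around the refactored functions. The bodies of execute_epoch, train_model, and calc_loss are assumptions based on the descriptions in the Background section, not the original implementation. The synthetic trainingIn and trainingOut tensors and the returned loss history are also hypothetical.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(42)

# Hypothetical stand-in data: 64 samples, 4 features, binary target.
trainingIn = torch.rand(64, 4)
trainingOut = (trainingIn.sum(dim=1, keepdim=True) > 2).float()

def init_coeffs(hiddens=[10, 10]):
    sizes = [trainingIn.shape[1]] + hiddens + [1]  # single output unit assumed
    n = len(sizes)
    weights = [(torch.rand(sizes[i], sizes[i+1]) - 0.3) / sizes[i+1] * 4 for i in range(n-1)]
    biases = [(torch.rand(1) - 0.5) * 0.1 for i in range(n-1)]
    for t in weights + biases:
        t.requires_grad_()
    return weights, biases

def calc_preds(coeffs, indeps):
    weights, biases = coeffs
    res = indeps
    for i, wt in enumerate(weights):
        res = res @ wt + biases[i]
        if i != len(weights) - 1:
            res = F.relu(res)
    return torch.sigmoid(res)

def calc_loss(coeffs, indeps, deps):
    # Mean absolute error between predictions and targets (assumed form).
    return torch.abs(calc_preds(coeffs, indeps) - deps).mean()

def update_coeffs(coeffs, lr):
    weights, biases = coeffs
    for layer in weights + biases:
        layer.sub_(layer.grad * lr)
        layer.grad.zero_()

def execute_epoch(coeffs, lr):
    # One epoch: forward pass and loss, backward pass, then the update.
    loss = calc_loss(coeffs, trainingIn, trainingOut)
    loss.backward()
    with torch.no_grad():
        update_coeffs(coeffs, lr)
    return loss.item()

def train_model(epochs=30, lr=0.1):
    coeffs = init_coeffs()
    losses = [execute_epoch(coeffs, lr) for _ in range(epochs)]
    return coeffs, losses  # returning the loss history is an assumption

coeffs, losses = train_model(epochs=30, lr=0.1)
```

Note that neither wrapper refers to the number of layers: they only pass coeffs along, which is why they can stay unchanged when the network gets deeper.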
Summary Conclusion and Takeaways
With these modifications, we’ve refactored our simple neural network into a deep learning model that has greater capacity for learning. The beauty of it is we have retained the same set of functions and interfaces that we implemented in a simple neural network, refactoring the code to scale with multiple hidden layers.
train_model(epochs=30, lr=0.1): No change!
execute_epoch(coeffs, lr): No change!
calc_loss(coeffs, indeps, deps): No change!
calc_preds(coeffs, indeps): Tweak to use the set of weights and corresponding set of biases in each hidden layer, iterating over all layers from input to output.
update_coeffs(coeffs, lr): Tweak to iterate over the set of weights and accompanying set of biases in each layer.
init_coeffs(hiddens=[10, 10]): Tweak for compatibility with an architecture that can potentially have any number of hidden layers of any size.
model_accuracy(coeffs): No change!
The trade-off is that a deeper model is hungrier for training data! In subsequent posts, we will examine the breakthroughs that have made deep learning models practically feasible and reliable. These include advancements such as:
- Batch Normalization
- Residual Connections
Are you eager to dive deeper into the world of deep learning and further enhance your skills? Consider joining our coaching class in deep learning with FastAI. Our class is designed to provide hands-on experience and in-depth knowledge of cutting-edge deep learning techniques. Whether you're a beginner or an experienced practitioner, we offer tailored guidance to help you master the intricacies of deep learning and empower you to tackle complex projects with confidence. Join us on this exciting journey to unlock the full potential of artificial intelligence and neural networks.