Unlocking the Power of Language: Leveraging Large Language Models for Next-Gen Semantic Search and Real-World Applications
Invited talk at Calfus, Pune, June 20, 2024.
In Building a Simple Neural Network From Scratch in PyTorch, we described a recipe of six functions (plus an optional accuracy check) as follows:

train_model(epochs=30, lr=0.1): This function acts as the outer wrapper of our training process. It requires access to the training data, trainingIn and trainingOut, which should be defined in the environment. train_model orchestrates the training process by calling the execute_epoch function for a specified number of epochs.

execute_epoch(coeffs, lr): Serving as the inner wrapper, this function carries out one complete training epoch. It takes the current coefficients (weights and biases) and a learning rate as input. Within an epoch, it calculates the loss and updates the coefficients. To estimate the loss, it calls calc_loss, which compares the predicted output generated by calc_preds with the target output. After this, execute_epoch performs a backward pass to compute the gradients of the loss, storing these gradients in the grad attribute of each coefficient tensor.

calc_loss(coeffs, indeps, deps): This function calculates the loss using the given coefficients, input predictors indeps, and target output deps. It relies on calc_preds to obtain the predicted output, which is then compared to the target output to compute the loss. The backward pass is subsequently invoked to compute the gradients, which are stored within the grad attribute of the coefficient tensors for further optimization.

calc_preds(coeffs, indeps): Responsible for computing the predicted output based on the given coefficients and input predictors indeps. This function follows the forward pass logic and applies activation functions where necessary to produce the output.

update_coeffs(coeffs, lr): This function plays a pivotal role in updating the coefficients. It iterates through the coefficient tensors, applying gradient descent with the specified learning rate lr. After each update, it resets the gradients to zero using the zero_ method, ensuring the gradients are fresh for the next iteration.

init_coeffs(n_hidden=20): The initialization function is responsible for setting up the initial coefficients. It shapes each coefficient tensor based on the number of neurons specified for the sole hidden layer.

model_accuracy(coeffs): An optional function that evaluates the prediction accuracy on the validation set, providing insights into how well the trained model generalizes to unseen data.

In this blog post, we'll take a deep dive into constructing a powerful deep learning neural network from the ground up using PyTorch. Building upon the foundations of the previous simple neural network, we'll refactor some of these functions for deep learning.
Initializing Weights and Biases
To prepare our neural network for deep learning, we've revamped the weight and bias initialization process. The init_coeffs function now allows for specifying the number of neurons in each hidden layer, making it flexible for different network configurations. We generate a weight matrix and a bias term for each layer, scaling the initial random values to keep the deep network trainable.
import torch

def init_coeffs(hiddens=[10, 10]):
    sizes = [trainingIn.shape[1]] + hiddens + [1]  # layer widths: input, hidden layers, output
    n = len(sizes)
    weights = [(torch.rand(sizes[i], sizes[i+1]) - 0.3) / sizes[i+1] * 4 for i in range(n-1)]  # Weight initialization
    biases = [(torch.rand(1)[0] - 0.5) * 0.1 for i in range(n-1)]  # Bias initialization
    for wt in weights: wt.requires_grad_()
    for bs in biases: bs.requires_grad_()
    return weights, biases
We define the architecture's structure using sizes, where hiddens specifies the number of neurons in each hidden layer. We ensure that weight and bias initialization is suitable for deep networks.
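For concreteness, here is a minimal sketch of the shapes this produces, assuming a hypothetical trainingIn with 8 features and the init_coeffs defined above (the numbers are illustrative only):

import torch

trainingIn = torch.rand(100, 8)  # hypothetical data cut: 100 examples, 8 features
weights, biases = init_coeffs(hiddens=[10, 10])
print([w.shape for w in weights])  # [torch.Size([8, 10]), torch.Size([10, 10]), torch.Size([10, 1])]
print(len(biases))                 # 3: one scalar bias per layer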
Forward Propagation With Multiple Hidden Layers
Our revamped calc_preds function accommodates multiple hidden layers in the network. It iterates through the layers, applying weight matrices and biases at each step and introducing non-linearity using the ReLU activation function in the hidden layers and the sigmoid activation in the output layer. This enables our deep learning network to capture complex patterns in the data.
import torch.nn.functional as F

def calc_preds(coeffs, indeps):
    weights, biases = coeffs
    res = indeps
    n = len(weights)
    for i, wt in enumerate(weights):
        res = res @ wt + biases[i]
        if i != n-1:
            res = F.relu(res)      # Apply ReLU activation in hidden layers
    return torch.sigmoid(res)      # Sigmoid activation in the output layer
Note that weights is now a list of tensors containing layer-wise weights and, correspondingly, biases is the list of tensors containing layer-wise biases.
Backward Propagation With Multiple Hidden Layers
Loss calculation and gradient descent remain consistent with the simple neural network implementation. We use the mean absolute error (MAE) for the loss as before and tweak the update_coeffs function to apply gradient descent to the weights and biases in every layer.
def update_coeffs(coeffs, lr):
    weights, biases = coeffs
    for layer in weights + biases:
        layer.sub_(layer.grad * lr)  # gradient descent step
        layer.grad.zero_()           # reset gradients for the next epoch
Putting It All Together in Wrapper Functions
Our train_model function can be used 'as is' to orchestrate the training process, with the execute_epoch wrapper function helping as before. The model_accuracy function also does not change.
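Since the wrappers are unchanged, training and evaluating the deep model looks exactly as it did before. A minimal usage sketch, assuming the data cuts trainingIn, trainingOut, validationIn, and validationOut and the functions above are all defined:

coeffs = train_model(epochs=30, lr=0.1)                      # prints the loss after each epoch
print(f'validation accuracy: {model_accuracy(coeffs):.3f}')  # optional check on unseen data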
With these modifications, we've refactored our simple neural network into a deep learning model that has greater capacity for learning. The beauty of it is that we have retained the same set of functions and interfaces that we implemented in the simple neural network, refactoring the code to scale to multiple hidden layers:
train_model(epochs=30, lr=0.1): No change!

execute_epoch(coeffs, lr): No change!

calc_loss(coeffs, indeps, deps): No change!

calc_preds(coeffs, indeps): Tweak to use the set of weights and the corresponding set of biases in each layer, iterating over all layers from input to output.

update_coeffs(coeffs, lr): Tweak to iterate over the set of weights and the accompanying set of biases in each layer.

init_coeffs(hiddens=[10, 10]): Tweak for compatibility with an architecture that can have any number of hidden layers of any size.

model_accuracy(coeffs): No change!

Such a deep learning model has greater capacity for learning. However, it is also more hungry for training data! In subsequent posts, we will examine the breakthroughs and advancements that have made deep learning models practically feasible and reliable.
Are you eager to dive deeper into the world of deep learning and further enhance your skills? Consider joining our coaching class in deep learning with FastAI. Our class is designed to provide hands-on experience and in-depth knowledge of cutting-edge deep learning techniques. Whether you're a beginner or an experienced practitioner, we offer tailored guidance to help you master the intricacies of deep learning and empower you to tackle complex projects with confidence. Join us on this exciting journey to unlock the full potential of artificial intelligence and neural networks.
In this blog post, we will walk you through the process of creating a simple neural network from scratch in PyTorch for binary classification. We will implement a neural network with one hidden layer containing multiple neurons followed by a single output neuron. We will also discuss the design choices made for this network, including the use of ReLU activation in the hidden layer and sigmoid activation in the output layer.
Neural Network Architecture
The architecture of our simple neural network can be summarized as follows:
An input layer that takes the independent variables as features.
A single hidden layer with n neurons and ReLU activation.
An output layer with one neuron and sigmoid activation.

This structure allows us to demonstrate the gradient descent algorithm in PyTorch as multiple iterations of two steps: a forward and backward pass that computes the loss and its gradients, followed by an update of the weights and biases using those gradients.
We show how PyTorch uses tensors to parallelize operations for efficiency.
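For example, a single matrix multiplication transforms every example in a batch at once instead of looping over rows in Python. A small illustrative sketch (the shapes are made up):

import torch

batch = torch.rand(1000, 8)  # 1000 examples, 8 features each
wts = torch.rand(8, 20)      # weights of a hypothetical layer with 20 neurons

out = batch @ wts            # all 1000 examples transformed in one vectorized operation
print(out.shape)             # torch.Size([1000, 20])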
Training Data
It is customary to split the available data into three distinct sets: training, validation, and testing. These sets serve specific roles in the model development process.
This partitioning strategy allows for rigorous model assessment and ensures that the model's performance is accurately estimated on data it has not encountered during training or validation. Before running the code, ensure that trainingIn and trainingOut are defined as global variables. These are represented as tables where rows correspond to individual examples, and each column represents a specific field or feature.
trainingIn contains the independent variables and has the shape (#examples x #variables), where #examples is the number of data points or examples in our training dataset and #variables is the number of independent variables or features.

trainingOut contains the dependent variable and has the shape (#examples x 1), where #examples is the same as in trainingIn.

Likewise, we'd want the validationIn and validationOut sets as global variables.
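The code below assumes these globals already exist. For a self-contained experiment, here is one way to fabricate a toy binary-classification dataset and split it into training and validation cuts (a test set is omitted for brevity; the data is random and purely illustrative):

import torch

torch.manual_seed(0)
examples, features = 1000, 8
data = torch.rand(examples, features)                            # independent variables
labels = (data.sum(dim=1, keepdim=True) > features / 2).float()  # made-up binary target, shape (#examples x 1)

split = int(0.8 * examples)                                      # 80/20 train/validation split
trainingIn, validationIn = data[:split], data[split:]
trainingOut, validationOut = labels[:split], labels[split:]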
Initializing Weights and Biases
We start by defining the initialization function init_coeffs to set up the initial weights and biases for the neural network. The initialization is as follows:
import torch

def init_coeffs(n_hidden=20):
    wt_hidden = (torch.rand(trainingIn.shape[1], n_hidden) - 0.5) / n_hidden  # hidden-layer weights, scaled down
    wt_output = torch.rand(n_hidden, 1) - 0.3                                 # output-layer weights
    bias_output = torch.rand(1)[0]                                            # output bias (a scalar tensor)
    return wt_hidden.requires_grad_(), wt_output.requires_grad_(), bias_output.requires_grad_()
The key points in this initialization are the scaling of the random starting values (the hidden weights are divided by n_hidden and the output weights are shifted towards zero) and, most importantly, the fact that we set requires_grad on the weights and biases. This is a crucial step, as it informs PyTorch to track and compute gradients for these parameters during the subsequent forward and backward passes. When the loss is calculated as a function of the weights and biases, PyTorch automatically computes the gradients of the loss with respect to these parameters and stores them for gradient descent optimization.
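A quick sanity check of what init_coeffs returns, assuming trainingIn is already defined:

wt_hidden, wt_output, bias_output = init_coeffs(n_hidden=20)
print(wt_hidden.shape, wt_output.shape, bias_output.shape)  # e.g. torch.Size([8, 20]) torch.Size([20, 1]) torch.Size([])
print(wt_hidden.requires_grad, bias_output.requires_grad)   # True True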
Forward Pass
Next, we define the function calc_preds to perform the forward pass of the neural network:
import torch.nn.functional as F

def calc_preds(coeffs, indeps):
    wt_hid, wt_out, bias = coeffs
    hidden_layer_output = F.relu(indeps @ wt_hid)                 # hidden layer: linear transform + ReLU
    output = torch.sigmoid(hidden_layer_output @ wt_out + bias)   # output layer: linear transform + sigmoid
    return output
In this function, the inputs are multiplied by the hidden-layer weights and passed through ReLU, and the resulting hidden activations are multiplied by the output weights, shifted by the bias, and squashed through a sigmoid.

The use of non-linearity is key. Without it, the stacked linear layers are equivalent to a single linear layer. More importantly, the superposition of non-linearities is what gives the neural network the property of being a universal function approximator. We have chosen ReLU for the hidden layer and sigmoid for the output layer, enabling the interpretation of the output as a likelihood score.
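The claim that stacked linear layers without an activation collapse to a single linear layer is easy to verify numerically. A small sketch with made-up shapes:

import torch

x = torch.rand(5, 8)    # 5 examples, 8 features
w1 = torch.rand(8, 20)  # first linear layer
w2 = torch.rand(20, 1)  # second linear layer

two_layers = (x @ w1) @ w2  # two stacked linear layers, no activation in between
one_layer = x @ (w1 @ w2)   # a single linear layer with the combined weights
print(torch.allclose(two_layers, one_layer, atol=1e-5))  # True: the same function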
Loss Calculation
We calculate the loss using the mean absolute error (MAE) in the calc_loss function:
def calc_loss(coeffs, indeps, deps):
    predictions = calc_preds(coeffs, indeps)
    loss = torch.abs(predictions - deps).mean()
    return loss
Notice that the loss is a function of the weights and biases. By setting requires_grad on these parameters, we inform PyTorch that we are interested in computing the gradients of the loss with respect to these parameters for the purpose of optimization.
Training the Model
To train the neural network, we define the training process using the train_model function:
def train_model(epochs=30, lr=0.1):
    torch.manual_seed(442)
    coeffs = init_coeffs()
    for i in range(epochs):
        execute_epoch(coeffs=coeffs, lr=lr)
    return coeffs
The train_model function sets a manual random seed for reproducibility, initializes the coefficients with init_coeffs, and then calls execute_epoch for each epoch to update the coefficients.

Executing an Epoch
The execute_epoch function calculates the loss using calc_loss, backpropagates to compute the gradients, and updates the coefficients using update_coeffs as follows:
def execute_epoch(coeffs, lr):
    loss = calc_loss(coeffs, trainingIn, trainingOut)
    loss.backward()                # compute gradients of the loss w.r.t. the coefficients
    with torch.no_grad():          # the update itself should not be tracked by autograd
        update_coeffs(coeffs, lr)
    print(f'{loss:.3f}', end='; ')
When we call backward on the loss, PyTorch automatically calculates gradients for all the parameters that contribute to the loss and have requires_grad set. These gradients are stored with the respective parameters and can be accessed using the grad attribute.
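A small sketch of this behaviour, assuming the data cuts and the functions above are defined:

coeffs = init_coeffs()
loss = calc_loss(coeffs, trainingIn, trainingOut)
loss.backward()                # populates .grad on every tensor with requires_grad set
wt_hidden, wt_output, bias_output = coeffs
print(wt_hidden.grad.shape)    # same shape as wt_hidden, holding d(loss)/d(weight)
print(bias_output.grad)        # a scalar gradient for the output bias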
Updating Coefficients
The update_coeffs function is used to update the coefficients using gradient descent as follows:
def update_coeffs(coeffs, lr):
    for layer in coeffs:
        layer.sub_(layer.grad * lr)  # step against the gradient, scaled by the learning rate
        layer.grad.zero_()           # reset the gradient for the next epoch
Note that PyTorch accumulates gradients unless they are reset to zero between successive steps. That is why we call zero_ once the gradients have been used to update the weights and biases.
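The accumulation is easy to see in isolation with a single made-up parameter:

import torch

w = torch.tensor(2.0, requires_grad=True)
(w * 3).backward()
print(w.grad)   # tensor(3.)
(w * 3).backward()
print(w.grad)   # tensor(6.): gradients accumulate across backward calls
w.grad.zero_()
print(w.grad)   # tensor(0.): ready for the next step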
Running the Training
Finally, we run the training with different learning rates and for varying numbers of epochs:
coeffs = train_model(lr=1.4) # Example 1
coeffs = train_model(lr=20) # Example 2
coeffs = train_model(epochs=100, lr=10) # Example 3
You can observe how the loss changes during training and evaluate the model’s accuracy based on your dataset.
Model Accuracy
Optionally, we can implement a function, model_accuracy(coeffs), to evaluate the accuracy of the trained model on the validation dataset.
def model_accuracy(coeffs): return (validationOut.bool() == (calc_preds(coeffs, validationIn) > 0.5)).float().mean()
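To see what this expression computes, here is a tiny made-up example of the thresholding and comparison (the values are illustrative only):

import torch

probs = torch.tensor([[0.8], [0.3], [0.6]])    # made-up predicted likelihoods
targets = torch.tensor([[1.0], [0.0], [0.0]])  # made-up binary labels
accuracy = (targets.bool() == (probs > 0.5)).float().mean()
print(accuracy)  # tensor(0.6667): two of the three thresholded predictions match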
That’s it! We now have a simple neural network implemented from scratch in PyTorch for binary classification. We can customize the architecture, hyperparameters, and activation functions to suit our specific problem.
In summary, the recipe is packaged in a train_model() wrapper that requires the data cuts trainingIn and trainingOut in the environment. The steps are as follows:

train_model(epochs=30, lr=0.1): This function acts as the outer wrapper of our training process. It requires access to the training data, trainingIn and trainingOut, which should be defined in the environment. train_model orchestrates the training process by calling the execute_epoch function for a specified number of epochs.

execute_epoch(coeffs, lr): Serving as the inner wrapper, this function carries out one complete training epoch. It takes the current coefficients (weights and biases) and a learning rate as input. Within an epoch, it calculates the loss and updates the coefficients. To estimate the loss, it calls calc_loss, which compares the predicted output generated by calc_preds with the target output. After this, execute_epoch performs a backward pass to compute the gradients of the loss, storing these gradients in the grad attribute of each coefficient tensor.

calc_loss(coeffs, indeps, deps): This function calculates the loss using the given coefficients, input predictors indeps, and target output deps. It relies on calc_preds to obtain the predicted output, which is then compared to the target output to compute the loss. The backward pass is subsequently invoked to compute the gradients, which are stored within the grad attribute of the coefficient tensors for further optimization.

calc_preds(coeffs, indeps): Responsible for computing the predicted output based on the given coefficients and input predictors indeps. This function follows the forward pass logic and applies activation functions where necessary to produce the output.

update_coeffs(coeffs, lr): This function plays a pivotal role in updating the coefficients. It iterates through the coefficient tensors, applying gradient descent with the specified learning rate lr. After each update, it resets the gradients to zero using the zero_ method, ensuring the gradients are fresh for the next iteration.

init_coeffs(n_hidden=20): The initialization function is responsible for setting up the initial coefficients. It shapes each coefficient tensor based on the number of neurons specified for the sole hidden layer.

model_accuracy(coeffs): An optional function that evaluates the prediction accuracy on the validation set, providing insights into how well the trained model generalizes to unseen data.

While we have demonstrated gradient descent (steepest descent) with a simple neural network, we can extend this implementation to a deep learning model by adding more hidden layers. All we need to do is refactor the code, keeping the same set of six functions and their interfaces. Following the approach presented here, we can create a versatile and scalable neural network architecture tailored to specific requirements.
Are you eager to dive deeper into the world of deep learning and further enhance your skills? Consider joining our coaching class in deep learning with FastAI. Our class is designed to provide hands-on experience and in-depth knowledge of cutting-edge deep learning techniques. Whether you're a beginner or an experienced practitioner, we offer tailored guidance to help you master the intricacies of deep learning and empower you to tackle complex projects with confidence. Join us on this exciting journey to unlock the full potential of artificial intelligence and neural networks.