Python

November 3 2023

Building a Deep Learning Model From Scratch in PyTorch

sanjaybhatikar FastAI, Python, PyTorch Artificial Intelligence, Backpropagation, Deep Learning, Gradient Descent, Neural Network, Python, Steepest Descent

Background A neural network in 6 steps

In Building a Simple Neural Network From Scratch in PyTorch, we described a recipe with 6 core functions, plus an optional accuracy check, as follows:

train_model(epochs=30, lr=0.1): This function acts as the outer wrapper of our training process. It requires access to the training data, trainingIn and trainingOut, which should be defined in the environment. train_model orchestrates the training process by calling the execute_epoch function for a specified number of epochs.
execute_epoch(coeffs, lr): Serving as the inner wrapper, this function carries out one complete training epoch. It takes the current coefficients (weights and biases) and a learning rate as input. It first calls calc_loss, which compares the predicted output generated by calc_preds with the target output. It then performs a backward pass to compute the gradients of the loss, which PyTorch stores in the grad attribute of each coefficient tensor, and finally calls update_coeffs to apply them.
calc_loss(coeffs, indeps, deps): This function calculates the loss using the given coefficients, input predictors indeps, and target output deps. It relies on calc_preds to obtain the predicted output, which is then compared to the target output to compute the loss. It does not run the backward pass itself; execute_epoch invokes backward on the returned loss.
calc_preds(coeffs, indeps): Responsible for computing the predicted output based on the given coefficients and input predictors indeps. This function follows the forward pass logic and applies activation functions where necessary to produce the output.
update_coeffs(coeffs, lr): This function plays a pivotal role in updating the coefficients. It iterates through the coefficient tensors, applying gradient descent with the specified learning rate lr. After each update, it resets the gradients to zero using the zero_ function, ensuring the gradients are fresh for the next iteration.
init_coeffs(n_hidden=20): The initialization function is responsible for setting up the initial coefficients. It shapes each coefficient tensor based on the number of neurons specified for the sole hidden layer.
model_accuracy(coeffs): An optional function that evaluates the prediction accuracy on the validation set, providing insights into how well the trained model generalizes to unseen data.

In this blog post, we’ll take a deep dive into constructing a powerful deep learning neural network from the ground up using PyTorch. Building upon the foundations of the previous simple neural network, we’ll refactor some of these functions for deep learning.

Deep Learning Refactor code for multiple hidden layers

Initializing Weights and Biases

To prepare our neural network for deep learning, we’ve revamped the weight and bias initialization process. The init_coeffs function now allows for specifying the number of neurons in each hidden layer, making it flexible for different network configurations. We generate one weight matrix and one scalar bias per layer, dividing the random weights by the size of the layer they feed into to help with convergence, as we did in the simple network.

def init_coeffs(hiddens=[10, 10]):
    sizes = [trainingIn.shape[1]] + hiddens + [1]
    n = len(sizes)
    weights = [(torch.rand(sizes[i], sizes[i+1]) - 0.3) / sizes[i+1] * 4 for i in range(n-1)]  # Weight initialization
    biases = [(torch.rand(1)[0] - 0.5) * 0.1 for i in range(n-1)]  # Bias initialization
    for wt in weights: wt.requires_grad_()
    for bs in biases: bs.requires_grad_()
    return weights, biases

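We define the architecture’s structure using sizes, where hiddens specifies the number of neurons in each hidden layer. Each weight matrix has shape sizes[i] x sizes[i+1], and each layer gets a single scalar bias. All tensors are flagged with requires_grad_() so that PyTorch tracks gradients for them.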
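
To make the layer shapes concrete, here is a minimal sketch that reproduces only the sizes bookkeeping. It assumes 12 input features (a made-up number standing in for trainingIn.shape[1]), and layer_shapes is a hypothetical helper that is not part of the recipe.

```python
def layer_shapes(n_inputs, hiddens, n_outputs=1):
    # Same bookkeeping as init_coeffs: input size, hidden sizes, then the output size.
    sizes = [n_inputs] + list(hiddens) + [n_outputs]
    # Each weight matrix maps layer i (rows) to layer i+1 (columns).
    return [(sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1)]

shapes = layer_shapes(n_inputs=12, hiddens=[10, 10])
print(shapes)  # [(12, 10), (10, 10), (10, 1)]
```

Two hidden layers therefore give three weight matrices, because the output layer needs one too.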

Forward Propagation With Multiple Hidden Layers

Our revamped calc_preds function accommodates multiple hidden layers in the network. It iterates through the layers, applying weight matrices and biases at each step and introducing non-linearity using the ReLU activation function in the hidden layers and the sigmoid activation in the output layer. This enables our deep learning network to capture complex patterns in the data.

def calc_preds(coeffs, indeps):
    weights, biases = coeffs
    res = indeps
    n = len(weights)
    for i, wt in enumerate(weights):
        res = res @ wt + biases[i]
        if i != n - 1:
            res = F.relu(res)  # Apply ReLU activation in hidden layers
    return torch.sigmoid(res)  # Sigmoid activation in the output layer

Note that weights is now a list of tensors containing layer-wise weights and, correspondingly, biases is the list of tensors containing layer-wise biases.
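
As a quick sanity check, the forward pass can be run on made-up inputs. The sketch below assumes a toy network of 4 inputs with hidden layers of 3 and 2 neurons, random weights, and zero biases; none of these numbers come from the original data.

```python
import torch
import torch.nn.functional as F

def calc_preds(coeffs, indeps):
    weights, biases = coeffs
    res = indeps
    n = len(weights)
    for i, wt in enumerate(weights):
        res = res @ wt + biases[i]
        if i != n - 1:
            res = F.relu(res)      # hidden layers only
    return torch.sigmoid(res)      # output layer

torch.manual_seed(0)
sizes = [4, 3, 2, 1]                                   # hypothetical layer sizes
weights = [torch.rand(sizes[i], sizes[i + 1]) - 0.5 for i in range(3)]
biases = [torch.zeros(()) for _ in range(3)]           # one scalar bias per layer
x = torch.rand(5, 4)                                   # 5 examples, 4 features

preds = calc_preds((weights, biases), x)
print(preds.shape)  # torch.Size([5, 1])
```

Each of the 5 rows yields one prediction, and the sigmoid keeps every value strictly between 0 and 1.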

Backward Propagation With Multiple Hidden Layers

Loss calculation and gradient descent remain consistent with the simple neural network implementation. We use the mean absolute error (MAE) for loss as before and tweak the update_coeffs function to apply gradient descent to the weights and biases of every layer, hidden and output alike.

def update_coeffs(coeffs, lr):
    weights, biases = coeffs
    for layer in weights + biases:
        layer.sub_(layer.grad * lr)
        layer.grad.zero_()
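
A single update step can be traced by hand. The sketch below uses a made-up loss, w² + b, chosen only because its gradients are easy to verify; it is not the MAE loss used in the post.

```python
import torch

def update_coeffs(coeffs, lr):
    weights, biases = coeffs
    for layer in weights + biases:
        layer.sub_(layer.grad * lr)   # gradient descent step
        layer.grad.zero_()            # reset for the next backward pass

w = torch.tensor([2.0], requires_grad=True)
b = torch.tensor([1.0], requires_grad=True)

loss = (w ** 2 + b).sum()   # d(loss)/dw = 2w = 4, d(loss)/db = 1
loss.backward()

with torch.no_grad():       # the update itself must not be tracked by autograd
    update_coeffs(([w], [b]), lr=0.1)

print(w.item(), b.item())   # approximately 1.6 and 0.9
```

With lr=0.1, w moves from 2.0 to 2.0 - 0.1 * 4 and b from 1.0 to 1.0 - 0.1 * 1, after which both gradients are back at zero.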

Putting It All Together in Wrapper Functions

Our train_model function can be used ‘as is’ to orchestrate the training process, with the execute_epoch wrapper function helping as before. The model_accuracy function also does not change.
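
For readers who want to run the whole pipeline in one go, here is a sketch that wires the functions together. The data is synthetic (an assumption made purely for illustration), and execute_epoch returns the loss instead of printing it, a small deviation that makes the loss history easy to inspect.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(442)

# Synthetic stand-in for real data: 200 examples with 4 features;
# the label is 1 when the features sum to more than 2.
trainingIn = torch.rand(200, 4)
trainingOut = (trainingIn.sum(dim=1, keepdim=True) > 2).float()

def init_coeffs(hiddens=[10, 10]):
    sizes = [trainingIn.shape[1]] + hiddens + [1]
    n = len(sizes)
    weights = [(torch.rand(sizes[i], sizes[i+1]) - 0.3) / sizes[i+1] * 4 for i in range(n-1)]
    biases = [(torch.rand(1)[0] - 0.5) * 0.1 for i in range(n-1)]
    for t in weights + biases:
        t.requires_grad_()
    return weights, biases

def calc_preds(coeffs, indeps):
    weights, biases = coeffs
    res = indeps
    for i, wt in enumerate(weights):
        res = res @ wt + biases[i]
        if i != len(weights) - 1:
            res = F.relu(res)
    return torch.sigmoid(res)

def calc_loss(coeffs, indeps, deps):
    return torch.abs(calc_preds(coeffs, indeps) - deps).mean()

def update_coeffs(coeffs, lr):
    weights, biases = coeffs
    for layer in weights + biases:
        layer.sub_(layer.grad * lr)
        layer.grad.zero_()

def execute_epoch(coeffs, lr):
    loss = calc_loss(coeffs, trainingIn, trainingOut)
    loss.backward()
    with torch.no_grad():
        update_coeffs(coeffs, lr)
    return loss.item()  # returned rather than printed, for this sketch only

def train_model(epochs=30, lr=0.1):
    coeffs = init_coeffs()
    losses = [execute_epoch(coeffs, lr) for _ in range(epochs)]
    return coeffs, losses

coeffs, losses = train_model()
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

How far the loss falls depends on the data, the seed, and the learning rate, so it is worth experimenting with epochs and lr rather than reading too much into one run.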

Summary Conclusion and Takeaways

With these modifications, we’ve refactored our simple neural network into a deep learning model that has greater capacity for learning. The beauty of it is we have retained the same set of functions and interfaces that we implemented in a simple neural network, refactoring the code to scale with multiple hidden layers.

train_model(epochs=30, lr=0.1): No change!
execute_epoch(coeffs, lr): No change!
calc_loss(coeffs, indeps, deps): No change!
calc_preds(coeffs, indeps): Tweak to use the set of weights and corresponding set of biases in each hidden layer, iterating over all layers from input to output.
update_coeffs(coeffs, lr): Tweak to iterate over the set of weights and accompanying set of biases in each layer.
init_coeffs(hiddens=[10, 10]): Tweak for compatibility with an architecture that can potentially have any number of hidden layers of any size.
model_accuracy(coeffs): No change!

Such a deep learning model has greater capacity for learning. However, it is more hungry for training data! In subsequent posts, we will examine the breakthroughs that have made deep learning models practically feasible and reliable. These include advancements such as:

Batch Normalization
Residual Connections
Dropout

Are you eager to dive deeper into the world of deep learning and further enhance your skills? Consider joining our coaching class in deep learning with FastAI. Our class is designed to provide hands-on experience and in-depth knowledge of cutting-edge deep learning techniques. Whether you’re a beginner or an experienced practitioner, we offer tailored guidance to help you master the intricacies of deep learning and empower you to tackle complex projects with confidence. Join us on this exciting journey to unlock the full potential of artificial intelligence and neural networks.

November 3 2023

Building a Simple Neural Network From Scratch in PyTorch

sanjaybhatikar FastAI Artificial Intelligence, Neural Network, Python, PyTorch

Background Structure and Training Data

In this blog post, we will walk you through the process of creating a simple neural network from scratch in PyTorch for binary classification. We will implement a neural network with one hidden layer containing multiple neurons followed by a single output neuron. We will also discuss the design choices made for this network, including the use of ReLU activation in the hidden layer and sigmoid activation in the output layer.

Neural Network Architecture

The architecture of our simple neural network can be summarized as follows:

Input Layer
Hidden Layer with n neurons and ReLU activation.
Output Layer with a single neuron and sigmoid activation.

This structure allows us to demonstrate the gradient descent algorithm in PyTorch with multiple iterations of two steps as follows:

Forward-propagate inputs to generate outputs and compute loss
Backward-propagate loss by computing gradients and applying them to update model parameters.

We show how PyTorch uses tensors to parallelize operations for efficiency.

Training Data

It is customary to split the available data into three distinct sets: training, validation, and testing. These sets serve specific roles in the model development process.

Training Data: The training set is the largest portion of the data and is primarily used for training the model. During training, the gradients are computed on this data to update the weights and biases iteratively, allowing the model to learn from the provided examples.
Validation Data: The validation set is essential for assessing the model’s performance during training. It is not used for gradient computation but serves as a means to measure the loss. This monitoring helps prevent overfitting, a scenario where the model memorizes the training data rather than generalizing from it. Adjustments to the model can be made based on the validation loss.
Test Data: The test set is a reserved subset and should be used sparingly. It comes into play only after the model has completed its training phase. It serves the purpose of evaluating the model’s generalization performance on unseen data and reporting the final results. It ensures that the model can make accurate predictions on new, previously unseen examples, thus providing a reliable measure of its effectiveness.

This partitioning strategy allows for rigorous model assessment and ensures that the model’s performance is accurately estimated on data it has not encountered during training or validation. Before running the code, ensure that trainingIn and trainingOut are defined as global variables. These are represented as tables where rows correspond to individual examples, and each column represents a specific field or feature.

trainingIn contains the independent variables and has the shape (#examples x #variables), where #examples is the number of data points or examples in our training dataset and #variables is the number of independent variables or features.
trainingOut contains the dependent variable and has the shape (#examples x 1), where #examples is the same as in trainingIn.

Likewise, we’d want the validationIn and validationOut sets as global variables.
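
One way to produce these four tensors is sketched below. The dataset is random and the 80/20 split ratio is an assumption; a test set could be carved out the same way before any training starts.

```python
import torch

torch.manual_seed(0)

# Hypothetical dataset: 100 examples, 5 features, binary labels.
features = torch.rand(100, 5)
labels = (torch.rand(100, 1) > 0.5).float()

# Shuffle the row indices, then hold out the last fifth for validation.
idx = torch.randperm(len(features))
n_valid = len(features) // 5
train_idx, valid_idx = idx[:-n_valid], idx[-n_valid:]

trainingIn, trainingOut = features[train_idx], labels[train_idx]
validationIn, validationOut = features[valid_idx], labels[valid_idx]

print(trainingIn.shape, trainingOut.shape)      # torch.Size([80, 5]) torch.Size([80, 1])
print(validationIn.shape, validationOut.shape)  # torch.Size([20, 5]) torch.Size([20, 1])
```

Keeping the output as a column of shape (#examples x 1) matters later, because the predictions come out of the network in the same shape.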

Backpropagation Apply gradient descent for training

Initializing Weights and Biases

We start by defining the initialization function init_coeffs to set up the initial weights and biases for the neural network. The initialization process includes the following steps:

import torch

def init_coeffs(n_hidden=20):
    wt_hidden = (torch.rand(trainingIn.shape[1], n_hidden) - 0.5) / n_hidden
    wt_output = torch.rand(n_hidden, 1) - 0.3
    bias_output = torch.rand(1)[0]
    return wt_hidden.requires_grad_(), wt_output.requires_grad_(), bias_output.requires_grad_()

The key points in this initialization are:

We divide the weights in the hidden layer by the number of hidden neurons to help with convergence.
We introduce a bias for the output layer.

Note that we set requires_grad on weights and biases during initialization. This is a crucial step, as it informs PyTorch to track and compute gradients for these parameters during the subsequent forward and backward passes. When the loss is calculated as a function of weights and biases, PyTorch automatically computes the gradients of the loss with respect to these parameters and stores them for gradient descent optimization.
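
The effect of requires_grad is easiest to see on a tiny example. In this sketch, w plays the role of a parameter and x the role of input data; the values are arbitrary.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # tracked parameter
x = torch.tensor([4.0, 5.0, 6.0])                       # plain data, not tracked

loss = (w * x).sum()   # loss = 4*w0 + 5*w1 + 6*w2
loss.backward()        # fills w.grad with d(loss)/dw

print(w.grad)  # tensor([4., 5., 6.])
print(x.grad)  # None
```

Only the tensor flagged with requires_grad receives a gradient; the data tensor is left alone.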

Forward Pass

Next, we define the function calc_preds to perform the forward pass of the neural network:

import torch.nn.functional as F

def calc_preds(coeffs, indeps):
    wt_hid, wt_out, bias = coeffs
    hidden_layer_output = F.relu(indeps @ wt_hid)
    output = torch.sigmoid(hidden_layer_output @ wt_out + bias)
    return output

In this function:

We use the ReLU activation in the hidden layer.
We use the sigmoid activation in the output layer.

The use of non-linearity is key. Without it, the stacked linear layers are equivalent to a single linear layer. More importantly, the superposition of non-linearities is what gives the neural network the property of being a universal function approximator. We have chosen ReLU for the hidden layer and sigmoid for the output layer, which maps the output into the range 0 to 1 so that it can be interpreted as a likelihood score.
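
The claim that linear layers collapse into one can be checked numerically. The sketch below uses small hand-picked matrices (not taken from the model) so that the arithmetic is easy to follow.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1.0, -1.0],
                  [2.0,  3.0]])
w1 = torch.eye(2)                      # first "layer": identity, for easy arithmetic
w2 = torch.tensor([[1.0], [1.0]])      # second "layer": sums the two columns

two_linear = (x @ w1) @ w2             # two layers, no activation
one_linear = x @ (w1 @ w2)             # a single layer with the merged weights
with_relu = F.relu(x @ w1) @ w2        # ReLU in between clips the -1 to 0

print(two_linear.flatten())  # tensor([0., 5.])
print(one_linear.flatten())  # tensor([0., 5.])
print(with_relu.flatten())   # tensor([1., 5.])
```

Without the activation, the two layers are indistinguishable from one layer with weights w1 @ w2. With ReLU in between, the first row changes, and no single weight matrix reproduces that behavior for all inputs.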

Loss Calculation

We calculate the loss using the mean absolute error (MAE) in the calc_loss function:

def calc_loss(coeffs, indeps, deps):
    predictions = calc_preds(coeffs, indeps)
    loss = torch.abs(predictions - deps).mean()
    return loss

Notice that the loss is a function of the weights and biases. By setting requires_grad on these parameters, we inform PyTorch that we are interested in computing the gradients of the loss with respect to these parameters for the purpose of optimization.
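
A worked example helps to pin down what the MAE measures. The three predictions and targets below are invented for illustration.

```python
import torch

predictions = torch.tensor([[0.9], [0.2], [0.6]])
deps = torch.tensor([[1.0], [0.0], [1.0]])

# Absolute errors are 0.1, 0.2 and 0.4, so the mean is 0.7 / 3.
loss = torch.abs(predictions - deps).mean()
print(f"{loss:.4f}")  # 0.2333

# Shape pitfall: a 1-D target broadcasts against (3 x 1) predictions into (3 x 3).
mismatched = torch.abs(predictions - deps.squeeze())
print(mismatched.shape)  # torch.Size([3, 3])
```

The second half shows why the targets are kept in the shape (#examples x 1): with a flat target vector the subtraction silently compares every prediction with every target, and the resulting loss is meaningless.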

Training the Model

To train the neural network, we define the training process using the train_model function:

def train_model(epochs=30, lr=0.1):
    torch.manual_seed(442)
    coeffs = init_coeffs()
    for i in range(epochs):
        execute_epoch(coeffs=coeffs, lr=lr)
    return coeffs

The train_model function:

Initializes the coefficients.
Iterates through a specified number of epochs.
Calls execute_epoch for each epoch to update the coefficients.

Executing an Epoch

The execute_epoch function calculates the loss using calc_loss, calls backward on it to compute the gradients, and then applies them with update_coeffs as follows:

def execute_epoch(coeffs, lr):
    loss = calc_loss(coeffs, trainingIn, trainingOut)
    loss.backward()
    with torch.no_grad():
        update_coeffs(coeffs, lr)
    print(f'{loss:.3f}', end='; ')

When we call backward on the loss, PyTorch automatically calculates gradients for all the parameters that contribute to the loss and have requires_grad set. These gradients are stored with the respective parameters and can be accessed using the grad attribute.

Updating Coefficients

The update_coeffs function is used to update the coefficients using gradient descent as follows:

def update_coeffs(coeffs, lr):
    for layer in coeffs:
        layer.sub_(layer.grad * lr)
        layer.grad.zero_()

Note that PyTorch accumulates gradients unless these are reset to zero between successive steps. That is why we call zero_ on each gradient once it has been used to update the weights and biases.
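
The accumulation behavior is easy to reproduce in isolation. This sketch uses a one-element tensor and a made-up loss, 3w, whose gradient is always 3.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()
print(first)    # tensor([3.])

(w * 3).sum().backward()      # second backward pass without a reset
second = w.grad.clone()
print(second)   # tensor([6.])

w.grad.zero_()                # reset, as update_coeffs does
(w * 3).sum().backward()
third = w.grad.clone()
print(third)    # tensor([3.])
```

Without the reset, the second pass adds its gradient to the first, so a training loop would take ever larger steps in stale directions.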

Running the Training

Finally, we run the training with different learning rates and for varying numbers of epochs:

coeffs = train_model(lr=1.4)  # Example 1
coeffs = train_model(lr=20)   # Example 2
coeffs = train_model(epochs=100, lr=10)  # Example 3

You can observe how the loss changes during training and evaluate the model’s accuracy based on your dataset.

Model Accuracy

Optionally, we can implement a function model_accuracy(coeffs), to evaluate the accuracy of the trained model on the validation dataset.

def model_accuracy(coeffs):
    return (validationOut.bool() == (calc_preds(coeffs, validationIn) > 0.5)).float().mean()
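
The same comparison can be traced on a handful of made-up predictions, standing in for the output of calc_preds on the validation set.

```python
import torch

preds = torch.tensor([[0.8], [0.3], [0.6], [0.1]])          # hypothetical model outputs
validationOut = torch.tensor([[1.0], [0.0], [0.0], [0.0]])  # hypothetical true labels

predicted_positive = preds > 0.5                             # True, False, True, False
accuracy = (validationOut.bool() == predicted_positive).float().mean()
print(accuracy)  # tensor(0.7500)
```

Three of the four thresholded predictions match their labels, hence an accuracy of 0.75. The 0.5 cut-off treats the sigmoid output as a likelihood score and calls anything above one half a positive.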

That’s it! We now have a simple neural network implemented from scratch in PyTorch for binary classification. We can customize the architecture, hyperparameters, and activation functions to suit our specific problem.

Summary Conclusion and Takeaways

We split the dataset into subsets for training and validation. We then wrote a series of functions to parcel out the code for each step in the training process, culminating in the train_model() wrapper that requires data cuts trainingIn and trainingOut in the environment. The steps are as follows:

train_model(epochs=30, lr=0.1): This function acts as the outer wrapper of our training process. It requires access to the training data, trainingIn and trainingOut, which should be defined in the environment. train_model orchestrates the training process by calling the execute_epoch function for a specified number of epochs.
execute_epoch(coeffs, lr): Serving as the inner wrapper, this function carries out one complete training epoch. It takes the current coefficients (weights and biases) and a learning rate as input. It first calls calc_loss, which compares the predicted output generated by calc_preds with the target output. It then performs a backward pass to compute the gradients of the loss, which PyTorch stores in the grad attribute of each coefficient tensor, and finally calls update_coeffs to apply them.
calc_loss(coeffs, indeps, deps): This function calculates the loss using the given coefficients, input predictors indeps, and target output deps. It relies on calc_preds to obtain the predicted output, which is then compared to the target output to compute the loss. It does not run the backward pass itself; execute_epoch invokes backward on the returned loss.
calc_preds(coeffs, indeps): Responsible for computing the predicted output based on the given coefficients and input predictors indeps. This function follows the forward pass logic and applies activation functions where necessary to produce the output.
update_coeffs(coeffs, lr): This function plays a pivotal role in updating the coefficients. It iterates through the coefficient tensors, applying gradient descent with the specified learning rate lr. After each update, it resets the gradients to zero using the zero_ function, ensuring the gradients are fresh for the next iteration.
init_coeffs(n_hidden=20): The initialization function is responsible for setting up the initial coefficients. It shapes each coefficient tensor based on the number of neurons specified for the sole hidden layer.
model_accuracy(coeffs): An optional function that evaluates the prediction accuracy on the validation set, providing insights into how well the trained model generalizes to unseen data.

While we have demonstrated steepest descent with a simple neural network, we can extend this implementation to a deep learning model by adding more hidden layers. All we need to do is refactor the code, keeping the same set of functions and their interfaces. Following the approach presented here, we can create a versatile and scalable neural network architecture tailored to specific requirements.

Happy coding!

September 19 2023

Unleashing the Power of CNNs on Non-Image Data: A Creative Twist

sanjaybhatikar FastAI Artificial Intelligence, Computer Vision, Convolutional Neural Networks, Deep Learning, FastAI, Heart Murmurs, Medical Diagnostics, Python

Deep learning models have proven their prowess in tasks ranging from identifying objects in images to recognizing handwriting. But what if your data doesn’t come in the form of images? Can you still harness the incredible power of Convolutional Neural Networks (CNNs)? The answer is a resounding “yes,” and today, we’ll explore just how to do that with a captivating example involving heart sounds.

Heart Sounds: Unveiling the Dual-Domain Magic

Heart sounds are typically recorded and can be examined in two fundamental ways: the time domain or the spectral domain. In the time domain, we track how the sound evolves over time, while the spectral domain delves into the sound’s frequency components. Each of these domains reveals a piece of the puzzle, but it’s when we dive into the realm of Wavelet Analysis that the real magic happens.

Wavelet Analysis: Where Time and Frequency Converge

Wavelet Analysis allows us to explore both time and frequency domains simultaneously. Instead of being limited to just one dimension, it combines information from both dimensions, enriching our data with a wealth of details beyond what we can obtain from either time or frequency alone. It’s like putting on 3D glasses for data analysis.

From Dual-Domain Data to Heat-Map Images

Now, here’s where it gets truly fascinating. This dual-domain representation lends itself beautifully to the creation of heat-map images. These images showcase how different frequencies play out over time, resembling a dynamic canvas of information. And guess what? These heat-map images are precisely what we need to tap into the world of CNNs.
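
To give a feel for how a one-dimensional recording becomes a heat-map image, here is a deliberately simplified sketch of a wavelet scalogram written with NumPy only. The signal is a synthetic two-tone sine wave rather than a heart sound, and morlet and scalogram are hypothetical helper names; a real pipeline would more likely rely on a dedicated wavelet library.

```python
import numpy as np

def morlet(n_points, scale, w0=5.0):
    # Real-valued Morlet-style wavelet: a cosine under a Gaussian envelope,
    # stretched in time by `scale`.
    t = np.arange(-(n_points // 2), n_points - n_points // 2) / scale
    return np.cos(w0 * t) * np.exp(-0.5 * t ** 2) / np.sqrt(scale)

def scalogram(signal, scales):
    rows = []
    for s in scales:
        width = min(int(10 * s), len(signal))   # wavelet support grows with scale
        rows.append(np.abs(np.convolve(signal, morlet(width, s), mode="same")))
    return np.stack(rows)  # shape: (number of scales, number of samples)

# Synthetic stand-in for a recording: 50 Hz for half a second, then 120 Hz.
fs = 1000
t = np.arange(fs) / fs
signal = np.where(t < 0.5, np.sin(2 * np.pi * 50 * t), np.sin(2 * np.pi * 120 * t))

mag = scalogram(signal, scales=np.arange(1, 33))
image = np.round(mag / mag.max() * 255).astype(np.uint8)   # 8-bit grayscale heat map
print(image.shape, image.dtype)  # (32, 1000) uint8
```

Each row of the image corresponds to one scale (roughly, one frequency band) and each column to one instant in time, which is the layout a CNN expects from a single-channel picture.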

CNNs: Ready to Work Their Magic

While CNNs are renowned for their image-processing abilities, they can effortlessly handle these heat-map images derived from heart sounds. There’s no need to reinvent the wheel or build a new model from scratch. With their established architectures like LeNet, AlexNet, GoogLeNet, and ResNet, CNNs become our partners in diagnosing heart defects from murmurs, all thanks to a little creative thinking.

Signal-to-Noise Ratio: A Cautionary Note

Of course, we should exercise caution. Not every image representation is equally informative. Maintaining a high signal-to-noise ratio is critical. We don’t want to obscure our diagnostic insights with unnecessary noise.

In Conclusion: Creativity Meets Cutting-Edge Technology

In the world of deep learning, innovation knows no bounds. Even when dealing with non-image data like heart sounds, we can creatively adapt CNNs to our advantage. No need to start from scratch; we can convert our data into an image format and let CNNs work their magic. This approach opens up exciting avenues for enhancing medical diagnosis and treatment. So, remember, with a little ingenuity, non-image data can also find its place in the world of CNNs.

Our FastAI coaching program at Craft With Code is your passport to mastery of deep learning. Dive into model building, explore real-world data, and transform imaginative concepts into practical solutions. Our hands-on approach guarantees the confidence to tackle a variety of data challenges, with expert guidance from seasoned instructors. Don’t miss out on the opportunity to unlock a world of possibilities. Enroll today!

September 9 2023

Remote Work Monitoring: Ethics in the cat-and-mouse game of surveillance and tech-savvy workers

sanjaybhatikar Python Automation, Coding, Ethics, Python, Surveillance, Work From Home

In today’s increasingly digital world, the concept of remote work has become more prevalent than ever before. As the COVID-19 pandemic pushed many companies to adopt remote work policies, both employers and employees faced new challenges. One of these challenges was finding effective ways to monitor productivity and ensure a fair evaluation of remote workers. However, an Australian woman’s recent dismissal for low keystroke activity raises important ethical questions about remote work monitoring.

In the case at hand, an Australian remote worker was terminated from her job due to low keystroke activity detected by monitoring software. While monitoring employee productivity is a legitimate concern for many employers, it’s essential to strike a balance between tracking work performance and respecting the privacy and dignity of remote workers.

The primary issue with using keystroke activity as a metric for productivity is that it oversimplifies the complex nature of many remote job roles. Quality of output, rather than quantity of keystrokes, should be the primary measure of an employee’s performance. The nature of the work, job requirements, and individual work styles should all be taken into account when evaluating remote workers.

Moreover, relying solely on keystroke activity can create a hostile work environment, where employees feel they are constantly under surveillance. This can lead to stress, anxiety, and a decrease in job satisfaction, ultimately impacting productivity in a negative way.

But what happens when tech-savvy employees feel their autonomy is unjustly suppressed? This is where we must acknowledge that necessity is the mother of invention. A resourceful and tech-savvy employee with coding skills could easily devise a Python script to game the system.

import pyautogui
import time
import random

while True:
    # Move the mouse cursor to a random position
    x, y = random.randint(0, 1920), random.randint(0, 1080)
    pyautogui.moveTo(x, y)
    time.sleep(random.randint(30, 300))  # Wait for a random time interval

This script simulates mouse activity by moving the cursor to random positions on the screen at irregular intervals, giving the impression that the employee is actively engaged at their desk.

However, it’s crucial to emphasize that resorting to such tactics is not a sustainable or ethically recommended approach. Instead, this serves as a reminder of how easily knowledgeable individuals can circumvent monitoring systems. It underlines the importance of fostering a workplace culture built on trust and open communication rather than relying on Orwellian surveillance methods.

In conclusion, remote work offers numerous benefits for both employees and employers, but it also presents challenges when it comes to monitoring productivity. The case of the Australian remote worker highlights the need for a balanced approach that considers ethical concerns and respects the dignity of remote employees. Trust, communication, and a focus on results can go a long way in ensuring the success of remote work arrangements while maintaining a healthy and ethical work environment.

Ready to boost your digital fluency and embark on an exciting journey of building innovative apps? Explore our courses designed to empower you with the skills you need while making learning enjoyable. Join us today and unlock your potential in the world of tech!

September 8 2023

Groove with Gradio or Jive with FastAPI

sanjaybhatikar Blog, FastAI Artificial Intelligence, FastAI, FastAPI, Gradio, Python, Streamlit

Gradio’s speed, a digital cruise,

Swift and simple, sparks a muse!

In no time flat, your groove you’ll prove,

With Gradio in your corner, that’s the move.

FastAPI’s power, it rocks the stage,

Shape your endpoints like a sage.

Flexibility is the game it plays,

In countless ways, it’ll amaze for always.

Streamlit’s charm, a sweet compromise,

Stands up your app through day and night’s rise.

It don’t take much coding might, no lies,

Streamlit’s got you covered, that’s so right.

Now, which path will you choose, tech bro?

Take a moment, let your thoughts flow.

Don’t be shy, eat your dogfood, let your app grow,

With these allies, you’ll architect with the know-how.

In our FastAI coaching program, we go beyond the realm of creation. We equip you with the skills not only to build astounding AI applications but also to deploy apps quickly. Imagine the satisfaction of putting your creations directly into the hands of eager users or showcasing them to impress friends and family, or persuade potential employers or investors. You’ll learn to deploy AI in swift and efficient ways, ensuring your innovations make an impact on the world. So, if you’re ready to not just craft AI marvels but get them out there for the world to see, reach out to us on WhatsApp, and let’s embark on this transformative journey together. Your AI-powered future awaits!

September 8 2023

How many lines of code does it take to train a deep learning model?

sanjaybhatikar Blog, FastAI AI, Artificial Intelligence, Coding, Computer Vision, Deep Learning, FastAI, Python

Six, if you know which ones.

Check out the secret sauce for training a savvy deep learning model to spot pets in pictures:

from fastai.vision.all import *
path = untar_data(URLs.PETS)
dls = ImageDataLoaders.from_name_re(path, get_image_files(path/'images'), pat=r'(.+)_\d+.jpg', item_tfms=Resize(460), batch_tfms=aug_transforms(size=224, min_scale=0.75))
learn = vision_learner(dls, models.resnet50, metrics=accuracy)
learn.fine_tune(1)
learn.path = Path('.')
learn.export()

When I mentor AI novices, they often do a double-take at how little code it actually takes to kickstart a formidable deep learning machine. People have these wild notions about deep learning, thinking it’s all about:

A boatload of complex math
An ocean of data
A wallet-draining supercomputer setup

But here’s the reality check:

Basic high school math will do the trick
We can make do with just a pinch of data
And guess what? Most of the heavy lifting can be done on freebie computing resources

Get ready to craft some mind-blowing AI apps in our FastAI coaching program. We’ll arm you with the skills to build top-notch deep learning models. All you need is a minimum of one year’s experience in Python programming. Shoot us a message on WhatsApp now to snag your spot!