## Building a Feedforward Neural Network from Scratch in Python

In the coding section, we will be covering the following topics.

1. Generate data that is not linearly separable
2. Train with Sigmoid Neuron and see performance
3. Write from scratch our first feedforward network
4. Train the FF network on the data and compare with Sigmoid Neuron
5. Write a generic class for a FF network
6. Train generic class on binary classification
7. Train a FF network for multi-class data using a cross-entropy loss function


Before we start building our network, we need to import the required libraries. We import `numpy` for matrix multiplication and dot products between vectors, `matplotlib` to visualize the data, and functions from the `sklearn` package to generate data and evaluate the network's performance.

### Generate Dummy Data

Remember that we are using feedforward neural networks because we wanted to deal with non-linearly separable data. In this section, we will see how to randomly generate non-linearly separable data.

To generate data randomly, we will use `make_blobs`, which generates blobs of points with a Gaussian distribution. I have generated 1000 data points in 2D space with four blobs (`centers=4`) as a multi-class classification problem. Each data point has two inputs and a class label of 0, 1, 2 or 3. Lines 9 and 10 of the code visualize the data using a scatter plot. We can see that there are 4 centers and that the data is (almost) linearly separable.

In the above plot, I was able to represent 3 dimensions (2 inputs, plus class labels as colors) using a simple scatter plot. Note that the `make_blobs()` function generates linearly separable data, but we need non-linearly separable data for binary classification.

`labels_orig = labels`
`labels = np.mod(labels_orig, 2)`

One way to convert the 4 classes into a binary classification problem is to take the remainder of each class label when divided by 2, which gives new labels of 0 and 1.
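
As a small worked example of that remainder trick, here is what `np.mod` does to a handful of made-up labels (the array below is illustrative, not the generated data):

```python
import numpy as np

# Hypothetical original labels drawn from the four classes 0..3
labels_orig = np.array([0, 1, 2, 3, 2, 1])

# Remainder after division by 2: classes 0 and 2 become 0, classes 1 and 3 become 1
labels = np.mod(labels_orig, 2)
print(labels)
```

Two of the four blobs end up sharing each new label, which is what makes the resulting binary problem non-linear.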

From the plot, we can see that the blobs are merged so that we now have a binary classification problem where the decision boundary is not linear. Once the data is ready, I use the `train_test_split` function to split it into training and validation sets in a 90:10 ratio.
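
The data preparation described so far can be sketched roughly as follows. The `random_state` values and the `stratify` argument are assumptions added for reproducibility and balanced splits; the original settings may differ.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

# 1000 points in 2D around 4 Gaussian blobs (random_state is an assumption)
data, labels = make_blobs(n_samples=1000, centers=4, n_features=2, random_state=0)

# Collapse the four classes into two by taking the remainder modulo 2
labels_orig = labels
labels = np.mod(labels_orig, 2)

# 90:10 split between training and validation data
X_train, X_val, Y_train, Y_val = train_test_split(
    data, labels, stratify=labels, test_size=0.1, random_state=0
)
print(X_train.shape, X_val.shape)
```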

### Train with Sigmoid Neuron

Before we start training the sigmoid neuron on the data, we will build our model inside a class called `SigmoidNeuron`.

The class `SigmoidNeuron` has 9 functions. I will walk you through them one by one and explain what they do.

`def __init__(self):`
`    self.w = None`
`    self.b = None`

The `__init__` function (the constructor) initializes the parameters of the sigmoid neuron, the weights `w` and the bias `b`, to `None`.

`# forward pass`
`def perceptron(self, x):`
`    return np.dot(x, self.w.T) + self.b`
`def sigmoid(self, x):`
`    return 1.0/(1.0 + np.exp(-x))`

Next, we define two functions, `perceptron` and `sigmoid`, which characterize the forward pass. For a sigmoid neuron, the forward pass involves two steps:

1. `perceptron` — Computes the dot product between the input x & weights w and adds bias b
2. `sigmoid` — Takes the output of perceptron and applies the sigmoid (logistic) function on top of it.
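
To make the two steps concrete, here is a standalone sketch of the forward pass with hand-picked numbers. The weights, bias and input are made up for illustration, and the functions are written without `self` so the snippet runs on its own.

```python
import numpy as np

def perceptron(x, w, b):
    # Step 1: dot product of input and weights, plus the bias
    return np.dot(x, w.T) + b

def sigmoid(z):
    # Step 2: squash the pre-activation into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([[0.5, -0.5]])  # hypothetical weights, shape (1, 2)
b = 0.0                      # hypothetical bias
x = np.array([1.0, 1.0])     # a single input with two features

a = perceptron(x, w, b)      # 0.5*1 - 0.5*1 + 0 = 0
y_hat = sigmoid(a)           # sigmoid(0) = 0.5
print(a, y_hat)
```
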

`# updating the gradients using mean squared error loss`
`def grad_w_mse(self, x, y):`
`    .....`
`def grad_b_mse(self, x, y):`
`    .....`
`# updating the gradients using cross entropy loss`
`def grad_w_ce(self, x, y):`
`    .....`
`def grad_b_ce(self, x, y):`
`    .....`

The next four functions characterize the gradient computation. I have written separate functions for the gradients of the weights w and the bias b, one pair for mean squared error loss and one pair for cross-entropy loss.
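
The bodies of these gradient functions are not shown here, so the following is only a sketch of the textbook gradients for a single sigmoid neuron, not necessarily the original code. It assumes the squared error is defined as L = 0.5 * (y_hat - y)^2 and the cross-entropy as L = -[y log(y_hat) + (1 - y) log(1 - y_hat)]; the helper names `predict_one`, `grad_mse` and `grad_ce` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_one(x, w, b):
    # Forward pass for one input; w has the same shape as x
    return sigmoid(np.dot(w, x) + b)

def grad_mse(x, y, w, b):
    # Assumed loss: 0.5 * (y_hat - y)**2
    y_hat = predict_one(x, w, b)
    delta = (y_hat - y) * y_hat * (1.0 - y_hat)  # dL/da via the chain rule
    return delta * x, delta                      # (dL/dw, dL/db)

def grad_ce(x, y, w, b):
    # Assumed loss: -(y*log(y_hat) + (1-y)*log(1-y_hat))
    y_hat = predict_one(x, w, b)
    delta = y_hat - y                            # sigmoid derivative cancels out
    return delta * x, delta                      # (dL/dw, dL/db)

# Made-up point and parameters, just to show the call
x, y = np.array([0.3, -1.2]), 1.0
w, b = np.array([0.1, 0.4]), -0.2
print(grad_mse(x, y, w, b))
print(grad_ce(x, y, w, b))
```

Note that the cross-entropy gradient lacks the `y_hat * (1 - y_hat)` factor, which is one reason it tends to give larger updates when the neuron is confidently wrong.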

`def fit(self, X, Y, epochs=1, learning_rate=1, initialise=True, loss_fn="mse", display_loss=False):`
`    .....`
`    return`

Next, we define the `fit` method, which accepts a few parameters:

`X`: Inputs

`Y`: Labels

`epochs`: Number of passes the algorithm makes over the data; the default value is 1

`learning_rate`: The magnitude of change for our weights during each step through the training data; the default value is 1

`initialise`: Whether to randomly initialize the parameters of the model. If it is set to `True`, the weights will be initialized; set it to `False` if you want to continue training an already trained model.

`loss_fn`: Selects the loss function the algorithm uses to update the parameters. It can be "mse" or "ce"

`display_loss`: Boolean indicating whether to plot the loss for each epoch

In the `fit` method, we go through the data passed in via `X` and `Y` and compute the update values for the parameters using either mean squared error loss or cross-entropy loss. Once we have the update values, we update the weights and bias terms (Lines 49–62).

`def predict(self, X):`

Now we define the `predict` function, which takes inputs `X` as an argument and expects it to be a `numpy` array. In `predict`, we compute the forward pass for each input with the trained model and return a `numpy` array containing the predicted value for each input.
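
Since the bodies of `fit` and `predict` are not shown, here is one minimal sketch of how the pieces could fit together. It is not the original implementation: the class name `SigmoidNeuronSketch`, the fixed random seed, averaging the gradient over the data, and supporting only the mean squared error loss are all assumptions made to keep the example short.

```python
import numpy as np

class SigmoidNeuronSketch:
    """Hypothetical minimal sigmoid neuron trained with gradient descent on MSE."""

    def __init__(self):
        self.w = None
        self.b = None

    def perceptron(self, x):
        return np.dot(x, self.w.T) + self.b

    def sigmoid(self, x):
        return 1.0 / (1.0 + np.exp(-x))

    def fit(self, X, Y, epochs=1, learning_rate=1, initialise=True):
        if initialise:
            rng = np.random.default_rng(0)  # fixed seed is an assumption
            self.w = rng.standard_normal((1, X.shape[1]))
            self.b = 0.0
        losses = []
        m = X.shape[0]
        for _ in range(epochs):
            dw, db = 0.0, 0.0
            # Accumulate the MSE gradient over every data point
            for x, y in zip(X, Y):
                y_pred = self.sigmoid(self.perceptron(x))
                delta = (y_pred - y) * y_pred * (1 - y_pred)
                dw += delta * x
                db += delta
            # Update the parameters with the averaged gradient
            self.w -= learning_rate * dw / m
            self.b -= learning_rate * db / m
            losses.append(np.mean((self.predict(X).ravel() - Y) ** 2))
        return losses

    def predict(self, X):
        # Forward pass for each input, collected into one array
        return np.array([self.sigmoid(self.perceptron(x)) for x in X])

# Tiny linearly separable toy problem (logical AND), purely illustrative
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([0.0, 0.0, 0.0, 1.0])
neuron = SigmoidNeuronSketch()
losses = neuron.fit(X, Y, epochs=200, learning_rate=1)
print(losses[0], losses[-1])
```

Returning the per-epoch losses is a design choice of this sketch; it plays the role that `display_loss` plays in the original by letting you plot or inspect the loss curve yourself.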

Now we will train the sigmoid neuron we created on our data. First, we instantiate the `SigmoidNeuron` class and then call the `fit` method on the training data with 1000 epochs and the learning rate set to 1. (These values are arbitrary, not the optimal values for this data; you can experiment to find the best number of epochs and learning rate.) By default, the loss function is mean squared error loss, but you can change it to cross-entropy loss as well.

As you can see, the loss of the sigmoid neuron is decreasing, but there are a lot of oscillations, possibly because of the large learning rate. You can decrease the learning rate and check how the loss varies. Once the model is trained, we can make predictions on the validation data and binarise those predictions using 0.5 as the threshold. We can then compute the training and validation accuracy to evaluate the model and look for room for improvement by changing the number of epochs or the learning rate.
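
Binarising and scoring can be illustrated with a few made-up predictions. This sketch uses plain `numpy` for the accuracy rather than an `sklearn` helper, and it assumes a prediction of exactly 0.5 counts as class 1; both choices are assumptions.

```python
import numpy as np

# Hypothetical sigmoid outputs and ground-truth labels
Y_pred = np.array([0.1, 0.7, 0.5, 0.49])
Y_true = np.array([0, 1, 0, 0])

# Threshold at 0.5 to turn probabilities into class labels
Y_pred_binarised = (Y_pred >= 0.5).astype(int)   # [0, 1, 1, 0]

# Accuracy is the fraction of labels that match: 3 out of 4 here
accuracy = np.mean(Y_pred_binarised == Y_true)
print(Y_pred_binarised, accuracy)
```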

`# visualizing the results`
`plt.scatter(X_train[:,0], X_train[:,1], c=Y_pred_binarised_train, cmap=my_cmap, s=15*(np.abs(Y_pred_binarised_train-Y_train)+.2))`
`plt.show()`

To see which points in the training set the model predicts correctly, we use the scatter plot function from `matplotlib.pyplot`. The function takes the first and second features as its two inputs; for the color I have used `Y_pred_binarised_train` and defined a custom `cmap` for visualization. As you can see, the points in the plot below differ in size.

The size of each point in the plot is given by a formula,

`s=15*(np.abs(Y_pred_binarised_train-Y_train)+.2)`

The formula takes the absolute difference between the predicted value and the actual value.

• If the ground truth is equal to the predicted value then size = 3
• If the ground truth is not equal to the predicted value then size = 18

All the small points in the plot indicate that the model is predicting those observations correctly and large points indicate that those observations are incorrectly classified.
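
The size formula is easy to check on a few made-up labels (the arrays below are illustrative only):

```python
import numpy as np

Y_true = np.array([0, 1, 1, 0])               # hypothetical ground truth
Y_pred_binarised = np.array([0, 1, 0, 1])     # hypothetical binarised predictions

# |prediction - truth| is 0 for a correct point and 1 for a wrong one
sizes = 15 * (np.abs(Y_pred_binarised - Y_true) + 0.2)
print(sizes)  # correct points get size 15*0.2 = 3, wrong points 15*1.2 = 18
```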

In this plot, we are able to represent 4 dimensions: two input features, color to indicate the label, and point size to indicate whether the point is predicted correctly. The important takeaway from the plot is that the sigmoid neuron is not able to handle non-linearly separable data.

If you want to learn the sigmoid neuron learning algorithm in detail, with the math, check out my previous post.

### Write First Feedforward Neural Network

In this section, we will take a very simple feedforward neural network and build it from scratch in Python. The network has three neurons in total: two in the first hidden layer and one in the output layer. For each of these neurons, pre-activation is represented by ‘a’ and post-activation by ‘h’. The network has a total of 9 parameters: 6 weights and 3 bias terms.

Similar to the Sigmoid Neuron implementation, we will write our neural network in a class called `FirstFFNetwork`.

The class `FirstFFNetwork` has 6 functions; we will go over them one by one.

`def __init__(self):`
`    .....`

The `__init__` function initializes all the parameters of the network, including weights and biases. Unlike the sigmoid neuron, where we have only two parameters (w and b), the neural network has 9 parameters to initialize. All 6 weights are initialized randomly and the 3 biases are set to zero.

`def sigmoid(self, x):`
`    return 1.0/(1.0 + np.exp(-x))`

Next, we define the sigmoid function used for post-activation for each of the neurons in the network.

`def forward_pass(self, x):`
`    # forward pass - preactivation and activation`
`    self.x1, self.x2 = x`
`    self.a1 = self.w1*self.x1 + self.w2*self.x2 + self.b1`
`    self.h1 = self.sigmoid(self.a1)`
`    self.a2 = self.w3*self.x1 + self.w4*self.x2 + self.b2`
`    self.h2 = self.sigmoid(self.a2)`
`    self.a3 = self.w5*self.h1 + self.w6*self.h2 + self.b3`
`    self.h3 = self.sigmoid(self.a3)`
`    return self.h3`

Now we have the forward pass function, which takes an input x and computes the output. First, it unpacks the input x, which has 2 features, into `self.x1` and `self.x2`.

For each of these 3 neurons, two things will happen,

Pre-activation represented by ‘a’: It is a weighted sum of inputs plus the bias.

Activation represented by ‘h’: Activation function is Sigmoid function.

The pre-activation for the first neuron is given by,

`a₁ = w₁ * x₁ + w₂ * x₂ + b₁`

To get the post-activation value for the first neuron we simply apply the logistic function to the output of pre-activation a₁.

`h₁ = sigmoid(a₁)`

Repeat the same process for the second neuron to get a₂ and h₂.

The outputs of the two neurons present in the first hidden layer will act as the input to the third neuron. The pre-activation for the third neuron is given by,

`a₃ = w₅ * h₁ + w₆ * h₂ + b₃`

and applying the sigmoid on a₃ will give the final predicted output.
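
To trace these equations with actual numbers, here is a standalone version of the forward pass. The weights, biases and input are hand-picked so that every pre-activation is 0; they are illustrative values, not trained parameters, and the function takes the parameters as tuples instead of reading them from `self`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_pass(x, w, b):
    # w = (w1, ..., w6) and b = (b1, b2, b3), following the notation above
    x1, x2 = x
    w1, w2, w3, w4, w5, w6 = w
    b1, b2, b3 = b
    a1 = w1 * x1 + w2 * x2 + b1   # first hidden neuron
    h1 = sigmoid(a1)
    a2 = w3 * x1 + w4 * x2 + b2   # second hidden neuron
    h2 = sigmoid(a2)
    a3 = w5 * h1 + w6 * h2 + b3   # output neuron
    h3 = sigmoid(a3)
    return h1, h2, h3

w = (1.0, -1.0, -1.0, 1.0, 2.0, 2.0)  # hypothetical weights
b = (0.0, 0.0, -2.0)                  # hypothetical biases
h1, h2, h3 = forward_pass(np.array([1.0, 1.0]), w, b)
# a1 = 1 - 1 = 0 and a2 = -1 + 1 = 0, so h1 = h2 = 0.5
# a3 = 2*0.5 + 2*0.5 - 2 = 0, so h3 = 0.5
print(h1, h2, h3)
```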

`def grad(self, x, y):`
`    # back propagation`
`    ......`

Next, we have the `grad` function, which takes inputs x and y as arguments and computes the forward pass. Based on the forward pass, it computes the partial derivatives of the loss function, which is mean squared error loss in this case, with respect to each of the parameters.

Note: In this post, I am not explaining how we arrive at these partial derivatives for the parameters. Just consider this function a black box for now; in my next article I will explain how we compute these partial derivatives in backpropagation.

`def fit(self, X, Y, epochs=1, learning_rate=1, initialise=True, display_loss=False):`
`    ......`

Then, we have the `fit` function, similar to the one in the sigmoid neuron. In this function, we iterate through each data point, compute the partial derivatives by calling the `grad` function and store those values in a new variable for each parameter (Lines 63–75). Then, we update the values of all the parameters (Lines 77–87). We also have the `display_loss` condition: if set to `True`, it plots the network loss across all the epochs.

`def predict(self, X):`
`    # predicting the results on unseen data`
`    .....`

Finally, we have the `predict` function, which takes a set of inputs and computes the predicted value for each one by calling the `forward_pass` function on it.
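
For readers who want something runnable before the backpropagation article, here is a hedged sketch of how the six functions could look together. It is not the original class: the name `FirstFFNetworkSketch`, the `seed` argument, the dictionary returned by `grad`, the averaging of gradients in `fit`, and the loss definition L = 0.5 * (h3 - y)^2 are all assumptions. The gradients follow from applying the chain rule to that loss.

```python
import numpy as np

class FirstFFNetworkSketch:
    """Hypothetical 2-2-1 sigmoid network; gradients assume L = 0.5*(h3 - y)**2."""

    NAMES = ["w1", "w2", "w3", "w4", "w5", "w6", "b1", "b2", "b3"]

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)  # seed is an assumption
        # 6 random weights, 3 zero biases
        (self.w1, self.w2, self.w3,
         self.w4, self.w5, self.w6) = rng.standard_normal(6)
        self.b1 = self.b2 = self.b3 = 0.0

    def sigmoid(self, x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward_pass(self, x):
        self.x1, self.x2 = x
        self.a1 = self.w1 * self.x1 + self.w2 * self.x2 + self.b1
        self.h1 = self.sigmoid(self.a1)
        self.a2 = self.w3 * self.x1 + self.w4 * self.x2 + self.b2
        self.h2 = self.sigmoid(self.a2)
        self.a3 = self.w5 * self.h1 + self.w6 * self.h2 + self.b3
        self.h3 = self.sigmoid(self.a3)
        return self.h3

    def grad(self, x, y):
        self.forward_pass(x)
        # Chain rule, starting at the output neuron and moving backwards
        da3 = (self.h3 - y) * self.h3 * (1 - self.h3)
        da1 = da3 * self.w5 * self.h1 * (1 - self.h1)
        da2 = da3 * self.w6 * self.h2 * (1 - self.h2)
        return {
            "w1": da1 * self.x1, "w2": da1 * self.x2, "b1": da1,
            "w3": da2 * self.x1, "w4": da2 * self.x2, "b2": da2,
            "w5": da3 * self.h1, "w6": da3 * self.h2, "b3": da3,
        }

    def fit(self, X, Y, epochs=1, learning_rate=1):
        m = X.shape[0]
        for _ in range(epochs):
            total = dict.fromkeys(self.NAMES, 0.0)
            # Accumulate the gradient of every parameter over all points
            for x, y in zip(X, Y):
                for name, g in self.grad(x, y).items():
                    total[name] += g
            # Then update each parameter with the averaged gradient
            for name, g in total.items():
                setattr(self, name, getattr(self, name) - learning_rate * g / m)

    def predict(self, X):
        return np.array([self.forward_pass(x) for x in X])

# XOR-style toy data, purely illustrative
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([0.0, 1.0, 1.0, 0.0])
net = FirstFFNetworkSketch(seed=0)
net.fit(X, Y, epochs=100, learning_rate=1)
preds = net.predict(X)
print(preds)
```

Storing the gradients in a dictionary keyed by parameter name keeps the accumulate-then-update loop short; the original stores them in one variable per parameter instead.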

### Train the FF network on the data

We will now train the feedforward network we created on our data. First, we instantiate the `FirstFFNetwork` class and then call the `fit` method on the training data with 2000 epochs and the learning rate set to 0.01.

`# visualize the predictions`
`plt.scatter(X_train[:,0], X_train[:,1], c=Y_pred_binarised_train, cmap=my_cmap, s=15*(np.abs(Y_pred_binarised_train-Y_train)+.2))`
`plt.show()`