Uni-Variate, Polynomial and Multi-Variate Regression using OLS/Normal Equation Approach (A-Z)

OLS Polynomial Regression using Vector Algebra Form of OLS

A single-module implementation of Uni-Variate Polynomial Regression using OLS is formulated below:

=>hypothesis(): the function that calculates and returns the hypothesis (predicted value of the Target Variable), given the parameter vector theta (theta_0, theta_1, theta_2, ..., theta_n), the Feature X and the Degree of the Polynomial n as input. The implementation of hypothesis() is given below:

import numpy as np

def hypothesis(theta, X, n):
    h = np.ones((X.shape[0], 1))
    theta = theta.reshape(1, n + 1)
    for i in range(0, X.shape[0]):
        # feature vector [1, x_i, x_i^2, ..., x_i^n] for the i-th example
        x_array = np.ones(n + 1)
        for j in range(0, n + 1):
            x_array[j] = pow(X[i], j)
        x_array = x_array.reshape(n + 1, 1)
        # h_i = theta_0 + theta_1 * x_i + ... + theta_n * x_i^n
        h[i] = float(np.matmul(theta, x_array))
    h = h.reshape(X.shape[0])
    return h
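As a quick sanity check, a minimal usage sketch (the theta and X values below are purely illustrative):

theta = np.array([1.0, 2.0, 0.5])   # theta_0, theta_1, theta_2
X = np.array([1.0, 2.0, 3.0])       # three sample feature values
print(hypothesis(theta, X, 2))      # evaluates 1 + 2x + 0.5x^2, giving [ 3.5  7.  11.5]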

=>Obtaining the value of theta that minimizes the Cost Function:

from numpy.linalg import inv
theta = np.matmul(np.matmul(inv(np.matmul(x_array.transpose(), x_array)),
                            x_array.transpose()), y_train)
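This closed-form expression is the Normal Equation, obtained by setting the gradient of the Cost Function with respect to theta to zero. In LaTeX notation, with X denoting the design matrix whose i-th row is [1, x_i, x_i^2, ..., x_i^n]:

\theta = (X^T X)^{-1} X^T y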

Now, OLS Polynomial (here Quadratic) Regression is applied on the same dataset.

n = 2  # degree of the polynomial (quadratic)
data = np.loadtxt('data1.txt', delimiter=',')
X_train = data[:,0]  # the feature set
y_train = data[:,1]  # the labels
# design matrix: row i is [1, x_i, x_i^2, ..., x_i^n]
x_array = np.ones((X_train.shape[0], n + 1))
for i in range(0, X_train.shape[0]):
    for j in range(0, n + 1):
        x_array[i][j] = pow(X_train[i], j)
theta = np.matmul(np.matmul(inv(np.matmul(x_array.transpose(), x_array)),
                            x_array.transpose()), y_train)

theta after OLS for Polynomial Regression
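As a side note, explicitly inverting X^T X can become numerically unstable when the design matrix is ill-conditioned. A minimal alternative sketch (not part of the original code) using NumPy's least-squares solver on the same x_array and y_train:

theta_check = np.linalg.lstsq(x_array, y_train, rcond=None)[0]
# theta_check should match the Normal Equation theta up to numerical precision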

The Regression Line for the obtained theta is visualized on a Scatter Plot of the training data:

import matplotlib.pyplot as plt

# getting the predictions...
training_predictions = hypothesis(theta, X_train, 2)
plt.scatter(X_train, y_train, label="training data")
# sort by X so the regression curve is drawn as a single smooth line
order = np.argsort(X_train)
plt.plot(X_train[order], training_predictions[order],
         label="polynomial (degree 2) regression")
plt.legend()
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.show()

The Regression Line Visualization comes out to be:

Regression Line Visualization after OLS Polynomial Regression

Performance Analysis:

The Model Performance is analyzed with a 2-way comparison: OLS Quadratic Regression is compared both with Gradient Descent Quadratic Regression and with OLS Linear Regression.
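The comparison is based on Mean Absolute Error, Mean Square Error, Root Mean Square Error and R Square Score. As a minimal sketch (not the original code) of how these metrics can be computed for the OLS Quadratic model with scikit-learn, assuming the theta, X_train and y_train obtained above:

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

training_predictions = hypothesis(theta, X_train, 2)  # OLS quadratic predictions
mae = mean_absolute_error(y_train, training_predictions)
mse = mean_squared_error(y_train, training_predictions)
rmse = np.sqrt(mse)  # Root Mean Square Error
r2 = r2_score(y_train, training_predictions)
print(mae, mse, rmse, r2)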

The performance figures for Gradient Descent Quadratic Regression are taken from its earlier Gradient Descent implementation.

From the table, the following conclusions can be drawn,

  1. Gradient Descent Quadratic Regression fetches the least Mean Absolute Error
  2. OLS Quadratic Regression fetches the least Mean Square Error and Root Mean Square Error
  3. OLS Quadratic and Linear Regression fetch the highest R Square Score.

Based on the above 3 points, it can be confirmed that

OLS Approach is more successful than Gradient Descent Optimization

Reason: if the Gradient Descent update rule (restated below) is studied carefully, it can be seen that no fixed number of iterations is specified; it only says "until convergence". So running the algorithm for an arbitrarily chosen number of iterations will not fetch the best performance. The implementation should instead be done in such a way that the algorithm itself finds the number of iterations required for convergence. Another important potential threat of Gradient Descent is:

What will happen if a Local Minimum is obtained instead of the Global Minimum?

Well, for this issue, the learning rate of Gradient Descent has to be chosen appropriately, so that the Cost Function decreases smoothly instead of oscillating.
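For reference, the Batch Gradient Descent update rule implemented below is the standard simultaneous update, with m training examples, learning rate alpha and, for Polynomial Regression, x_j^{(i)} = (x^{(i)})^j:

\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \quad \text{(repeated until convergence)}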

Modified Implementation of Gradient Descent (for Polynomial Regression) is given below:

def modified_BGD(theta, alpha, h, X, y, n):
    # n = 1 for Linear Regression
    k = 0  # number of iterations taken by the algorithm, kept as a counter
    # one initial simultaneous update of theta
    theta[0] = theta[0] - (alpha / X.shape[0]) * sum(h - y)
    for j in range(1, n + 1):
        theta[j] = theta[j] - (alpha / X.shape[0]) * sum((h - y) * pow(X, j))
    h = hypothesis(theta, X, n)
    cost = (1 / X.shape[0]) * 0.5 * sum(np.square(h - y))
    # keep updating until the Cost Function stops decreasing
    while(1):
        theta[0] = theta[0] - (alpha / X.shape[0]) * sum(h - y)
        for j in range(1, n + 1):
            theta[j] = theta[j] - (alpha / X.shape[0]) * sum((h - y) * pow(X, j))
        h = hypothesis(theta, X, n)
        if ((1 / X.shape[0]) * 0.5 * sum(np.square(h - y))) >= cost:
            break
        cost = (1 / X.shape[0]) * 0.5 * sum(np.square(h - y))
        k = k + 1
    theta = theta.reshape(1, n + 1)
    return theta, cost, k
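A minimal usage sketch for the Quadratic case (the learning rate 0.0001 is illustrative, and X_train, y_train and hypothesis() from earlier are assumed; feature scaling may be needed for fast convergence):

n = 2
theta = np.zeros(n + 1)            # initialize all parameters to zero
h = hypothesis(theta, X_train, n)  # initial hypothesis
theta, cost, k = modified_BGD(theta, 0.0001, h, X_train, y_train, n)
print(theta, cost, k)              # learned parameters, final cost, iterations taken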

With this implementation of BGD, there should be very little or no statistical difference between the performance measures of Gradient Descent and OLS, for both Multi-Variate and Polynomial Regression.

From the above 2-way comparison table, it can also be confirmed that:

In general, Polynomial Regression outperforms Linear Regression, provided a proper degree of the polynomial is chosen.
