Inside Ai

Let’s Develop Artificial Neural Network in 30 lines of code — II

Rupak (Bob) Roy - II
9 min read · Jun 21, 2021

Part II: a simple yet complete guide on how to apply an ANN to regression, with K-Fold validation for accuracy on top of accuracy. OMG!

Cheers, nice to see you again! Previously we learned what an ANN is and applied one to a real-life example. However, I will briefly go over the ANN terminology again, just in case I haven’t bored you yet :)

Let’s Develop Artificial Neural Network in 30 lines of code

I believe you are already aware of how neural networks work. If not, don’t worry, there are plenty of resources available on the web to get started. Still, I will briefly walk you through what a neural network is and how it learns.

Parts Of Neuron

In this diagram, the dendrites are the receivers of the neuron, while the axon is the transmitter of the neuron’s signal.

What is a neuron?

In artificial intelligence, a neuron is a mathematical function that models the behavior of a biological neuron. Typically, a neuron computes a weighted sum of its inputs, and this sum is passed through a nonlinear function, also called an activation function, such as the sigmoid or ReLU.
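To make that concrete, here is a minimal sketch of a single neuron in plain NumPy (the function names here are just for illustration, not part of any library):

import numpy as np

def sigmoid(z):
    # squashes any real number into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    # weighted sum of the inputs plus a bias term
    z = np.dot(inputs, weights) + bias
    # the nonlinear activation turns the sum into the neuron's output
    return sigmoid(z)

# example: 3 inputs, 3 weights, 1 bias
print(neuron(np.array([0.5, 1.0, -0.3]), np.array([0.4, -0.2, 0.1]), 0.05))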

Now, if we put this into a flow diagram, it will look something like this:

Simple Neural Network Diagram

In reality, of course, we are going to have larger and more complex neural networks.

Multi-layer Neural Network

How does it learn?

The network processes data forward and backward (the backward pass is known as backpropagation), adjusting its weights over and over again so that the error/loss keeps getting smaller. Once it reaches the point where further updates give no improvement over the preceding accuracy, those parameter settings are saved as the final weights. There are different methods to minimize the loss; one of them is gradient descent.

Gradient descent is an optimization algorithm often used for finding the weights.
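At its core, gradient descent repeatedly nudges the weights in the opposite direction of the gradient of the loss. Here is a toy sketch with a single weight and a hand-written gradient for mean squared error (purely illustrative, not what Keras runs internally):

import numpy as np

# toy data: y is roughly 2 * x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.0])

w = 0.0              # initial weight
learning_rate = 0.05

for step in range(200):
    y_pred = w * x
    # gradient of the mean squared error with respect to w
    grad = np.mean(2 * (y_pred - y) * x)
    # move the weight against the gradient
    w -= learning_rate * grad

print(w)  # ends up close to 2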

Types of Gradient Descent

1. Batch Gradient Descent: it calculates the error for each example in the training dataset but only updates the model after all training examples have been evaluated. In other words, it takes the whole dataset and adjusts the weights iteration by iteration (see the sketch after this list for how each variant maps to a batch size in Keras).

Pros:

a) Fewer updates to the model mean this variant of gradient descent is more computationally efficient than stochastic gradient descent.

b) The decreased update frequency results in a more stable error gradient, which may lead to more stable convergence.

Cons:

a.) However, that more stable error gradient may result in premature convergence of the model to a less optimal set of parameters.

b.) It is implemented in such a way that the entire training set must be in memory and available to the algorithm, so with respect to training speed it may become slow for large datasets.

2. Stochastic Gradient Descent calculates the error and updates the model for each example in the training dataset.

In other words: it adjusts the weights one row at a time, iteration by iteration. This helps it escape local minima on the way to the global minimum, and each update is faster.

Pros:

a.) This variant is simpler to understand and implement for beginners.

b.) The frequent updates immediately give an insight into the performance of the model and the rate of improvement.

c.) The increased model update frequency (one row at a time) can result in faster learning on some problems.

Cons:

a.) However, updating the model so frequently is more computationally expensive than the other variants of gradient descent, especially when training models on large datasets.

b.) The frequent updates can also result in a noisy gradient signal, which may cause the model parameters, and in turn the model error, to jump around.

3. Mini-Batch Gradient Descent: a variation of the gradient descent algorithm that splits the training set into small batches that are used to calculate the model error and update the model coefficients.

Mini-batch gradient descent seeks to find a balance between the robustness of stochastic gradient descent and the efficiency of batch gradient descent.

Pros:

a.) The model update frequency is higher than batch gradient descent, which allows for more robust convergence and helps avoid local minima.

b.) The batch updates provide a computationally more efficient process than stochastic gradient descent.

c.) Batching provides efficiency both by not requiring all of the training data to be in memory and in the algorithm’s implementation.

Cons:

a.) Mini-batch requires the configuration of an additional ‘mini-batch’ size hyperparameter for the learning algorithm.

b.) Error information must be accumulated across mini-batches of training examples, as in batch gradient descent.
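In Keras, these three variants largely come down to the batch_size you pass when fitting the model. A rough sketch, assuming model is any compiled Keras model and X, Y are NumPy arrays (like the ones we build later in this article):

# batch gradient descent: one update per pass over the whole dataset
model.fit(X, Y, epochs=100, batch_size=len(X))

# stochastic gradient descent: one update per training example
model.fit(X, Y, epochs=100, batch_size=1)

# mini-batch gradient descent: one update per small batch (e.g. 32 rows)
model.fit(X, Y, epochs=100, batch_size=32)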

THE MOST COMMONLY USED OPTIMIZER IN DEEP LEARNING is Adam, another optimization algorithm.

Now that I have bored you again by repeating the same stuff from the previous article, I guarantee we have a clearer, more vivid, shiny picture of how neural networks work. Let’s get started with a real-life example.

Load the important libraries and the data.

from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = read_csv("housing.data", delim_whitespace=True, header=None)
dataset = dataframe.values

The data contains 13 numerical attributes for houses in Boston suburbs, including the crime rate, the proportion of non-retail business acres, chemical concentrations, municipal facilities, and more. The problem is thus concerned with modeling the price of houses in those suburbs with respect to those attributes.

# split into input (X) and output (Y) variables
X = dataset[:,0:13]
Y = dataset[:,13]

Let’s start with the fun part: creating the model.

Here we will create a function instead of using the model straight away.

def larger_model():
    # create model
    model = Sequential()
    # input layer
    model.add(Dense(20, input_dim=13, kernel_initializer='normal', activation='relu'))
    # adding a second layer
    model.add(Dense(6, kernel_initializer='normal', activation='relu'))
    # output layer
    model.add(Dense(1, kernel_initializer='normal'))
    # compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

First, we initialize the model as Sequential(), then using Dense we add and connect layers with units=20. We don’t have to write ‘units=’ explicitly; by default the first positional parameter is taken as units, i.e. the number of nodes/neurons in that layer. Then input_dim=13 means we have 13 input columns.

Simple Neural Network

I will repeat our previous note again: choosing as many nodes/neurons as possible doesn’t mean the accuracy will improve; too many unnecessary neurons simply add noise and complexity.

Next, we have kernel_initializer='normal', which is the function used to initialize the weights before stochastic gradient descent or any other optimizer like Adam starts updating them. What is an optimizer? We will get to that part in a few seconds.

activation='relu' stands for the rectified linear unit; it is the rectifier that introduces the non-linearity.

ReLU is linear for all positive values and zero for all negative values. The downside of being zero for all negative values is a problem called “dying ReLU”: a ReLU neuron is “dead” if it is stuck on the negative side and always outputs 0. The dying problem is likely to occur when the learning rate is too high or there is a large negative bias. ‘Leaky ReLU’ and ‘ELU’ are good alternatives to try. Other variants include ReLU-6, Concatenated ReLU (CReLU), Exponential Linear (ELU, SELU), and Parametric ReLU.
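For reference, ReLU and Leaky ReLU are simple enough to write out by hand. A small NumPy sketch, just to show the shape of the functions (not how Keras implements them internally):

import numpy as np

def relu(x):
    # linear for positive values, exactly zero for negative values
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # keeps a small slope for negative values, so the neuron never fully "dies"
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))        # [0.  0.  0.  0.5 2. ]
print(leaky_relu(x))  # [-0.02  -0.005  0.  0.5  2. ]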

Further, we will add a second layer the same way we did above. The only differences are that we can optionally reduce the number of nodes/neurons, and we don’t need to add input_dim, because Keras infers the input dimensions from the previous layer automatically.

#adding a second layer
model.add(Dense(6, kernel_initializer='normal', activation='relu'))
#output layer
model.add(Dense(1, kernel_initializer='normal'))

Finally, our last layer is the output layer, where we won’t use any activation function as we did in our previous article. The ‘sigmoid’ activation is used for binary output; since we don’t need a binary output here, we simply leave the activation out so the layer returns the regression value directly.

def larger_model():
    # create model
    model = Sequential()
    .................
    # compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

We compile all the layers and return the model from the function.

In other words, this is where we set up how the weights (settings) in the neural network will be calculated.

optimizer='adam', just like stochastic gradient descent (SGD), is the algorithm that searches for the optimal set of weights in the neural network, starting from the initial weights defined by the kernel_initializer='normal' that we set a while ago.
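If you ever want to compare them, the optimizer is just an argument to compile(). A hedged sketch, assuming the standalone keras package used in this article and the model object from our function above:

from keras.optimizers import SGD

# plain stochastic gradient descent with a fixed learning rate and momentum
model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.01, momentum=0.9))

# Adam adapts the learning rate per parameter and usually needs less tuning
model.compile(loss='mean_squared_error', optimizer='adam')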

We have also successfully created an ANN for regression….!!!!!!!!!! It has one input layer, a second (hidden) layer, and the output layer.

HOORAYYYY ….!!!!

Artificial Intelligence Party Time

Now it’s time to fit the dataset to the model. We will use a slightly different approach than we did for classification: here we will use KerasRegressor and pipelines. KerasRegressor is a wrapper that lets scikit-learn use the user-defined model function we wrote just above.

In the same way, we have KerasClassifier for classification problems.

#evaluate model 
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=larger_model, epochs=100, batch_size=15, verbose=1)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Larger: %.2f (%.2f) MSE" % (results.mean(), results.std()))

Then we have the StandardScaler() function, which standardizes the data; in other words, it scales the magnitude of the data down into a small range (zero mean, unit variance) without compromising or losing the original meaning of the dataset.
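Under the hood, StandardScaler subtracts each column’s mean and divides by its standard deviation. A small sketch to illustrate (the numbers are made up):

import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[1.0, 200.0],
                 [2.0, 300.0],
                 [3.0, 400.0]])

scaler = StandardScaler()
scaled = scaler.fit_transform(data)

print(scaled)        # each column now has mean 0 and unit variance
print(scaler.mean_)  # [  2. 300.] - the column means that were removed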

Next, we pass our model-building function via build_fn. epochs=100 refers to the number of times the learning algorithm will work through the entire training dataset, while batch_size=15 refers to the number of samples to work through before updating the internal model parameters. verbose is nothing much; it just controls the progress display.

Finally, we put all of this in a Pipeline so that the standardization is performed during the model evaluation process, within each fold (n_splits=10) of the K-Fold cross-validation.

10-fold cross-validation
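Roughly, this is what cross_val_score does with the pipeline behind the scenes. An illustrative sketch only, reusing the pipeline, X, Y, and KFold already defined above (the real function also clones the estimator for each fold):

scores = []
for train_idx, test_idx in KFold(n_splits=10).split(X):
    # fit the scaler + model on 9 folds
    pipeline.fit(X[train_idx], Y[train_idx])
    # evaluate on the held-out fold
    scores.append(pipeline.score(X[test_idx], Y[test_idx]))
print(sum(scores) / len(scores))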

DONE….!!!! WAIT FOR THE PROCESS TO FINISH……….

verbose=1 gives an animated progress display

Did we notice the LOSS getting smaller and smaller with each iteration?

Well, “we both are learning together”: we and our AI model, with accuracy increasing………..

It's time to check the accuracy……

print("Larger: %.2f (%.2f) MSE" % (results.mean(), results.std()))

The smaller the MSE, the better the model. Note that cross_val_score reports it as a negative value here, which is why we flip the sign below.

#removing the negative sign
results_mse = -results
import numpy as np
#converting from MSE to RMSE (Root MSE)
results_rmse = np.sqrt(results_mse)
print(results_rmse)
#calculate the average RMSE
results_rmse.mean()

We will narrow it down to the root MSE (RMSE).

WAIT …!

We can also perform this using one line of code, where np.sqrt converts the output from MSE to RMSE and the minus sign in -cross_val_score removes the negative sign.

#one-line
results = np.sqrt(-cross_val_score(pipeline, X, Y, cv=kfold)).mean()

We have successfully created our Artificial Neural Network for a regression problem.

The whole code will look something like this:

Artificial Neural Network for Regression

I hope you enjoyed it. Next up is the Convolutional Neural Network.

Stay tuned! Or ping me if you need anything.

I WILL BE BACK………………!

Develop Artificial Neural Network in 30 lines of code Part — II
