## Deep Learning

# LSTMs for regression


A quick and easy guide to solving regression problems with deep learning's different types of LSTMs

Hi, how are you doing? I hope you're doing great.

Today we will start with LSTM, a powerful type of neural network designed to handle sequential data such as time series.

**Long Short-Term Memory (LSTM)** is a more advanced kind of **Recurrent Neural Network (RNN)**, used in deep learning because its architecture is designed to capture patterns in sequential data.

In this article, we will learn how to **create different types** of Long Short-Term Memory networks, and later we will move on to forecasting.

Here’s the list of what we will have in this article.

*1.) LSTM for Regression*

*2.) LSTM using Window Method*

*3.) LSTM time Step Framing*

*4.) LSTM Memory Between Batches*

*5.) Stacked LSTM with Memory Between Batches*

Since this will be a long article, I will continue with the 'forecasting' part in my next article:

*a.) LSTM Univariate*

*b.) LSTM Multivariate*

*c.) LSTM Multivariate Multi-Step*

and more. The link to my next article will be available at the end of this article.

So let’s get started!

## What is LSTM in brief?

It is a recurrent neural network that is trained using backpropagation through time and that overcomes the vanishing gradient problem.

Instead of neurons, **LSTM networks have memory blocks that are connected through layers.** Each block contains 3 non-linear gates, which make it smarter than a classical neuron, and a memory for sequences. The 3 gates are:

**a.) Input Gate:** decides which values from the input are used to update the memory state.

**b.) Forget Gate:** decides what information to throw away from the block.

**c.) Output Gate:** decides what to output, based on the input and the memory of the block.

Each LSTM unit is like a mini state machine with a "**memory**" cell that can maintain its state over a long time, and the gates of the unit have weights that are learned during training.

There are tons of articles on the internet about the workings of LSTMs, and even the math behind them. So here I will concentrate on a quick, practical implementation of LSTMs for our day-to-day problems.

Let’s get started!

First, we have **LSTM for Regression**

I believe we all know what regression is. If not, no problem, here it is in brief: regression helps us identify a cause-effect relationship and quantify its impact. In simple words, it estimates the direction (positive or negative) and the size of the relationship between one dependent variable (Y) and a series of independent variables (X).

For this example, we have retail sales time series data recorded over a period of time.

Regression needs independent (X) and dependent (Y) variables for the algorithm to learn from, so we will first convert our data into that format.

The first column will hold the sales data at time (t), and the second column will hold the next month's sales (t+1), which is the value we want to predict. Remember the X and Y format: we use X to predict Y.
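To make the (t) and (t+1) framing concrete, here is a minimal sketch using only NumPy. The sales figures are invented for illustration and are not taken from the article's dataset.

```python
import numpy as np

# Hypothetical monthly sales figures, invented for illustration only.
sales = np.array([112.0, 118.0, 132.0, 129.0, 121.0])

# X holds the sales at time t, Y holds the sales at time t+1.
X = sales[:-1]
Y = sales[1:]

for x, y in zip(X, Y):
    print(f"X(t) = {x:6.1f}  ->  Y(t+1) = {y:6.1f}")
```

Each row pairs one month with the month that follows it, which is exactly the two-column layout described above.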

First, we import all the libraries and set a random seed, so that the random numbers NumPy generates are the same on every run and the results are easier to reproduce.

```python
# LSTM for regression
import numpy
import matplotlib.pyplot as plt
from pandas import read_csv
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error


# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)


# random seed
numpy.random.seed(7)
```

We will convert the values from the data frame to float32, which is more suitable for modeling with a neural network, and then we will scale the input data.

LSTMs are sensitive to the scale of the input data. As with most deep learning methods, scaling the data to the range 0 to 1 before fitting is good practice and helps the algorithm train faster and more effectively. And yes, scaling does not change the meaning of the data, because it can be inverted afterwards. This is also called normalization, and we do it with the MinMaxScaler pre-processing class.

```python
# load the dataset
dataframe = read_csv('sales_year.csv', usecols=[1], engine='python')
dataset = dataframe.values
dataset = dataset.astype('float32')

# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
```
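As a rough illustration of what this normalization does, the sketch below applies the min-max formula by hand with NumPy, which is the same calculation MinMaxScaler performs for a (0, 1) range. The values are invented for illustration.

```python
import numpy as np

# Hypothetical sales values, invented for illustration only.
values = np.array([[200.0], [350.0], [500.0], [800.0]], dtype="float32")

# Min-max scaling to [0, 1]: (x - min) / (max - min), computed per column.
col_min = values.min(axis=0)
col_max = values.max(axis=0)
scaled = (values - col_min) / (col_max - col_min)

# Inverting the scaling recovers the original values.
restored = scaled * (col_max - col_min) + col_min

print(scaled.ravel())
print(restored.ravel())
```

The smallest value maps to 0 and the largest to 1, and because the transformation can be inverted, nothing about the original data is lost.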

Then we will split the data into train and test sets: the model learns from the train set and is validated on unseen data from the test set. These are regular machine learning pre-processing steps that we have to do.
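The split is a simple chronological cut, with the first 67% of the rows used for training (the ratio used in the article's full listing). Here is a small sketch of it; the 12-row `dataset` is a stand-in invented for illustration.

```python
import numpy as np

# Stand-in for the scaled dataset: 12 rows, 1 column (hypothetical values).
dataset = np.linspace(0.0, 1.0, 12, dtype="float32").reshape(-1, 1)

# Chronological split: the first 67% for training, the remainder for testing.
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[:train_size, :], dataset[train_size:, :]

print(train.shape, test.shape)
```

The rows are not shuffled, so the test set always lies after the train set in time.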

Now it's time to format the dataset into X and Y, where X is the number of sales at a given time (t), Y is the number of sales at the next time (t+1), and look_back is the number of previous time steps to use as input (X) to predict the next time step (Y).

```python
# reshape into X=t and Y=t+1
look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
```

The LSTM network expects the input data (X) in [samples, time steps, features] format, and our data is currently in [samples, features] format.

```python
# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = numpy.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
```
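The effect of this reshape is easy to check on a tiny array. The following sketch uses an invented 5-sample input and only shows how the shape changes.

```python
import numpy as np

# Hypothetical input: 5 samples with look_back = 1 feature each.
trainX = np.arange(5, dtype="float32").reshape(5, 1)
print(trainX.shape)      # [samples, features]

# Insert a time-step axis of length 1 between samples and features.
trainX_3d = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
print(trainX_3d.shape)   # [samples, time steps, features]
```

The values themselves are untouched; only an extra axis of length 1 is added in the middle.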

Done. Our data pre-processing is complete. It's time to build our LSTM network and fit the data.

```python
# create and fit our data to the LSTM network
model = Sequential()
model.add(LSTM(15, input_shape=(1, look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=370, batch_size=28, verbose=1)
```

First, we initialize the model as Sequential(). Then we have one hidden LSTM layer, whose input data format is given by input_shape; 15 is the number of LSTM blocks (units) in that layer. Then we have an output layer, a Dense layer with a single neuron, to make a single value prediction. In Keras, the LSTM layer's default activation function is tanh, while the gates use a sigmoid-type activation.

Note: choosing more LSTM blocks does not automatically improve accuracy, and too few blocks can hurt it. You need to test several block counts to find a well-balanced value.
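One way to get a feel for this trade-off is to count how many weights each choice asks the model to learn from the same data. The sketch below assumes the standard LSTM formulation (four weight sets with biases, no peephole connections); the helper names are made up for illustration and are not part of Keras.

```python
def lstm_param_count(units, input_features):
    """Weights in a standard LSTM layer with biases and no peepholes."""
    # Four weight sets (input, forget, output gates and the candidate cell
    # state), each with input weights, recurrent weights, and a bias vector.
    return 4 * (units * input_features + units * units + units)


def dense_param_count(inputs, outputs):
    """Weights in a fully connected layer, including biases."""
    return inputs * outputs + outputs


for units in (4, 15, 17):
    total = lstm_param_count(units, input_features=1) + dense_param_count(units, 1)
    print(f"{units:2d} LSTM blocks -> {total} trainable weights")
```

The count grows roughly with the square of the number of blocks, so a larger layer needs noticeably more data and training to fit well.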

'adam' is the optimizer: it updates the model weights at each step to minimize the loss, which here is the mean squared error. The metrics argument does not affect training; it only reports extra scores alongside the loss. Note that 'accuracy' is a classification metric and is not meaningful for a regression problem like this one, so it can be left out.

Finally, we train the model with epochs = 370 (full passes over the training data), batch_size = 28 (the model processes 28 samples per weight update), and verbose = 1, which just displays the live training details.

Also remember that the number of epochs and the batch_size affect the accuracy of the model. There are various methods to find the best values for them, which I will show you in another article.

Done! Our model is complete, with a loss of 1124 units on the train dataset and 1221 units on the test dataset.

Time to make predictions. And yes, we need to invert the scaling so that the results are back on the original scale.

```python
# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)

# invert predictions
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform([trainY])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([testY])
```
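Why invert before scoring? The error is then expressed in sales units instead of the 0 to 1 range. The sketch below shows the relationship with a hand-written RMSE; the scaler range and the values are invented for illustration, and `rmse` and `invert` are helper names made up for this example.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between two 1-D arrays."""
    actual = np.asarray(actual, dtype="float64")
    predicted = np.asarray(predicted, dtype="float64")
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Hypothetical scaler range and scaled values, invented for illustration.
data_min, data_max = 1000.0, 9000.0
y_scaled = np.array([0.10, 0.40, 0.75])
pred_scaled = np.array([0.15, 0.35, 0.70])


def invert(scaled):
    """Undo a (0, 1) min-max scaling."""
    return scaled * (data_max - data_min) + data_min


rmse_scaled = rmse(y_scaled, pred_scaled)
rmse_original = rmse(invert(y_scaled), invert(pred_scaled))
print(rmse_scaled, rmse_original)
```

The RMSE in original units is the scaled RMSE multiplied by (max - min), which is why the scores reported in this article are in the hundreds or thousands rather than fractions.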

Let's see how well our model did. The complete script below also computes the RMSE scores and plots the predictions.

```python
# LSTM for regression
import numpy
import matplotlib.pyplot as plt
from pandas import read_csv
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error


# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)


# random seed
numpy.random.seed(7)

# load the dataset
dataframe = read_csv('sales_year.csv', usecols=[1], engine='python')
dataset = dataframe.values
dataset = dataset.astype('float32')

# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)

# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]

# reshape into X=t and Y=t+1
look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)

# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = numpy.reshape(testX, (testX.shape[0], 1, testX.shape[1]))

# create and fit our data to the LSTM network
model = Sequential()
model.add(LSTM(15, input_shape=(1, look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=370, batch_size=28, verbose=1)

# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)

# invert predictions
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform([trainY])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([testY])

# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:, 0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:, 0]))
print('Test Score: %.2f RMSE' % (testScore))

# -----Visualize----------
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict) + look_back, :] = trainPredict

# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict) + (look_back * 2) + 1:len(dataset) - 1, :] = testPredict

# plot baseline and predictions
plt.plot(scaler.inverse_transform(dataset))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
```

We can observe that the model (orange for train, green for test) did a very good job of following the sales sequence (blue, the sales dataset). However, we also see a drop in accuracy on the test dataset (green). A possible reason is an unseen event at that particular time that pushed sales well below what our model predicted.


Another way to approach the time series problem is to use multiple past time steps to predict the next time step.

This is called the Window Method.

For example, given the current time step (t), we want to predict the next time step (t+1). We will use the current time step (t) as well as the two prior time steps (t-1, t-2) as input variables (X).

So we only need to change look_back to 3, which takes t-2, t-1, and t. The intuition is that more context gives the model more to learn from, but that does not mean you should set look_back to 100 time steps: a very long window mostly repeats similar patterns and can reduce accuracy, so we need to find a balance. And since each sample now carries more data (look_back = 3 time steps), we can increase the number of epochs used to train the model.
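To see what the window looks like in practice, here is the same windowing logic as the article's create_dataset helper, run on a toy series of the numbers 0 to 7 (invented for illustration) so that the rows are easy to read.

```python
import numpy as np


def create_dataset(dataset, look_back=1):
    """Same windowing logic as the article's helper."""
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        dataX.append(dataset[i:(i + look_back), 0])
        dataY.append(dataset[i + look_back, 0])
    return np.array(dataX), np.array(dataY)


# Hypothetical series 0..7 stored as a single column.
series = np.arange(8, dtype="float32").reshape(-1, 1)
X, y = create_dataset(series, look_back=3)

for window, target in zip(X, y):
    print(window, "->", target)
```

Each input row holds three consecutive values and the target is the value right after them. Note that the `- 1` in the loop bound means the final value of the series (7 here) is never used as a target.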

Let’s re-run the code with look_back = 3, LSTM blocks = 17 and epochs = 450

```python
# LSTM with window method
import numpy
import matplotlib.pyplot as plt
from pandas import read_csv
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error


# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)


# fix random seed for reproducibility
numpy.random.seed(7)

# load the dataset
dataframe = read_csv('sales_year.csv', usecols=[1], engine='python')
dataset = dataframe.values
dataset = dataset.astype('float32')

# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)

# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]

# *************************************
# reshape into X=t and Y=t+1
look_back = 3  # window method (t-2, t-1, t, y)
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)

# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = numpy.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
# *************************************

# create and fit the LSTM network
model = Sequential()
model.add(LSTM(17, input_shape=(1, look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=450, batch_size=28, verbose=1)

# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)

# invert predictions
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform([trainY])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([testY])

# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:, 0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:, 0]))
print('Test Score: %.2f RMSE' % (testScore))

# -----------------Visualize----------------
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict) + look_back, :] = trainPredict

# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict) + (look_back * 2) + 1:len(dataset) - 1, :] = testPredict

# plot baseline and predictions
plt.plot(scaler.inverse_transform(dataset))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
```

Nice, we have improved our model from 1236 to 959 RMSE on the train dataset and from 1674 to 1303 on the test dataset. Even in the graph, if you look at the end part, the prediction is now closer than before.

There is another approach, using time steps.

LSTM for regression with Time Steps

As mentioned in 'Machine Learning Mastery': **"instead of phrasing the past observations as separate input features, we can use them as time steps of the one input feature, which is indeed a more accurate framing of the problem"**

Let's see if it improves our previous model. We only need to reshape the data so that the columns become the time steps dimension and the features dimension goes back to 1.

Here's the difference:

```python
# window method
# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = numpy.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
```

Output: ((199, 1, 3), (96, 1, 3))

```python
# time step method (used instead of the reshape above, not after it)
# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))
testX = numpy.reshape(testX, (testX.shape[0], testX.shape[1], 1))
```

Output: ((199, 3, 1), (96, 3, 1))

The dimensions are now trainX: 199 samples, 3 time steps, and 1 feature; testX: 96 samples, 3 time steps, and 1 feature.
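The two framings hold the same numbers and differ only in which axis carries them. A small sketch on invented windows (4 samples, look_back = 3) makes the difference visible:

```python
import numpy as np

# Hypothetical windows: 4 samples, each holding 3 past values (look_back = 3).
X = np.array([[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]], dtype="float32")

# Window method: one time step carrying 3 features.
X_window = np.reshape(X, (X.shape[0], 1, X.shape[1]))

# Time-step framing: 3 time steps carrying 1 feature each.
X_steps = np.reshape(X, (X.shape[0], X.shape[1], 1))

print(X_window.shape, X_window[0].tolist())
print(X_steps.shape, X_steps[0].tolist())
```

In the window framing the LSTM sees all three values at once in a single step; in the time-step framing it reads them one after another and carries its state across the three steps.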

Now let's put all the code together and see how well it goes!

```python
# LSTM regression with time step
import numpy
import matplotlib.pyplot as plt
from pandas import read_csv
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error


# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)


# random seed
numpy.random.seed(7)

# load the dataset
dataframe = read_csv('sales_year.csv', usecols=[1], engine='python')
dataset = dataframe.values
dataset = dataset.astype('float32')

# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)

# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]

# reshape into X=t and Y=t+1
look_back = 3
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)

# ************************************************
# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))
testX = numpy.reshape(testX, (testX.shape[0], testX.shape[1], 1))
# we are putting back the feature dimension = 1
# ************************************************

# create and fit the LSTM network
model = Sequential()
model.add(LSTM(17, input_shape=(look_back, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=450, batch_size=28, verbose=1)

# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)

# invert predictions
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform([trainY])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([testY])

# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:, 0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:, 0]))
print('Test Score: %.2f RMSE' % (testScore))

# ----------Visualize-------------------------
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict) + look_back, :] = trainPredict

# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict) + (look_back * 2) + 1:len(dataset) - 1, :] = testPredict

# plot baseline and predictions
plt.plot(scaler.inverse_transform(dataset))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
```

Well, our RMSE scores are now higher than before. Thus our previous model with the window method is performing better for this sales_year data.

The next approach is *LSTM with Memory Between Batches ~ Stateful LSTM*

As we know, the LSTM network has a memory that is capable of remembering across long sequences.

*Normally, the state within the network is reset after each training batch when fitting the model, as well as after each call to model.predict() or model.evaluate().*

In Keras, we can get *more control over when the internal state of the LSTM network is cleared* by making the LSTM layer "**stateful**". This means the network can build up state over the entire training sequence and even keep that state, if needed, to make predictions. It requires that the training data is not shuffled when fitting the network. It also requires explicitly resetting the network state after each exposure to the training data by calling model.reset_states(). So we must create our own outer loop, and inside it call model.fit() followed by model.reset_states().

```python
for i in range(100):
    model.fit(trainX, trainY, epochs=10, batch_size=batch_size, verbose=2, shuffle=False)
    model.reset_states()
```

We also have to hard-code the batch size (the number of samples in a batch), the time steps, and the number of features per time step by setting the batch_input_shape parameter.

```python
batch_size = 1
model = Sequential()
model.add(LSTM(4, batch_input_shape=(batch_size, look_back, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
```

The same batch size value needs to be used during prediction or any evaluation.
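To make the idea of memory between batches concrete, here is a toy sketch. It is not an LSTM and not Keras: it uses a plain tanh recurrence with made-up weights (`w_in`, `w_rec`) and invented inputs, purely to show the difference between resetting the state for every batch and carrying it forward.

```python
import numpy as np


def run_batch(inputs, state, w_in=0.5, w_rec=0.9):
    """Run a toy recurrence h = tanh(w_in * x + w_rec * h) over one batch."""
    for x in inputs:
        state = np.tanh(w_in * x + w_rec * state)
    return state


# Two consecutive batches of a hypothetical series.
batches = [np.array([0.1, 0.2, 0.3]), np.array([0.4, 0.5, 0.6])]

# Stateless: the state starts from zero for every batch.
stateless_final = [run_batch(b, state=0.0) for b in batches]

# Stateful: the state at the end of one batch seeds the next batch.
state = 0.0
stateful_final = []
for b in batches:
    state = run_batch(b, state)
    stateful_final.append(state)

print(stateless_final)
print(stateful_final)
```

The first batch ends in the same state either way, but the second batch ends differently because the stateful run still remembers the first batch. Resetting the state, as model.reset_states() does in the article's code, corresponds to setting `state` back to zero here.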

Let's re-run the code and see how far it goes.

```python
# LSTM with memory between batches
import numpy
import matplotlib.pyplot as plt
from pandas import read_csv
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error


# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)


# random seed
numpy.random.seed(7)

# load the dataset
dataframe = read_csv('sales_year.csv', usecols=[1], engine='python')
dataset = dataframe.values
dataset = dataset.astype('float32')

# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)

# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]

# reshape into X=t and Y=t+1
look_back = 3
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)

# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))
testX = numpy.reshape(testX, (testX.shape[0], testX.shape[1], 1))

# ************************************************************
# the LSTM network
batch_size = 1
model = Sequential()
model.add(LSTM(4, batch_input_shape=(batch_size, look_back, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
for i in range(100):
    model.fit(trainX, trainY, epochs=10, batch_size=batch_size, verbose=2, shuffle=False)
    model.reset_states()
# ************************************************************

# make predictions
trainPredict = model.predict(trainX, batch_size=batch_size)
model.reset_states()
testPredict = model.predict(testX, batch_size=batch_size)

# invert predictions
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform([trainY])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([testY])

# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:, 0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:, 0]))
print('Test Score: %.2f RMSE' % (testScore))

# ----------Visualize---------------------------------
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict) + look_back, :] = trainPredict

# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict) + (look_back * 2) + 1:len(dataset) - 1, :] = testPredict

# plot baseline and predictions
plt.plot(scaler.inverse_transform(dataset))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
```

Alright, we can observe that the end part of the plot (original dataset vs. test predictions) is now closer than before, and even the Root Mean Squared Error (RMSE) score is smaller than for our previous model.

So we will conclude that this model performs better than the previous ones.

Finally, we will take advantage of a deeper network topology, which we can call **Stacked LSTMs with Memory Between Batches.**

*However, each LSTM layer that feeds another LSTM layer must return the full sequence, which is done by setting the return_sequences parameter on that layer to True.*

```python
model.add(LSTM(4, batch_input_shape=(batch_size, look_back, 1), stateful=True, return_sequences=True))
model.add(LSTM(4, batch_input_shape=(batch_size, look_back, 1), stateful=True))
```
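The reason for return_sequences comes down to shapes: a recurrent layer needs a 3-D input of [samples, time steps, features], so the layer before it must hand over one output per time step. The sketch below uses a toy tanh recurrence in NumPy with random weights; `toy_recurrent_layer` is a made-up stand-in for illustration, not the Keras LSTM.

```python
import numpy as np


def toy_recurrent_layer(x, units, return_sequences, seed=0):
    """Toy tanh recurrence over x of shape (batch, time steps, features)."""
    rng = np.random.default_rng(seed)
    batch, steps, features = x.shape
    w_in = rng.normal(size=(features, units))
    w_rec = rng.normal(size=(units, units))
    h = np.zeros((batch, units))
    outputs = []
    for t in range(steps):
        h = np.tanh(x[:, t, :] @ w_in + h @ w_rec)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)  # (batch, time steps, units)
    return h                              # (batch, units)


x = np.ones((1, 3, 1))  # batch_size = 1, look_back = 3, 1 feature

# First layer returns the whole sequence so that a second layer can read it.
sequence = toy_recurrent_layer(x, units=4, return_sequences=True)
# Second layer returns only its last state, ready for a Dense output layer.
last_only = toy_recurrent_layer(sequence, units=4, return_sequences=False)

print(sequence.shape, last_only.shape)
```

If the first layer returned only its last state, the second layer would receive a 2-D array with no time axis left to iterate over, which is why every LSTM layer except the last one needs return_sequences=True.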

Let’s re-run our code once again with stacked LSTMs

```python
# Stacked LSTM
import numpy
import matplotlib.pyplot as plt
from pandas import read_csv
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error


# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)


# random seed
numpy.random.seed(7)

# load the dataset
dataframe = read_csv('sales_year.csv', usecols=[1], engine='python')
dataset = dataframe.values
dataset = dataset.astype('float32')

# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)

# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]

# reshape into X=t and Y=t+1
look_back = 3
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)

# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))
testX = numpy.reshape(testX, (testX.shape[0], testX.shape[1], 1))

# ***********************************************
# create and fit the LSTM network
batch_size = 1
model = Sequential()
model.add(LSTM(4, batch_input_shape=(batch_size, look_back, 1), stateful=True, return_sequences=True))
model.add(LSTM(4, batch_input_shape=(batch_size, look_back, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
for i in range(100):
    model.fit(trainX, trainY, epochs=20, batch_size=batch_size, verbose=2, shuffle=False)
    model.reset_states()
# ***********************************************

# make predictions
trainPredict = model.predict(trainX, batch_size=batch_size)
model.reset_states()
testPredict = model.predict(testX, batch_size=batch_size)

# invert predictions
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform([trainY])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([testY])

# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:, 0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:, 0]))
print('Test Score: %.2f RMSE' % (testScore))

# -----------------Visualize---------------------------
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict) + look_back, :] = trainPredict

# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict) + (look_back * 2) + 1:len(dataset) - 1, :] = testPredict

# plot baseline and predictions
plt.plot(scaler.inverse_transform(dataset))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
```

Amazing! With stacked LSTMs our RMSE scores are far lower than the last ones, and even in the plot the end part of the prediction on the test dataset is closer than ever before.

Finally, we are done. It's been a long way. I hope I was able to re-frame the basic concepts of LSTMs. Feel free to discuss any new concepts or innovative ways to use MLP for our day-to-day solutions.

Next, I'm going to write in more depth about **LSTM** and replicate the same example above to validate its better performance on time series data. Stay tuned!

I hope you enjoyed it.

**Some of my alternative internet presences are** Facebook, Instagram, Udemy, Blogger, Issuu, and more.

Also available on Quora @ https://www.quora.com/profile/Rupak-Bob-Roy

have a good day…