Deep Learning

Uni-Variate LSTM Time Series Forecasting

Apply State Of The Art Deep Learning Time Series Forecasting with the help of this template.

Rupak (Bob) Roy - II
8 min read · Jun 25, 2021

Hi, how are you doing? I hope all is well.

Today we will start with LSTM, a powerful type of neural network designed to handle sequential data such as time series.

Long Short-Term Memory (LSTM) is a type of Recurrent Neural Network (RNN) used in deep learning, with an architecture designed to capture patterns in sequential data. The benefit of this type of network is that it can learn and remember over long sequences and does not rely on a pre-specified window of lagged observations as input.

In Keras, this is referred to as stateful and involves setting the “stateful” argument to “True” in the LSTM layer.

What is LSTM in brief?

It is a recurrent neural network that is trained by using Backpropagation through time and overcomes the vanishing gradient problem.

Instead of neurons, LSTM networks have memory blocks that are connected through layers. Each block contains 3 non-linear gates, which make it smarter than a classical neuron, and a memory for sequences. The 3 types of non-linear gates are:

a.) Input Gate: decides which values from the input are used to update the memory state.

b.) Forget Gate: decides what information to throw away from the block.

c.) Output Gate: decides what to output based on the input and the memory of the block.

Each LSTM unit is like a mini state machine that uses a “memory” cell, which can maintain its state over a long time. The gates of the unit have weights that are learned during training.
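
To make the three gates a little more concrete, here is a minimal NumPy sketch of one forward step of an LSTM block, using the commonly taught textbook formulation. It is not how Keras implements the layer internally. The names lstm_step, W, U and b, the gate ordering inside the stacked weights, and the random weights are all assumptions for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One forward step of a single LSTM block (textbook formulation).

    W, U and b stack the parameters of four transforms in this
    (arbitrarily chosen) order: input gate, forget gate, candidate, output gate.
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b       # shape (4 * n,)
    i = sigmoid(z[0:n])                # input gate: what to write to memory
    f = sigmoid(z[n:2 * n])            # forget gate: what to keep from old memory
    g = np.tanh(z[2 * n:3 * n])        # candidate values for the memory
    o = sigmoid(z[3 * n:4 * n])        # output gate: what to expose as output
    c_t = f * c_prev + i * g           # updated memory cell
    h_t = o * np.tanh(c_t)             # output of the block
    return h_t, c_t

# Random placeholder weights: 1 input feature, 4 memory units
rng = np.random.default_rng(0)
n_features, n_units = 1, 4
W = rng.normal(scale=0.1, size=(4 * n_units, n_features))
U = rng.normal(scale=0.1, size=(4 * n_units, n_units))
b = np.zeros(4 * n_units)

# Feed a short made-up sequence; h and c carry the state from step to step
h = np.zeros(n_units)
c = np.zeros(n_units)
for x in [0.1, 0.4, -0.2]:
    h, c = lstm_step(np.array([x]), h, c, W, U, b)
print(h.shape, c.shape)
```

The point to notice is that h and c are passed from one step to the next. That carried-over memory is the “state” we talk about in the rest of the article.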

There are tons of articles available on the internet about the workings of LSTM, and even the math behind it. So here I will concentrate on a quick, practical implementation of LSTM for our day-to-day problems.

Let’s get started!

First is the data pre-processing step, where we have to restructure the data into a supervised learning format, that is, X and Y.

In simple words, supervised learning identifies the strength and direction of the relationship (the positive or negative impact, whose measured size is called the quantification of impact) between one dependent variable (Y) and a series of independent variables (X).

For this example, we have retail sales time series data recorded over a period of time.

Now, as you know, supervised learning requires independent (X) and dependent (Y) variables for the algorithm to learn from, so we will first convert our data into that format.

We will put the sales data at time (t) in the first column, and the second column will hold the next month's sales (t+1), which is the value we want to predict. Remember the X and Y format: we use X to predict Y.

The code below converts a time series to supervised learning format. And yes, df.fillna(0, inplace=True) replaces NaN values with 0.

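#supervised learning function
from pandas import DataFrame, concat

def timeseries_to_supervised(data, lag=1):
    df = DataFrame(data)
    #shift the series down by 1..lag steps to build the lagged X columns
    columns = [df.shift(i) for i in range(1, lag+1)]
    #the unshifted series is the Y column
    columns.append(df)
    df = concat(columns, axis=1)
    #the first rows have no earlier observation, so fill their NaN with 0
    df.fillna(0, inplace=True)
    return df

Supervised Format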

Here is what our sales data looks like after transforming it to supervised learning format.
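
If you want to see the transformation on something smaller, here is a quick sketch that runs the same function on a tiny made-up series (not the sales data). The column labels are names I added only for readability.

```python
from pandas import DataFrame, concat

def timeseries_to_supervised(data, lag=1):
    df = DataFrame(data)
    columns = [df.shift(i) for i in range(1, lag + 1)]
    columns.append(df)
    df = concat(columns, axis=1)
    df.fillna(0, inplace=True)
    return df

# Made-up values, only to show the shape of the result
toy = [10, 20, 30, 40]
supervised = timeseries_to_supervised(toy, lag=1)

# Hypothetical labels: the lagged value is X, the current value is Y
supervised.columns = ["X (t-1)", "y (t)"]
print(supervised)
```

Each row now pairs an observation with the one that follows it. The first row has 0 in the X column because there is no earlier observation to shift in.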

The next step is to make the time series stationary, because our ‘sales_year.csv’ data is not stationary.

This means that there is a structure in the data that depends on time. We can see there is an increasing trend in the data.

Stationary data is easier to model and will very likely result in more skillful forecasts.

The trend can be removed from the observations before modelling and then added back to the forecasts later, to return the predictions to the original scale.

We can easily remove a trend by differencing the data: the observation from the previous time step (t-1) is subtracted from the current observation (t). This gives us a series of differences. pandas offers a diff() function for this, and the code below does the same thing by hand.

from pandas import Series

#create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return Series(diff)

#invert a differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]
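
As a quick sanity check, here is a small sketch on made-up numbers. It shows that the hand-written difference() gives the same values as the pandas diff() function, and that inverse_difference() brings a differenced value back to the original scale.

```python
from pandas import Series

def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        diff.append(dataset[i] - dataset[i - interval])
    return Series(diff)

def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

# Made-up series with an upward trend
raw = [100, 110, 125, 145]
diffs = difference(raw, 1)

# The pandas way: diff() leaves a NaN in the first row, so drop it
pandas_diffs = Series(raw).diff().dropna().reset_index(drop=True)
print(diffs.tolist(), pandas_diffs.tolist())

# Rebuild the last observation from the last difference
# and the history that came before it
restored = inverse_difference(raw[:-1], diffs.iloc[-1], 1)
print(restored)
```

The differenced series is one value shorter than the original, because the first observation has nothing to be subtracted from.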

Now it's time to normalize/scale the data.

LSTMs are sensitive to the scale of the input data. As with deep learning methods in general, scaling the data to the range -1 to 1 before fitting is good practice that helps the algorithm train faster and more effectively. And yes, scaling does not destroy the meaning of the data, because the transform can be inverted. We do this normalization with the MinMaxScaler pre-processing class.

The default activation function for LSTMs is the hyperbolic tangent (tanh), which outputs values between -1 and 1, so this is the preferred range for the time series data.

#transform scale (series is the loaded sales data)
from sklearn.preprocessing import MinMaxScaler
X = series.values
X = X.reshape(len(X), 1)
scaler = MinMaxScaler(feature_range=(-1, 1))
scaler = scaler.fit(X)
scaled_X = scaler.transform(X)

Again, we must invert the scale on forecasts to return the values to the original scale.

#invert transform
inverted_X = scaler.inverse_transform(scaled_X)
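
Here is a small round trip on made-up values, just to show that scaling and inverting really do bring us back to where we started. The array values below are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up values, reshaped to one column as MinMaxScaler expects 2D input
values = np.array([10.0, 20.0, 30.0, 50.0]).reshape(-1, 1)

scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(values)        # smallest value -> -1, largest -> 1
restored = scaler.inverse_transform(scaled)  # back to the original scale

print(scaled.ravel())
print(restored.ravel())
```

In the full script further down, the scaler is fitted on the training rows only and then reused for the test rows, so the test data does not influence the scaling.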

It's time to build the LSTM.

By default, an LSTM layer in Keras maintains state between data within one batch. A batch is a fixed number of rows from the training dataset that defines how many patterns to process before updating the weights of the network. By default, the state in the LSTM layer is cleared between batches, so we must make the LSTM stateful. This gives us fine-grained control over when the state of the LSTM layer is cleared, through the reset_states() function.

The LSTM network expects the input data (X) in [samples, time steps, features] format.

X = X.reshape(X.shape[0], 1, X.shape[1])
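
To see what that reshape does, here is a tiny sketch with a made-up 2D array of 5 samples and 1 feature. Each sample becomes a sequence of exactly 1 time step.

```python
import numpy as np

# Made-up input: 5 samples, 1 feature each -> shape (5, 1)
X = np.array([[0.1], [0.2], [0.3], [0.4], [0.5]])

# Insert a time-step axis of length 1 -> [samples, time steps, features]
X3d = X.reshape(X.shape[0], 1, X.shape[1])

print(X.shape)    # (5, 1)
print(X3d.shape)  # (5, 1, 1)
```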

We will use the Sequential API to define the network. The shape of the input data must be specified in the LSTM layer using the “batch_input_shape” argument, a tuple that specifies the expected number of observations to read in each batch, the number of time steps, and the number of features.

We also set the number of neurons, also called memory units or blocks. Then we have one output layer, Dense(1). When compiling the network, we must specify a loss function and an optimization algorithm, which are used to calculate the loss and update the weights.

model = Sequential()
model.add(LSTM(neurons, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

Once compiled we must control when the internal state is reset because the network is stateful. We must manually manage the training process one epoch at a time across the desired number of epochs.

By default, the samples within an epoch are shuffled prior to being exposed to the network and again this is undesirable for the LSTM because we want the network to build up state as it learns across the sequences of observations.

So we will disable the shuffling of samples by setting “shuffle” to “False”.

We will also reset the internal state at the end of the training epoch, ready for the next training iteration.

for i in range(nb_epoch):
    model.fit(X, y, epochs=1, batch_size=batch_size, verbose=0, shuffle=False)
    model.reset_states()

The batch_size must be set to 1. It must also be set to 1 in the model's predict() call, because we are interested in making one-step forecasts on the test data.

As we remember, during training the internal state is reset after each epoch. While forecasting, we will not reset the internal state between forecasts. In fact, we would like the model to build up state as we forecast each time step in the test dataset.
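
Before we bring in the real model, here is a rough sketch of the one-step, walk-forward loop on its own. To keep it self-contained I have swapped the LSTM for a naive persistence forecast (predict the last observed value), so this is not the LSTM itself, only the mechanics of the loop and the RMSE calculation. The numbers are made up.

```python
from math import sqrt
import numpy as np

# Made-up series; the last 3 values play the role of the test set
raw_values = np.array([100.0, 110.0, 125.0, 145.0, 150.0, 170.0])
n_test = 3
train, test = raw_values[:-n_test], raw_values[-n_test:]

history = list(train)
predictions = []
for i in range(len(test)):
    # persistence forecast: a stand-in for the model's one-step prediction
    yhat = history[-1]
    predictions.append(yhat)
    # the true value becomes available before the next forecast is made
    history.append(test[i])

rmse = sqrt(np.mean((test - np.array(predictions)) ** 2))
print('Test RMSE: %.3f' % rmse)
```

A baseline like this is also handy later on: if the LSTM cannot beat the persistence RMSE on the same test months, the settings probably need more tuning.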

If you are new to LSTM and still confused about how it works, follow the link “Illustrated Guide to LSTM’s and GRU’s: A step by step explanation” for a clear explanation of the workflow of LSTM.

Now let’s put all of the pieces together.

from pandas import DataFrame
from pandas import Series
from pandas import concat
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from math import sqrt
from matplotlib import pyplot
import numpy
#supervised learning function
def timeseries_to_supervised(data, lag=1):
    df = DataFrame(data)
    columns = [df.shift(i) for i in range(1, lag+1)]
    columns.append(df)
    df = concat(columns, axis=1)
    df.fillna(0, inplace=True)
    return df

#create a difference series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return Series(diff)

#invert difference value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

#scale train and test data to [-1, 1]
def scale(train, test):
    # fit scaler
    scaler = MinMaxScaler(feature_range=(-1, 1))
    scaler = scaler.fit(train)
    # transform train
    train = train.reshape(train.shape[0], train.shape[1])
    train_scaled = scaler.transform(train)
    # transform test
    test = test.reshape(test.shape[0], test.shape[1])
    test_scaled = scaler.transform(test)
    return scaler, train_scaled, test_scaled

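#inverse scaling for the forecast value
def invert_scale(scaler, X, value):
    # the scaler was fitted on [X, y] rows, so rebuild a full row first
    new_row = [x for x in X] + [value]
    array = numpy.array(new_row)
    array = array.reshape(1, len(array))
    inverted = scaler.inverse_transform(array)
    # the last column holds the forecast in the original scale
    return inverted[0, -1]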

#fit an LSTM network to training data
def fit_lstm(train, batch_size, nb_epoch, neurons):
    X, y = train[:, 0:-1], train[:, -1]
    X = X.reshape(X.shape[0], 1, X.shape[1])
    model = Sequential()
    model.add(LSTM(neurons, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    for i in range(nb_epoch):
        model.fit(X, y, epochs=1, batch_size=batch_size, verbose=1, shuffle=False)
        model.reset_states()
    return model

#make a one-step forecast
def forecast_lstm(model, batch_size, X):
    X = X.reshape(1, 1, len(X))
    yhat = model.predict(X, batch_size=batch_size)
    return yhat[0,0]

#load dataset; squeeze('columns') turns the single-column DataFrame into a Series
series = read_csv('sales_year.csv', header=0, parse_dates=[0], index_col=0).squeeze('columns')

#transform data to be stationary
raw_values = series.values
diff_values = difference(raw_values, 1)

#transform data to be supervised learning
supervised = timeseries_to_supervised(diff_values, 1)
supervised_values = supervised.values

#split data into train and test-sets
train, test = supervised_values[0:-12], supervised_values[-12:]

#transform the scale of the data
scaler, train_scaled, test_scaled = scale(train, test)

#fit the model
lstm_model = fit_lstm(train_scaled, 1, 3000, 4)
#forecast the entire training dataset to build up state for forecasting
train_reshaped = train_scaled[:, 0].reshape(len(train_scaled), 1, 1)
lstm_model.predict(train_reshaped, batch_size=1)

#walk-forward validation on the test data
predictions = list()
for i in range(len(test_scaled)):
    #make one-step forecast
    X, y = test_scaled[i, 0:-1], test_scaled[i, -1]
    yhat = forecast_lstm(lstm_model, 1, X)
    #invert scaling
    yhat = invert_scale(scaler, X, yhat)
    #invert differencing
    yhat = inverse_difference(raw_values, yhat, len(test_scaled)+1-i)
    #store forecast
    predictions.append(yhat)
    expected = raw_values[len(train) + i + 1]
    print('Month=%d, Predicted=%f, Expected=%f' % (i+1, yhat, expected))

#report performance; raw_values[-12:] is the last 12 months/rows
rmse = sqrt(mean_squared_error(raw_values[-12:], predictions))
print('Test RMSE: %.3f' % rmse)
#line plot of observed vs predicted
pyplot.plot(raw_values[-12:])
pyplot.plot(predictions)
pyplot.show()

Well, we can see that our predicted values are pretty close to the actual values. We can also try different settings to improve the model's accuracy. Check out another article where I have applied a simple LSTM with optimized settings, ‘LSTMs for regression’.

Next, we will try Multivariate LSTM for time series.

I hope you enjoyed it.

Some of my alternative internet presences are Facebook, Instagram, Udemy, Blogger, Issuu, and more.

Also available on Quora @ https://www.quora.com/profile/Rupak-Bob-Roy


Have a good day!


Written by Rupak (Bob) Roy - II

Things I write about frequently on Medium: Data Science, Machine Learning, Deep Learning, NLP and many other random topics of interest. ~ Let’s stay connected!