Yes, our favorite fbprophet is back with multivariate forecasting.
The next best alternative to Multivariate LSTM Time Series Forecasting.
Hi there, we are back again with a new topic. Keeping it short and simple, this time I will demonstrate how to perform multivariate time series forecasting using our lightning-fast fbprophet approach.
Yes, you heard it right: we can now perform multivariate forecasting with fbprophet.
Let's get started, shall we?
Here is the dataset: https://www.kaggle.com/datasets/rupakroy/stock-trading-data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from fbprophet import Prophet

df = pd.read_csv("stock_trading_data.csv")
#we will take only the first 5 columns
df = df.iloc[:,:5]
#fix the datetime column
df["Date"] = pd.to_datetime(df["Date"])
Now we will divide the dataset into two parts, train and test, in a 70:30 ratio.
#Divide the data into train and test
test_size = np.round(df.shape[0] * 30/100).astype(int)
df1_train = df.iloc[test_size:,:]
df1_test = df.iloc[:test_size,:]

df1_train.dtypes
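A quick sanity check on the split never hurts, just to confirm the 70:30 proportions and which date ranges ended up in each part:
#confirm the split proportions and the date ranges covered
print(df1_train.shape, df1_test.shape)
print("train:", df1_train["Date"].min(), "to", df1_train["Date"].max())
print("test :", df1_test["Date"].min(), "to", df1_test["Date"].max())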
#let's plot the graphs and visualize all columns against the date
figure, axes = plt.subplots(nrows=2, ncols=2, figsize=(10,8))
axes[0,0].plot(df1_train["Date"],df1_train["Open"],label="Open")
axes[0,1].plot(df1_train["Date"],df1_train["High"],label="High")
axes[1,0].plot(df1_train["Date"],df1_train["Low"],label="Low")
axes[1,1].plot(df1_train["Date"],df1_train["Close"],label="Close")
for ax in axes.ravel():
    ax.legend()
plt.show()
As you are aware, fbprophet needs the dataset in the 'ds' (Date) and 'y' (Target) format.
#prepare the dataset for fbprophet
df1_train.rename(columns={"Open":'y',"Date":'ds'},inplace=True)
df1_train.head(5)
Now it's time to apply the magic!
model = Prophet(interval_width=0.9)
model.add_regressor('High',standardize=False)
model.add_regressor('Low',standardize=False)
model.add_regressor('Close', standardize=False)
model.fit(df1_train)
Done! We just need to add the additional variables with add_regressor, that's it.
If you remember the formula of a regression model:
y = m*x + c, where m = the beta coefficient(s), x = the independent variable(s), and c = the intercept.
And if you remember how a time series autoregressor works, i.e.
X(t+1) = c + m1*X(t) + m2*X(t-1) + m3*X(t-2) … and so on and so forth.
An autoregressor is "multivariate" only in the sense that it computes t+1 from lagged versions of the series itself, and this is how classical time series forecasting works. By adding the other variables like 'High', 'Low' and 'Close' as extra regressors, we can now compute a genuinely multivariate time series forecast.
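To connect this back to fbprophet: under the hood Prophet fits an additive model, and every column you pass to add_regressor simply contributes one extra linear term. Roughly:
y(t) = g(t) + s(t) + h(t) + b1*High(t) + b2*Low(t) + b3*Close(t) + error
where g(t) is the trend, s(t) the seasonality, h(t) the holiday effects, and b1, b2, b3 are the coefficients the model learns during fit() (the learned betas show up inside model.params below).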
#To view the model parameters
model.params
We will replicate the training dataset without the 'y' target column and see how well the model can reproduce it using its added regressors.
#understanding the model fit---------
df1_train_2 = df1_train[["ds","High","Low","Close"]]
#we will be predicting 'y' i.e. "Open"
forecast1_train = model.predict(df1_train_2)
forecast1_train = forecast1_train[['ds','yhat']]
df_model_fit = pd.concat((forecast1_train['yhat'],df1_train.reset_index()),axis=1)
Merge the predicted/fitted values with the actuals and plot 'y' vs 'yhat'.
#Visualize it
plt.figure(figsize=(8,6))
plt.plot(df_model_fit['ds'],df_model_fit['y'],color='red',label='actual')
plt.plot(df_model_fit['ds'],df_model_fit['yhat'],color='blue',label='Forecasted')
plt.legend()
Looks great!
Let's try the same with the test/unseen dataset.
#create a test dataframe
df1_test.rename(columns={"Open":'y',"Date":'ds'},inplace=True)
df1_test.head(5)
df1_test_2 = df1_test[["ds","High","Low","Close"]] #we will be predicting 'y' i.e. "Open"
forecast1_test = model.predict(df1_test_2)
forecast1_test = forecast1_test[['ds','yhat']]
df_testdata_fit = pd.concat((forecast1_test['yhat'],df1_test.reset_index()),axis=1).reset_index()
#Visualize it
plt.figure(figsize=(8,6))
plt.plot(df_testdata_fit['ds'],df_testdata_fit['y'],color='red',label='actual')
plt.plot(df_testdata_fit['ds'],df_testdata_fit['yhat'],color='blue',label='Forecasted')
plt.legend()
The results look promising on the unseen data as well!
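Eyeballing the chart is fine, but a quick error metric makes the comparison concrete. A minimal sketch using scikit-learn's mean_absolute_error (scikit-learn isn't used anywhere else in this article):
#quantify the test-set fit with a simple error metric
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(df_testdata_fit['y'], df_testdata_fit['yhat'])
print("Test MAE:", mae)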
But wait, this was done with default fbprophet settings, without proper parameter tuning. Another way to improve the results is to check for any gaps (NAs) in the timeline; if there are gaps, reframe the dataset using resampling, for example:
df.resample("30min", on="DateColumnName").Duration.mean().reset_index()
Reference for resampling: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html
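For the daily stock data used here, the same idea would look something like the sketch below; the daily frequency and the linear interpolation are illustrative assumptions rather than part of the original workflow:
#resample to a regular daily grid and interpolate any gaps (illustrative choices)
df_regular = df.resample("D", on="Date").mean().reset_index()
df_regular = df_regular.interpolate(method="linear")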
Even though choosing a proper time frame is necessary for time series analysis, there are plenty of ways to improve accuracy further. The objective of this article is to demonstrate how to apply multivariate forecasting in a faster way than our traditional LSTM approach, so I hope you don't mind that the model isn't tuned.
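For reference, tuning usually starts with a couple of Prophet's regularization knobs; the values below are placeholders to show the syntax, not recommendations:
#example of non-default settings (placeholder values; re-add regressors and refit as before)
model_tuned = Prophet(interval_width=0.9,
                      changepoint_prior_scale=0.5,       #trend flexibility
                      seasonality_mode='multiplicative') #instead of the default 'additive'
model_tuned.add_regressor('High', standardize=False)
model_tuned.add_regressor('Low', standardize=False)
model_tuned.add_regressor('Close', standardize=False)
model_tuned.fit(df1_train)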
#make_future_dataframe approach--------------------
future = model.make_future_dataframe(periods=377)
future.tail()
future_prediction = model.predict(future) #this raises an error about the missing regressor columns
We are also aware of the 'make_future_dataframe' approach in fbprophet, but here it will not work as-is: it throws an error about missing independent variables, because the dataframe created by 'make_future_dataframe' contains only the dates and none of the regressor columns. Thus, to use it, we would have to supply the future regressor values ourselves.
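A minimal sketch of what that could look like; future_regressors here is a hypothetical dataframe (not built anywhere in this article) holding your own estimates of 'High', 'Low' and 'Close' for the new dates:
#make_future_dataframe only produces the 'ds' column (history + future dates)
future = model.make_future_dataframe(periods=377)
#attach the regressors: known values for the history rows, and your own
#estimates for the new dates (future_regressors is hypothetical and must
#contain 'ds', 'High', 'Low', 'Close')
known = df1_train[['ds','High','Low','Close']]
future = future.merge(pd.concat([known, future_regressors]), on='ds', how='left')
future_prediction = model.predict(future)
In practice this only makes sense when you actually know, or can reasonably estimate, the future values of the regressors.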
Putting all of the pieces together.
I hope you find this article useful for your machine learning and statistical use cases. Likewise, I will keep trying to bring new approaches, with the motto "curiosity leads to innovation" :)
Check out the kaggle implementation: https://www.kaggle.com/rupakroy/multivariate-timeseries-fbprophet
Github Repo: https://github.com/rupak-roy/Multivariate-Facebook-Prophet-Time-Series-Forecasting-Template
Thanks again for your time. If you enjoyed this short article, there are tons of topics in advanced analytics, data science, and machine learning available in my Medium repo. https://medium.com/@bobrupakroy
Some of my alternative internet presences are Facebook, Instagram, Udemy, Blogger, Issuu, Slideshare, Scribd and more.
Also available on Quora @ https://www.quora.com/profile/Rupak-Bob-Roy
Let me know if you need anything. Talk Soon.