Missing Value Imputation for Time Series

Rupak (Bob) Roy - II
3 min read · Dec 20, 2021

via proper methods like interpolation

Hi everyone, how are you feeling today? Good? Likewise. Great!

Just like before, today I will introduce you to one of the proper ways to impute missing values for time series. Voilà!

Traditionally, we impute missing values using the mean or the median. Since time series data is numeric in nature, I did not consider the mode.

Our Traditional Method:

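# Get the skewness to help choose between mean and median
from scipy.stats import skew
skew(df["column_name"], nan_policy="omit")
# Roughly symmetric data: fill with the mean
df["column_name"] = df["column_name"].fillna(df["column_name"].mean())
# Skewed data: fill with the median instead
df["column_name"] = df["column_name"].fillna(df["column_name"].median())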
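
One way to read the snippet above is that the skewness tells you which of the two statistics to pick. Here is a minimal sketch of that idea. The helper name `fill_by_skew` and the 0.5 cutoff are assumptions made for illustration, not a fixed rule, and the sketch uses pandas' built-in `Series.skew()`, which skips NaNs by default.

```python
import numpy as np
import pandas as pd

def fill_by_skew(s: pd.Series, threshold: float = 0.5) -> pd.Series:
    """Fill NaNs with the mean if the data looks symmetric, else the median.

    The threshold is a hypothetical rule of thumb, not a standard value.
    """
    skewness = s.skew()  # NaNs are ignored by default
    fill_value = s.mean() if abs(skewness) < threshold else s.median()
    return s.fillna(fill_value)

symmetric = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0])
skewed = pd.Series([1.0, 2.0, np.nan, 3.0, 100.0])

filled_symmetric = fill_by_skew(symmetric)  # gap filled with the mean
filled_skewed = fill_by_skew(skewed)        # gap filled with the median
```

In the skewed Series the single large value drags the mean far away from the bulk of the data, which is why the median is the safer fill there.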

Other related methods include outlier detection via Isolation Forest, the quantile method, Imputer, and so on.

Today we will see how to impute missing values in data whose observations are ordered in time, that is, time series.

To get started, let's impute NaNs in a pandas Series.

# Using interpolation to fill missing values in Series data
import pandas as pd
import numpy as np
# Create a Series with one missing value
a = pd.Series([0, 1, np.nan, 3, 4, 5, 7])
# 1. Linear interpolation
a.interpolate()
# 2. Polynomial interpolation (requires SciPy)
a.interpolate(method="polynomial", order=2)
# 3. Interpolation through padding: fill each missing value with the
# previous valid value. This cannot fill a missing value in the first
# row, because nothing comes before it.
# Newer pandas releases deprecate method="pad" here; a.ffill(limit=2) does the same thing.
a.interpolate(method="pad", limit=2)
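
To see what linear interpolation actually computes, here is a small sketch that fills a two-value gap by hand and compares it with pandas. The Series `s` is made up for illustration, and `ffill()` is used for the padding part as the equivalent call in newer pandas releases.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.0, 1.0, np.nan, np.nan, 4.0])

# pandas: the default linear method treats the rows as equally spaced
filled = s.interpolate()

# By hand: the gap sits between (x0=1, y0=1.0) and (x1=4, y1=4.0),
# so each missing point lies on the straight line joining them
x0, y0, x1, y1 = 1, 1.0, 4, 4.0
manual = [y0 + (x - x0) * (y1 - y0) / (x1 - x0) for x in (2, 3)]

# Padding alternative: repeat the previous valid value instead
padded = s.ffill()

# A leading NaN has no previous value, so padding leaves it untouched
leading = pd.Series([np.nan, 1.0, np.nan, 3.0]).ffill()
```

Linear interpolation gives 2.0 and 3.0 for the gap, while padding repeats 1.0 twice. Which one is appropriate depends on whether the series is expected to move smoothly or to hold its last value.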

So that was interpolation on a Series. Let's do the same for a DataFrame.

import pandas as pd
#Create dataframe with missing values
df = pd.DataFrame({"A":[12, 4, 7, None, 2],
"B":[None, 3, 57, 3, None],
"C":[20, 16, None, 3, 8],
"D":[14, 3, None, None, 6]})

1.) Linear interpolation: unlike linear regression, which fits one best-fit line through all the points, linear interpolation draws a straight line between the two known neighbours of a gap (A, ??, C) and places the missing value on that line. It can fill in two directions: forward (top to bottom) and backward (bottom to top). Remember that if the first row is a missing value, the forward direction leaves it as NaN; likewise, the backward direction leaves a missing value in the last row as NaN.

df.interpolate(method="linear", limit_direction="forward")
df.interpolate(method="linear", limit_direction="backward")
# For a single column, assign the result back to the column
df["C"] = df["C"].interpolate(method="linear", limit_direction="forward")
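
The effect of the direction is easiest to see on column B, which has a NaN in both its first and last row. The sketch below reuses the DataFrame from above; the variable names `forward` and `backward` are just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 7, None, 2],
                   "B": [None, 3, 57, 3, None],
                   "C": [20, 16, None, 3, 8],
                   "D": [14, 3, None, None, 6]})

forward = df.interpolate(method="linear", limit_direction="forward")
backward = df.interpolate(method="linear", limit_direction="backward")

# Interior gaps are filled the same way in both directions,
# e.g. column D goes 3 -> 4 -> 5 -> 6 across its two NaNs
print(forward["D"].tolist())

# Edge gaps differ: forward leaves the leading NaN in B alone and
# repeats the last valid value into the trailing one; backward does
# the opposite
print(forward["B"].tolist())
print(backward["B"].tolist())
```

If you want both edges handled in a single call, `limit_direction="both"` is also accepted by pandas.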

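2.) Interpolation with padding: fill in NaNs using the previous valid value. The limit parameter sets the maximum number of consecutive NaNs to fill.

# Newer pandas releases deprecate method="pad" here; df.ffill(limit=...) does the same thing
df.interpolate(method="pad", limit=2)
df.interpolate(method="pad", limit=1)

You can clearly see the difference in column D: with limit=2 both consecutive NaNs are filled with 3, while with limit=1 only the first one is filled and the second stays NaN.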
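
To make the difference concrete, here is a sketch with the same DataFrame. It uses `ffill(limit=...)`, which pads in the same way and is the call recommended by newer pandas releases; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 7, None, 2],
                   "B": [None, 3, 57, 3, None],
                   "C": [20, 16, None, 3, 8],
                   "D": [14, 3, None, None, 6]})

pad_two = df.ffill(limit=2)  # fill at most 2 consecutive NaNs
pad_one = df.ffill(limit=1)  # fill at most 1 consecutive NaN

# Column D has two NaNs in a row, so the limit matters there
print(pad_two["D"].tolist())
print(pad_one["D"].tolist())

# The leading NaN in column B stays NaN either way: nothing precedes it
print(pad_two["B"].tolist())
```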

3.) Interpolation with ‘time’: interpolates according to the actual time gap between observations. It needs a DatetimeIndex and works on daily and higher-resolution data.
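
A small sketch of the ‘time’ method, assuming a made-up Series `ts` with unevenly spaced dates. The missing value is one day after the first observation and three days before the last, so a time-aware fill should land much closer to the first value than a plain linear fill does.

```python
import numpy as np
import pandas as pd

# Unevenly spaced dates: a 1-day gap, then a 3-day gap
idx = pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-05"])
ts = pd.Series([0.0, np.nan, 8.0], index=idx)

by_position = ts.interpolate(method="linear")  # ignores the dates
by_time = ts.interpolate(method="time")        # weights by elapsed time

print(by_position.iloc[1])  # halfway between 0 and 8
print(by_time.iloc[1])      # one quarter of the way from 0 to 8
```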

4.) Interpolation with ‘index’ or ‘values’: uses the actual numerical values of the index, so unevenly spaced index values are weighted accordingly.
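
The same idea works with a numeric index. In this sketch the index values 0, 1 and 10 are made up to exaggerate the uneven spacing, so the two results are easy to tell apart.

```python
import numpy as np
import pandas as pd

# The missing value sits at index 1, much closer to 0 than to 10
s = pd.Series([0.0, np.nan, 100.0], index=[0, 1, 10])

by_position = s.interpolate(method="linear")  # treats rows as equally spaced
by_index = s.interpolate(method="index")      # uses the index values 0, 1, 10

print(by_position.loc[1])  # midpoint of 0 and 100
print(by_index.loc[1])     # one tenth of the way from 0 to 100
```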

5.) The other interpolation methods: ‘nearest’, ‘zero’, ‘quadratic’, ‘cubic’, ‘spline’, and ‘barycentric’. These rely on SciPy, so SciPy must be installed, and ‘spline’ additionally needs an order.
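
As a rough sketch of two of these methods, assuming SciPy is installed: the first Series below holds the squares of 0 to 5 with one value removed, so a quadratic fill should recover something very close to the true value, while a linear fill overshoots. Both Series are made up for illustration.

```python
import numpy as np
import pandas as pd

# y = x**2 for x = 0..5, with the value at x = 2 (which is 4) missing
squares = pd.Series([0.0, 1.0, np.nan, 9.0, 16.0, 25.0])

linear = squares.interpolate(method="linear")        # straight line between 1 and 9
quadratic = squares.interpolate(method="quadratic")  # needs SciPy

print(linear[2])     # the midpoint of 1 and 9
print(quadratic[2])  # should be very close to 4

# 'nearest' copies the value of the closest known row (also needs SciPy)
gaps = pd.Series([0.0, 1.0, np.nan, np.nan, 4.0])
nearest = gaps.interpolate(method="nearest")
print(nearest.tolist())
```

Curved methods like this can help when the series bends smoothly, but they can also overshoot on noisy data, so it is worth comparing the result against the plain linear fill.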

That’s it, we are done.

I hope you enjoyed it. As always, I tried to keep my stuff short, simple, and to the point, like a marksman :)

I will try to bring you more interesting topics, debugged with my intuition. Next, we will move on to a powerful genetic-based algorithm for time series. See you then, ciao.

If you wish to explore more about new ways of doing data science follow my other articles.

Some of my other internet presences: Facebook, Instagram, Udemy, Blogger, Issuu, and more.

Also available on Quora @ https://www.quora.com/profile/Bob-Rupak-Roy

https://www.quora.com/profile/Rupak-Bob-Roy

Have a good day, Talk Soon.
