Time Series Analysis

Notes from the Complete Guide on Time Series Analysis in Python by Prashant Banerjee

Introduction

Time series analysis examines data collected over time to gain insights for purposes such as:

  • Forecasting
  • Signal Processing
  • Pattern Recognition

Components of Time Series Data

  • Trend - increasing, decreasing, or horizontal (stationary)
  • Seasonality - a pattern that repeats at a fixed interval
  • Cyclical component - rises and falls that do not repeat at a fixed interval but reflect real correlations arising from the nature of the time series data
  • Irregular variation - fluctuations that become visible when the trend and cyclical variation are removed - may or may not be random
  • ETS Decomposition - separates a time series into its Error, Trend, and Seasonality components

Types of Data

  • Time series data - data collected over points in time
  • Cross sectional data - Data of one or more variables recorded at the same point in time
  • Pooled data - combination of the above

Terminology

  • Dependence - association between two observations of the same variable at different points in time
  • Stationarity - the mean value (and, more generally, the variance and autocorrelation) remains constant over time
  • Differencing - transforms a series to make it stationary and to control for autocorrelation
  • Specification - testing the relationships between variables
  • Exponential smoothing - method used for short term predictions
  • Curve fitting - regression used when the data is non-linear
  • ARIMA - Auto Regressive Integrated Moving Average

Patterns in Time Series

  • A Trend is observed when there is an increasing or decreasing slope
  • Seasonality is observed when there is a distinct repeated pattern at a specific frequency
  • Cyclic behaviour occurs when the rises and falls do not happen at a fixed frequency, which is what distinguishes it from seasonality

Not every series has both a trend and seasonality, but a series will usually have at least one of them

Additive and Multiplicative Time Series

  • Additive - Value = Base + Trend + Seasonality + Error
  • Multiplicative - Value = Base x Trend x Seasonality x Error
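
As a rough illustration of the two forms, the sketch below builds one series of each kind with NumPy. All component values are made up, the error term is left out for clarity, and the names (`trend_factor`, `seasonal_factor`) are only illustrative. It also shows that taking the log of a multiplicative series gives an additive one:

```python
import numpy as np

# Hypothetical components for 24 observations (illustrative values only)
t = np.arange(24)
base = 100.0
trend = 2.0 * t                                  # linear trend
seasonality = 10.0 * np.sin(2 * np.pi * t / 12)  # 12-step seasonal cycle

# Additive: the components are summed
additive = base + trend + seasonality

# Multiplicative: the components act as factors around 1 and are multiplied
trend_factor = 1.0 + 0.02 * t
seasonal_factor = 1.0 + 0.1 * np.sin(2 * np.pi * t / 12)
multiplicative = base * trend_factor * seasonal_factor

# The log of a product is a sum of logs, so the logged series is additive
log_sum = np.log(base) + np.log(trend_factor) + np.log(seasonal_factor)
```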

Decomposition of a Time Series

Decomposition can be performed by treating the series as either additive or multiplicative

Decomposition can be done using statsmodels like so:

from statsmodels.tsa.seasonal import seasonal_decompose

# period is the number of observations in one seasonal cycle (30 here);
# monthly data with a yearly cycle would normally use period=12
multiplicative_decomposition = seasonal_decompose(df['Number of Passengers'], model='multiplicative', period=30)

additive_decomposition = seasonal_decompose(df['Number of Passengers'], model='additive', period=30)

The results can be plotted using the plot method of each decomposition result:

multiplicative_decomposition.plot()
additive_decomposition.plot()

[Figure: Multiplicative Decomposition]

[Figure: Additive Decomposition]

Comparing the residuals of the additive and multiplicative decompositions above, the additive residual still has some pattern left over, whereas the multiplicative residual is quite small and looks random. This tells us that the multiplicative decomposition is more applicable to this series

Stationary and Non-Stationary Time Series

A stationary series is one whose values are not a function of time, so they are independent of time

Statistical properties like mean, variance, and autocorrelation are constant over time. Autocorrelation is the correlation of the series with its own previous values

Stationary series are also independent of seasonal effects

Below is a comparison of some stationary and non-stationary time series:

[Figure: Stationary and Non-Stationary Time Series]

Making a Series Stationary

Why does a time series have to be stationary?

There are a few methods for making a series stationary:

  • Differencing
  • Taking the log
  • Taking the nth root
  • A combination of the above

Differencing

Differencing is simply subtracting the previous value from the current value

The first difference may not make the series stationary, in which case we can take further differences
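
A minimal sketch of first and second differencing using `np.diff`. The series is a made-up quadratic trend, chosen so that the first difference still trends upwards while the second difference is constant:

```python
import numpy as np

# Hypothetical series with a quadratic trend (illustrative values only)
t = np.arange(10)
series = 3.0 + 2.0 * t + 0.5 * t**2

# First difference: y[t] - y[t-1]; the result is one element shorter
first_diff = np.diff(series, n=1)

# Second difference: the difference of the first difference
second_diff = np.diff(series, n=2)
```

Here `first_diff` still increases by 1 at every step, so one round of differencing is not enough, while `second_diff` is flat.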

Why Convert a Non-Stationary Series into a Stationary One Before Forecasting

  • Forecasting a stationary series is relatively easy and more reliable
  • Autoregressive forecasting models are essentially linear regression models
  • Stationarizing a series removes any persistent autocorrelation and makes predictors of the series nearly independent

Testing For Stationarity

  • Look at a plot
  • Split the series into 2 or more contiguous parts and compute summary statistics (mean, variance, autocorrelation) - if these are similar across the parts then the series is likely stationary
  • Unit root tests can be done on the series:
    • Augmented Dickey-Fuller (ADF) test
    • Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test (trend stationary)
    • Phillips-Perron (PP) test
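
The split-and-compare check from the list above can be sketched as follows. The helper `split_summary` and both series are hypothetical, and this is only an informal check, not a replacement for a unit root test:

```python
import numpy as np

def split_summary(series, parts=2):
    """Return (mean, variance) for each contiguous chunk of the series."""
    chunks = np.array_split(np.asarray(series, dtype=float), parts)
    return [(chunk.mean(), chunk.var()) for chunk in chunks]

rng = np.random.default_rng(0)            # fixed seed, illustrative data only
noise = rng.normal(0.0, 1.0, size=400)    # constant mean and variance
trending = 0.05 * np.arange(400) + noise  # mean drifts upwards over time

noise_stats = split_summary(noise)
trend_stats = split_summary(trending)
```

The two halves of `noise` should have similar means and variances, while the second half of `trending` has a clearly higher mean than the first, which points to non-stationarity.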

Stationarity vs White Noise

Like a stationary series, white noise is not a function of time, and its mean and variance do not change over time

The difference between white noise and a stationary series is that white noise is completely random and does not contain any pattern
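
One way to see the difference is to compare the lag-1 correlation of white noise with that of a series that is (approximately) stationary but still patterned. The sketch below uses an AR(1) process with an assumed coefficient of 0.8. The helper `lag1_autocorr` is hypothetical:

```python
import numpy as np

def lag1_autocorr(x):
    """Sample correlation between x[t] and x[t-1]."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[:-1], x[1:])[0, 1]

rng = np.random.default_rng(42)
white_noise = rng.normal(0.0, 1.0, size=2000)

# AR(1) with coefficient 0.8: each value depends on the previous one
ar1 = np.zeros_like(white_noise)
for i in range(1, len(ar1)):
    ar1[i] = 0.8 * ar1[i - 1] + white_noise[i]

noise_corr = lag1_autocorr(white_noise)  # expected to be close to 0
ar1_corr = lag1_autocorr(ar1)            # expected to be well above 0
```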

Detrend a Time Series

Detrending a time series means removing the trend component, which can be done using the following methods:

  • Subtract the line of best fit
  • Subtract the trend obtained from decomposition
  • Subtract the mean
  • Apply a filter like the Baxter-King filter (statsmodels.tsa.filters.bkfilter) or the Hodrick-Prescott Filter (statsmodels.tsa.filters.hpfilter)

Subtract the line of best fit

from scipy import signal

detrended = signal.detrend(df['Number of Passengers'].values)
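
For reference, the same idea can be written out by hand with `np.polyfit`: fit a least-squares straight line and subtract it. As far as I understand, this is what `signal.detrend` does with its default linear setting. The series below is made up for illustration:

```python
import numpy as np

t = np.arange(50, dtype=float)
series = 4.0 + 0.7 * t + np.sin(t)  # hypothetical series with a linear trend

# Fit the line of best fit, then subtract it from the series
slope, intercept = np.polyfit(t, series, deg=1)
detrended_manual = series - (slope * t + intercept)
```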

Subtract the decomposition trend

from statsmodels.tsa.seasonal import seasonal_decompose
result_mul = seasonal_decompose(df['Number of Passengers'], model='multiplicative', period=30)
detrended = df['Number of Passengers'].values - result_mul.trend

Deseasonalize a Time Series

Some approaches for deseasonalizing a time series are as follows:

  • Take a moving average with the length of the seasonal window
  • Seasonal difference the series - subtract the previous season from the current one
  • Divide the series by the seasonal index from the STL decomposition

If dividing does not work well we can also take a log of the series and then restore it by taking an exponential

# Time Series Decomposition
result_mul = seasonal_decompose(df['Number of Passengers'], model='multiplicative', period=30)

# Deseasonalize
deseasonalized = df['Number of Passengers'].values / result_mul.seasonal
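
Seasonal differencing, from the list of approaches above, can be sketched with NumPy alone. The series and the 12-step seasonal window are assumptions made for illustration:

```python
import numpy as np

period = 12  # assumed seasonal window: 12 observations per cycle
t = np.arange(48)
seasonal = 10.0 * np.sin(2 * np.pi * t / period)
series = 50.0 + 0.5 * t + seasonal  # trend plus a repeating seasonal pattern

# Seasonal difference: subtract the value from one season earlier
seasonal_diff = series[period:] - series[:-period]
```

Because the seasonal pattern repeats exactly every 12 steps, it cancels out and only the change in the trend over one season (0.5 x 12 = 6) remains.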

Testing for Seasonality

To test for seasonality it can be simplest to plot the data, but if we want to inspect this more specifically we can use an Autocorrelation Function (ACF) plot. If there is a strong seasonal pattern the ACF plot will show repeated spikes at multiples of the seasonal window

Alternatively, a CHTest (Canova-Hansen test) can be used to determine if seasonal differencing is required
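
To make the ACF behaviour concrete, the sketch below computes sample autocorrelations by hand for a made-up series with a 12-step cycle. The helper `autocorr` is hypothetical and is not the statsmodels implementation:

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at the given positive lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

rng = np.random.default_rng(1)
t = np.arange(240)
# Hypothetical series: a 12-step seasonal cycle plus a little noise
series = np.sin(2 * np.pi * t / 12) + 0.2 * rng.normal(size=t.size)

acf_values = {lag: autocorr(series, lag) for lag in (6, 12, 18, 24)}
```

The values at lags 12 and 24 (multiples of the seasonal window) should be strongly positive, while the values at lags 6 and 18 (half a cycle out of phase) should be strongly negative.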

Autocorrelation and Partial Autocorrelation Functions

  • Autocorrelation is the correlation of a series with its own lags. If a series is significantly autocorrelated then previous values of the series can help predict the current value
  • Partial Autocorrelation is the pure correlation of a series with one of its lags, excluding the contribution from intermediate lags

Autocorrelation and Partial Autocorrelation can be found using statsmodels with the following code:

import matplotlib.pyplot as plt
# acf and pacf return the values; plot_acf and plot_pacf draw them
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Draw the ACF and PACF side by side for the first 50 lags
fig, axes = plt.subplots(1, 2, figsize=(16, 3), dpi=100)
plot_acf(df['Number of Passengers'].tolist(), lags=50, ax=axes[0])
plot_pacf(df['Number of Passengers'].tolist(), lags=50, ax=axes[1])

Lag Plots

A lag plot is a scatter plot of a time series against a lag of itself and is used to check for autocorrelation. If there is any pattern in the plot then the series is autocorrelated - if there is no pattern then the series is likely to be random

Granger Causality Test

Used to determine if one time series can be used to forecast another. It is based on the idea that if X causes Y then a forecast of Y based on previous values of both Y and X should outperform a forecast using only previous values of Y
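
The idea behind the test can be illustrated by comparing two regressions: one predicting Y from its own lag only, and one that also includes the lag of X. This is a sketch of the intuition on made-up data, not the formal test (statsmodels provides `grangercausalitytests` for that). The helper `residual_sum_of_squares` is hypothetical:

```python
import numpy as np

def residual_sum_of_squares(features, target):
    """Fit ordinary least squares with an intercept and return the RSS."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    return float(residuals @ residuals)

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
# Hypothetical data: y depends on its own previous value and on the previous x
for i in range(1, n):
    y[i] = 0.3 * y[i - 1] + 0.8 * x[i - 1] + 0.1 * rng.normal()

target = y[1:]
# Restricted model: lagged y only
rss_restricted = residual_sum_of_squares(y[:-1], target)
# Unrestricted model: lagged y and lagged x
lagged_both = np.column_stack([y[:-1], x[:-1]])
rss_unrestricted = residual_sum_of_squares(lagged_both, target)
```

A large drop from `rss_restricted` to `rss_unrestricted` suggests that past values of X carry information about Y beyond what past values of Y already provide.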

Smoothening a Time Series

Smoothening can be useful to:

  • Reduce the effect of noise
  • Produce a smoothened series that can be used as a feature to explain the original series
  • Visualize underlying trends

Some smoothening methods are:

  • Take a moving average
  • Do a LOESS smoothening (Localized regression)
  • Do a LOWESS smoothening (Locally weighted regression)

Moving Average

The average of the values in a rolling window of fixed width. A large window will over-smooth a series
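
The effect of the window size can be seen in a small sketch. The helper `moving_average`, the 24-step cycle, and the noise level are all assumptions made for illustration:

```python
import numpy as np

def moving_average(x, window):
    """Simple moving average; returns len(x) - window + 1 values."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(x, dtype=float), kernel, mode="valid")

rng = np.random.default_rng(5)
t = np.arange(120)
cycle = np.sin(2 * np.pi * t / 24)          # hypothetical 24-step cycle
noisy = cycle + 0.3 * rng.normal(size=t.size)

smooth_small = moving_average(noisy, 5)     # reduces noise, keeps the cycle
smooth_large = moving_average(noisy, 48)    # spans two cycles, flattens them
```

With a window of 5 the cycle is still clearly present, whereas a window of 48 averages over two full cycles and leaves an almost flat line.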

Localized regression

LOESS fits multiple regressions in the local neighborhood of each point
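
A simplified sketch of this idea: for each point, fit a straight line to its nearest neighbours, weighting closer neighbours more heavily with a tricube kernel. The function `local_linear_smooth` and its `frac` parameter are hypothetical, and it assumes distinct x values. It leaves out the robustness iterations found in full implementations such as the `lowess` function in statsmodels:

```python
import numpy as np

def local_linear_smooth(x, y, frac=0.3):
    """Fit a tricube-weighted line around each point and evaluate it there."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = max(3, int(np.ceil(frac * len(x))))  # neighbours used per local fit
    smoothed = np.empty_like(y)
    for i, x0 in enumerate(x):
        dist = np.abs(x - x0)
        idx = np.argsort(dist)[:k]           # k nearest neighbours of x0
        scaled = dist[idx] / dist[idx].max()
        weights = (1.0 - scaled**3) ** 3     # tricube kernel
        design = np.column_stack([np.ones(k), x[idx]])
        sw = np.sqrt(weights)                # weighted least squares
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y[idx] * sw, rcond=None)
        smoothed[i] = coef[0] + coef[1] * x0
    return smoothed

rng = np.random.default_rng(11)
x = np.linspace(0.0, 10.0, 100)
y = np.sin(x) + 0.3 * rng.normal(size=x.size)  # illustrative noisy data
y_smooth = local_linear_smooth(x, y, frac=0.3)
```

A smaller `frac` follows the data more closely, while a larger one gives a smoother curve.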