Time Series Analysis
Notes from the Complete Guide on Time Series Analysis in Python by Prashant Banerjee
Introduction
Time series analysis studies data collected over time to gain insights in ways such as:
- Forecasting
- Signal Processing
- Pattern Recognition
Components of Time Series Data
- Trends - increasing, decreasing, or horizontal (stationary)
- Seasonality - a pattern that repeats over time at a fixed frequency
- Cyclical Component - does not necessarily repeat at a fixed interval but reflects actual correlations arising from the nature of the time series data
- Irregular variation - fluctuations that become visible when the trend and cyclical variations are removed - may or may not be random
- ETS Decomposition - Error, Trend, and Seasonality - separate components of a time series
Types of Data
- Time series data - data collected over points in time
- Cross sectional data - Data of one or more variables recorded at the same point in time
- Pooled data - combination of the above
Terminology
- Dependence - association of two observations of the same variable at different time periods
- Stationarity - statistical properties such as the mean remain constant over time
- Differencing - used to make a series stationary and to control for autocorrelation
- Specification - testing the relationships of variables
- Exponential smoothing - method used for short term predictions
- Curve fitting - regression done when data is non-linear
- ARIMA - Auto Regressive Integrated Moving Average
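As a small illustration of the exponential smoothing entry above, here is a minimal sketch of simple exponential smoothing using pandas. The series values and the smoothing factor alpha are made up for illustration; with adjust=False each smoothed value is alpha * current value + (1 - alpha) * previous smoothed value.

```python
import pandas as pd

# Hypothetical observations (values invented for illustration)
series = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0])

# alpha is an assumed smoothing factor; closer to 1 reacts faster to new data
alpha = 0.5

# adjust=False gives the recursive form: s[t] = alpha * y[t] + (1 - alpha) * s[t-1]
smoothed = series.ewm(alpha=alpha, adjust=False).mean()

# In simple exponential smoothing the last smoothed value is the one-step-ahead forecast
next_forecast = smoothed.iloc[-1]

print(smoothed.tolist())  # [10.0, 11.0, 11.0, 13.0, 13.5]
print(next_forecast)      # 13.5
```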
Patterns in Time Series
- A Trend is observed when there is an increasing or decreasing slope
- Seasonality is observed when there is a distinct repeated pattern at a specific frequency
- Cyclic behaviour occurs when the rise and fall is not at a fixed frequency; this is different from seasonality
A series does not need to have both a trend and seasonality, but it will usually have at least one of them
Additive and Multiplicative Time Series
- Additive - Value = Base Level + Trend + Seasonality + Error
- Multiplicative - Value = Base Level x Trend x Seasonality x Error
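An additive series is the sum of its components, while a multiplicative series is their product. The sketch below builds one of each from hypothetical components (all numbers are invented, and the error term is left out to keep it short). It also shows that taking logs turns a multiplicative series into an additive one.

```python
import numpy as np

# Hypothetical components for 48 observations (illustrative values only)
t = np.arange(48)
base_level = 100.0

# Additive: components are summed, so the seasonal swing keeps a constant size
trend = 2.0 * t
seasonality = 10.0 * np.sin(2 * np.pi * t / 12)
additive = base_level + trend + seasonality

# Multiplicative: components are multiplied, so the seasonal swing grows with the level
trend_factor = 1.0 + 0.02 * t
seasonal_factor = 1.0 + 0.1 * np.sin(2 * np.pi * t / 12)
multiplicative = base_level * trend_factor * seasonal_factor

# Taking logs converts the product into a sum of log-components
log_multiplicative = np.log(multiplicative)
log_sum = np.log(base_level) + np.log(trend_factor) + np.log(seasonal_factor)
print(np.allclose(log_multiplicative, log_sum))  # True
```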
Decomposition of a Time Series
Decomposition can be performed by considering the series as either additive or multiplicative
Decomposition can be done using statsmodels like so:
from statsmodels.tsa.seasonal import seasonal_decompose
# For a monthly decomposition (period = 30)
multiplicative_decomposition = seasonal_decompose(df['Number of Passengers'], model='multiplicative', period=30)
additive_decomposition = seasonal_decompose(df['Number of Passengers'], model='additive', period=30)
The above can also be plotted using the plot method defined:
multiplicative_decomposition.plot()
additive_decomposition.plot()
In the above series, comparing the additive and multiplicative residuals, we can see that the additive one has some pattern left over whereas the residual in the multiplicative one is quite small and random. This tells us that the multiplicative decomposition is more applicable to the series
Stationary and Non-Stationary Time Series
A stationary series is one whose values are not a function of time, so they are independent of time
Statistical properties like mean, variance, and autocorrelation are constant over time - autocorrelation is the correlation of the series with its own previous values
Stationary series are independent of seasonal effects
Below is a comparison of some stationary and non-stationary time series:
Making a Series Stationary
There are a few methods for making a series stationary
- Differencing
- Taking the Log
- Take the nth root
- Combination of the above
Differencing
Differencing is simply subtracting the previous value from the current value
The first difference may not make the series stationary, in which case we can take further differences
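A minimal sketch of first and second differencing with pandas, using a made-up series with a quadratic trend. The first difference still trends upward, while the second difference is constant.

```python
import pandas as pd

# Hypothetical series with a quadratic trend (values invented for illustration)
series = pd.Series([1, 4, 9, 16, 25, 36], dtype=float)

first_diff = series.diff()          # y[t] - y[t-1]; the first value is NaN
second_diff = series.diff().diff()  # difference of the first difference

print(first_diff.dropna().tolist())   # [3.0, 5.0, 7.0, 9.0, 11.0]
print(second_diff.dropna().tolist())  # [2.0, 2.0, 2.0, 2.0]
```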
Why Convert a Non-Stationary Series into a Stationary One Before Forecasting
- Forecasting a stationary series is relatively easy and more reliable
- Autoregressive forecasting models are essentially linear regression models that use lags of the series itself as predictors
- Stationarizing a series removes any persistent autocorrelation and makes the predictors (the lags of the series) nearly independent
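To make the "autoregression is linear regression" point concrete, the sketch below simulates a hypothetical AR(1) process and recovers its coefficient by regressing each value on the previous one with ordinary least squares. The coefficient 0.7, the sample size, and the seed are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a hypothetical AR(1) process: y[t] = 0.7 * y[t-1] + noise
n, phi = 2000, 0.7
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal()

# Regress y[t] on y[t-1]: design matrix is an intercept column plus the lagged series
X = np.column_stack([np.ones(n - 1), y[:-1]])
target = y[1:]
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
intercept, phi_hat = coef

# phi_hat should land close to the true coefficient used in the simulation
print(round(phi_hat, 2))
```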
Testing For Stationarity
- Look at a plot
- Split the series into 2 or more contiguous parts and compute summary statistics (mean, variance, autocorrelation) - if these are similar then the series is likely stationary
- Unit root tests can be done on the series:
  - Augmented Dickey Fuller (ADF) test
  - Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test (trend stationary)
  - Phillips-Perron (PP) test
Stationarity vs White Noise
Like a stationary series, white noise is not a function of time, and its mean and variance do not change over time
The difference between white noise and a stationary series is that white noise is completely random and contains no pattern at all
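One way to see the difference is through autocorrelation. In the sketch below (simulated data, assumed parameters) a 5-point moving average of white noise is still stationary but has a clear pattern: neighbouring values are strongly correlated. The white noise itself shows almost no correlation between neighbouring values.

```python
import numpy as np

rng = np.random.default_rng(1)
white_noise = rng.normal(size=5000)

# A 5-point moving average of white noise: stationary, but neighbouring values overlap
smooth = np.convolve(white_noise, np.ones(5) / 5, mode="valid")


def lag1_autocorr(x):
    # Correlation between the series and itself shifted by one step
    return np.corrcoef(x[:-1], x[1:])[0, 1]


noise_corr = lag1_autocorr(white_noise)  # close to 0
smooth_corr = lag1_autocorr(smooth)      # close to 0.8 in theory
print(round(noise_corr, 2), round(smooth_corr, 2))
```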
Detrend a Time Series
Detrending a time series means removing the trend component, which can be done using the following methods:
- Subtract the line of best fit
- Subtract the trend obtained from decomposition
- Subtract the mean
- Apply a filter like the Baxter-King filter (statsmodels.tsa.filters.bkfilter) or the Hodrick-Prescott filter (statsmodels.tsa.filters.hpfilter)
Subtract the line of best fit
from scipy import signal
detrended = signal.detrend(df['Number of Passengers'].values)
Subtract the decomposition trend
from statsmodels.tsa.seasonal import seasonal_decompose
result_mul = seasonal_decompose(df['Number of Passengers'], model='multiplicative', period=30)
detrended = df['Number of Passengers'].values - result_mul.trend
Deseasonalize a Time Series
Some approaches for deseasonalizing a time series are as follows:
- Take a moving average with the length of the seasonal window
- Seasonal difference the series - subtract the previous season from the current one
- Divide the series by the seasonal index from the STL decomposition
If dividing does not work well we can also take a log of the series, deseasonalize it, and then restore the original scale by taking an exponential
# Time Series Decomposition
result_mul = seasonal_decompose(df['Number of Passengers'], model='multiplicative', period=30)
# Deseasonalize
deseasonalized = df['Number of Passengers'].values / result_mul.seasonal
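The first two approaches in the list above (moving average and seasonal differencing) can be sketched with pandas alone. The series here is synthetic, with an assumed seasonal window of 12 and a linear trend of 0.5 per step, so the expected results are easy to check by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical series: level 50, trend 0.5 per step, seasonal cycle of length 12
t = np.arange(72)
series = pd.Series(50 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12))

# Seasonal differencing: subtract the value from one season (12 steps) earlier.
# The seasonal term cancels and only the trend over 12 steps (0.5 * 12 = 6) remains.
seasonal_diff = series.diff(12)

# Moving average over one full seasonal window: the seasonal term averages out to 0,
# leaving only the level and trend
moving_avg = series.rolling(window=12).mean()

print(seasonal_diff.dropna().round(6).unique())  # [6.]
```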
Testing for Seasonality
To test for seasonality it can be simplest to plot the data, but if we want to inspect this more specifically we can use an Autocorrelation Function (ACF) plot. If there is a strong seasonal pattern, the ACF plot will show repeated spikes at multiples of the seasonal window
Alternatively, a Canova-Hansen test (CHTest) can be used to determine whether seasonal differencing is required
Autocorrelation and Partial Autocorrelation Functions
- Autocorrelation is the correlation of a series with its own lags. If a series is significantly autocorrelated then previous values of the series can help predict the current value
- Partial Autocorrelation is the pure correlation of a series with one of its lags, without the contribution from intermediate lags
Autocorrelation and Partial Autocorrelation can be found using statsmodels with the following code:
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
# Draw Plot
fig, axes = plt.subplots(1,2,figsize=(16,3), dpi= 100)
plot_acf(df['Number of Passengers'].tolist(), lags=50, ax=axes[0])
plot_pacf(df['Number of Passengers'].tolist(), lags=50, ax=axes[1])
Lag Plots
A lag plot is a scatter plot of a time series against a lag of itself and is used to check for autocorrelation. If there is any pattern in the plot then the series is autocorrelated - if there is no pattern then the series is likely to be random
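The points of a lag plot are simply pairs of each value with the value that follows it. The sketch below builds those pairs with pandas for two simulated series and summarises each scatter with a correlation. The helper lag_pairs is hypothetical; pandas also provides pandas.plotting.lag_plot to draw the plot directly.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# Two hypothetical series: pure noise, and a noisy upward trend
random_series = pd.Series(rng.normal(size=500))
trending_series = pd.Series(np.arange(500) + rng.normal(scale=5.0, size=500))


def lag_pairs(series, lag=1):
    """Return the (y[t], y[t+lag]) pairs that a lag plot would scatter."""
    pairs = pd.DataFrame({"y_t": series, "y_t_plus_lag": series.shift(-lag)})
    return pairs.dropna()


# A correlation near 0 means a shapeless cloud; near 1 means points along a line
random_corr = lag_pairs(random_series).corr().iloc[0, 1]
trending_corr = lag_pairs(trending_series).corr().iloc[0, 1]
print(round(random_corr, 2), round(trending_corr, 2))
```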
Granger Causality Test
Used to determine if one time series can be used to forecast another. It is based on the idea that if X causes Y then a forecast of Y based on previous values of both Y and X should outperform a forecast using only previous values of Y
Smoothening a Time Series
Smoothening can be useful to:
- Reduce effect of noise
- Provide a smoothed version of the series that can be used as a feature to explain the original series
- Visualize underlying trends
Some smoothening methods are:
- Take a moving average
- Do a LOESS smoothening (Localized regression)
- Do a LOWESS smoothening (Locally weighted regression)
Moving Average
A moving average is the average over a rolling window of fixed width. A large window will over-smooth the series
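A minimal sketch with pandas rolling on a made-up series, comparing a small and a large window. The window sizes are arbitrary choices for illustration; center=True places each average at the middle of its window.

```python
import pandas as pd

# Hypothetical noisy series (values invented for illustration)
series = pd.Series([3, 5, 4, 8, 6, 9, 7, 12, 10, 11], dtype=float)

# A small window follows the data closely; a large window flattens it out
small_window = series.rolling(window=3, center=True).mean()
large_window = series.rolling(window=7, center=True).mean()

# Positions without a full window are NaN, so larger windows lose more points
print(small_window.notna().sum())  # 8
print(large_window.notna().sum())  # 4
```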
Localized regression
LOESS fits multiple regressions in the local neighborhood of each point
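A minimal sketch of this kind of smoothing using the lowess function from statsmodels, which implements the locally weighted (LOWESS) variant. The data is simulated, and the two frac values (the fraction of points used for each local regression) are assumptions chosen to show a lighter and a heavier smooth.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(5)

# Hypothetical data: a slow sine wave with added noise
x = np.arange(200, dtype=float)
signal_true = np.sin(x / 30.0)
y = signal_true + rng.normal(scale=0.3, size=x.size)

# lowess returns an array of (x, smoothed y) rows sorted by x; keep the smoothed column
smoothed_5 = lowess(y, x, frac=0.05)[:, 1]   # 5% of points per local fit
smoothed_15 = lowess(y, x, frac=0.15)[:, 1]  # 15% of points per local fit

# Mean squared error against the noise-free signal, before and after smoothing
mse_raw = np.mean((y - signal_true) ** 2)
mse_smoothed = np.mean((smoothed_15 - signal_true) ** 2)
print(mse_raw > mse_smoothed)
```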