Time Series Classification

Time Series Classification with SKTime

SKTime Docs

SKTime Tasks

Resources

!pip install sktime
Successfully installed deprecated-1.2.13 sktime-0.13.4

Methodology

Using sktime for classification is similar to using it for forecasting: we can either use one of its predefined models or adapt existing sklearn models so that they work with time series data.

Importing Data

We can load the ArrowHead dataset and plot a few of its series:

import pandas as pd

from sktime.datasets import load_arrow_head
from sktime.utils.plotting import plot_series
from sklearn.model_selection import train_test_split
X, y = load_arrow_head()
X.head()
dim_0
0 0 -1.963009 1 -1.957825 2 -1.95614...
1 0 -1.774571 1 -1.774036 2 -1.77658...
2 0 -1.866021 1 -1.841991 2 -1.83502...
3 0 -2.073758 1 -2.073301 2 -2.04460...
4 0 -1.746255 1 -1.741263 2 -1.72274...
y[:5]
array(['0', '1', '2', '0', '1'], dtype='<U1')
X_0 = list(X['dim_0'][0])
plot_series(pd.Series(X_0))
(<Figure size 1152x288 with 1 Axes>, <AxesSubplot:>)
X_1 = list(X['dim_0'][1])
plot_series(pd.Series(X_1))
(<Figure size 1152x288 with 1 Axes>, <AxesSubplot:>)
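
The X.head() output above shows sktime's nested layout: one row per series, with a whole series stored inside each dim_0 cell. The sketch below rebuilds that layout by hand on made-up data to make the structure easier to see. The names raw, X_toy and X_flat are hypothetical and the values are random, not ArrowHead data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 3 series with 5 time points each (not ArrowHead).
rng = np.random.default_rng(0)
raw = rng.normal(size=(3, 5))

# Nested layout: one row per series; each "dim_0" cell holds a whole pd.Series.
X_toy = pd.DataFrame({"dim_0": [pd.Series(row) for row in raw]})

first = X_toy["dim_0"].iloc[0]   # the first series, itself a pd.Series
n_instances = len(X_toy)         # number of series in the panel
n_timepoints = len(first)        # length of the first series

# Stack the cells back into a 2D array (rows = series, columns = time points).
X_flat = np.vstack([cell.to_numpy() for cell in X_toy["dim_0"]])
print(n_instances, n_timepoints, X_flat.shape)
```

This is why the plotting cells above index X['dim_0'][0] before plotting: that lookup returns a single series, not a single number.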

Train/Test Split

Train/test splitting can be done with sklearn as usual, since each row is a separate series (observation):

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)
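
The call above uses sklearn's defaults, so the rows that land in the test set change from run to run. A possible variation, shown here on hypothetical stand-in data (X_demo and y_demo are not part of the notebook), fixes random_state for repeatability and passes stratify so that both parts keep similar class proportions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 12 rows with three balanced classes.
X_demo = np.arange(24).reshape(12, 2)
y_demo = np.array(["0", "1", "2"] * 4)

# random_state makes the split repeatable; stratify keeps class proportions
# similar in both parts. test_size=0.25 is also sklearn's default.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=42
)
print(len(y_tr), len(y_te), sorted(y_te))
```

The same keyword arguments should carry over to the nested X and y used in this notebook, since the split only selects rows.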

Using a Classifier

sktime has built-in classifiers that can be used like ordinary sklearn classifiers:

from sktime.classification.interval_based import TimeSeriesForestClassifier
classifier = TimeSeriesForestClassifier()
classifier.fit(X_train, y_train)
TimeSeriesForestClassifier()

And predictions can be made using the predict method:

y_pred = classifier.predict(X_test)
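
For intuition about what an interval-based classifier does, here is a rough sketch of the general idea: summarise each series on a handful of random intervals (mean, standard deviation, slope) and train a tree ensemble on those summaries. It is a simplified illustration and not sktime's implementation. The toy data, the interval_features helper and the single shared set of intervals are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical toy panel: 40 series of length 50, two classes.
# Class "1" gets a bump in the middle; class "0" is pure noise.
n_series, n_timepoints = 40, 50
X_panel = rng.normal(scale=0.3, size=(n_series, n_timepoints))
y_panel = np.array(["0", "1"] * (n_series // 2))
X_panel[y_panel == "1", 20:30] += 2.0


def interval_features(X, intervals):
    """Mean, standard deviation and slope of every series on every interval."""
    feats = []
    for start, end in intervals:
        window = X[:, start:end]
        t = np.arange(end - start)
        slope = np.polyfit(t, window.T, deg=1)[0]  # least-squares slope per series
        feats.extend([window.mean(axis=1), window.std(axis=1), slope])
    return np.column_stack(feats)


# Draw 8 random intervals of at least 3 points each (shared by all trees here).
starts = rng.integers(0, n_timepoints - 3, size=8)
intervals = [(s, rng.integers(s + 3, n_timepoints + 1)) for s in starts]

features = interval_features(X_panel, intervals)
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(features[:30], y_panel[:30])
toy_pred = forest.predict(features[30:])
print(features.shape, toy_pred)
```

In practice the built-in classifier above is the better choice; the sketch only shows why such a model can work directly on raw series without manual feature engineering.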

Model Evaluation

We can also check the accuracy using standard sklearn metrics, for example accuracy_score, and inspect the errors with a confusion matrix:

from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)
0.9056603773584906
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

matrix = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(matrix)
disp.plot()
<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x7f826205ddd0>
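
To connect the two metrics above, the following sketch derives the accuracy and the per-class recall from a confusion matrix. The labels in y_true_demo and y_pred_demo are made up for illustration and are not this notebook's predictions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels, not the notebook's predictions.
y_true_demo = np.array(["0", "0", "1", "1", "2", "2", "2"])
y_pred_demo = np.array(["0", "1", "1", "1", "2", "2", "0"])

cm = confusion_matrix(y_true_demo, y_pred_demo, labels=["0", "1", "2"])

# Rows are true classes and columns are predicted classes,
# so the diagonal counts the correct predictions.
accuracy_from_cm = np.trace(cm) / cm.sum()
recall_per_class = np.diag(cm) / cm.sum(axis=1)

print(cm)
print(accuracy_from_cm, accuracy_score(y_true_demo, y_pred_demo))
print(recall_per_class)
```

Per-class recall is worth a look when classes are imbalanced, because a single accuracy figure can hide a class that is rarely predicted correctly.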

Use with SKLearn Classifiers

sktime can also convert the data so that it works with sklearn's tabular classifiers. This is done by placing the Tabularizer transformer ahead of the classifier in an sklearn pipeline, so that each series is flattened into an ordinary row of features before the classifier sees it:

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import make_pipeline

from sktime.transformations.panel.reduce import Tabularizer
classifier = make_pipeline(Tabularizer(), GradientBoostingClassifier())
classifier.fit(X_train, y_train)
Pipeline(steps=[('tabularizer', Tabularizer()),
                ('gradientboostingclassifier', GradientBoostingClassifier())])
y_pred = classifier.predict(X_test)
accuracy_score(y_test, y_pred)
0.9056603773584906
matrix = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(matrix)
disp.plot()
<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x7f8261ebb810>
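
To show what the tabularizing step contributes, the sketch below uses a hand-written stand-in transformer that turns nested data into one column per time point and places it in a pipeline. NestedToTabular is a hypothetical class written for this example, not part of sktime, and it assumes a single dim_0 column with equal-length series. The toy data is random.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import make_pipeline


class NestedToTabular(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for Tabularizer: one output column per time point."""

    def fit(self, X, y=None):
        return self  # nothing to learn; the transformation is stateless

    def transform(self, X):
        # Assumes a single "dim_0" column whose cells are equal-length pd.Series.
        return np.vstack([cell.to_numpy() for cell in X["dim_0"]])


# Hypothetical toy panel: 30 series of length 20; class "1" is shifted upward.
rng = np.random.default_rng(0)
labels = np.array(["0", "1"] * 15)
raw = rng.normal(size=(30, 20)) + (labels == "1")[:, None] * 3.0
X_nested = pd.DataFrame({"dim_0": [pd.Series(row) for row in raw]})

pipe = make_pipeline(NestedToTabular(), GradientBoostingClassifier(random_state=0))
pipe.fit(X_nested.iloc[:20], labels[:20])
pipe_pred = pipe.predict(X_nested.iloc[20:])
print(NestedToTabular().fit_transform(X_nested).shape, pipe_pred)
```

One consequence of this approach is that the classifier treats every time point as an independent feature, so the ordering of the observations is not used by the model.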