Time Series Classification
Time Series Classification with SKTime
Resources
!pip install sktime
Methodology
Using sktime for classification is similar to using it for forecasting: we can either use one of its predefined models or adapt an existing sklearn model so that it works with time series data.
Importing Data
We can import the ArrowHead dataset and plot some of its entries:
import pandas as pd
from sktime.datasets import load_arrow_head
from sktime.utils.plotting import plot_series
from sklearn.model_selection import train_test_split
X, y = load_arrow_head()
X.head()
| | dim_0 |
|---|---|
| 0 | 0 -1.963009 1 -1.957825 2 -1.95614... |
| 1 | 0 -1.774571 1 -1.774036 2 -1.77658... |
| 2 | 0 -1.866021 1 -1.841991 2 -1.83502... |
| 3 | 0 -2.073758 1 -2.073301 2 -2.04460... |
| 4 | 0 -1.746255 1 -1.741263 2 -1.72274... |
y[:5]
array(['0', '1', '2', '0', '1'], dtype='<U1')
X_0 = list(X['dim_0'][0])
plot_series(pd.Series(X_0))
(<Figure size 1152x288 with 1 Axes>, <AxesSubplot:>)
X_1 = list(X['dim_0'][1])
plot_series(pd.Series(X_1))
(<Figure size 1152x288 with 1 Axes>, <AxesSubplot:>)
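The X.head() output shows that each cell of dim_0 holds a whole series, not a single number. As a minimal sketch of that nested layout, the following builds a small synthetic frame of the same shape and unpacks it into a plain 2D array with one row per series. The values and the helper name nested_to_2d are made up for illustration and are not part of sktime.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the nested frame: 3 series, 5 time points each.
rng = np.random.default_rng(0)
cells = np.empty(3, dtype=object)
for i in range(3):
    cells[i] = pd.Series(rng.normal(size=5))
nested = pd.DataFrame({"dim_0": cells})

def nested_to_2d(frame, column="dim_0"):
    """Stack the series stored in one nested column into (n_series, n_timepoints).

    Hypothetical helper; assumes all series have the same length.
    """
    return np.vstack([np.asarray(cell) for cell in frame[column]])

values = nested_to_2d(nested)
print(values.shape)  # (3, 5)
```

Checking the shape this way is a quick sanity test that all series have equal length before handing the data to a classifier.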
Train/Test Split
Train/test splitting can be done with sklearn as usual, since each row is a separate series (observation):
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)
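Called with no extra arguments, train_test_split shuffles randomly and holds out 25% of the rows, so results change from run to run and small classes can end up unevenly represented. Below is a minimal sketch, on synthetic labels instead of the ArrowHead data, of fixing the seed and stratifying by class. The seed value and the data are arbitrary assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 60 series of length 20, three balanced classes.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 20))
y_demo = np.repeat(["0", "1", "2"], 20)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo,
    y_demo,
    test_size=0.25,    # same fraction as the default
    stratify=y_demo,   # keep class proportions equal in both parts
    random_state=42,   # arbitrary seed, makes the split repeatable
)

# Each class should appear equally often in the held-out part.
test_labels, test_counts = np.unique(y_te, return_counts=True)
print(len(y_tr), len(y_te))
print(test_counts.tolist())
```

The same two arguments can be passed when splitting the real X and y, which makes the accuracy figures easier to reproduce.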
Using a Classifier
sktime has built-in classifiers that can be used like ordinary sklearn classifiers:
from sktime.classification.interval_based import TimeSeriesForestClassifier
classifier = TimeSeriesForestClassifier()
classifier.fit(X_train, y_train)
TimeSeriesForestClassifier()
Predictions can then be made using the predict method:
y_pred = classifier.predict(X_test)
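Interval-based classifiers such as the time series forest summarise sub-intervals of each series with simple statistics (mean, standard deviation, slope) and train trees on those summaries. The following is a simplified sketch of that idea in numpy and sklearn, not sktime's implementation: it draws one shared set of random intervals for the whole forest, whereas the actual algorithm varies the intervals across trees. The helper interval_features, the synthetic data, and the number of intervals are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def interval_features(X, intervals):
    """Mean, standard deviation and slope of each row of X over each interval."""
    columns = []
    for start, end in intervals:
        window = X[:, start:end]
        t = np.arange(end - start)
        t_centered = t - t.mean()
        # Least-squares slope of each row against time.
        centered = window - window.mean(axis=1, keepdims=True)
        slope = centered @ t_centered / (t_centered @ t_centered)
        columns += [window.mean(axis=1), window.std(axis=1), slope]
    return np.column_stack(columns)

rng = np.random.default_rng(0)
n_timepoints = 50
# Two synthetic classes: flat noise vs. noise with an upward trend.
flat = rng.normal(size=(40, n_timepoints))
trend = rng.normal(size=(40, n_timepoints)) + np.linspace(0, 3, n_timepoints)
X_demo = np.vstack([flat, trend])
y_demo = np.array(["flat"] * 40 + ["trend"] * 40)

# Draw 8 random intervals, each at least 3 points long.
starts = rng.integers(0, n_timepoints - 3, size=8)
intervals = [(s, rng.integers(s + 3, n_timepoints + 1)) for s in starts]

features = interval_features(X_demo, intervals)
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(features, y_demo)
print(features.shape)  # (80, 24)
```

Each interval contributes three columns, so eight intervals give 24 features per series regardless of the series length.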
Model Evaluation
We can also evaluate the model using standard sklearn metrics, for example accuracy_score:
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)
0.9056603773584906
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
matrix = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(matrix)
disp.plot()
<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x7f826205ddd0>
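To read the plotted matrix: in sklearn's convention rows are true labels and columns are predicted labels, so correct predictions sit on the diagonal and accuracy is the diagonal sum divided by the total. A small sketch with made-up labels (not the results above):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration only.
y_true = ["0", "0", "1", "1", "2", "2", "2"]
y_hat = ["0", "1", "1", "1", "2", "2", "0"]

# Rows are true classes, columns are predicted classes.
demo_matrix = confusion_matrix(y_true, y_hat, labels=["0", "1", "2"])
print(demo_matrix)
# [[1 1 0]
#  [0 2 0]
#  [1 0 2]]

# Accuracy is the share of predictions that fall on the diagonal.
demo_accuracy = np.trace(demo_matrix) / demo_matrix.sum()
print(np.isclose(demo_accuracy, accuracy_score(y_true, y_hat)))  # True
```

Off-diagonal cells show which classes get confused with each other, which a single accuracy number hides.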
Use with SKLearn Classifiers
sktime can also convert time series data so that it can be used with sklearn tabular classifiers. This is done by placing the Tabularizer transformer ahead of the classifier in an sklearn pipeline:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import make_pipeline
from sktime.transformations.panel.reduce import Tabularizer
classifier = make_pipeline(Tabularizer(), GradientBoostingClassifier())
classifier.fit(X_train, y_train)
Pipeline(steps=[('tabularizer', Tabularizer()), ('gradientboostingclassifier', GradientBoostingClassifier())])
y_pred = classifier.predict(X_test)
accuracy_score(y_test, y_pred)
0.9056603773584906
matrix = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(matrix)
disp.plot()
<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x7f8261ebb810>
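Tabularizing amounts to laying every time point out as its own column, so that each series becomes one flat row a tabular model can consume. The sketch below makes that step explicit using sklearn's FunctionTransformer as a stand-in for Tabularizer on a synthetic 3D array (instances, dimensions, time points). The helper flatten_panel and the data are assumptions for illustration, and the sketch does not claim to mirror Tabularizer's internals.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

def flatten_panel(X):
    """Reshape (n_instances, n_dims, n_timepoints) to (n_instances, n_dims * n_timepoints)."""
    return X.reshape(X.shape[0], -1)

rng = np.random.default_rng(0)
# Synthetic panel: 40 univariate series of length 30; class "1" is shifted upward.
panel = rng.normal(size=(40, 1, 30))
panel[20:] += 2.0
panel_labels = np.array(["0"] * 20 + ["1"] * 20)

# The flattening step runs before the classifier on both fit and predict.
model = make_pipeline(
    FunctionTransformer(flatten_panel),
    GradientBoostingClassifier(random_state=0),
)
model.fit(panel, panel_labels)
print(flatten_panel(panel).shape)  # (40, 30)
```

Note that once flattened, the classifier treats time points as unordered columns, so any benefit from temporal structure has to come from the model picking up on individual positions.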