from sktime import *
from sktime.datasets import load_airline
from sktime.utils.plotting import plot_series
y = load_airline()
plot_series(y)
(<Figure size 1536x384 with 1 Axes>,
<AxesSubplot: ylabel='Number of airline passengers'>)
Plot the data.
If necessary, transform the data (using a Box-Cox transformation) to stabilise the variance.
3. If the data are non-stationary, take first differences of the data until the data are stationary.
4. Examine the ACF/PACF to identify a suitable model.
5. Try your chosen model(s), and use the AICc to search for a better model.
Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
Once the residuals look like white noise, calculate forecasts.
Plot the data.
If necessary, transform the data (using a Box-Cox transformation) to stabilise the variance.
Use AutoARIMA to select a model.
Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
Once the residuals look like white noise, calculate forecasts.
Take other important visualizations
import numpy as np
import pandas as pd
from sktime.forecasting.base import ForecastingHorizon
fh = ForecastingHorizon(
pd.PeriodIndex(pd.date_range("1960-01", periods=12, freq="M")), is_relative=False
)
fh
ForecastingHorizon(['1960-01', '1960-02', '1960-03', '1960-04', '1960-05', '1960-06',
'1960-07', '1960-08', '1960-09', '1960-10', '1960-11', '1960-12'],
dtype='period[M]', is_relative=False)
from sktime.forecasting.statsforecast import StatsForecastAutoARIMA
import numpy as np
forecaster = StatsForecastAutoARIMA(
sp=12, max_p=2, max_q=2
)
y_train.naturallog = np.log(y_train)
forecaster.fit(y_train.naturallog)
StatsForecastAutoARIMA(max_p=2, max_q=2, sp=12)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
StatsForecastAutoARIMA(max_p=2, max_q=2, sp=12)
sp: Number of observations per unit of time.
Help: https://www.sktime.org/en/stable/api_reference/auto_generated/sktime.forecasting.statsforecast.StatsForecastAutoARIMA.html
Preferm residual analysis
Coverage | ||
---|---|---|
0.9 | ||
lower | upper | |
1960-01 | 5.978520 | 6.099244 |
1960-02 | 5.916820 | 6.062361 |
1960-03 | 6.062679 | 6.229384 |
1960-04 | 6.026940 | 6.212408 |
1960-05 | 6.058205 | 6.260705 |
1960-06 | 6.195598 | 6.413804 |
1960-07 | 6.316247 | 6.549103 |
1960-08 | 6.321487 | 6.568124 |
1960-09 | 6.136959 | 6.396648 |
1960-10 | 6.000121 | 6.272235 |
1960-11 | 5.865717 | 6.149713 |
1960-12 | 5.966786 | 6.262187 |
y_test.naturallog = np.log(y_test)
from sktime.utils import plotting
# also requires predictions
y_pred = forecaster.predict()
fig, ax = plotting.plot_series(y_train.naturallog, y_pred, labels=["y", "y_pred"])
ax.fill_between(
ax.get_lines()[-1].get_xdata(),
y_pred_ints["Coverage"][coverage]["lower"],
y_pred_ints["Coverage"][coverage]["upper"],
alpha=0.2,
color=ax.get_lines()[-1].get_c(),
label=f"{coverage}% prediction intervals",
)
ax.legend();
Step 1: Select the number of differences d and D via unit root tests and strength of seasonality measure.
Step 2: Try four possible models to start with:
\(ARIMA(2, d, 2)\) if \(m = 1\) and \(ARIMA(2, d, 2)(1, D, 1)_m\) if \(m > 1\).
\(ARIMA(0, d, 0)\) if \(m = 1\) and \(ARIMA(0, d, 0)(0, D, 0)_m\) if \(m > 1\).
\(ARIMA(1, d, 0)\) if \(m = 1\) and \(ARIMA(1, d, 0)(1, D, 0)_m\) if \(m > 1\).
\(ARIMA(0, d, 1)\) if \(m = 1\) and \(ARIMA(0, d, 1)(0, D, 1)_m\) if \(m > 1\).
Step 3: Select the model with the smallest AICc from step 2. This becomes the current model.
Step 4: Consider up to 13 variations on the current model:
Vary one of \(p, q, P\) and \(Q\) from the current model by \(\pm 1\).
\(p, q\) both vary from the current model by \(\pm 1\).
\(P, Q\) both vary from the current model by \(\pm 1\).
Include or exclude the constant term from the current model. Repeat step 4 until no lower AICc can be found.