ARIMA 모형
ARMA(p,q) 모형은 AR(p) 모형과 MA(q) 모형이 합쳐진 모형입니다.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_excel('beer.xlsx')
df.head()
quarter | production | |
---|---|---|
0 | 1956-01 | 284 |
1 | 1956-04 | 213 |
2 | 1956-07 | 227 |
3 | 1956-10 | 308 |
4 | 1957-01 | 262 |
y = df.production
y.plot()
<Axes: >
ARMA(1, 1)
from statsmodels.tsa.api import SARIMAX
arma = SARIMAX(y, order=(1, 0, 1)).fit()
arma.summary()
Dep. Variable: | production | No. Observations: | 218 |
---|---|---|---|
Model: | SARIMAX(1, 0, 1) | Log Likelihood | -1168.017 |
Date: | Sat, 01 Jul 2023 | AIC | 2342.034 |
Time: | 19:01:18 | BIC | 2352.187 |
Sample: | 0 | HQIC | 2346.135 |
- 218 | |||
Covariance Type: | opg |
coef | std err | z | P>|z| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
ar.L1 | 0.9998 | 0.001 | 1031.315 | 0.000 | 0.998 | 1.002 |
ma.L1 | -0.8493 | 0.040 | -21.298 | 0.000 | -0.927 | -0.771 |
sigma2 | 2576.5261 | 301.003 | 8.560 | 0.000 | 1986.571 | 3166.482 |
Ljung-Box (L1) (Q): | 0.52 | Jarque-Bera (JB): | 19.61 |
---|---|---|---|
Prob(Q): | 0.47 | Prob(JB): | 0.00 |
Heteroskedasticity (H): | 0.97 | Skew: | 0.68 |
Prob(H) (two-sided): | 0.88 | Kurtosis: | 2.45 |
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
y.plot()
arma.predict(0, 250).plot()
<Axes: >
from statsmodels.tsa.stattools import kpss
kpss(y)
C:\Users\eupho\AppData\Local\Temp\ipykernel_28656\4027067367.py:2: InterpolationWarning: The test statistic is outside of the range of p-values available in the look-up table. The actual p-value is smaller than the p-value returned. kpss(y)
(1.1416279223770989, 0.01, 9, {'10%': 0.347, '5%': 0.463, '2.5%': 0.574, '1%': 0.739})
yd = y.diff(1).dropna()
yd.plot()
<Axes: >
kpss(yd)
C:\Users\eupho\AppData\Local\Temp\ipykernel_28656\757044700.py:1: InterpolationWarning: The test statistic is outside of the range of p-values available in the look-up table. The actual p-value is greater than the p-value returned. kpss(yd)
(0.12919635959269035, 0.1, 17, {'10%': 0.347, '5%': 0.463, '2.5%': 0.574, '1%': 0.739})
d_arma = SARIMAX(yd, order=(1, 0, 1)).fit()
d_arma.summary()
c:\Users\eupho\anaconda3\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: An unsupported index was provided and will be ignored when e.g. forecasting. self._init_dates(dates, freq) c:\Users\eupho\anaconda3\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: An unsupported index was provided and will be ignored when e.g. forecasting. self._init_dates(dates, freq) c:\Users\eupho\anaconda3\lib\site-packages\statsmodels\tsa\statespace\sarimax.py:978: UserWarning: Non-invertible starting MA parameters found. Using zeros as starting parameters. warn('Non-invertible starting MA parameters found.'
Dep. Variable: | production | No. Observations: | 217 |
---|---|---|---|
Model: | SARIMAX(1, 0, 1) | Log Likelihood | -1160.621 |
Date: | Sat, 01 Jul 2023 | AIC | 2327.242 |
Time: | 19:01:32 | BIC | 2337.382 |
Sample: | 0 | HQIC | 2331.338 |
- 217 | |||
Covariance Type: | opg |
coef | std err | z | P>|z| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
ar.L1 | -0.0363 | 0.132 | -0.276 | 0.782 | -0.294 | 0.222 |
ma.L1 | -0.8450 | 0.050 | -16.926 | 0.000 | -0.943 | -0.747 |
sigma2 | 2573.5432 | 317.043 | 8.117 | 0.000 | 1952.150 | 3194.936 |
Ljung-Box (L1) (Q): | 0.26 | Jarque-Bera (JB): | 18.56 |
---|---|---|---|
Prob(Q): | 0.61 | Prob(JB): | 0.00 |
Heteroskedasticity (H): | 0.95 | Skew: | 0.65 |
Prob(H) (two-sided): | 0.82 | Kurtosis: | 2.42 |
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
yd.plot()
d_arma.predict(0, 250).plot()
c:\Users\eupho\anaconda3\lib\site-packages\statsmodels\tsa\base\tsa_model.py:836: ValueWarning: No supported index is available. Prediction results will be given with an integer index beginning at `start`. return get_prediction_index( c:\Users\eupho\anaconda3\lib\site-packages\statsmodels\tsa\base\tsa_model.py:836: FutureWarning: No supported index is available. In the next version, calling this method in a model without a supported index will result in an exception. return get_prediction_index(
<Axes: >
ARIMA(p, d, q) 모형은 d번 차분한 값을 ARMA(p, q)으로 분석하는 것과 같습니다.
arima = SARIMAX(y, order=(1, 1, 1)).fit()
arima.summary()
c:\Users\eupho\anaconda3\lib\site-packages\statsmodels\tsa\statespace\sarimax.py:978: UserWarning: Non-invertible starting MA parameters found. Using zeros as starting parameters. warn('Non-invertible starting MA parameters found.'
Dep. Variable: | production | No. Observations: | 218 |
---|---|---|---|
Model: | SARIMAX(1, 1, 1) | Log Likelihood | -1160.613 |
Date: | Sat, 01 Jul 2023 | AIC | 2327.227 |
Time: | 19:01:49 | BIC | 2337.366 |
Sample: | 0 | HQIC | 2331.323 |
- 218 | |||
Covariance Type: | opg |
coef | std err | z | P>|z| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
ar.L1 | -0.0364 | 0.132 | -0.277 | 0.782 | -0.294 | 0.222 |
ma.L1 | -0.8450 | 0.050 | -16.924 | 0.000 | -0.943 | -0.747 |
sigma2 | 2573.5643 | 317.059 | 8.117 | 0.000 | 1952.139 | 3194.989 |
Ljung-Box (L1) (Q): | 0.26 | Jarque-Bera (JB): | 18.56 |
---|---|---|---|
Prob(Q): | 0.61 | Prob(JB): | 0.00 |
Heteroskedasticity (H): | 0.95 | Skew: | 0.65 |
Prob(H) (two-sided): | 0.82 | Kurtosis: | 2.42 |
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
y.plot()
arima.predict(0, 250).plot()
<Axes: >
계절성 ARIMA 모형은 계절성에 대해서 별도의 ARIMA 모형으로 분석합니다. seasonal_order
의 마지막 값은 계절성의 주기를 나타냅니다.
sarima = SARIMAX(y, order=(1, 1, 0), seasonal_order=(1, 0, 0, 4)).fit()
sarima.summary()
Dep. Variable: | production | No. Observations: | 218 |
---|---|---|---|
Model: | SARIMAX(1, 1, 0)x(1, 0, 0, 4) | Log Likelihood | -973.448 |
Date: | Sat, 01 Jul 2023 | AIC | 1952.896 |
Time: | 19:04:53 | BIC | 1963.036 |
Sample: | 0 | HQIC | 1956.992 |
- 218 | |||
Covariance Type: | opg |
coef | std err | z | P>|z| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
ar.L1 | -0.5972 | 0.048 | -12.414 | 0.000 | -0.691 | -0.503 |
ar.S.L4 | 0.9516 | 0.018 | 54.190 | 0.000 | 0.917 | 0.986 |
sigma2 | 440.2050 | 42.021 | 10.476 | 0.000 | 357.846 | 522.564 |
Ljung-Box (L1) (Q): | 22.50 | Jarque-Bera (JB): | 0.01 |
---|---|---|---|
Prob(Q): | 0.00 | Prob(JB): | 0.99 |
Heteroskedasticity (H): | 1.60 | Skew: | -0.01 |
Prob(H) (two-sided): | 0.05 | Kurtosis: | 3.03 |
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
y.plot()
sarima.predict(0, 250).plot()
<Axes: >
계절성 ARIMA 모형은 계절성의 주기를 제외하고 p, d, q, P, D, Q 모두 6개의 값을 설정해야 합니다. 아래 코드는 이 6개의 값에 0, 1, 2 세 가지를 대입하여 총 729가지 모형을 시도합니다. 그중 AIC가 가장 낮은 모형을 best_sarima
에 설정합니다.
from itertools import combinations_with_replacement
import numpy as np
import warnings
warnings.filterwarnings("ignore")
min_aic = np.inf
for ar, ma, seasonal_ar, seasonal_ma in combinations_with_replacement([0, 1, 2], 4):
try:
sarima = SARIMAX(y, order=(ar, 1, ma), seasonal_order=(seasonal_ar, 0, seasonal_ma, 4)).fit()
print(ar, ma, seasonal_ar, seasonal_ma, sarima.aic)
except:
pass
else:
if sarima.aic < min_aic:
min_aic = sarima.aic
best_sarima = sarima
warnings.resetwarnings()
0 0 0 0 2446.993870307465 0 0 0 0 2446.993870307465 0 0 0 1 2269.0870570757484 0 0 0 1 2269.0870570757484 0 0 0 2 2205.6411732862007 0 0 0 2 2205.6411732862007 0 0 1 1 1968.7883160730275 0 0 1 1 1968.7883160730275 0 0 1 2 1970.3592365281647 0 0 2 2 1965.3952928674607 0 0 2 2 1965.3952928674607 0 1 1 1 1840.469834804714 0 1 1 1 1840.469834804714 0 1 1 2 1840.7075395781462 0 1 2 2 1836.4792775263727 0 1 2 2 1836.4792775263727 0 2 2 2 1822.26570408957 0 2 2 2 1822.26570408957 1 1 1 1 1827.969616228781 1 1 1 2 1828.8672615487212 1 1 2 2 1824.7291286441794 1 2 2 2 1824.2634784092993
최적의 모델은 다음과 같습니다.
best_sarima.summary()
Dep. Variable: | production | No. Observations: | 218 |
---|---|---|---|
Model: | SARIMAX(0, 1, 2)x(2, 0, 2, 4) | Log Likelihood | -904.133 |
Date: | Sat, 01 Jul 2023 | AIC | 1822.266 |
Time: | 19:13:00 | BIC | 1845.925 |
Sample: | 0 | HQIC | 1831.823 |
- 218 | |||
Covariance Type: | opg |
coef | std err | z | P>|z| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
ma.L1 | -0.9570 | 0.065 | -14.787 | 0.000 | -1.084 | -0.830 |
ma.L2 | 0.3070 | 0.067 | 4.596 | 0.000 | 0.176 | 0.438 |
ar.S.L4 | 0.1125 | 0.101 | 1.119 | 0.263 | -0.084 | 0.310 |
ar.S.L8 | 0.8793 | 0.100 | 8.824 | 0.000 | 0.684 | 1.075 |
ma.S.L4 | 0.2259 | 0.105 | 2.144 | 0.032 | 0.019 | 0.432 |
ma.S.L8 | -0.7199 | 0.075 | -9.636 | 0.000 | -0.866 | -0.573 |
sigma2 | 228.0371 | 21.475 | 10.619 | 0.000 | 185.947 | 270.127 |
Ljung-Box (L1) (Q): | 0.01 | Jarque-Bera (JB): | 4.62 |
---|---|---|---|
Prob(Q): | 0.92 | Prob(JB): | 0.10 |
Heteroskedasticity (H): | 1.29 | Skew: | -0.05 |
Prob(H) (two-sided): | 0.29 | Kurtosis: | 3.71 |
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).