ARIMA 모형 :: 시계열 분석 - mindscale
Skip to content

ARIMA 모형

ARMA(p,q) 모형은 AR(p) 모형과 MA(q) 모형이 합쳐진 모형입니다.

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_excel('beer.xlsx')
df.head()
quarter production
0 1956-01 284
1 1956-04 213
2 1956-07 227
3 1956-10 308
4 1957-01 262
y = df.production
y.plot()
<Axes: >

ARMA(1, 1)

from  statsmodels.tsa.api import SARIMAX

arma = SARIMAX(y, order=(1, 0, 1)).fit()
arma.summary() 
SARIMAX Results
Dep. Variable: production No. Observations: 218
Model: SARIMAX(1, 0, 1) Log Likelihood -1168.017
Date: Sat, 01 Jul 2023 AIC 2342.034
Time: 19:01:18 BIC 2352.187
Sample: 0 HQIC 2346.135
- 218
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
ar.L1 0.9998 0.001 1031.315 0.000 0.998 1.002
ma.L1 -0.8493 0.040 -21.298 0.000 -0.927 -0.771
sigma2 2576.5261 301.003 8.560 0.000 1986.571 3166.482
Ljung-Box (L1) (Q): 0.52 Jarque-Bera (JB): 19.61
Prob(Q): 0.47 Prob(JB): 0.00
Heteroskedasticity (H): 0.97 Skew: 0.68
Prob(H) (two-sided): 0.88 Kurtosis: 2.45




Warnings:

[1] Covariance matrix calculated using the outer product of gradients (complex-step).
y.plot()
arma.predict(0, 250).plot()
<Axes: >
from statsmodels.tsa.stattools import kpss
kpss(y)
C:\Users\eupho\AppData\Local\Temp\ipykernel_28656\4027067367.py:2: InterpolationWarning: The test statistic is outside of the range of p-values available in the
look-up table. The actual p-value is smaller than the p-value returned.

  kpss(y)
(1.1416279223770989,
 0.01,
 9,
 {'10%': 0.347, '5%': 0.463, '2.5%': 0.574, '1%': 0.739})
yd = y.diff(1).dropna()
yd.plot()
<Axes: >
kpss(yd)
C:\Users\eupho\AppData\Local\Temp\ipykernel_28656\757044700.py:1: InterpolationWarning: The test statistic is outside of the range of p-values available in the
look-up table. The actual p-value is greater than the p-value returned.

  kpss(yd)
(0.12919635959269035,
 0.1,
 17,
 {'10%': 0.347, '5%': 0.463, '2.5%': 0.574, '1%': 0.739})
d_arma = SARIMAX(yd, order=(1, 0, 1)).fit()
d_arma.summary() 
c:\Users\eupho\anaconda3\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: An unsupported index was provided and will be ignored when e.g. forecasting.
  self._init_dates(dates, freq)
c:\Users\eupho\anaconda3\lib\site-packages\statsmodels\tsa\base\tsa_model.py:473: ValueWarning: An unsupported index was provided and will be ignored when e.g. forecasting.
  self._init_dates(dates, freq)
c:\Users\eupho\anaconda3\lib\site-packages\statsmodels\tsa\statespace\sarimax.py:978: UserWarning: Non-invertible starting MA parameters found. Using zeros as starting parameters.
  warn('Non-invertible starting MA parameters found.'
SARIMAX Results
Dep. Variable: production No. Observations: 217
Model: SARIMAX(1, 0, 1) Log Likelihood -1160.621
Date: Sat, 01 Jul 2023 AIC 2327.242
Time: 19:01:32 BIC 2337.382
Sample: 0 HQIC 2331.338
- 217
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
ar.L1 -0.0363 0.132 -0.276 0.782 -0.294 0.222
ma.L1 -0.8450 0.050 -16.926 0.000 -0.943 -0.747
sigma2 2573.5432 317.043 8.117 0.000 1952.150 3194.936
Ljung-Box (L1) (Q): 0.26 Jarque-Bera (JB): 18.56
Prob(Q): 0.61 Prob(JB): 0.00
Heteroskedasticity (H): 0.95 Skew: 0.65
Prob(H) (two-sided): 0.82 Kurtosis: 2.42




Warnings:

[1] Covariance matrix calculated using the outer product of gradients (complex-step).
yd.plot()
d_arma.predict(0, 250).plot()
c:\Users\eupho\anaconda3\lib\site-packages\statsmodels\tsa\base\tsa_model.py:836: ValueWarning: No supported index is available. Prediction results will be given with an integer index beginning at `start`.
  return get_prediction_index(
c:\Users\eupho\anaconda3\lib\site-packages\statsmodels\tsa\base\tsa_model.py:836: FutureWarning: No supported index is available. In the next version, calling this method in a model without a supported index will result in an exception.
  return get_prediction_index(
<Axes: >

ARIMA(p, d, q) 모형은 d번 차분한 값을 ARMA(p, q)으로 분석하는 것과 같습니다.

arima = SARIMAX(y, order=(1, 1, 1)).fit()
arima.summary() 
c:\Users\eupho\anaconda3\lib\site-packages\statsmodels\tsa\statespace\sarimax.py:978: UserWarning: Non-invertible starting MA parameters found. Using zeros as starting parameters.
  warn('Non-invertible starting MA parameters found.'
SARIMAX Results
Dep. Variable: production No. Observations: 218
Model: SARIMAX(1, 1, 1) Log Likelihood -1160.613
Date: Sat, 01 Jul 2023 AIC 2327.227
Time: 19:01:49 BIC 2337.366
Sample: 0 HQIC 2331.323
- 218
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
ar.L1 -0.0364 0.132 -0.277 0.782 -0.294 0.222
ma.L1 -0.8450 0.050 -16.924 0.000 -0.943 -0.747
sigma2 2573.5643 317.059 8.117 0.000 1952.139 3194.989
Ljung-Box (L1) (Q): 0.26 Jarque-Bera (JB): 18.56
Prob(Q): 0.61 Prob(JB): 0.00
Heteroskedasticity (H): 0.95 Skew: 0.65
Prob(H) (two-sided): 0.82 Kurtosis: 2.42




Warnings:

[1] Covariance matrix calculated using the outer product of gradients (complex-step).
y.plot()
arima.predict(0, 250).plot()
<Axes: >

계절성 ARIMA 모형은 계절성에 대해서 별도의 ARIMA 모형으로 분석합니다. seasonal_order의 마지막 값은 계절성의 주기를 나타냅니다.

sarima = SARIMAX(y, order=(1, 1, 0), seasonal_order=(1, 0, 0, 4)).fit()
sarima.summary()
SARIMAX Results
Dep. Variable: production No. Observations: 218
Model: SARIMAX(1, 1, 0)x(1, 0, 0, 4) Log Likelihood -973.448
Date: Sat, 01 Jul 2023 AIC 1952.896
Time: 19:04:53 BIC 1963.036
Sample: 0 HQIC 1956.992
- 218
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
ar.L1 -0.5972 0.048 -12.414 0.000 -0.691 -0.503
ar.S.L4 0.9516 0.018 54.190 0.000 0.917 0.986
sigma2 440.2050 42.021 10.476 0.000 357.846 522.564
Ljung-Box (L1) (Q): 22.50 Jarque-Bera (JB): 0.01
Prob(Q): 0.00 Prob(JB): 0.99
Heteroskedasticity (H): 1.60 Skew: -0.01
Prob(H) (two-sided): 0.05 Kurtosis: 3.03




Warnings:

[1] Covariance matrix calculated using the outer product of gradients (complex-step).
y.plot()
sarima.predict(0, 250).plot()
<Axes: >

계절성 ARIMA 모형은 계절성의 주기를 제외하고 p, d, q, P, D, Q 모두 6개의 값을 설정해야 합니다. 아래 코드는 이 6개의 값에 0, 1, 2 세 가지를 대입하여 총 729가지 모형을 시도합니다. 그중 AIC가 가장 낮은 모형을 best_sarima에 설정합니다.

from itertools import combinations_with_replacement
import numpy as np
import warnings

warnings.filterwarnings("ignore")

min_aic = np.inf
for ar, ma, seasonal_ar, seasonal_ma in combinations_with_replacement([0, 1, 2], 4):
    try:
        sarima = SARIMAX(y, order=(ar, 1, ma), seasonal_order=(seasonal_ar, 0, seasonal_ma, 4)).fit()
        print(ar, ma, seasonal_ar, seasonal_ma, sarima.aic)
    except:
        pass
    else:
        if sarima.aic < min_aic:
            min_aic = sarima.aic
            best_sarima = sarima
warnings.resetwarnings()
0 0 0 0 2446.993870307465
0 0 0 0 2446.993870307465
0 0 0 1 2269.0870570757484
0 0 0 1 2269.0870570757484
0 0 0 2 2205.6411732862007
0 0 0 2 2205.6411732862007
0 0 1 1 1968.7883160730275
0 0 1 1 1968.7883160730275
0 0 1 2 1970.3592365281647
0 0 2 2 1965.3952928674607
0 0 2 2 1965.3952928674607
0 1 1 1 1840.469834804714
0 1 1 1 1840.469834804714
0 1 1 2 1840.7075395781462
0 1 2 2 1836.4792775263727
0 1 2 2 1836.4792775263727
0 2 2 2 1822.26570408957
0 2 2 2 1822.26570408957
1 1 1 1 1827.969616228781
1 1 1 2 1828.8672615487212
1 1 2 2 1824.7291286441794
1 2 2 2 1824.2634784092993

최적의 모델은 다음과 같습니다.

best_sarima.summary()
SARIMAX Results
Dep. Variable: production No. Observations: 218
Model: SARIMAX(0, 1, 2)x(2, 0, 2, 4) Log Likelihood -904.133
Date: Sat, 01 Jul 2023 AIC 1822.266
Time: 19:13:00 BIC 1845.925
Sample: 0 HQIC 1831.823
- 218
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
ma.L1 -0.9570 0.065 -14.787 0.000 -1.084 -0.830
ma.L2 0.3070 0.067 4.596 0.000 0.176 0.438
ar.S.L4 0.1125 0.101 1.119 0.263 -0.084 0.310
ar.S.L8 0.8793 0.100 8.824 0.000 0.684 1.075
ma.S.L4 0.2259 0.105 2.144 0.032 0.019 0.432
ma.S.L8 -0.7199 0.075 -9.636 0.000 -0.866 -0.573
sigma2 228.0371 21.475 10.619 0.000 185.947 270.127
Ljung-Box (L1) (Q): 0.01 Jarque-Bera (JB): 4.62
Prob(Q): 0.92 Prob(JB): 0.10
Heteroskedasticity (H): 1.29 Skew: -0.05
Prob(H) (two-sided): 0.29 Kurtosis: 3.71




Warnings:

[1] Covariance matrix calculated using the outer product of gradients (complex-step).