ARIMA. Time series forecasting

We learned in the previous article that the ARMA approach combines autoregressive (AR) and moving average (MA) components. In this article, we will talk about the ARIMA process.

So what exactly is ARIMA modeling?

ARIMA (Auto Regressive Integrated Moving Average), like ARMA, is a class of models that describe the behavior of a series based on its own historical values. The “integrated” part refers to differencing, the process that brings a time series to stationarity.

Any “non-seasonal” time series that shows a pattern can usually be predicted using the ARIMA process.

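Each component is specified as a model parameter, using the standard notation ARIMA(p, d, q): p is the order of the autoregressive (AR) term, d is the number of differences needed to make the series stationary, and q is the order of the moving average (MA) term. In other words, the predicted value (of the differenced series) = constant + linear combination of the last p lags of the series + linear combination of the last q forecast errors.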
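Written out in the usual textbook form, this is the equation below. Here $y'_t$ is the series after differencing d times (for d = 1, $y'_t = y_t - y_{t-1}$), $\varepsilon_t$ is white noise, and the symbols $c$, $\phi_i$, $\theta_j$ are our own notation rather than the article's. Sign conventions for the MA coefficients differ between references.

```latex
y'_t = c + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} + \varepsilon_t
```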

Next, we find d (the order of differencing). As mentioned, the goal is to make the time series stationary. But you need to be careful here, because you can over-difference the series. So how do you determine the correct order of d? The correct order is the minimum amount of differencing required to obtain a near-stationary series that fluctuates around a constant mean. It is also worth checking the plot of the autocorrelation function (ACF): for a properly differenced series, the autocorrelations drop to zero fairly quickly after the first lag.

If the autocorrelations are positive for many lags (10 or more), the series needs further differencing. But if the lag-1 autocorrelation is itself strongly negative (a common rule of thumb is around -0.5 or lower), the series has probably been over-differenced. If you cannot choose between two orders of differencing, calculate the standard deviation of each differenced series and select the order with the lowest value.

Example

In this example, we used Google stock prices (NASDAQ: GOOGL) from September 2015 to June 2020. The data was retrieved through an API and loaded into CaseWare IDEA for further analysis. To build the model, we used the statsmodels package for Python.
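If the data is exported from IDEA to a CSV file, one way to get it into Python is sketched below. The column names `Date` and `Close`, the file name, and the helper `prepare_prices` are all assumptions made for illustration.

```python
import pandas as pd


def prepare_prices(frame, date_col="Date", price_col="Close"):
    """Turn an exported table into a date-indexed, sorted price Series.

    The default column names are hypothetical; adjust them to the actual export.
    """
    dates = pd.DatetimeIndex(pd.to_datetime(frame[date_col]))
    prices = pd.Series(frame[price_col].to_numpy(dtype=float), index=dates, name=price_col)
    return prices.sort_index().dropna()


# prices = prepare_prices(pd.read_csv("googl_2015_2020.csv"))  # hypothetical file name
```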

Stationarity check. Determining d

If the series is already stationary, then d = 0 and the model reduces to ARMA. To check this, we use the augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the time series is non-stationary. If the p-value is less than the significance level (0.05), the null hypothesis is rejected and the time series is considered stationary.

Graph 1. Stock price data (NASDAQ: GOOGL)

ADF statistic = -0.665414

P-value = 0.855506

Since the p-value is far greater than the significance level, we cannot reject the null hypothesis, so the series is not stationary. Let’s look at the autocorrelation graph.

Graph 2. Autocorrelation at d = 0

The graph clearly shows that the data have a trend, which confirms that the series is non-stationary.

Let’s look at the data and autocorrelation plots for d = 1 and d = 2.

Graph 3. Data at d = 1

Graph 4. Autocorrelation at d = 1

Graph 5. Data at d = 2

Graph 6. Autocorrelation at d = 2

With d = 2, graph 6 shows the second lag going far into negative territory, which tells us that differencing twice is too much for this series.

With d = 1, graph 4 shows that the series is stationary.

So we start with a differencing order of d = 1.

AR (autoregression) order. Determining p

Partial autocorrelation (PACF) plot

The partial autocorrelation of a time series helps find the order of the autoregressive model. Any autocorrelation in a stationary series can be corrected by adding enough AR terms. So we initially set the order of the AR term equal to the number of lags that cross the significance limit on the PACF plot.

Graph 7. Partial autocorrelation at d = 1.

Looking at the graph, both 1 and 2 are plausible values. We start with p = 1.

Moving average (MA) order. Determining q

Similarly, for the chosen order d = 1, you can look at the ACF plot to determine the MA order. Technically speaking, the MA term is a lagged forecast error. In graph 4 we do not see any significant spikes, so I suggest starting with q = 1.

Building the ARIMA model

Let’s use the ARIMA model from the statsmodels package and compare two models:

ARIMA(1,1,1): AIC = 664.26

ARIMA(2,1,1): AIC = 664.50

The Akaike information criterion (AIC) rules out the second model, since the model with the lower value is considered better. The AR and MA coefficients are also statistically significant. Looking at the residuals (graph 8), we see a fairly constant variance and residuals that are close to zero on average.

Graph 9 shows the actual and predicted values.

Graph 8. Residuals and variance of ARIMA(1,1,1)

Graph 9. Forecasted values of the ARIMA(1,1,1) model

That’s all. We save the results and graphs and add them to reports. In CaseWare IDEA, you can create a report right away, recording the data statistics and the corresponding graphs with the results.