Equity Return Modeling and Prediction Using Hybrid ARIMA-GARCH Model

In this paper, a hybrid ARIMA-GARCH model is proposed to model and predict the equity returns for three US benchmark indices: Dow Transportation, S&P 500 and VIX. Equity returns are univariate time series data sets, and one of the methods to predict them is the Auto-Regressive Integrated Moving Average (ARIMA) model. Although ARIMA models are powerful and flexible, they are not able to handle the volatility and nonlinearity present in time series data. The Generalized Autoregressive Conditional Heteroscedasticity (GARCH) models, by contrast, are designed to capture volatility clustering behavior in time series. In this paper, we provide motivations for and descriptions of the hybrid ARIMA-GARCH model. A complete data analysis procedure, involving a series of hypothesis tests and a model fitting procedure based on the Akaike Information Criterion (AIC), is also provided. Simulation results of out-of-sample predictions are given as a reference.

ARIMA models are applied in cases where the data show evidence of non-stationarity; an initial differencing step (the "integrated" part of the model) is then applied to reduce the non-stationarity. The ARIMA process generates non-stationary series that are integrated of order $D$. Such processes are often called difference-stationary or unit root processes.
A time series $y_t$ that can be modeled as a stationary ARMA($p$, $q$) process after being differenced $D$ times is denoted ARIMA($p$, $D$, $q$) and follows

$$\Delta^D y_t = c + \sum_{i=1}^{p} \phi_i \, \Delta^D y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}, \qquad (1)$$

where $\Delta^D y_t = (1-L)^D y_t$, $c$ is a constant, $\phi_i$ and $\theta_j$ are the AR and MA coefficients, and $\varepsilon_t$ is the innovation. We can also rewrite (1) with lag operator notation, where $L y_t = y_{t-1}$:

$$\Big(1 - \sum_{i=1}^{p} \phi_i L^i\Big)(1-L)^D y_t = c + \Big(1 + \sum_{j=1}^{q} \theta_j L^j\Big)\varepsilon_t. \qquad (2)$$

ARIMA models can be estimated following the Box-Jenkins approach, an iterative three-stage procedure of model identification, parameter estimation and model checking. Since the differenced series must be stationary for an ARIMA model to apply, when an observed (non-seasonal) time series exhibits a trend, the series is differenced to remove it.
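The differencing operator $(1-L)^D$ can be illustrated with a minimal Python sketch. The helper names `difference` and `integrate_once` are hypothetical and chosen only for illustration; `integrate_once` undoes a single differencing step given the initial level, which is how forecasts of a differenced series are mapped back to the original scale.

```python
def difference(series, d=1):
    """Apply the differencing operator (1 - L)^d to a list of numbers."""
    out = list(series)
    for _ in range(d):
        out = [cur - prev for prev, cur in zip(out, out[1:])]
    return out


def integrate_once(diffs, start):
    """Invert one differencing step: rebuild levels from first differences and the initial level."""
    levels = [start]
    for dy in diffs:
        levels.append(levels[-1] + dy)
    return levels


squares = [1, 4, 9, 16, 25]
print(difference(squares, 1))  # [3, 5, 7, 9]
print(difference(squares, 2))  # [2, 2, 2]: a quadratic trend needs D = 2
print(integrate_once(difference(squares, 1), squares[0]))  # [1, 4, 9, 16, 25]
```

Log returns are exactly the first differences of log prices, so modeling returns with $D = 0$ is closely related to modeling log prices with $D = 1$.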

GARCH Model
The GARCH models are commonly employed in modeling financial time series that exhibit time-varying volatility clustering. GARCH models attempt to address volatility clustering in an innovations process. Volatility clustering occurs when an innovations process does not exhibit significant autocorrelation, but the variance of the process changes with time. If a series exhibits volatility clustering, past variances might be predictive of the current variance. The GARCH($r$, $s$) model writes the innovation as

$$\varepsilon_t = \sigma_t z_t, \qquad z_t \sim \text{i.i.d.}(0, 1), \qquad (3)$$

$$\sigma_t^2 = \omega + \sum_{i=1}^{s} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{r} \beta_j \sigma_{t-j}^2, \qquad (4)$$

where $\omega$ is a constant term, $r$ is the order of the GARCH terms $\sigma_{t-j}^2$, which represents the number of lagged conditional variances, and $s$ is the order of the ARCH terms $\varepsilon_{t-i}^2$, which represents the number of lagged squared innovations. $\alpha_i$ and $\beta_j$ are the coefficients of the ARCH and GARCH terms, respectively.

Hybrid ARIMA-GARCH Model
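We combine the ARIMA and GARCH models specified in Sections 2.1 and 2.2: the ARIMA part describes the conditional mean, and the GARCH part describes the conditional variance of its innovations. The hybrid ARIMA($p$, $D$, $q$)-GARCH($r$, $s$) model is

$$\Big(1 - \sum_{i=1}^{p} \phi_i L^i\Big)(1-L)^D y_t = c + \Big(1 + \sum_{j=1}^{q} \theta_j L^j\Big)\varepsilon_t, \qquad (5)$$

$$\varepsilon_t = \sigma_t z_t, \qquad \sigma_t^2 = \omega + \sum_{i=1}^{s} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{r} \beta_j \sigma_{t-j}^2, \qquad (6)$$

where $\phi_i$ and $\theta_j$ are the AR and MA coefficients, $\omega$ is a constant, $\alpha_i$ and $\beta_j$ are the ARCH and GARCH coefficients, and $z_t$ is i.i.d. with zero mean and unit variance. In the rest of the paper, we apply the hybrid ARIMA($p$, $D$, $q$)-GARCH($r$, $s$) model specified in (5) and (6) to model and predict the equity returns.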
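How (5) and (6) interact can be seen in a conditional Gaussian log-likelihood for returns ($D = 0$): the mean equation turns observations into innovations, and the innovations drive the variance recursion. This is a sketch under stated assumptions, not the paper's estimator: the function name is hypothetical, $z_t$ is assumed Gaussian, the first $p$ observations are conditioned on, pre-sample innovations are set to zero, and pre-sample variances to the mean squared residual.

```python
import math


def arma_garch_loglik(y, c, phi, theta, omega, alpha, beta):
    """Conditional Gaussian log-likelihood of an ARMA(p, q)-GARCH(r, s) model.

    Assumes D = 0 (y is already a stationary return series), Gaussian z_t,
    conditioning on the first p observations and zero pre-sample innovations.
    """
    p, q = len(phi), len(theta)
    eps = [0.0] * len(y)
    # Mean equation (5): eps_t = y_t - c - sum phi_i y_{t-i} - sum theta_j eps_{t-j}
    for t in range(p, len(y)):
        mean = c + sum(phi[i - 1] * y[t - i] for i in range(1, p + 1))
        mean += sum(theta[j - 1] * eps[t - j] for j in range(1, q + 1) if t - j >= 0)
        eps[t] = y[t] - mean
    resid = eps[p:]
    # Variance equation (6), initialized at the mean squared residual (assumption)
    init = sum(e * e for e in resid) / len(resid)
    sq, var = [init] * len(alpha), [init] * len(beta)
    loglik = 0.0
    for e in resid:
        v = (omega
             + sum(a * sq[-i] for i, a in enumerate(alpha, start=1))
             + sum(b * var[-j] for j, b in enumerate(beta, start=1)))
        loglik -= 0.5 * (math.log(2.0 * math.pi) + math.log(v) + e * e / v)
        sq.append(e * e)
        var.append(v)
    return loglik


returns = [0.01, -0.02, 0.015, -0.005, 0.02, -0.01]
print(arma_garch_loglik(returns, 0.0, [0.1], [0.2], 1e-5, [0.1], [0.8]))
```

Maximizing such a function over $(c, \phi_i, \theta_j, \omega, \alpha_i, \beta_j)$, subject to $\omega > 0$ and $\alpha_i, \beta_j \ge 0$, yields maximum likelihood estimates, and the maximized value supplies $\ln \hat{L}$ for AIC comparisons.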

Analysis and Modeling of Financial Data
In this section, we provide a complete data analysis and model fitting procedure for the returns of three equity benchmark indices: Dow Transportation, S&P 500 and VIX. The historical data used for modeling in this section range from 12/01/2006 to 12/01/2016.

Fetching and Preprocessing of Historical Data
We fetch the historical data from FRED (Federal Reserve Economic Data) using the FRED Application Programming Interface (API). For the hybrid ARIMA-GARCH model proposed in Section 2, we fetch the index data for all three benchmarks as our original financial data series. We retrieve 10 years of daily historical data for the targeted indices to obtain a more reliable model fit.
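A request to the FRED `series/observations` endpoint can be assembled with the standard library alone. This is a sketch, not the paper's retrieval code: the series IDs `DJTA`, `SP500` and `VIXCLS` are assumptions to verify on the FRED website (coverage of a series may also differ from what was available at the time of writing), `YOUR_API_KEY` is a placeholder, and the helper names are hypothetical. No network call is made here.

```python
from urllib.parse import urlencode

FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"

# Assumed FRED series IDs for the three benchmarks (verify before use).
SERIES_IDS = {"Dow Transportation": "DJTA", "S&P 500": "SP500", "VIX": "VIXCLS"}


def observations_url(series_id, api_key, start="2006-12-01", end="2016-12-01"):
    """Build a FRED API request URL for daily observations between start and end."""
    query = urlencode({
        "series_id": series_id,
        "api_key": api_key,
        "file_type": "json",
        "observation_start": start,
        "observation_end": end,
    })
    return f"{FRED_OBSERVATIONS_URL}?{query}"


def observations_from_json(payload):
    """Extract (date, raw value) pairs from a decoded FRED JSON response."""
    return [(obs["date"], obs["value"]) for obs in payload["observations"]]


for name, series_id in SERIES_IDS.items():
    print(name, observations_url(series_id, api_key="YOUR_API_KEY"))
```

Values come back as strings, with missing observations reported as `"."`, so they still need the cleaning step described next.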
Since we aim to predict the daily returns of the targeted benchmark indices, we delete all missing and invalid data from the fetched series and transform the price index data into daily return series. Figure 1 shows the index levels and daily returns for all three benchmarks of interest.
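The cleaning and transformation step can be sketched as follows. The paper does not say whether simple or logarithmic returns are used, so the sketch supports both and defaults to log returns as an assumption; the function names are hypothetical.

```python
import math


def clean_prices(raw_values):
    """Parse raw observations and drop missing ('.'), unparsable, non-finite or non-positive values."""
    prices = []
    for value in raw_values:
        try:
            x = float(value)
        except (TypeError, ValueError):
            continue  # e.g. FRED's '.' placeholder or None
        if math.isfinite(x) and x > 0:
            prices.append(x)
    return prices


def daily_returns(prices, log=True):
    """Consecutive returns: log(P_t / P_{t-1}) if log (assumed default), else P_t / P_{t-1} - 1."""
    return [math.log(cur / prev) if log else cur / prev - 1.0
            for prev, cur in zip(prices, prices[1:])]


raw = ["1400.0", ".", "1410.5", "1395.2"]
print(daily_returns(clean_prices(raw)))
```

Dropping a missing day means the following return spans two trading days; for a few holidays this is usually negligible, but it is a modeling choice worth stating.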

Statistical Hypothesis Testings of Historical Data
First, we test whether the time series are stationary. We apply the Phillips-Perron test because, compared with the Dickey-Fuller test, it is robust to unspecified autocorrelation and heteroscedasticity in the disturbance process of the test equation. The Phillips-Perron test tests the null hypothesis that a time series has a unit root.
Additionally, the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test is considered in this paper. Its null hypothesis is that the series is (trend-)stationary, so in the KPSS framework the absence of a unit root is evidence of trend-stationarity rather than proof of stationarity. This distinction matters because a time series can be non-stationary and have no unit root, yet be trend-stationary. (In this paper, the testing results show that the price index data of VIX are non-stationary, but trend-stationary with no unit root.) The KPSS test is intended to complement the Phillips-Perron test.
In addition, the Leybourne-McCabe stationarity test is applied to the time series data; it tests the null hypothesis that the data are trend-stationary against the alternative that the data are non-stationary.
The results of the above Phillips-Perron, KPSS and Leybourne-McCabe tests show that the price indices of Dow Transportation, S&P 500 and VIX are not stationary. However, the results of the hypothesis tests indicate that the daily returns of the three benchmark indices are stationary.
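Of the three tests, the KPSS statistic is short enough to sketch directly: $\eta = T^{-2} \sum_{t=1}^{T} S_t^2 / \hat{\sigma}^2_{LR}$, where $S_t$ are partial sums of the residuals after removing a level (or a level and a linear trend) and $\hat{\sigma}^2_{LR}$ is a Bartlett-weighted long-run variance. The pure-Python function below is an illustrative sketch, not the paper's implementation; its name and the default bandwidth rule are assumptions. Large values reject the null of stationarity.

```python
def kpss_statistic(y, lags=None, trend=False):
    """KPSS statistic; the null hypothesis is level- (or trend-) stationarity.

    lags: bandwidth of the Bartlett-weighted long-run variance estimate.
    The default int(4 * (n / 100) ** 0.25) is an assumed rule of thumb.
    """
    n = len(y)
    if lags is None:
        lags = int(4 * (n / 100) ** 0.25)
    y_bar = sum(y) / n
    if trend:
        # Residuals from an OLS regression of y on an intercept and a time trend
        t_bar = (n - 1) / 2
        sxx = sum((t - t_bar) ** 2 for t in range(n))
        slope = sum((t - t_bar) * (v - y_bar) for t, v in enumerate(y)) / sxx
        resid = [v - y_bar - slope * (t - t_bar) for t, v in enumerate(y)]
    else:
        resid = [v - y_bar for v in y]
    # Partial sums S_t of the residuals
    partial, running = [], 0.0
    for e in resid:
        running += e
        partial.append(running)
    # Newey-West long-run variance with Bartlett weights
    lrv = sum(e * e for e in resid) / n
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1)
        lrv += 2.0 * weight * sum(resid[t] * resid[t - k] for t in range(k, n)) / n
    return sum(s * s for s in partial) / (n * n * lrv)


# Asymptotic 5% critical values (Kwiatkowski et al., 1992)
KPSS_CRITICAL_5PCT = {"level": 0.463, "trend": 0.146}

print(kpss_statistic(list(range(200))) > KPSS_CRITICAL_5PCT["level"])  # True: a trend is not level-stationary
```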
The autocorrelation and partial autocorrelation properties of the daily returns are also studied; the corresponding results are shown in Figure 2.
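The sample autocorrelation and partial autocorrelation functions behind plots of this kind can be sketched in pure Python, with the PACF obtained from the ACF by the Durbin-Levinson recursion. This is an illustrative sketch with hypothetical function names; the $\pm 1.96/\sqrt{T}$ band is the usual large-sample approximation under white noise, and the paper does not state which bands Figure 2 uses.

```python
import math


def acf(y, max_lag):
    """Sample autocorrelations rho_0..rho_max_lag."""
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]
    denom = sum(v * v for v in d)
    return [sum(d[t] * d[t - k] for t in range(k, n)) / denom for k in range(max_lag + 1)]


def pacf(y, max_lag):
    """Sample partial autocorrelations for lags 1..max_lag via the Durbin-Levinson recursion."""
    rho = acf(y, max_lag)
    result, prev = [], []  # prev holds phi_{k-1,1}..phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(prev[j - 1] * rho[k - j] for j in range(1, k))
        den = 1.0 - sum(prev[j - 1] * rho[j] for j in range(1, k))
        phi_kk = num / den
        prev = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        result.append(phi_kk)
    return result


def significance_band(n):
    """Approximate 95% band +/- 1.96 / sqrt(n) for a white-noise series of length n."""
    return 1.96 / math.sqrt(n)


series = [0.01, -0.02, 0.015, -0.005, 0.02, -0.01, 0.004, -0.012]
print(acf(series, 3), pacf(series, 3), significance_band(len(series)))
```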

Lag Orders Estimation for ARIMA Model Using Akaike Information Criterion (AIC)
In this subsection, we describe a model fitting procedure that estimates the orders and coefficients of the proposed hybrid ARIMA-GARCH model.
First, we estimate the lag orders of the ARIMA model by comparing the Akaike Information Criterion (AIC) for different AR and MA lag orders. Although AIC does not test a model in the sense of testing a null hypothesis, it is grounded in information theory: it offers a relative estimate of the information lost when a given model is used to represent the process that generates the data. Thus, given a collection of candidate models for the data, AIC estimates the relative quality of each model, and the preferred model is the one with the minimum AIC value.
In this paper, to find the optimal lag orders of the ARIMA model for the daily returns of the three benchmark indices, we perform a parameter sweep, calculating the AIC value for each pair of AR and MA lags as each varies from 1 to 8. The optimal AR and MA lag orders are the pair with the minimum AIC value. Table 1 shows the optimal AR and MA lag orders for the three benchmarks.
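The parameter sweep reduces to evaluating $\mathrm{AIC} = 2k - 2\ln\hat{L}$ on a grid of lag orders and keeping the minimum, where $k$ is the number of estimated parameters and $\hat{L}$ the maximized likelihood. In the sketch below, `fit` is a hypothetical callable standing in for whatever fitting routine is used (the paper does not name its software), and `toy_fit` returns made-up log-likelihoods purely to exercise the search; counting $k$ as AR + MA coefficients plus a constant and a variance is an assumption.

```python
def aic(log_likelihood, num_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln(L_hat). Lower is better."""
    return 2 * num_params - 2 * log_likelihood


def select_orders(fit, max_p=8, max_q=8, min_order=1):
    """Return (best_aic, p, q) over the grid min_order..max_p x min_order..max_q.

    fit(p, q) is a hypothetical callable that fits the candidate model and returns
    (maximized log-likelihood, number of estimated parameters).
    """
    best = None
    for p in range(min_order, max_p + 1):
        for q in range(min_order, max_q + 1):
            log_likelihood, num_params = fit(p, q)
            score = aic(log_likelihood, num_params)
            if best is None or score < best[0]:
                best = (score, p, q)
    return best


def toy_fit(p, q):
    """Toy stand-in for a real fitting routine, for illustration only."""
    log_likelihood = -100.0 + 10.0 * (p == 2) + 10.0 * (q == 3)
    return log_likelihood, p + q + 2  # AR + MA coefficients, constant, variance (assumed count)


print(select_orders(toy_fit))  # (174.0, 2, 3)
```

With `max_p = max_q = 4` and a routine that fits GARCH($r$, $s$) instead, the same harness mirrors the GARCH/ARCH sweep of Section 3.5.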

Testing for ARCH Effects
The ARIMA model is built upon the assumption that the data are homoscedastic. To take into account changes in the variance of the return series, we need to model the heteroscedastic behavior of the data series, known as the ARCH effect.
In this paper, we apply Engle's ARCH test (Engle, 1982), a Lagrange multiplier test for the significance of ARCH effects, to the residuals calculated from the best-fitted ARIMA model in Section 3.3. The testing results indicate that ARCH effects may exist in the returns of all three benchmarks of interest.
ISSN 1923-4023 E-ISSN 1923-4031

Lag Orders Estimation for GARCH Model Using AIC
In this subsection, we use the same methodology as in Section 3.3 to estimate the number of GARCH lags r and ARCH lags s for the GARCH model. We compare the AIC values for different pairs of GARCH and ARCH lag orders, each varying from 1 to 4; the optimal lag orders correspond to the minimum AIC value. Table 2 shows the optimal GARCH and ARCH lag orders.

Coefficients Estimation for Hybrid ARIMA-GARCH Model
Now we can estimate the coefficients of the proposed hybrid ARIMA-GARCH model given the optimal lag orders for AR, MA, GARCH and ARCH lags found in Sections 3.3 and 3.5. The corresponding coefficients of the ARIMA and GARCH models are shown in Tables 3 and 4.
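When reading estimated GARCH coefficients, a common sanity check is that $\omega > 0$ and $\alpha_i, \beta_j \ge 0$ (sufficient for positive conditional variances) and that the persistence $\sum_i \alpha_i + \sum_j \beta_j < 1$ (covariance stationarity), in which case the unconditional variance is $\omega / (1 - \sum_i \alpha_i - \sum_j \beta_j)$. The paper does not report performing this check; the sketch and its function names are hypothetical.

```python
def garch_persistence(alpha, beta):
    """Sum of ARCH and GARCH coefficients; < 1 is required for covariance stationarity."""
    return sum(alpha) + sum(beta)


def check_garch_coefficients(omega, alpha, beta):
    """Return (is_valid, unconditional_variance or None) for estimated GARCH(r, s) coefficients."""
    # Non-negativity is a sufficient (not necessary) condition for positive variances
    nonnegative = omega > 0 and all(a >= 0 for a in alpha) and all(b >= 0 for b in beta)
    persistence = garch_persistence(alpha, beta)
    if not nonnegative or persistence >= 1:
        return False, None
    return True, omega / (1.0 - persistence)


print(check_garch_coefficients(2e-6, [0.1], [0.88]))
```

Persistence close to one means volatility shocks die out slowly, which is typical for daily equity returns.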

Numerical Simulation for Equity Returns Using Hybrid ARIMA-GARCH Model
In this section, we present simulation results using the hybrid ARIMA-GARCH model fitted in Section 3. In the simulation, we use the historical data prior to 10/01/2016 as training data. We use the training data to estimate the model parameters and perform an out-of-sample prediction of daily returns for 10 consecutive business days, from 10/03/2016 to 10/14/2016. Given the fitted ARIMA-GARCH model, we perform a Monte Carlo simulation to generate 1000 random paths and take the median value of the simulated paths as the prediction output. Figure 3 shows the predicted returns from the simulation and the realized returns for the three benchmark indices. Although no comparison experiments are provided to further validate the proposed modeling and prediction method, the simulation results indicate that the hybrid ARIMA-GARCH model is an appropriate model for predicting equity returns.
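The Monte Carlo step can be sketched as follows: starting from the last observed returns, innovations and conditional variances, each path draws $z_t$, updates the variance with (6) and the mean with (5) for $D = 0$, and the per-day median across paths is the point forecast. This is a minimal sketch under assumptions not stated in the paper: Gaussian $z_t$, a per-day median, and the hypothetical function name `median_forecast`.

```python
import math
import random
import statistics


def median_forecast(y_hist, eps_hist, var_hist, c, phi, theta, omega, alpha, beta,
                    horizon=10, n_paths=1000, seed=0):
    """Per-day median of Monte Carlo paths from an ARMA(p, q)-GARCH(r, s) model with D = 0.

    y_hist, eps_hist, var_hist: most recent returns, innovations and conditional
    variances (oldest first), each at least as long as the corresponding lag order.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        y, eps, var = list(y_hist), list(eps_hist), list(var_hist)
        path = []
        for _ in range(horizon):
            # Variance equation (6)
            v = (omega
                 + sum(a * eps[-i] ** 2 for i, a in enumerate(alpha, start=1))
                 + sum(b * var[-j] for j, b in enumerate(beta, start=1)))
            e = math.sqrt(v) * rng.gauss(0.0, 1.0)  # Gaussian z_t (assumption)
            # Mean equation (5) with D = 0
            mean = (c
                    + sum(f * y[-i] for i, f in enumerate(phi, start=1))
                    + sum(th * eps[-j] for j, th in enumerate(theta, start=1)))
            y.append(mean + e)
            eps.append(e)
            var.append(v)
            path.append(mean + e)
        paths.append(path)
    return [statistics.median([path[h] for path in paths]) for h in range(horizon)]


print(median_forecast([0.004], [0.002], [1e-4], 0.0005, [0.1], [0.05], 2e-6, [0.1], [0.88]))
```

Taking the median per forecast day makes the point forecast robust to the heavy tails that GARCH paths produce; the paper does not specify whether its median is per day or over whole paths.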

Conclusions and Future Plans
The hybrid ARIMA-GARCH analysis of financial equity time series data has not been explored in previous works.
In this paper, we focus on modeling and prediction of the daily returns for three US benchmark indices, Dow Transportation, S&P 500, and VIX. By defining a hybrid ARIMA-GARCH model, we analyze the returns data series and fit our model with the historical data fetched from FRED. The prediction results show that the proposed ARIMA-GARCH model is an appropriate model for predicting the equity returns.
A complete data analysis procedure, involving a series of hypothesis tests and a model fitting procedure based on the AIC, is provided in this paper. The prediction results using the fitted ARIMA-GARCH model are also provided as a reference.
In this paper, we consider only the hybrid ARIMA-GARCH model. In the future, to further improve the prediction accuracy for equity returns, we could also try other approaches, such as a combination of the proposed