Can Multistep Nonparametric Regressions Beat Historical Average in Predicting Excess Stock Returns?

Several economic and financial variables are said to have predictive power over excess stock returns. Empirically, there is little consensus among academics on whether these variables have predictive power, and results are often sensitive to the econometric model of choice. Econometric models can produce biased results due to the high degree of persistence in predictive variables. Apart from high persistence, the relationship between stock returns and the predictive variable may also be misspecified in the model. In order to address possible non-linearities and endogeneity between the residuals and persistent independent variables in predictive regressions, multi-step non-parametric and semiparametric regressions are explored in this paper. In these regressions, the conditional mean and the residuals are estimated separately and then added to obtain the predicted excess stock returns. Goyal and Welch's (2008) predictive variables are used to predict excess S&P 500 returns. The in-sample and out-of-sample predictive performance of the two proposed models is compared with the historical average, Ordinary Least Squares (OLS), and non-parametric regressions, evaluated using Root Mean Squared Errors (RMSEs). The explored models, particularly the two-step nonparametric model, outperform the compared models in-sample. Out-of-sample, several variables are found to have predictive ability.


Introduction
This paper explores two multi-step non-parametric and semi-parametric methods, which estimate the conditional mean and the residuals separately. Preliminary work in this area involved OLS regressions of returns on lagged instrument variables believed to have predictive power over stock returns. While this is not the first attempt to apply non-parametric methods to predicting excess stock returns, see Jin et al. (2013), Lee et al. (2014), and Chen and Hong (2016), the models explored in this paper have not been applied before. Prior to the late twentieth century, the consensus in the finance literature was that excess stock returns were entirely unpredictable (Fama, 1970), a view attributed to the efficient market hypothesis. Towards the end of the century, however, numerous studies argued otherwise; several variables were found to have predictive power over excess stock returns. Fama and French (1988a) and Poterba and Summers (1988) find that the statistical significance of their univariate model using only past returns improves greatly when predictive variables are added to the model. Among the many economic variables found to have predictive power, the most notable are short-term interest rates (Fama & Schwert, 1977), yield spreads (Campbell, 1987), stock market volatility (Goyal & Santa-Clara, 2003; Yin, 2019), book-to-market ratios (Pontiff and Schall, 1998), price-earnings ratios (Campbell and Shiller, 1988), and the dividend-price ratio (Campbell and Shiller, 1988; Fama and French, 1988b; Lettau and Van Nieuwerburgh, 2008). Li and Tsiakas (2017) find excess returns to be predictable out-of-sample when many of these economic variables are used in a kitchen-sink regression with shrinkage.
Given the noisy nature of stock returns, a sizable portion of the series tends to remain unpredictable; however, based on in-sample tests there now seems to be a consensus among financial economists that the series does contain a significant predictable component (Campbell, 2000). Using bivariate predictive regressions, Goyal and Welch (2008) show that these predictive variables perform poorly in out-of-sample forecasts compared with the historical average excess stock return. Campbell and Thompson (2008), on the other hand, use a priori knowledge about the regression parameters to impose sign restrictions on them, and show that many predictive variables then have better out-of-sample performance than the historical average return. Baltas and Karyampas (2018) attribute the sensitivity of the predictive ability to stages of the business cycle, and Tsiakas, Li and Zhang (2020) find that certain variables have predictive power during expansions and others during recessions.
Controversy surrounding the out-of-sample performance of the predictive variables casts doubt over their predictive ability. Whether the contradicting results are due to model misspecification poses an even more serious concern. The non-robust results on return predictability may stem from the statistical tests performed (Lamoureux & Zhou, 1996). Using a linear model when the true data generating process is non-linear may seriously undermine forecasts. Chen and Hong (2016) point out that a linear model might not be appropriate for capturing the movements in stock returns and suggest using non-parametric regressions, which can capture both linearities and non-linearities in the data without imposing parametric restrictions. According to Chen and Hong (2016), the restrictions imposed by Campbell and Thompson (2008) are a way of introducing non-linearity into the model, and, like Campbell and Thompson, they find predictive variables that outperform the historical average in a non-parametric setting. Parametric and non-parametric forecast combination models reach a similar conclusion (Elliott et al., 2013; Jin et al., 2013).
Another plausible reason for the contradicting results on out-of-sample predictive ability is non-stationarity in the explanatory variables. Roll (2002) argues that under rational expectations, if the innovations are identically and independently distributed, then the expectation of a future quantity must follow a random walk. Stock prices are based on expectations about future quantities, and explanatory variables like the dividend yield and the book-to-market ratio are in turn functions of stock prices; these explanatory variables must therefore also follow a random walk. An unbalanced predictive regression of a stationary stock return on a non-stationary dividend yield may lead one to conclude that the dividend yield has no predictive power. Structural breaks might also be present in the data; for instance, Fama and French (2001) point out a dramatic fall in the proportion of firms paying dividends in the late 1970s. If one is not careful, such structural breaks might be incorrectly categorized as non-stationarity. Using international data, Torous, Valkanov and Yan (2004) show that the dividend-to-price ratio has predictive power when it is stationary and not when it is non-stationary. They find the presence of a unit root, within a 95% confidence interval, in almost all commonly used predictive variables, apart from the term spread prior to 1952 and the dividend yield in the period 1926 to 1994. In pre-1926 and post-1994 data, their tests indicate the presence of a unit root in the dividend yield, and when the dividend yield from those sub-periods is used to predict excess stock returns, the predictive power is lost. Thus, the presence of a unit root in predictive variables might explain why they are found to have predictive power in certain cases and not in others.
Due to the possibility of a nonlinear relationship between excess stock returns and the predictive variables, and of non-stationarity in the predictive variables, this paper explores two multi-step non-parametric and semi-parametric methods, which estimate the conditional mean and the residuals separately. The motivation is to evaluate whether such augmented non-parametric regressions can predict excess stock returns in-sample and out-of-sample. The empirical performance of the proposed models is compared with the historical mean model, the simple OLS model, and local constant and local linear non-parametric models, on the basis of root mean squared (forecast) errors. The analysis is performed using Goyal and Welch's (2008) original data through 2005 and using the extended data through 2019. The results should be relevant to practitioners and academics attempting similar models to predict excess stock returns and help inform their decisions to proceed.
Several methods have been explored to correct the bias that persistent predictors induce in predictive regressions. Stambaugh (1999), for instance, uses the analytical expression of the bias in the univariate linear model, popularly known as Stambaugh's bias, and corrects the biased estimates accordingly. The analytical expression derived by Stambaugh (1999) holds only when the predictive variable is stationary and the errors are normal. Both stationarity of predictive variables and normality of error terms are strong assumptions in models of excess returns (Roll, 2002). Amihud and Hurvich (2004) propose a two-step augmented regression in which the conditional mean and the residuals are estimated separately using linear regressions. The work proposed in this paper follows Amihud and Hurvich's (2004) two-step augmented regression, with the parametric models replaced by non-parametric and semiparametric counterparts.
The paper is organized as follows: Section 2 presents the estimation of the two multi-step nonparametric and semiparametric regressions explored, along with the other models used for comparison; Section 3 shares the empirical results; and Section 4 concludes.

OLS
Preliminary studies use linear regression to predict excess returns using other financial variables, and their lags, that tend to move with excess returns. Such a model is shown in (1), where $r_t$ is the excess return and $x_{t-1}$ is a lagged explanatory variable:

$$r_t = \alpha + \beta x_{t-1} + u_t \qquad (1)$$

The parameters of the simple OLS regression are estimated by (2), where the $t$-th row of the matrix $X$ and the vector $R$ are $(1, x_{t-1})$ and $(r_t)$, respectively:

$$\hat{\theta} = (X'X)^{-1}X'R \qquad (2)$$

and the predicted return, $\hat{r}_{t,\text{OLS}}$, is given by (3):

$$\hat{r}_{t,\text{OLS}} = \hat{\alpha} + \hat{\beta} x_{t-1} \qquad (3)$$
OLS estimates are unbiased if all the information in $x_{t-1}$ has been used to predict $r_t$. Most financial variables, however, are highly persistent, so the error term in (1) is not independent of the predictor. For instance, if the predicting variable $x_{t-1}$ follows an AR(1) process as in (4),

$$x_t = \theta + \rho x_{t-1} + v_t \qquad (4)$$

then $E(x_{t-1}|u_t) \neq 0$. If $x_{t-1}$ is persistent, the error terms in (1) and (4) are not independent of each other and can be related as in (5),

$$u_t = \xi v_t + \varepsilon_t \qquad (5)$$

where $\xi \neq 0$ and the $\varepsilon_t$ are i.i.d. errors independent of $v_t$ and its lags. Thus, a simple OLS regression with an autoregressive predicting variable will produce biased estimates.
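To make the setup concrete, the following sketch simulates a persistent predictor and returns under (1), (4) and (5) with illustrative parameter values (none of these numbers come from the paper's data), and estimates the predictive regression by OLS:

```python
import numpy as np

def ols_fit(X, y):
    """Equation (2): theta_hat = (X'X)^{-1} X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Simulated series with illustrative parameters (not the paper's data)
rng = np.random.default_rng(0)
T, rho, beta, xi = 5000, 0.95, 0.5, -0.8

v = rng.normal(size=T)                  # innovations of the AR(1) in (4)
x = np.zeros(T)
for t in range(1, T):
    x[t] = rho * x[t - 1] + v[t]        # x_t = rho * x_{t-1} + v_t

u = xi * v + 0.5 * rng.normal(size=T)   # u_t = xi * v_t + eps_t, as in (5)
r = np.empty(T)
r[0] = 0.0
r[1:] = beta * x[:-1] + u[1:]           # r_t = beta * x_{t-1} + u_t, as in (1)

X = np.column_stack([np.ones(T - 1), x[:-1]])
theta_hat = ols_fit(X, r[1:])           # [alpha_hat, beta_hat]
r_hat_ols = X @ theta_hat               # fitted values, as in (3)
```

With a sample this large the slope estimate lands close to the true beta; in short samples like annual return data, the persistence of $x_{t-1}$ makes the small-sample bias far more consequential.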

Historical Average (HA)
Goyal and Welch (2008) compare the simple OLS predicted returns with the Historical Average (HA) returns shown in (6), where the predicted return is the average of the past realized returns:

$$\hat{r}_{t,\text{HA}} = \frac{1}{t-1}\sum_{s=1}^{t-1} r_s \qquad (6)$$
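As a minimal sketch, the HA benchmark of (6) can be computed as a cumulative mean that uses only the data available before each forecast date (the function name is ours, not from the paper):

```python
import numpy as np

def historical_average_forecast(returns):
    """Equation (6): the forecast for period t is the mean of r_1..r_{t-1}."""
    returns = np.asarray(returns, dtype=float)
    csum = np.cumsum(returns)
    counts = np.arange(1, len(returns) + 1)
    # drop the last cumulative mean: it would use the full sample
    return csum[:-1] / counts[:-1]

# Forecast for each period 2..T uses only past data
ha = historical_average_forecast([0.10, 0.02, 0.06])  # → [0.10, 0.06]
```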

Nonparametric (NP)
Instead of assuming the data generating process to be the linear model in (1), the functional form can be expressed as $m(x_{t-1})$ using a local constant non-parametric model, as shown in (7):

$$r_t = m(x_{t-1}) + u_t \qquad (7)$$
For a discrete random variable $x_{t-1}$, $\hat{m}(x)$ is the average of the $r_t$'s corresponding to the $n^*$ observations for which $x_{t-1} = x$ (Pagan & Ullah, 1999). For a continuous $x_{t-1}$, the window width $h$ determines the size of the neighborhood of $x$ used to compute $\hat{m}(x)$, as shown in (8):

$$\hat{m}(x) = \frac{\sum_t r_t \, \mathbf{1}(|\psi_{t-1}| \le 1)}{\sum_t \mathbf{1}(|\psi_{t-1}| \le 1)} \qquad (8)$$

where $\psi_{t-1} = (x - x_{t-1})/h$. A kernel function $K$ can be used for smoothing, as illustrated in (9):

$$\hat{m}(x) = \frac{\sum_t r_t \, K(\psi_{t-1})}{\sum_t K(\psi_{t-1})} \qquad (9)$$
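A minimal sketch of the kernel estimator in (9), using a Gaussian kernel and a fixed, illustrative bandwidth (in practice the bandwidth would be chosen by cross-validation):

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_eval, h):
    """Local constant estimator of equation (9):
    m_hat(x) = sum_t y_t K(psi_t) / sum_t K(psi_t), with psi_t = (x - x_t)/h."""
    psi = (np.asarray(x_eval)[:, None] - np.asarray(x_train)[None, :]) / h
    K = np.exp(-0.5 * psi**2)            # Gaussian kernel
    return (K @ y_train) / K.sum(axis=1)

# Illustrative data: a smooth nonlinear conditional mean plus noise
rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, 400)
y = np.sin(x) + 0.1 * rng.normal(size=400)
m_hat = nadaraya_watson(x, y, np.array([-1.0, 0.0, 1.0]), h=0.3)
```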
While the local constant estimator minimizes $\sum_t (r_t - m)^2 K(\psi_{t-1})$ with respect to $m$, the local linear estimator minimizes

$$\sum_t \left(r_t - m - \beta(x_{t-1} - x)\right)^2 K(\psi_{t-1})$$

with respect to $m$ and $\beta$.
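The local linear objective can be solved pointwise as a weighted least squares problem, with the intercept giving the fitted value; a sketch with a Gaussian kernel and an illustrative bandwidth:

```python
import numpy as np

def local_linear(x_train, y_train, x_eval, h):
    """At each x0, weighted least squares of y on (1, x_t - x0) with
    Gaussian kernel weights; the intercept is m_hat(x0)."""
    out = np.empty(len(x_eval))
    for i, x0 in enumerate(x_eval):
        w = np.exp(-0.5 * ((x_train - x0) / h) ** 2)   # kernel weights
        X = np.column_stack([np.ones_like(x_train), x_train - x0])
        coef = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y_train))
        out[i] = coef[0]                               # intercept = m_hat(x0)
    return out

# Same illustrative setup as the local constant example
rng = np.random.default_rng(5)
x = rng.uniform(-2.0, 2.0, 500)
y = np.sin(x) + 0.1 * rng.normal(size=500)
m_hat_ll = local_linear(x, y, np.array([-1.0, 0.0, 1.0]), h=0.3)
```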
Although the nonparametric regression addresses the specification bias stemming from the choice of functional form between $r_t$ and $x_{t-1}$, it does not account for the predictive-regression bias stemming from a highly autoregressive $x_{t-1}$. This paper explores two new multistep nonparametric and semiparametric models to address that bias.

Model 1: Multistep Semiparametric Model (Multistep SP)
In the multistep semi-parametric model, excess stock returns are predicted using a combination of linear and non-linear models. Any linear relationship between the excess stock return and the predictive variable is first captured using the OLS regression (1). Any remaining non-linearities, and the endogeneity between $x_{t-1}$ and $u_t$, are then addressed by nonparametrically regressing the residuals of (1), $\hat{u}_t$, on the residuals of the AR(1) process (4) for $x_{t-1}$, $\hat{v}_t$. After running the OLS regressions (1) and (4), the residuals are saved and used in a nonparametric regression, as shown in (10):

$$\hat{u}_t = m(\hat{v}_t) + e_t \qquad (10)$$

The estimated values $\hat{u}_{t,\text{SP}} = \hat{m}(\hat{v}_t)$ are then used to update equation (1), as illustrated in (11):

$$\hat{r}_{t,\text{SP}} = \hat{\alpha} + \hat{\beta} x_{t-1} + \hat{u}_{t,\text{SP}} \qquad (11)$$

The predicted excess stock return $\hat{r}_{t,\text{SP}}$ is thus the sum of the predicted excess return from the OLS model (1) and the predicted residual from (10).
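A sketch of the Multistep SP procedure on simulated data (the parameter values and the bandwidth are illustrative, not from the paper): OLS steps for (1) and (4), a local constant step for (10), and their sum as in (11):

```python
import numpy as np

def ols(X, y):
    """Equation (2): theta_hat = (X'X)^{-1} X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def nw(x_train, y_train, x_eval, h):
    """Local constant (Nadaraya-Watson) estimator with a Gaussian kernel."""
    psi = (np.asarray(x_eval)[:, None] - np.asarray(x_train)[None, :]) / h
    K = np.exp(-0.5 * psi**2)
    return (K @ y_train) / K.sum(axis=1)

def multistep_sp(r, x, h=0.5):
    """Multistep SP sketch; the bandwidth h is an illustrative choice."""
    X1 = np.column_stack([np.ones(len(r) - 1), x[:-1]])
    a, b = ols(X1, r[1:])                 # step 1: r_t = a + b x_{t-1} + u_t
    u_hat = r[1:] - (a + b * x[:-1])
    c, rho = ols(X1, x[1:])               # step 2: x_t = c + rho x_{t-1} + v_t
    v_hat = x[1:] - (c + rho * x[:-1])
    u_pred = nw(v_hat, u_hat, v_hat, h)   # step 3: u_hat on v_hat, as in (10)
    return (a + b * x[:-1]) + u_pred      # step 4: combined fit, as in (11)

# Simulated data with correlated innovations, u_t = xi v_t + eps_t (eq. (5))
rng = np.random.default_rng(3)
T = 1000
v = rng.normal(size=T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.95 * x[t - 1] + v[t]
u = -0.8 * v + 0.5 * rng.normal(size=T)
r = np.empty(T)
r[0] = 0.0
r[1:] = 0.5 * x[:-1] + u[1:]

fit_sp = multistep_sp(r, x)
X1 = np.column_stack([np.ones(T - 1), x[:-1]])
fit_ols = X1 @ ols(X1, r[1:])
rmse_sp = np.sqrt(np.mean((r[1:] - fit_sp) ** 2))
rmse_ols = np.sqrt(np.mean((r[1:] - fit_ols) ** 2))
```

Because the residual step recovers the component of $u_t$ driven by $v_t$, the in-sample RMSE of the multistep fit falls below that of plain OLS in this simulation.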

Model 2: Multistep Nonparametric Model (Multistep NP)
The multistep nonparametric model is similar to the previous model discussed, except the linear regressions (1) and (4) are replaced with nonparametric regressions.
Step 1: Excess stock returns are regressed on the predictive variable using a nonparametric regression as in (12), and the residuals $\hat{u}_{t,\text{NP}}$ are saved:

$$r_t = m_1(x_{t-1}) + u_t \qquad (12)$$

Step 2: The predictive variable is regressed on its own lag using a nonparametric regression as in (13), the nonparametric analogue of (4), and the residuals $\hat{v}_{t,\text{NP}}$ are saved:

$$x_t = g(x_{t-1}) + v_t \qquad (13)$$

Step 3: The residuals of (12) are regressed nonparametrically on the residuals of (13), as in (14):

$$\hat{u}_{t,\text{NP}} = m_2(\hat{v}_{t,\text{NP}}) + e_t \qquad (14)$$

Step 4: Excess stock returns are predicted as the sum of the predicted values of (12) and (14), as shown in (15):

$$\hat{r}_{t,\text{Multistep NP}} = \hat{m}_1(x_{t-1}) + \hat{m}_2(\hat{v}_{t,\text{NP}}) \qquad (15)$$

An across-the-board non-parametric model addresses not only any nonlinear relationship between the excess stock return and the predictive variable but also any nonlinear relationship the predictive variable may have with its own past. In the next section, the in-sample and out-of-sample predictive performance of the two proposed models is compared with the historical average, OLS and nonparametric regressions, for the predictive variables used in Goyal and Welch (2008) and Campbell and Thompson (2008).
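A sketch of the Multistep NP procedure, with each stage estimated by a local constant regression (the simulated data and bandwidths are illustrative, not the paper's):

```python
import numpy as np

def nw(x_train, y_train, x_eval, h):
    """Local constant (Nadaraya-Watson) estimator with a Gaussian kernel."""
    psi = (np.asarray(x_eval)[:, None] - np.asarray(x_train)[None, :]) / h
    K = np.exp(-0.5 * psi**2)
    return (K @ y_train) / K.sum(axis=1)

def multistep_np(r, x, h=0.5):
    """Multistep NP sketch: every step estimated nonparametrically.
    Bandwidths are illustrative; in practice they would be cross-validated."""
    m1 = nw(x[:-1], r[1:], x[:-1], h)    # step 1: eq. (12), save u_hat
    u_hat = r[1:] - m1
    g = nw(x[:-1], x[1:], x[:-1], h)     # step 2: NP analogue of (4), eq. (13)
    v_hat = x[1:] - g
    m2 = nw(v_hat, u_hat, v_hat, h)      # step 3: eq. (14)
    return m1 + m2                       # step 4: combined fit, eq. (15)

# Simulated data: nonlinear conditional mean and correlated innovations
rng = np.random.default_rng(4)
T = 1000
v = rng.normal(size=T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.9 * x[t - 1] + v[t]
u = -0.8 * v + 0.5 * rng.normal(size=T)
r = np.empty(T)
r[0] = 0.0
r[1:] = np.sin(x[:-1]) + u[1:]

fit = multistep_np(r, x)
rmse_mnp = np.sqrt(np.mean((r[1:] - fit) ** 2))
rmse_ha = np.std(r[1:])  # in-sample RMSE of the historical-average benchmark
```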

Empirical Results
Annual S&P 500 Index returns with dividends, in excess of the risk-free return, are predicted using the historical average in (6), the OLS regression model in (1), the nonparametric regression (NP) in (7), and the proposed multistep semi-parametric (Multistep SP) and nonparametric (Multistep NP) models. Data is collected from Amit Goyal's website. Table notes: bold typeface in each row indicates the model with the lowest RMSE when compared to 4 decimal places; Start reports the start year of the sample; ρ is the one-lag autocorrelation of the independent variable; the dependent variable is the risk premium with dividends.
The out-of-sample Root Mean Squared Forecast Errors (RMSFEs) of the aforementioned models for the original data through 2005 are presented in Table 3. An expanding window is used for estimation, with the first sample using 20 years of data; the estimated model is used to forecast the one-year-ahead excess S&P 500 return. Bold typeface indicates the model with the lowest RMSFE for the respective predictive variable. The historical average outperforms the other models in the out-of-sample analysis in half of the cases; for the other half of the variables studied, the predictive models produce lower forecast errors than the historical average. Out-of-sample, local constant regressions tend to produce lower forecast errors than the corresponding local linear models. The nonparametric and semiparametric models that outperform the historical average in-sample but not out-of-sample likely suffer from overfitting. Although no model consistently outperforms the others studied, the results do indicate which model is better suited to each variable. It is not unusual to expect that each of these variables has a unique relationship with, or possibly influence on, stock returns, and one particular model may not be suitable for all. The last three rows present results for the dividend yield, earnings-price ratio and book-to-market ratio for samples starting in 1928, and show that the results are also sensitive to the starting year. The earnings-price ratio does not appear to have predictive ability based on the models tested when the sample starts in 1873; changing the start year to 1928, however, changes the predictive performance, and all the studied models are able to outperform the historical average. Measures such as the RMSFE can be swayed by extremely large forecast errors, even if they are rare. The out-of-sample analysis extended through 2019 is presented in Table 4.
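The expanding-window evaluation described above can be sketched as follows for the historical-average benchmark; the simulated return series is illustrative, and in the paper's setup `forecast_fn` would be any of the fitted models re-estimated on each expanding window:

```python
import numpy as np

def expanding_window_rmsfe(r, forecast_fn, min_obs=20):
    """Out-of-sample RMSFE with an expanding window: the model is re-fit on
    r[0:t] and used to forecast r[t], for t = min_obs, ..., T-1."""
    errors = []
    for t in range(min_obs, len(r)):
        errors.append(r[t] - forecast_fn(r[:t]))
    return np.sqrt(np.mean(np.square(errors)))

# Historical-average benchmark: the forecast is the mean of the window
rng = np.random.default_rng(2)
r = 0.06 + 0.15 * rng.normal(size=120)   # simulated annual excess returns
rmsfe_ha = expanding_window_rmsfe(r, np.mean)
```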
In the extended data, the gains from the non-parametric and semiparametric models are reduced, and the historical average tends to dominate for most variables. However, the dividend yield spread, book-to-market ratio, investment-capital ratio and percent equity issuing continue to show predictive power in the extended data. Local linear models tend to do better in-sample than local constant models, whereas out-of-sample local constant models produce lower forecast errors. Table notes: bold typeface in each row indicates the model with the lowest RMSFE when compared to 5 decimal places; Start reports the start year of the sample; an expanding window is used for estimation, with the first sample using 20 years of data; the dependent variable is the risk premium with dividends.

Conclusion
The predictability of stock returns is an elusive subject, and whether certain variables have predictive power over stock returns continues to hold the interest of many academics and practitioners. The high autocorrelation of the predictive variables and possible non-linearities in their relationship with stock returns further complicate the matter. In order to address the possible non-linearity, and the endogeneity between the residuals and the persistent independent variables in the predictive regression, multistep semiparametric and non-parametric methods are explored, in which the conditional mean and the residuals are estimated separately and added to obtain the predicted excess stock return. Using Goyal and Welch's (2008) predictive variables, the proposed models, particularly the multistep nonparametric model, produce better in-sample estimates of the excess S&P 500 return than the historical average and the OLS regression.
Out-of-sample, the results are mixed: while for many variables the historical average dominates in terms of lower forecast errors, several variables are able to predict excess stock returns better than the historical average. Future research in this area can focus on studying individual variables and their relationships with excess stock returns to find the most suitable forecasting model for each. Different estimation and forecast windows may also provide forecasting opportunities. In order to reduce the overfitting often encountered in non-parametric regression, regularization parameters can be explored.