Predicting Daily Probability Distributions of S&P500 Returns

Most approaches in forecasting merely try to predict the next value of the time series. In contrast, this paper presents a framework to predict the full probability distribution. It is expressed as a mixture model: the dynamics within the individual states are modeled with so-called "experts" (potentially nonlinear neural networks), and the dynamics between the states are modeled using a hidden Markov approach. The full density predictions are obtained by a weighted superposition of the individual densities of each expert. This model class is called "hidden Markov experts". Results are presented for daily S&P500 data. While the predictive accuracy of the mean does not improve over simpler models, evaluating the prediction of the full density shows a clear out-of-sample improvement, both over a simple GARCH(1,1) model (which assumes Gaussian-distributed returns) and over a "gated experts" model (which expresses the weighting for each state non-recursively, as a function of external inputs). Several interpretations are given: the blending of supervised and unsupervised learning, the discovery of hidden states, the combination of forecasts, the specialization of experts, the removal of outliers, and the persistence of volatility.
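
As a reading aid, here is a minimal sketch of the predictive density implied by the abstract; the notation (m states, transition probabilities a_{ij}, expert densities p_j, inputs x_t) is assumed here and not taken from the paper itself:

\[
\hat p\!\left(y_{t+1}\mid \mathcal{F}_t\right)
  = \sum_{j=1}^{m} P\!\left(s_{t+1}=j \mid \mathcal{F}_t\right)\, p_j\!\left(y_{t+1}\mid \mathbf{x}_t\right),
\qquad
P\!\left(s_{t+1}=j \mid \mathcal{F}_t\right)
  = \sum_{i=1}^{m} a_{ij}\, P\!\left(s_t=i \mid \mathcal{F}_t\right),
\]

where \(\mathcal{F}_t\) denotes the information available at time t and the filtered state probabilities \(P(s_t=i\mid\mathcal{F}_t)\) are obtained with the standard hidden-Markov forward recursion. This recursive weighting is what distinguishes hidden Markov experts from gated experts, whose state weights are a direct, non-recursive function of external inputs.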
