Abstract
1- Introduction
2- Formulation of calibration with varying window sizes: How to endogenise and make different window sizes comparable
3- Application of the Lagrange regularisation method to a simple linear-regression problem
4- Using the Lagrange regularisation method for detecting the beginning of financial bubbles
5- Conclusion
References
Abstract
Motivated by the question of identifying the start time $\tau$ of financial bubbles, we propose an improved calibration approach for time series in which the inception of the latest regime of interest is unknown. Taking into account the tendency of a given model to overfit data, we introduce the Lagrange regularisation of the normalised sum of the squared residuals, $\chi^2_{np}(\Phi)$, to endogenously detect the optimal fitting window size $w^* \in [\tau : \bar{t}_2]$ that should be used for calibration, assuming a fixed pseudo present time $\bar{t}_2$. The Lagrange regularisation of $\chi^2_{np}(\Phi)$ defines the Lagrange regularised sum of the squared residuals, $\chi^2_\lambda(\Phi)$. Its performance is illustrated on a simple linear-regression problem with a change point and compared against the performances of the Residual Sum of Squares, $\mathrm{RSS} := \chi^2(\Phi)$, and of $\mathrm{RSS}/(N-p) := \chi^2_{np}(\Phi)$, where $N$ is the sample size, $p$ is the number of degrees of freedom and $\Phi$ is the parameter vector. Applied to synthetic models of financial bubbles with a well-defined transition regime and to a number of financial time series (the US S&P500, Brazilian IBovespa and Chinese SSEC indices), $\chi^2_\lambda(\Phi)$ is found to provide well-defined, reasonable determinations of the starting times of major bubbles, such as those ending with the 1987 Black Monday crash and the 2008 sub-prime crisis, as well as of minor speculative bubbles in other indices, without any further exogenous information. The method thus allows one to endogenise the determination of the starting time of bubbles, a problem that has not yet received a systematic, objective solution. Moreover, the technique appears to be a practical solution for comparing goodness-of-fit across unbalanced sample sizes.
Introduction
There is an inverse relationship between the tendency of a model to overfit data and the sample size under consideration: the smaller the sample, the more freedom a model with a fixed number of parameters has to overfit [1]. Because of this, goodness-of-fit metrics of an arbitrary statistical model parametrised by a vector $\Phi$, such as the Residual Sum of Squares, $\mathrm{RSS} := \chi^2(\Phi)$, or its normalised version, $\mathrm{RSS}/(N-p) := \chi^2_{np}(\Phi)$, cannot be directly compared across samples of unequal size. Here, $N$ denotes the sample size and $p$ the number of degrees of freedom of the model. This is particularly relevant when one wishes to select the optimal sub-sample of a data set on which to calibrate a model. That problem recurs when estimating time-series models in a moving window and/or when the model is only valid within a specific time window that is unknown a priori.
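To make the comparability problem concrete, here is a minimal sketch in plain Python (standard library only). It simulates a hypothetical linear series whose slope changes at $t_c = 100$, fixes the pseudo present time $\bar{t}_2 = 199$, and refits $y = a + b\,t$ on every nested window $[t_1 : \bar{t}_2]$, recording $\chi^2(\Phi)$ and $\chi^2_{np}(\Phi)$ with $p = 2$. The regime slopes, noise level, minimum window size and helper names (`simulate_change_point`, `ols_fit`, `rss`, `window_costs`) are illustrative assumptions, not the setup used in this paper. The sketch does not implement the Lagrange regularised $\chi^2_\lambda(\Phi)$, whose form is not given in this section.

```python
import random


def simulate_change_point(n=200, tc=100, sigma=1.0, seed=0):
    """Hypothetical series: slope 0.5 before tc, slope -0.2 from tc on, plus Gaussian noise."""
    rng = random.Random(seed)
    ys = []
    for t in range(n):
        mean = 0.5 * t if t < tc else 50.0 - 0.2 * (t - tc)
        ys.append(mean + rng.gauss(0.0, sigma))
    return ys


def ols_fit(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rss(xs, ys, a, b):
    """chi^2(Phi): residual sum of squares of the fitted line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def window_costs(ys, t2, min_size=10, p=2):
    """Refit on every nested window [t1, t2] with t2 held fixed.

    Returns (t1, N, chi2, chi2_np) tuples, where chi2_np = chi2 / (N - p).
    """
    out = []
    for t1 in range(0, t2 - min_size + 2):
        xs = list(range(t1, t2 + 1))
        window = ys[t1:t2 + 1]
        a, b = ols_fit(xs, window)
        chi2 = rss(xs, window, a, b)
        n = len(xs)
        out.append((t1, n, chi2, chi2 / (n - p)))
    return out


if __name__ == "__main__":
    ys = simulate_change_point()
    # Print every 30th window to see how both metrics move as t1 advances.
    for t1, n, chi2, chi2_np in window_costs(ys, t2=199)[::30]:
        print(f"t1={t1:3d}  N={n:3d}  chi2={chi2:10.2f}  chi2_np={chi2_np:8.3f}")
```

Because the windows are nested, $\chi^2(\Phi)$ can never decrease as the window is extended backwards (the best fit on a larger window is at least as costly as the refit on any of its sub-windows), so raw RSS mechanically favours short windows. Dividing by $N - p$ removes that mechanical growth; in this toy series $\chi^2_{np}(\Phi)$ falls from the level of the misfit across both regimes to roughly $\sigma^2$ once $t_1 \ge t_c$. It still carries no explicit term for the greater freedom to overfit small samples discussed above.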