Theses Doctoral

Application of Distance Covariance to Time Series Modeling and Assessing Goodness-of-Fit

Fernandes, Leon

The overarching goal of this thesis is to use distance covariance based methods to extend asymptotic results from the i.i.d. case to general time series settings. Accounting for dependence can make already difficult statistical inference all the more challenging. The distance covariance is an increasingly popular measure of dependence between random vectors that goes beyond the linear dependence described by correlation. It is defined as a squared integral norm of the difference between the joint characteristic function and the product of the marginal characteristic functions, taken with respect to a specific weight function. Distance covariance has the advantage of being able to detect dependence even in uncorrelated data. The energy distance is a closely related quantity that measures the distance between the distributions of random vectors. Asymptotic limit theory for these statistics can be established for stationary ergodic time series, and the results are driven by the limit theory for the empirical characteristic functions.
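As a rough illustration of the claim that distance covariance can detect dependence in uncorrelated data, the following sketch computes the sample distance correlation from double-centred Euclidean distance matrices. This is the standard V-statistic form for the usual weight function, not necessarily the exact estimator used in the thesis; the function names and the simulated example (Y = X^2 with Gaussian X) are assumptions made for illustration.

```python
import numpy as np

def _centered_distances(x):
    """Double-centred matrix of pairwise Euclidean distances."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_covariance_sq(x, y):
    """Squared sample distance covariance (V-statistic form)."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    return float((a * b).mean())

def distance_correlation(x, y):
    """Sample distance correlation, a value in [0, 1]."""
    dcov2 = distance_covariance_sq(x, y)
    denom = np.sqrt(distance_covariance_sq(x, x) * distance_covariance_sq(y, y))
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2, 0.0) / denom))

# Hypothetical example: y_dep is a function of x but uncorrelated with it.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y_dep = x ** 2
y_ind = rng.standard_normal(500)

dcor_dep = distance_correlation(x, y_dep)
dcor_ind = distance_correlation(x, y_ind)
pearson_dep = float(np.corrcoef(x, y_dep)[0, 1])
print(dcor_dep, dcor_ind, pearson_dep)
```

In this simulated setting the distance correlation of the dependent pair should be clearly larger than that of the independent pair, while the Pearson correlation is close to zero for both.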

In this thesis we apply the distance covariance to three problems in time series modeling: (i) Independent Component Analysis (ICA), (ii) multivariate time series clustering, and (iii) goodness-of-fit using residuals from a fitted model. The underlying statistical procedures for each topic use the distance covariance function as a measure of dependence. The distance covariance plays a different role in each topic: first as a measure of independence among the components of a vector, second as a measure of similarity of joint distributions, and third as a tool for assessing serial dependence among the fitted residuals. In each of these cases, limit theory is established for the corresponding empirical distance covariance statistics when the data come from a stationary ergodic time series.

For Topic (i) we consider an ICA framework, which is a popular tool for blind source separation and has found application in fields such as financial time series, signal processing, feature extraction, and brain imaging. The Structural Vector Autoregression (SVAR) model is often the basic model used for modeling macro time series. The residuals in such a model are given by e_t = A S_t, the classical ICA model. In certain applications, one of the components of S_t has infinite variance, which differs from the standard ICA model. Furthermore, the e_t's are not observed directly but are only estimated from the SVAR modeling. Many ICA procedures require the existence of a finite second or even fourth moment. We derive consistency when using the distance covariance to measure independence of residuals in the infinite variance case. Extensions to the ICA model with noise, which has a direct application to SVAR models when testing independence of residuals based on their estimated counterparts, are also considered.

In Topic (ii) we propose a novel methodology for clustering multivariate time series data using energy distance. Specifically, a dissimilarity matrix is formed using the energy distance statistic to measure separation between the finite-dimensional distributions of the component time series. Once the pairwise dissimilarity matrix is calculated, a hierarchical clustering method is applied to obtain the dendrogram. This procedure is completely nonparametric, as the dissimilarities between stationary distributions are calculated directly without making any model assumptions. To justify this procedure, asymptotic properties of the energy distance estimates are derived for general stationary and ergodic time series.
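The clustering procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the thesis's implementation: finite-dimensional distributions are approximated by blocks of p consecutive observations, the energy distance is computed in its plain V-statistic form, and average linkage is an arbitrary choice. The helper names (`lag_embed`, `energy_distance`, `ar1`) and the simulated AR(1) series are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import cdist, squareform

def lag_embed(series, p):
    """Rows are blocks (x_t, ..., x_{t+p-1}) of p consecutive observations."""
    series = np.asarray(series, dtype=float)
    n = len(series) - p + 1
    return np.stack([series[i:i + n] for i in range(p)], axis=1)

def energy_distance(u, v):
    """Sample energy distance: 2 E|U-V| - E|U-U'| - E|V-V'|."""
    return 2.0 * cdist(u, v).mean() - cdist(u, u).mean() - cdist(v, v).mean()

def ar1(phi, n, rng):
    """Simulate a Gaussian AR(1) series (illustrative data only)."""
    x = np.zeros(n)
    eps = rng.standard_normal(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x

rng = np.random.default_rng(1)
phis = [0.8, 0.8, 0.8, -0.8, -0.8, -0.8]  # two groups of three series
blocks = [lag_embed(ar1(phi, 500, rng), p=2) for phi in phis]

# Pairwise dissimilarity matrix of energy distances.
k = len(blocks)
dissim = np.zeros((k, k))
for i in range(k):
    for j in range(i + 1, k):
        dissim[i, j] = dissim[j, i] = max(energy_distance(blocks[i], blocks[j]), 0.0)

# Hierarchical clustering on the condensed dissimilarity vector.
tree = linkage(squareform(dissim, checks=False), method="average")
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

The `tree` array can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the dendrogram; the block length p and the linkage method are tuning choices that the sketch fixes arbitrarily.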

Topic (iii) considers the fundamental and often final step in time series modeling: assessing the quality of fit of a proposed model to the data. Since the underlying distribution of the innovations that generate a model is often not prescribed, goodness-of-fit tests typically take the form of testing the fitted residuals for serial independence. However, these fitted residuals are inherently dependent since they are based on the same parameter estimates, and thus standard tests of serial independence, such as those based on the autocorrelation function (ACF) or auto-distance correlation function (ADCF) of the fitted residuals, need to be adjusted. We apply sample splitting in the time series setting to perform tests of serial dependence of fitted residuals using the sample ACF and ADCF. Here the first f_n of the n data points in the time series are used to estimate the parameters of the model. Tests for serial independence are then based on all n residuals. With f_n = n/2, the ACF and ADCF tests of serial independence often have the same limit distributions as though the underlying residuals were indeed i.i.d. That is, if the first half of the data is used to estimate the parameters and the estimated residuals are computed for the entire data set based on these parameter estimates, then the ACF and ADCF can have the same limit distributions as though the residuals were i.i.d. This procedure alleviates the need for adjustment in the construction of confidence bounds for both the ACF and ADCF, based on the fitted residuals, in goodness-of-fit testing. We also show that if f_n < n/2, then the asymptotic distributions of the test statistics stochastically dominate the corresponding asymptotic distributions for the true i.i.d. noise; the stochastic order is reversed when f_n > n/2.
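The sample-splitting idea can be illustrated with a small simulation. The sketch below assumes an AR(1) model fitted by least squares, which is a choice made here for simplicity and is not taken from the thesis: the parameter is estimated from the first f_n = n/2 observations, residuals are computed over the whole series, and their sample ACF is compared with the usual i.i.d. bounds of plus or minus 1.96/sqrt(n). All function names and parameter values are hypothetical, and the ADCF version would replace the sample ACF with a sample auto-distance correlation.

```python
import numpy as np

def simulate_ar1(phi, n, rng, burn_in=200):
    """Simulate a Gaussian AR(1) series, discarding a burn-in period."""
    eps = rng.standard_normal(n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        x[t] = phi * x[t - 1] + eps[t]
    return x[burn_in:]

def fit_ar1(x):
    """Least-squares estimate of the AR(1) coefficient (zero-mean model)."""
    return float(np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1]))

def sample_acf(z, max_lag):
    """Sample autocorrelations at lags 1, ..., max_lag."""
    z = z - z.mean()
    denom = np.dot(z, z)
    return np.array([np.dot(z[h:], z[:-h]) / denom for h in range(1, max_lag + 1)])

rng = np.random.default_rng(2)
n = 2000
x = simulate_ar1(0.6, n, rng)

f_n = n // 2                           # estimate on the first half only
phi_hat = fit_ar1(x[:f_n])
residuals = x[1:] - phi_hat * x[:-1]   # residuals over the full series (n - 1 for AR(1))

acf = sample_acf(residuals, max_lag=10)
bound = 1.96 / np.sqrt(len(residuals))  # unadjusted i.i.d. confidence bound
n_outside = int(np.sum(np.abs(acf) > bound))
print(phi_hat, n_outside)
```

Changing `f_n` to a smaller or larger fraction of `n` and repeating the simulation many times is one way to explore empirically how the split point affects the distribution of the residual ACF.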