Theses Doctoral

Three Essays on Panel Data Models in Econometrics

Lu, Lina

My dissertation consists of three chapters on panel data models in econometrics under high dimensionality; that is, both the number of individuals and the number of time periods are large. This high-dimensional setting is widely applicable in practice, as economists increasingly face large-dimensional data sets. This dissertation contributes to the methodology and techniques for dealing with such data sets.
All the models studied in the three chapters contain a factor structure, which provides various ways to extract information from large data sets. Chapter 1 and Chapter 2 use the factor structure to capture the comovement of economic variables, where the factors represent the common shocks and the factor loadings represent the heterogeneous responses to these shocks. Common shocks are widely present in the real world, for example, global financial shocks, macroeconomic shocks and energy price shocks. In applications where common shocks exist, failing to capture these common shocks would lead to biased estimation. Factor models provide a way to capture these common shocks. In contrast to Chapter 1 and Chapter 2, Chapter 3 directly focuses on the factor model with the loadings being constrained, in order to reduce the number of parameters to be estimated.
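The bias from ignoring common shocks can be illustrated with a small simulation. This is only a sketch under assumed settings, not the estimators developed in the dissertation: the data-generating process, the function names (`simulate_panel`, `pooled_ols`, `oracle_ols`) and all parameter values are hypothetical. One common shock drives both the regressor and the outcome; pooled OLS that ignores it is compared with an infeasible "oracle" regression that treats the shock as observed.

```python
import numpy as np

def simulate_panel(n=200, t=100, beta=1.0, seed=0):
    """Simulate y_it = beta * x_it + lam_i * f_t + e_it (hypothetical DGP)."""
    rng = np.random.default_rng(seed)
    f = rng.normal(size=t)                    # common shock f_t
    lam = 1.0 + 0.5 * rng.normal(size=n)      # heterogeneous responses in y
    gam = 1.0 + 0.5 * rng.normal(size=n)      # heterogeneous responses in x
    x = np.outer(gam, f) + rng.normal(size=(n, t))
    y = beta * x + np.outer(lam, f) + rng.normal(size=(n, t))
    return y, x, f

def pooled_ols(y, x):
    """Pooled OLS slope that ignores the common shock."""
    return float((x * y).sum() / (x * x).sum())

def partial_out(z, f):
    """Remove f from each unit's series by a unit-specific regression on f."""
    return z - np.outer(z @ f, f) / (f @ f)

def oracle_ols(y, x, f):
    """Infeasible benchmark: treat f as observed and partial it out first."""
    return pooled_ols(partial_out(y, f), partial_out(x, f))

y, x, f = simulate_panel()
b_pooled = pooled_ols(y, x)
b_oracle = oracle_ols(y, x, f)
print(f"true beta = 1.0, pooled OLS = {b_pooled:.3f}, oracle = {b_oracle:.3f}")
```

In this setup the pooled estimate is pushed away from the true coefficient because the regressor and the error share the same shock, while the oracle estimate stays close to it. In practice the factors are unobserved, which is why they have to be estimated jointly with the other parameters.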
In addition to the common shock effect, Chapter 1 considers two other effects: spatial effects and simultaneous effects. The spatial effect is present in models where the dependent variables interact spatially and spatial weights are specified based on location and distance, in a geographic space or in more general economic, social or network spaces. The simultaneous effect comes from the endogeneity of the dependent variables in a simultaneous equations system, and it is important in many structural economic models. A model including all three of these effects would be useful in various fields.
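As an illustration of distance-based spatial weights, the sketch below builds a row-normalized inverse-distance weights matrix and the corresponding spatial lag. It shows one common convention, not necessarily the specification used in Chapter 1; the function name `inverse_distance_weights` and the coordinates are hypothetical, and the locations are assumed to be distinct.

```python
import numpy as np

def inverse_distance_weights(coords):
    """Row-normalized inverse-distance spatial weights (assumes distinct points)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))       # pairwise Euclidean distances
    w = np.zeros_like(dist)
    off_diag = ~np.eye(len(coords), dtype=bool)
    w[off_diag] = 1.0 / dist[off_diag]             # closer units get larger weights
    return w / w.sum(axis=1, keepdims=True)        # each row sums to one

# Hypothetical locations of four units and their outcomes.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
W = inverse_distance_weights(coords)
y = np.array([1.0, 2.0, 3.0, 4.0])
spatial_lag = W @ y   # weighted average of the other units' outcomes
print(W.round(3))
print(spatial_lag.round(3))
```

The diagonal is zero so that no unit is its own neighbor, and row normalization makes each entry of the spatial lag a weighted average of the other units' outcomes.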
For estimation, all three chapters propose quasi-maximum likelihood (QML) based methods and study the asymptotic properties of the estimators by providing a full inferential theory, which includes consistency, convergence rates and limiting distributions. Moreover, I conduct Monte-Carlo simulations to investigate the finite sample performance of the proposed estimators.
Specifically, Chapter 1 considers a simultaneous spatial panel data model with common shocks. Chapter 2 studies a panel data model with heterogeneous coefficients and common shocks. Chapter 3 studies a high dimensional constrained factor model.
In Chapter 1, I consider a simultaneous spatial panel data model, jointly modeling three effects: simultaneous effects, spatial effects and common shock effects. This joint modeling and the consideration of cross-sectional heteroskedasticity result in a large number of incidental parameters. I propose two estimation approaches, a QML method and an iterative generalized principal components (IGPC) method. I develop full inferential theories for the two estimation approaches and study the trade-off between the model specifications and their respective asymptotic properties. I further investigate the finite sample performance of both methods using Monte-Carlo simulations. I find that both methods perform well and that the simulation results corroborate the inferential theories. Some extensions of the model are considered. Finally, I apply the model to analyze the relationship between trade and GDP using panel data over time and across countries.
Chapter 2 investigates efficient estimation of heterogeneous coefficients in panel data models with common shocks, which have been a particular focus of recent theoretical and empirical literature. It proposes a new two-step method to estimate the heterogeneous coefficients. In the first step, a QML method is used to estimate the loadings and idiosyncratic variances. The second step estimates the heterogeneous coefficients by using the structural relations implied by the model and replacing the unknown parameters with their QML estimates. Further, Chapter 2 establishes the asymptotic theory of the estimator, including consistency, asymptotic representation, and limiting distribution. The two-step estimator is asymptotically efficient in the sense that it has the same limiting distribution as the infeasible generalized least squares (GLS) estimator. Extensive Monte-Carlo simulations show that the proposed estimator performs robustly in a variety of data setups.
Chapter 3 develops the estimation and inferential theory of high dimensional constrained factor models. Factor models have been widely used in practice. However, an undesirable feature of a high dimensional factor model is that it has too many parameters. An effective way to address this issue, proposed in Tsai and Tsay (2010), is to decompose the loadings matrix into the product of a high-dimensional known matrix and a low-dimensional unknown matrix; Tsai and Tsay (2010) call the resulting model a constrained factor model. Chapter 3 proposes a QML method to estimate the model and develops the asymptotic properties of its estimators. A new statistic is proposed for testing the null hypothesis of constrained factor models against the alternative of standard factor models. Partially constrained factor models are also investigated. Monte-Carlo simulations confirm the theoretical results and show that the QML estimators and the proposed new statistic perform well in finite samples. Chapter 3 also considers the extension to an approximate constrained factor model where the idiosyncratic errors are allowed to be weakly dependent processes.
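The loading decomposition described above can be written out as follows. The notation is assumed here for illustration and may differ from that used in the dissertation or in Tsai and Tsay (2010): $x_t$ is the $N \times 1$ vector of observations, $f_t$ the $r \times 1$ vector of factors, $M$ the known $N \times k$ constraint matrix and $\Gamma$ the unknown $k \times r$ matrix, with $k < N$.

```latex
\begin{align*}
  x_t &= \Lambda f_t + e_t, \qquad t = 1, \dots, T, \\
  \Lambda &= M \Gamma, \qquad
  M \in \mathbb{R}^{N \times k} \text{ known}, \quad
  \Gamma \in \mathbb{R}^{k \times r} \text{ unknown}.
\end{align*}
```

Under this constraint the number of free loading parameters is $kr$ rather than $Nr$, so it no longer grows with the cross-sectional dimension $N$ when $k$ is fixed.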



More About This Work

Academic Units
Thesis Advisors: Bai, Jushan
Degree: Ph.D., Columbia University
Published Here: August 6, 2017