2020 Theses Doctoral

# Essays in Econometrics

My dissertation explores two broad areas in econometrics and statistics. The first area is nonparametric identification and estimation with endogeneity using instrumental variables. The second area is related to low-rank matrix recovery and high-dimensional panel data models. The following three chapters study different topics in these areas.

Chapter 1 considers identification and estimation of triangular models with a discrete endogenous variable and an instrumental variable (IV) that takes on fewer values. Under standard approaches, the small support of the IV leads to under-identification because the order condition fails. This chapter develops the first approach to restore identification in this case, for both separable and nonseparable models, by supplementing the IV with covariates, which are allowed to enter the model in an arbitrary way. For the separable model, I show that the model satisfies a system of linear equations, yielding a simple identification condition and a closed-form estimator. For the nonseparable model, I develop a new identification argument that exploits its continuity and monotonicity, leading to weak sufficient conditions for global identification. Building on this argument, I propose a uniformly consistent and asymptotically normal sieve estimator. I apply my approach to an empirical study of the return to education with a binary IV. Although the model is under-identified by the IV alone, my method yields results consistent with the empirical literature. I also illustrate the applicability of the approach via an application of preschool program selection where the supplementation procedure fails.
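To make the order-condition failure concrete, here is a minimal sketch for the separable case. The notation is an assumption for illustration and is not taken from the dissertation: an outcome equation $Y = m(D, X) + U$, an endogenous variable $D$ with $K$ support points $d_1, \dots, d_K$, an IV $Z$ with $J < K$ support points $z_1, \dots, z_J$, and the mean-independence restriction $E[U \mid X, Z] = 0$.

```latex
% Assumed (hypothetical) notation: Y = m(D, X) + U,  E[U | X, Z] = 0.
% Conditioning on X = x and Z = z_j gives one linear equation per IV value:
\begin{equation*}
  E[Y \mid X = x, Z = z_j]
  = \sum_{k=1}^{K} m(d_k, x)\, P(D = d_k \mid X = x, Z = z_j),
  \qquad j = 1, \dots, J .
\end{equation*}
% Unknowns:  m(d_1, x), ..., m(d_K, x)   (K of them)
% Equations: one for each z_j            (J of them, with J < K)
```

At each fixed $x$ this gives $J$ equations in $K$ unknowns, so with $J < K$ the IV alone cannot pin down $m(\cdot, x)$.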

Chapter 2, written with Jushan Bai, studies low-rank matrix recovery with a non-sparse error matrix. Sparsity or approximate sparsity is often imposed on the error matrix for low-rank matrix recovery in the statistics and machine learning literature. In econometrics, on the other hand, it is more common to impose a location normalization on the stochastic errors. This chapter sheds light on the deep connection between the median-zero assumption and sparsity-type assumptions by showing that the principal component pursuit method of Candès et al. (2011), a popular approach to low-rank matrix recovery, consistently estimates the low-rank component under a median-zero assumption. The proof relies on a new theoretical argument showing that the median-zero error matrix can be decomposed into a matrix with a sufficient number of zeros and a non-sparse matrix with a small norm that controls the estimation error bound. Because no restriction is imposed on the moments of the errors, the results apply to cases where the errors have heavy or fat tails.
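For readers unfamiliar with principal component pursuit, the following is a minimal sketch of the estimator, which solves $\min_{L,S} \|L\|_* + \lambda \|S\|_1$ subject to $L + S = M$. It uses a basic augmented Lagrangian (ADMM) iteration with the default tuning suggested in Candès et al. (2011); the function names, tolerance, and iteration cap are illustrative assumptions, and this is not the code used in the dissertation.

```python
import numpy as np


def soft_threshold(A, tau):
    """Entrywise shrinkage: sign(a) * max(|a| - tau, 0)."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)


def singular_value_threshold(A, tau):
    """Shrink the singular values of A by tau (prox of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def pcp(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Sketch of principal component pursuit via ADMM.

    Solves min ||L||_* + lam * ||S||_1 subject to L + S = M.
    M is assumed to be a nonzero float array; tol and max_iter are
    illustrative choices, not values from the dissertation.
    """
    n1, n2 = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n1, n2))          # default penalty weight
    if mu is None:
        mu = n1 * n2 / (4.0 * np.abs(M).sum())    # default step parameter
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                          # Lagrange multiplier
    norm_M = np.linalg.norm(M)                    # Frobenius norm
    for _ in range(max_iter):
        L = singular_value_threshold(M - S + Y / mu, 1.0 / mu)
        S = soft_threshold(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S
```

Calling `pcp(M)` on an observed matrix returns the estimated low-rank component and the remaining error component. The chapter's result concerns the case where the errors in `M` are dense but have median zero; the sketch only illustrates the estimator itself and does not reproduce that analysis.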

In Chapter 3, I consider nuclear norm penalized quantile regression for large-N, large-T panel data models with interactive fixed effects. Because the interactive fixed effects form a low-rank matrix, the estimator in this chapter, inspired by the median-zero interpretation, extends the one studied in Chapter 2 by incorporating a conditional quantile restriction given covariates. The estimator solves a global convex minimization problem and does not require pre-estimation of the fixed effects or of their number. Uniform rates are obtained for both the slope coefficients and the low-rank common component of the interactive fixed effects. The rate of the latter is nearly optimal. To derive the rates, I show new results that establish uniform bounds on the norms of certain random matrices of jump processes. The performance of the estimator is illustrated by Monte Carlo simulations.
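A generic form of a nuclear norm penalized quantile regression objective is sketched below. The notation is an assumption for illustration: $Y_{it}$ is the outcome, $X_{it}$ the covariates, $L$ the $N \times T$ low-rank matrix of interactive fixed effects, $\tau$ the quantile level, and $\lambda$ the penalty weight. The normalization and any additional constraints used in the dissertation may differ.

```latex
% Assumed (hypothetical) notation; a generic form, not the dissertation's exact program.
\begin{align*}
  (\hat\beta, \hat L) &\in \arg\min_{\beta,\, L}\;
    \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T}
    \rho_\tau\!\left( Y_{it} - X_{it}'\beta - L_{it} \right)
    + \lambda \, \| L \|_* , \\
  \rho_\tau(u) &= u \left( \tau - \mathbf{1}\{ u < 0 \} \right),
  \qquad
  \| L \|_* = \sum_{j} \sigma_j(L) .
\end{align*}
```

Both the check function $\rho_\tau$ and the nuclear norm $\|\cdot\|_*$ are convex, so the objective is jointly convex in $(\beta, L)$, and the rank of $L$ does not need to be specified in advance.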

## Files

- Feng_columbia_0054D_15909.pdf (application/pdf, 1.49 MB)

## More About This Work

- Academic Units: Economics
- Thesis Advisors: Bai, Jushan; Lee, Simon
- Degree: Ph.D., Columbia University
- Published Here: July 13, 2020