2024 Theses Doctoral
Machine-Learned Anatomic Subtyping, Longitudinal Disease Evaluation and Quantitative Image Analysis on Chest Computed Tomography: Applications to Emphysema, COPD, and Breast Density
Chronic obstructive pulmonary disease (COPD) and emphysema together rank among the leading causes of death in the United States and worldwide; meanwhile, breast cancer has the highest incidence and second-highest mortality burden of all cancers in women. Imaging markers relevant to each of these conditions are readily identifiable on chest computed tomography (CT): (1) visually appreciable variants in airway tree structure are associated with increased odds of developing COPD; (2) CT emphysema subtypes (CTES), based on lung texture and spatial features, have been identified by unsupervised clustering and correlate with functional measures and clinical outcomes; (3) dysanapsis, or the ratio of airway caliber to lung volume, is the strongest known predictor of COPD risk; and (4) breast density (i.e., the extent of fibroglandular tissue within the breast) is strongly associated with breast cancer risk.
Machine- and deep-learning frameworks present an opportunity to address unmet needs in each of these directions, leveraging the data from large CT cohorts.
Application of unsupervised learning approaches serves to discover new, image-based phenotypes. While topologic and geometric variation in the structure of the CT-resolved airway tree is well described, tree-structural subtypes are not fully characterized. Similarly, while the clinical correlates of CTES have been described in large cohort studies, the association of CTES with structural and functional measures of the lung parenchyma is only partially described, and the time-dependent evolution of emphysematous lung texture has not been studied.
Supervised approaches are required to automate CT image assessment, or to estimate CT-based measures from incomplete input data. While dysanapsis can be directly quantified on full-lung CT, the lungs are often only partially imaged in large CT datasets; total lung volume must then be regressed from the observed partial image. Breast density grades, meanwhile, are generally visually assessed, which is laborious to perform at scale. Moreover, current automated methods rely on segmentation followed by intensity thresholding, excluding higher-order features which may contribute to the radiologist's assessment.
In this thesis, we present a series of machine-learning methods which address each of these gaps in the field, using CT scans from the Multi-Ethnic Study of Atherosclerosis (MESA), the SubPopulations and InteRmediate Outcome Measures in COPD (SPIROMICS) Study, and an institutional chest CT dataset acquired at Columbia University Irving Medical Center.
First, we design a novel graph-based clustering framework for identifying tree-structure subtypes in Billera-Holmes-Vogtmann (BHV) tree-space, using the airway trees segmented from the full-lung CT scans of MESA Lung Exam 5. We characterize the behavior of our clustering algorithm on a synthetic dataset, describe the geometric and topological variation across tree-structure clusters, and demonstrate the algorithm’s robustness to perturbation of the input dataset and graph tuning parameter.
Second, in MESA Lung Exam 5 CT scans, we quantify the loss of small-diameter airway and pulmonary vessel branches within CTES-labeled lung tissue, demonstrating that depletion of these structures is concentrated within CTES regions, and that the magnitude of this effect is CTES-specific. In a sample of 278 SPIROMICS Visit 1 participants, we find that CTES demonstrate distinct patterns of gas trapping and functional small airways disease (fSAD) on expiratory CT imaging. In the CT scans of SPIROMICS participants imaged at Visit 1 and Visit 5, we update the CTES clustering pipeline to identify longitudinal emphysema patterns (LEPs), which refine CTES by defining subphenotypes informative of time-dependent texture change.
Third, we develop a multi-view convolutional neural network (CNN) model to estimate total lung volume (TLV) from cardiac CT scans and lung masks in MESA Lung Exam 5. We demonstrate that our model outperforms regression on imaged lung volume and is robust to same-day repeated imaging and longitudinal follow-up within MESA. Our model is directly applicable to multiple large-scale cohorts containing cardiac CT and totaling over ten thousand participants.
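For orientation, the comparison baseline named above, regression of TLV on imaged lung volume, can be sketched as an ordinary least-squares line fit. This is only an illustrative sketch: the abstract does not specify the form of the baseline regression, the function names `fit_line` and `predict_tlv` are hypothetical, and the numbers in the usage note are made-up toy values, not study data.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept.

    Here x stands for the lung volume visible in a partial (cardiac) scan
    and y for the total lung volume from a full-lung scan. Assumption:
    a single-predictor linear baseline; the thesis may use a different form.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_tlv(imaged_volume: float, slope: float, intercept: float) -> float:
    """Apply the fitted line to a new imaged lung volume."""
    return slope * imaged_volume + intercept
```

With toy inputs such as `fit_line([1.0, 2.0, 3.0, 4.0], [2.5, 4.5, 6.5, 8.5])`, the fit recovers the line used to generate them; a learned model is useful only insofar as it reduces error relative to a baseline of this kind.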
Finally, we design a 3-D CNN model for end-to-end automated breast density assessment on chest CT, trained and evaluated on an institutional chest CT dataset of patients imaged at Columbia University Irving Medical Center. We incorporate ordinal regression frameworks for density grade prediction which outperform binary or multi-class classification objectives, and we demonstrate that model performance on identifying high breast density is comparable to the inter-rater reliability of expert radiologists on this task.
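The idea behind an ordinal objective can be illustrated with one common formulation, in which a grade is encoded as a set of cumulative binary targets ("is the grade above threshold j?") rather than as an unordered class label. The sketch below is an assumption for illustration only: the abstract does not say which ordinal framework is used, the choice of four grades (`NUM_GRADES`) is assumed rather than taken from the text, and the function names are hypothetical.

```python
from typing import List, Sequence

# Assumption: four ordered density grades, indexed 0..3.
NUM_GRADES = 4


def encode_ordinal(grade: int, num_grades: int = NUM_GRADES) -> List[int]:
    """Encode a grade as num_grades - 1 cumulative binary targets.

    Target j is 1 when grade > j, so grade 2 of 4 becomes [1, 1, 0].
    """
    if not 0 <= grade < num_grades:
        raise ValueError("grade out of range")
    return [1 if grade > j else 0 for j in range(num_grades - 1)]


def decode_ordinal(probs: Sequence[float], threshold: float = 0.5) -> int:
    """Decode per-threshold probabilities into a grade.

    The predicted grade is the number of thresholds whose probability
    exceeds the cutoff.
    """
    return sum(1 for p in probs if p > threshold)
```

Under this encoding, a model emits one sigmoid output per threshold and is trained with a binary loss on each; errors between distant grades then flip more targets than errors between adjacent grades, which is the ordering information a plain multi-class objective discards.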
Files
This item is currently under embargo. It will be available starting 2025-10-09.
More About This Work
- Academic Units: Biomedical Engineering
- Thesis Advisors: Laine, Andrew F.
- Degree: Ph.D., Columbia University
- Published Here: November 6, 2024