Theses Doctoral

Unsupervised and Weakly-Supervised Learning of Localized Texture Patterns of Lung Diseases on Computed Tomography

Yang, Jie

Computed tomography (CT) imaging enables in vivo assessment of the lung parenchyma and of several lung diseases. CT scans are particularly important for the diagnosis of 1) chronic obstructive pulmonary disease (COPD), the fourth leading cause of death worldwide, which largely overlaps with pulmonary emphysema; and 2) lung cancer, the leading cause of cancer-related death, which manifests in its early stage as lung nodules.
Most lung CT image analysis methods to date have relied on supervised learning, which requires manually annotated local regions of interest (ROIs) that are slow and labor-intensive to obtain. Machine learning models that require few or no manual annotations are therefore important for the sustainable development of computer-aided diagnosis (CAD) systems.
This thesis focused on exploiting CT scans for lung disease characterization through two learning strategies: 1) fully unsupervised learning on a very large number of unannotated image patches to discover novel lung texture patterns for pulmonary emphysema; and 2) weakly-supervised learning to generate voxel-level localization of lung nodules from whole-slice CT labels.
In the first part of this thesis, we proposed an original unsupervised approach to learn emphysema-specific radiological texture patterns. We designed dedicated spatial and texture features and a two-stage learning strategy that combines clustering and graph partitioning. Learning was performed on a cohort of 2,922 high-resolution full-lung CT scans with a high prevalence of smokers and COPD subjects. The experiments led to the discovery of 10 highly reproducible, spatially informed lung texture patterns and 6 quantitative emphysema subtypes (QES). The discovered QES were independently associated with distinct risks of symptoms, physiological changes, exacerbations and mortality. Genome-wide association studies identified loci associated with four of the subtypes.
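The abstract names the two learning stages but not their algorithms. The sketch below is hypothetical and rests on stated assumptions. It assumes that patch features are already computed as numeric vectors. Stage one is plain k-means, which produces many fine-grained prototypes. Stage two groups those prototypes by partitioning a distance graph. For brevity, a real graph-partitioning method is replaced here by connected components of a thresholded graph, which is a much simpler stand-in. The function names (`kmeans`, `group_prototypes`, `learn_patterns`), the `threshold` parameter and the toy data are all illustrative.

```python
import math

# Hypothetical sketch of a two-stage "cluster, then partition a graph" scheme.
# It is NOT the thesis implementation; features and thresholds are toy values.


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iters=50):
    """Stage 1: Lloyd's k-means over patch feature vectors -> fine-grained prototypes."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each patch goes to its nearest prototype.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each prototype moves to the mean of its assigned patches.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster where it was
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def group_prototypes(centroids, threshold):
    """Stage 2 (simplified stand-in for graph partitioning): link prototypes
    closer than `threshold`; each connected component becomes one pattern."""
    n = len(centroids)
    pattern = [-1] * n
    next_id = 0
    for start in range(n):
        if pattern[start] != -1:
            continue
        pattern[start] = next_id
        stack = [start]
        while stack:  # depth-first flood fill of one component
            node = stack.pop()
            for other in range(n):
                if pattern[other] == -1 and math.dist(centroids[node], centroids[other]) < threshold:
                    pattern[other] = next_id
                    stack.append(other)
        next_id += 1
    return pattern


def learn_patterns(patches, k, threshold):
    """Map every patch to a coarse texture pattern via its stage-1 prototype."""
    centroids, labels = kmeans(patches, k)
    proto_pattern = group_prototypes(centroids, threshold)
    return [proto_pattern[lab] for lab in labels]


# Toy 2-D "features": four tight groups of three patches each.
centers = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
offsets = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
patches = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]
print(learn_patterns(patches, k=4, threshold=2.0))  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
```

In this example, k-means finds four prototypes. The graph step then merges the two nearby pairs, so the result is two coarse patterns. Keeping the fine-grained stage separate from the grouping stage lets the number of final patterns be chosen after clustering.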
We then designed a deep-learning approach that uses unsupervised domain adaptation with adversarial training to label the QES on cardiac CT scans, which capture approximately 70% of the lung. The proposed method accounts for differences in image quality between cardiac and full-lung CT scans. It enabled us to study the progression of QES in a cohort of 17,039 longitudinal cardiac and full-lung CT scans.
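The abstract does not state the training objective. Below is one widely used formulation of adversarial unsupervised domain adaptation, domain-adversarial training (DANN). It is shown only to illustrate the general shape of such a method and is not the thesis's exact loss. The notation is assumed as follows:

- G_f is a feature extractor.
- G_y is a QES classifier.
- G_d is a domain discriminator.
- S is the set of labelled full-lung (source) samples.
- T is the set of unlabelled cardiac (target) samples.
- λ > 0 is a weighting hyperparameter.

```latex
\min_{\theta_f,\,\theta_y}\;\max_{\theta_d}\;
\frac{1}{|\mathcal{S}|}\sum_{(x,y)\in\mathcal{S}}
  \mathcal{L}_{\mathrm{cls}}\big(G_y(G_f(x)),\,y\big)
\;-\;\lambda\Big[
  \frac{1}{|\mathcal{S}|}\sum_{x\in\mathcal{S}}
    \mathcal{L}_{\mathrm{dom}}\big(G_d(G_f(x)),\,0\big)
  +\frac{1}{|\mathcal{T}|}\sum_{x\in\mathcal{T}}
    \mathcal{L}_{\mathrm{dom}}\big(G_d(G_f(x)),\,1\big)
\Big]
```

Maximizing over θ_d minimizes the domain loss, so the discriminator learns to tell full-lung samples (label 0) from cardiac samples (label 1). Minimizing over θ_f pushes the features toward being domain-invariant. Only the classification term uses QES labels, so it is computed on the source domain alone. In DANN the min-max is usually implemented with a gradient reversal layer between G_f and G_d.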
Overall, the discovered QES provide a novel emphysema sub-phenotyping that may facilitate future studies of emphysema development, a better understanding of the stages of COPD, and the design of personalized therapies.
In the second part of the thesis, we designed a deep-learning method for lung nodule detection with weak labels. It uses classification convolutional neural networks (CNNs) with skip-connections to generate high-quality discriminative class activation maps, together with a novel candidate screening framework that reduces the number of false positives. Because the vast majority of annotated nodules are benign, we further developed a data augmentation framework based on a generative adversarial network (GAN) to address data imbalance in lung cancer prediction. Evaluated on thousands of CT scans, our weakly-supervised lung nodule detection achieved performance competitive with a fully-supervised method while requiring 100 times fewer annotations. Our data augmentation framework synthesizes high-fidelity nodules in specified categories. This benefits the prediction of nodule malignancy scores and hence improves the accuracy and reliability of lung cancer screening.
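The abstract does not detail how the class activation maps are computed or screened. As a hedged illustration, the sketch below uses the classic class activation mapping formulation. In that formulation, the map for class c is the weighted sum of the last convolutional feature maps f_k, weighted by that class's classifier weights w_k^c. The sketch then keeps local maxima above a threshold as candidate nodule locations. Several details are assumptions: the 2-D nested-list inputs, the min-max normalization, the peak rule, the 0.5 threshold and the function names. This is neither the thesis's skip-connection architecture nor its candidate screening framework.

```python
# Hypothetical sketch: classic CAM plus naive peak picking, not the thesis method.


def class_activation_map(feature_maps, class_weights):
    """M_c(i, j) = sum_k w_k^c * f_k(i, j) over K feature maps of size H x W."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, w in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w * fmap[i][j]
    return cam


def normalize(cam):
    """Min-max scale the map to [0, 1] so that a single threshold is meaningful."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    return [[(v - lo) / span for v in row] for row in cam]


def candidate_peaks(cam, threshold=0.5):
    """Return (row, col) cells at or above `threshold` that are >= all 8 neighbours."""
    h, w = len(cam), len(cam[0])
    peaks = []
    for i in range(h):
        for j in range(w):
            v = cam[i][j]
            if v < threshold:
                continue
            neighbours = [cam[a][b]
                          for a in range(max(0, i - 1), min(h, i + 2))
                          for b in range(max(0, j - 1), min(w, j + 2))
                          if (a, b) != (i, j)]
            if all(v >= n for n in neighbours):
                peaks.append((i, j))
    return peaks


# Two toy 4x4 feature maps, each responding to one "blob".
f1 = [[0, 0, 0, 0], [0, 4, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
f2 = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 2], [0, 0, 0, 0]]
peaks = candidate_peaks(normalize(class_activation_map([f1, f2], [0.5, 1.0])), threshold=0.5)
print(peaks)  # [(1, 1), (2, 3)]
```

In a real pipeline the maps would typically be 3-D and upsampled to the CT voxel grid before candidates are extracted. Screening would then reject candidates that the weighted sum alone cannot rule out.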



More About This Work

Academic Units
Biomedical Engineering
Thesis Advisors
Laine, Andrew F.
Degree
Ph.D., Columbia University
Published Here
January 22, 2019