2015 Theses Doctoral

# Learning Logic Rules for Disease Classification: With an Application to Developing Criteria Sets for the Diagnostic and Statistical Manual of Mental Disorders

This dissertation develops several new statistical methods for disease classification that directly account for the unique logic structure of criteria sets found in the Diagnostic and Statistical Manual of Mental Disorders (DSM). For psychiatric disorders, a clinically significant anatomical or physiological deviation cannot be used to determine disease status. Instead, clinicians rely on criteria sets from the DSM to make diagnoses. Each criteria set comprises several symptom domains, with the domains determined by expert opinion or psychometric analyses. To be diagnosed, an individual must meet the minimum number of symptoms, or threshold, required for each domain. If both the overall number of domains and the number of symptoms within each domain are small, an exhaustive search to determine these thresholds is feasible, with the thresholds chosen to minimize the overall misclassification rate. However, for more complicated scenarios, such as incorporating a continuous biomarker into the diagnostic criteria, a different technique is necessary. In this dissertation, we propose several novel approaches to empirically determine these thresholds.
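The exhaustive search mentioned above can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. The function name `exhaustive_threshold_search`, the data layout (one column of symptom counts per domain), the 0/1 coding of disease status, and the choice to try thresholds from 1 up to the number of symptoms in each domain are all assumptions made for illustration.

```python
from itertools import product

import numpy as np


def exhaustive_threshold_search(counts, y, max_symptoms):
    """Try every combination of per-domain thresholds (hypothetical helper).

    counts       : (n, K) integer array, symptoms present in each of K domains
    y            : (n,) array of 0/1 disease status (assumed coding)
    max_symptoms : length-K sequence, number of symptoms defined per domain
    Returns the threshold tuple with the lowest misclassification rate
    and that rate. Ties are broken by keeping the first combination found.
    """
    counts = np.asarray(counts)
    y = np.asarray(y).astype(bool)
    best_thresholds, best_error = None, np.inf

    # Assumed candidate grid: thresholds 1..m_k in domain k.
    grids = [range(1, m + 1) for m in max_symptoms]
    for thresholds in product(*grids):
        # Positive diagnosis only if every domain meets its threshold.
        pred = np.all(counts >= np.array(thresholds), axis=1)
        error = np.mean(pred != y)
        if error < best_error:
            best_thresholds, best_error = thresholds, error
    return best_thresholds, best_error


# Toy data (invented for illustration): two domains with 3 and 2 symptoms.
toy_counts = [[3, 2], [2, 2], [3, 1], [1, 1], [0, 2], [2, 0], [1, 2]]
toy_status = [1, 1, 0, 0, 0, 0, 0]
best, err = exhaustive_threshold_search(toy_counts, toy_status, (3, 2))
print(best, err)
```

The number of combinations is the product of the per-domain symptom counts, which is why this approach is practical only when both the number of domains and the symptoms per domain are small.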

Within each domain, we start by fitting a linear discriminant function based upon a sample of individuals in which disease status and the number of symptoms present in that domain are both known. Since one must meet the criteria for all domains, an overall positive diagnosis is only issued if the prediction in each domain is positive. Therefore, the overall decision rule is the intersection of all the domain-specific rules. We fit this model using several approaches. In the first approach, we directly apply the framework of the support vector machine (SVM). This results in a non-convex minimization problem, which we solve approximately with an iterative procedure based on the Difference of Convex functions algorithm. In the second approach, we recognize that the expected population loss function can be re-expressed in an alternative form. Based on this alternative form, we propose two more iterative algorithms, SVM Iterative and Logistic Iterative. Although the number of symptoms per domain for the current clinical application is small, the proposed iterative methods are general and flexible enough to be adapted to complicated settings such as using continuous biomarker data, high-dimensional data (for example, imaging markers or genetic markers), other logic structures, or non-linear discriminant functions to assist in disease diagnosis.
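The intersection rule described above, and the reason a hinge loss applied to it is non-convex, can be written out as follows. The notation is assumed here for illustration and is not taken from the dissertation: $K$ domains, $x_k$ the number of symptoms present in domain $k$, $y \in \{-1, +1\}$ the disease status, and $f_k(x_k) = \beta_{0k} + \beta_{1k} x_k$ the linear discriminant function for domain $k$.

$$
\hat{y}(x) =
\begin{cases}
+1 & \text{if } \min_{1 \le k \le K} f_k(x_k) \ge 0, \\
-1 & \text{otherwise,}
\end{cases}
\qquad
\ell(y, x) = \Bigl[\, 1 - y \min_{1 \le k \le K} f_k(x_k) \Bigr]_+ .
$$

For $y = +1$ the loss equals $\bigl[\max_k \,(1 - f_k(x_k))\bigr]_+$, which is convex in the coefficients. For $y = -1$ it equals $\bigl[1 + \min_k f_k(x_k)\bigr]_+$, which is not convex. One valid way to write this term as a difference of two convex functions uses the identity $[u]_+ = [-u]_+ + u$:

$$
\Bigl[\, 1 + \min_k f_k(x_k) \Bigr]_+
= \Bigl[\, \max_k \bigl(-f_k(x_k)\bigr) - 1 \Bigr]_+
- \Bigl( \max_k \bigl(-f_k(x_k)\bigr) - 1 \Bigr).
$$

Both terms on the right are convex in the coefficients, which is the form a Difference of Convex functions procedure requires. The decomposition used in the dissertation itself may differ from this one.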

Under varying simulation scenarios, the Exhaustive Search and both proposed methods, SVM Iterative and Logistic Iterative, have good performance characteristics when compared with the oracle decision rule. We also examine one simulation in which the Exhaustive Search is not feasible and find that SVM Iterative and Logistic Iterative perform quite well. Each of these methods is then applied to a real data set in order to construct a criteria set for Complicated Grief, a new psychiatric disorder of interest. As the domain structure is currently unknown, both a two-domain and a three-domain structure are considered. For both domain structures, all three methods choose the same thresholds. The resulting criteria sets are then evaluated on an independent data set of cases and shown to have high sensitivities. Using this same data, we also evaluate the sensitivity of three previously published criteria sets for Complicated Grief. Two of the three published criteria sets show poor sensitivity, while the sensitivity of the third is quite good. To fully evaluate our proposed criteria sets, as well as the previously published sets, a sample of controls is necessary so that specificity can also be assessed. The collection of this data is currently ongoing. We conclude the dissertation by considering the influence of study design on criteria set development and its evaluation. We also discuss future extensions of this work such as handling complex logic structures and simultaneously discovering both the domain structure and domain thresholds.

## Files

- Mauro_columbia_0054D_12538.pdf (application/pdf, 1.38 MB)

## More About This Work

- Academic Units: Biostatistics
- Thesis Advisors: Wang, Yuanjia
- Degree: Ph.D., Columbia University
- Published Here: February 24, 2015