Academic Commons


Development and validation of a classification approach for extracting severity automatically from electronic health records

Boland, Mary Regina; Tatonetti, Nicholas P.; Hripcsak, George

Electronic Health Records (EHRs) contain a wealth of information useful for studying clinical phenotype-genotype relationships. Severity is important for distinguishing among phenotypes; however other severity indices classify patient-level severity (e.g., mild vs. acute dermatitis) rather than phenotype-level severity (e.g., acne vs. myocardial infarction). Phenotype-level severity is independent of the individual patient’s state and is relative to other phenotypes. Further, phenotype-level severity does not change based on the individual patient. For example, acne is mild at the phenotype-level and relative to other phenotypes. Therefore, a given patient may have a severe form of acne (this is the patient-level severity), but this does not effect its overall designation as a mild phenotype at the phenotype-level.

We present a method for classifying severity at the phenotype-level that uses the Systemized Nomenclature of Medicine – Clinical Terms. Our method is called the Classification Approach for Extracting Severity Automatically from Electronic Health Records (CAESAR). CAESAR combines multiple severity measures – number of comorbidities, medications, procedures, cost, treatment time, and a proportional index term. CAESAR employs a random forest algorithm and these severity measures to discriminate between severe and mild phenotypes.

Using a random forest algorithm and these severity measures as input, CAESAR differentiates between severe and mild phenotypes (sensitivity = 91.67, specificity = 77.78) when compared to a manually evaluated reference standard (k = 0.716).

CAESAR enables researchers to measure phenotype severity from EHRs to identify phenotypes that are important for comparative effectiveness research.

Electronic Health Records Phenotype Health status indicators Data mining Outcome assessment (Health Care)


  • thumnail for 13326_2015_Article_10.pdf 13326_2015_Article_10.pdf binary/octet-stream 2.33 MB Download File

Also Published In

Journal of Biomedical Semantics

More About This Work

Academic Units
Biomedical Informatics
Published Here
July 31, 2015