GWAS and enrichment analyses of non-alcoholic fatty liver disease identify new trait-associated genes and pathways across eMERGE Network

Namjou, Bahram; Lingren, Todd; Huang, Yongbo; Parameswaran, Sreeja; Cobb, Beth L.; Stanaway, Ian B.; Connolly, John J.; Mentch, Frank D.; Benoit, Barbara; Niu, Xinnan; Wei, Wei-Qi; Carroll, Robert J.; Pacheco, Jennifer A.; Harley, Isaac T. W.; Divanovic, Senad; Carrell, David S.; Larson, Eric B.; Carey, David J.; Verma, Shefali; Ritchie, Marylyn D.; Gharavi, Ali G.; Murphy, Shawn; Williams, Marc S.; Crosslin, David R.; Jarvik, Gail P.; Kullo, Iftikhar J.; Hakonarson, Hakon; Li, Rongling; Xanthakos, Stavra A.; Harley, John B.

Non-alcoholic fatty liver disease (NAFLD) is a common chronic liver illness with a genetically heterogeneous background that can be accompanied by considerable morbidity and attendant health care costs. The pathogenesis and progression of NAFLD are complex, with many unanswered questions. We conducted genome-wide association studies (GWASs) using both adult and pediatric participants from the Electronic Medical Records and Genomics (eMERGE) Network to identify novel genetic contributors to this condition.

First, a natural language processing (NLP) algorithm was developed, tested, and deployed at each site to identify 1106 NAFLD cases and 8571 controls, along with histological data from liver tissue in 235 available participants. These included 1242 pediatric participants (396 cases, 846 controls). The algorithm drew on billing codes, text queries, laboratory values, and medication records. Next, GWASs were performed on NAFLD cases and controls, and case-only analyses were performed using histologic scores and liver function tests, adjusting for age, sex, site, ancestry, principal components (PC), and body mass index (BMI).

Consistent with previous results, a robust association was detected for the PNPLA3 gene cluster in participants with European ancestry. At the PNPLA3-SAMM50 region, three SNPs, rs738409, rs738408, and rs3747207, showed the strongest association (best SNP rs738409, p = 1.70 × 10⁻²⁰). This effect was consistent in both pediatric (p = 9.92 × 10⁻⁶) and adult (p = 9.73 × 10⁻¹⁵) cohorts. This variant was also associated with disease severity and NAFLD Activity Score (NAS) (p = 3.94 × 10⁻⁸, beta = 0.85). PheWAS analysis links this locus to a spectrum of liver diseases beyond NAFLD, with a novel negative correlation with gout (p = 1.09 × 10⁻⁴). We also identified novel loci for NAFLD disease severity, including one novel locus for NAS near IL17RA (rs5748926, p = 3.80 × 10⁻⁸), and another near ZFP90-CDH1 for fibrosis (rs698718, p = 2.74 × 10⁻¹¹). Post-GWAS and gene-based analyses identified more than 300 genes that were used for functional and pathway enrichment analyses.
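
To put the reported p-values in context, the following minimal sketch compares them with the conventional genome-wide significance threshold of 5 × 10⁻⁸. That cutoff is an assumption made here for illustration; the abstract does not state which threshold the authors applied. The dictionary labels and the helper names `passes_threshold` and `neg_log10` are hypothetical, and the gout PheWAS result is left out because phenome-wide scans are usually judged against a different threshold.

```python
import math

# Conventional genome-wide significance threshold. This is an assumption:
# the abstract does not state the cutoff the authors applied.
GENOME_WIDE_ALPHA = 5e-8

# P-values copied from the abstract; the labels are descriptive only.
reported = {
    "rs738409, all European-ancestry participants": 1.70e-20,
    "rs738409, pediatric cohort": 9.92e-6,
    "rs738409, adult cohort": 9.73e-15,
    "rs738409, NAFLD Activity Score": 3.94e-8,
    "rs5748926 near IL17RA, NAFLD Activity Score": 3.80e-8,
    "rs698718 near ZFP90-CDH1, fibrosis": 2.74e-11,
}


def passes_threshold(p, alpha=GENOME_WIDE_ALPHA):
    """Return True when p is strictly below the chosen threshold."""
    return p < alpha


def neg_log10(p):
    """Return -log10(p), the scale commonly used on Manhattan plots."""
    return -math.log10(p)


for label, p in reported.items():
    flag = "yes" if passes_threshold(p) else "no"
    print(f"{label}: -log10(p) = {neg_log10(p):.2f}, below 5e-8: {flag}")
```

Under this assumed cutoff, the pediatric-only result is the one listed value that does not fall below 5 × 10⁻⁸, which is in keeping with the smaller pediatric sample (396 cases, 846 controls).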

In summary, this study clearly confirms a previously described NAFLD risk locus and reports several novel associations. Further collaborative studies including an ethnically diverse population with well-characterized liver histologic features of NAFLD are needed to validate the novel findings.



More About This Work

Published Here
August 10, 2022


NAFLD, Fatty liver, Genetic polymorphism, GWAS, PheWAS, Polygenic risk score