Quantitative systems-level determinants of drug targets

Yao, Lixia; Rzhetsky, Andrey

Background: Modern drug discovery aims to understand disease processes at the molecular level and then to determine optimal molecular targets for drug intervention. Inferences have been made from all available drug targets, such as how many drug targets there are, how many novel drug targets could potentially be found in the human genome, to what functional families these proteins belong, and what structural properties make them bind small molecules tightly and specifically. However, these inferences are largely intuitive and qualitative. The key question of which gene or protein in a disease process could be a successful drug target remains unanswered.

Results: We analyzed specific systems-level properties of human genes and proteins targeted by 919 FDA-approved drugs and identified a number of quantitative measures that distinguish them from other genes and proteins at a highly significant level. Compared to an average gene and its encoded protein(s), successful drug targets are more highly connected in a molecular interaction network, but are far from being the most highly connected; they have higher betweenness values, lower entropies of tissue expression, and lower ratios of non-synonymous to synonymous single-nucleotide polymorphisms (see Figure 1). We also tested the performance of different classification algorithms (see Figure 2). Furthermore, we have identified human tissues that are significantly over- or under-targeted relative to the full spectrum of genes active in each tissue.
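To make the descriptors named above more concrete, the following is a minimal sketch of how three of them might be computed. It is not the authors' implementation: the abstract does not give exact definitions, so the sketch assumes that connectivity is the number of distinct interaction partners, that tissue-expression entropy is the Shannon entropy of a normalized expression profile, and that the SNP ratio is a simple quotient of counts. The function names and the toy data are hypothetical. Betweenness is omitted because it needs a shortest-path computation; a graph library such as NetworkX (`betweenness_centrality`) could supply it.

```python
import math
from collections import defaultdict


def connectivity(edges):
    """Return {node: number of distinct interaction partners} for an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {node: len(partners) for node, partners in neighbors.items()}


def tissue_entropy(expression):
    """Shannon entropy (in bits) of a gene's expression profile across tissues.

    Assumption: expression levels are non-negative and are normalized to sum to 1.
    Low entropy means expression concentrated in few tissues.
    """
    total = sum(expression)
    if total <= 0:
        raise ValueError("expression profile must have a positive sum")
    probs = [x / total for x in expression if x > 0]
    return -sum(p * math.log2(p) for p in probs)


def c_ratio(nonsynonymous, synonymous):
    """Ratio of non-synonymous to synonymous SNP counts; None when undefined."""
    if synonymous == 0:
        return None
    return nonsynonymous / synonymous


# Hypothetical toy data, for illustration only.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3")]
degrees = connectivity(edges)              # G1 has three partners, G4 has one
uniform = tissue_entropy([5, 5, 5, 5])     # evenly expressed in four tissues
specific = tissue_entropy([20, 0, 0, 0])   # expressed in a single tissue
ratio = c_ratio(3, 12)                     # 3 non-synonymous vs 12 synonymous SNPs
```

In this toy setting, the tissue-specific profile has lower entropy than the uniform one, which is the direction the abstract reports for successful drug targets.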
Figure 1. (A) Connectivity distribution for the entire molecular network; (B) Targets of the successful drugs are significantly more connected than an average gene in the network, but are not the most highly connected genes in the network; (C) Drug targets tend to have higher than average betweenness values; (D) The successful drug targets are not statistically different from the rest of the genes in terms of their clustering coefficients; (E-F) Analysis of the ratio (Cratio) of the number of non-synonymous to synonymous single-nucleotide polymorphisms (SNPs): (E) Successful drug targets have significantly smaller Cratio than human genes on average; (F) The value of Cratio tends to correlate negatively with the gene's connectivity in the network.

Figure 2. Receiver operating characteristic (ROC) curves for the four classification algorithms that we used in this study. All four methods that we tested performed significantly better than baseline (ROC score of 0.5, corresponding to a random-guess method). The logistic regression performed best.

We also built a machine-learning model to demonstrate the usefulness of these quantitative descriptors for predicting drug targets. With the increasing availability of experimental data, we foresee that screening the whole human genome for potential novel drug targets could become feasible in the near future.

Conclusion: We found that genes associated with successful FDA-approved drugs have a number of properties at the network, sequence, and tissue-expression levels that significantly distinguish them from other human genes. Although the drug-target-selection guidelines that we suggest cannot replace expensive experiments, they can help pharmaceutical researchers narrow the prospective set of drug targets at the earliest stage of a drug development project.
Specifically, when a pharmaceutical company must decide which target to pursue among pathologic pathways that are not fully understood, connectivity, betweenness, Cratio, and entropy might be useful quantitative estimates of each prospective target's expected success rate.
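The ROC score quoted for Figure 2 can be read as the probability that a classifier ranks a randomly chosen true drug target above a randomly chosen non-target, which is why a random-guess method scores 0.5. The sketch below computes that quantity directly from pairwise comparisons. It is an illustration only, not the evaluation code used in the study; the function name `roc_auc` and the scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs: 1 = known drug target, 0 = other gene.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)

# A scorer that cannot separate the classes at all sits at the 0.5 baseline.
baseline = roc_auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
```

The pairwise form is quadratic in the number of examples, so for a genome-scale screen a rank-based or library implementation would be the more practical choice.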


Also Published In

BMC Bioinformatics

More About This Work

Published Here
September 9, 2014