Oral presentation

<p>ChIP-on-chip significance analysis reveals ubiquitous transcription factor binding</p>

Adam A Margolin (adam@dbmi.columbia.edu), Teresa Palomero, Adolfo A Ferrando, Andrea Califano, Gustavo Stolovitzky

Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA

Joint Centers for Systems Biology, Columbia University, New York, NY 10032, USA

Systems Biology Group, IBM Research, Yorktown Heights, NY 10598, USA

BMC Bioinformatics 2007, 8(Suppl 8):S2. doi:10.1186/1471-2105-8-S8-S2. Available at http://www.biomedcentral.com/1471-2105/8/S8/S2

<p>Highlights from the Third International Society for Computational Biology (ISCB) Student Council Symposium at the Fifteenth Annual International Conference on Intelligent Systems for Molecular Biology (ISMB)</p>

Nils Gehlenborg, Manuel Corpas and Sarath Chandra Janga

The organizing committee would like to thank the International Biowiki Contest funded by the Korean Bioinformation Center (KOBIC) and the Institute for Systems Biology for financial contributions that made the publication of these highlights possible.

Meeting abstracts: a single PDF containing all abstracts in this Supplement is available at http://www.biomedcentral.com/content/pdf/1471-2105-8-S8-info.pdf

<p>Third International Society for Computational Biology (ISCB) Student Council Symposium at the Fifteenth Annual International Conference on Intelligent Systems for Molecular Biology (ISMB)</p>

Vienna, Austria. 21 July 2007. http://www.iscbsc.org/scs3
20 November 2007

© 2007 Margolin et al; licensee BioMed Central Ltd.

Background

ChIP-on-chip technology provides a genome-scale view of transcription factor (TF)/target interactions and a systems-level window into transcriptional regulatory networks. However, although many studies have used ChIP-on-chip data to discover new TF targets, existing statistical methods lack an accurate model for separating signals caused by experimental noise from those caused by true biological variation. As a result, the technology has not been fully leveraged to provide high-confidence predictions of the full range of interactions.

Method

This paper presents a novel method to accurately model the significance of binding events measured by ChIP-on-chip data. For each arrayed probe representing a genomic segment, a ChIP-on-chip microarray measures intensity levels for the IP channel, which is enriched in genomic fragments bound by an immunoprecipitated TF, and the WCE channel, which represents random genomic fragments. Statistical significance is inferred by computing the conditional probability, $p(M \mid A)$, where $M = \log_2\left(\frac{\mathrm{IP}}{\mathrm{WCE}}\right)$ and $A = \frac{\log_2(\mathrm{IP}) + \log_2(\mathrm{WCE})}{2}$ (Fig. 1). A kernel density estimation procedure is used to calculate the joint probability, $p(M, A)$, and for each average intensity value, the mean of the null distribution (i.e. the distribution for unbound probes) is inferred as $\hat{M}_A = \arg\max_{M} \, p(M \mid A)$. The distribution of $p(M \mid A)$, for $M < \hat{M}_A$, is then projected across $\hat{M}_A$ to yield the inferred null distribution, which is used to assign statistical significance scores. Probes for replicate experiments and probes with genomic locations within the fragmentation length (~500 bp) are integrated to produce a single significance score for each genomic region.
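The null-model construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several simplifications are assumptions made only for illustration: the conditioning on A is approximated with quantile bins rather than a joint kernel density estimate of p(M, A), the mode within each bin is taken from a one-dimensional Gaussian kernel density estimate with a rule-of-thumb bandwidth, "projected across" is read as mirroring the lower half of the distribution about the mode, and p-values are empirical upper-tail fractions of the mirrored sample. The function names `ma_transform`, `kde_mode`, and `reflected_null_pvalues` are hypothetical.

```python
import numpy as np

def ma_transform(ip, wce):
    """Return (M, A): the log2 ratio and the mean log2 intensity of the two channels."""
    ip = np.asarray(ip, dtype=float)
    wce = np.asarray(wce, dtype=float)
    m = np.log2(ip / wce)
    a = 0.5 * (np.log2(ip) + np.log2(wce))
    return m, a

def kde_mode(values, grid_size=512):
    """Mode of a 1-D Gaussian kernel density estimate (rule-of-thumb bandwidth, assumed)."""
    values = np.asarray(values, dtype=float)
    n = values.size
    spread = values.std()
    if n < 2 or spread == 0.0:
        return float(np.median(values))
    bandwidth = 1.06 * spread * n ** (-0.2)
    grid = np.linspace(values.min(), values.max(), grid_size)
    # Unnormalised density on the grid: sum of Gaussian kernels centred on each value
    z = (grid[:, None] - values[None, :]) / bandwidth
    density = np.exp(-0.5 * z ** 2).sum(axis=1)
    return float(grid[np.argmax(density)])

def reflected_null_pvalues(m, a, n_bins=10):
    """One-sided p-values for enrichment (large M), computed within intensity bins."""
    m = np.asarray(m, dtype=float)
    a = np.asarray(a, dtype=float)
    pvals = np.ones_like(m)
    # Quantile bins on A: a coarse stand-in for conditioning on A via a joint KDE
    edges = np.quantile(a, np.linspace(0.0, 1.0, n_bins + 1))
    bin_index = np.clip(np.searchsorted(edges, a, side="right") - 1, 0, n_bins - 1)
    for b in range(n_bins):
        idx = np.flatnonzero(bin_index == b)
        if idx.size == 0:
            continue
        mb = m[idx]
        mode = kde_mode(mb)
        lower = mb[mb <= mode]
        # Mirror the lower half about the mode to obtain a symmetric null sample
        null = np.sort(np.concatenate([lower, 2.0 * mode - lower]))
        # Count null values >= each observed M; +1 smoothing avoids p = 0
        n_ge = null.size - np.searchsorted(null, mb, side="left")
        pvals[idx] = (n_ge + 1.0) / (null.size + 1.0)
    return pvals
```

Given paired intensity arrays, `m, a = ma_transform(ip, wce)` followed by `reflected_null_pvalues(m, a)` returns one p-value per probe. Mirroring only the lower half reflects the reasoning that probes with M below the mode are unlikely to be bound, so they characterise the noise without contamination from true binding events. The sketch does not cover the integration of replicate and neighbouring probes into a single score per region.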

<p>Figure 1</p>

(Left) Magnitude versus amplitude (MA) plot of a ChIP-on-chip hybridization. The x-axis represents the average log2 intensity of the IP and WCE channels, and the y-axis represents the log2 ratio of IP/WCE. The black line represents the mean of the inferred null distribution, and the colored lines represent confidence intervals at probabilities of 0.1, 0.01, and 0.001. The model reveals an intensity-dependent mean and variance of the null distribution, and a large number of probes are significantly enriched in the IP channel. (Right) The axes are the same as in the left panel, and colors represent the -log10 p-value under the null distribution.

Results

The method is tested on six different ChIP-on-chip arrays representing replicate experiments for three different TFs (NOTCH1, MYC and HES1). For each experiment, this analysis reveals an order of magnitude more genomic binding events than detected by traditional methods, predicting several thousand interactions for each TF and suggesting previously unappreciated complexity of transcriptional regulatory networks. Several independent experiments provide evidence for the validity of these predictions. First, biochemical validation of more than 20 predicted targets by gene-specific ChIP and qPCR confirms the accuracy of the false discovery rate statistics computed by the method. Second, binding site enrichment analysis indicates that the strength of binding site signals is maintained over several thousand promoters. Finally, gene expression analysis reveals a coordinated downregulation of gene expression across the entire range of predicted NOTCH1-bound genes upon NOTCH1 inhibition in cell lines, indicating that a large percentage of bound genes are also functionally regulated by NOTCH1.
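The abstract does not state how the false discovery rate statistics are computed. As one common possibility, and purely as an assumption rather than the authors' procedure, the Benjamini-Hochberg adjustment below converts a set of per-region p-values into q-values. The function name `benjamini_hochberg` is hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q
```

Regions with a q-value below a chosen threshold (for example 0.05) would then be reported as bound, with the threshold giving the expected fraction of false positives among them.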