Integrative genomic approaches in cervical cancer: implications for molecular pathogenesis

Cervical cancer (CC) as a single diagnostic entity exhibits differences in clinical behavior, and advanced tumors respond poorly to therapy. Although infection with high-risk human papillomavirus is recognized as an important initiating event in cervical tumorigenesis, stratification of CC into subclasses for progression and response to treatment remains elusive. Existing knowledge of genetic, epigenetic and transcriptional alterations is inadequate for addressing the issues of diagnosis, progression and response to treatment. Recent technological advances in high-throughput genomics and the application of integrative approaches have greatly accelerated gene discovery, facilitating the identification of molecular targets. In this article, we discuss the results of a preliminary integrative analysis of DNA copy number increases and gene expression, using the two most commonly gained regions in CC, 5p and 20q, to identify gene targets. These analyses provide insights into the roles of genes such as RNASEN, POLS and SKP2 on 5p, KIF3B, RALY and E2F1 at 20q11.2 and CSE1L, ZNF313 and B4GALT5 at 20q13.13. Future integrative applications using additional datasets, such as mutations, DNA methylation and clinical outcomes, hold promise for identifying biological pathways and molecular targets for therapy in patients with CC.
Keywords: amplification; cervical carcinoma; chromosome 5p; chromosome 20q; chromosome alteration; gene expression; integrative genomics; precancerous lesion; single nucleotide polymorphism array
Cervical cancer (CC) is the second most common malignancy among women worldwide, accounting for an estimated 0.5 million new cases and 275,000 deaths each year [1]. Squamous cell carcinoma develops through distinct morphological stages, progressing from normal epithelium through low-grade squamous intraepithelial lesions (LSILs)/low-grade cervical intraepithelial neoplasia (LCIN) and high-grade SILs (HSILs)/high-grade CIN (HCIN) to carcinoma. The biological behavior of CINs varies: only certain high-grade lesions progress to invasive cancer when untreated [2,3]. Owing to its distinctive precancerous stages, its accessibility for early detection and its variable biological progression, CC provides a paradigm for studying the sequential accumulation of genetic alterations during tumor progression. Although infection with high-risk human papillomavirus (HPV) types has been implicated as a major etiological factor [4], CC as a single diagnostic entity exhibits differences in clinical behavior and response to therapy, with advanced tumors responding poorly to chemo/radiotherapy. The stratification of CC into subclasses for progression and response to treatment remains elusive, and existing knowledge of genetic and epigenetic alterations remains uninformative for addressing the issues of diagnosis, tumor progression and response to treatment.
Since the first description of alterations in chromosomal content in CC, considerable effort has been devoted to understanding the genetic basis of its development [5,6]. These studies have largely focused on several classes of genetic alteration, ranging from the chromosome level to the gene level. These include alterations in chromosome copy number (gains, losses, amplifications and rearrangements), gene mutations, epigenetic modifications and transcriptional changes. Deregulation through chromosomal translocation, which is common in hematological and mesenchymal malignancies, is uncommon or unknown in CC. Despite extensive genetic and molecular studies, only a handful of the leads obtained have been causally implicated in CC development. A census of genomic alterations with sufficient evidence supporting a role in CC is presented in Tables 1 & 2 and discussed in the following sections. To meet the criteria for replication, a change had to be identified by more than one study, with losses present in >20% of cases and gains in >10% of cases.
The changes listed in Table 1 fall into several classes of genomic alteration that result in copy number gains and losses. Amplification (≥5 copies) of specific genomic regions is common in many human tumors, where it targets dominantly acting genes for overexpression [7].
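The census criteria stated earlier (a change reported by more than one study, with losses in >20% and gains in >10% of cases) can be expressed as a simple filter. The Python sketch below is illustrative only: `StudyReport` and `meets_census` are hypothetical names, and pooling case counts across studies before applying the frequency threshold is our assumption, since the text does not say how per-study frequencies were combined.

```python
from __future__ import annotations

from dataclasses import dataclass

# Frequency thresholds from the census criteria
LOSS_MIN_FREQ = 0.20  # losses: present in >20% of cases
GAIN_MIN_FREQ = 0.10  # gains: present in >10% of cases


@dataclass(frozen=True)
class StudyReport:
    """Hypothetical per-study record for one chromosomal region."""
    study_id: str
    altered_cases: int  # cases showing the gain or loss
    total_cases: int    # cases examined in that study


def meets_census(kind: str, reports: list[StudyReport]) -> bool:
    """Return True if a gain or loss is replicated and frequent enough."""
    if kind not in ("gain", "loss"):
        raise ValueError(f"unknown alteration type: {kind!r}")
    # Replication: more than one independent study must report the change.
    reporting = {r.study_id for r in reports if r.altered_cases > 0}
    if len(reporting) < 2:
        return False
    # Assumption: frequency is pooled over all cases examined.
    altered = sum(r.altered_cases for r in reports)
    total = sum(r.total_cases for r in reports)
    threshold = LOSS_MIN_FREQ if kind == "loss" else GAIN_MIN_FREQ
    return altered / total > threshold


if __name__ == "__main__":
    # Toy counts for one region: 11 of 70 cases altered (about 15.7%).
    reports = [StudyReport("study_1", 6, 40), StudyReport("study_2", 5, 30)]
    print(meets_census("gain", reports))  # True: above the 10% gain cut-off
    print(meets_census("loss", reports))  # False: below the 20% loss cut-off
```

A single frequent report is rejected regardless of its frequency, which reflects the replication requirement rather than the frequency cut-off.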
To date, the extent of gene amplification in CC, and the genes involved, have not been fully characterized. However, upon review of the literature on amplifications, we identified a total of 20 recurrent and ten nonrecurrent amplifications (Table 1) [8,9]. A number of potential genes mapping to these amplicons have also been identified. So far, however, none of these has been proven to be a target gene, owing to insufficient biological evidence for a contribution to CC as a cancer gene. A similar situation exists for copy number changes resulting in gains (>2.5 copies) or losses (<1.5 copies) of specific genomic regions (Table 1). For simple copy number gains, a large number (n = 21) of recurrently gained/over-represented regions have been shown by replicate studies. Of these, gains of chromosome 1 and 3q24-q29 were also shown to occur early, in precancerous lesions [10]. Gains of several other chromosomal regions have been observed in single studies of invasive CC, as well as of precancerous lesions. Most of these are very large regions of genomic gain that were not further refined by analysis of additional tumors. In the absence of further studies, these gains (e.g., gains of 3q and 5p spanning large genomic regions or even entire chromosomal arms) remain uninformative.
A large body of data on genetic deletions has been derived from either loss of heterozygosity or comparative genomic hybridization/array-based studies. A large number of deleted regions meeting the criteria that we applied (>20% of cases with deletion and/or confirmation by more than one study) have also been reported (Table 1). Some of these deletions were also shown to arise at early precancerous stages. For example, a 9.5-Mb region at 2q35-q37 was deleted in both high- and low-grade CINs [8].
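The copy number thresholds quoted above (amplification at ≥5 copies, gain at >2.5 copies, loss at <1.5 copies) translate directly into a per-segment calling rule. This is a minimal sketch with hypothetical function names; reporting amplified segments separately from simple gains, and treating 1.5 to 2.5 copies as neutral, are our assumptions.

```python
from __future__ import annotations

from enum import Enum


class CopyState(Enum):
    AMPLIFICATION = "amplification"  # >= 5 copies
    GAIN = "gain"                    # > 2.5 copies (and < 5)
    LOSS = "loss"                    # < 1.5 copies
    NEUTRAL = "neutral"              # 1.5 to 2.5 copies inclusive


def classify_copy_number(copies: float) -> CopyState:
    """Call one segment's estimated copy number using the thresholds in the text."""
    if copies < 0:
        raise ValueError("copy number cannot be negative")
    # Amplification is checked first so it is not reported as a simple gain.
    if copies >= 5.0:
        return CopyState.AMPLIFICATION
    if copies > 2.5:
        return CopyState.GAIN
    if copies < 1.5:
        return CopyState.LOSS
    return CopyState.NEUTRAL


def state_frequency(copies_per_tumor: list[float], state: CopyState) -> float:
    """Fraction of tumors in which a region is called in the given state."""
    if not copies_per_tumor:
        raise ValueError("no tumors supplied")
    calls = [classify_copy_number(c) for c in copies_per_tumor]
    return sum(call is state for call in calls) / len(calls)


if __name__ == "__main__":
    region = [2.0, 3.1, 5.4, 1.2, 2.6]  # toy per-tumor copy numbers
    print([classify_copy_number(c).value for c in region])
    # ['neutral', 'gain', 'amplification', 'loss', 'gain']
    print(state_frequency(region, CopyState.GAIN))  # 0.4
```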
The complex patterns of genetic loss in these regions suggest loss of function of one or more proliferation-regulating genes in each region and their involvement in the malignant progression of cervical epithelium. Although the expression profiling allowed the identification of a subset of candidate genes, with loss accompanied
As stated previously, mutations in tumor-suppressor genes are infrequently reported in CC and, in most instances, the mutations have not been confirmed by independent studies (Table 2). In the absence of mutations, tumor-suppressor genes in the recurrently deleted chromosomal regions may be inactivated by alternative mechanisms such as epigenetic modification. One of the best-established epigenetic changes is gene silencing mediated by promoter DNA hypermethylation. Multiple studies have reported a large number of genes in CC (e.g., CDH1, DAPK, HIC1 and PCDH10) that exhibit promoter hypermethylation with associated downregulation of expression (Table 2) [13-16]. In addition, several other genes have been shown to be methylated in single studies, and a subset of these cannot be excluded as targets in the deleted regions (Tables 1 & 2).
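The reasoning above, that a tumor-suppressor gene in a recurrently deleted region may instead be silenced by promoter hypermethylation, suggests a simple intersection filter for candidates. The sketch below is hypothetical throughout: the record types, the gene symbols, the coordinates and the -1.0 log2 fold-change cut-off (at least twofold downregulation) are illustrative assumptions rather than criteria used in the studies cited.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int  # first base of a recurrently deleted region
    end: int    # last base (inclusive)


@dataclass(frozen=True)
class GeneEvidence:
    symbol: str
    chrom: str
    position: int              # gene start coordinate
    promoter_methylated: bool  # promoter hypermethylation reported in tumors
    log2_fold_change: float    # tumor vs. normal expression, log2 scale


def in_deleted_region(gene: GeneEvidence, regions: list[Region]) -> bool:
    return any(
        r.chrom == gene.chrom and r.start <= gene.position <= r.end
        for r in regions
    )


def silenced_suppressor_candidates(
    genes: list[GeneEvidence],
    deleted: list[Region],
    max_log2_fc: float = -1.0,  # hypothetical: at least twofold downregulation
) -> list[str]:
    """Genes that lie in a deleted region, are methylated and are downregulated."""
    return sorted(
        g.symbol
        for g in genes
        if in_deleted_region(g, deleted)
        and g.promoter_methylated
        and g.log2_fold_change <= max_log2_fc
    )


if __name__ == "__main__":
    deleted = [Region("chr2", 200_000_000, 209_500_000)]  # hypothetical coordinates
    genes = [
        GeneEvidence("GENE_A", "chr2", 205_000_000, True, -1.6),   # all three criteria
        GeneEvidence("GENE_B", "chr2", 204_000_000, False, -2.0),  # not methylated
        GeneEvidence("GENE_C", "chr3", 100_000_000, True, -2.0),   # outside the deletion
        GeneEvidence("GENE_D", "chr2", 201_000_000, True, -0.3),   # barely downregulated
    ]
    print(silenced_suppressor_candidates(genes, deleted))  # ['GENE_A']
```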
From the preceding discussion, the paucity of causal genetic mutations identified in CC, despite the complex chromosomal alterations its genomes exhibit, is due to the insufficient evidence offered by these studies for identifying a cancer gene. Several supporting studies are usually required to establish the role of a specific gene, including finer physical and transcriptional mapping of the altered regions, examination of epigenetic mechanisms, functional analysis of target genes, correlation with clinical outcome and testing of the efficacy of drugs targeted against specific genes. For most genetic alterations reported in CC, such studies are lacking, and obtaining these data is often not feasible. Adding to this complexity, molecular heterogeneity within CC and the biological effects of multiple genes in each affected genomic region constitute major obstacles to understanding its pathogenesis.
A number of studies have applied single high-throughput approaches in attempts to unravel the genetic, epigenetic and transcriptional alterations in CC [17-21]. As noted previously, these studies have revealed specific genomic, expression and epigenetic alterations. However, the failure to identify target genes in CC is largely owing to a lack of understanding of how copy number and epigenetic marks influence transcription. How each change influences the others (e.g., the effect of amplification or copy number increase on gene overexpression or on methylation marks, and of deletion on downregulated gene expression) remains largely unknown in CC. Integrative genomic analyses involving simultaneous assessment of DNA copy number, gene expression, mutations and epigenetic marks, such as cytosine methylation and histone tail modifications, have been demonstrated as potential approaches for identifying candidate genes [22,23].
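One common way to ask how copy number influences transcription is to correlate copy number with expression, gene by gene, across the same tumors. The sketch below uses the Pearson correlation as a swapped-in illustration; it is not the supervised group comparison used in our analysis, and the function names and toy values are hypothetical.

```python
from __future__ import annotations

import math


def pearson(x: list[float], y: list[float]) -> float | None:
    """Pearson correlation; None if either vector has zero variance."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need paired vectors with at least three samples")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def rank_by_dosage_correlation(
    copy_number: dict[str, list[float]],
    expression: dict[str, list[float]],
) -> list[tuple[str, float]]:
    """Rank genes by how closely expression tracks copy number across tumors.

    Both dicts map gene -> per-tumor values in the same tumor order.
    Genes with constant copy number or expression are skipped.
    """
    scores = {}
    for gene, cn in copy_number.items():
        if gene not in expression:
            continue
        r = pearson(cn, expression[gene])
        if r is not None:
            scores[gene] = r
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    cn = {g: [2, 2, 3, 4, 5] for g in ("GENE_A", "GENE_B", "GENE_C")}
    expr = {
        "GENE_A": [1.0, 1.1, 1.9, 3.0, 4.2],  # tracks copy number
        "GENE_B": [3.0, 2.9, 3.1, 3.0, 2.8],  # dosage-insensitive
        "GENE_C": [1.0, 1.0, 1.0, 1.0, 1.0],  # constant, skipped
    }
    for gene, r in rank_by_dosage_correlation(cn, expr):
        print(f"{gene}: r = {r:.2f}")
```

Genes whose expression rises with copy number float to the top of the ranking; a correlation alone does not establish that the gene is a driver.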
CC genomes typically harbor multiple chromosomal aberrations and epigenetic modifications, resulting in deregulated transcriptomes. These changes might play roles in driving malignant transformation, and understanding the relationships among these different levels of genomic modification might expand our knowledge of the molecular basis of CC. As stated previously, CC genomes are characterized by a number of recurrent genomic copy number losses and gains. Among the gains, chromosomal regions 3q, 5p, 20q and 1q were the most common targets [9,101]. In this article, we focus on two of these regions, 5p and 20q, and present the outcome of an integrative genomic analysis of copy number increases and gene expression.

Genomic & transcriptional analysis of chromosomes 5p & 20q
Genomic copy number alterations (CNAs) were identified in 79 untreated primary CC samples using the Affymetrix (CA, USA) 250K NspI single nucleotide polymorphism (SNP) array platform and analyzed with the dChip analytical algorithms [12,24]. Although this analysis uncovered a multitude of CNAs, both previously known and unreported and both frequent and rare, the 5p and 20q regions showed the most significant recurrent focal copy number gains: 5p gains were found in 43% of tumors, and 20q gains or amplifications in 37%. Transcript abundance of protein-coding genes was measured in 42 CC cases on the Affymetrix U133A platform and analyzed with the dChip software algorithms [12,24]. Expression arrays were normalized by invariant set normalization against a baseline array, the median-intensity array from normal tissue, as described previously [12,24]. Briefly, genes showing a twofold change in expression were identified as differentially expressed, with group means compared at 90% CIs. Overexpressed genes mapping to chromosomes 5 and 20 were then selected and used in supervised analyses with defined criteria to obtain overexpressed gene signatures of specific chromosomal regions. The resulting gene expression datasets were correlated with 5p and 20q gains to identify expression patterns associated with CNAs.
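The normalization step described above can be illustrated in simplified form. This sketch picks the normal array with the median overall intensity as the baseline and rescales each array by a single factor estimated from rank-invariant probes. It is an assumption-laden simplification, not dChip's implementation: dChip selects the invariant set iteratively and fits a nonlinear running-median curve rather than one scale factor, and the 5% rank-difference cut-off and function names here are hypothetical.

```python
from __future__ import annotations

import statistics


def choose_baseline(arrays: dict[str, list[float]]) -> str:
    """Name of the array whose median intensity is the median across arrays."""
    medians = {name: statistics.median(values) for name, values in arrays.items()}
    ordered = sorted(medians, key=medians.__getitem__)
    return ordered[(len(ordered) - 1) // 2]  # lower middle for an even count


def ranks(values: list[float]) -> list[int]:
    """0-based rank of each value (ties broken by position)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def invariant_set_scale(
    array: list[float], baseline: list[float], max_rank_diff: float = 0.05
) -> list[float]:
    """Rescale `array` to `baseline` using probes whose ranks barely change.

    Simplification: one pass and one global scale factor, whereas dChip
    iterates the invariant-set selection and fits a nonlinear curve.
    """
    if len(array) != len(baseline) or not array:
        raise ValueError("arrays must be non-empty and probe-matched")
    if any(v <= 0 for v in array):
        raise ValueError("intensities must be positive")
    n = len(array)
    r_array, r_base = ranks(array), ranks(baseline)
    invariant = [i for i in range(n) if abs(r_array[i] - r_base[i]) / n <= max_rank_diff]
    if not invariant:
        raise ValueError("no rank-invariant probes found")
    factor = statistics.median(baseline[i] / array[i] for i in invariant)
    return [v * factor for v in array]


if __name__ == "__main__":
    normals = {
        "normal_1": [50.0, 100.0, 150.0, 200.0, 250.0],
        "normal_2": [100.0, 200.0, 300.0, 400.0, 500.0],
        "normal_3": [150.0, 300.0, 450.0, 600.0, 750.0],
    }
    base_name = choose_baseline(normals)
    print(base_name)  # normal_2
    tumor = [200.0, 400.0, 600.0, 800.0, 1000.0]  # uniformly twice as bright
    print(invariant_set_scale(tumor, normals[base_name]))
    # [100.0, 200.0, 300.0, 400.0, 500.0]
```

Restricting the scale estimate to rank-invariant probes keeps genuinely differentially expressed genes from distorting it.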

Integrative genomic & transcriptional profiles identify target genes of 5p gains
We identified gains of the entire 5p chromosomal arm in CC; no minimal region of amplification or gain could be delineated (Figure 1A). Duplication of entire chromosomal arms through isochromosome formation is not uncommon in human tumors (e.g., i(12p) in testicular germ cell tumors [25] and i(5p) [10] in several types of adenocarcinoma and squamous cell carcinoma [101,102]). Because 5p gain was one of the most common alterations in CC genomes, a finding validated by fluorescence in situ hybridization on a large independent cohort of tumor specimens (Figure 1C), we hypothesized that the increased dosage of 5p may deregulate genes that confer oncogenic properties on the host cell [9]. To identify the targets of this gain, and to assess the extent to which combining the two platforms facilitates the identification of target genes, we performed supervised analyses comparing the overexpressed 5p gene set between tumors with 5p gain and tumors diploid for 5p. This analysis (using a significance level of p < 0.05 and at least twofold increased expression) identified 17 overexpressed genes associated with 5p gain (Figure 1B). In addition, these genes showed several-fold increased expression relative to GAPDH in tumors with 5p gain (>2 copies) compared with tumors carrying only two copies (Figure 1D). These genes therefore represent copy number-driven overexpressed targets, which probably confer the growth and/or invasion advantages associated with chromosome 5p gain.
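The selection rule described above (at least twofold higher mean expression and p < 0.05 between tumors with and without 5p gain) can be sketched as follows. The statistical test behind the reported p-values is not specified here, so this sketch substitutes a one-sided permutation test on the difference in group means; the function names, the 10,000 permutations and the toy expression values are hypothetical. The third toy gene shows why fold change alone is not sufficient: a single outlier inflates its mean.

```python
from __future__ import annotations

import random
import statistics


def permutation_p_value(
    gain: list[float], diploid: list[float], n_perm: int = 10_000, seed: int = 0
) -> float:
    """One-sided permutation p-value for mean(gain) > mean(diploid)."""
    rng = random.Random(seed)
    observed = statistics.fmean(gain) - statistics.fmean(diploid)
    pooled = list(gain) + list(diploid)
    k = len(gain)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of tumors
        diff = statistics.fmean(pooled[:k]) - statistics.fmean(pooled[k:])
        if diff >= observed:
            hits += 1
    # The +1 terms count the observed labelling and keep p above zero.
    return (hits + 1) / (n_perm + 1)


def gain_associated_genes(
    expression: dict[str, dict[str, float]],
    gain_samples: list[str],
    diploid_samples: list[str],
    min_fold: float = 2.0,
    alpha: float = 0.05,
) -> list[str]:
    """Genes at least `min_fold` higher on average in gained tumors with p < alpha.

    `expression` maps gene -> sample -> expression on a linear (not log) scale.
    """
    selected = []
    for gene, by_sample in expression.items():
        g = [by_sample[s] for s in gain_samples]
        d = [by_sample[s] for s in diploid_samples]
        fold = statistics.fmean(g) / statistics.fmean(d)
        if fold >= min_fold and permutation_p_value(g, d) < alpha:
            selected.append(gene)
    return sorted(selected)


if __name__ == "__main__":
    gain_ids = ["T1", "T2", "T3", "T4"]     # toy tumors with 5p gain
    diploid_ids = ["T5", "T6", "T7", "T8"]  # toy tumors diploid for 5p
    values = {
        "GENE_UP": [8.0, 9.0, 10.0, 11.0, 3.0, 4.0, 4.0, 5.0],
        "GENE_FLAT": [5.0, 5.5, 6.0, 5.0, 5.0, 5.2, 4.8, 5.1],
        "GENE_OUTLIER": [1.0, 1.0, 1.0, 30.0, 2.0, 3.0, 4.0, 5.0],
    }
    samples = gain_ids + diploid_ids
    expression = {g: dict(zip(samples, v)) for g, v in values.items()}
    print(gain_associated_genes(expression, gain_ids, diploid_ids))  # ['GENE_UP']
```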
The supervised analysis linked 5p gain to overexpression of genes involved in cellular processes associated with tumorigenesis, such as signal transduction (OSMR), nucleic acid binding, DNA repair and the mitotic cycle (BASP1, TARS, PAIP1, BRD9, RAD1, SKP2 and POLS), oxidative phosphorylation (NNT, SDHA and NDUFS6), HPV16 E1 binding (TRIP13), ribosomal synthesis (BXDC2) and miRNA processing (RNASEN). The top overexpressed gene in this analysis was RNASEN (Drosha), which executes the initial step of miRNA processing by cleaving pri-miRNA to release pre-miRNA, and plays a major role in tumor progression and prognosis [26]. Using a similar integrative approach, Muralidhar et al. also found that Drosha copy number increases are associated with overexpression of this gene in CC, and further showed that Drosha overexpression influences the expression of miRNAs implicated in other cancer types [27]. Another gene of importance in cancer is OSMR, which has been shown to be gained and overexpressed in CC and is associated with adverse clinical outcome [28,29]. Oncostatin M is a cytokine of the IL-6 family, and its biological activity is mediated through its receptor complex. Upon ligand binding, the oncostatin M receptor activates signaling pathways implicated in cancer, such as STAT, PI3K/AKT and the angiogenic factor VEGF, and also mediates inhibition of tumor growth [30]. The other genes identified in this analysis as a consequence of 5p gain have functions related to nucleic acid binding, DNA repair and the mitotic cell cycle (BASP1, TARS, PAIP1, BRD9, RAD1, SKP2 and POLS), or are nuclear genes (NNT, SDHA and NDUFS6) encoding mitochondrial proteins that function in oxidative phosphorylation. Dowen et al. showed that upregulated SKP2 transcription is related to 5p gain in CC cell lines [31]. SKP2 is an F-box protein that plays a critical role in G1/S cell cycle progression and targets CDKN1B (p27Kip1) for degradation. However, a similar integrated analysis of gene dosage and expression by Lando and coworkers found a different set of 5p target genes from the one identified in our study [32]. Although the exact roles of these overexpressed 5p genes in CC remain unknown, their identification provides a basis for dissecting the signaling cascades through which they may act, individually or synergistically, as oncogenes regulating transformation in CC.

Review Narayan & Murty

Figure 1. (B) Significantly differentially overexpressed genes identified between tumors with more than two copies of 5p and tumors with two copies of 5p. In the matrix, each row represents the expression of a gene relative to the group mean and each column represents a sample. Overall, as expected, cell lines exhibited larger expression differences than primary tumors (data not shown). The scale bar (-2.0 to +2.0) at the bottom represents the level of expression. (C) Fluorescence in situ hybridization identification of 5p gains in invasive cancer. Green signals represent the 5p15.2 probe and red signals represent the control probe mapping to the 5q31 region. (D) Expression of the genes differentially expressed as a consequence of 5p gain, relative to glyceraldehyde-3-phosphate dehydrogenase (GAPDH), in normal tissue and in tumors with and without 5p gain, shown as box plots. The middle line across each box represents the median value, the upper hinge the 75th percentile and the lower hinge the 25th percentile. The minimum and maximum data points are shown below and above the box, respectively.

Identification of target genes of 20q amplicons by integrative genomic analysis
Chromosome 20q has been reported to be one of the most common targets of CNAs in invasive CC [9,33], with 20q alterations arising at the CIN 2/3 stage of tumor development [24,34,35]. Chromosome 20q gains have also been shown to be associated with HPV E7-mediated immortalization of human epithelial cells [36]. These data suggest that 20q amplification is an early change in CC development and that concurrent overexpression of specific gene(s) in this genomic region might be critical to transformation. Our SNP array copy number analysis of chromosome 20 identified two recurrent, nonoverlapping focal amplicons on 20q, at 20q11.2 and 20q13.13 (Figure 2) [24]. The minimal shared region of the 20q11.2 amplicons spans 4.1 Mb, and the amplicons at 20q13.13 span 3.1 Mb (Figure 2). Since chromosome 20q is one of the most commonly gained regions in CC genomes, we hypothesized that the amplicons within 20q may induce transcriptional activation of specific genes relevant to cellular transformation. Integrative analysis of genomic CNA and expression data identified eight overexpressed genes in the 20q11.2 amplicon and six in the 20q13.13 amplicon (Figure 2)

a gene encoding UDP-Gal:β-GlcNAc β-1,4-galactosyltransferase (B4GALT5) with transferase activity, zinc finger protein 313 (ZNF313) and a protein with nuclear function (CSE1L). The genes that we found to be upregulated as a consequence of chromosome 20q amplification are known to play specific roles in tumorigenic processes. For example, E2F1, KIF3B, TPX2 and CSE1L play pivotal roles in cell cycle regulation and chromosome segregation (Figure 2). The genes identified by this approach therefore provide a basis for testing their significance in relation to HPV infection and their functional roles in the initiation and progression of CC. Recently, Lando and coworkers, using integrative analysis of gene dosage and expression, also identified three of the genes that we found to be overexpressed (POFUT1, KIF3B and AHCY) as targets of 20q gain [32]. However, Wilting and coworkers, using a similar whole-genome approach with a smaller sample size, differential gene locus mapping and an array comparative genomic hybridization expression integration tool, did not identify any of the genes identified in our study [37]. Together, these studies highlight the importance of applying appropriate integrative genomic algorithms to identify gene targets that are biologically relevant to cervical carcinogenesis.
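A minimal shared region, such as the 4.1-Mb segment reported for the 20q11.2 amplicons, can be thought of as the stretch of genome covered by the largest number of tumors' amplicons. One way to compute it is a sweep over amplicon boundaries, sketched below. Defining the shared region as the segment(s) of maximal overlap is our assumption, the intervals are half-open, and the coordinates are hypothetical rather than the actual 20q breakpoints. When separate peaks tie for maximal depth (for example, two nonoverlapping amplicons shared by equal numbers of tumors), both are returned.

```python
from __future__ import annotations


def peak_regions(amplicons: list[tuple[int, int]]) -> tuple[int, list[tuple[int, int]]]:
    """Segments covered by the largest number of amplicons on one chromosome.

    Each amplicon is a half-open interval [start, end).
    Returns (maximum depth, list of maximal-depth segments).
    """
    events = []
    for start, end in amplicons:
        if end <= start:
            raise ValueError(f"empty interval: {(start, end)}")
        events.append((start, +1))
        events.append((end, -1))
    # At equal positions -1 sorts before +1, so abutting intervals do not overlap.
    events.sort()

    depth = best = 0
    segments: list[tuple[int, int]] = []
    prev = None
    for pos, delta in events:
        # The stretch [prev, pos) is covered by `depth` amplicons.
        if prev is not None and pos > prev and depth > 0:
            if depth > best:
                best, segments = depth, [(prev, pos)]
            elif depth == best:
                if segments and segments[-1][1] == prev:
                    segments[-1] = (segments[-1][0], pos)  # extend contiguous run
                else:
                    segments.append((prev, pos))
        depth += delta
        prev = pos
    return best, segments


if __name__ == "__main__":
    # Hypothetical amplicon boundaries (Mb) in five tumors, two clusters
    amplicons = [(10, 20), (12, 18), (14, 25), (30, 40), (32, 38)]
    print(peak_regions(amplicons))  # (3, [(14, 18)])
```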

Conclusion
In this article, we have described an integrative genomic strategy that combines information on recurrent CNAs at 5p, 20q11.2 and 20q13.13 with gene expression to identify genes relevant to genomic copy number gains and amplifications. Using this approach, we demonstrated the robustness of this strategy in identifying genes and genetic pathways relevant to specific regions of genomic copy number increase in tumorigenesis. We show that a simple, systematic integrative genomic analysis can lead to better molecular discoveries, which could then be used to identify relevant therapeutic targets for CC.

Future perspective
To gain insight into molecular-based therapeutic targets for patients with CC, it is essential to construct an integrated view of multidimensional genomic data from complementary technologies, combining CNAs (gains, amplifications and deletions) with transcription profiles, mutations, miRNAs, epigenetic markers and clinical end points. It is reasonable to speculate that such an approach, complemented by appropriate bioinformatics tools and functional data, is likely to advance our molecular knowledge and help identify core pathways, leading to individualized molecular-based therapy. Advances in molecular knowledge may also change the detection and prediction of precancerous lesions, potentially making existing tests obsolete. The integrative genomic analysis discussed in this article is among the first steps toward this approach. Pap smear screening has already achieved a significant reduction in the incidence of CC, and the availability of a prophylactic HPV vaccine is expected to reduce this incidence further. Advances in molecular and bioinformatic technologies will almost certainly enhance our molecular understanding further and continue to raise hopes for high cure rates in CC.
Executive summary
• Amplification of over 20 different chromosomal regions has been reported in cervical cancer (CC).
• CC genomes harbor multiple copy number gains and losses of specific chromosomal regions; some of these were shown to arise at early precancerous stages, suggesting a role for copy number alterations in its tumorigenesis.
• Mutational mechanisms are relatively uncommon in CC.
• Promoter hypermethylation and the associated downregulation of gene expression are frequent in CC.
• The chromosomal regions 5p and 20q exhibit the most significant recurrent focal copy number alterations in CC, suggesting a role in tumor formation and progression.
• Integrative genomic analysis of 5p gains and 20q11.2 and 20q13.13 amplifications identified genes overexpressed as a consequence of genomic copy number increases. The genes identified are involved in cellular processes associated with specific pathways in tumorigenesis.

Financial & competing interests disclosure
Vundavalli V Murty has received funding from the NIH (grant number CA095647