A recurrent germline PAX5 mutation confers susceptibility to pre-B cell acute lymphoblastic leukemia

Somatic alterations of the lymphoid transcription factor gene PAX5 (also known as BSAP) are a hallmark of B cell precursor acute lymphoblastic leukemia (B-ALL), but inherited mutations of PAX5 have not previously been described. Here we report a new heterozygous germline variant, c.547G>A (p.Gly183Ser), affecting the octapeptide domain of PAX5 that was found to segregate with disease in two unrelated kindreds with autosomal dominant B-ALL. Leukemic cells from all affected individuals in both families exhibited 9p deletion, with loss of heterozygosity and retention of the mutant PAX5 allele at 9p13. Two additional sporadic ALL cases with 9p loss harbored somatic PAX5 substitutions affecting Gly183. Functional and gene expression analysis of the PAX5 mutation demonstrated that it had significantly reduced transcriptional activity. These data extend the role of PAX5 alterations in the pathogenesis of pre-B cell ALL and implicate PAX5 in a new syndrome of susceptibility to pre-B cell neoplasia.

B cell precursor ALL is the most common pediatric malignancy. Children with affected siblings have a 2- to 4-fold greater risk of developing the disease 4, and, in occasional cases, ALL is inherited as a mendelian disorder 5. PAX5, encoding the B cell lineage transcription factor paired box 5, is somatically deleted, rearranged or otherwise mutated in approximately 30% of sporadic B-ALL cases 1-3,6-9. In Pax5-deficient mice, B cell development is arrested at the pro-B cell stage, and these cells can differentiate in vitro into other lymphoid and myeloid lineages 10. PAX5 is also essential for maintaining the identity and function of mature B cells 11, and its deletion in mature B cells results in dedifferentiation to pro-B cells and aggressive lymphomagenesis 12.
We identified a heterozygous germline PAX5 variant, c.547G>A (NM_016734), encoding p.Gly183Ser (NP_057953), by exome sequencing in two families, one of Puerto Rican ancestry (family 1; Fig. 1a) and the other of African-American ancestry (family 2; Fig. 1b and Supplementary Note). This variant had not previously been described in public databases (Exome Variant Server, 1000 Genomes Project and dbSNP137) or previous sequencing analyses of ALL and cancer genomes 1,2,9. All affected family members had B-ALL, and all available diagnostic and relapse leukemic samples from both families demonstrated loss of 9p through the formation of an isochromosome of 9q, i(9)(q10), or the presence of dicentric chromosomes involving 9q, both of which resulted in loss of the wild-type PAX5 allele and retention of the PAX5 allele encoding p.Gly183Ser (Fig. 1c, Supplementary Fig. 1 and Supplementary Table 1).
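The variant nomenclature c.547G>A (p.Gly183Ser) can be checked with simple codon arithmetic. The following Python sketch assumes the standard genetic code and that c.547 is counted from the A of the ATG start codon (the usual coding-DNA numbering convention). It does not use the NM_016734 sequence itself, so the reference codon is inferred rather than looked up, and the function names are illustrative.

```python
# Standard-genetic-code entries needed for this check: the four glycine codons
# and the codons reached from them by a G>A change at the first base.
CODON_TABLE = {
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "AGT": "Ser", "AGC": "Ser", "AGA": "Arg", "AGG": "Arg",
}

def codon_position(cdna_pos):
    """Return (codon number, 1-based offset within the codon) for a coding position."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1

def glycine_codons_giving(target, offset, ref_base="G", alt_base="A"):
    """List Gly codons that encode `target` after ref_base>alt_base at `offset`."""
    hits = []
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid != "Gly" or codon[offset - 1] != ref_base:
            continue
        mutated = codon[:offset - 1] + alt_base + codon[offset:]
        if CODON_TABLE.get(mutated) == target:
            hits.append(codon)
    return sorted(hits)

codon, offset = codon_position(547)
print(codon, offset)                         # 183 1
print(glycine_codons_giving("Ser", offset))  # ['GGC', 'GGT']
```

Under these assumptions, position 547 is the first base of codon 183, and a G>A change at that base yields serine only if the reference glycine codon is GGT or GGC.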
The germline PAX5 mutation encoding p.Gly183Ser segregated with leukemia in both kindreds; however, several unaffected obligate carriers (family 1: II3, III2 and III3 and family 2: I1, I2, II2 and II3) were also observed, suggesting incomplete penetrance. Unaffected mutation carriers and affected individuals at the time of diagnosis with ALL had normal immunoglobulin levels and no laboratory or clinical evidence of impaired B cell function. Sanger sequencing of cDNA from the peripheral blood of unaffected carriers indicated biallelic transcription of PAX5 (data not shown). The only mutated gene common to both families was PAX5, and no germline copy number aberrations were found to be shared by affected individuals (Supplementary Tables 2 and 3).
To determine whether the mutation encoding p.Gly183Ser arose independently in each kindred or instead reflects common ancestry, we compared the risk haplotypes of the families. The families shared a 4.7-kb haplotype spanning five SNPs (Fig. 1d and Supplementary Note). The relatively small size of this shared haplotype and principal-component analysis of genome-wide SNP genotype data (Supplementary Fig. 2) together implied that the two families were not recently related and differed in ancestry. Moreover, given the reduced fitness due to increased susceptibility to childhood ALL, it is unlikely that such a lethal mutation could be propagated over time. Because the identified haplotype is relatively frequent worldwide (Supplementary Table 4), it is likely that each family's mutation arose independently.
Genomic profiling of tumor samples demonstrated expression of the mutant PAX5 allele encoding p.Gly183Ser in diagnostic and relapse tumor specimens from affected members of family 2, with an average of 1 chimeric fusion and 9 non-silent sequence variants per case and homozygous deletion of CDKN2A with or without CDKN2B in all cases due to loss of 9p and focal deletion of the second allele. Apart from loss of 9p, no other somatic sequence mutations or structural rearrangements were shared by the affected families (Supplementary Tables 1 and 5-12).
As somatic i(9)(q10) or dic(9;v) abnormalities were seen in all of the familial leukemias, we sequenced PAX5 in 44 additional sporadic pre-B-ALL cases with i(9)(q10) or dic(9;v) aberrations to assess whether PAX5 mutations frequently co-occur with loss of 9p. Two leukemic samples had mutations encoding p.Gly183Ser and p.Gly183Val substitutions in the octapeptide domain, and, in others, previously reported variants including p.Pro80Arg and p.Val26Gly 1 were observed (Fig. 2 and Table 1). We examined the frequency of non-silent PAX5 somatic sequence mutations in a cohort of B-ALL cases with 9p loss through i(9)(q10) or dic(9;v) aberrations (n = 28) and in 2 cohorts of B-ALL without i(9)(q10) or dic(9;v) aberrations (n = 183 and 221; refs. 1,2). We observed a significantly higher frequency of PAX5 mutations in the cohort with isochromosomal or dicentric aberrations of chromosome 9 (P = 0.0001). No germline PAX5 mutations were detected in 39 families with a history of 2 or more cases of cancer, including at least 1 childhood hematological cancer, although 1 familial case of ALL harbored a dic(9;20)(p11;q11.1) alteration and a somatic variant encoding p.Pro80Arg (Table 1 and Supplementary Note).
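A comparison of mutation frequencies between cohorts, such as the one reported above, is commonly made with a test on a 2x2 table of mutated versus unmutated cases. This section does not name the test used or give the per-cohort mutation counts, so the Python sketch below assumes a two-sided Fisher's exact test and runs it on a small toy table rather than on the study data.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Rows are cohorts (for example, with and without 9p loss); columns are
    mutated and unmutated case counts.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x mutated cases in the first cohort.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed table.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= observed * (1 + 1e-9))

# Toy counts only (not from the study): 3 of 4 mutated versus 1 of 4 mutated.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

With real data, the first row would hold the mutated and unmutated counts for the cohort with i(9)(q10) or dic(9;v) aberrations and the second row the counts for the comparison cohorts.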
Previously identified PAX5 somatic mutations commonly result in marked reduction in the transcriptional activation mediated by PAX5. Downstream targets of PAX5 include CD19 and CD79A (also known as IGA and MB-1) 13. We examined the transactivating activity of the proteins encoded by the wild-type and mutant PAX5 alleles using a PAX5-dependent reporter gene assay containing copies of a high-affinity PAX5-binding site derived from the CD19 promoter 14. Both the p.Gly183Ser and p.Gly183Val alterations resulted in partial but significant reduction in transcriptional activation compared to wild-type PAX5 (P < 0.0001 for both alterations; Fig. 3a). Additionally, there was no detectable difference in the subcellular localization of wild-type and p.Gly183Ser PAX5 (Supplementary Fig. 3). To study the effect of the p.Gly183Ser alteration on CD79A expression, we expressed mutant and wild-type PAX5 in J558 and J558LµM, mouse plasmacytoma cell lines that do not express PAX5 or CD79A. Enforced expression of PAX5 results in expression of CD79A and assembly of the surface immunoglobulin M (sIgM) complex. The amount of sIgM expression may be used to assess the transcriptional activity of PAX5 alleles on the CD79A promoter 13. Both alleles encoding alterations to Gly183 [...] (Fig. 3b). These results suggest that PAX5 mutations affecting Gly183 result in partial loss of PAX5 activity.
The identified missense variant p.Gly183Ser is located at a conserved residue in the octapeptide domain of PAX5 that mediates interaction with Groucho transcriptional corepressors 15 (Fig. 2b). Previous studies have shown that GRG4 (also known as TLE4) represses PAX5-dependent luciferase activity in cells expressing wild-type PAX5 but not in cells expressing PAX5 octapeptide-domain mutants 15. We observed GRG4-mediated repression of the transcriptional activity of both wild-type and p.Gly183Ser PAX5 (Fig. 3c), suggesting that the effect of the alteration is not mediated by altered interaction with GRG4.
To further explore the effect of the p.Gly183Ser variant on downstream targets, we performed genome-wide transcriptional profiling of J558LµM cells transduced with empty vector or with vector expressing wild-type or mutant PAX5 alleles (examining either all transduced cells marked by red fluorescent protein (RFP) expression or the subset of cells expressing sIgM) and analyzed the expression of genes activated and repressed by PAX5 as previously defined in Pax5−/− mouse pro-B cells and mature B cells 16-19 and in human ETV6-RUNX1-positive B-ALL 1. Examining all PAX5-expressing cells, we observed profound deregulation of genes activated and repressed by PAX5 in J558LµM cells expressing known loss-of-function alleles (for example, the common exon 2-6 deletion that results in a truncating frameshift PAX5 allele, ∆2-6) or strongly hypomorphic alleles (for example, the PAX5 allele encoding p.Pro80Arg) and less marked deregulation in cells expressing p.Gly183Ser or p.Gly183Val [...].
We next examined the transcriptional consequences of the PAX5 mutation encoding p.Gly183Ser by performing transcriptome sequencing (mRNA-seq) of diagnostic and relapse samples obtained from 2 affected individuals in kindred 2 and from 139 sporadic childhood B-ALL samples. We performed gene-set enrichment analysis incorporating gene sets of PAX5-mutated, ETV6-RUNX1-positive ALL cases (one-third of which harbor focal PAX5 deletions) 1, PAX5-regulated genes in Pax5−/− mice 16-19 and genes regulated during mouse B-lymphoid development 20.
As a limited set of genes is known to be regulated in both mouse pro-B cells and mature B cells and as the overlap between mouse and human PAX5-regulated genes is unknown, we used all previously published PAX5-regulated genes and genes regulated during mouse B cell development 16-20 in an unbiased approach to explore the effects of the PAX5 mutations affecting Gly183 on direct and indirect transcriptional targets of PAX5. This analysis showed striking enrichment of genes deregulated in PAX5-mutated, ETV6-RUNX1-positive ALL, genes activated and repressed by PAX5 (including CD19, CD72 and CD79A), and genes regulated during mouse B-lymphoid development in the signature of familial B-ALL with the PAX5 mutation encoding p.Gly183Ser versus sporadic B-ALL (Fig. 3d and Supplementary Figs. 6 and 7). We also analyzed the overlap of previously published data and the expression differences between the familial ALL tumor samples and other B-ALL cases stratified by PAX5 mutation status (Supplementary Fig. 8 and Supplementary Table 16). Together, our results suggest that the PAX5 mutation encoding p.Gly183Ser results in attenuation of PAX5 function and deregulation of PAX5 target genes that is less severe than for the previously reported p.Pro80Arg and ∆2-6 alterations that result in marked or complete loss of PAX5 activity. The PAX5 deletions, translocations and sequence mutations identified as somatic events in B-ALL commonly affect the DNA-binding and transactivation domains and result in complete loss or marked attenuation of PAX5 transcriptional activity but are rarely homozygous and are not observed as inherited variants.
Moreover, PAX5 loss promotes the development of B-ALL in experimental models that are commonly affected by the acquisition of accompanying second hits in PAX5 (ref. 21), indicating that profound loss of PAX5 activity is commonly a central event in leukemogenesis. In contrast, the inherited PAX5 mutation encoding p.Gly183Ser results in modest attenuation of PAX5 activity in transcriptional reporter assays and is accompanied by somatic loss of the wild-type PAX5 allele due to 9p alterations during leukemogenesis. This model is also consistent with the finding of a significant association of somatic PAX5 hypomorphic mutations coincident with complete loss of the normal PAX5 allele in leukemic cells with absent 9p. These observations suggest that a severe reduction in PAX5 activity is incompatible with normal B-lymphoid development and is deleterious in carriers; by contrast, the partial hypomorphic allele encoding p.Gly183Ser is tolerated as a germline allele, but additional genetic events further reducing PAX5 activity are required to establish the leukemic clone. The universal finding of deletion of wild-type PAX5 in all familial ALL cases, rather than the acquisition of additional hypomorphic PAX5 mutations, suggests that a complete loss of wild-type PAX5 activity is required for developmental arrest and loss of maturation. This notion is supported by our transcriptional profiling of J558LµM cells expressing p.Gly183Ser PAX5 and by familial leukemias showing deregulation of PAX5 target gene expression that is significant but less marked than that observed with known loss-of-function mutations. The differences in the transcriptional profiles of some target gene panels were not as robust as in mouse model systems, presumably owing to inherent germline and somatic genetic and epigenetic variability in human leukemias. In addition, ongoing studies will be of interest to fully characterize the functional consequences of PAX5 octapeptide-domain mutations.

Figure 3 (c) PAX5-dependent reporter gene assay of wild-type and p.Gly183Ser PAX5 run in triplicate with or without cotransfection with 0.05 µg of vector encoding GRG4 as indicated. A p.Tyr179Glu PAX5 mutant that is deficient in binding to GRG4 and empty vector were used as controls. Asterisks indicate significant differences as determined by two-tailed t test (P < 0.0001). NS, not significant. (d) GSEA examining enrichment of genes known to be activated or repressed by PAX5 in experimental systems in the transcriptional profile of familial ALL. A representative heatmap is presented of genes shown to be activated by PAX5 in mouse B cells 17, which were negatively enriched in the transcriptional signature of familial ALL compared to B-ALL cases (excluding ETV6-RUNX1 ALL; P < 0.01, FDR = 0.09; see also Supplementary Tables 19-21). Leading-edge genes in this gene set responsible for enrichment are SCAND1 to NR4A1. Four samples from family 2 (diagnostic and relapse samples from individuals IV1 and IV2) show differential expression of PAX5-activated genes compared to a group of 139 sporadic B-ALL cases. This indicates an effect of the mutation encoding p.Gly183Ser on PAX5 function. Red indicates high expression; blue indicates low expression (color scale, −3 to +3 s.d.). PAX5 mutational status is indicated by the top row of colored boxes: green, wild-type PAX5; yellow, heterozygosity for a PAX5 mutation; magenta, biallelic PAX5 mutation.
Genes shown in the heatmap: SCAND1, H1FX, SH3BP2, KLF2, FAM43A, TFEB, BCAR3, SH3BP5, GALNT6, GADD45G, ID2, APOE, ZNF385A, RPLP1, CNN3, CD72, SBK1, CD19, MEF2B, NGFR, SHANK1, DMWD, HS3ST1, SPNS2, TCTN1, Cr2, NR4A1, CCR6, CPNE5, DUSP4, PACSIN1, ACOT7, TPT1, CD79A, COX5A, NID1, KCNK5, TRIM7, COTL1, STAC2, SRPK3, ILDR1, C3, ACP5, CD2, ATF5, UHRF1, SNX2, EGR3, EHD1, NAP1L1, CDC25B, ANXA2, LGALS1, POLM, SERPINB1, GGA2, CAPN5, SCN4A, HVCN1, NEDD4, EGR1, HBB, UBE2C, PITPNM2, CDCA3, RRM2, CDCA8, PTPN14.
Our findings have clinical implications with regard to options for pre-implantation genetic diagnosis and the possible relevance of somatic 9p alterations as a harbinger of a germline PAX5 mutation. The recent identification of germline TP53 mutations in familial ALL 20,22 and the data presented here strongly implicating PAX5 mutations in a new syndrome of inherited susceptibility to pre-B cell ALL indicate that further sequencing of affected kindreds is required to define the full spectrum of germline variations contributing to ALL pathogenesis.

METHODS
Methods and any associated references are available in the online version of the paper.

ONLINE METHODS
Subjects and samples. Family 1 was ascertained from the Memorial Sloan-Kettering Cancer Center Clinical Genetics Service. Study subjects provided written informed consent as part of a study to define genomic causes of lymphoid malignancies, and the study was approved by the local research ethics board. Family 2 from St. Jude Children's Research Hospital was ascertained in accord with local institutional review board approval. To protect subject identity, pedigrees were anonymized by alterations that do not affect genetic analysis.
Exome sequencing. Germline DNA (1 µg) from the peripheral leukocytes of affected individuals in remission and unaffected family members was used for whole-exome capture using an Agilent SureSelect 45Mb or 50Mb kit and paired-end sequencing with the Illumina HiSeq 2000 (ref. 23). Family 1 exome data were analyzed using Burrows-Wheeler Aligner (BWA) 24 to align fastq files and generate BAM files, and the Genome Analysis Tool Kit (GATK) 23,25 was used for variant calling. SNP clustering and proximity to indels and the proportion of aligned reads at a site with mapping quality of zero were used for filtering variants. Variant quality score-recalibrated (VQSR) data were then processed using the SNPEff program for functional annotation. Samples from family 2 underwent variant analysis as previously described 20 . Downstream analysis consisted of filtering out low-quality variant calls and those already reported in public databases. The downstream processing of sequence data, variant annotation and the filtering strategy based on a presumed autosomal dominant mode of inheritance with incomplete penetrance are detailed in the Supplementary Note.
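The filtering strategy described above (discard low-quality calls and variants already in public databases, then require sharing under an autosomal dominant model with incomplete penetrance) can be sketched as follows. This is a minimal illustration in Python, not the pipeline used in the study: the `Variant` record, the sample IDs and the placeholder genes `GENE_X`, `GENE_Y` and `GENE_Z` are all hypothetical, and the actual criteria are those given in the Supplementary Note.

```python
from dataclasses import dataclass, field

@dataclass
class Variant:
    gene: str
    change: str
    quality_ok: bool       # passed the variant caller's quality filters
    in_public_db: bool     # already reported in a public variant database
    carriers: set = field(default_factory=set)  # IDs of samples carrying it

def candidate_variants(variants, affected, obligate_carriers=frozenset()):
    """Keep novel, high-quality variants carried by every affected individual
    and every obligate carrier. Other unaffected relatives are not used to
    exclude variants, which allows for incomplete penetrance."""
    required = set(affected) | set(obligate_carriers)
    return [v for v in variants
            if v.quality_ok and not v.in_public_db and required <= v.carriers]

# Hypothetical toy family: A1 and A2 affected, C1 an unaffected obligate carrier.
variants = [
    Variant("PAX5", "c.547G>A", True, False, {"A1", "A2", "C1"}),
    Variant("GENE_X", "hypothetical", True, True, {"A1", "A2", "C1"}),    # known variant
    Variant("GENE_Y", "hypothetical", True, False, {"A1"}),               # not shared
    Variant("GENE_Z", "hypothetical", False, False, {"A1", "A2", "C1"}),  # low quality
]
hits = candidate_variants(variants, affected={"A1", "A2"}, obligate_carriers={"C1"})
print([v.gene for v in hits])  # ['PAX5']
```

Requiring the variant in obligate carriers but not in every unaffected relative is the design choice that accommodates incomplete penetrance.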
Principal-component analysis. From the exome-sequenced samples, single-nucleotide variants seen at a frequency above 5% in the dbSNP database were selected for principal-component analysis. These data were then combined with 1000 Genomes Project SNP data. SNPs were pruned on the basis of pairwise linkage disequilibrium within a 50-kb window. Data were transformed to calculate eigenvectors and eigenvalues for each sample, and the first two principal components were plotted.
SNP array genotyping. SNP array genotyping was performed using Affymetrix SNP 6.0 microarrays on the diagnostic leukemic sample from individual IV6 from family 1 and on germline DNA from unaffected individuals III3, III4 and IV9 and analyzed using the Genotyping Console (Affymetrix). SNP 6.0 arrays were also run for diagnostic leukemic and remission samples from individuals IV1, IV2 and III4 from family 2, as well as for relapse samples from IV1 and IV2, and data were analyzed by optimal reference normalization 26 and circular binary segmentation 27,28 as previously described 29 using R and dChip 30. Haplotype analysis was conducted using germline samples from III3, III4 and IV9 and the diagnostic leukemic sample from IV6 from family 1 and the diagnostic and remission samples from IV1, IV2 and III4 from family 2.
In view of the cytogenetic abnormalities in each of the leukemic samples resulting in monosomy 9p, for which Sanger sequencing of the variant encoding p.Gly183Ser demonstrated loss of heterozygosity with retention of the mutant allele, we were able to biologically phase the SNP risk haplotype containing the mutant allele. Beagle phased haplotypes from the 1000 Genomes Project were analyzed for the five-SNP shared haplotype, and frequencies were estimated among the populations in HapMap.
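Estimating how common a multi-SNP haplotype is among phased reference haplotypes amounts to counting allele combinations at the chosen sites. The Python sketch below illustrates this on invented toy haplotypes. The allele strings, SNP indices and the `AGTAG` haplotype are hypothetical and are not the five-SNP haplotype from the study, and reading Beagle output files is not shown.

```python
from collections import Counter

def haplotype_frequencies(phased, snp_indices):
    """Frequency of each allele combination at the selected SNPs.

    `phased` is a list of phased haplotypes, each a string with one allele
    per SNP; `snp_indices` gives the positions of the SNPs of interest.
    """
    counts = Counter("".join(h[i] for i in snp_indices) for h in phased)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}

# Hypothetical phased haplotypes over 7 SNPs (two per individual in practice).
phased = [
    "ACGTACG", "ACGTACG", "ACGTTCG", "TCGTACG",
    "ACGAACG", "ACGTACG", "TTGTACC", "ACGTACG",
]
freqs = haplotype_frequencies(phased, snp_indices=[0, 2, 3, 4, 6])
print(freqs["AGTAG"])  # 0.5
```

Running the same count separately on the haplotypes of each population would give per-population frequencies of the kind summarized in Supplementary Table 4.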
PAX5 sequencing. Sanger sequencing (primer sequences available upon request) of the entire ORF of PAX5 was performed in 44 cases of sporadic ALL characterized by i(9) or dic(9;v) and 31 cases of familial cancer. We also reviewed the coding regions of PAX5 in an additional 8 families that had been exome sequenced or B-ALL cases that had been Sanger sequenced (n = 87 treatment-resistant adult-onset ALLs) as part of other studies. Cases were acquired from St. Jude Children's Research Hospital (Memphis, Tennessee; n = 34 i(9) or dic(9;v) and 28 familial cases), Memorial Sloan-Kettering Cancer Center/Columbia University (New York, New York; n = 2 i(9) or dic(9;v) and 87 treatment-resistant adult-onset ALLs), Radboud University Nijmegen Medical Centre (Nijmegen, The Netherlands; n = 6 i(9) or dic(9;v)), Texas Children's Cancer Center and Human Genome Sequencing Center (Houston, Texas; 7 familial cases), Children's Cancer Institute Australia for Medical Research (Sydney, Australia; n = 2 i(9) or dic(9;v) and 3 familial cases) and the Huntsman Cancer Institute/Primary Children's Medical Center (Salt Lake City, Utah; 1 familial case).
DNA constructs. The CD19 luciferase construct used for PAX5-dependent reporter gene assays contains copies of a high-affinity PAX5-binding site (derived from the CD19 promoter) 14 and was a kind gift from M. Busslinger. The pFLAG-CMV2-Grg4 construct was a kind gift from G. Dressler 31. The mutations encoding p.Gly183Ser and p.Gly183Val were introduced into the pSG5_PAX5-WT, MSCV-IRES-mRFP-PAX5-WT and pMSCV-Puro-IRES-GFP-PAX5-WT vectors by site-directed mutagenesis (QuikChange, Agilent Technologies). For retroviral expression, wild-type PAX5 and other mutant cDNAs were subcloned as an XhoI-EcoRI fragment into MSCV-Puro-IRES-GFP (MSCV-PIG) or MSCV-IRES-RFP vector.
Cells and antibodies. HEK293 (ATCC CRL-1573) and HEK293T (ATCC CRL-11268) cells were maintained in Iscove's Modified Dulbecco's medium supplemented with 10% FCS and streptomycin. Parental J558 cells (ATCC TIB-6) were grown in DMEM with 10% horse serum 32. J558LµM cells were generated from a subline (J558L) that had lost immunoglobulin heavy chain expression by infection with virus encoding a cDNA of the membrane-bound heavy-chain isoform 33 and were grown in RPMI 1640 medium (Invitrogen) supplemented with 10% FBS (Hyclone), 2 mM L-glutamine (Invitrogen), 50 mg/ml gentamicin (Invitrogen), 0.3 µg/ml xanthine (Sigma) and 1 µg/ml mycophenolic acid (Sigma) as previously described 1,34. Neither line (parental J558 or J558LµM) normally expresses sIgM because both lack expression of CD79A 35, but partial expression of CD79A can be induced by exogenous expression of PAX5, leading to the upregulation of sIgM 13. Retroviral supernatants were produced by transient transfection of Phoenix Eco cells with MSCV-PIG-PAX5 constructs and were used to infect J558 cells by spinoculation in the presence of 4 µg/ml polybrene. Rabbit monoclonal antibody to PAX5 (ab109443) and mouse monoclonal antibody to Flag (ab18230) were purchased from Abcam and were used at 1:250 and 1:500 dilutions, respectively. Mouse monoclonal antibodies to β-actin (sc-1615; Santa Cruz Biotechnology) and to SF2 (32-4500; Zymed) were used at a 1:1,000 dilution. Antibodies to IgM conjugated to R-phycoerythrin (PE) (553409) or allophycocyanin (APC) (550676) were obtained from BD Pharmingen (BD Biosciences).

Subcellular fractionation. Protein expression and subcellular localization of the wild-type and p.Gly183Ser PAX5 proteins were examined using lysates from transiently transfected HEK293 cells separated by sucrose density gradient. The protocol for the separation of nuclei by sucrose gradient was adapted from that of the Nuclei PurePrep Isolation kit (Sigma). CF buffer (10 mM Tris-HCl, 1 mM MgCl2, 1 mM DTT, 10 µM PMSF) and 1.8 M Sucrose Solution (Sigma) were used to create density layers for resolved separation by ultracentrifugation. Fractions were then subjected to SDS-PAGE and immunoblotting with various antibodies to confirm adequate separation of nuclear and cytosolic fractions and to determine localization of recombinant PAX5.
Luciferase assays. We transfected 293T cells with MIR/MSCV-PIG WT or MIR/MSCV-PIG mutant along with luc-CD19 and pRL-TK Renilla luciferase plasmid DNA (Promega) using FuGene 6 (Roche Diagnostics). For GRG4 repression assays, 500 ng of either the MSCV-PIG empty vector or of MSCV-PIG-PAX5-WT, MSCV-PIG-PAX5-Gly183Ser or MSCV-PIG-PAX5-Tyr179Glu, 2 µg of luc-CD19 construct and 0.1 µg of pRL-TK Renilla luciferase plasmid were cotransfected with or without 50 ng of cDNA for GRG4 in pFLAG-CMV2 into HEK293T cells using X-tremeGENE HP DNA Transfection Reagent (Roche Diagnostics). Forty-eight hours after transfection, cell lysis and measurement of firefly and Renilla luciferase activity were performed using the Dual-Luciferase Reporter Assay System (Promega) according to the manufacturer's instructions. All transfections were performed in triplicate in at least two independent experiments. Firefly luciferase activity was normalized to the corresponding Renilla luciferase activity and reported as mean relative luciferase units (RLU) ± s.e.m.
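The normalization described above, firefly activity divided by the matching Renilla activity and summarized as mean ± s.e.m. across replicates, can be written out as a short Python sketch. The readings below are invented for illustration and the function names are hypothetical. The s.e.m. is taken as the sample standard deviation divided by the square root of the number of replicates, which is an assumption about how it was computed.

```python
from math import sqrt
from statistics import mean, stdev

def relative_luciferase(firefly, renilla):
    """Normalize each firefly reading to its paired Renilla reading."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired")
    return [f / r for f, r in zip(firefly, renilla)]

def mean_sem(values):
    """Mean and standard error of the mean (sample s.d. / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))

# Invented triplicate readings for one construct (arbitrary units).
firefly = [1200.0, 1500.0, 1800.0]
renilla = [100.0, 100.0, 100.0]
rlu = relative_luciferase(firefly, renilla)
avg, sem = mean_sem(rlu)
print(rlu)                           # [12.0, 15.0, 18.0]
print(round(avg, 3), round(sem, 3))  # 15.0 1.732
```

Dividing by the Renilla signal corrects each well for differences in transfection efficiency and cell number before constructs are compared.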