<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>1745-6150-7-32</ui>
	<ji>1745-6150</ji>
	<fm>
		<dochead>Research</dochead>
		<bibl>
			<title>
				<p>Comparative genomic analysis of the DUF71/COG2102 family predicts roles in diphthamide biosynthesis and B12 salvage</p>
			</title>
			<aug>
				<au id="A1" ca="yes"><snm>de Cr&#233;cy-Lagard</snm><fnm>Val&#233;rie</fnm><insr iid="I1"/><email>vcrecy@ufl.edu</email></au>
				<au id="A2"><snm>Forouhar</snm><fnm>Farhad</fnm><insr iid="I2"/><email>farhadf@biology.columbia.edu</email></au>
				<au id="A3"><snm>Brochier-Armanet</snm><fnm>C&#233;line</fnm><insr iid="I3"/><email>celine.brochier-armanet@univ-lyon1.fr</email></au>
				<au id="A4"><snm>Tong</snm><fnm>Liang</fnm><insr iid="I2"/><email>ltong@columbia.edu</email></au>
				<au id="A5"><snm>Hunt</snm><mi>F</mi><fnm>John</fnm><insr iid="I2"/><email>fhunt1@gmail.com</email></au>
			</aug>
			<insg>
				<ins id="I1"><p>Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, 32611, USA</p></ins>
				<ins id="I2"><p>Department of Biological Sciences, Columbia University, Northeast Structural Genomics Consortium, 1212 Amsterdam Ave, New York, NY, 10027, USA</p></ins>
				<ins id="I3"><p>Universit&#233; de Lyon; Universit&#233; Lyon 1; CNRS; UMR5558, Laboratoire de Biom&#233;trie et Biologie Evolutive, 43 boulevard du 11 Novembre 1918, Lyon, Villeurbanne, F-69622, France</p></ins>
			</insg>
			<source>Biology Direct</source>
			<section><title><p>Genomics, bioinformatics and systems biology</p></title></section><issn>1745-6150</issn>
			<pubdate>2012</pubdate>
			<volume>7</volume>
			<issue>1</issue>
			<fpage>32</fpage>
			<url>http://www.biology-direct.com/content/7/1/32</url>
			<xrefbib><pubidlist><pubid idtype="doi">10.1186/1745-6150-7-32</pubid><pubid idtype="pmpid">23013770</pubid></pubidlist></xrefbib>
		</bibl>
		<history><rec><date><day>17</day><month>7</month><year>2012</year></date></rec><acc><date><day>18</day><month>9</month><year>2012</year></date></acc><pub><date><day>26</day><month>9</month><year>2012</year></date></pub></history>
		<cpyrt><year>2012</year><collab>de Cr&#233;cy-Lagard et al.; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
		<kwdg>
			<kwd>Diphthamide</kwd>
			<kwd>Vitamin B12</kwd>
			<kwd>Amidotransferase</kwd>
			<kwd>Comparative genomics</kwd>
		</kwdg>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st><p>The availability of over 3000 published genome sequences has enabled the use of comparative genomic approaches to drive the discovery of biological function. Classically, genes were linked to functions by genetic or biochemical approaches, a lengthy process that often took years. Phylogenetic distribution profiles, physical clustering, gene fusions, co-expression profiles, structural information and other genomic or post-genomic associations can now be used to generate strong functional hypotheses. Here, we illustrate this shift with the analysis of the DUF71/COG2102 family, a subgroup of the PP-loop ATPase family.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st><p>The DUF71 family contains at least two subfamilies, one of which was predicted to be the missing diphthine-ammonia ligase (EC 6.3.1.14), Dph6. This enzyme catalyzes the last ATP-dependent step in the synthesis of diphthamide, a complex modification of Elongation Factor 2 that can be ADP-ribosylated by bacterial toxins. Dph6 orthologs are found in nearly all sequenced Archaea and Eucarya, as expected from the distribution of the diphthamide modification. The DUF71 family appears to have originated in the Archaea/Eucarya ancestor and to have been subsequently horizontally transferred to Bacteria. Bacterial DUF71 members likely acquired a different function because the diphthamide modification is absent in this Domain of Life. In-depth investigations suggest that some archaeal and bacterial DUF71 proteins participate in B12 salvage.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusions</p>
					</st><p>This detailed analysis of the DUF71 family members provides an example of the power of integrated data-mining for solving important &#8220;missing genes&#8221; or &#8220;missing function&#8221; cases and illustrates the danger of functional annotation of protein families by homology alone.</p>
				</sec>
				<sec>
					<st>
						<p>Reviewers&#8217; names</p>
					</st><p>This article was reviewed by Arcady Mushegian, Michael Galperin and L. Aravind.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st><p>In both Archaea and Eucarya, the translation Elongation Factor 2 (EF-2) harbors a complex post-translational modification of a strictly conserved histidine (His<sub>699</sub> in yeast) called diphthamide <abbrgrp>
					<abbr bid="B1">1</abbr>
				</abbrgrp>. This modification is the target of the diphtheria toxin and the <it>Pseudomonas</it> exotoxin A, which inactivate EF-2 by ADP-ribosylation of the diphthamide <abbrgrp>
					<abbr bid="B2">2</abbr>
					<abbr bid="B3">3</abbr>
				</abbrgrp>. Although the diphthamide biosynthesis pathway was described in the early 1980s <abbrgrp>
					<abbr bid="B2">2</abbr>
					<abbr bid="B3">3</abbr>
				</abbrgrp>, the corresponding enzymes have only recently been characterized. <it>In vitro</it> reconstitution experiments have shown that the first step, the transfer of a 3-amino-3-carboxypropyl (ACP) group from <it>S</it>-adenosylmethionine (SAM) to the C-2 position of the imidazole ring of the target histidine residue, is catalyzed in Archaea by the iron-sulfur-cluster enzyme, Dph2 <abbrgrp>
					<abbr bid="B4">4</abbr>
					<abbr bid="B5">5</abbr>
				</abbrgrp> (Figure&#8201;<figr fid="F1">1</figr>A). Genetic and complementation studies have shown that the catalysis of the same first step requires four proteins (Dph1-Dph4) in yeast and other eukaryotes <abbrgrp>
					<abbr bid="B6">6</abbr>
					<abbr bid="B7">7</abbr>
					<abbr bid="B8">8</abbr>
					<abbr bid="B9">9</abbr>
				</abbrgrp>. The subsequent step, trimethylation of an amino group to form the diphthine intermediate, is catalyzed by diphthine synthase, Dph5 (EC 2.1.1.98) (Figure&#8201;<figr fid="F1">1</figr>A) <abbrgrp>
					<abbr bid="B10">10</abbr>
					<abbr bid="B11">11</abbr>
				</abbrgrp>. The last step, the ATP-dependent amidation of the carboxylate group <abbrgrp>
					<abbr bid="B12">12</abbr>
				</abbrgrp>, is catalyzed by diphthine-ammonia ligase (EC 6.3.1.14), but the corresponding gene has not been identified (<url>http://www.orenza.u-psud.fr/</url>). A protein involved in this last step was recently identified in yeast (YBR246W or Dph7), but it is most certainly not directly involved in catalysis as it is not conserved in Archaea and it contains a WD-domain likely to be involved in protein/protein interactions <abbrgrp>
					<abbr bid="B13">13</abbr>
				</abbrgrp>.</p>
			<fig id="F1"><title><p>Figure 1</p></title><caption><p>Structures of diphthamide and B12 precursors and derivatives. (A) The core diphthamide pathway is predicted to contain three enzymes Dph2, Dph5 and Dph6 in Archaea.</p></caption><text>
   <p><b>Structures of diphthamide and B12 precursors and derivatives. </b>(<b>A</b>) The core diphthamide pathway is predicted to contain three enzymes Dph2, Dph5 and Dph6 in Archaea. The formation of diphthine has been reconstituted <it>in vitro</it> using Dph2 and Dph5 from <it>Pyrococcus horikoshii </it><abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>. The enzyme family catalyzing the last step in Archaea and Eukarya, Dph6, was missing. In yeast, the first and last steps require additional proteins (Dph1, Dph3 and Dph7). (<b>B</b>) Predicted Dph6-catalyzed reactions. (<b>C</b>) Ado-Pseudo-B12 structure and hydrolysis site by the bacterial CbiZ enzyme (bCbiZ). Parts (<b>A</b>) and (<b>B</b>) are adapted with permission from Xuling Zhu; Jungwoo Kim; Xiaoyang Su; Hening Lin; <it>Biochemistry </it> 2010, 49, 9649&#8211;9657. Copyright 2010 American Chemical Society.</p>
</text><graphic file="1745-6150-7-32-1"/></fig><p>Using a combination of comparative genomic approaches, we set out to identify a candidate gene for this orphan enzyme family. Based on taxonomic distribution, domain organization of gene fusions, physical clustering on chromosomes, atomic structural data, co-expression, and phenotype data, we identified a promising candidate: the Domain of Unknown Function family DUF71 (IPR002761) in InterPro <abbrgrp>
					<abbr bid="B14">14</abbr>
				</abbrgrp>. This family is also called ATP_bind_4 (PF01902) in Pfam <abbrgrp>
					<abbr bid="B15">15</abbr>
				</abbrgrp> or Predicted ATPases of PP-loop superfamily (COG2102) in the Clusters of Orthologous Groups database <abbrgrp>
					<abbr bid="B16">16</abbr>
				</abbrgrp>. However, detailed analysis of the DUF71 family revealed that this family is almost surely not isofunctional. Some Archaea contain two very divergent copies of the gene, while homologs are found in Bacteria, which are known to lack diphthamide. This observation suggests that some DUF71 members have different functions and probably participate in different biochemical pathways.</p>
		</sec>
		<sec>
			<st>
				<p>Methods</p>
			</st>
			<sec>
				<st>
					<p>Comparative genomics</p>
				</st><p>The BLAST tools <abbrgrp>
						<abbr bid="B17">17</abbr>
					</abbrgrp> and resources at NCBI (<url>http://www.ncbi.nlm.nih.gov/</url>) were routinely used. Multiple sequence alignments were built using ClustalW <abbrgrp>
						<abbr bid="B18">18</abbr>
					</abbrgrp> or Multialin <abbrgrp>
						<abbr bid="B19">19</abbr>
					</abbrgrp>. Protein domain analysis was performed using the Pfam database tools (<url>http://pfam.janelia.org/</url>) <abbrgrp>
						<abbr bid="B15">15</abbr>
					</abbrgrp>. Analysis of the phylogenetic distribution and physical clustering was performed in the SEED database <abbrgrp>
						<abbr bid="B20">20</abbr>
					</abbrgrp>. Results are available in the &#8220;Diphthamide biosynthesis&#8221; and &#8220;DUF71-B12&#8221; subsystems on the public SEED server (<url>http://pubseed.theseed.org/SubsysEditor.cgi</url>). Phylogenetic profile searches were performed on the IMG platform <abbrgrp>
						<abbr bid="B21">21</abbr>
					</abbrgrp> using the phylogenetic query tool (<url>http://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=PhylogenProfiler&amp;page=phyloProfileForm</url>). Physical clustering was analyzed with the SEED subsystem coloring tool or the Seedviewer Compare region tool <abbrgrp>
						<abbr bid="B20">20</abbr>
					</abbrgrp> as well as on the MicrobesOnline (<url>http://www.microbesonline.org/</url>) tree-based genome browser <abbrgrp>
						<abbr bid="B22">22</abbr>
					</abbrgrp>. The SPELL microarray analysis resource <abbrgrp>
						<abbr bid="B23">23</abbr>
					</abbrgrp> was used through the <it>Saccharomyces</it> Genome Database (SGD) (<url>http://www.yeastgenome.org/</url>)
					<abbrgrp>
						<abbr bid="B24">24</abbr>
					</abbrgrp> to analyze yeast gene coexpression profiles. Clustering of yeast deletion mutants by phenotype was analyzed using the yeast fitness database available at <url>http://fitdb.stanford.edu/</url>
					<abbrgrp>
						<abbr bid="B25">25</abbr>
						<abbr bid="B26">26</abbr>
					</abbrgrp>. Mappings of gene distribution profiles onto taxonomic trees were generated using the iTOL suite (<url>http://itol.embl.de/index.shtml</url>) <abbrgrp>
						<abbr bid="B27">27</abbr>
					</abbrgrp>. Sequence logos were derived using the WebLogo platform <abbrgrp>
						<abbr bid="B28">28</abbr>
					</abbrgrp>.</p>
			</sec>
			<sec>
				<st>
					<p>Structure analysis</p>
				</st><p>Visualization and comparison of protein structures and manual docking of ligand molecules were performed using PyMOL (The PyMOL Molecular Graphics System, Version 1.4.1, Schr&#246;dinger, LLC). XtalView <abbrgrp>
						<abbr bid="B7">7</abbr>
					</abbrgrp> was used for the protein docking exercises.</p>
			</sec>
			<sec>
				<st>
					<p>Phylogenetic analyses</p>
				</st><p>The survey of the 1996 complete prokaryotic genomes available at the NCBI (<url>http://www.ncbi.nlm.nih.gov/</url>) using BLASTP <abbrgrp>
						<abbr bid="B17">17</abbr>
					</abbrgrp> (default parameters) allowed identification of 119 bacterial and 144 archaeal DUF71 homologs in addition to the 182 eukaryotic homologs identified in the RefSeq database at the NCBI <abbrgrp>
						<abbr bid="B29">29</abbr>
					</abbrgrp> (Additional file <supplr sid="S1">1</supplr>: Table S1). The retrieved sequences were aligned using MAFFT <abbrgrp>
						<abbr bid="B8">8</abbr>
					</abbrgrp> and the resulting alignment was visually inspected using ED, the alignment editor of the MUST package <abbrgrp>
						<abbr bid="B30">30</abbr>
					</abbrgrp>. The phylogenetic analysis of the 445 sequences was performed using the neighbor-joining distance method implemented in SeaView <abbrgrp>
						<abbr bid="B31">31</abbr>
					</abbrgrp>. The robustness of the resulting tree was assessed by the non-parametric bootstrap method (100 replicates of the original dataset) implemented in SeaView. A second phylogenetic analysis restricted to 50 archaeal and eukaryotic homologs representative of the genetic and genomic diversity of these two Domains was performed using the Bayesian approach implemented in Phylobayes <abbrgrp>
						<abbr bid="B6">6</abbr>
					</abbrgrp> with an LG model.</p>
				<suppl id="S1">
					<title>
						<p>Additional file 1</p>
					</title>
					<text>
						<p>
							<b>Table S1. </b>Genbank RefSeq identities and corresponding organisms for all&#8201;proteins used in the phylogenies.</p>
					</text>
					<file name="1745-6150-7-32-S1.xlsx">
   <p>Click here for file</p>
</file>
				</suppl>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Results and discussion</p>
			</st>
			<sec>
				<st>
					<p>Comparative genomics points to DUF71/COG2102 as a strong candidate for the missing diphthamide synthase family</p>
				</st><p>The distribution of known diphthamide biosynthesis genes in Archaea was analyzed using the SEED database and its tools <abbrgrp>
						<abbr bid="B20">20</abbr>
					</abbrgrp>. The 59 archaeal genomes analyzed all contained an EF-2 encoding gene. Analysis of the distribution of Dph2 and Dph5 in Archaea showed that 58/59 genomes encoded these two proteins. The only archaeon lacking both Dph2 and Dph5 was <it>Korarchaeum cryptofilum</it> OPF8 (Figure&#8201;<figr fid="F2">2</figr>A). We therefore hypothesized that this organism has lost the diphthamide modification pathway, even though the <it>K. cryptofilum</it> EF-2 still harbors the conserved His residue at the site of the modification (His<sub>603</sub> in the <it>K. cryptofilum</it> sequence, accession B1L7Q0 in UniProtKB). Using the IMG/JGI phylogenetic query tools <abbrgrp>
						<abbr bid="B21">21</abbr>
					</abbrgrp>, we searched for protein families found in all Archaea except <it>Korarchaeum cryptofilum</it> OPF8, present in <it>Saccharomyces cerevisiae</it> and <it>Homo sapiens</it> but absent in <it>Escherichia coli</it> and <it>Bacillus subtilis</it>, as bacteria are known to lack this modification pathway. Only one family, DUF71/COG2102, followed this taxonomic distribution. This family had been described previously as a PP-loop ATPase of unknown function containing a Rossmannoid class HUP domain <abbrgrp>
						<abbr bid="B32">32</abbr>
					</abbrgrp>.</p>
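				<p>The phylogenetic profile query described above amounts to a set filter over presence/absence data. The following minimal Python sketch illustrates that logic only. It is not the IMG implementation, and the function name, the family names other than DUF71, and the abbreviated genome sets are hypothetical placeholders rather than data from this study.</p>

```python
def match_profile(family_genomes, required, excluded):
    """Return families present in every required genome and in no excluded genome."""
    hits = []
    for family, genomes in family_genomes.items():
        # A family matches when its genome set covers all required genomes
        # and shares nothing with the excluded genomes.
        if required.issubset(genomes) and excluded.isdisjoint(genomes):
            hits.append(family)
    return sorted(hits)


# Toy presence table: each family name is mapped to the set of genomes encoding a member.
# The genome sets are illustrative placeholders, not the data used in the study.
family_genomes = {
    "DUF71": {"P_furiosus", "M_mahii", "S_cerevisiae", "H_sapiens"},
    "hypothetical_family_X": {"P_furiosus", "M_mahii", "S_cerevisiae", "H_sapiens", "E_coli"},
    "hypothetical_family_Y": {"P_furiosus", "S_cerevisiae", "K_cryptofilum"},
}

# Profile: present in Archaea (other than K. cryptofilum), yeast and human;
# absent from K. cryptofilum, E. coli and B. subtilis.
required = {"P_furiosus", "M_mahii", "S_cerevisiae", "H_sapiens"}
excluded = {"K_cryptofilum", "E_coli", "B_subtilis"}

print(match_profile(family_genomes, required, excluded))
```

				<p>In this toy table only DUF71 satisfies both conditions. A real query would draw the required set from all sequenced Archaea other than <it>K. cryptofilum</it>, and would typically tolerate a small number of missing or misannotated genes.</p>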
				<fig id="F2"><title><p>Figure 2</p></title><caption><p>Comparative genomic analysis of the DUF71 family. (A) Distribution of the core diphthamide genes Dph2 and Dph5 and of EF-2 and DUF71 in Archaea, according to data derived from the &#8220;Diphthamide biosynthesis&#8221; subsystem in the SEED database.</p></caption><text>
   <p><b>Comparative genomic analysis of the DUF71 family. </b>(<b>A</b>) Distribution of the core diphthamide genes Dph2 and Dph5 and of EF-2 and DUF71 in Archaea, according to data derived from the &#8220;Diphthamide biosynthesis&#8221; subsystem in the SEED database. The tree is a species tree constructed in iTOL (itol.embl.de/). The presence and absence of the specific genes was derived from the &#8220;Diphthamide biosynthesis&#8221; subsystem. (<b>B</b>) Physical clustering of DUF71/COG2102 genes with Dph5 in three <it>Methanosarcina </it>genomes derived from the MicrobesOnline database (<url>http://www.microbesonline.org/</url>). (<b>C</b>) Examples of proteins containing domains fused to DUF71 in Archaea and Eucarya. Accession numbers and COG, CDD, or Pfam domain numbers are given in parentheses.</p>
</text><graphic file="1745-6150-7-32-2"/></fig><p>Using the neighborhood analysis tool of the SEED database <abbrgrp>
						<abbr bid="B20">20</abbr>
					</abbrgrp>, physical clustering was generally not observed between the <it>dph2</it>, <it>dph5</it> and <it>DUF71</it> genes except in three <it>Methanosarcina</it> genomes where <it>dph5</it> is located in the vicinity of <it>DUF71</it> genes (Figure&#8201;<figr fid="F2">2</figr>B). If members of the DUF71 family catalyze the last step of diphthamide synthesis, they should bind ATP <abbrgrp>
						<abbr bid="B12">12</abbr>
					</abbrgrp>. Structural analysis of the DUF71 protein from <it>Pyrococcus furiosus</it> (PF0828) reveals the presence of two distinct domains: an N-terminal HUP domain that contains a highly conserved PP-motif that interacts with ATP (PDB id: 3RK1) and AMP (PDB id: 3RK0), and a C-terminal 100-residue domain belonging to a novel fold with a highly conserved motif GEGGEF/YE<sub>188</sub>T/S (<it>P</it>. <it>furiosus</it> numbering) that is probably involved in substrate binding and recognition <abbrgrp>
						<abbr bid="B33">33</abbr>
					</abbrgrp>.</p>
			</sec>
			<sec>
				<st>
					<p>Coexpression, phenotype and structural data link the yeast DUF71 to translation and diphthamide biosynthesis</p>
				</st><p>YLR143w is the only <it>S</it>. <it>cerevisiae</it> DUF71 family member. A query with YLR143w as input to the SPELL co-expression tool <abbrgrp>
						<abbr bid="B23">23</abbr>
					</abbrgrp> showed that nearly all co-expressed genes were involved in translation and ribosome biogenesis (Additional file <supplr sid="S2">2</supplr>: Table S2). This observation suggested that the DUF71 protein family has a role in translation as expected for a protein modifying EF-2. Like all known diphthamide synthesis genes, <it>YLR143w</it> is also not essential. More specifically, deletion of any of the five known diphthamide genes confers sordarin resistance in yeast <abbrgrp>
						<abbr bid="B34">34</abbr>
						<abbr bid="B35">35</abbr>
					</abbrgrp> and a <it>ylr143w</it>&#916; strain was shown to be as resistant to this compound as the diphthamide-deficient strains (see supplemental data in <abbrgrp>
						<abbr bid="B34">34</abbr>
					</abbrgrp>). Furthermore, in a recent complete analysis of relationships between gene fitness profiles (co-fitness) and drug inhibition profiles (co-inhibition) from several hundred chemogenomic screens in yeast <abbrgrp>
						<abbr bid="B25">25</abbr>
						<abbr bid="B26">26</abbr>
					</abbrgrp> available at <url>http://fitdb.stanford.edu/</url>, it was found that among the top ten interactors with YLR143w by homozygous co-sensitivity are DPH5, DPH2, DPH4 (or JJJ3) and the newly identified DPH7 (or YBR246w) (Additional file <supplr sid="S3">3</supplr>: Figure S1). Both the coexpression and phenotype data thereby strongly support the hypothesis that YLR143w catalyzes the missing last step of diphthamide biosynthesis, even if one cannot rule out at this stage that other catalytic subunits yet to be identified may also be required.</p>
				<suppl id="S2">
					<title>
						<p>Additional file 2</p>
					</title>
					<text>
						<p>
							<b>Table S2. </b>GO Term Enrichment SPELL analysis (<url>http://imperio.princeton.edu:3000/yeast</url>) with YLR143w as input.</p>
					</text>
					<file name="1745-6150-7-32-S2.xlsx">
   <p>Click here for file</p>
</file>
				</suppl>
				<suppl id="S3">
					<title>
						<p>Additional file 3</p>
					</title>
					<text>
						<p>
							<b>Figure S1. </b>Top 10 interactors with YLR143W by homozygous co-sensitivity in S. cerevisiae (from the Yeast fitness database <url>http://fitdb.stanford.edu/fitdb.cgi?query=YLR143W</url>). <b>Figure S2</b> Multiple sequence alignment of selected Dph6 family and DUF71-B12 family sequences generated using the Multialin platform (<url>http://multalin.toulouse.inra.fr/multalin/</url>). Strictly conserved residues between the two families are in red. Residues conserved only in the Dph6 family are boxed in green. Residues found around the phosphate group of ATP are noted by red arrows. Secondary structural elements, yellow rectangles for &#945;-helix and cyan arrows for &#946;-strand, shown above the alignment, are from the crystal structure of P. furiosus_Dph6 (PF0828) (PDB id: 3RK1). <b>Figure S3</b> Bayesian tree of archaeal and eukaryotic Dph6 sequences. The scale bar represents the average number of substitutions per site. Numbers at nodes represent posterior probabilities. For clarity only values greater than 0.85 are indicated. <b>Figure S4</b> (Top) Sequence logo derived from 95 Dph6 sequences extracted from the Diphthamide subsystem in SEED. The E188 residue (PF0828 numbering) is located at position 10 in the logo. (Bottom) Sequence logo of the corresponding region derived from 102 DUF71-B12 sequences extracted from the DUF71-B12 subsystem in SEED. Both logos were made at <url>http://weblogo.berkeley.edu/logo.cgi</url> based on ClustalW-derived alignments.</p>
					</text>
					<file name="1745-6150-7-32-S3.pdf">
   <p>Click here for file</p>
</file>
				</suppl><p>Finally, comparison of ATP- and AMP-containing structures of PF0828 reveals that the active site of the former has a narrow groove at the end of which only the &#945;-phosphate of ATP is exposed to the solvent whereas the active site of the latter is wide open (Figure&#8201;<figr fid="F3">3</figr>A and B). Also, there is a sharp turn at the &#945;-phosphate of ATP, suggesting that it is the site of the nucleophilic attack. We therefore performed a docking exercise using the EF-2 structure (PDB id: 3B82) <abbrgrp>
						<abbr bid="B36">36</abbr>
					</abbrgrp> with the ATP-containing structure of PF0828. The docking revealed that the active site groove of the ATP-containing structure can easily accommodate diphthine with a few minor clashes between the two structures (Figure&#8201;<figr fid="F3">3</figr>A and B).</p>
				<fig id="F3"><title><p>Figure 3</p></title><caption><p>Structural analysis of the DUF71 (PF0828) putative active site. (A) Docking of modified EF-2 (cyan, PDB id: 3B82) onto ATP-bound structure of PF0828 (yellow, PDB id: 3RK1).</p></caption><text>
   <p><b>Structural analysis of the DUF71 (PF0828) putative active site. </b>(<b>A</b>) Docking of modified EF-2 (cyan, PDB id: 3B82) onto ATP-bound structure of PF0828 (yellow, PDB id: 3RK1). ATP and several residues of PF0828 (DUF71), which are conserved among archaeal and eukaryotic orthologs, and diphthine of EF-2 (see text for details) are shown in stick models. (<b>B</b>) Close-up stereo pair of panel A. Diphthine of EF-2 and the side chains of conserved residues of PF0828, at the interface of PF0828 and EF-2, are shown in stick models and labeled. (<b>C</b>) Stereo pair view of ATP-binding region of PF0828. Residues that are conserved among Dph6 and DUF71-B12 families are depicted in stick models with carbon atoms in cyan, while the residues that are specific to Dph6 family are shown in stick models with carbon atoms in green. Oxygen and nitrogen atoms are shown in red and blue in all stick models, respectively.</p>
</text><graphic file="1745-6150-7-32-3"/></fig><p>The modeling also showed that the carboxyl group of diphthine resides near the &#945;-phosphate of ATP and the carboxylate group of residue Glu<sub>188</sub>, suggesting that nucleophilic attack by diphthine on the &#945;-phosphate of ATP is highly feasible (Figure&#8201;<figr fid="F3">3</figr>B). The modeling also shows that several residues that are highly conserved among archaeal and eukaryotic PF0828 and YLR143w orthologs besides E<sub>188</sub>, including S<sub>44</sub>, Y<sub>45</sub>, E<sub>78</sub>, Y<sub>103</sub>, Q<sub>104</sub>, A<sub>149</sub>, E<sub>183</sub> and E<sub>186</sub> (Additional file <supplr sid="S3">3</supplr>: Figure S2), are at the interface of the modeled complex of PF0828 with EF-2, supporting the hypothesis that they play important roles in EF-2 recognition (Figure&#8201;<figr fid="F3">3</figr>B).</p>
			</sec>
			<sec>
				<st>
					<p>Linking DUF71 family members to ammonia transfer reactions</p>
				</st><p>The diphthine-ammonia ligase reaction requires a source of NH<sub>3</sub>
					<abbrgrp>
						<abbr bid="B12">12</abbr>
					</abbrgrp>. Domain fusions involving members of the DUF71 family in the Pfam database <abbrgrp>
						<abbr bid="B15">15</abbr>
					</abbrgrp> suggest that the source of NH<sub>3</sub> might vary depending on the organism. For example, in a few Archaea (e.g. <it>Methanohalophilus mahii</it> DSM 5219, <it>Methanosalsum zhilinae</it> DSM 4017 or &#8216;<it>Candidatus</it> Nanosalinarum sp. J07AB56&#8217;), a COG0367/AsnB asparagine synthetase [glutamine-hydrolyzing] (EC 6.3.5.4) domain is found at the N-terminus of the DUF71 domain (Figure&#8201;<figr fid="F2">2</figr>C). This AsnB domain can be further separated into two subdomains, an N-terminal class-II glutamine amidotransferase domain (GAT-II) <abbrgrp>
						<abbr bid="B37">37</abbr>
					</abbrgrp> and an Asn_Synthase_B_C PP-loop ATPase domain (Figure&#8201;<figr fid="F2">2</figr>C). This domain organization suggests that in this subset of enzymes, the hydrolysis of glutamine catalyzed by the GAT-II domain could provide the NH<sub>3</sub> moiety to both the DUF71 and the Asn_Synthase_B_C enzymes. On the other hand, in many eukaryotes such as yeast and <it>Arabidopsis thaliana</it>, two YjgF-YER057c-UK114-like domains are fused to the C-terminus of the DUF71 protein as previously noted by Aravind et al. <abbrgrp>
						<abbr bid="B32">32</abbr>
					</abbrgrp> (Figure&#8201;<figr fid="F2">2</figr>C). The stand-alone members of the YjgF-YER057c-UK114 family, now called the RidA family (for reactive intermediate/imine deaminase A), have been shown to deaminate products generated by PLP-dependent enzymes, which results in the release of NH<sub>3</sub>
					<abbrgrp>
						<abbr bid="B38">38</abbr>
					</abbrgrp>. The RidA domains fused to DUF71 could therefore be involved in providing the NH<sub>3</sub> moiety for diphthamide synthesis.</p>
			</sec>
			<sec>
				<st>
					<p>The DUF71 family is not monofunctional</p>
				</st><p>The taxonomic distribution of DUF71 homologs in available complete genomes confirmed that DUF71 is present in one or occasionally two copies in all Archaea except the korarchaeon <it>K</it>. <it>cryptofilum</it> (Table&#8201;<tblr tid="T1">1</tblr> and Additional file <supplr sid="S1">1</supplr>: Table S1). This pattern is consistent with an ancient origin of the DUF71 gene in Archaea. In sharp contrast, DUF71 is sporadically distributed in Bacteria, being present only in a few representatives of some phyla (Table&#8201;<tblr tid="T1">1</tblr> and Additional file <supplr sid="S1">1</supplr>: Table S1). This pattern fits either with an ancient origin of DUF71 in Bacteria followed by numerous losses or, conversely, with a more recent acquisition followed by horizontal gene transfer (HGT) among bacterial lineages. To further investigate the evolutionary history of DUF71, we performed a phylogenetic analysis of the homologs identified in the three Domains of Life. The resulting tree showed two divergent groups of sequences. The first group contains the eukaryotic and nearly all archaeal sequences (including the predicted yeast DPH6 (YLR143w) and <it>P</it>. <it>furiosus</it> PF0828), whereas the second encompasses all the bacterial sequences as well as the second copy found in a few archaeal genomes (Figure&#8201;<figr fid="F4">4</figr> and Additional file <supplr sid="S3">3</supplr>: Figure S3).</p>
				<table id="T1">
					<title>
						<p>Table 1</p>
					</title>
					<caption>
						<p>
							<b>Taxonomic distribution of DUF71 </b><b>homologs in archaeal and </b><b>bacterial genomes</b>
						</p>
					</caption>
					<tgroup align="left" cols="6">
						<colspec align="left" colname="c1" colnum="1" colwidth="1*"/>
						<colspec align="left" colname="c2" colnum="2" colwidth="1*"/>
						<colspec align="left" colname="c3" colnum="3" colwidth="1*"/>
						<colspec align="left" colname="c4" colnum="4" colwidth="1*"/>
						<colspec align="left" colname="c5" colnum="5" colwidth="1*"/>
						<colspec align="left" colname="c6" colnum="6" colwidth="1*"/>
						<thead valign="top">
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>Phylum</b>
									</p>
								</entry>
								<entry colname="c2">
									<p>
										<b>Nb (%) genomes</b>
									</p>
								</entry>
								<entry colname="c3">
									<p>
										<b>Phylum</b>
									</p>
								</entry>
								<entry colname="c4">
									<p>
										<b>Nb (%) genomes</b>
									</p>
								</entry>
								<entry colname="c5">
									<p>
										<b>Phylum</b>
									</p>
								</entry>
								<entry colname="c6">
									<p>
										<b>Nb (%) genomes</b>
									</p>
								</entry>
							</row>
						</thead>
						<tfoot>
							<p>The number of genomes per phylum containing at least one homolog of DUF71 is indicated.</p>
						</tfoot>
						<tbody valign="top">
							<row>
								<entry colname="c1">
									<p>
										<b>
											<it>Archaea</it>
										</b>
									</p>
								</entry>
								<entry colname="c2"/>
								<entry colname="c3"/>
								<entry colname="c4"/>
								<entry colname="c5"/>
								<entry colname="c6"/>
							</row>
							<row>
								<entry colname="c1">
									<p>Crenarchaeota</p>
								</entry>
								<entry colname="c2">
									<p>37/37 (100%)</p>
								</entry>
								<entry colname="c3">
									<p>Korarchaeota</p>
								</entry>
								<entry colname="c4">
									<p>0/1 (0%)</p>
								</entry>
								<entry colname="c5">
									<p>Thaumarchaeota</p>
								</entry>
								<entry colname="c6">
									<p>2/2 (100%)</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>Euryarchaeota</p>
								</entry>
								<entry colname="c2">
									<p>79/79 (100%)</p>
								</entry>
								<entry colname="c3"/>
								<entry colname="c4"/>
								<entry colname="c5"/>
								<entry colname="c6"/>
							</row>
							<row>
								<entry colname="c1">
									<p>
										<b>
											<it>Bacteria</it>
										</b>
									</p>
								</entry>
								<entry colname="c2"/>
								<entry colname="c3"/>
								<entry colname="c4"/>
								<entry colname="c5"/>
								<entry colname="c6"/>
							</row>
							<row>
								<entry colname="c1">
									<p>Acidobacteria</p>
								</entry>
								<entry colname="c2">
									<p>3/7 (42.9%)</p>
								</entry>
								<entry colname="c3">
									<p>Dictyoglomi</p>
								</entry>
								<entry colname="c4">
									<p>0/2 (0%)</p>
								</entry>
								<entry colname="c5">
									<p>Proteobacteria_Epsilon</p>
								</entry>
								<entry colname="c6">
									<p>0/64 (0%)</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>Actinobacteria</p>
								</entry>
								<entry colname="c2">
									<p>1/206 (0.5%)</p>
								</entry>
								<entry colname="c3">
									<p>Elusimicrobia</p>
								</entry>
								<entry colname="c4">
									<p>0/2 (0%)</p>
								</entry>
								<entry colname="c5">
									<p>Proteobacteria_Gamma</p>
								</entry>
								<entry colname="c6">
									<p>27/406 (6.7%)</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>Aquificae</p>
								</entry>
								<entry colname="c2">
									<p>0/10 (0%)</p>
								</entry>
								<entry colname="c3">
									<p>Fibrobacteres</p>
								</entry>
								<entry colname="c4">
									<p>0/2 (0%)</p>
								</entry>
								<entry colname="c5">
									<p>PVC_Chlamydiae</p>
								</entry>
								<entry colname="c6">
									<p>1/73 (1.4%)</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>Bacteroidetes</p>
								</entry>
								<entry colname="c2">
									<p>20/73 (27.4%)</p>
								</entry>
								<entry colname="c3">
									<p>Firmicutes</p>
								</entry>
								<entry colname="c4">
									<p>20/484 (4.1%)</p>
								</entry>
								<entry colname="c5">
									<p>PVC_Planctomycetes</p>
								</entry>
								<entry colname="c6">
									<p>3/6 (50%)</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>Caldiserica</p>
								</entry>
								<entry colname="c2">
									<p>0/1 (0%)</p>
								</entry>
								<entry colname="c3">
									<p>Fusobacteria</p>
								</entry>
								<entry colname="c4">
									<p>0/5 (0%)</p>
								</entry>
								<entry colname="c5">
									<p>PVC_Verrucomicrobia</p>
								</entry>
								<entry colname="c6">
									<p>0/4 (0%)</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>Chlorobi</p>
								</entry>
								<entry colname="c2">
									<p>0/11 (0%)</p>
								</entry>
								<entry colname="c3">
									<p>Gemmatimonadetes</p>
								</entry>
								<entry colname="c4">
									<p>0/1 (0%)</p>
								</entry>
								<entry colname="c5">
									<p>Spirochaetes</p>
								</entry>
								<entry colname="c6">
									<p>1/45 (2.2%)</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>Chloroflexi</p>
								</entry>
								<entry colname="c2">
									<p>5/16 (31.3%)</p>
								</entry>
								<entry colname="c3">
									<p>Ignavibacteria</p>
								</entry>
								<entry colname="c4">
									<p>0/1 (0%)</p>
								</entry>
								<entry colname="c5">
									<p>Synergistetes</p>
								</entry>
								<entry colname="c6">
									<p>0/4 (0%)</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>Chrysiogenetes</p>
								</entry>
								<entry colname="c2">
									<p>0/1 (0%)</p>
								</entry>
								<entry colname="c3">
									<p>Nitrospirae</p>
								</entry>
								<entry colname="c4">
									<p>1/3 (33.3%)</p>
								</entry>
								<entry colname="c5">
									<p>Thermodesulfobacteria</p>
								</entry>
								<entry colname="c6">
									<p>0/2 (0%)</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>Cyanobacteria</p>
								</entry>
								<entry colname="c2">
									<p>0/45 (0%)</p>
								</entry>
								<entry colname="c3">
									<p>Proteobacteria_Alpha</p>
								</entry>
								<entry colname="c4">
									<p>2/204 (1%)</p>
								</entry>
								<entry colname="c5">
									<p>Thermotogae</p>
								</entry>
								<entry colname="c6">
									<p>5/14 (35.7%)</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>Deferribacteres</p>
								</entry>
								<entry colname="c2">
									<p>0/4 (0%)</p>
								</entry>
								<entry colname="c3">
									<p>Proteobacteria_Beta</p>
								</entry>
								<entry colname="c4">
									<p>8/119 (6.7%)</p>
								</entry>
								<entry colname="c5"/>
								<entry colname="c6"/>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>Deinococcus-Thermus</p>
								</entry>
								<entry colname="c2">
									<p>2/17 (11.8%)</p>
								</entry>
								<entry colname="c3">
									<p>Proteobacteria_Delta</p>
								</entry>
								<entry colname="c4">
									<p>1/48 (2.1%)</p>
								</entry>
								<entry colname="c5"/>
								<entry colname="c6"/>
							</row>
						</tbody>
					</tgroup>
				</table>
				<fig id="F4"><title><p>Figure 4</p></title><caption><p>Neighbor-joining phylogenetic tree of the 445 DUF71 homologs identified in public databases.</p></caption><text>
   <p><b>Neighbor-joining phylogenetic tree of the 445 DUF71 homologs identified in public databases. </b>The scale bar represents the average number of substitutions per site. Numbers at nodes are bootstrap values. For clarity, only values greater than 50% are indicated. Colors correspond to the taxonomic affiliation of sequences (see the box on the figure for details). The full tree of Cluster 1 is shown in Additional file <supplr sid="S3">3</supplr>: Figure S3.</p>
</text><graphic file="1745-6150-7-32-4"/></fig><p>This second group emerged from within the archaeal sequences of the first cluster and showed various contradictions with the currently recognized taxonomy because bacterial sequences from distantly related lineages appeared intermixed in the tree (Figure&#8201;<figr fid="F4">4</figr>). These observations, together with the extremely patchy distribution of DUF71 in bacteria, strongly support the hypothesis that the bacterial DUF71 was of archaeal origin and spread through this domain mainly by HGT. Interestingly, the second homologs present in a few archaeal genomes emerged from bacterial sequences, suggesting that secondary HGT occurred from Bacteria to Archaea, allowing the latter to acquire a second DUF71 homolog.</p><p>In contrast, a phylogenetic analysis focused on archaeal and eukaryotic sequences strongly supported the separation between these two Domains (posterior probability (PP)&#8201;=&#8201;1). Moreover, it recovered the monophyly of most eukaryotic and archaeal major lineages (most PP&#8201;&gt;&#8201;0.95, Additional file <supplr sid="S3">3</supplr>: Figure S3), suggesting that DUF71 was present in their ancestors. However, as expected given the small number of amino acid positions analyzed (182 positions), the relationships among these lineages were mainly unresolved (most PP&#8201;&lt;&#8201;0.95), precluding an in-depth analysis of the ancient evolutionary history of DUF71 in Archaea and Eucarya (Additional file <supplr sid="S3">3</supplr>: Figure S3). Nevertheless, the wide distribution of DUF71 in these two Domains (even in highly derived parasites such as <it>Microsporidia</it>, <it>Cryptosporidium</it>, <it>Entamoeba</it> or <it>Nanoarchaeum equitans</it>, not shown) and its ancestral presence in most of their orders/phyla suggested that this gene was present in the last common ancestor of these two Domains. This inference does not imply, however, that no HGT occurred in these Domains. 
Indeed, some incongruence between the DUF71 phylogeny and the reference phylogeny of organisms <abbrgrp>
						<abbr bid="B39">39</abbr>
					</abbrgrp> suggested putative cases of HGT. For instance, the <it>Thermofilum pendens</it> DUF71 robustly groups with Methanomicrobia (Euryarchaeota) rather than with other Thermoproteales (Additional file <supplr sid="S3">3</supplr>: Figure S3).</p><p>Because diphthamide is a modification specific to the archaeal and eukaryotic EF-2 proteins and bacteria lack all known diphthamide biosynthesis genes, we propose that cluster 1 in our phylogeny corresponds to <it>bona fide</it> Dph6 enzymes involved in diphthamide synthesis (Figure&#8201;<figr fid="F4">4</figr>). This function therefore very likely represents the ancestral function of the whole DUF71 family. In contrast, bacteria do not synthesize diphthamide, suggesting that the bacterial DUF71 homologs and the few additional archaeal copies (cluster 2, Figure&#8201;<figr fid="F4">4</figr>) are involved in another function, and thus that a functional shift occurred after the HGT of an archaeal <it>bona fide</it> Dph6 to bacteria. Notably, these genes (including PF0295, the second DUF71 copy found in <it>P</it>. <it>furiosus</it>) are strongly clustered on the chromosome with vitamin B12 salvage genes. More precisely, 75/102 are adjacent to vitamin B12 transporter genes (such as the BtuCDF genes) <abbrgrp>
						<abbr bid="B40">40</abbr>
					</abbrgrp> and 18/102 are adjacent to <it>cbiB</it> genes encoding adenosylcobinamide-phosphate synthetase, an enzyme shared by the <it>de novo</it> and salvage pathways <abbrgrp>
						<abbr bid="B41">41</abbr>
					</abbrgrp> (Figure&#8201;<figr fid="F5">5</figr>A). These clustering data can be visualized in the &#8220;Duf71-B12&#8221; subsystem in the SEED database, and two typical clusters are shown in Figure&#8201;<figr fid="F5">5</figr>B. On this basis, we hypothesize that the archaeal and bacterial DUF71 genes that cluster with vitamin B12 genes have a role in B12 metabolism.</p>
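				<p>To illustrate how adjacency counts of this kind can be tallied, the following minimal Python sketch counts target genes that lie within a fixed window of B12 transporter genes or of cbiB. It is not the procedure used in this study, where the clustering data were taken from the SEED subsystem. The function names, the window size and the toy gene orders are all hypothetical.</p>

```python
from collections import Counter

# Hypothetical gene labels for illustration only; real annotations would
# come from a genome database such as SEED.
TRANSPORT = {"btuC", "btuD", "btuF"}
CBIB = {"cbiB"}


def neighbours(genes, index, window):
    """Return the genes within `window` positions of genes[index], excluding itself."""
    start = max(0, index - window)
    return genes[start:index] + genes[index + 1:index + 1 + window]


def tally_context(genomes, target="duf71", window=2):
    """Count target genes lying near B12 transporter genes or near cbiB."""
    counts = Counter()
    for genes in genomes.values():
        for i, gene in enumerate(genes):
            if gene != target:
                continue
            counts["total"] += 1
            nearby = set(neighbours(genes, i, window))
            if nearby.intersection(TRANSPORT):
                counts["near_transport"] += 1
            if nearby.intersection(CBIB):
                counts["near_cbiB"] += 1
    return counts


# Invented gene orders, one list per genome (assumption, not real data).
toy_genomes = {
    "genome_A": ["x1", "btuF", "btuC", "duf71", "btuD", "x2"],
    "genome_B": ["x3", "cbiB", "duf71", "x4"],
    "genome_C": ["x5", "duf71", "x6", "x7", "btuC"],
}

result = tally_context(toy_genomes)
print(result["near_transport"], result["near_cbiB"], result["total"])
```

				<p>In this toy input, the target gene in genome_C is not counted as transporter-associated because btuC lies outside the two-gene window. The choice of window therefore directly affects the resulting fractions.</p>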
				<fig id="F5"><title><p>Figure 5</p></title><caption><p>Links between the DUF71 family and B12 salvage. (A) Summary of cobinamide derivative salvage in Bacteria and Archaea; arrows with dotted lines denote multiple steps.</p></caption><text>
   <p><b>Links between the DUF71 family and B12 salvage. </b>(<b>A</b>) Summary of cobinamide derivative salvage in Bacteria and Archaea; arrows with dotted lines denote multiple steps. (<b>B</b>) Typical examples of physical clustering of DUF71-B12 genes with B12 salvage genes in Archaea and Bacteria. Abbreviations: Pseudo-B12, adenosylpseudocobalamin; Cbi, cobinamide; AdoCbi, adenosylCbi; AdoCbi-P, adenosylCbi-phosphate; AdoCby, adenosylcobyric acid; AP, (R)-1-amino-2-propanol; AP-P, AP-phosphate; Thr-P, L-threonine-phosphate; DMB, 5,6-dimethylbenzimidazole; &#945;-AMP-AP, &#945;-adenylate-AP; CobU, ATP:AdoCbi kinase, GTP:AdoCbi-GDP guanylyltransferase; CobY, NTP:AdoCbi-P nucleotidyltransferase; CobA, ATP:co(I)rrinoid adenosyltransferase; aCbiZ, adenosylcobinamide amidohydrolase; bCbiZ, pseudo-B12 amidohydrolase; CbiB, adenosylcobinamide-phosphate synthetase; CobD, L-threonine phosphate decarboxylase; CobS, cobalamin (5-P) synthase; CobT, 5,6-dimethylbenzimidazole phosphoribosyltransferase; CobC or CobZ, alpha-ribazole-5&#8242;-phosphate phosphatase; cobY, adenosylcobinamide-phosphate guanylyltransferase; CbiP, cobyric acid synthase; BtuFCD, cobamide transporter subunits.</p>
</text><graphic file="1745-6150-7-32-5"/></fig><p>Finally, some bacterial DUF71 proteins might also have other functions because a set of bacteria such as <it>Clostridium perfringens</it> have two or more DUF71 homologs (Figure&#8201;<figr fid="F4">4</figr> and Additional file <supplr sid="S1">1</supplr>: Table S1). The most extreme example is <it>Dehalococcoides</it> sp. CBDB1, which encodes five DUF71 homologs in its genome. In the case of <it>C. perfringens</it> ATCC 13124 and SM101, one homolog (YP_695745 and YP_698440) clusters both physically and phylogenetically (Figure&#8201;<figr fid="F4">4</figr> and <figr fid="F5">5</figr>A) with the B12 subgroup proteins, whereas the second homolog (YP_695178 and YP_698039) is related to that of <it>Acinetobacter baumannii</it> (Cluster 3, Figure&#8201;<figr fid="F4">4</figr>) and is not found associated with gene clusters related to B12 salvage (data not shown).</p><p>Therefore, based on phylogenetic and physical clustering, the DUF71 proteins were split into two subgroups, Dph6 and DUF71-B12, which were annotated as such and captured in the &#8220;Diphthamide biosynthesis&#8221; and &#8220;Duf71-B12&#8221; subsystems in the SEED database.</p>
			</sec>
			<sec>
				<st>
					<p>Predicting the function of members of the DUF71-B12 subgroup</p>
				</st><p>As members of the DUF71-B12 subgroup clustered strongly with B12 transport genes and with <it>cbiB</it> (Figure&#8201;<figr fid="F5">5</figr>B), we focused on the early steps of B12 salvage, which are quite diverse because several forms of cobamides [cobalamin-like or Cbl-like compounds] can be salvaged (Figure&#8201;<figr fid="F5">5</figr>A). Cobinamide (Cbi) is adenosylated after transport to form adenosylcobinamide (AdoCbi). In most bacteria, AdoCbi is directly phosphorylated by CobU before being transformed after several steps into adenosylcobalamin (AdoCbl or coenzyme B12), in which the lower ligand is 5,6-dimethylbenzimidazole (DMB) (see <abbrgrp>
						<abbr bid="B42">42</abbr>
					</abbrgrp> for review) (Figure&#8201;<figr fid="F5">5</figr>A). Archaea use a different salvage route in which AdoCbi is converted to adenosylcobyric acid (AdoCby), an intermediate of the <it>de novo</it> pathway, by an amidohydrolase, aCbiZ <abbrgrp>
						<abbr bid="B43">43</abbr>
					</abbrgrp> (Figure&#8201;<figr fid="F5">5</figr>A). AdoCby is then converted directly to adenosylcobinamide-phosphate (AdoCbi-P) by CbiB. Finally, some bacteria have CbiZ homologs (bCbiZ) that hydrolyze adenosylpseudocobalamin (Ado-pseudo-B12) <abbrgrp>
						<abbr bid="B44">44</abbr>
					</abbrgrp>, which contains an adenine instead of DMB as its lower ligand (Figure&#8201;<figr fid="F1">1</figr>C and <figr fid="F5">5</figr>A).</p><p>In order to gain insight into the possible function of DUF71-B12 family members, we analyzed the co-distribution pattern of CbiZ, CbiB and DUF71-B12 proteins in Archaea and Bacteria. Interestingly, with a few exceptions, all prokaryotic genomes encoding CbiB harbor either CbiZ or DUF71-B12 (Figure&#8201;<figr fid="F6">6</figr>). In bacteria, however, there was a strict anti-correlation between the DUF71-B12 and the CbiZ families (Figure&#8201;<figr fid="F6">6</figr>A). This was not the case in Archaea, where quite a few organisms (such as <it>P</it>. <it>furiosus</it> or <it>Methanosarcina mazei</it> Go1) harbored both families (Figure&#8201;<figr fid="F6">6</figr>B). This distribution profile suggests that members of the DUF71-B12 subfamily fulfil the same role as the bacterial CbiZ enzymes (bCbiZ), either by catalysing the same reaction (cleaving Ado-pseudo-B12 into AdoCby) or by providing another route to salvaging pseudo-B12. This hypothesis would explain why bacteria would have one or the other while Archaea could carry both (Figure&#8201;<figr fid="F6">6</figr>B), because archaeal CbiZ proteins have been predicted to lack pseudo-B12 cleavage activity <abbrgrp>
						<abbr bid="B44">44</abbr>
					</abbrgrp>.</p>
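				<p>The co-distribution reasoning above can be made concrete with a small presence/absence tabulation. The Python sketch below is only an illustration under stated assumptions: the genome names and family sets are invented, and the function name is hypothetical. It is not the SEED-derived analysis behind the distribution figure.</p>

```python
def summarize_codistribution(profiles):
    """Classify CbiB-encoding genomes by which partner families they carry.

    `profiles` maps a genome name to the set of protein families present.
    """
    summary = {"both": [], "cbiZ_only": [], "duf71_b12_only": [], "neither": []}
    for genome, families in sorted(profiles.items()):
        if "CbiB" not in families:
            continue  # only genomes encoding CbiB are considered
        has_cbiz = "CbiZ" in families
        has_duf = "DUF71-B12" in families
        if has_cbiz and has_duf:
            summary["both"].append(genome)
        elif has_cbiz:
            summary["cbiZ_only"].append(genome)
        elif has_duf:
            summary["duf71_b12_only"].append(genome)
        else:
            summary["neither"].append(genome)
    return summary


# Invented presence/absence profiles (assumption, not data from the paper).
toy_bacteria = {
    "bacterium_1": {"CbiB", "CbiZ"},
    "bacterium_2": {"CbiB", "DUF71-B12"},
    "bacterium_3": {"CbiZ"},
}
toy_archaea = {
    "archaeon_1": {"CbiB", "CbiZ", "DUF71-B12"},
    "archaeon_2": {"CbiB", "CbiZ"},
}

bacteria_summary = summarize_codistribution(toy_bacteria)
archaea_summary = summarize_codistribution(toy_archaea)
print("bacteria carrying both:", bacteria_summary["both"])
print("archaea carrying both:", archaea_summary["both"])
```

				<p>Under this scheme, a strict anti-correlation corresponds to an empty "both" list, and the exceptions to the rule that CbiB co-occurs with one of the two families appear in the "neither" list.</p>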
				<fig id="F6"><title><p>Figure 6</p></title><caption><p>Distribution of DUF71-B12, CbiZ and CbiB in bacterial (A) and archaeal genomes (B).</p></caption><text>
   <p><b>Distribution of DUF71-B12, CbiZ and CbiB in bacterial (A) and archaeal genomes (B). </b>The trees are species trees constructed in iTOL (itol.embl.de/); the presence or absence of the specific genes was derived from the &#8220;DUF71-B12&#8221; subsystem in the SEED database.</p>
</text><graphic file="1745-6150-7-32-6"/></fig><p>Detailed analysis of the signature motifs of the two subfamilies reveals that the strictly conserved EGGE/DXE<sub>188</sub> motif (<it>P</it>. <it>furiosus</it> PF0828 numbering) in Dph6 proteins is replaced by an ENGEF/YH<sub>188</sub> motif in the DUF71-B12 proteins (Additional file <supplr sid="S3">3</supplr>: Figure S2 and Additional file <supplr sid="S3">3</supplr>: Figure S4). In the Dph6 family, E188 is located near the predicted diphthine binding site and is predicted to be involved in catalysis (Figure&#8201;<figr fid="F3">3</figr>B). The replacement of the strictly conserved E188 residue by a histidine residue strongly suggests a change in the reaction catalyzed by the DUF71-B12 subfamily compared to the Dph6 family. The structure-based comparison between the two subfamilies also strongly supports the hypothesis that their substrates are different, because all residues predicted to be involved in EF-2 binding (Figure&#8201;<figr fid="F3">3</figr>B; see section above) are different in the DUF71-B12 subfamily but mostly conserved within this subfamily (Additional file <supplr sid="S3">3</supplr>: Figure S2 and residues in green in Figure&#8201;<figr fid="F3">3</figr>C). Residues that are conserved between the two DUF71 subfamilies (Additional file <supplr sid="S3">3</supplr>: Figure S2 and residues in blue in Figure&#8201;<figr fid="F3">3</figr>C) are found around the phosphate groups of ATP, including S<sub>12</sub>, G<sub>13</sub>, G<sub>14</sub>, K<sub>15</sub>, D<sub>16</sub>, H<sub>48</sub>, and T<sub>189</sub> (PF0828 sequence numbering), or belong to the C-terminal conserved sequence motif (EGGE/D-X-E<sub>188</sub>), such as G<sub>182</sub>, G<sub>184</sub>, G<sub>185</sub>, E<sub>186</sub>, F<sub>187</sub> (Additional file <supplr sid="S3">3</supplr>: Figure S2 and Figure&#8201;<figr fid="F3">3</figr>C). 
Further experimental studies will be required to determine whether DUF71-B12 proteins are Ado-pseudo-B12 amidohydrolases or have another role in Ado-pseudo-B12 salvage.</p>
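			<p>As a rough illustration of how the two signature motifs could be used to sort sequences, the Python sketch below encodes them as regular expressions. The patterns are our reading of the motif notation (EGGE/DXE taken as E-G-G-(E or D)-X-E, and ENGEF/YH as E-N-G-E-(F or Y)-H). The test sequences are invented and the function name is hypothetical, so this is a sketch under those assumptions rather than the alignment-based comparison reported here.</p>

```python
import re

# Patterns transcribed from the motifs quoted in the text (an interpretation):
#   EGGE/DXE  read as  E-G-G-(E or D)-any residue-E
#   ENGEF/YH  read as  E-N-G-E-(F or Y)-H
DPH6_MOTIF = re.compile(r"EGG[ED].E")
B12_MOTIF = re.compile(r"ENGE[FY]H")


def classify_by_motif(sequence):
    """Label a protein sequence by which signature motif it contains."""
    has_dph6 = DPH6_MOTIF.search(sequence) is not None
    has_b12 = B12_MOTIF.search(sequence) is not None
    if has_dph6 and has_b12:
        return "ambiguous"
    if has_dph6:
        return "Dph6-like"
    if has_b12:
        return "DUF71-B12-like"
    return "unassigned"


# Invented peptide fragments, not real DUF71 sequences.
toy_sequences = {
    "seq_1": "MKAAEGGEFETLV",
    "seq_2": "MKAAENGEYHTLV",
    "seq_3": "MKAAAGGAFATLV",
}

for name, seq in sorted(toy_sequences.items()):
    print(name, classify_by_motif(seq))
```

			<p>A motif match alone says nothing about position within the protein. A real classification would also check that the motif sits at the aligned C-terminal site and would be cross-checked against the phylogeny.</p>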
			</sec>
		</sec>
		<sec>
			<st>
				<p>Conclusions</p>
			</st><p>Our detailed analyses of the DUF71 family members presented here provide an example of the power of comparative genomic approaches for solving important &#8220;missing genes&#8221; or &#8220;missing function&#8221; cases. These analyses simultaneously illustrate the difficulties inherent in accurately annotating gene families. On the one hand, the evidence identifying a candidate for the missing Dph6 gene family, derived from genomic data (mainly phylogenetic distribution and gene fusions) and post-genomic data (structure, co-expression analysis and genome-wide phenotype experiments), is so strong that it could be used as an example where the functional annotation of a protein of unknown function could be derived from comparative genomics alone. On the other hand, our analyses show that a subgroup of the DUF71 family is almost certainly involved in a metabolic pathway unrelated to diphthamide synthesis and that transferring functional annotations from homology scores alone would be inappropriate in this case. We believe that this integrated functional annotation approach will play an important role in future pipelines for annotation of protein families.</p>
		</sec>
		<sec>
			<st>
				<p>Competing interests</p>
			</st><p>The author(s) declare that they have no competing interests.</p>
		</sec>
		<sec>
			<st>
				<p>Authors&#8217; contributions</p>
			</st><p>VdC-L conducted the comparative genomic analysis and made the functional predictions. CB-A performed the phylogenetic analysis. FF, LT and JFH did the structural analysis. All authors participated in writing/reviewing the manuscript. All authors read and approved the final manuscript.</p>
		</sec>
		<sec>
			<st>
				<p>Reviewers&#8217; comments</p>
			</st><p>
				<b>Reviewer number 1: Arcady Mushegian</b>
			</p><p>
				<b>Stowers Institute for Medical Research, 1000 E 50</b>
				<sup>
					<b>th</b>
				</sup> <b>Street, Kansas City, Missouri 64110</b>
			</p><p>The study by de Crecy-Lagard and co-authors pinpoints the DUF71/COG2102 as the most likely archaeal/eukaryotic ATP-dependent diphthine-ammonia ligase, the so far unaccounted-for enzyme in the pathway of diphthamide biosynthesis, a pathway responsible for the formation of a unique derivative of the conserved histidine within the translation elongation factor 2. A distinct subfamily of this protein family appears to play another role in bacteria and a subset of archaea, most likely in the salvage of an intermediate of cobalamin biosynthesis. The evidence presented in the paper consists of genome context information, sequence-structure prediction and the data from yeast concerning gene expression and chemical-genomics profiling. Taken together, the evidence seems compelling to me. The data from yeast represent partial functional validation of predictions made for prokaryotes. I would recommend only to tone down the suggestion that all this is a &#8220;novel paradigm&#8221; in analysis of gene function: researchers have been inferring gene functions from phenotypes, as well as from directly detected changes in genotype, for a long, long time, and the current study is a logical extension of these approaches. What is different in the last 15&#8201;years is that we can compare these properties across many species with completely sequenced genomes; but even this is a logical extension of the previous work (compare, for example, with work from Yanofsky and Jensen labs on biosynthesis of aromatic amino acids) - it was not any prescription of a previous scientific paradigm that constrained the work, but rather the lack of the data.</p><p>Response: <it>The references to a &#8220;novel paradigm&#8221; were eliminated in the abstract and the introduction as suggested.</it>
			</p><p>
				<b>Reviewer number 2: Michael Galperin</b>
			</p><p>
				<b>NCBI, NLM, NIH Computational Biology Branch, 8600 Rockville Pike MSC 6075, Building 38A, Room 6N601, Bethesda, MD 20894-6075</b>
			</p><p>The paper by de Crecy-Lagard and colleagues is a fine example of using comparative genomics to patch the remaining holes in the metabolic pathways. The key conclusion of this work, prediction of the participation of the members of the DUF71/COG2102 family in diphthamide biosynthesis in archaea and eukaryotes and in B12 metabolism in some bacteria and archaea, is extremely convincing and hardly even needs an experimental verification. The second conclusion, that ammonia used in the diphthine ammonia lyase-catalyzed reaction in different organisms could be generated by two different enzymes, asparagine synthetase and the RidA domain, also sounds convincing. However, proving beyond reasonable doubt that DUF71/COG2102 family members with their ATP-pyrophosphatase activity comprise the key part of diphthine ammonia lyase does not prove that they are the only subunits of this enzyme. Even if the proposed reaction scheme (Figure&#8201;<figr fid="F1">1</figr>B) is correct, there still might be a need for a ligase subunit that couples removal of the AMP moiety from EF2 with its amidation. There is a definite possibility that DUF71/COG2102 family members catalyze all these individual reactions, e.g. using their unique C-terminal 100-aa domain, but that would have to be proven experimentally. The reported involvement of the likely scaffold protein YBR246w (DPH7) appears to support the idea that diphthine ammonia lyase consists of more than one type of subunit. Otherwise, it is a great paper that vividly demonstrates the power of comparative-genomics approaches.</p><p>We added the phrase &#8220;even if one cannot rule out at this stage that other catalytic subunits yet to be identified may also be required&#8221;.</p><p>
				<b>Reviewer number 3: L. Aravind</b>
			</p><p>
				<b>NCBI, NLM, NIH Computational Biology Branch, 8600 Rockville Pike MSC 6075, Building 38A, Room 6N601, Bethesda, MD 20894-6075</b>
			</p><p>This work uses contextual information to identify the diphthine-ammonia ligase in archaea and eukaryotes. It also shows that the correct ligase is not the yeast protein YBR246W, but rather the MJ0570-like PP-loop ATPases. The authors also show that this family has been transferred to certain bacteria where they infer that it is likely to have undergone a functional shift to participate in B12 salvage. They cautiously propose that it might function as a replacement for CbiZ to function as an amidohydrolase (the reverse of the typical PP-loop ATPase reaction) as against a ligase. The conclusions are definitive and the article makes a useful contribution to the understanding of protein modification and cofactor biosynthesis. This said, there are certain issues with the current form of the article that authors necessarily need to address in their revision: 1) (pg 8) The authors state that the MJ0570-like enzymes have a HUP domain followed by a distinct C-terminal domain. They do not explain the meaning of this properly nor cite the reference of the paper (PMID: 12012333) pertaining to the HUP domains where this family was identified as a PP-loop ATPase, along with the observations (Table&#8201;<tblr tid="T1">1</tblr> in that reference) that it has a primarily archaeo-eukaryotic phyletic pattern, and that eukaryotic versions might be fused to two C-terminal domains of the YabJ-like chorismate lyase fold (now termed RidA). It should be stated that the N-terminus is a PP-loop ATPase domain of the HUP class of Rossmannoid domains - not all HUP domains are ligases - only the PP-loop and the HIGH nucleotidyltransferases. This clarifies that it is related to other ATP-utilizing amidoligases such as NAD synthetase, GMP synthetase and asparagine synthetase. This would place their inferred amidoligase activity in the context of comparable, known amidoligase activities of related enzymes. 
In fact, it would be advisable to place the fact that these are PP-loop enzymes in the abstract itself.</p><p>The following sentence was added: &#8220;This family had previously been described as a PP-loop ATPase of unknown function containing a Rossmannoid class HUP domain (Aravind et al. 2002).&#8221; A reference to the PP-loop ATPase family was added in the abstract as requested. A reference to the same work was added when talking about the RidA fusion. For the phylogenetic distribution, the results presented here are a bit different from the previous study because many more genomes are available after 10&#8201;years and we show that the family is also bacterial.</p><p>2) The authors persistently refer to the domain as DUF71. This name is no longer current in Pfam and it has long been recognized, as mentioned in the reference noted above, that these proteins are not &#8220;domains of unknown function&#8221; but PP-loop ATPases. The domain is correctly termed ATP_bind_4 (PF01902) in Pfam. This Pfam (not the misleading DUF71) name and Pfam number should be indicated with just a statement in the introduction that it was formerly DUF71.</p><p>This domain is currently called &#8220;Domain of unknown function DUF71, ATP-binding domain&#8221; in the InterPro database (IPR002761) even if it is called ATP_bind_4 (PF01902) in Pfam. It is much shorter to use (as well as easier for the reader to follow) the DUF71 abbreviation rather than the ATP_bind_4 abbreviation. We therefore prefer to keep DUF71. We have, however, introduced a statement giving the different names of this domain in the InterPro, Pfam and COG databases at the end of the introduction.</p><p>
				<it>3) The authors apparently have a misapprehension regarding the Methanohalophilus mahii protein both in the text and the domain architecture rendered in the figure. First, these proteins have two N-terminal domains fused to the MJ0570-like module: namely an N-terminal class-II glutamine amidotransferase (GAT-II, e.g. see PMID: 20023723) and a second PP-loop ATPase domain thereafter (i.e. one related to asparagine synthetase). This GAT domain as in the case of other PP-loop enzymes could supply ammonia by cleaving it off glutamine. But this does not explain which PP-loop domain utilizes it. In the case of the Asn-synthetase it is used by the cognate PP-loop domain. In this case the presence of two PP-loop domains suggests that it is either utilized by both for different reactions or else the second domain does not receive the NH3 from this GAT. This also leads to the question what reaction is the Asn synthetase-like PP-loop domain catalyzing?</it> </p><p>Quality of written English: Acceptable</p><p>The source of the confusion came from the fact that the Asn Synthase domain (AsnB) contains two domains: the GAT-II domain and the Asn_Synthase_B_C PP-Loop ATPase domain. Both the figure and the text were modified to avoid the confusion. Based on the reviewer&#8217;s comments, the sentence discussing the potential role of the AsnB domain was modified as follows: &#8220;This domain organization strongly suggests that in this subset of enzymes, the hydrolysis of glutamine catalyzed by the fused GAT-II domain could provide the NH<sub>3</sub> moiety to both the DUF71 and the Asn_Synthase_B_C enzymes.&#8221;</p><p>4) Based on phyletic complementarity the authors suggest that bacterial CbiZ might be displaced by the bacterial MJ0570-like enzymes. This seems unusual - Why utilize a PP-loop ATPase for the reverse reaction, i.e. amidohydrolase? Typically there is little overlap between the families involved in amidohydrolase as opposed to ATP-dependent ligase activity. Of the almost 12 distinct major inventions of amidoligase activity, hardly any representatives of these superfamilies have been reused as amidohydrolases. So do the authors note anything special in the case of the bacterial representatives that might support such a functional shift?</p><p>This hypothesis is derived from phylogenetic distribution and it is not unprecedented that ligases and hydrolases are found in the same family (see example in PMID:12359880). However, we agree that this hypothesis derives mainly from phylogenetic pattern analysis and, beyond the differences in the predicted substrate binding pocket found in the DUF71-B12 family, we did not identify specific changes that could point to a shift to hydrolase, hence our caution in our prediction as stated in the text.</p><p>Quality of written English: Acceptable</p>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st><p>This work was supported by the US National Science Foundation (grant MCB-1153413 to V. dC-L), the US National Institutes of Health (grant U54GM094597 to G.T. Montelione and the Northeast Structural Genomics Consortium) and the Agence Nationale pour la Recherche (grant ANR-10-BINF-01-0127 Ancestrome to C. B-A). We thank Raffael Schaffrath and Mike Stark for sharing unpublished diphthamide-related data and for critical evaluation of parts of the manuscript. We thank Jorge Escalante-Semerena for sharing his immense knowledge on B12 salvage pathways, Diana Downs for disclosing unpublished results on RidA function, Manal Swairjo for chemical insight, and Andrew Hanson for helpful input on the manuscript.</p>
			</sec>
		</ack>
		<refgrp><bibl id="B1"><title><p>Unique modifications of translation elongation factors</p></title><aug><au><snm>Greganova</snm><fnm>E</fnm></au><au><snm>Altmann</snm><fnm>M</fnm></au><au><snm>B&#252;tikofer</snm><fnm>P</fnm></au></aug><source>FEBS J</source><pubdate>2011</pubdate><volume>278</volume><issue>15</issue><fpage>2613</fpage><lpage>2624</lpage></bibl><bibl id="B2"><title><p>ADP-ribosylation of elongation factor 2 by diphtheria toxin. Isolation and properties of the novel ribosyl-amino acid and its hydrolysis products</p></title><aug><au><snm>Van Ness</snm><fnm>BG</fnm></au><au><snm>Howard</snm><fnm>JB</fnm></au><au><snm>Bodley</snm><fnm>JW</fnm></au></aug><source>J Biol Chem</source><pubdate>1980</pubdate><volume>255</volume><issue>22</issue><fpage>10717</fpage><lpage>10720</lpage></bibl><bibl id="B3"><title><p>ADP-ribosylation of elongation factor 2 by diphtheria toxin. NMR spectra and proposed structures of ribosyl-diphthamide and its hydrolysis products</p></title><aug><au><snm>Van Ness</snm><fnm>BG</fnm></au><au><snm>Howard</snm><fnm>JB</fnm></au><au><snm>Bodley</snm><fnm>JW</fnm></au></aug><source>J Biol Chem</source><pubdate>1980</pubdate><volume>255</volume><issue>22</issue><fpage>10710</fpage><lpage>10716</lpage></bibl><bibl id="B4"><title><p>Diphthamide biosynthesis requires an organic radical generated by an iron&#8211;sulphur enzyme</p></title><aug><au><snm>Zhang</snm><fnm>Y</fnm></au><au><snm>Zhu</snm><fnm>X</fnm></au><au><snm>Torelli</snm><fnm>AT</fnm></au><au><snm>Lee</snm><fnm>M</fnm></au><au><snm>Dzikovski</snm><fnm>B</fnm></au><au><snm>Koralewski</snm><fnm>RM</fnm></au><au><snm>Wang</snm><fnm>E</fnm></au><au><snm>Freed</snm><fnm>J</fnm></au><au><snm>Krebs</snm><fnm>C</fnm></au><au><snm>Ealick</snm><fnm>SE</fnm></au><etal/></aug><source>Nature</source><pubdate>2010</pubdate><volume>465</volume><issue>7300</issue><fpage>891</fpage><lpage>896</lpage></bibl><bibl id="B5"><title><p>Mechanistic understanding of Pyrococcus horikoshii Dph2, a 
[4Fe-4S] enzyme required for diphthamide biosynthesis</p></title><aug><au><snm>Zhu</snm><fnm>X</fnm></au><au><snm>Dzikovski</snm><fnm>B</fnm></au><au><snm>Su</snm><fnm>X</fnm></au><au><snm>Torelli</snm><fnm>AT</fnm></au><au><snm>Zhang</snm><fnm>Y</fnm></au><au><snm>Ealick</snm><fnm>SE</fnm></au><au><snm>Freed</snm><fnm>JH</fnm></au><au><snm>Lin</snm><fnm>H</fnm></au></aug><source>Mol Biosyst</source><pubdate>2011</pubdate><volume>7</volume><issue>1</issue><fpage>74</fpage><lpage>81</lpage></bibl><bibl id="B6"><title><p>PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating</p></title><aug><au><snm>Lartillot</snm><fnm>N</fnm></au><au><snm>Lepage</snm><fnm>T</fnm></au><au><snm>Blanquart</snm><fnm>S</fnm></au></aug><source>Bioinformatics</source><pubdate>2009</pubdate><volume>25</volume><issue>17</issue><fpage>2286</fpage><lpage>2288</lpage></bibl><bibl id="B7"><title><p>XtalView/Xfit&#8212;A versatile program for manipulating atomic coordinates and electron density</p></title><aug><au><snm>McRee</snm><fnm>DE</fnm></au></aug><source>J Struct Biol</source><pubdate>1999</pubdate><volume>125</volume><issue>2&#8211;3</issue><fpage>156</fpage><lpage>165</lpage></bibl><bibl id="B8"><title><p>MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform</p></title><aug><au><snm>Katoh</snm><fnm>K</fnm></au><au><snm>Misawa</snm><fnm>K</fnm></au><au><snm>Kuma</snm><fnm>K-I</fnm></au><au><snm>Miyata</snm><fnm>T</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2002</pubdate><volume>30</volume><issue>14</issue><fpage>3059</fpage><lpage>3066</lpage></bibl><bibl id="B9"><title><p>Diphthamide modification of eEF2 requires a J-domain protein and is essential for normal 
development</p></title><aug><au><snm>Webb</snm><fnm>TR</fnm></au><au><snm>Cross</snm><fnm>SH</fnm></au><au><snm>McKie</snm><fnm>L</fnm></au><au><snm>Edgar</snm><fnm>R</fnm></au><au><snm>Vizor</snm><fnm>L</fnm></au><au><snm>Harrison</snm><fnm>J</fnm></au><au><snm>Peters</snm><fnm>J</fnm></au><au><snm>Jackson</snm><fnm>IJ</fnm></au></aug><source>J Cell Sci</source><pubdate>2008</pubdate><volume>121</volume><issue>19</issue><fpage>3140</fpage><lpage>3145</lpage></bibl><bibl id="B10"><title><p>Reconstitution of diphthine synthase activity in vitro</p></title><aug><au><snm>Zhu</snm><fnm>X</fnm></au><au><snm>Kim</snm><fnm>J</fnm></au><au><snm>Su</snm><fnm>X</fnm></au><au><snm>Lin</snm><fnm>H</fnm></au></aug><source>Biochemistry</source><pubdate>2010</pubdate><volume>49</volume><issue>44</issue><fpage>9649</fpage><lpage>9657</lpage></bibl><bibl id="B11"><title><p>DPH5, a methyltransferase gene required for diphthamide biosynthesis in Saccharomyces cerevisiae</p></title><aug><au><snm>Mattheakis</snm><fnm>LC</fnm></au><au><snm>Shen</snm><fnm>WH</fnm></au><au><snm>Collier</snm><fnm>RJ</fnm></au></aug><source>Mol Cell Biol</source><pubdate>1992</pubdate><volume>12</volume><issue>9</issue><fpage>4026</fpage><lpage>4037</lpage></bibl><bibl id="B12"><title><p>In vitro biosynthesis of diphthamide, studied with mutant Chinese hamster ovary cells resistant to diphtheria toxin</p></title><aug><au><snm>Moehring</snm><fnm>TJ</fnm></au><au><snm>Danley</snm><fnm>DE</fnm></au><au><snm>Moehring</snm><fnm>JM</fnm></au></aug><source>Mol Cell Biol</source><pubdate>1984</pubdate><volume>4</volume><issue>4</issue><fpage>642</fpage><lpage>650</lpage></bibl><bibl id="B13"><title><p>YBR246W is required for the third step of diphthamide biosynthesis</p></title><aug><au><snm>Su</snm><fnm>X</fnm></au><au><snm>Chen</snm><fnm>W</fnm></au><au><snm>Lee</snm><fnm>W</fnm></au><au><snm>Jiang</snm><fnm>H</fnm></au><au><snm>Zhang</snm><fnm>S</fnm></au><au><snm>Lin</snm><fnm>H</fnm></au></aug><source>J Am 
Chem Soc</source><pubdate>2011</pubdate><volume>134</volume><issue>2</issue><fpage>773</fpage><lpage>776</lpage></bibl><bibl id="B14"><title><p>InterPro in 2011: new developments in the family and domain prediction database</p></title><aug><au><snm>Hunter</snm><fnm>S</fnm></au><au><snm>Jones</snm><fnm>P</fnm></au><au><snm>Mitchell</snm><fnm>A</fnm></au><au><snm>Apweiler</snm><fnm>R</fnm></au><au><snm>Attwood</snm><fnm>TK</fnm></au><au><snm>Bateman</snm><fnm>A</fnm></au><au><snm>Bernard</snm><fnm>T</fnm></au><au><snm>Binns</snm><fnm>D</fnm></au><au><snm>Bork</snm><fnm>P</fnm></au><au><snm>Burge</snm><fnm>S</fnm></au><etal/></aug><source>Nucleic Acids Res</source><pubdate>2012</pubdate><volume>40</volume><issue>D1</issue><fpage>D306</fpage><lpage>D312</lpage></bibl><bibl id="B15"><title><p>The Pfam protein families database</p></title><aug><au><snm>Finn</snm><fnm>RD</fnm></au><au><snm>Mistry</snm><fnm>J</fnm></au><au><snm>Tate</snm><fnm>J</fnm></au><au><snm>Coggill</snm><fnm>P</fnm></au><au><snm>Heger</snm><fnm>A</fnm></au><au><snm>Pollington</snm><fnm>JE</fnm></au><au><snm>Gavin</snm><fnm>OL</fnm></au><au><snm>Gunasekaran</snm><fnm>P</fnm></au><au><snm>Ceric</snm><fnm>G</fnm></au><au><snm>Forslund</snm><fnm>K</fnm></au><etal/></aug><source>Nucleic Acids Res</source><pubdate>2010</pubdate><volume>38</volume><issue>suppl_1</issue><fpage>D211</fpage><lpage>D222</lpage></bibl><bibl id="B16"><title><p>The COG database: an updated version includes eukaryotes</p></title><aug><au><snm>Tatusov</snm><fnm>R</fnm></au><au><snm>Fedorova</snm><fnm>N</fnm></au><au><snm>Jackson</snm><fnm>J</fnm></au><au><snm>Jacobs</snm><fnm>A</fnm></au><au><snm>Kiryutin</snm><fnm>B</fnm></au><au><snm>Koonin</snm><fnm>E</fnm></au><au><snm>Krylov</snm><fnm>D</fnm></au><au><snm>Mazumder</snm><fnm>R</fnm></au><au><snm>Mekhedov</snm><fnm>S</fnm></au><au><snm>Nikolskaya</snm><fnm>A</fnm></au><etal/></aug><source>BMC 
Bioinforma</source><pubdate>2003</pubdate><volume>4</volume><issue>1</issue><fpage>41</fpage></bibl><bibl id="B17"><title><p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</p></title><aug><au><snm>Altschul</snm><fnm>SF</fnm></au><au><snm>Madden</snm><fnm>TL</fnm></au><au><snm>Schaffer</snm><fnm>AA</fnm></au><au><snm>Zhang</snm><fnm>J</fnm></au><au><snm>Zhang</snm><fnm>Z</fnm></au><au><snm>Miller</snm><fnm>W</fnm></au><au><snm>Lipman</snm><fnm>DJ</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>1997</pubdate><volume>25</volume><issue>17</issue><fpage>3389</fpage><lpage>3402</lpage></bibl><bibl id="B18"><title><p>Clustal W and Clustal X version 2.0</p></title><aug><au><snm>Larkin</snm><fnm>MA</fnm></au><au><snm>Blackshields</snm><fnm>G</fnm></au><au><snm>Brown</snm><fnm>NP</fnm></au><au><snm>Chenna</snm><fnm>R</fnm></au><au><snm>McGettigan</snm><fnm>PA</fnm></au><au><snm>McWilliam</snm><fnm>H</fnm></au><au><snm>Valentin</snm><fnm>F</fnm></au><au><snm>Wallace</snm><fnm>IM</fnm></au><au><snm>Wilm</snm><fnm>A</fnm></au><au><snm>Lopez</snm><fnm>R</fnm></au><etal/></aug><source>Bioinformatics</source><pubdate>2007</pubdate><volume>23</volume><issue>21</issue><fpage>2947</fpage><lpage>2948</lpage></bibl><bibl id="B19"><title><p>Multiple sequence alignment with hierarchical clustering</p></title><aug><au><snm>Corpet</snm><fnm>F</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>1988</pubdate><volume>16</volume><issue>22</issue><fpage>10881</fpage><lpage>10890</lpage></bibl><bibl id="B20"><title><p>The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes</p></title><aug><au><snm>Overbeek</snm><fnm>R</fnm></au><au><snm>Begley</snm><fnm>T</fnm></au><au><snm>Butler</snm><fnm>RM</fnm></au><au><snm>Choudhuri</snm><fnm>JV</fnm></au><au><snm>Chuang</snm><fnm>HY</fnm></au><au><snm>Cohoon</snm><fnm>M</fnm></au><au><snm>de 
Cr&#233;cy-Lagard</snm><fnm>V</fnm></au><au><snm>Diaz</snm><fnm>N</fnm></au><au><snm>Disz</snm><fnm>T</fnm></au><au><snm>Edwards</snm><fnm>R</fnm></au><etal/></aug><source>Nucleic Acids Res</source><pubdate>2005</pubdate><volume>33</volume><issue>17</issue><fpage>5691</fpage><lpage>5702</lpage></bibl><bibl id="B21"><title><p>The integrated microbial genomes system: an expanding comparative analysis resource</p></title><aug><au><snm>Markowitz</snm><fnm>VM</fnm></au><au><snm>Chen</snm><fnm>I-MA</fnm></au><au><snm>Palaniappan</snm><fnm>K</fnm></au><au><snm>Chu</snm><fnm>K</fnm></au><au><snm>Szeto</snm><fnm>E</fnm></au><au><snm>Grechkin</snm><fnm>Y</fnm></au><au><snm>Ratner</snm><fnm>A</fnm></au><au><snm>Anderson</snm><fnm>I</fnm></au><au><snm>Lykidis</snm><fnm>A</fnm></au><au><snm>Mavromatis</snm><fnm>K</fnm></au><etal/></aug><source>Nucleic Acids Res</source><pubdate>2010</pubdate><volume>38</volume><issue>suppl 1</issue><fpage>D382</fpage><lpage>D390</lpage></bibl><bibl id="B22"><title><p>The MicrobesOnline web site for comparative genomics</p></title><aug><au><snm>Alm</snm><fnm>EJ</fnm></au><au><snm>Huang</snm><fnm>KH</fnm></au><au><snm>Price</snm><fnm>MN</fnm></au><au><snm>Koche</snm><fnm>RP</fnm></au><au><snm>Keller</snm><fnm>K</fnm></au><au><snm>Dubchak</snm><fnm>IL</fnm></au><au><snm>Arkin</snm><fnm>AP</fnm></au></aug><source>Genome Res</source><pubdate>2005</pubdate><volume>15</volume><issue>7</issue><fpage>1015</fpage><lpage>1022</lpage></bibl><bibl id="B23"><title><p>Exploring the functional landscape of gene expression: directed search of large microarray compendia</p></title><aug><au><snm>Hibbs</snm><fnm>MA</fnm></au><au><snm>Hess</snm><fnm>DC</fnm></au><au><snm>Myers</snm><fnm>CL</fnm></au><au><snm>Huttenhower</snm><fnm>C</fnm></au><au><snm>Li</snm><fnm>K</fnm></au><au><snm>Troyanskaya</snm><fnm>OG</fnm></au></aug><source>Bioinformatics</source><pubdate>2007</pubdate><volume>23</volume><issue>20</issue><fpage>2692</fpage><lpage>2699</lpage></bibl><bibl 
id="B24"><title><p>Saccharomyces genome database: the genomics resource of budding yeast</p></title><aug><au><snm>Cherry</snm><fnm>JM</fnm></au><au><snm>Hong</snm><fnm>EL</fnm></au><au><snm>Amundsen</snm><fnm>C</fnm></au><au><snm>Balakrishnan</snm><fnm>R</fnm></au><au><snm>Binkley</snm><fnm>G</fnm></au><au><snm>Chan</snm><fnm>ET</fnm></au><au><snm>Christie</snm><fnm>KR</fnm></au><au><snm>Costanzo</snm><fnm>MC</fnm></au><au><snm>Dwight</snm><fnm>SS</fnm></au><au><snm>Engel</snm><fnm>SR</fnm></au><etal/></aug><source>Nucleic Acids Res</source><pubdate>2012</pubdate><volume>40</volume><issue>D1</issue><fpage>D700</fpage><lpage>D705</lpage></bibl><bibl id="B25"><title><p>Systematic analysis of genome-wide fitness data in yeast reveals novel gene function and drug action</p></title><aug><au><snm>Hillenmeyer</snm><fnm>M</fnm></au><au><snm>Ericson</snm><fnm>E</fnm></au><au><snm>Davis</snm><fnm>R</fnm></au><au><snm>Nislow</snm><fnm>C</fnm></au><au><snm>Koller</snm><fnm>D</fnm></au><au><snm>Giaever</snm><fnm>G</fnm></au></aug><source>Genome Biol</source><pubdate>2010</pubdate><volume>11</volume><issue>3</issue><fpage>R30</fpage></bibl><bibl id="B26"><title><p>The chemical genomic portrait of yeast: uncovering a phenotype for all genes</p></title><aug><au><snm>Hillenmeyer</snm><fnm>ME</fnm></au><au><snm>Fung</snm><fnm>E</fnm></au><au><snm>Wildenhain</snm><fnm>J</fnm></au><au><snm>Pierce</snm><fnm>SE</fnm></au><au><snm>Hoon</snm><fnm>S</fnm></au><au><snm>Lee</snm><fnm>W</fnm></au><au><snm>Proctor</snm><fnm>M</fnm></au><au><snm>St.Onge</snm><fnm>RP</fnm></au><au><snm>Tyers</snm><fnm>M</fnm></au><au><snm>Koller</snm><fnm>D</fnm></au><etal/></aug><source>Science</source><pubdate>2008</pubdate><volume>320</volume><issue>5874</issue><fpage>362</fpage><lpage>365</lpage></bibl><bibl id="B27"><title><p>Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and 
annotation</p></title><aug><au><snm>Letunic</snm><fnm>I</fnm></au><au><snm>Bork</snm><fnm>P</fnm></au></aug><source>Bioinformatics</source><pubdate>2007</pubdate><volume>23</volume><issue>1</issue><fpage>127</fpage><lpage>128</lpage></bibl><bibl id="B28"><title><p>WebLogo: a sequence logo generator</p></title><aug><au><snm>Crooks</snm><fnm>GE</fnm></au><au><snm>Hon</snm><fnm>G</fnm></au><au><snm>Chandonia</snm><fnm>J-M</fnm></au><au><snm>Brenner</snm><fnm>SE</fnm></au></aug><source>Genome Res</source><pubdate>2004</pubdate><volume>14</volume><issue>6</issue><fpage>1188</fpage><lpage>1190</lpage></bibl><bibl id="B29"><title><p>NCBI Reference sequences (RefSeq): current status, new features and genome annotation policy</p></title><aug><au><snm>Pruitt</snm><fnm>KD</fnm></au><au><snm>Tatusova</snm><fnm>T</fnm></au><au><snm>Brown</snm><fnm>GR</fnm></au><au><snm>Maglott</snm><fnm>DR</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2012</pubdate><volume>40</volume><issue>D1</issue><fpage>D130</fpage><lpage>D135</lpage></bibl><bibl id="B30"><title><p>MUST, a computer package of management utilities for sequences and trees</p></title><aug><au><snm>Philippe</snm><fnm>H</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>1993</pubdate><volume>21</volume><issue>22</issue><fpage>5264</fpage><lpage>5272</lpage></bibl><bibl id="B31"><title><p>SeaView Version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building</p></title><aug><au><snm>Gouy</snm><fnm>M</fnm></au><au><snm>Guindon</snm><fnm>S</fnm></au><au><snm>Gascuel</snm><fnm>O</fnm></au></aug><source>Mol Biol Evol</source><pubdate>2010</pubdate><volume>27</volume><issue>2</issue><fpage>221</fpage><lpage>224</lpage></bibl><bibl id="B32"><title><p>Monophyly of class I aminoacyl tRNA synthetase, USPA, ETFP, photolyase, and PP-ATPase nucleotide-binding domains: implications for protein evolution in the RNA 
world</p></title><aug><au><snm>Aravind</snm><fnm>L</fnm></au><au><snm>Anantharaman</snm><fnm>V</fnm></au><au><snm>Koonin</snm><fnm>EV</fnm></au></aug><source>Proteins: Structure, Funct Bioinform</source><pubdate>2002</pubdate><volume>48</volume><issue>1</issue><fpage>1</fpage><lpage>14</lpage></bibl><bibl id="B33"><title><p>A large conformational change in the putative ATP pyrophosphatase PF0828 induced by ATP binding</p></title><aug><au><snm>Forouhar</snm><fnm>F</fnm></au><au><snm>Saadat</snm><fnm>N</fnm></au><au><snm>Hussain</snm><fnm>M</fnm></au><au><snm>Seetharaman</snm><fnm>J</fnm></au><au><snm>Lee</snm><fnm>I</fnm></au><au><snm>Janjua</snm><fnm>H</fnm></au><au><snm>Xiao</snm><fnm>R</fnm></au><au><snm>Shastry</snm><fnm>R</fnm></au><au><snm>Acton</snm><fnm>TB</fnm></au><au><snm>Montelione</snm><fnm>GT</fnm></au><etal/></aug><source>Acta Crystallogr Sect F Struct Biol Cryst Commun</source><pubdate>2011</pubdate><volume>67</volume><issue>11</issue><fpage>1323</fpage><lpage>1327</lpage></bibl><bibl id="B34"><title><p>A chemical genomic screen in Saccharomyces cerevisiae reveals a role for diphthamidation of translation Elongation Factor 2 in inhibition of protein synthesis by Sordarin</p></title><aug><au><snm>Botet</snm><fnm>J</fnm></au><au><snm>Rodriguez-Mateos</snm><fnm>M</fnm></au><au><snm>Ballesta</snm><fnm>JPG</fnm></au><au><snm>Revuelta</snm><fnm>JL</fnm></au><au><snm>Remacha</snm><fnm>M</fnm></au></aug><source>Antimicrob Agents Chemother</source><pubdate>2008</pubdate><volume>52</volume><issue>5</issue><fpage>1623</fpage><lpage>1629</lpage></bibl><bibl id="B35"><title><p>A versatile partner of eukaryotic protein complexes that is involved in multiple biological processes: Kti11/Dph3</p></title><aug><au><snm>B&#228;r</snm><fnm>C</fnm></au><au><snm>Zabel</snm><fnm>R</fnm></au><au><snm>Liu</snm><fnm>S</fnm></au><au><snm>Stark</snm><fnm>MJR</fnm></au><au><snm>Schaffrath</snm><fnm>R</fnm></au></aug><source>Mol 
Microbiol</source><pubdate>2008</pubdate><volume>69</volume><issue>5</issue><fpage>1221</fpage><lpage>1233</lpage></bibl><bibl id="B36"><title><p>The nature and character of the transition state for the ADP-ribosyltransferase reaction</p></title><aug><au><snm>Jorgensen</snm><fnm>R</fnm></au><au><snm>Wang</snm><fnm>Y</fnm></au><au><snm>Visschedyk</snm><fnm>D</fnm></au><au><snm>Merrill</snm><fnm>AR</fnm></au></aug><source>EMBO Rep</source><pubdate>2008</pubdate><volume>9</volume><issue>8</issue><fpage>802</fpage><lpage>809</lpage></bibl><bibl id="B37"><title><p>Amidoligases with ATP-grasp, glutamine synthetase-like and acetyltransferase-like domains: synthesis of novel metabolites and peptide modifications of proteins</p></title><aug><au><snm>Iyer</snm><fnm>LM</fnm></au><au><snm>Abhiman</snm><fnm>S</fnm></au><au><snm>Burroughs</snm><fnm>AM</fnm></au><au><snm>Aravind</snm><fnm>L</fnm></au></aug><source>Mol Biosyst</source><pubdate>2009</pubdate><volume>5</volume><issue>12</issue><fpage>1636</fpage><lpage>1660</lpage></bibl><bibl id="B38"><title><p>Conserved YjgF protein family deaminates reactive enamine/imine intermediates of Pyridoxal 5&#8242;-Phosphate (PLP)-dependent enzyme reactions</p></title><aug><au><snm>Lambrecht</snm><fnm>JA</fnm></au><au><snm>Flynn</snm><fnm>JM</fnm></au><au><snm>Downs</snm><fnm>DM</fnm></au></aug><source>J Biol Chem</source><pubdate>2012</pubdate><volume>287</volume><issue>5</issue><fpage>3454</fpage><lpage>3461</lpage></bibl><bibl id="B39"><title><p>Phylogeny and evolution of the Archaea: one hundred genomes later</p></title><aug><au><snm>Brochier-Armanet</snm><fnm>C</fnm></au><au><snm>Forterre</snm><fnm>P</fnm></au><au><snm>Gribaldo</snm><fnm>S</fnm></au></aug><source>Curr Opin Microbiol</source><pubdate>2011</pubdate><volume>14</volume><issue>3</issue><fpage>274</fpage><lpage>281</lpage></bibl><bibl id="B40"><title><p>In vitro functional characterization of BtuCD-F, the Escherichia coli ABC transporter for vitamin B12 
uptake</p></title><aug><au><snm>Borths</snm><fnm>EL</fnm></au><au><snm>Poolman</snm><fnm>B</fnm></au><au><snm>Hvorup</snm><fnm>RN</fnm></au><au><snm>Locher</snm><fnm>KP</fnm></au><au><snm>Rees</snm><fnm>DC</fnm></au></aug><source>Biochemistry</source><pubdate>2005</pubdate><volume>44</volume><issue>49</issue><fpage>16301</fpage><lpage>16309</lpage></bibl><bibl id="B41"><title><p>The CbiB protein of Salmonella enterica is an integral membrane protein involved in the last step of the de novo corrin ring biosynthetic pathway</p></title><aug><au><snm>Zayas</snm><fnm>CL</fnm></au><au><snm>Claas</snm><fnm>K</fnm></au><au><snm>Escalante-Semerena</snm><fnm>JC</fnm></au></aug><source>J Bacteriol</source><pubdate>2007</pubdate><volume>189</volume><issue>21</issue><fpage>7697</fpage><lpage>7708</lpage></bibl><bibl id="B42"><title><p>Conversion of cobinamide into adenosylcobamide in bacteria and archaea</p></title><aug><au><snm>Escalante-Semerena</snm><fnm>JC</fnm></au></aug><source>J Bacteriol</source><pubdate>2007</pubdate><volume>189</volume><issue>13</issue><fpage>4555</fpage><lpage>4560</lpage></bibl><bibl id="B43"><title><p>CbiZ, an amidohydrolase enzyme required for salvaging the coenzyme B12 precursor cobinamide in archaea</p></title><aug><au><snm>Woodson</snm><fnm>JD</fnm></au><au><snm>Escalante-Semerena</snm><fnm>JC</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2004</pubdate><volume>101</volume><issue>10</issue><fpage>3591</fpage><lpage>3596</lpage></bibl><bibl id="B44"><title><p>The cobinamide amidohydrolase (cobyric acid-forming) CbiZ enzyme: a critical activity of the cobamide remodelling system of Rhodobacter sphaeroides</p></title><aug><au><snm>Gray</snm><fnm>MJ</fnm></au><au><snm>Escalante-Semerena</snm><fnm>JC</fnm></au></aug><source>Mol Microbiol</source><pubdate>2009</pubdate><volume>74</volume><issue>5</issue><fpage>1198</fpage><lpage>1210</lpage></bibl></refgrp>
	</bm>
</art>