<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2164-15-S6-S17</ui>
   <ji>1471-2164</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Viral diversity and clonal evolution from unphased genomic data</p>
         </title>
         <aug>
            <au ca="yes" ce="yes" id="A1"><snm>Khiabanian</snm><fnm>Hossein</fnm><insr iid="I1"/><email>hossein@c2b2.columbia.edu</email></au>
            <au ce="yes" id="A2"><snm>Carpenter</snm><fnm>Zachary</fnm><insr iid="I1"/><email>zwc2101@columbia.edu</email></au>
            <au ce="yes" id="A3"><snm>Kugelman</snm><fnm>Jeffrey</fnm><insr iid="I2"/><email>jeffrey.kugelman@us.army.mil</email></au>
            <au id="A4"><snm>Chan</snm><fnm>Joseph</fnm><insr iid="I1"/><email>jmc2213@columbia.edu</email></au>
            <au id="A5"><snm>Trifonov</snm><fnm>Vladimir</fnm><insr iid="I1"/><email>v_trifonov@yahoo.com</email></au>
            <au id="A6"><snm>Nagle</snm><fnm>Elyse</fnm><insr iid="I2"/><email>elyse.r.nagle.ctr@mail.mil</email></au>
            <au id="A7"><snm>Warren</snm><fnm>Travis</fnm><insr iid="I2"/><email>travis.k.warren.ctr@mail.mil</email></au>
            <au id="A8"><snm>Iversen</snm><fnm>Patrick</fnm><insr iid="I3"/><email>piversen@sarepta.com</email></au>
            <au id="A9"><snm>Bavari</snm><fnm>Sina</fnm><insr iid="I2"/><email>sina.bavari@us.army.mil</email></au>
            <au id="A10"><snm>Palacios</snm><fnm>Gustavo</fnm><insr iid="I2"/><email>gustavo.f.palacios.ctr@mail.mil</email></au>
            <au ca="yes" id="A11"><snm>Rabadan</snm><fnm>Raul</fnm><insr iid="I1"/><email>rabadan@c2b2.columbia.edu</email></au>
         </aug>
         <insg>
            <ins id="I1"><p>Department of Systems Biology and Department of Biomedical Informatics, Columbia University College of Physicians and Surgeons, New York, New York, USA</p></ins>
            <ins id="I2"><p>Genomics Division, The U.S. Army Medical Research Institute of Infectious Diseases, Fort Detrick, Maryland, USA</p></ins>
            <ins id="I3"><p>Discovery Unit, Sarepta Therapeutics, Corvallis, Oregon, USA</p></ins>
         </insg>
         <source>BMC Genomics</source>
         
         
         <supplement><title><p>Proceedings of the Twelfth Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics</p></title><editor>Laxmi Parida, Gurinder Atwal and Bud Mishra</editor><note>Proceedings</note></supplement><conference><title><p>Twelfth Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics</p></title><location>Cold Spring Harbor, NY, USA</location><date-range>19-22 October 2014</date-range><url>http://cs.nyu.edu/~parida/RECOMB-CG2014/</url></conference><issn>1471-2164</issn>
         <pubdate>2014</pubdate>
         <volume>15</volume>
         <issue>Suppl 6</issue>
         <fpage>S17</fpage>
         <url>http://www.biomedcentral.com/1471-2164/15/S6/S17</url>
         <xrefbib><pubid idtype="doi">10.1186/1471-2164-15-S6-S17</pubid></xrefbib>
      </bibl>
      <history><pub><date><day>17</day><month>10</month><year>2014</year></date></pub></history>
      <cpyrt><year>2014</year><collab>Khiabanian et al.; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/4.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (<url>http://creativecommons.org/publicdomain/zero/1.0/</url>) applies to the data made available in this article, unless otherwise stated.</note></cpyrt>
      <kwdg>
         <kwd>Clonal evolution</kwd>
         <kwd>Evolutionary dynamics</kwd>
         <kwd>Viral genomic diversity</kwd>
         <kwd>Marburgvirus</kwd>
      </kwdg>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Clonal expansion is a process in which a single organism reproduces asexually, giving rise to a diversifying population. It is pervasive in nature, from within-host pathogen evolution to emergent infectious disease outbreaks. Standard phylogenetic tools rely on full-length genomes of individual pathogens or population consensus sequences (phased genotypes).</p>
               <p>Although high-throughput sequencing technologies are able to sample population diversity, the short sequence reads inherent to them preclude assessing whether two reads originate from the same clone (unphased genotypes). This obstacle severely limits the application of phylogenetic methods and investigation of within-host dynamics of acute infections using this rich data source.</p>
            </sec>
            <sec>
               <st>
                  <p>Methods</p>
               </st>
               <p>We introduce two measures of diversity to study the evolution of clonal populations using unphased genomic data, which eliminate the need to construct full-length genomes. Our method follows a maximum likelihood approach to estimate evolutionary rates and times to the most recent common ancestor, based on a relaxed molecular clock model and independent of any growth model. Deviations from neutral evolution indicate the presence of selection and bottleneck events.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We evaluated our method <it>in silico </it>and then compared it against existing approaches using data from the well-characterized 2009 H1N1 influenza pandemic. We then applied our method to high-throughput genomic data from marburgvirus-infected non-human primates, inferred the time of infection and the intra-host evolutionary rate, and identified purifying selection in viral populations.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>Our method has the power to make use of minor variants present in less than 1% of the population and capture genomic diversification within days of infection, making it an ideal tool for the study of acute RNA viral infection dynamics.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>A single rapidly evolving RNA virus can give rise to a swarm of related descendants <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Clonal expansions can be observed during an acute infection as pathogens replicate within a host <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp> or in an outbreak of an emerging pathogen, when a novel virus propagates through a susceptible host population. A viral population diversifies as it expands, enabling the virus to explore larger sections of the fitness landscape <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. Studying the dynamics of viral diversification can yield insight into when a host was originally infected, how fast a pathogen is evolving, and if specific genomic alterations are being selected for in a particular host or treatment regime.</p>
         <p>Clonal populations founded by a single ancestor consist of individual organisms with highly similar, though not necessarily identical, genomes. The consensus genome is a constructed sequence representing the majority allele at each residue; hence, it may not truly exist in the viral population and fails to capture the whole mutant distribution in the sub-population structure. Viral diversity in acute infection has been previously studied through single genome amplification and combinations of RT-PCR and cloning <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. These studies have utilized both phylogenetic techniques and exponential growth models to quantify viral evolution <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B7">7</abbr></abbrgrp>. With advances in high-throughput sequencing technologies, studying viral genomic diversity and its role in inter- and intra-host evolution has become more feasible. Ultra-deep sequencing has been employed to investigate systems of chronic infections in which viral populations have reached sustained levels of diversity <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>, as well as to investigate intra-host evolution of viral infections utilizing minor variants <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. However, given estimated viral evolutionary rates of 10<sup>-4 </sup>to 10<sup>-6 </sup>substitutions/site/year, intra-host evolutionary dynamics during the first few days of an acute infection are dominated by very rare variants that exist in less than 1% of the population <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. Because of the inherent limit on the length of the reads produced by high-throughput sequencing technologies, standard phylogenetic algorithms and consensus-based methodologies fail, as the coexistence of very rare polymorphisms in each individual viral clone cannot be determined. In other words, the mutations cannot be phased because the information about their linkage along the viral genome is lost (Supplementary Fig. S1, in Additional file <supplr sid="S1">1</supplr>) <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B11">11</abbr><abbr bid="B13">13</abbr></abbrgrp>.</p>
         <suppl id="S1">
            <title>
               <p>Additional file 1</p>
            </title>
            <caption>
               <p>Supplementary Methods, Figures, and Tables. This file contains the ethics statement, and the details of high-throughput sequencing of marburgvirus samples. It also contains Figures S1, S2, and S3, and Tables S1, S2, and S3.</p>
            </caption>
            <file name="1471-2164-15-S6-S17-S1.pdf">
   <p>Click here for file</p>
</file>
         </suppl>
         <p>In this manuscript, we introduce a method to study the dynamics of clonal evolution without the need for phased data. Our methodology provides a means to estimate the starting time and evolutionary rates without assuming a model of growth. We validate our method using both a simulated clonal expansion and genomic data from the 2009 H1N1 influenza pandemic. In the latter case, phylogenetic analyses using full-length genomes are treated as the gold standard, with which our evolutionary dynamic estimates strongly agree <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>. We then apply our method to genetic data where phase information is missing. Specifically, we infer the intra-host evolutionary dynamics of viral infections <it>in vivo</it>, using high-throughput, deep sequence data obtained from marburgvirus-infected non-human primates (NHP).</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <p><b>Measures of diversity</b>. If the genome of the expansion's initiating clone is known, the frequencies of the diverging alleles from the seed, as well as their genomic positions (segregating sites), are evident in its descendants. Therefore, we define total divergence, <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i1"><m:msub>
   <m:mrow>
      <m:mi>D</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mi>T</m:mi>
   </m:mrow>
</m:msub>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>t</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
<m:mo class="MathClass-rel">=</m:mo>
<m:msub>
   <m:mrow>
      <m:mo class="MathClass-op"> &#8721;</m:mo>
   </m:mrow>
   <m:mrow>
      <m:mi>s</m:mi>
   </m:mrow>
</m:msub>
<m:msub>
   <m:mrow>
      <m:mi>x</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mi>s</m:mi>
   </m:mrow>
</m:msub>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>t</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
</m:math></inline-formula>, where <it>x<sub>s</sub></it>(<it>t<sub>i</sub></it>) is the frequency, at time <it>t<sub>i</sub></it>, of the diverging allele at segregating site <it>s</it>. The alleles present within the seeding clone are commonly unknown. In lieu of this information, a proxy for the initial seeding genome is often approximated from the samples collected early in the expansion. Even though some polymorphisms become fixed and some disappear from the population, <it>D<sub>T</sub></it>, as a measure of divergence, will always increase with time (Figure <figr fid="F1">1</figr>).</p>
         <fig id="F1"><title><p>Figure 1</p></title><caption><p>Clonal expansions arise as asexual growth from a single clone</p></caption><text>
   <p><b>Clonal expansions arise as asexual growth from a single clone</b>. <b>Left: </b>Phylogenetic algorithms compare the descendant clones across their genomes to reconstruct the evolutionary history. We instead measure population diversity across the segregating sites, by summing their frequencies, to estimate evolutionary properties. <b>Right: </b>The estimates for the simulated datasets. Estimates based on MAF represent a lower bound on those based on <it>D<sub>T</sub></it>. The mean evolutionary rate, <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i25"><m:mrow>
   <m:mo>&#9001;</m:mo>
   <m:msub>
      <m:mrow>
         <m:mover accent="true">
            <m:mrow>
               <m:mi>&#956;</m:mi>
            </m:mrow>
            <m:mo class="MathClass-op">&#175;</m:mo>
         </m:mover>
      </m:mrow>
      <m:mrow>
         <m:mi>i</m:mi>
      </m:mrow>
   </m:msub>
   <m:mo>&#9002;</m:mo>
</m:mrow>
</m:math></inline-formula>, is given in 10<sup>-4 </sup>substitutions/site per time point. The standard errors are derived from 95% confidence intervals via 1,000 simulated datasets.</p>
</text><graphic file="1471-2164-15-S6-S17-1"/></fig>
         <p>To avoid approximating the genome of the initial seeding clone, we propose to estimate the genomic diversity at time <it>t<sub>i </sub></it>with the sum of the minimal allele frequencies (MAF) at segregating sites. The minimal allele frequency at a segregating site is taken as one minus the frequency of the dominant allele at that site. By definition, <it>x<sub>s</sub></it>(<it>t<sub>i</sub></it>) is always equal to or larger than the MAF; therefore, estimates based on the sum of MAF represent a lower bound on those from <it>D<sub>T</sub></it>. Strong differences between the two measures indicate selection or bottlenecks, because changes in <it>D<sub>T </sub></it>measure time and divergence from the seed, whereas the sum of MAF reflects variations in population diversity at a particular time.</p>
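         <p>As a minimal sketch of the two measures defined above (not the authors' implementation), the Python fragment below computes <it>D<sub>T</sub></it> and the sum of MAF from per-site allele counts. The input layout, the function names, and the example counts are hypothetical; the diverging-allele frequency at a site is taken here as one minus the frequency of the assumed seed allele.</p>

```python
# Illustrative sketch only: names, input layout, and counts are assumptions.

def allele_frequencies(counts):
    """Convert read counts per allele at one site into frequencies."""
    depth = sum(counts.values())
    return {allele: n / depth for allele, n in counts.items()}


def total_divergence(site_counts, seed):
    """D_T: summed frequency of alleles that differ from the seed allele."""
    d_t = 0.0
    for counts, seed_allele in zip(site_counts, seed):
        freqs = allele_frequencies(counts)
        d_t += 1.0 - freqs.get(seed_allele, 0.0)
    return d_t


def sum_maf(site_counts):
    """Sum over sites of one minus the dominant allele frequency."""
    total = 0.0
    for counts in site_counts:
        freqs = allele_frequencies(counts)
        total += 1.0 - max(freqs.values())
    return total


# Three hypothetical sites; the assumed seed genome is "ACG".
site_counts = [
    {"A": 990, "G": 10},   # rare variant: contributes 0.01 to both measures
    {"C": 200, "T": 800},  # variant near fixation: 0.8 to D_T, 0.2 to MAF
    {"G": 1000},           # no segregating allele: contributes nothing
]
seed = "ACG"

d_t = total_divergence(site_counts, seed)
maf = sum_maf(site_counts)
print(f"D_T = {d_t:.2f}, sum of MAF = {maf:.2f}")
```

         <p>For these hypothetical counts the sketch gives <it>D<sub>T</sub></it> = 0.81 and a sum of MAF of 0.21. The gap between the two comes entirely from the second site, where the diverging allele has become dominant, which is the kind of difference the text associates with selection or bottlenecks.</p>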
         <p><b>Mathematical framework.</b> Consider a clonal expansion with <it>N</it>(<it>t<sub>i</sub></it>) clones at time <it>t<sub>i</sub></it>, after a single initial clone began reproducing at time <it>t<sub>0</sub></it>. Independent of a model of growth, we define <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i2"><m:msub>
   <m:mrow>
      <m:mover accent="true">
         <m:mrow>
            <m:mi>&#956;</m:mi>
         </m:mrow>
         <m:mo class="MathClass-op">&#175;</m:mo>
      </m:mover>
   </m:mrow>
   <m:mrow>
      <m:mi>i</m:mi>
   </m:mrow>
</m:msub>
<m:mo class="MathClass-rel">=</m:mo>
<m:mfrac>
   <m:mrow>
      <m:mn>1</m:mn>
   </m:mrow>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>t</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-bin">-</m:mo>
      <m:msub>
         <m:mrow>
            <m:mi>t</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mn>0</m:mn>
         </m:mrow>
      </m:msub>
   </m:mrow>
</m:mfrac>
<m:munderover accent="false" accentunder="false">
   <m:mrow>
      <m:mo class="MathClass-op"> &#8747;</m:mo>
   </m:mrow>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>t</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mn>0</m:mn>
         </m:mrow>
      </m:msub>
   </m:mrow>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>t</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
   </m:mrow>
</m:munderover>
<m:mi>&#956;</m:mi>
<m:mfenced close=")" open="(" separators="">
   <m:mrow>
      <m:mi>&#964;</m:mi>
   </m:mrow>
</m:mfenced>
<m:mstyle class="text">
   <m:mtext class="textsf" mathvariant="sans-serif">d</m:mtext>
</m:mstyle>
<m:mi>&#964;</m:mi>
</m:math></inline-formula> to be the average evolutionary rate between times <it>t<sub>0 </sub></it>and <it>t<sub>i</sub></it>. The average Hamming distance between these clones and the seed can be approximated by <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i3"><m:msub>
   <m:mrow>
      <m:mover accent="true">
         <m:mrow>
            <m:mi>&#956;</m:mi>
         </m:mrow>
         <m:mo class="MathClass-op">&#175;</m:mo>
      </m:mover>
   </m:mrow>
   <m:mrow>
      <m:mi>i</m:mi>
   </m:mrow>
</m:msub>
<m:mi>l</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>t</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-bin">-</m:mo>
      <m:msub>
         <m:mrow>
            <m:mi>t</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mn>0</m:mn>
         </m:mrow>
      </m:msub>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
</m:math></inline-formula>, where <it>l </it>is the size of the genome. Assuming that the <it>N</it>(<it>t<sub>i</sub></it>) clones truly represent the frequencies of the segregating sites at time <it>t<sub>i</sub></it>, <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i4"><m:mo>&#9001;</m:mo>
<m:mi>d</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>t</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
   <m:mo>&#9002;</m:mo>
</m:mrow>
</m:math></inline-formula>, the expected distance of the descendants to the original clone, can be re-written as: <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i5"><m:mrow>
   <m:mi>d</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>t</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">&#8771;</m:mo>
   <m:msub>
      <m:mrow>
         <m:mo>&#8721;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>s</m:mi>
      </m:mrow>
   </m:msub>
   <m:msub>
      <m:mrow>
         <m:mi>x</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>s</m:mi>
      </m:mrow>
   </m:msub>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>t</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>D</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>T</m:mi>
      </m:mrow>
   </m:msub>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>t</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
</m:mrow>
</m:math></inline-formula>.</p>
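         <p>The relation between divergence, rate, and elapsed time stated above can be illustrated with a short Python sketch. It is not the maximum likelihood procedure of this paper: the function names and numbers are hypothetical, and the second function additionally assumes a constant rate between two sampling times, which the relaxed-clock model does not require.</p>

```python
# Illustrative sketch only: the constant-rate extrapolation is an assumption
# made for this example, not the relaxed-clock likelihood fit.

def mean_rate(d_t, genome_length, t_i, t_0):
    """Solve D_T(t_i) = mu_bar_i * l * (t_i - t_0) for mu_bar_i."""
    return d_t / (genome_length * (t_i - t_0))


def start_time_constant_rate(t_1, d_1, t_2, d_2):
    """Extrapolate D_T linearly back to zero to get a rough t_0."""
    slope = (d_2 - d_1) / (t_2 - t_1)  # equals mu * l under a constant rate
    return t_1 - d_1 / slope


genome_length = 10_000          # hypothetical genome size l, in sites
t_1, d_1 = 3.0, 0.2             # hypothetical (time, D_T) observations
t_2, d_2 = 5.0, 0.4

t_0 = start_time_constant_rate(t_1, d_1, t_2, d_2)
mu_bar = mean_rate(d_2, genome_length, t_2, t_0)
print(f"t_0 = {t_0:.1f}, mean rate = {mu_bar:.1e} substitutions/site per time unit")
```

         <p>With these made-up observations the extrapolated start time is 1.0 and the mean rate is 1.0 &#215; 10<sup>-5 </sup>substitutions/site per time unit. Only the first function follows directly from the definitions in the text; the linear extrapolation is a convenience for the example.</p>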
         <p>To study the early days of intra-host evolution, we assume negligible back-mutations. Nonetheless, back-mutations and different rates per base can be accounted for by modifying the definition of <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i4"><m:mo>&#9001;</m:mo><m:mi>d</m:mi><m:mrow><m:mo class="MathClass-open">(</m:mo><m:mrow><m:msub><m:mrow><m:mi>t</m:mi></m:mrow><m:mrow><m:mi>i</m:mi></m:mrow></m:msub></m:mrow><m:mo class="MathClass-close">)</m:mo><m:mo>&#9002;</m:mo></m:mrow></m:math></inline-formula> with more fitting substitution models <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. Note that <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i4"><m:mo>&#9001;</m:mo><m:mi>d</m:mi><m:mrow><m:mo class="MathClass-open">(</m:mo><m:mrow><m:msub><m:mrow><m:mi>t</m:mi></m:mrow><m:mrow><m:mi>i</m:mi></m:mrow></m:msub></m:mrow><m:mo class="MathClass-close">)</m:mo><m:mo>&#9002;</m:mo></m:mrow></m:math></inline-formula> differs from intra-population nucleotide diversity, &#960; <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, which is derived from the pairwise comparison of the genomes present at time <it>t<sub>i</sub></it>, whereas <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i4"><m:mo>&#9001;</m:mo><m:mi>d</m:mi><m:mrow><m:mo class="MathClass-open">(</m:mo><m:mrow><m:msub><m:mrow><m:mi>t</m:mi></m:mrow><m:mrow><m:mi>i</m:mi></m:mrow></m:msub></m:mrow><m:mo class="MathClass-close">)</m:mo><m:mo>&#9002;</m:mo></m:mrow></m:math></inline-formula> is derived from comparing those genomes to the original clone at time <it>t<sub>0</sub></it>.</p>
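         <p>The distinction between distance to the seed and pairwise diversity can be made concrete with a small Python sketch. The frequencies are hypothetical, sites are assumed biallelic, and pairwise diversity is written in its unnormalized frequency form (expected differences between two genomes drawn at random, not divided by genome length), which is an assumption of this example and not a formula taken from the paper.</p>

```python
# Illustrative sketch only: biallelic sites and the frequency-based form of
# pairwise diversity are assumptions made for this example.

def distance_to_seed(freqs):
    """Expected number of differences from the seed: sum of x_s."""
    return sum(freqs)


def pairwise_diversity(freqs):
    """Expected differences between two random genomes: sum of 2 x_s (1 - x_s)."""
    return sum(2.0 * x * (1.0 - x) for x in freqs)


# Hypothetical diverging-allele frequencies at three segregating sites.
freqs = [1.0, 0.5, 0.01]   # a fixed substitution, an even split, a rare variant

d = distance_to_seed(freqs)
pi = pairwise_diversity(freqs)
print(f"distance to seed = {d:.4f}, pairwise diversity = {pi:.4f}")
```

         <p>Here the distance to the seed is 1.51 while the pairwise diversity is 0.5198. The fixed substitution adds a full unit to the former and nothing to the latter, because every genome present at that time carries it.</p>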
         <p>Let <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i9"><m:msub>
   <m:mrow>
      <m:mi>m</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mi>j</m:mi>
   </m:mrow>
</m:msub>
<m:mfenced close=")" open="(" separators="">
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>t</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
   </m:mrow>
</m:mfenced>
</m:math></inline-formula> be the number of accumulated polymorphisms at time <it>t<sub>i </sub></it>in sequence <it>j </it>since the start of the expansion at time <it>t<sub>0</sub></it>. Assuming <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i10"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:mi>m</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>j</m:mi>
      </m:mrow>
   </m:msub>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>t</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
</m:mrow>
</m:math></inline-formula> is Poisson distributed with mean <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i11"><m:msub>
   <m:mrow>
      <m:mover accent="true">
         <m:mrow>
            <m:mi>&#956;</m:mi>
         </m:mrow>
         <m:mo class="MathClass-op">&#175;</m:mo>
      </m:mover>
   </m:mrow>
   <m:mrow>
      <m:mi>i</m:mi>
   </m:mrow>
</m:msub>
<m:mi>l</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>t</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-bin">-</m:mo>
      <m:msub>
         <m:mrow>
            <m:mi>t</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mn>0</m:mn>
         </m:mrow>
      </m:msub>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
</m:math></inline-formula>, the log-likelihood of the observed state is <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i12"><m:mrow>
   <m:mi>L</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mover accent="true">
                  <m:mrow>
                     <m:mi>&#956;</m:mi>
                  </m:mrow>
                  <m:mo class="MathClass-op">&#175;</m:mo>
               </m:mover>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>t</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>0</m:mn>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">&#8776;</m:mo>
   <m:msub>
      <m:mrow>
         <m:mo>&#8721;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>i</m:mi>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>j</m:mi>
      </m:mrow>
   </m:msub>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>m</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>j</m:mi>
            </m:mrow>
         </m:msub>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mi>t</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">log(</m:mtext>
         </m:mstyle>
         <m:msub>
            <m:mrow>
               <m:mover accent="true">
                  <m:mrow>
                     <m:mi>&#956;</m:mi>
                  </m:mrow>
                  <m:mo>&#175;</m:mo>
               </m:mover>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:mi>l</m:mi>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mi>t</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:msub>
                  <m:mrow>
                     <m:mi>t</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mn>0</m:mn>
                  </m:mrow>
               </m:msub>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">)</m:mtext>
         </m:mstyle>
         <m:mo class="MathClass-bin">-</m:mo>
         <m:msub>
            <m:mrow>
               <m:mover accent="true">
                  <m:mrow>
                     <m:mi>&#956;</m:mi>
                  </m:mrow>
                  <m:mo>&#175;</m:mo>
               </m:mover>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:mi>l</m:mi>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mi>t</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:msub>
                  <m:mrow>
                     <m:mi>t</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mn>0</m:mn>
                  </m:mrow>
               </m:msub>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
</m:mrow>
</m:math></inline-formula>. In all summations, <it>i </it>indexes the time points, and <it>j </it>indexes the viral clones sampled at <it>t<sub>i</sub></it>. Because the total number of mutations in the population can be counted either across the genomes or, equivalently, via the frequencies of the segregating sites, the crucial observation is that <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i13"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:mo>&#8721;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>j</m:mi>
      </m:mrow>
   </m:msub>
   <m:msub>
      <m:mrow>
         <m:mi>m</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>j</m:mi>
      </m:mrow>
   </m:msub>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>t</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mi>N</m:mi>
   <m:mfenced close=")" open="(" separators="">
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>t</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:mfenced>
   <m:msub>
      <m:mrow>
         <m:mo> &#8721;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>s</m:mi>
      </m:mrow>
   </m:msub>
   <m:msub>
      <m:mrow>
         <m:mi>x</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>s</m:mi>
      </m:mrow>
   </m:msub>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>t</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mi>N</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>t</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo>&#9001;</m:mo>
   <m:mi>d</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>t</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
      <m:mo>&#9002;</m:mo>
   </m:mrow>
</m:mrow>
</m:math></inline-formula>, leading to <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i14"><m:mrow>
   <m:mi>L</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mover accent="true">
                  <m:mrow>
                     <m:mi>&#956;</m:mi>
                  </m:mrow>
                  <m:mo class="MathClass-op">&#175;</m:mo>
               </m:mover>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>t</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>0</m:mn>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">&#8776;</m:mo>
   <m:msub>
      <m:mrow>
         <m:mo>&#8721;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>i</m:mi>
      </m:mrow>
   </m:msub>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mi>N</m:mi>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mi>t</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
         <m:mo>&#9001;</m:mo>
         <m:mi>d</m:mi>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mi>t</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
            <m:mo>&#9002;</m:mo>
         </m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">log(</m:mtext>
         </m:mstyle>
         <m:msub>
            <m:mrow>
               <m:mover accent="true">
                  <m:mrow>
                     <m:mi>&#956;</m:mi>
                  </m:mrow>
                  <m:mo>&#175;</m:mo>
               </m:mover>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:mi>l</m:mi>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mi>t</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:msub>
                  <m:mrow>
                     <m:mi>t</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mn>0</m:mn>
                  </m:mrow>
               </m:msub>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">)</m:mtext>
         </m:mstyle>
         <m:mo class="MathClass-bin">-</m:mo>
         <m:mi>N</m:mi>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mi>t</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
         <m:msub>
            <m:mrow>
               <m:mover accent="true">
                  <m:mrow>
                     <m:mi>&#956;</m:mi>
                  </m:mrow>
                  <m:mo>&#175;</m:mo>
               </m:mover>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:mi>l</m:mi>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mi>t</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:msub>
                  <m:mrow>
                     <m:mi>t</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mn>0</m:mn>
                  </m:mrow>
               </m:msub>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
</m:mrow>
</m:math></inline-formula>. Thus, the maximum likelihood estimate (MLE) of the evolutionary rates and the time of the initial clone can be derived from maximizing <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i15"><m:mrow>
   <m:mi>L</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mover accent="true">
                  <m:mrow>
                     <m:mi>&#956;</m:mi>
                  </m:mrow>
                  <m:mo class="MathClass-op">&#175;</m:mo>
               </m:mover>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>t</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>0</m:mn>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
</m:mrow>
</m:math></inline-formula>. In these estimates, <it>D<sub>T </sub></it>or the sum of MAF at <it>t<sub>i </sub></it>is used to approximate <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i4"><m:mo>&#9001;</m:mo><m:mi>d</m:mi><m:mrow><m:mo class="MathClass-open">(</m:mo><m:mrow><m:msub><m:mrow><m:mi>t</m:mi></m:mrow><m:mrow><m:mi>i</m:mi></m:mrow></m:msub></m:mrow><m:mo class="MathClass-close">)</m:mo><m:mo>&#9002;</m:mo></m:mrow></m:math></inline-formula>. Maximizing a likelihood that has more parameters than data points, without any constraints, leads to over-fitting the data. Following Sanderson's modeling of a relaxed molecular clock and penalized likelihood approach <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>, and utilizing a non-parametric regularization term, <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i17"><m:mrow>
   <m:mi>W</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mover accent="true">
                  <m:mrow>
                     <m:mi>&#956;</m:mi>
                  </m:mrow>
                  <m:mo class="MathClass-op">&#175;</m:mo>
               </m:mover>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:msub>
      <m:mrow>
         <m:mo> &#8721;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>i</m:mi>
      </m:mrow>
   </m:msub>
   <m:msup>
      <m:mrow>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mover accent="true">
                        <m:mrow>
                           <m:mi>&#956;</m:mi>
                        </m:mrow>
                        <m:mo>&#175;</m:mo>
                     </m:mover>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:msub>
                  <m:mrow>
                     <m:mover accent="true">
                        <m:mrow>
                           <m:mi>&#956;</m:mi>
                        </m:mrow>
                        <m:mo>&#175;</m:mo>
                     </m:mover>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                     <m:mo class="MathClass-bin">-</m:mo>
                     <m:mn>1</m:mn>
                  </m:mrow>
               </m:msub>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
      </m:mrow>
      <m:mrow>
         <m:mn>2</m:mn>
      </m:mrow>
   </m:msup>
</m:mrow>
</m:math></inline-formula>, we minimize <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i18"><m:mrow>
   <m:mtext>&#936;</m:mtext>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mover accent="true">
                  <m:mrow>
                     <m:mi>&#956;</m:mi>
                  </m:mrow>
                  <m:mo class="MathClass-op">&#175;</m:mo>
               </m:mover>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>t</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>0</m:mn>
            </m:mrow>
         </m:msub>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">)</m:mtext>
         </m:mstyle>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mo class="MathClass-bin">-</m:mo>
         <m:mi>L</m:mi>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mover accent="true">
                        <m:mrow>
                           <m:mi>&#956;</m:mi>
                        </m:mrow>
                        <m:mo class="MathClass-op">&#175;</m:mo>
                     </m:mover>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
               <m:mo class="MathClass-punc">,</m:mo>
               <m:msub>
                  <m:mrow>
                     <m:mi>t</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mn>0</m:mn>
                  </m:mrow>
               </m:msub>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
         <m:mo class="MathClass-bin">+</m:mo>
         <m:mi>&#955;</m:mi>
         <m:mi>W</m:mi>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mover accent="true">
                        <m:mrow>
                           <m:mi>&#956;</m:mi>
                        </m:mrow>
                        <m:mo class="MathClass-op">&#175;</m:mo>
                     </m:mover>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
      </m:mrow>
   </m:mrow>
</m:mrow>
</m:math></inline-formula>, where &#955; is the smoothing parameter. For very large &#955;, minimizing <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i19"><m:mtext>&#160;</m:mtext>
<m:mrow>
   <m:mtext>&#936;</m:mtext>
</m:mrow>
</m:math></inline-formula> leads to estimates equal to predictions under a strict molecular clock model. On the other hand, small &#955; leads to over-fitting the likelihood, and the estimates will be highly affected by small changes in the data. Therefore, an intermediate value of &#955; should be chosen, so that the estimates follow the data while avoiding numerical artifacts caused by over-fitting. We determine this value by minimizing <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i20"><m:mtext>&#160;</m:mtext>
<m:mrow>
   <m:mtext>&#936;</m:mtext>
</m:mrow>
</m:math></inline-formula> over a range of values for &#955; and comparing the resulting values of <it>L </it>versus those of <it>W</it>, by scaling them between 0 and 1. In other words, the maximum <it>L </it>is obtained when <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i21"><m:mi>&#955;</m:mi>
<m:mo class="MathClass-rel">=</m:mo>
<m:mn>0</m:mn>
</m:math></inline-formula> (corresponding to scaled <it>L </it>and <it>W </it>of 1) and the minimum <it>W </it>is obtained when <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i22"><m:mi>&#955;</m:mi>
<m:mo class="MathClass-rel">&#8594;</m:mo>
<m:mi>&#8734;</m:mi>
</m:math></inline-formula> (corresponding to scaled <it>L </it>and <it>W </it>of 0). We choose the value of &#955; that results in equally weighted scaled <it>L </it>and scaled <it>W </it><abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. For all optimization problems in our method, we employ the non-linear Active Set algorithm <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> as implemented in MATLAB and R. In each optimization, we require <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i23"><m:mn>0</m:mn>
<m:mo class="MathClass-rel">&lt;</m:mo>
<m:msub>
   <m:mrow>
      <m:mover accent="true">
         <m:mrow>
            <m:mi>&#956;</m:mi>
         </m:mrow>
         <m:mo class="MathClass-op">&#175;</m:mo>
      </m:mover>
   </m:mrow>
   <m:mrow>
      <m:mi>i</m:mi>
   </m:mrow>
</m:msub>
</m:math></inline-formula> and <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i24"><m:msub>
   <m:mrow>
      <m:mi>t</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mn>0</m:mn>
   </m:mrow>
</m:msub>
<m:mo class="MathClass-rel">&lt;</m:mo>
<m:msub>
   <m:mrow>
      <m:mi>t</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mn>1</m:mn>
   </m:mrow>
</m:msub>
</m:math></inline-formula>.</p>
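         <p>The penalized objective described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MATLAB or R implementation used in the analysis: the function names, argument order, and the choice to return infinity outside the feasible region are hypothetical, <it>l</it> is passed in as whatever scale factor appears in the expression for <it>L</it>, and no optimizer is included.</p>
```python
import math


def log_likelihood(mu, t0, times, n_seq, div, l):
    """Approximate L: sum_i [N_i d_i log(mu_i l (t_i - t0)) - N_i mu_i l (t_i - t0)]."""
    total = 0.0
    for mu_i, t_i, n_i, d_i in zip(mu, times, n_seq, div):
        expected = mu_i * l * (t_i - t0)  # Poisson mean at time point i
        total += n_i * d_i * math.log(expected) - n_i * expected
    return total


def roughness(mu):
    """W: sum of squared differences between consecutive rates."""
    return sum((b - a) ** 2 for a, b in zip(mu, mu[1:]))


def psi(mu, t0, times, n_seq, div, l, lam):
    """Psi = -L + lam * W, with lam the smoothing parameter.

    Assumption of this sketch: points violating mu_i > 0 or t_1 > t0
    get an infinite objective instead of being handled by a
    constrained solver.
    """
    if min(mu) > 0 and times[0] > t0:
        return -log_likelihood(mu, t0, times, n_seq, div, l) + lam * roughness(mu)
    return math.inf
```
         <p>Any constrained minimizer can then be applied to <it>psi</it> over the rates and <it>t<sub>0</sub></it> for a fixed smoothing parameter; with constant rates the penalty term vanishes and the objective reduces to the negative log-likelihood.</p>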
         <p>To calculate standard errors for estimates of <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i25"><m:mrow><m:mo>&#9001;</m:mo><m:msub><m:mrow><m:mover accent="true"><m:mrow><m:mi>&#956;</m:mi></m:mrow><m:mo class="MathClass-op">&#175;</m:mo></m:mover></m:mrow><m:mrow><m:mi>i</m:mi></m:mrow></m:msub><m:mo>&#9002;</m:mo></m:mrow></m:math></inline-formula> and <it>t<sub>0</sub></it>, we generate 1,000 bootstrap sets by permuting the sequences in each dataset. Using each dataset's smoothing parameter, we obtain maximum likelihood estimates for <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i25"><m:mrow><m:mo>&#9001;</m:mo><m:msub><m:mrow><m:mover accent="true"><m:mrow><m:mi>&#956;</m:mi></m:mrow><m:mo class="MathClass-op">&#175;</m:mo></m:mover></m:mrow><m:mrow><m:mi>i</m:mi></m:mrow></m:msub><m:mo>&#9002;</m:mo></m:mrow></m:math></inline-formula> and <it>t<sub>0</sub></it>. The bootstrap estimates are normally distributed and are used to calculate 95% confidence intervals. The presence of purifying selection can be measured through <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i27"><m:mrow>
   <m:mi>&#969;</m:mi>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mi>&#946;</m:mi>
   <m:mfrac>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>&#956;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>n</m:mi>
               <m:mi>o</m:mi>
               <m:mi>n</m:mi>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:mi>s</m:mi>
               <m:mi>y</m:mi>
               <m:mi>n</m:mi>
               <m:mi>.</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>&#956;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>s</m:mi>
               <m:mi>y</m:mi>
               <m:mi>n</m:mi>
               <m:mi>.</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:mfrac>
</m:mrow>
</m:math></inline-formula>, when it is less than 1. Here, &#946; is the ratio of the number of synonymous to non-synonymous sites in the genome, which we obtain by randomly mutating the viral genome one million times, assuming equal probability for transition and transversion events.</p>
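         <p>The random-mutation estimate of the site ratio that multiplies the rate ratio in &#969; can be illustrated as follows. This is a sketch under stated assumptions rather than the procedure actually run: it treats the input as a single in-frame coding sequence, uses the standard genetic code, and reads "equal probability for transition and transversion events" as a 50/50 split between the two classes. The function names are hypothetical.</p>
```python
import random

BASES = "TCAG"
# Standard genetic code, codons ordered with TCAG at each position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def mutate_base(base, rng):
    """Return a transition with probability 1/2, otherwise one of the two transversions."""
    if rng.random() >= 0.5:
        return TRANSITION[base]
    return rng.choice([b for b in BASES if b not in (base, TRANSITION[base])])


def estimate_beta(coding_seq, trials, rng):
    """Ratio of synonymous to non-synonymous outcomes over random single-base mutations."""
    syn = 0
    nonsyn = 0
    for _ in range(trials):
        pos = rng.randrange(len(coding_seq))
        start = pos - pos % 3  # first base of the codon containing pos
        codon = coding_seq[start:start + 3]
        k = pos - start
        mutant = codon[:k] + mutate_base(codon[k], rng) + codon[k + 1:]
        if CODON_TABLE[mutant] == CODON_TABLE[codon]:
            syn += 1
        else:
            nonsyn += 1
    return syn / nonsyn if nonsyn else float("inf")
```
         <p>For a real genome the same counting would be run per open reading frame, with non-coding positions excluded; mutations to a stop codon are counted here as non-synonymous.</p>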
         <p><b>Simulated data.</b> Starting from a single homogeneous 10,000 base-long clone, we simulated an exponentially expanding population at 12 time steps. The substitution rate was set at 10<sup>-4 </sup>substitutions/site per time point, with an added noise term with a mean of zero and standard deviation of 10<sup>-4</sup>. At each time point, 5,000 sequences were randomly sampled, simulating a typical depth of 5,000x for deep-sequencing. We repeated this procedure 1,000 times.</p>
         <p><b>Influenza data.</b> Influenza consensus full-length sequences were obtained from the Influenza Virus Resource Database <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> and GISAID <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, selecting H1N1 pandemic strains collected between March 2009 and March 2010. We aligned the sequences of each segment using the MUSCLE algorithm, followed by manual curation.</p>
         <p><b>High-throughput marburgvirus data.</b> Two separate animal studies provided the samples used in this study. Blood from cynomolgus macaques was collected from NHP therapeutic efficacy trial control animals (saline treated only) on days 8 and 10 of the infection. The viral RNA was extracted and sequenced. We rigorously cleaned the sequence reads to remove systematic errors and identified statistically significant single nucleotide substitutions. The ethics statement and details of the library preparation, sequencing, and variant calling are provided in Supplementary Methods (Additional file <supplr sid="S1">1</supplr>).</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p><b>Simulated data.</b> In a set of 1,000 simulations, the estimates of evolutionary rates between time points captured the expected evolutionary dynamics (<inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i25"><m:mrow><m:mo>&#9001;</m:mo><m:msub><m:mrow><m:mover accent="true"><m:mrow><m:mi>&#956;</m:mi></m:mrow><m:mo class="MathClass-op">&#175;</m:mo></m:mover></m:mrow><m:mrow><m:mi>i</m:mi></m:mrow></m:msub><m:mo>&#9002;</m:mo></m:mrow></m:math></inline-formula> = 10<sup>-4 </sup>substitutions/site per time point), within statistical fluctuations, as shown in Figure <figr fid="F2">2</figr> (right). In particular, the estimates from <it>D<sub>T </sub></it>found the average of evolutionary rates, <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i25"><m:mrow><m:mo>&#9001;</m:mo><m:msub><m:mrow><m:mover accent="true"><m:mrow><m:mi>&#956;</m:mi></m:mrow><m:mo class="MathClass-op">&#175;</m:mo></m:mover></m:mrow><m:mrow><m:mi>i</m:mi></m:mrow></m:msub><m:mo>&#9002;</m:mo></m:mrow></m:math></inline-formula>, to be 0.99 &#177; 0.22 &#215; 10<sup>-3 </sup>substitutions/site per time point, and the starting time of the expansion to be at -0.03 &#177; 0.62. The estimates based on MAF indicated the lower bound of those from <it>D<sub>T </sub></it>(Figure <figr fid="F1">1</figr>).</p>
         <fig id="F2"><title><p>Figure 2</p></title><caption><p>The maximum likelihood estimates for the 2009 H1N1 influenza pandemic</p></caption><text>
   <p><b>The maximum likelihood estimates for the 2009 H1N1 influenza pandemic</b>. <b>Left: </b>Whole-genome data estimates based on both MAF and <it>D<sub>T</sub></it>. <b>Right: </b>Individual segments' estimates based on <it>D<sub>T</sub></it>. (For the MAF-based estimates, see Supplementary Fig. S2, in Additional file <supplr sid="S1">1</supplr>.) PB2, PB1, and PA encode the RNA polymerase; HA and NA encode the glycoproteins hemagglutinin and neuraminidase; the NP, M, and NS segments encode the nucleoprotein, matrix proteins and non-structural proteins. Due to structural constraints and small size, the latter three segments accumulate the fewest mutations. Our estimates for the evolutionary rates, the starting time of the expansion, and presence of strong purifying selection (&#969; = 0.22) corroborated phylogenetic results. The mean evolutionary rate, <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i25"><m:mrow><m:mo>&#9001;</m:mo><m:msub><m:mrow><m:mover accent="true"><m:mrow><m:mi>&#956;</m:mi></m:mrow><m:mo class="MathClass-op">&#175;</m:mo></m:mover></m:mrow><m:mrow><m:mi>i</m:mi></m:mrow></m:msub><m:mo>&#9002;</m:mo></m:mrow></m:math></inline-formula>, is in 10<sup>-3 </sup>substitutions/site/year and <it>t<sub>0 </sub></it>is in days. The reported errors are 95% confidence intervals obtained via bootstrapping.</p>
</text><graphic file="1471-2164-15-S6-S17-2"/></fig>
         <p><b>The 2009 H1N1 influenza pandemic</b>. The influenza genome consists of eight single-stranded RNA segments, which code for 10 or more proteins. The novel influenza A virus responsible for the 2009 pandemic was first identified in late March in California and Mexico <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, and spread quickly, as very limited previous immunity to the new strain existed within the human population. Phylogenetic analyses estimated the most recent common ancestor of this strain to have arisen around January 2009 (no earlier than August 2008), and to have evolved with a rate of 3.67 &#177; 3.05 &#215; 10<sup>-3 </sup>substitutions/site/year <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B27">27</abbr></abbrgrp>. These analyses also identified purifying selection during the pandemic (&#969; &lt; 1) <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. The exact genome of the initial virus that infected the human population is not known; however, we approximated a proxy based on the consensus genomes of strains collected early in the expansion (Additional file <supplr sid="S2">2</supplr>). We found the estimates for the mean of evolutionary rates between time points, <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i25"><m:mrow><m:mo>&#9001;</m:mo><m:msub><m:mrow><m:mover accent="true"><m:mrow><m:mi>&#956;</m:mi></m:mrow><m:mo class="MathClass-op">&#175;</m:mo></m:mover></m:mrow><m:mrow><m:mi>i</m:mi></m:mrow></m:msub><m:mo>&#9002;</m:mo></m:mrow></m:math></inline-formula> and the starting time of the pandemic, <it>t<sub>0</sub></it>, based on both <it>D<sub>T </sub></it>and sum of MAF to be consistent across all segments (Figure <figr fid="F2">2</figr> (right) and Supplementary Fig. S2, in Additional file <supplr sid="S1">1</supplr>). 
As there has been no evidence for reassortment events during the 2009 H1N1 clonal expansion in humans <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>, we concatenated the segments and estimated <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i25"><m:mrow><m:mo>&#9001;</m:mo><m:msub><m:mrow><m:mover accent="true"><m:mrow><m:mi>&#956;</m:mi></m:mrow><m:mo class="MathClass-op">&#175;</m:mo></m:mover></m:mrow><m:mrow><m:mi>i</m:mi></m:mrow></m:msub><m:mo>&#9002;</m:mo></m:mrow></m:math></inline-formula> and <it>t<sub>0 </sub></it>using whole-genome data. As shown in Figure <figr fid="F2">2</figr> (left), the MAF-based estimates for <it>t<sub>0 </sub></it>agreed with those from <it>D<sub>T</sub></it>, and were found to be between November 2008 and January 2009. We also estimated <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i25"><m:mrow><m:mo>&#9001;</m:mo><m:msub><m:mrow><m:mover accent="true"><m:mrow><m:mi>&#956;</m:mi></m:mrow><m:mo class="MathClass-op">&#175;</m:mo></m:mover></m:mrow><m:mrow><m:mi>i</m:mi></m:mrow></m:msub><m:mo>&#9002;</m:mo></m:mrow></m:math></inline-formula> of 1.82 &#177; 1.28 &#215; 10<sup>-3 </sup>and 3.02 &#177; 0.66 &#215; 10<sup>-3 </sup>substitutions/site/year during the pandemic, according to <it>D<sub>T </sub></it>and sum of MAF, respectively. We also identified a strong purifying selection during this period (&#969; = 0.22), corroborating results from phylogenetic methods.</p>
         <suppl id="S2">
            <title>
               <p>Additional file 2</p>
            </title>
            <caption>
               <p>Approximated genome of the seed of the H1N1 pandemic. This file, in FASTA format, contains the approximated proxy for the genome of the initial 2009 H1N1 virus, based on the genomic consensus of strains collected in March 2009: A/Mexico/LaGloria-4/2009(H1N1), A/Mexico/LaGloria-4/2009(H1N1), and A/California/05/2009(H1N1).</p>
            </caption>
            <file name="1471-2164-15-S6-S17-S3.txt">
   <p>Click here for file</p>
</file>
         </suppl>
         <p><b>Deep sequencing of marburgvirus from infected NHP.</b> Marburgvirus, in the <it>Filoviridae </it>family, has a single-stranded RNA genome of about 19,000 bases that encodes seven proteins, with an estimated evolutionary rate of 0.1-1.0 &#215; 10<sup>-3 </sup>substitutions/site/year <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. The cynomolgus macaque is a commonly used model organism for filovirus infection, recapitulating some of the clinical features of infection in humans. Marburgvirus causes hemorrhagic fevers in humans and NHP, which typically succumb to the infection in 8-12 days.</p>
         <p>Working from an existing study of cynomolgus macaques infected with a Musoke strain marburgvirus, we utilized deep sequencing data (coverage depth &gt;10,000x) of viral RNA collected at different time points from four samples (505113, 052803, C0507178, and 0602167, as shown in Supplementary Table S1, in Additional file <supplr sid="S1">1</supplr>). We obtained frequency estimates as low as 0.05% for an average of 60 variants per sample (range 26 to 110, as listed in Supplementary Tables S2 and S4, in Additional file <supplr sid="S1">1</supplr> and Additional file <supplr sid="S3">3</supplr> respectively). We found ~3.5 times more transitions than transversions across samples (Supplementary Table S3, in Additional file <supplr sid="S1">1</supplr>), and observed a very homogeneous viral population in the challenge stock (day 0) and a subsequent increase in viral diversity over time <it>in vivo </it>in all four individual experiments (Figure <figr fid="F3">3</figr>). The four independent analyses showed similar results: 1) increasing genomic diversity, with <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i25"><m:mrow><m:mo>&#9001;</m:mo><m:msub><m:mrow><m:mover accent="true"><m:mrow><m:mi>&#956;</m:mi></m:mrow><m:mo class="MathClass-op">&#175;</m:mo></m:mover></m:mrow><m:mrow><m:mi>i</m:mi></m:mrow></m:msub><m:mo>&#9002;</m:mo></m:mrow></m:math></inline-formula> of 0.23-1.50 &#215; 10<sup>-3 </sup>substitutions/site/year for non-synonymous substitutions and 1.29-3.81 &#215; 10<sup>-3 </sup>for all substitutions; and 2) 2-8 days to convergence with the reference, approximately the amount of time spent propagating the virus after it was originally sequenced <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>.</p>
         <suppl id="S3">
            <title>
               <p>Additional file 3</p>
            </title>
            <caption>
               <p>Supplementary Table S4. This file contains Table S4, the list of variants found to be statistically significant at one or more time points in the four marburgvirus-infected NHP samples.</p>
            </caption>
            <file name="1471-2164-15-S6-S17-S2.xls">
   <p>Click here for file</p>
</file>
         </suppl>
         <fig id="F3"><title><p>Figure 3</p></title><caption><p>The maximum likelihood estimates for four marburgvirus samples from infected NHP</p></caption><text>
   <p><b>The maximum likelihood estimates for four marburgvirus samples from infected NHP</b>. We found the estimated intra-host evolutionary rates for non-synonymous substitutions to be in a similar range. In three samples, MAF-based and <it>D<sub>T </sub></it>measures differed for synonymous substitutions, due to increases in frequency of a single allele. In the fourth sample, MAF-based and <it>D<sub>T </sub></it>measures were identical, so their curves overlap. The mean evolutionary rate, <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i25"><m:mrow><m:mo>&#9001;</m:mo><m:msub><m:mrow><m:mover accent="true"><m:mrow><m:mi>&#956;</m:mi></m:mrow><m:mo class="MathClass-op">&#175;</m:mo></m:mover></m:mrow><m:mrow><m:mi>i</m:mi></m:mrow></m:msub><m:mo>&#9002;</m:mo></m:mrow></m:math></inline-formula>, is in 10<sup>-3 </sup>substitutions/site/year and <it>t<sub>0 </sub></it>is in days. The reported errors are 95% confidence intervals obtained via bootstrapping.</p>
</text><graphic file="1471-2164-15-S6-S17-3"/></fig>
         <p>Acknowledging the caveat that each of the four samples went through different host-specific immune responses, we combined the data and estimated <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2164-15-S6-S17-i25"><m:mrow><m:mo>&#9001;</m:mo><m:msub><m:mrow><m:mover accent="true"><m:mrow><m:mi>&#956;</m:mi></m:mrow><m:mo class="MathClass-op">&#175;</m:mo></m:mover></m:mrow><m:mrow><m:mi>i</m:mi></m:mrow></m:msub><m:mo>&#9002;</m:mo></m:mrow></m:math></inline-formula> to be 2.11 &#177; 1.76 &#215; 10<sup>-3 </sup>substitutions/site/year for non-synonymous substitutions and 2.95 &#177; 0.48 &#215; 10<sup>-3 </sup>for all substitutions (Supplementary Fig. S3, in Additional file <supplr sid="S1">1</supplr>). We also identified strong purifying selection (&#969; = 0.43).</p>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>We have proposed two measures of genetic diversity, derived independently of phasing information: 1) total divergence, <it>D<sub>T</sub></it>, the sum of frequencies of diverging alleles from the original clone, and 2) the sum of minimal allele frequencies (MAF) at segregating sites. Our methodology is robust to recombination or reassortment events within a clonal population because such evolutionary processes do not affect our measures of genetic diversity. Since the number of sites with diverging alleles in a sampled population, acquired within the first few days of an acute infection or the early months of an outbreak, is much smaller than the length of the viral genome, the assumption that their distribution between two time points can be approximated with Poisson distributions holds. Assuming negligible positive selection and back-mutations, <it>D<sub>T</sub></it> increases over time by definition; thus, it measures divergence from the seed of the expansion. On the other hand, the sum of MAF measures population diversity at a particular moment in time. Therefore, strong differences between the two measures indicate deviations from neutral evolution, selection, or bottlenecks. Our approach is particularly novel in its independence from an assumed growth model or previously published evolutionary rates, used in similar applications to intra-host data <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Since we assume that the number of segregating sites is much smaller than the length of the viral genome, and that the infection starts with a genetically uniform population, our method is applicable to lytic viruses, and cannot be applied to integrating or lysogenic viruses. Based on these measures, we followed a penalized maximum likelihood approach and a model of relaxed molecular clock <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>, and were able to estimate the starting point in time and evolutionary rate of clonal expansions.</p>
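         <p>The two measures can be made concrete with a short Python sketch. It assumes, for illustration only, that each site is summarized as a mapping from allele to frequency (summing to 1), that the seed genome is known, and that the MAF at a site is one minus the frequency of the most common allele; the function names and data layout are hypothetical.</p>
```python
def total_divergence(site_freqs, seed):
    """D_T: sum over sites of the frequency of alleles differing from the seed allele."""
    return sum(1.0 - freqs.get(ref, 0.0) for freqs, ref in zip(site_freqs, seed))


def sum_maf(site_freqs):
    """Sum over sites of the minor allele frequency (1 - frequency of the top allele).

    Sites with a single allele contribute zero, so only segregating
    sites matter.
    """
    return sum(1.0 - max(freqs.values()) for freqs in site_freqs)


# Toy example with three sites and seed genome "ACG".
sites = [
    {"A": 0.9, "G": 0.1},  # seed allele still dominant
    {"C": 0.3, "T": 0.7},  # diverging allele has overtaken the seed allele
    {"G": 1.0},            # not segregating
]
d_t = total_divergence(sites, "ACG")
maf = sum_maf(sites)
```
         <p>Because the most common allele at a site is at least as frequent as the seed allele, the sum of MAF computed this way can never exceed <it>D<sub>T</sub></it>, and the two coincide as long as the seed allele remains the majority allele at every site. Neither function uses any linkage between sites, which is why phasing information is not required.</p>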
         <p>To evaluate our method with well-characterized examples of clonal expansion, we calibrated it with a set of simulated sequences following a relaxed molecular clock model, and obtained estimates that capture the evolutionary parameters of the generating model. We found the estimates obtained from the sum of MAF to be a lower bound on those obtained from total divergence. To compare and validate our methodology against standard phylogenetic techniques, we used phased whole-genome sequence data from the 2009 influenza pandemic. When we limited the data to the H1N1 isolates collected within the first year after the start of the pandemic, our estimates of the mean evolutionary rate and the starting time of the expansion, as well as our finding of strong purifying selection, agreed with phylogenetic results <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B27">27</abbr></abbrgrp>.</p>
         <p>The novelty and most important application of our method lie in analyzing unphased temporal data, to which phylogenetic methods cannot be applied. During the course of an acute infection, the diversification of the viral population is not reflected in the consensus sequence, as most changes are minor, rare variants. To study viral intra-host diversity, we employed genomic data obtained from high-throughput ultra-deep sequencing of marburgvirus from four infected NHP, sampled at days 8 and 10 of the infection. The results showed consistent increases in viral diversity, and the estimated starting time of the intra-host expansion agreed with the experimental setup <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. MAF-based diversity measures for non-synonymous substitutions in three of the infected NHP provided very good approximations of <it>D<sub>T</sub></it>, which is especially important when the seed of a clonal expansion is not known (Figure <figr fid="F3">3</figr>). In particular, we found the estimated intra-host evolutionary rates for non-synonymous substitutions to be in a similar range to, but higher than, those reported from inter-host phylogenetic analysis <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. The results from combining the data from the four samples agreed with the individual analyses, and the ratio of non-synonymous to synonymous substitution rates indicated strong purifying selection similar to that reported for inter-host transmission of marburgvirus <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>.</p>
         <p>In three samples, the MAF-based and <it>D<sub>T</sub></it> diversity measures differed for synonymous substitutions because of increases in the frequency of a single allele (E142E) in the <it>L</it> gene. This allele increased from 6% in the seed stock to 62% (052803), 57% (505113), and 92% (C0507178) on day 8. The frequencies on day 10 were similar to those on day 8, except in one sample (052803), in which the frequency fell to 31%. In the remaining sample (0602167), the frequency of this allele was 23% on both day 8 and day 10, which did not affect MAF. Synonymous mutations have been shown to contribute to viral fitness in other viruses <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, and although this allele did not alter the coding of the L protein, the presence of a selection pressure that leads to increases in its frequency cannot be ruled out.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>As technology progresses, deep sequencing of temporal samples is becoming more readily available; however, because phasing information is missing, the application of standard phylogenetic methods to these data is limited. The measures of diversity defined in this manuscript present a distinct advantage over methods based on consensus sequences, specifically because of their power to analyze genomic diversification within days of an infection. This method is well suited to pinpointing the time of infection, estimating the evolutionary rate within a host, and identifying early markers of selection in the course of an acute infection.</p>
      </sec>
      <sec>
         <st>
            <p>List of abbreviations</p>
         </st>
         <p>NHP: non-human primates; MAF: minimal allele frequencies.</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>HK designed, developed, and validated the mathematical model; ZC analyzed the influenza dataset; JK analyzed the marburgvirus dataset; JC and VT contributed to the mathematical model; EN, TW, PI, and SB contributed to, and GP directed, the analysis of the high-throughput data; RR designed the study and directed research. All authors wrote and edited the manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors would like to thank A. Jacunski, D. Rosenbloom, J. Wang, K. Emmet, and O. Balaga for insightful discussions and comments on the manuscript.</p>
         </sec>
         <sec>
            <st>
               <p>Declaration</p>
            </st>
            <p>This work was funded by the Defense Threat Reduction Agency (DTRA) Project No. 1899628 and DTRA grant W81XWH-13-2-0029. The publication of this work was funded by a grant from the Geneva Foundation (HDTRA1-14-1-0016). The funders had no role in the design, collection, analysis, and interpretation of data.</p>
            <p>This article has been published as part of <it>BMC Genomics</it> Volume 15 Supplement 6, 2014: Proceedings of the Twelfth Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics. The full contents of the supplement are available online at <url>http://www.biomedcentral.com/bmcgenomics/supplements/15/S6</url>.</p>
         </sec>
      </ack>
      <refgrp><bibl id="B1"><title><p>Rapid evolution of RNA genomes</p></title><aug><au><snm>Holland</snm><fnm>J</fnm></au><au><snm>Spindler</snm><fnm>K</fnm></au><au><snm>Horodyski</snm><fnm>F</fnm></au><au><snm>Grabau</snm><fnm>E</fnm></au><au><snm>Nichol</snm><fnm>S</fnm></au><au><snm>VandePol</snm><fnm>S</fnm></au></aug><source>Science</source><pubdate>1982</pubdate><volume>215</volume><issue>4540</issue><fpage>1577</fpage><lpage>1585</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.7041255</pubid><pubid idtype="pmpid" link="fulltext">7041255</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Unifying the epidemiological and evolutionary dynamics of pathogens</p></title><aug><au><snm>Grenfell</snm><fnm>BT</fnm></au><au><snm>Pybus</snm><fnm>OG</fnm></au><au><snm>Gog</snm><fnm>JR</fnm></au><au><snm>Wood</snm><fnm>JL</fnm></au><au><snm>Daly</snm><fnm>JM</fnm></au><au><snm>Mumford</snm><fnm>JA</fnm></au><au><snm>Holmes</snm><fnm>EC</fnm></au></aug><source>Science</source><pubdate>2004</pubdate><volume>303</volume><issue>5656</issue><fpage>327</fpage><lpage>332</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1090727</pubid><pubid idtype="pmpid" link="fulltext">14726583</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>The causes and consequences of HIV evolution</p></title><aug><au><snm>Rambaut</snm><fnm>A</fnm></au><au><snm>Posada</snm><fnm>D</fnm></au><au><snm>Crandall</snm><fnm>KA</fnm></au><au><snm>Holmes</snm><fnm>EC</fnm></au></aug><source>Nature reviews Genetics</source><pubdate>2004</pubdate><volume>5</volume><issue>1</issue><fpage>52</fpage><lpage>61</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrg1246</pubid><pubid idtype="pmpid" link="fulltext">14708016</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Mutational and fitness landscapes of an RNA virus revealed through population 
sequencing</p></title><aug><au><snm>Acevedo</snm><fnm>A</fnm></au><au><snm>Brodsky</snm><fnm>L</fnm></au><au><snm>Andino</snm><fnm>R</fnm></au></aug><source>Nature</source><pubdate>2013</pubdate></bibl><bibl id="B5"><title><p>Identification and characterization of transmitted and early founder virus envelopes in primary HIV-1 infection</p></title><aug><au><snm>Keele</snm><fnm>BF</fnm></au><au><snm>Giorgi</snm><fnm>EE</fnm></au><au><snm>Salazar-Gonzalez</snm><fnm>JF</fnm></au><au><snm>Decker</snm><fnm>JM</fnm></au><au><snm>Pham</snm><fnm>KT</fnm></au><au><snm>Salazar</snm><fnm>MG</fnm></au><au><snm>Sun</snm><fnm>C</fnm></au><au><snm>Grayson</snm><fnm>T</fnm></au><au><snm>Wang</snm><fnm>S</fnm></au><au><snm>Li</snm><fnm>H</fnm></au><etal/></aug><source>Proceedings of the National Academy of Sciences of the United States of America</source><pubdate>2008</pubdate><volume>105</volume><issue>21</issue><fpage>7552</fpage><lpage>7557</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0802203105</pubid><pubid idtype="pmcid">2387184</pubid><pubid idtype="pmpid" link="fulltext">18490657</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>Quantifying the diversification of hepatitis C virus (HCV) during primary infection: estimates of the in vivo mutation rate</p></title><aug><au><snm>Ribeiro</snm><fnm>RM</fnm></au><au><snm>Li</snm><fnm>H</fnm></au><au><snm>Wang</snm><fnm>S</fnm></au><au><snm>Stoddard</snm><fnm>MB</fnm></au><au><snm>Learn</snm><fnm>GH</fnm></au><au><snm>Korber</snm><fnm>BT</fnm></au><au><snm>Bhattacharya</snm><fnm>T</fnm></au><au><snm>Guedj</snm><fnm>J</fnm></au><au><snm>Parrish</snm><fnm>EH</fnm></au><au><snm>Hahn</snm><fnm>BH</fnm></au><etal/></aug><source>PLoS pathogens</source><pubdate>2012</pubdate><volume>8</volume><issue>8</issue><fpage>e1002881</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.ppat.1002881</pubid><pubid idtype="pmcid">3426522</pubid><pubid idtype="pmpid" 
link="fulltext">22927817</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>BEAST: Bayesian evolutionary analysis by sampling trees</p></title><aug><au><snm>Drummond</snm><fnm>AJ</fnm></au><au><snm>Rambaut</snm><fnm>A</fnm></au></aug><source>BMC evolutionary biology</source><pubdate>2007</pubdate><volume>7</volume><fpage>214</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2148-7-214</pubid><pubid idtype="pmcid">2247476</pubid><pubid idtype="pmpid" link="fulltext">17996036</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Whole-genome characterization of human and simian immunodeficiency virus intrahost diversity by ultradeep pyrosequencing</p></title><aug><au><snm>Bimber</snm><fnm>BN</fnm></au><au><snm>Dudley</snm><fnm>DM</fnm></au><au><snm>Lauck</snm><fnm>M</fnm></au><au><snm>Becker</snm><fnm>EA</fnm></au><au><snm>Chin</snm><fnm>EN</fnm></au><au><snm>Lank</snm><fnm>SM</fnm></au><au><snm>Grunenwald</snm><fnm>HL</fnm></au><au><snm>Caruccio</snm><fnm>NC</fnm></au><au><snm>Maffitt</snm><fnm>M</fnm></au><au><snm>Wilson</snm><fnm>NA</fnm></au><etal/></aug><source>Journal of virology</source><pubdate>2010</pubdate><volume>84</volume><issue>22</issue><fpage>12087</fpage><lpage>12092</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/JVI.01378-10</pubid><pubid idtype="pmcid">2977871</pubid><pubid idtype="pmpid" link="fulltext">20844037</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>Hepatitis C virus transmission bottlenecks analyzed by deep sequencing</p></title><aug><au><snm>Wang</snm><fnm>GP</fnm></au><au><snm>Sherrill-Mix</snm><fnm>SA</fnm></au><au><snm>Chang</snm><fnm>KM</fnm></au><au><snm>Quince</snm><fnm>C</fnm></au><au><snm>Bushman</snm><fnm>FD</fnm></au></aug><source>Journal of virology</source><pubdate>2010</pubdate><volume>84</volume><issue>12</issue><fpage>6218</fpage><lpage>6228</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/JVI.02271-09</pubid><pubid idtype="pmcid">2876626</pubid><pubid idtype="pmpid" 
link="fulltext">20375170</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>Whole genome deep sequencing of HIV-1 reveals the impact of early minor variants upon immune recognition during acute infection</p></title><aug><au><snm>Henn</snm><fnm>MR</fnm></au><au><snm>Boutwell</snm><fnm>CL</fnm></au><au><snm>Charlebois</snm><fnm>P</fnm></au><au><snm>Lennon</snm><fnm>NJ</fnm></au><au><snm>Power</snm><fnm>KA</fnm></au><au><snm>Macalalad</snm><fnm>AR</fnm></au><au><snm>Berlin</snm><fnm>AM</fnm></au><au><snm>Malboeuf</snm><fnm>CM</fnm></au><au><snm>Ryan</snm><fnm>EM</fnm></au><au><snm>Gnerre</snm><fnm>S</fnm></au><etal/></aug><source>PLoS pathogens</source><pubdate>2012</pubdate><volume>8</volume><issue>3</issue><fpage>e1002529</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.ppat.1002529</pubid><pubid idtype="pmcid">3297584</pubid><pubid idtype="pmpid" link="fulltext">22412369</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>Application of next-generation sequencing technologies in virology</p></title><aug><au><snm>Radford</snm><fnm>AD</fnm></au><au><snm>Chapman</snm><fnm>D</fnm></au><au><snm>Dixon</snm><fnm>L</fnm></au><au><snm>Chantrey</snm><fnm>J</fnm></au><au><snm>Darby</snm><fnm>AC</fnm></au><au><snm>Hall</snm><fnm>N</fnm></au></aug><source>The Journal of general virology</source><pubdate>2012</pubdate><volume>93</volume><issue>Pt 9</issue><fpage>1853</fpage><lpage>1868</lpage><xrefbib><pubidlist><pubid idtype="pmcid">3709572</pubid><pubid idtype="pmpid" link="fulltext">22647373</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Clinical implications of HIV-1 minority variants</p></title><aug><au><snm>Li</snm><fnm>JZ</fnm></au><au><snm>Kuritzkes</snm><fnm>DR</fnm></au></aug><source>Clinical infectious diseases : an official publication of the Infectious Diseases Society of America</source><pubdate>2013</pubdate><volume>56</volume><issue>11</issue><fpage>1667</fpage><lpage>1674</lpage></bibl><bibl id="B13"><title><p>De novo assembly of 
highly diverse viral populations</p></title><aug><au><snm>Yang</snm><fnm>X</fnm></au><au><snm>Charlebois</snm><fnm>P</fnm></au><au><snm>Gnerre</snm><fnm>S</fnm></au><au><snm>Coole</snm><fnm>MG</fnm></au><au><snm>Lennon</snm><fnm>NJ</fnm></au><au><snm>Levin</snm><fnm>JZ</fnm></au><au><snm>Qu</snm><fnm>J</fnm></au><au><snm>Ryan</snm><fnm>EM</fnm></au><au><snm>Zody</snm><fnm>MC</fnm></au><au><snm>Henn</snm><fnm>MR</fnm></au></aug><source>BMC genomics</source><pubdate>2012</pubdate><volume>13</volume><fpage>475</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-13-475</pubid><pubid idtype="pmcid">3469330</pubid><pubid idtype="pmpid" link="fulltext">22974120</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic</p></title><aug><au><snm>Smith</snm><fnm>GJ</fnm></au><au><snm>Vijaykrishna</snm><fnm>D</fnm></au><au><snm>Bahl</snm><fnm>J</fnm></au><au><snm>Lycett</snm><fnm>SJ</fnm></au><au><snm>Worobey</snm><fnm>M</fnm></au><au><snm>Pybus</snm><fnm>OG</fnm></au><au><snm>Ma</snm><fnm>SK</fnm></au><au><snm>Cheung</snm><fnm>CL</fnm></au><au><snm>Raghwani</snm><fnm>J</fnm></au><au><snm>Bhatt</snm><fnm>S</fnm></au><etal/></aug><source>Nature</source><pubdate>2009</pubdate><volume>459</volume><issue>7250</issue><fpage>1122</fpage><lpage>1125</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature08182</pubid><pubid idtype="pmpid" link="fulltext">19516283</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>Evolutionary characterization of the pandemic H1N1/2009 influenza virus in humans based on non-structural 
genes</p></title><aug><au><snm>Wang</snm><fnm>C</fnm></au><au><snm>Zhang</snm><fnm>Y</fnm></au><au><snm>Wu</snm><fnm>B</fnm></au><au><snm>Liu</snm><fnm>S</fnm></au><au><snm>Xu</snm><fnm>P</fnm></au><au><snm>Lu</snm><fnm>Y</fnm></au><au><snm>Luo</snm><fnm>J</fnm></au><au><snm>Nolte</snm><fnm>DL</fnm></au><au><snm>Deliberto</snm><fnm>TJ</fnm></au><au><snm>Duan</snm><fnm>M</fnm></au><etal/></aug><source>PloS one</source><pubdate>2013</pubdate><volume>8</volume><issue>2</issue><fpage>e56201</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pone.0056201</pubid><pubid idtype="pmcid">3572024</pubid><pubid idtype="pmpid" link="fulltext">23418535</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>Evolution of Protein Molecules</p></title><aug><au><snm>Jukes</snm><fnm>TH</fnm></au><au><snm>Cantor</snm><fnm>CR</fnm></au></aug><publisher>New York: Academic Press</publisher><pubdate>1969</pubdate></bibl><bibl id="B17"><title><p>A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences</p></title><aug><au><snm>Kimura</snm><fnm>M</fnm></au></aug><source>J Mol Evol</source><pubdate>1980</pubdate><volume>16</volume><issue>2</issue><fpage>111</fpage><lpage>120</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/BF01731581</pubid><pubid idtype="pmpid">7463489</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences</p></title><aug><au><snm>Shapiro</snm><fnm>B</fnm></au><au><snm>Rambaut</snm><fnm>A</fnm></au><au><snm>Drummond</snm><fnm>AJ</fnm></au></aug><source>Molecular biology and evolution</source><pubdate>2006</pubdate><volume>23</volume><issue>1</issue><fpage>7</fpage><lpage>9</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">16177232</pubid></xrefbib></bibl><bibl id="B19"><title><p>Mathematical model for studying genetic variation in terms of restriction 
endonucleases</p></title><aug><au><snm>Nei</snm><fnm>M</fnm></au><au><snm>Li</snm><fnm>WH</fnm></au></aug><source>Proceedings of the National Academy of Sciences of the United States of America</source><pubdate>1979</pubdate><volume>76</volume><issue>10</issue><fpage>5269</fpage><lpage>5273</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.76.10.5269</pubid><pubid idtype="pmcid">413122</pubid><pubid idtype="pmpid">291943</pubid></pubidlist></xrefbib></bibl><bibl id="B20"><title><p>A nonparametric approach to estimating divergence times in the absence of rate constancy</p></title><aug><au><snm>Sanderson</snm><fnm>MJ</fnm></au></aug><source>Molecular biology and evolution</source><pubdate>1997</pubdate><volume>14</volume><issue>12</issue><fpage>1218</fpage><lpage>1231</lpage><xrefbib><pubid idtype="doi">10.1093/oxfordjournals.molbev.a025731</pubid></xrefbib></bibl><bibl id="B21"><title><p>Estimating absolute rates of molecular evolution and divergence times: a penalized likelihood approach</p></title><aug><au><snm>Sanderson</snm><fnm>MJ</fnm></au></aug><source>Molecular biology and evolution</source><pubdate>2002</pubdate><volume>19</volume><issue>1</issue><fpage>101</fpage><lpage>109</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/oxfordjournals.molbev.a003974</pubid><pubid idtype="pmpid" link="fulltext">11752195</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>A multiresolution weak-lensing mass reconstruction method</p></title><aug><au><snm>Khiabanian</snm><fnm>H</fnm></au><au><snm>Dell'Antonio</snm><fnm>IP</fnm></au></aug><source>Astrophys J</source><pubdate>2008</pubdate><volume>684</volume><issue>2</issue><fpage>794</fpage><lpage>803</lpage><xrefbib><pubid idtype="doi">10.1086/590232</pubid></xrefbib></bibl><bibl id="B23"><title><p>Procedures for Optimization Problems with a Mixture of Bounds and General Linear 
Constraints</p></title><aug><au><snm>Gill</snm><fnm>PE</fnm></au><au><snm>Murray</snm><fnm>W</fnm></au><au><snm>Saunders</snm><fnm>MA</fnm></au><au><snm>Wright</snm><fnm>MH</fnm></au></aug><source>Acm T Math Software</source><pubdate>1984</pubdate><volume>10</volume><issue>3</issue><fpage>282</fpage><lpage>298</lpage><xrefbib><pubid idtype="doi">10.1145/1271.1276</pubid></xrefbib></bibl><bibl id="B24"><title><p>The influenza virus resource at the National Center for Biotechnology Information</p></title><aug><au><snm>Bao</snm><fnm>Y</fnm></au><au><snm>Bolotov</snm><fnm>P</fnm></au><au><snm>Dernovoy</snm><fnm>D</fnm></au><au><snm>Kiryutin</snm><fnm>B</fnm></au><au><snm>Zaslavsky</snm><fnm>L</fnm></au><au><snm>Tatusova</snm><fnm>T</fnm></au><au><snm>Ostell</snm><fnm>J</fnm></au><au><snm>Lipman</snm><fnm>D</fnm></au></aug><source>Journal of virology</source><pubdate>2008</pubdate><volume>82</volume><issue>2</issue><fpage>596</fpage><lpage>601</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/JVI.02005-07</pubid><pubid idtype="pmcid">2224563</pubid><pubid idtype="pmpid" link="fulltext">17942553</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>A global initiative on sharing avian flu data</p></title><aug><au><snm>Bogner</snm><fnm>P</fnm></au><au><snm>Capua</snm><fnm>I</fnm></au><au><snm>Cox</snm><fnm>NJ</fnm></au><au><snm>Lipman</snm><fnm>DJ</fnm></au></aug><source>Nature</source><pubdate>2006</pubdate><volume>442</volume><issue>7106</issue><fpage>981</fpage><lpage>981</lpage></bibl><bibl id="B26"><title><p>Geographic dependence, surveillance, and origins of the 2009 influenza A (H1N1) virus</p></title><aug><au><snm>Trifonov</snm><fnm>V</fnm></au><au><snm>Khiabanian</snm><fnm>H</fnm></au><au><snm>Rabadan</snm><fnm>R</fnm></au></aug><source>The New England journal of medicine</source><pubdate>2009</pubdate><volume>361</volume><issue>2</issue><fpage>115</fpage><lpage>119</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1056/NEJMp0904572</pubid><pubid 
idtype="pmpid" link="fulltext">19474418</pubid></pubidlist></xrefbib></bibl><bibl id="B27"><title><p>Pandemic (H1N1) 2009 virus revisited: an evolutionary retrospective</p></title><aug><au><snm>Christman</snm><fnm>MC</fnm></au><au><snm>Kedwaii</snm><fnm>A</fnm></au><au><snm>Xu</snm><fnm>J</fnm></au><au><snm>Donis</snm><fnm>RO</fnm></au><au><snm>Lu</snm><fnm>G</fnm></au></aug><source>Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases</source><pubdate>2011</pubdate><volume>11</volume><issue>5</issue><fpage>803</fpage><lpage>811</lpage><xrefbib><pubid idtype="doi">10.1016/j.meegid.2011.02.021</pubid></xrefbib></bibl><bibl id="B28"><title><p>Reassortment of pandemic H1N1/2009 influenza A virus in swine</p></title><aug><au><snm>Vijaykrishna</snm><fnm>D</fnm></au><au><snm>Poon</snm><fnm>LL</fnm></au><au><snm>Zhu</snm><fnm>HC</fnm></au><au><snm>Ma</snm><fnm>SK</fnm></au><au><snm>Li</snm><fnm>OT</fnm></au><au><snm>Cheung</snm><fnm>CL</fnm></au><au><snm>Smith</snm><fnm>GJ</fnm></au><au><snm>Peiris</snm><fnm>JS</fnm></au><au><snm>Guan</snm><fnm>Y</fnm></au></aug><source>Science</source><pubdate>2010</pubdate><volume>328</volume><issue>5985</issue><fpage>1529</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1189132</pubid><pubid idtype="pmcid">3569847</pubid><pubid idtype="pmpid" link="fulltext">20558710</pubid></pubidlist></xrefbib></bibl><bibl id="B29"><title><p>Molecular evolution of viruses of the family Filoviridae based on 97 whole-genome sequences</p></title><aug><au><snm>Carroll</snm><fnm>SA</fnm></au><au><snm>Towner</snm><fnm>JS</fnm></au><au><snm>Sealy</snm><fnm>TK</fnm></au><au><snm>McMullan</snm><fnm>LK</fnm></au><au><snm>Khristova</snm><fnm>ML</fnm></au><au><snm>Burt</snm><fnm>FJ</fnm></au><au><snm>Swanepoel</snm><fnm>R</fnm></au><au><snm>Rollin</snm><fnm>PE</fnm></au><au><snm>Nichol</snm><fnm>ST</fnm></au></aug><source>Journal of 
virology</source><pubdate>2013</pubdate><volume>87</volume><issue>5</issue><fpage>2608</fpage><lpage>2616</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/JVI.03118-12</pubid><pubid idtype="pmcid">3571414</pubid><pubid idtype="pmpid" link="fulltext">23255795</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>Ebola virus genome plasticity as a marker of its passaging history: a comparison of in vitro passaging to non-human primate infection</p></title><aug><au><snm>Kugelman</snm><fnm>JR</fnm></au><au><snm>Lee</snm><fnm>MS</fnm></au><au><snm>Rossi</snm><fnm>CA</fnm></au><au><snm>McCarthy</snm><fnm>SE</fnm></au><au><snm>Radoshitzky</snm><fnm>SR</fnm></au><au><snm>Dye</snm><fnm>JM</fnm></au><au><snm>Hensley</snm><fnm>LE</fnm></au><au><snm>Honko</snm><fnm>A</fnm></au><au><snm>Kuhn</snm><fnm>JH</fnm></au><au><snm>Jahrling</snm><fnm>PB</fnm></au><etal/></aug><source>PloS one</source><pubdate>2012</pubdate><volume>7</volume><issue>11</issue><fpage>e50316</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pone.0050316</pubid><pubid idtype="pmcid">3509072</pubid><pubid idtype="pmpid" link="fulltext">23209706</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>Micro-scale signature of purifying selection in Marburg virus genomes</p></title><aug><au><snm>Hughes</snm><fnm>AL</fnm></au></aug><source>Gene</source><pubdate>2007</pubdate><volume>392</volume><issue>1-2</issue><fpage>266</fpage><lpage>272</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.gene.2006.12.038</pubid><pubid idtype="pmpid" link="fulltext">17306473</pubid></pubidlist></xrefbib></bibl></refgrp>
   </bm>
</art>