<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1745-6150-5-4</ui>
   <ji>1745-6150</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Including RNA secondary structures improves accuracy and robustness in reconstruction of phylogenetic trees</p>
         </title>
         <aug>
            <au id="A1" ce="yes">
               <snm>Keller</snm>
               <fnm>Alexander</fnm>
               <insr iid="I1"/>
               <email>a.keller@biozentrum.uni-wuerzburg.de</email>
            </au>
            <au id="A2" ce="yes">
               <snm>F&#246;rster</snm>
               <fnm>Frank</fnm>
               <insr iid="I1"/>
               <email>frank.foerster@biozentrum.uni-wuerzburg.de</email>
            </au>
            <au id="A3">
               <snm>M&#252;ller</snm>
               <fnm>Tobias</fnm>
               <insr iid="I1"/>
               <email>Tobias.Mueller@biozentrum.uni-wuerzburg.de</email>
            </au>
            <au id="A4">
               <snm>Dandekar</snm>
               <fnm>Thomas</fnm>
               <insr iid="I1"/>
               <email>dandekar@biozentrum.uni-wuerzburg.de</email>
            </au>
            <au ca="yes" id="A5">
               <snm>Schultz</snm>
               <fnm>J&#246;rg</fnm>
               <insr iid="I1"/>
               <email>Joerg.Schultz@biozentrum.uni-wuerzburg.de</email>
            </au>
            <au ca="yes" id="A6">
               <snm>Wolf</snm>
               <fnm>Matthias</fnm>
               <insr iid="I1"/>
               <email>matthias.wolf@biozentrum.uni-wuerzburg.de</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Bioinformatics, University of W&#252;rzburg, Am Hubland, 97074 W&#252;rzburg, Germany</p>
            </ins>
         </insg>
         <source>Biology Direct</source>
         <issn>1745-6150</issn>
         <pubdate>2010</pubdate>
         <volume>5</volume>
         <issue>1</issue>
         <fpage>4</fpage>
         <url>http://www.biology-direct.com/content/5/1/4</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">20078867</pubid>
               <pubid idtype="doi">10.1186/1745-6150-5-4</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>21</day>
               <month>12</month>
               <year>2009</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>15</day>
               <month>1</month>
               <year>2010</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>15</day>
               <month>1</month>
               <year>2010</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2010</year>
         <collab>Keller et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>In several studies, secondary structures of ribosomal genes have been used to improve the quality of phylogenetic reconstructions. An extensive evaluation of the benefits of secondary structure, however, is lacking.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>This is the first study to counter this deficiency. We inspected the accuracy and robustness of phylogenetics with individual secondary structures by simulation experiments for artificial tree topologies with up to 18 taxa and for divergence levels in the range of typical phylogenetic studies. We chose the internal transcribed spacer 2 of the ribosomal cistron as an exemplary marker region. The simulation integrated the coevolution process of sequences with secondary structures. Additionally, the phylogenetic power of marker size duplication was investigated and compared with sequence and sequence-structure reconstruction methods. The results clearly show that accuracy and robustness of Neighbor Joining trees are largely improved by structural information in contrast to sequence-only data, whereas a doubled marker size only improves robustness.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>Individual secondary structures of ribosomal RNA sequences provide a valuable gain of information content that is useful for phylogenetics. Thus, the usage of ITS2 sequence together with secondary structure for taxonomic inferences is recommended. Other reconstruction methods such as maximum likelihood, Bayesian inference or maximum parsimony may equally profit from the inclusion of secondary structure.</p>
            </sec>
            <sec>
               <st>
                  <p>Reviewers</p>
               </st>
               <p>This article was reviewed by Shamil Sunyaev, Andrea Tanzer (nominated by Frank Eisenhaber) and Eugene V. Koonin.</p>
            </sec>
            <sec>
               <st>
                  <p>Open peer review</p>
               </st>
               <p>Reviewed by Shamil Sunyaev, Andrea Tanzer (nominated by Frank Eisenhaber) and Eugene V. Koonin. For the full reviews, please go to the Reviewers' comments section.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>In the last decades, traditional morphological systematics has been augmented by novel molecular phylogenetics. One advantage of molecular data is the increased number of parsimony-informative characters obtained from genes that are usable for the inference of evolutionary relationships. This transition from few morphological features to abundant nucleotide or amino acid information has been a breakthrough for investigations of species relationships <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>.</p>
         <p>However, genetic data often carry ambiguous information about phylogenetic relationships. Especially for very closely or distantly related taxa, certain parts of data sets may contradict each other or carry insufficient information. Phylogeneticists counter such problems, for example, by increasing the size of the marker through the inclusion of more nucleotides, thus increasing the amount of available data <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Moreover, different markers are combined, so that for example nuclear or mitochondrial genes are concatenated to increase the power of phylogenetic inferences <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. These methods, however, face new problems. Increasing the number of nucleotides does not necessarily improve the accuracy of a tree reconstruction. Stochastically, only the robustness of the results is increased if the complete elongated sequence evolved under the same evolutionary constraints <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. The second method, marker concatenation, combines genes that result from different evolutionary processes and thus indeed include different evolutionary signals that may improve accuracy. However, they need to be investigated with marker-specific phylogenetic procedures such as varying substitution models <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>.</p>
         <p>In this study we evaluate an alternative method applicable to ribosomal RNA (rRNA) genes that increases information content without addition of nucleotides. As non-coding RNA fragments of the genome, rRNA genes are generally capable of folding into a secondary structure. In most cases, these structures are necessary for cell function and are thus evolutionarily conserved. Accordingly, structural information may be treated as a conserved marker. Secondary structures of ribosomal RNA therefore offer an additional source of information for tree reconstruction. This is a major advantage in particular where secondary structures are highly conserved, yet mutations of nucleotides occur frequently. This applies to the internal transcribed spacer 2 (ITS2) of the eukaryote ribosomal cistron <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. Its secondary structure is evolutionarily maintained as it is of importance in ribogenesis. By contrast, the evolutionary rate of its sequence is relatively high and it is not present in the mature ribosome.</p>
         <p>ITS2 sequences have been commonly used to infer phylogenies. Moreover, several studies already included secondary structures in their analyses either by morphometrical matrices or by sequence-structure alignments <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. All these studies agree that the resulting reconstructions are improved by the secondary structures. However, no study has investigated and evaluated this benefit in detail. Evaluations of phylogenetic procedures are typically performed by two different means: the most commonly applied confidence measure in phylogenetics is non-parametric bootstrapping. Bootstrap support values are a measure of robustness of the tree and allow identification of trees or parts of trees that are not unambiguously supported by the data <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. The second point of interest is accuracy, measured by the distance between the real and the reconstructed tree. As the 'real' biological tree of life is not available, a switch to sequence simulations along 'real' artificial trees is necessary <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. In this study we (1) simulate ITS2 sequences along evolutionary trees and (2) compare the results of tree reconstructions by sequence-only data and combined sequence-structure data. Additionally, (3) the benefit of structural data is compared with that of sequence elongation. Furthermore, (4) a small biological example of plant phylogeny is presented in which reconstructions that are based on either sequence-only or sequence-structure data are compared.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>The calculations took 80,000 processor hours in total on our 40-node network cluster. Each node comprised four Xeon 2.33 GHz cores. In total, the cluster used 448 GB of RAM.</p>
         <p>The shapes of bootstrap, Quartet distance and Robinson-Foulds distance distributions were similar for equidistant and variable distance trees. However, the branches of the trees for each underlying data set (sequence, sequence-structure and doubled sequence) received higher bootstrap support values and fewer false splits with constant branch lengths compared to variable distances, though differences were minimal (Figs. <figr fid="F1">1</figr>, <figr fid="F2">2</figr>, <figr fid="F3">3</figr> and <figr fid="F4">4</figr>). Only Quartet distances are shown, since they are congruent with the results of the Robinson-Foulds distance (Additional file <supplr sid="S1">1</supplr>). Additionally, we included a relative per-branch representation of accuracy, divided by the number of internal nodes, in Additional file <supplr sid="S1">1</supplr>.</p>
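         <p>As a rough illustration of how a split-based tree distance and a per-branch normalization can be computed, the following Python sketch derives the Robinson-Foulds distance from the non-trivial splits of two unrooted trees. The nested-tuple tree encoding, the function names and the choice to normalize by the total number of splits in both trees are assumptions made for this example; they are not the software or the exact normalization used in this study.</p>
         <p><![CDATA[
```python
def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Collect the non-trivial splits (bipartitions) of an unrooted tree.

    Each split is stored as the side that does not contain a fixed
    reference taxon, so the same bipartition always has the same form.
    """
    all_taxa = leaves(tree)
    reference = min(all_taxa)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        side = all_taxa - clade if reference in clade else clade
        # Keep only splits with at least two taxa on each side.
        if 2 <= len(side) <= len(all_taxa) - 2:
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


def normalized_rf(tree_a, tree_b):
    """Robinson-Foulds distance divided by the total number of splits."""
    total = len(splits(tree_a)) + len(splits(tree_b))
    return robinson_foulds(tree_a, tree_b) / total if total else 0.0


# Hypothetical five-taxon example: the reconstruction swaps taxa B and C.
reference_tree = ((("A", "B"), "C"), ("D", "E"))
reconstructed = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(reference_tree, reconstructed))
print(normalized_rf(reference_tree, reconstructed))
```
]]></p>
         <p>Storing every split relative to one reference taxon keeps the comparison independent of where the nested tuples happen to be rooted.</p>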
         <suppl id="S1">
            <title>
               <p>Additional file 1</p>
            </title>
            <text>
               <p><b>Normalized Quartet distance and Robinson-Foulds plots</b>. Similar to Figures <figr fid="F2">2</figr> and <figr fid="F4">4</figr>, but showing per-branch Quartet distances as a normalized measure, i.e. divided by the number of splits. Robinson-Foulds distances are given in absolute and normalized versions.</p>
            </text>
            <file name="1745-6150-5-4-S1.PDF">
               <p>Click here for file</p>
            </file>
         </suppl>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Bootstrap support values for equidistant trees</p>
            </caption>
            <text>
               <p><b>Bootstrap support values for equidistant trees</b>. All five ancestral sequences were combined for a given scenario. (a) Boxplot and solid splines are for 14 taxa scenarios of the three methods. Dashed lines and dotted lines are splines of ten and 18 taxa, respectively. (b) Direct comparison of the 14 taxa splines and medians of all three methods. Sample sizes are 7,000, 11,000 and 15,000 for each of the ten, 14 and 18 taxa scenarios, respectively. Splines show a decrease of robustness with increased number of taxa used and increased branch lengths. Secondary structure and doubled sequences show an improvement in robustness in contrast to normal sequence information.</p>
            </text>
            <graphic file="1745-6150-5-4-1"/>
         </fig>
         <fig id="F2">
            <title>
               <p>Figure 2</p>
            </title>
            <caption>
               <p>Quartet distance values for equidistant trees</p>
            </caption>
            <text>
               <p><b>Quartet distance values for equidistant trees</b>. All five ancestral sequences were combined for a given scenario. (a) Boxplot and solid splines are for 14 taxa scenarios of the three methods. Dashed lines and dotted lines are splines of ten and 18 taxa, respectively. (b) Direct comparison of the 14 taxa splines and medians of all three methods. The sample size of each scenario is 1,000. The accuracy of tree topologies decreases with more taxa and greater evolutionary distances between sequences. Trees calculated with secondary structures or doubled sequences show greater accuracy than those determined with normal sequences.</p>
            </text>
            <graphic file="1745-6150-5-4-2"/>
         </fig>
         <fig id="F3">
            <title>
               <p>Figure 3</p>
            </title>
            <caption>
               <p>Bootstrap support values for trees with variable branch lengths</p>
            </caption>
            <text>
               <p><b>Bootstrap support values for trees with variable branch lengths</b>. Subfigures are explained in Figure 1. Sample sizes are 7,000, 11,000 and 15,000 for each of the ten, 14 and 18 taxa scenarios, respectively.</p>
            </text>
            <graphic file="1745-6150-5-4-3"/>
         </fig>
         <fig id="F4">
            <title>
               <p>Figure 4</p>
            </title>
            <caption>
               <p>Quartet distance values for trees with variable branch lengths</p>
            </caption>
            <text>
               <p><b>Quartet distance values for trees with variable branch lengths</b>. Subfigures are explained in Figure 2. The sample size of each scenario is 1,000.</p>
            </text>
            <graphic file="1745-6150-5-4-4"/>
         </fig>
         <p>Bootstrap values and tree distances obtained by differing ancestor sequences were similar in their distributions and thus combined for each scenario during the analysis process. Naturally, with increasing branch lengths, all three investigated data sets (sequences, doubled sequences and sequence-structure) became less accurate and robust, i.e. Quartet distances increased and bootstrap support of nodes decreased. This effect was also observable with an increasing number of external nodes.</p>
         <p>Differences between the three methods also increased with evolutionary distance and number of taxa. Thus, the three methods (especially sequence-structure and doubled sequence) yielded nearly identical results with low divergence (e.g. branch length 0.05) and few taxa (e.g. 10 taxa), whereas the results differed with branch lengths above 0.25 and at least 14 taxa.</p>
         <p>For the lowest branch length we simulated, i.e. 0.025, accuracy and bootstrap support were lower than at medium divergences with all three methods. This can be explained by too few base changes to provide sufficient information for phylogenetic tree reconstruction.</p>
         <p>Sequence data performed best in reconstruction of trees (as the maximum and minimum of the spline-curves for bootstraps and tree distances, respectively) at a divergence level between 0.05 and 0.1. Sequence-structure shifted the optimal performance to higher divergences. This effect was also observable for doubled sequence; however, it was not as prominent as for sequence-structure.</p>
         <p>In general, the robustness of recalculated trees was highest for doubled sequence data. However, inclusion of secondary structures largely increased the bootstrap support values of nodes in contrast to normal sequence data. There is thus a robustness benefit to using secondary structure that is not directly comparable to benefits achieved by marker elongation.</p>
         <p>Additionally, the accuracy of the trees benefitted from secondary structures: the number of false splits was significantly reduced compared to sequence as well as doubled sequence data. Thus, sequence-structure data yielded the most accurate results in our comparisons.</p>
         <p>The trees reconstructed with sequence data and with sequence-structure data for the plant example differed considerably. Sequence-only information resulted in a correct topology reconstruction of genera (Fig. <figr fid="F5">5</figr>). However, the family of the Malvaceae could not be resolved. This supports the notion that the optimum divergence level of ITS2 sequences is at the species/genus level (see also Additional file <supplr sid="S2">2</supplr>). By contrast, all genera and families could be resolved with secondary structures. This results in a flawless tree topology and highlights the improved accuracy. Furthermore, the robustness of the tree was enhanced and the optimal divergence level was widened.</p>
         <suppl id="S2">
            <title>
               <p>Additional file 2</p>
            </title>
            <text>
               <p><b>Empirical pairwise distances</b>. Pairwise distances of an ITS2 case study that integrates secondary structure.</p>
            </text>
            <file name="1745-6150-5-4-S2.PDF">
               <p>Click here for file</p>
            </file>
         </suppl>
         <fig id="F5">
            <title>
               <p>Figure 5</p>
            </title>
            <caption>
               <p>Tree topology of the plants example</p>
            </caption>
            <text>
               <p><b>Tree topology of the plants example</b>. Left side: topology and bootstrap values of sequence-only data. Right side: corresponding tree with inclusion of secondary structure. Families of the species are given at the right end. GenBank identifiers are in parentheses after the species names.</p>
            </text>
            <graphic file="1745-6150-5-4-5"/>
         </fig>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <sec>
            <st>
               <p>Number of Taxa and Divergence</p>
            </st>
            <p>Based on the simulations, we draw several conclusions regarding phylogenetic tree reconstructions with and without secondary structures. First of all, the robustness of a tree and its accuracy were significantly negatively correlated with the number of taxa. This is the case even for normalized per-branch accuracy data (Additional file <supplr sid="S1">1</supplr>). Graybeal <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> argues that an increased taxon sampling enhances accuracy of a resolved tree in the 'Felsenstein zone'. We argue that such an enhancement is the case for special occurrences of long branch attraction, but not, according to our study, for general tree topologies. This is in accordance with Bremer et al. <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> as well as Rokas and Carroll <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, who also noted a slight decrease in accuracy with increased taxon sampling.</p>
            <p>Secondly, according to Yang <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, a gene has an optimum level of sequence divergence for phylogenetic studies. The upper limit is reached when the observed difference is saturated, whereas the lower boundary is a lack of information content caused by too few substitutions. We observed a similar pattern, so that we are able to estimate the divergence level of best performance for ITS2 sequences with and without secondary structures. However, these differ for sequence data and sequence-structure data in two ways: inclusion of secondary structures shifted the best performance to a higher level of divergence. Thus, organisms that are more distantly related can be included in phylogenies. Furthermore, the range of optimal performance is wider for sequence-structure data. A shift to more distantly related sequences does not necessarily mean that relationships of closely related taxa can no longer be resolved. In a review, Coleman <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> also identified this potential of ITS2 secondary structures by discussing several case studies. The small biological example of the Malvales and Sapindales in this study supports this notion. Our study mainly covers artificial data: a large-scale comparison with biological data regarding the extension of the performance span is still desirable.</p>
         </sec>
         <sec>
            <st>
               <p>Robustness and Accuracy</p>
            </st>
            <p>A substantial benefit to tree robustness was observable when including secondary structure information. Trees reconstructed with secondary structures are generally better bootstrap-supported by the data than those resulting from sequence-only data <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. This is caused by a gain of information content due to the increased number of states possible for each nucleotide (unpaired, paired). This information is extractable with a suitable combined score matrix as implemented in 4SALE <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> or, similarly, by site partitioning as in PHASE <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>.</p>
            <p>The major benefit we identified for phylogenetics is the improvement of accuracy. Sequence-structure data performed far better than sequences alone in matching the 'real' tree, especially for high divergences. The resulting profit for phylogeneticists is obvious: the most crucial property of a phylogenetic tree is to be as accurate as possible.</p>
         </sec>
         <sec>
            <st>
               <p>Secondary structure vs. Marker elongation</p>
            </st>
            <p>Both inclusion of secondary structures and increase of the number of nucleotides improved the reconstructed phylogenetic trees. However, inclusion of secondary structure in the reconstruction process is not equivalent to marker elongation. The major effect of more nucleotides is to increase the bootstrap support values. This has already been demonstrated by other authors <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B5">5</abbr></abbrgrp>. If the marker's length were theoretically increased towards infinity, the corresponding bootstrap values within a tree would stochastically be maximized, as they exactly represent the data. In contrast, the benefit of secondary structures is predominantly the improvement of a tree's accuracy. Thus, additional sequence elongation and secondary structures represent different types of information increase. As the secondary structure analysis already covers the whole marker region of the ITS2 sequence, sequence elongation is not possible for real biological data.</p>
            <p>The results obtained in this study for the ITS2 region may be transferred to other ribosomal genes. However, the combination of a conserved secondary structure with a variable sequence seems to be of major benefit in phylogenetic studies. Other ribosomal markers, such as the 5.8S or 28S rRNA genes, may profit less from addition of secondary structures than the ITS2, as the markers themselves are relatively conserved.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>Secondary structures of ribosomal RNA provide a valuable gain of information content that is useful for phylogenetics. Both the robustness and accuracy of tree reconstructions are improved. Furthermore, this enlarges the optimal range of divergence levels for taxonomic inferences with ITS2 sequences. Thus, the usage of ITS2 sequence together with secondary structure for taxonomic inferences is recommended <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. In theory, this pipeline is also applicable to other reconstruction methods such as maximum likelihood, Bayesian inference or maximum parsimony. They may equally profit from the inclusion of secondary structure.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Simulation of ITS2 Sequences</p>
            </st>
            <p>Simulations of ITS2 sequences were performed with SISSI v0.98 <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. Secondary structures were included in the simulation process of coevolution by application of two separate substitution models (Fig. <figr fid="F6">6</figr>, Additional file <supplr sid="S3">3</supplr>: Tab. 1 and Tab. 2): firstly we used a nucleotide 4 &#215; 4 GTR substitution model <it>Q</it><sub><it>seq </it></sub>for the evolution of unpaired nucleotides and secondly a dinucleotide 16 &#215; 16 GTR substitution model <it>Q</it><sub><it>struct </it></sub>for substitution of bases that form stem regions <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B27">27</abbr></abbrgrp>. <it>Q</it><sub><it>seq </it></sub>and <it>Q</it><sub><it>struct </it></sub>were both estimated from a manually verified alignment based on 500 individual ITS2 sequences and structures with a variant of the method described by M&#252;ller and Vingron <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. Owing to a lack of information about insertion and deletion events in the ITS2 region, such events were not included in the simulations.</p>
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <text>
                  <p><b>Substitution matrices</b>. Nucleotide 4 &#215; 4 GTR substitution model <it>Q</it><sub><it>seq </it></sub>for the evolution of unpaired nucleotides and a dinucleotide 16 &#215; 16 GTR substitution model <it>Q</it><sub><it>struct</it></sub>.</p>
               </text>
               <file name="1745-6150-5-4-S3.PDF">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Flowchart of simulation and phylogenetic reconstruction process</p>
               </caption>
               <text>
                  <p><b>Flowchart of simulation and phylogenetic reconstruction process</b>. Simulation of 2000 replicates of sequence sets was performed along the reference trees with an ancestral ITS2 sequence. Out of these, 1000 sequence-structure, sequence and concatenated sets were generated. Multiple sequence alignments were created for each of these sets and evolutionary distances were estimated with Profile Neighbor Joining. The resulting trees were then compared with the reference trees and evaluated with regard to their bootstrap support values. <it>Q</it><sub><it>seq </it></sub>and <it>Q</it><sub><it>struct </it></sub>are the substitution models for unpaired and paired regions, respectively.</p>
               </text>
               <graphic file="1745-6150-5-4-6"/>
            </fig>
            <p>Simulations were started given (a) an ancestral sequence and (b) a reference tree that contained (c) specific branch lengths and (d) a certain number of taxa. In total, we used 10 different branch lengths, 5 ancestral sequences and 6 different trees (3 topologies for equal and variable branch length), resulting in 300 different combinations that served as evolutionary scenarios. (a) Ancestral sequences and structures were taken from the ITS2 database after HMM annotation <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>. They represented a cross section of the Eukaryota, i.e. <it>Arabidopsis </it>(Plants) [GenBank:<ext-link ext-link-id="1245677" ext-link-type="gen">1245677</ext-link>], <it>Babesia </it>(Alveolata) [GenBank:<ext-link ext-link-id="119709754" ext-link-type="gen">119709754</ext-link>], <it>Gigaspora </it>(Fungi) [GenBank:<ext-link ext-link-id="3493494" ext-link-type="gen">3493494</ext-link>], <it>Gonium </it>(Green Algae) [GenBank:<ext-link ext-link-id="3192577" ext-link-type="gen">3192577</ext-link>] and <it>Haliotis </it>(Animals) [GenBank:<ext-link ext-link-id="15810877" ext-link-type="gen">15810877</ext-link>]. (b) The complete procedure was carried out for two trees that shared a similar topology (Fig. <figr fid="F7">7</figr>). Tree shapes were chosen to resemble trees of a previously published simulation study <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. The first was a tree with constant branch lengths, whereas in the second tree the branch lengths alternately varied by +/- 50% of a given branch length. (c) The branch lengths used were 0.025, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4 and 0.45. For comparison, pairwise distances of a typical phylogenetic study with ITS2 sequences have been added as Additional file <supplr sid="S2">2</supplr>. (d) Reference trees were calculated for 10, 14 and 18 taxa. The ancestral sequence served as the origin of the simulated sequences, but was not included in the reconstruction process and the resulting tree.</p>
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>Reference tree topologies used for simulation process</p>
               </caption>
               <text>
                  <p><b>Reference tree topologies used for simulation process</b>. Trees (a), (b) and (c) were trees with equidistance of branches. Trees (d), (e) and (f) were the corresponding variable trees with varying branch lengths. Trees (a) and (d) include ten taxa, (b) and (e) 14 taxa and (c) and (f) 18 taxa.</p>
               </text>
               <graphic file="1745-6150-5-4-7"/>
            </fig>
            <p>Each simulated sequence set contained sequences according to the number of taxa. Sequence sets were accepted as composed of ITS2-like sequences if the structure of each sequence could be determined by homology modeling with a threshold of 75% helix transfer <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. For homology modeling, the ancestral sequence served as a template. Thus, each structure had four helices with the third helix as the longest. This acceptance scheme was introduced for two reasons: the data are very similar to biological samples <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> and the structure prediction method is identical to that used by the ITS2 database <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> as well as in phylogenetic reconstructions <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. In total, 2,000 valid sequence sets were obtained for each scenario, which corresponds to 600,000 sequence sets summed over all scenarios.</p>
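            <p>The scenario counts given above can be checked with a short Python sketch. It enumerates the combinations of ancestral sequence, number of taxa, branch length mode and branch length setting. The variable names are illustrative assumptions, and the ten branch length settings are represented only by their index rather than by their values.</p>
            <p><![CDATA[
```python
from itertools import product

# Factors of the simulation design as described in the text.
ANCESTORS = ["Arabidopsis", "Babesia", "Gigaspora", "Gonium", "Haliotis"]
TAXA_COUNTS = [10, 14, 18]
BRANCH_MODES = ["equal", "variable"]
N_BRANCH_LENGTHS = 10      # ten branch length settings, indexed 0..9
SETS_PER_SCENARIO = 2000   # valid sequence sets kept per scenario

# One scenario per combination of the four factors.
scenarios = list(
    product(ANCESTORS, TAXA_COUNTS, BRANCH_MODES, range(N_BRANCH_LENGTHS))
)
total_sets = len(scenarios) * SETS_PER_SCENARIO

print(len(scenarios), total_sets)
```
]]></p>
            <p>Three taxon counts combined with two branch length modes give the six reference trees, so the grid has 5 &#215; 6 &#215; 10 entries.</p>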
            <p>The complete sequence set is downloadable at the Supplements section of the ITS2 Database <url>http://its2.bioapps.biozentrum.uni-wuerzburg.de/</url>.</p>
         </sec>
         <sec>
            <st>
               <p>Sequences and Structures of the Data Sets</p>
            </st>
            <p>Sequence data set: for each scenario, the order of the 2,000 simulated sequence sets obtained from SISSI was shuffled. The first 1,000 were chosen and used as the sequence data set.</p>
            <p>Sequence-structure data set: for each of the sequence sets used in the sequence data set, we determined the individual secondary structure of each sequence by homology modeling with at least 75% helix transfer <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. The ancestral sequence was used as a template. Thus, for the sequence-structure data set we combined sequences with their respective secondary structures according to Seibel et al. <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. Note that this approach, which uses individual secondary structures, contrasts with alignments that are only guided by a consensus structure.</p>
            <p>Doubled nucleotide data set: the remaining 1,000 simulated sequence sets were used to exemplify the effects on phylogenetic analyses of a hypothetical doubling of the ITS2 gene size. Each sequence of these sets was concatenated with the corresponding sequence of the sequence data set (same taxon in the simulation trees). This yielded a data set of doubled nucleotide content that likewise comprises 1,000 sequence sets.</p>
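            <p>The construction of the sequence data set and the doubled nucleotide data set can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: each sequence set is modelled as a dictionary mapping a taxon label to its sequence, the function name is hypothetical, and both the positional pairing of sets and the order of concatenation are assumptions, since the text only specifies that sequences of the same taxon were concatenated.</p>
            <p>
```python
import random


def build_data_sets(sequence_sets, seed=0):
    """Split shuffled sets into a sequence data set and a doubled data set."""
    shuffled = list(sequence_sets)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    sequence_data = shuffled[:half]   # first half: sequence data set
    remaining = shuffled[half:]       # second half: used for doubling
    # Assumption: sets are paired by position and concatenated per taxon.
    doubled = [
        {taxon: first[taxon] + second[taxon] for taxon in first}
        for first, second in zip(sequence_data, remaining)
    ]
    return sequence_data, doubled


# Toy input: 4 sets with 2 taxa each instead of 2,000 simulated sets.
toy_sets = [{"t1": "AC" * (i + 1), "t2": "GU" * (i + 1)} for i in range(4)]
seq_data, doubled = build_data_sets(toy_sets)
print(len(seq_data), len(doubled))
```
            </p>
            <p>Applied to the 2,000 sets of one scenario, the same split gives 1,000 sets for the sequence data set and 1,000 sets of doubled length.</p>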
         </sec>
         <sec>
            <st>
               <p>Reconstruction of Simulated Phylogenetic Trees</p>
            </st>
            <p>For each simulated sequence set, ClustalW v2.0.10 <abbrgrp><abbr bid="B34">34</abbr></abbrgrp> was used for calculation of multiple sequence alignments. In the cases of sequences and doubled sequences we used an ITS2 specific 4 &#215; 4 scoring matrix <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>. For secondary structures, we translated sequence-structure information prior to alignment into pseudoproteins as described for 4SALE v1.5 <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B35">35</abbr></abbrgrp>. Pseudoproteins were coded such that each of the four nucleotides may be present in three different states: unpaired, opening base-pair and closing base-pair. Thus, an ITS2 specific 12 &#215; 12 scoring matrix was used for calculation of the alignment <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>.</p>
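            <p>The translation into pseudoproteins can be illustrated with a short sketch. Four nucleotides in three structural states give a 12-letter alphabet, which is why a 12 &#215; 12 scoring matrix is required. The letters assigned below are a hypothetical mapping chosen for illustration; the actual letters used by 4SALE may differ. Structures are assumed to be given in dot-bracket notation.</p>
            <p>
```python
from itertools import product

NUCLEOTIDES = "ACGU"
STATES = ".()"  # unpaired, opening base-pair, closing base-pair

# Hypothetical 12-letter alphabet; the letters used by 4SALE may differ.
LETTERS = "ABCDEFGHIKLM"
ENCODING = dict(zip(product(NUCLEOTIDES, STATES), LETTERS))


def to_pseudoprotein(sequence, structure):
    """Encode each (nucleotide, structural state) pair as a single letter."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    rna = sequence.upper().replace("T", "U")
    return "".join(ENCODING[(nt, st)] for nt, st in zip(rna, structure))


print(len(ENCODING))                             # 12
print(to_pseudoprotein("GCAUGC", "((..))"))      # HEAKIF
```
            </p>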
            <p>Phylogenetic trees for all data sets were reconstructed with Profile Neighbor Joining (PNJ) as implemented in a console version of ProfDistS 0.9.8 <abbrgrp><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr></abbrgrp>. In this way we estimated improvements due to secondary structures while keeping the method of reconstruction constant. We decided in favor of PNJ and against other methods such as maximum likelihood, Bayesian inference and parsimony for several reasons: the distance matrices are independent of insertion and deletion events, the algorithm is very fast, and a pipeline for reconstructions with PNJ using secondary structures has already been published <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. However, beneficial effects may be transferable to these methods. Profile building was allowed with default settings. General time reversible models (GTRs) were applied with the corresponding 4 &#215; 4 and 12 &#215; 12 substitution matrices for sequences and sequence-structures, respectively.</p>
         </sec>
         <sec>
            <st>
               <p>Robustness and Accuracy</p>
            </st>
            <p>Profile Neighbor Joining trees were bootstrapped with 100 pseudo-replicates to obtain information about the stability of the resulting tree. Bootstrap support values of all tree branches obtained from the 1,000 sequence sets of a given scenario were extracted and pooled. Furthermore, the resulting trees were compared to the respective reference tree. For this purpose, two tree distance measures were applied: Robinson-Foulds distances using the Phylip Package v3.68 <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> and Quartet distances using Qdist v1.0.6 <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. Results of all sequence sets were combined for a given scenario to obtain the distributions of bootstrap values, Quartet distances and Robinson-Foulds distances, respectively. The result of each 14-taxa scenario was plotted as a notched boxplot using R v2.9.0 <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>, and an interpolating spline curve was added. For the remaining scenarios (10 and 18 taxa), only spline curves were plotted for the sake of clarity.</p>
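            <p>To make the accuracy measure concrete, the following sketch computes a Robinson-Foulds distance as the number of bipartitions found in only one of two unrooted trees. It is a simplified illustration and not the Phylip implementation used in the study: trees are assumed to be given directly as lists of their non-trivial bipartitions, and the function names are hypothetical.</p>
            <p>
```python
def normalize(split, taxa):
    """Represent a bipartition by the side that lacks the reference taxon."""
    side = frozenset(split)
    reference = min(taxa)
    return frozenset(taxa) - side if reference in side else side


def robinson_foulds(splits_a, splits_b, taxa):
    """Count bipartitions that occur in exactly one of the two trees."""
    a = {normalize(s, taxa) for s in splits_a}
    b = {normalize(s, taxa) for s in splits_b}
    return len(a ^ b)


taxa = ["A", "B", "C", "D", "E"]
reference_tree = [{"A", "B"}, {"D", "E"}]   # ((A,B),C,(D,E))
reconstructed = [{"A", "C"}, {"D", "E"}]    # ((A,C),B,(D,E))

print(robinson_foulds(reference_tree, reconstructed, taxa))   # 2
print(robinson_foulds(reference_tree, reference_tree, taxa))  # 0
```
            </p>
            <p>A distance of 0 means that the reconstructed tree recovers every bipartition of the reference tree; pooling these values over the 1,000 sequence sets of a scenario gives the distributions described above.</p>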
         </sec>
         <sec>
            <st>
               <p>Short biological case study</p>
            </st>
            <p>Here we provide a short example of ITS2 secondary structure phylogeny applied to biological data. We sampled sequences of three plant families using the browse feature of the ITS2 database (accessed June 2009): Thymelaeaceae (Malvales), Malvaceae (Malvales) and Sapindaceae (Sapindales). For each family we chose two sequences of the first two genera listed. Tree reconstruction followed the methods described by Schultz and Wolf <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> and is equivalent to the reconstruction procedure used for the simulated sequence sets. For comparison, the same procedure was also applied without secondary structure information.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>AK, JS, MW and TD designed the study. FF and AK performed the simulation experiments and analyses. FF and TM estimated the substitution models used for simulations and reconstructions. AK, FF and MW drafted the manuscript. All authors contributed to writing the paper, read the final manuscript and approved it.</p>
      </sec>
      <sec>
         <st>
            <p>Reviewers' comments</p>
         </st>
         <sec>
            <st>
               <p>Reviewer's report 1</p>
            </st>
            <p>
               <it>Shamil Sunyaev, Division of Genetics, Dept. of Medicine, Brigham &amp; Women's Hospital and Harvard Medical School</it>
            </p>
            <p>This manuscript demonstrates the utility of taking into account secondary structure in the phylogenetic analysis. Using comprehensive simulations and a real dataset of ITS2 sequences the authors demonstrated that for higher sequence divergence trees constructed with the help of secondary structure information improve accuracy and robustness. Another interesting result is that addition of taxa may reduce accuracy of tree reconstruction at least in terms of quartet distance between reconstructed and true trees.</p>
            <sec>
               <st>
                  <p>Author's response</p>
               </st>
               <p>Thanks a lot for this positive report!</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Reviewer's report 2</p>
            </st>
            <p>
               <it>Andrea Tanzer, Institute for Theoretical Chemistry, University of Vienna (nominated by Frank Eisenhaber, Bioinformatics Institute (BII) Agency for Science, Technology and Research, Singapore)</it>
            </p>
            <p>General comments:</p>
            <p>The manuscript "Ribosomal Secondary Structures improve Accuracy and Robustness in Reconstruction of Phylogenetic Trees" compares different methods to improve the quality of phylogenetic analysis. RNA secondary structure information has been included in a variety of previous phylogenetic analyses, but this is the first study exploring the effect on the resulting trees in detail.</p>
            <p>The authors use internal transcribed spacer 2 of ribosomal RNAs, a well established set of markers, to simulate a broad spectrum of 300 different scenarios. In addition, they compare their results from the simulations to a set of biological examples from selected plant species.</p>
            <p>Overall, the manuscript is carefully written and the authors chose the analyses and methods appropriately. The simulated sequence set could be used for future studies.</p>
            <p>Minor comments:</p>
            <p>*) The title might be a little bit misleading since 'Ribosomal Secondary Structures' do not improve the 'Accuracy and Robustness in Reconstruction of Phylogenetic Trees' in general and the method should be applicable to other RNA markers. Therefore, I suggest something like "Including Secondary Structures improve Accuracy and Robustness in Reconstruction of Phylogenetic Trees".</p>
            <p>*) The setup for the simulations is quite complex. It might help the reader if you add a table or figure to the supplemental material that summarizes the individual conditions for each data set produced.</p>
            <p>Alternatively, you could just add to the text that you use 10 different branch lengths, 5 ancestral sequences and 6 different trees (3 topologies for equal and variable branch length) resulting in 300 different conditions. If I understand this correctly, then you retrieved for each of these 300 conditions 2,000 sequence sets (a total of 600,000 sets), where each set contains 10, 14 or 18 taxa, respectively, depending on the tree topology used. These numbers should be mentioned in the text.</p>
            <p>*) The set of simulated sequences should be accessible, such that it can be downloaded and used by the community for further studies. Maybe put a link on the website of the ITS2 database.</p>
            <p>*) Predicting secondary structures of single sequences occasionally results in (mfe) structures of unexpected shapes. One way to get around this problem is the calculation of consensus structures of a set of related sequences. The resulting consensus structures can then be used for constraint folding of those sequences that could not be folded correctly in the first place. Furthermore, the sequences might fold into a number of equally good structures, but folding programs present only the first result (under default settings). The 'true' structure could as well be among the best folds, but not necessarily the optimal one (suboptimal folding). After all, folding algorithms only make the most plausible predictions. In this study, prediction of RNA secondary structures includes homology modelling. It is questionable whether this is the most efficient method. However, since the structures deposited at the ITS2 database were created that way, it seems legitimate to apply it here as well.</p>
            <sec>
               <st>
                  <p>Author's response</p>
               </st>
               <p>Thank you for carefully reading the manuscript. We addressed the minor comments regarding text changes and included the necessary information within the text. The set of simulated sequences can now be downloaded from the Supplements section of the ITS2 Database <url>http://its2.bioapps.biozentrum.uni-wuerzburg.de/</url>. We totally agree that there are other, possibly more efficient, methods for structure prediction. However, as already stated by Dr. Tanzer, 'structures deposited at the ITS2 database were created that way [homology modelling], it seems legitimate to apply it here as well'. The big advantage of the ITS2 is that the core folding pattern is already known. Therefore, we have an external criterion to check the correctness of the predicted structures.</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Reviewer's report 3</p>
            </st>
            <p>
               <it>Eugene V. Koonin, National Center for Biotechnology Information, NIH, Bethesda</it>
            </p>
            <p>This is a useful method evaluation work that shows quite convincingly that the inclusion of RNA secondary structure information into phylogenetic analysis improves the accuracy of neighbor-joining trees. My only regrets are about a certain lack of generality. It would be helpful to see a similar demonstration for at least two different kinds of nucleic acid sequences, not only ITS2. Also, at the end of the Conclusion section, the authors suggest that secondary structure could help also with other phylogenetic approaches (ML etc). Showing this explicitly would be helpful, especially given that NJ is hardly the method of choice in today's phylogenetics.</p>
            <sec>
               <st>
                  <p>Author's response</p>
               </st>
               <p>Thank you for your encouraging report. For ITS2 the core structure is well known and there are about 200,000 individual secondary structures available. However, it is absolutely right that it would be helpful to perform an analysis also on other types of phylogenetic RNA markers. Unfortunately, today there is no comparable amount of data available concerning secondary structures of other RNAs. Similarly, there are no programs to run an analysis with other methods, such as parsimony, maximum likelihood and/or Bayesian methods, that simultaneously consider sequence and secondary structure information.</p>
            </sec>
         </sec>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The assistance of Richard Copley (Oxford, United Kingdom) in language correction is greatly appreciated. Financial support for this study was provided by the Deutsche Forschungsgemeinschaft (DFG) grant (Mu-2831/1-1). AK was supported by the BIGSS graduate school of the land Bavaria. FF was supported by the Bundesministerium f&#252;r Bildung und Forschung (BMBF) grant FUNCRYPTA.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya</p>
            </title>
            <aug>
               <au>
                  <snm>Woese</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kandler</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Wheelis</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1990</pubdate>
            <volume>87</volume>
            <issue>12</issue>
            <fpage>4576</fpage>
            <lpage>4579</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.87.12.4576</pubid>
                  <pubid idtype="pmcid">54159</pubid>
                  <pubid idtype="pmpid">2112744</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>More characters or more taxa for a robust phylogeny-case study from the Coffee family (Rubiaceae)</p>
            </title>
            <aug>
               <au>
                  <snm>Bremer</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Jansen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Oxelman</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Backlund</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lantz</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>KJ</fnm>
               </au>
            </aug>
            <source>Syst Biol</source>
            <pubdate>1999</pubdate>
            <volume>48</volume>
            <issue>3</issue>
            <fpage>413</fpage>
            <lpage>435</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1080/106351599260085</pubid>
                  <pubid idtype="pmpid">12066290</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>The evolutionary history of the coral genus <it>Acropora </it>(Scleractinia, Cnidaria) based on a mitochondrial and a nuclear marker: reticulation, incomplete lineage sorting, or morphological convergence?</p>
            </title>
            <aug>
               <au>
                  <snm>van Oppen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>McDonald</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Willis</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2001</pubdate>
            <volume>18</volume>
            <issue>7</issue>
            <fpage>1315</fpage>
            <lpage>1329</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11420370</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Snake phylogeny: evidence from nuclear and mitochondrial genes</p>
            </title>
            <aug>
               <au>
                  <snm>Slowinski</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lawson</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Mol Phylogenet Evol</source>
            <pubdate>2002</pubdate>
            <volume>24</volume>
            <issue>2</issue>
            <fpage>194</fpage>
            <lpage>202</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1055-7903(02)00239-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">12144756</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Reliability of Bayesian posterior probabilities and bootstrap frequencies in phylogenetics</p>
            </title>
            <aug>
               <au>
                  <snm>Erixon</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Svennblad</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Britton</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Oxelman</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Syst Biol</source>
            <pubdate>2003</pubdate>
            <volume>52</volume>
            <issue>5</issue>
            <fpage>665</fpage>
            <lpage>673</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1080/10635150390235485</pubid>
                  <pubid idtype="pmpid">14530133</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Molecular phylogenetics: state-of-the-art methods for looking into the past</p>
            </title>
            <aug>
               <au>
                  <snm>Whelan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Li&#242;</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Goldman</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <issue>5</issue>
            <fpage>262</fpage>
            <lpage>272</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(01)02272-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">11335036</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>The effect of recombination on the accuracy of phylogeny estimation</p>
            </title>
            <aug>
               <au>
                  <snm>Posada</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Crandall</snm>
                  <fnm>KA</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>2002</pubdate>
            <volume>54</volume>
            <issue>3</issue>
            <fpage>396</fpage>
            <lpage>402</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11847565</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Nuclear and mitochondrial data reveal different evolutionary processes in the Lake Tanganyika cichlid genus <it>Tropheus</it></p>
            </title>
            <aug>
               <au>
                  <snm>Egger</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Koblm&#252;ller</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sturmbauer</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Sefc</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>BMC Evol Biol</source>
            <pubdate>2007</pubdate>
            <volume>7</volume>
            <fpage>137</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2148-7-137</pubid>
                  <pubid idtype="pmcid">2000897</pubid>
                  <pubid idtype="pmpid" link="fulltext">17697335</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>ITS2 is a double-edged tool for eukaryote evolutionary comparisons</p>
            </title>
            <aug>
               <au>
                  <snm>Coleman</snm>
                  <fnm>AW</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>7</issue>
            <fpage>370</fpage>
            <lpage>375</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12850441</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Pan-eukaryote ITS2 homologies revealed by RNA secondary structure</p>
            </title>
            <aug>
               <au>
                  <snm>Coleman</snm>
                  <fnm>AW</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <issue>10</issue>
            <fpage>3322</fpage>
            <lpage>3329</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkm233</pubid>
                  <pubid idtype="pmcid">1904279</pubid>
                  <pubid idtype="pmpid" link="fulltext">17459886</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>A stochastic model for the evolution of autocorrelated DNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Sch&#246;niger</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>von Haeseler</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Mol Phylogenet Evol</source>
            <pubdate>1994</pubdate>
            <volume>3</volume>
            <issue>3</issue>
            <fpage>240</fpage>
            <lpage>247</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/mpev.1994.1026</pubid>
                  <pubid idtype="pmpid" link="fulltext">7529616</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>High apparent rate of simultaneous compensatory base-pair substitutions in ribosomal RNA</p>
            </title>
            <aug>
               <au>
                  <snm>Tillier</snm>
                  <fnm>ERM</fnm>
               </au>
               <au>
                  <snm>Collins</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1998</pubdate>
            <volume>148</volume>
            <issue>4</issue>
            <fpage>1993</fpage>
            <lpage>2002</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1460107</pubid>
                  <pubid idtype="pmpid">9560412</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>The advantages of the ITS2 region of the nuclear rDNA cistron for analysis of phylogenetic relationships of insects: a <it>Drosophila </it>example</p>
            </title>
            <aug>
               <au>
                  <snm>Young</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Coleman</snm>
                  <fnm>AW</fnm>
               </au>
            </aug>
            <source>Mol Phylogenet Evol</source>
            <pubdate>2004</pubdate>
            <volume>30</volume>
            <fpage>236</fpage>
            <lpage>242</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1055-7903(03)00178-7</pubid>
                  <pubid idtype="pmpid">15022773</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Structural partitioning, paired-sites models and evolution of the ITS transcript in <it>Syzygium </it>and Myrtaceae</p>
            </title>
            <aug>
               <au>
                  <snm>Biffin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Harrington</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Crisp</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Craven</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Gadek</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Mol Phylogenet Evol</source>
            <pubdate>2007</pubdate>
            <volume>43</volume>
            <fpage>124</fpage>
            <lpage>139</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ympev.2006.08.013</pubid>
                  <pubid idtype="pmpid" link="fulltext">17070713</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Phylogenetic reconstruction using secondary structures of Internal Transcribed Spacer 2 (ITS2, rDNA): finding the molecular and morphological gap in Caribbean gorgonian corals</p>
            </title>
            <aug>
               <au>
                  <snm>Grajales</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Aguilar</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Sanchez</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>BMC Evol Biol</source>
            <pubdate>2007</pubdate>
            <volume>7</volume>
            <fpage>90</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2148-7-90</pubid>
                  <pubid idtype="pmcid">1913914</pubid>
                  <pubid idtype="pmpid" link="fulltext">17562014</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>ITS2 data corroborate a monophyletic chlorophycean DO-group (Sphaeropleales)</p>
            </title>
            <aug>
               <au>
                  <snm>Keller</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Schleicher</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>F&#246;rster</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Ruderisch</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Dandekar</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>M&#252;ller</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>BMC Evol Biol</source>
            <pubdate>2008</pubdate>
            <volume>8</volume>
            <fpage>218</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2148-8-218</pubid>
                  <pubid idtype="pmcid">2519086</pubid>
                  <pubid idtype="pmpid" link="fulltext">18655698</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Confidence limits on phylogenies: an approach using the bootstrap</p>
            </title>
            <aug>
               <au>
                  <snm>Felsenstein</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Evolution</source>
            <pubdate>1985</pubdate>
            <volume>39</volume>
            <issue>4</issue>
            <fpage>783</fpage>
            <lpage>791</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2307/2408678</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Hillis</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bull</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Syst Biol</source>
            <pubdate>1993</pubdate>
            <volume>42</volume>
            <issue>2</issue>
            <fpage>182</fpage>
            <lpage>192</lpage>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Application and accuracy of molecular phylogenies</p>
            </title>
            <aug>
               <au>
                  <snm>Hillis</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Huelsenbeck</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Cunningham</snm>
                  <fnm>CW</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1994</pubdate>
            <volume>264</volume>
            <issue>5159</issue>
            <fpage>671</fpage>
            <lpage>677</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.8171318</pubid>
                  <pubid idtype="pmpid" link="fulltext">8171318</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Is it better to add taxa or characters to a difficult phylogenetic problem?</p>
            </title>
            <aug>
               <au>
                  <snm>Graybeal</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Syst Biol</source>
            <pubdate>1998</pubdate>
            <volume>47</volume>
            <fpage>9</fpage>
            <lpage>17</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1080/106351598260996</pubid>
                  <pubid idtype="pmpid">12064243</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>More genes or more taxa? The relative contribution of gene number and taxon number to phylogenetic accuracy</p>
            </title>
            <aug>
               <au>
                  <snm>Rokas</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Carroll</snm>
                  <fnm>SB</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2005</pubdate>
            <volume>22</volume>
            <issue>5</issue>
            <fpage>1337</fpage>
            <lpage>1344</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msi121</pubid>
                  <pubid idtype="pmpid" link="fulltext">15746014</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>On the best evolutionary rate for phylogenetic analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Syst Biol</source>
            <pubdate>1998</pubdate>
            <volume>47</volume>
            <fpage>125</fpage>
            <lpage>133</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1080/106351598261067</pubid>
                  <pubid idtype="pmpid">12064232</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>4SALE - a tool for synchronous RNA sequence and secondary structure alignment and editing</p>
            </title>
            <aug>
               <au>
                  <snm>Seibel</snm>
                  <fnm>PN</fnm>
               </au>
               <au>
                  <snm>M&#252;ller</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Dandekar</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Schultz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>498</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-7-498</pubid>
                  <pubid idtype="pmcid">1637121</pubid>
                  <pubid idtype="pmpid" link="fulltext">17101042</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Bayesian phylogenetics using an RNA substitution model applied to early mammalian evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Jow</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hudelot</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Rattray</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Higgs</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2002</pubdate>
            <volume>19</volume>
            <issue>9</issue>
            <fpage>1591</fpage>
            <lpage>1601</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12200486</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>ITS2 sequence-structure analysis in phylogenetics: a how-to manual for molecular systematics</p>
            </title>
            <aug>
               <au>
                  <snm>Schultz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Mol Phylogenet Evol</source>
            <pubdate>2009</pubdate>
            <volume>52</volume>
            <fpage>520</fpage>
            <lpage>523</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ympev.2009.01.008</pubid>
                  <pubid idtype="pmpid">19489124</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>In silico sequence evolution with site-specific interactions along phylogenetic trees</p>
            </title>
            <aug>
               <au>
                  <snm>Gesell</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>von Haeseler</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>6</issue>
            <fpage>716</fpage>
            <lpage>722</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti812</pubid>
                  <pubid idtype="pmpid" link="fulltext">16332711</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Identifying site-specific substitution rates</p>
            </title>
            <aug>
               <au>
                  <snm>Meyer</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>von Haeseler</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2003</pubdate>
            <volume>20</volume>
            <issue>2</issue>
            <fpage>182</fpage>
            <lpage>189</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msg019</pubid>
                  <pubid idtype="pmpid" link="fulltext">12598684</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Modeling amino acid replacement</p>
            </title>
            <aug>
               <au>
                  <snm>M&#252;ller</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Vingron</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2000</pubdate>
            <volume>7</volume>
            <issue>6</issue>
            <fpage>761</fpage>
            <lpage>776</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1089/10665270050514918</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>The internal transcribed spacer 2 database - a web server for (not only) low level phylogenetic analyses</p>
            </title>
            <aug>
               <au>
                  <snm>Schultz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>M&#252;ller</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Achtziger</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Seibel</snm>
                  <fnm>PN</fnm>
               </au>
               <au>
                  <snm>Dandekar</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <issue>Supp 2</issue>
            <fpage>W704</fpage>
            <lpage>W707</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkl129</pubid>
                  <pubid idtype="pmcid">1538906</pubid>
                  <pubid idtype="pmpid" link="fulltext">16845103</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>The ITS2 Database II: homology modelling RNA structure for molecular systematics</p>
            </title>
            <aug>
               <au>
                  <snm>Selig</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>M&#252;ller</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Dandekar</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Schultz</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2008</pubdate>
            <volume>36</volume>
            <issue>Database issue</issue>
            <fpage>D377</fpage>
            <lpage>D380</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2238964</pubid>
                  <pubid idtype="pmpid" link="fulltext">17933769</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>5.8S-28S rRNA interaction and HMM-based ITS2 annotation</p>
            </title>
            <aug>
               <au>
                  <snm>Keller</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Schleicher</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Schultz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>M&#252;ller</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Dandekar</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2009</pubdate>
            <volume>430</volume>
            <issue>1-2</issue>
            <fpage>50</fpage>
            <lpage>57</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.gene.2008.10.012</pubid>
                  <pubid idtype="pmpid" link="fulltext">19026726</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Bayes or bootstrap? A simulation study comparing the performance of Bayesian Markov Chain Monte Carlo sampling and bootstrapping in assessing phylogenetic confidence</p>
            </title>
            <aug>
               <au>
                  <snm>Alfaro</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Zoller</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lutzoni</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2003</pubdate>
            <volume>20</volume>
            <issue>2</issue>
            <fpage>255</fpage>
            <lpage>266</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msg028</pubid>
                  <pubid idtype="pmpid" link="fulltext">12598693</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Homology modeling revealed more than 20,000 rRNA internal transcribed spacer 2 (ITS2) secondary structures</p>
            </title>
            <aug>
               <au>
                  <snm>Wolf</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Achtziger</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Schultz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dandekar</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>M&#252;ller</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2005</pubdate>
            <volume>11</volume>
            <issue>11</issue>
            <fpage>1616</fpage>
            <lpage>1623</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1261/rna.2144205</pubid>
                  <pubid idtype="pmcid">1370847</pubid>
                  <pubid idtype="pmpid" link="fulltext">16244129</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>ClustalW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1994</pubdate>
            <volume>22</volume>
            <issue>22</issue>
            <fpage>4673</fpage>
            <lpage>4680</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/22.22.4673</pubid>
                  <pubid idtype="pmcid">308517</pubid>
                  <pubid idtype="pmpid" link="fulltext">7984417</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Synchronous visual analysis and editing of RNA sequence and secondary structure alignments using 4SALE</p>
            </title>
            <aug>
               <au>
                  <snm>Seibel</snm>
                  <fnm>PN</fnm>
               </au>
               <au>
                  <snm>M&#252;ller</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Dandekar</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>BMC Res Notes</source>
            <pubdate>2008</pubdate>
            <volume>1</volume>
            <fpage>91</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1756-0500-1-91</pubid>
                  <pubid idtype="pmcid">2587473</pubid>
                  <pubid idtype="pmpid" link="fulltext">18854023</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>ProfDist: a tool for the construction of large phylogenetic trees based on profile distances</p>
            </title>
            <aug>
               <au>
                  <snm>Friedrich</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dandekar</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>M&#252;ller</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>9</issue>
            <fpage>2108</fpage>
            <lpage>2109</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti289</pubid>
                  <pubid idtype="pmpid" link="fulltext">15677706</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>ProfDistS: (profile-) distance based phylogeny on sequence-structure alignments</p>
            </title>
            <aug>
               <au>
                  <snm>Wolf</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ruderisch</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Dandekar</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Schultz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>M&#252;ller</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2008</pubdate>
            <volume>24</volume>
            <fpage>2401</fpage>
            <lpage>2402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btn453</pubid>
                  <pubid idtype="pmpid" link="fulltext">18723521</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>PHYLIP - Phylogeny Inference Package (Version 3.2)</p>
            </title>
            <aug>
               <au>
                  <snm>Felsenstein</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Cladistics</source>
            <pubdate>1989</pubdate>
            <volume>5</volume>
            <fpage>164</fpage>
            <lpage>166</lpage>
         </bibl>
         <bibl id="B39">
            <title>
               <p>QDist - quartet distance between evolutionary trees</p>
            </title>
            <aug>
               <au>
                  <snm>Mailund</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Pedersen</snm>
                  <fnm>CNS</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>10</issue>
            <fpage>1636</fpage>
            <lpage>1637</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth097</pubid>
                  <pubid idtype="pmpid" link="fulltext">14962942</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <aug>
               <au>
                  <cnm>R Development Core Team</cnm>
               </au>
            </aug>
            <source>R: A Language and Environment for Statistical Computing</source>
            <publisher>R Foundation for Statistical Computing, Vienna, Austria</publisher>
            <pubdate>2009</pubdate>
            <url>http://www.R-project.org</url>
         </bibl>
      </refgrp>
   </bm>
</art>

