Research (Open Access)

Rooting the tree of life by transition analyses

Thomas Cavalier-Smith

Department of Zoology, University of Oxford, South Parks Road, Oxford, OX1 3PS, UK

Biology Direct 2006, 1:19 doi:10.1186/1745-6150-1-19

The electronic version of this article is the complete one and can be found online at: http://www.biology-direct.com/content/1/1/19

Received: 5 July 2006
Accepted: 11 July 2006
Published: 11 July 2006

© 2006 Cavalier-Smith; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Despite great advances in clarifying the family tree of life, it is still not agreed where its root is or what properties the most ancient cells possessed – the most difficult problems in phylogeny. Protein paralogue trees can theoretically place the root, but are contradictory because of tree-reconstruction artefacts or poor resolution; ribosome-related and DNA-handling enzymes suggested one between neomura (eukaryotes plus archaebacteria) and eubacteria, whereas metabolic enzymes often place it within eubacteria but in contradictory places. Palaeontology shows that eubacteria are much more ancient than eukaryotes, and, together with phylogenetic evidence that archaebacteria are sisters not ancestral to eukaryotes, implies that the root is not within the neomura. Transition analysis, involving comparative/developmental and selective arguments, can polarize major transitions and thereby systematically exclude the root from major clades possessing derived characters and thus locate it; previously the 20 shared neomuran characters were thus argued to be derived, but whether the root was within eubacteria or between them and archaebacteria remained controversial.

Results

I analyze 13 major transitions within eubacteria, showing how they can all be congruently polarized. I infer the first fully resolved prokaryote tree, with a basal stem comprising the new infrakingdom Glidobacteria (Chlorobacteria, Hadobacteria, Cyanobacteria), which is entirely non-flagellate and probably ancestrally had gliding motility, and two derived branches (Gracilicutes and Unibacteria/Eurybacteria) that diverged immediately following the origin of flagella. Proteasome evolution shows that the universal root is outside a clade comprising neomura and Actinomycetales (proteates), and thus lies within other eubacteria, contrary to a widespread assumption that it is between eubacteria and neomura. Cell wall and flagellar evolution independently locate the root outside Posibacteria (Actinobacteria and Endobacteria), and thus among negibacteria with two membranes. Posibacteria are derived from Eurybacteria and ancestral to neomura. RNA polymerase and other insertions strongly favour the monophyly of Gracilicutes (Proteobacteria, Planctobacteria, Sphingobacteria, Spirochaetes). Evolution of the negibacterial outer membrane places the root within Eobacteria (Hadobacteria and Chlorobacteria, both primitively without lipopolysaccharide): as all phyla possessing the outer membrane β-barrel protein Omp85 are highly probably derived, the root lies between them and Chlorobacteria, the only negibacteria without Omp85, or possibly within Chlorobacteria.

Conclusion

Chlorobacteria are probably the oldest and Archaebacteria the youngest bacteria, with Posibacteria of intermediate age, requiring radical reassessment of dominant views of bacterial evolution. The last ancestor of all life was a eubacterium with acyl-ester membrane lipids, large genome, murein peptidoglycan walls, and fully developed eubacterial molecular biology and cell division. It was a non-flagellate negibacterium with two membranes, probably a photosynthetic green non-sulphur bacterium with relatively primitive secretory machinery, not a heterotrophic posibacterium with one membrane.

Reviewers

This article was reviewed by John Logsdon, Purificación López-García and Eric Bapteste (nominated by Simonetta Gribaldo).

Open peer review

Reviewed by John Logsdon, Purificación López-García and Eric Bapteste (nominated by Simonetta Gribaldo). For the full reviews, please go to the Reviewers' comments section.

Background

Correctly placing the root of the evolutionary tree of all life would enable us to deduce rigorously the major characteristics of the last common ancestor of life. It is probably the most difficult problem of all in phylogenetics, but not yet solved – contrary to widespread assumptions [1,2]. It is also most important to solve correctly because the result colours all interpretations of evolutionary history, influencing ideas of which features are primitive or derived and which branches are deeper and more ancient than others [1]. The wrong answer misleads profoundly in numerous ways. Establishing the root of a small part of the tree is more straightforward, yet often surprisingly difficult for organisms without plentiful fossils [3,4]. Usually the root of a subtree is located by comparisons with known outgroups. However, outgroups for the entire tree are air, rocks and water, not other organisms, vastly increasing the problem, which uniquely involves the origin of life – not just transitions between known types of organism. Here I explain how this seemingly intractable problem can be solved by supplementing standard molecular phylogenetic methods with the very same conceptual methods that were originally used to establish 'known outgroups' in well-defined parts of the tree, long before sequencing was invented. I then apply these methods comprehensively to establish far more closely than ever before where the root of the tree of life actually is.

I show here that, in conjunction with palaeontology and sequence trees, the methods of transition analysis and congruence testing demonstrate that archaebacteria are the youngest bacterial phylum and that the root lies within eubacteria, specifically among negibacteria of the superphylum Eobacteria, probably between Chlorobacteria and all other living organisms (Table 1 summarizes the prokaryotic nomenclature used here, which is slightly revised from previously [1], primarily by excluding Eurybacteria from Posibacteria). Chlorobacteria comprise photosynthetic 'non-sulphur' green bacteria like Chloroflexus and Heliothrix, some little-studied heterotrophs (e.g. Thermomicrobium, Dehalococcoides) and some apparently deeper-branching lineages known only from environmental DNA sequences and thus of unknown properties [1]. I use cladistic and transition analysis to provide the first rooted and fully resolved tree for all ten phyla of bacteria recognized here.

Table 1. The nomenclature and classification used here for prokaryotes (=Bacteria)

I also provide new perspectives on the evolution of bacterial flagella and the cell envelope and conclude that the last common ancestor (cenancestor) of all life was a highly developed non-flagellate Gram-negative eubacterium with murein cell walls, acyl ester phospholipids, and probably non-oxygenic photosynthesis and gliding motility. It was more primitive than other eubacteria in probably lacking lipopolysaccharide, hopanoids, cytochrome b, catalase, the HslV ring protease homologue of proteasomes, spores, the machinery based on outer membrane (OM) protein Omp85 used by more advanced negibacteria to insert outer membrane proteins, type I, type II, and type III secretion mechanisms, and TonB-energized OM import systems. I briefly discuss implications of this novel rooting of the universal tree for understanding primordial cell biology and the history of life and its impact on global climate.

The primacy of transition analysis

Classically three types of argument have been used to distinguish in-groups and out-groups. First, the fossil record. Among vertebrates, birds and mammals must be derived from reptiles, not vice versa, because reptile fossils are so much earlier. Likewise reptiles are derived from amphibians that were objectively earlier, amphibians from bony fish as fish are more ancient.

Second is transition analysis [5], which can often polarize major changes, showing that A went to B, not B to A. Thus, when birds originated, forelimbs previously used for walking were transformed into wings. We rule out the reverse by comparative/developmental and selective arguments. 18th century comparisons showed the structural and developmental homology of all pentadactyl limbs. Before palaeontology gave a time scale and evidence of direction it was obvious that wings were specializations of legs, not the reverse. Fore and hind limbs were clearly homologous throughout tetrapods; they must first have been essentially the same, as in amphibians and reptiles, not highly differentiated as in birds. It would be impossible mechanistically (developmentally or mutationally) to have evolved the very different bird wing and leg as the first tetrapod limbs – subsequent derivation of essentially similar reptilian five-toed legs from each would be equally improbable: that scenario would place birds at the tetrapod root; to become reptiles they would have to separate their fused trunk skeletons into discrete bones, convert feathers to scales, and evolve teeth. Such changes would be mechanistically complex, difficult, and of no selective advantage.

Transition analysis, if imaginative and critical, often clearly polarizes change unambiguously in the complete absence of a fossil record. Fossils are static and discontinuous; they do not show transitions or continuity directly and can be interpreted properly only by critical transition analysis, which is therefore the most fundamental way to polarize the direction of evolutionary change. For vertebrate evolution the fossil record is a valuable extra, inessential benefit. It is important to note that not all transitions can be clearly polarized when studied individually. Some evolutionary changes can in principle occur in either direction; evolutionary direction in such cases can only be established by reference to other changes that can be polarized and their relationship to the topology of the tree. It is the subset of changes that have a sufficient degree of complexity to allow unambiguous polarization that are of key importance for rooting trees. The key question that decides the utility of a particular character for this purpose is whether its evolution has enough evidence of directionality, which may be inherent in the process of evolutionary change itself or deducible by comparison between an evolved state and its putative precursors and knowledge of their phylogenetic distribution. Without evidence of directionality a character cannot be used clearly to polarize the tree.

I call the third approach congruence testing. One searches for congruence across major parts of the evolutionary tree between what analyses of individual transitions tell us, to ensure that the whole story is consistent: historically consistent and compatible with comparative morphology, genetics, developmental biology, and ecology. Thus in reptiles not only the ancestors of snakes but numerous different lizard groups lost limbs. Consistency across the whole tetrapod tree excludes its root from any group of limbless reptiles. In unicellular organisms character losses have been equally confusing; yet though useful morphological characters are fewer, transition analysis and congruence testing eventually enable losses to be identified and transitions to be polarized, especially by adding molecular cladistic characters [1,6]. Historically, biologists studying macroorganisms worked on many parts of the tree at once, using cross comparisons to hone arguments and criteria; such critical evaluation rejected discordant scenarios and subhypotheses. With congruence testing a serious mistake in one part of the tree may be revealed by incongruence with other parts. If two polarizations in different parts of the tree are incongruent (contradictory), then either the topology of the tree is incorrect or one of the polarizations is incorrect, and the source of the conflict can be sought and at least one of the interpretations corrected in the light of the overall evidence from as many sources as possible. Usually it will be found that one of the lines of evidence is weaker than the others and has been given too much weight or is positively misleading or fundamentally misinterpreted. The search for congruence among multiple lines of evidence (the more diverse the better), resolving apparent contradictions by weighing up the evidence, is not special to evolutionary biology but fundamental to all science. Its importance is easily overlooked by specialists familiar with only one field. Gaucher et al. [7] have rightly stressed that such an integrative approach, though recently unfashionable, is sorely needed in the face of the mass of new genomic data to suggest biologically well-grounded hypotheses to guide detailed experimental studies in the laboratory.
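
The logic of congruence testing can be illustrated with a minimal sketch, which is an illustration only and not part of the analysis reported here. It assumes, as a deliberate simplification, that each polarized transition can be written as an (ancestral, derived) pair of groups, so that a set of polarizations is congruent when these pairs contain no circular chain of descent. The function name check_congruence and the example data (taken from the vertebrate example above) are hypothetical choices made for this sketch.

```python
from graphlib import CycleError, TopologicalSorter


def check_congruence(polarizations):
    """Test whether a set of polarized transitions is mutually consistent.

    polarizations: iterable of (ancestral, derived) group names.
    Returns (True, order) with one ancestor-before-descendant ordering,
    or (False, cycle) with the groups involved in a contradiction.
    """
    # Map every group to the set of groups argued to be ancestral to it.
    predecessors = {}
    for ancestral, derived in polarizations:
        predecessors.setdefault(ancestral, set())
        predecessors.setdefault(derived, set()).add(ancestral)
    try:
        order = list(TopologicalSorter(predecessors).static_order())
        return True, order
    except CycleError as err:
        # The second element of args is the list of nodes forming the cycle.
        return False, err.args[1]


# Polarizations from the vertebrate example (illustrative data).
vertebrates = [
    ("bony fish", "amphibians"),
    ("amphibians", "reptiles"),
    ("reptiles", "birds"),
    ("reptiles", "mammals"),
]

congruent, order = check_congruence(vertebrates)
# Adding the rejected scenario (birds ancestral to reptiles) creates a conflict.
conflict_free, cycle = check_congruence(vertebrates + [("birds", "reptiles")])
```

When a conflict is reported, the sketch only identifies which polarizations clash; deciding which of them is the weaker line of evidence still requires the biological arguments described above.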

Problems with sequence trees

Recent discussions about the root of the universal tree mostly fail to consider any palaeontological evidence or to execute either transition analysis or congruence testing, and focus solely on sequence trees. Single-gene trees, notably of rRNA and unusually well-conserved proteins like cytochromes, RuBisCO and chaperones, have been valuable in clustering together relatively closely related organisms, especially if morphology was inadequate to establish their closest relatives (often because of character losses). Occasionally they made major breakthroughs, as in the recognition of Archaebacteria and Proteobacteria in prokaryotes and Cercozoa in protozoa [8]. Unfortunately, such trees have four serious limitations. First is limited resolution, especially for basal eukaryotes and prokaryotes, where branching order is almost totally unresolved and must be established otherwise. Second is pervasive systematic biases in evolutionary mode, which affect segments of the tree differentially causing some branches to be placed entirely incorrectly [2]; all sequence trees require testing and corroboration by other evidence. Such testing is sophisticated in the eukaryote part of the tree now [6], but for prokaryotes a regrettable tendency to take 16S rRNA trees as gospel truth and ignore other evidence persists; critical cladistic analyses are rare [1,9,10]. Third, lateral gene transfer, commoner in bacteria than eukaryotes, but of uneven frequency, also places occasional branches incorrectly on single-gene trees [11,12]. Fourth, single-gene trees are always unrooted, lacking inherent evidence of direction; any nucleotide can substitute reversibly for any other. These severe limitations of sequence trees emphatically do not mean that they are worthless. On the contrary, they are indispensable, but they must be interpreted critically and supplemented by cladistic analysis, transition analysis and congruence testing, and by critical palaeontology, in order to produce a reliable and comprehensive picture. Some perceptive molecular biologists now appreciate the need to integrate sequence trees into the broader and time-based framework provided by palaeontology [7]. This synthetic approach to the history of life should become much more widespread [7].

Paralogue rooting failed clearly to root the tree

Gene duplications can in principle be used to root a subtree like eukaryotes or the whole tree of life. If duplication was just prior to the last common ancestor of a group and all descendants retain both paralogues, data from both can be combined in one tree. In theory, each paralogue would give an identical tree, with both trees linked by a line connecting their roots (Fig. 1a). In practice paralogue rooting is highly problematic; different gene pairs put roots in contradictory places and the two subtrees may not be identical [13] (Fig. 1b). This is because double trees are subject to systematic biases and/or poor resolution like single-gene trees [1]. For many paralogue pairs these problems are worse than most single-gene trees; this arises because most paralogues kept in all descendants of a particular ancestor underwent temporary dramatically elevated rates of change immediately following duplication when their contrasting functions that allowed both to survive originated [1]. For two proteins in the same cell compartment (virtually all in bacteria) this general principle (analogous to ecological limiting similarity dictating species coexistence [14]) makes transiently hyper-fast early divergence between paralogues almost inevitable. Thus sister paralogues are each very long branches on the twin tree [15,16] that evolve with different constraints: the worst combination of properties for accurate phylogenetic construction [1,2]. Any lineage of either or both paralogues that underwent similar major changes in rate or mode will be put artifactually closer to the apparent root than is correct. Interesting possible exceptions, which might give sensible roots, are sister paralogues retaining almost the same functions in separate compartments, e.g. cytosolic and endoplasmic reticulum Hsp90 [17].

Figure 1. The logic and problems of paralogue rooting. In theory (A), two genes that arose from a single parent by duplication immediately prior to the common ancestor of the group under study should yield two identical trees joined together by a line (shown extra thick) between the roots (stars) of each tree. Letters are taxa. In practice (B), stochasticity and systematic biases in evolutionary modes and rates yield trees with partially incorrect topology and often-misplaced roots [1]. Misplaced branches (red) are shown as extra long, but in practice misplaced taxa often do not reveal themselves so neatly. In practice, root positions in paralogue subtrees may both be right (very rare: I recall no examples), both wrong but the same (implying strong systematic biases), both wrong but different (often reflecting stochasticity and poor resolution), or one right and one wrong. When such conflicts occur among different paralogue pairs (or triples, etc.), as is almost invariable, other means are required to decide between them.

I previously highlighted two contrasting classes of universal paralogue tree [1]. Those for metabolic enzymes mostly place the root within eubacteria (in conflicting places with different enzymes [18]) and show weak support for monophyly of archaebacteria, which nest within eubacteria. In sharp contrast, trees for DNA-handling enzymes, molecules associated with ribosomes [15], and a few others, e.g. membrane ATPase [16], typically place the root on a very long stem that separates archaebacteria and eubacteria into unambiguously distinct branches. The latter trees are the minority but have often ('somewhat surprisingly': [2]) been accepted as genuinely locating the root, and the conflicting majority showing eubacterial roots ignored [19,20]. Such neglect of important conflicting evidence and of other approaches that may be more productive stems from the first paralogue trees used for rooting being of the minority type [15,16] and from a perceived fit to long-standing assumptions (devoid of sound evidence) that archaebacteria are as ancient as eubacteria. Instead of ignoring conflicting evidence, we need to understand why the trees differ and which most reliably locates the root. In essence, we are caught between the Scylla of strongly systematically-biased molecules that give the wrong root with high confidence and the Charybdis of less-biased, weakly-resolving molecules that give the right and several wrong versions of the root with too little support to distinguish them [1]; transition analysis, if critically applied, can pilot us into safer waters. Although it may not give the absolute certainty that some crave, it can allow us to reconstruct the past history of life with much higher confidence than anyone would have dreamed of a few decades ago.

Cladistic analysis of discrete characters can improve the resolution of ambiguous trees

Molecular sequence trees have not established the branching order of the nine eubacterial phyla recognized here (Table 1). Basal resolution of single-gene trees like 16S rRNA is totally inadequate. Multi-gene trees and genomic trees confirm most major clusters indicated by single-gene trees, but lack resolution in most key areas and are still too weakly sampled taxonomically [21-23], with Chlorobacteria still unrepresented (the first to include a chlorobacterium appeared during review of this paper; it is remarkably congruent with the present analysis if properly rooted and is discussed in responses to referee 3). Some evolutionarily key organisms are greatly neglected. A worse problem with multiple-gene trees is genome-wide systematic biases that can give the wrong topology with increasing confidence as data are added [2]. Cladistic reasoning about unique or rare changes has a special role in formulating and testing relationships, having been decisive in eukaryotes, e.g. in creating and strongly corroborating the chromalveolate theory [24-26] and locating the root of the eukaryotic subtree [3,4,6]. The value of such characters depends on their complexity and rareness. Ideally one prefers congruence among several; when congruent they may be sounder than many genome-wide comparisons. This paper uses rare discrete characters to establish unambiguously the branching order among the 10 bacterial phyla, and to establish the monophyly of Posibacteria, by seeking synapomorphies that group them together in the same way as has been very successful in eukaryotes [3,4,6,27,28].

Multiple transition analyses of complex multimolecular characters can root the tree

Figure 2 emphasizes that the most fundamental question concerning the root of the tree of life is whether the ancestral cell had two bounding membranes (i.e. was a negibacterium, as argued here) or just one membrane as in archaebacteria and posibacteria (collectively therefore called unibacteria [1]), as has traditionally been widely assumed. To decide this one must correctly polarize the transition between cells that have two membranes (most bacteria, in eight phyla, grouped as Negibacteria in Table 1) and those with only a single surface membrane (eukaryotes and two bacterial phyla: Posibacteria and Archaebacteria); in other words one must decide whether evolution occurred from Negibacteria to Unibacteria or the reverse. Given the topology of the tree, if it can be shown that Posibacteria evolved from Negibacteria, not the reverse, then the root cannot lie between neomura and eubacteria, as widely supposed, but must lie within negibacteria. Thus we can firmly establish the position of the root of the tree by determining (a) its correct topology and (b) the direction of major transitions within it. This paper shows that several major transitions within eubacteria can be unambiguously polarized and that no strongly polarized transitions conflict with each other. I show that all the more robust polarizations are consistent with a negibacterial root but that several of them contradict alternative hypotheses: i.e. that the root is between neomura and eubacteria [15,16], or between posibacteria and negibacteria, or within either neomura [13] or posibacteria. The direction of some transitions is ambiguous, but enough can be polarized sufficiently confidently to exclude all phyla except Chlorobacteria from the root of the tree.
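
The way polarized transitions exclude the root from derived groups can be shown with a minimal sketch, which is an illustration under stated assumptions and not the procedure used in this paper. It assumes a deliberately simplified eight-taxon unrooted topology (TAXA and INTERNAL_BRANCHES are hypothetical simplifications of the groups named in the text) and treats each polarization as the set of taxa descended from the ancestor in which a derived character arose. A branch remains a candidate root position unless it lies strictly inside one of those derived groups; the stem leading to a derived group stays permitted.

```python
from typing import Dict, FrozenSet, Iterable, List

Side = FrozenSet[str]

# Hypothetical simplified taxon set (an assumption made for illustration).
TAXA: Side = frozenset({
    "Chlorobacteria", "Hadobacteria", "Cyanobacteria", "Gracilicutes",
    "Eurybacteria", "Posibacteria", "Archaebacteria", "Eukaryotes",
})

# Internal branches of the assumed unrooted tree, each given by one side
# of the bipartition it induces.
INTERNAL_BRANCHES: List[Side] = [
    frozenset({"Archaebacteria", "Eukaryotes"}),
    frozenset({"Posibacteria", "Archaebacteria", "Eukaryotes"}),
    frozenset({"Eurybacteria", "Posibacteria", "Archaebacteria",
               "Eukaryotes"}),
    frozenset({"Gracilicutes", "Eurybacteria", "Posibacteria",
               "Archaebacteria", "Eukaryotes"}),
    frozenset({"Chlorobacteria", "Hadobacteria"}),
]

# Taxa descended from the ancestor in which each derived state arose
# (simplified readings of the polarizations argued in the text).
DERIVED: Dict[str, Side] = {
    "neomuran characters": frozenset({"Archaebacteria", "Eukaryotes"}),
    "single bounding membrane": frozenset(
        {"Posibacteria", "Archaebacteria", "Eukaryotes"}),
    "flagella": frozenset({"Gracilicutes", "Eurybacteria", "Posibacteria",
                           "Archaebacteria", "Eukaryotes"}),
    "Omp85": TAXA - {"Chlorobacteria"},
}


def all_branches() -> List[Side]:
    """Terminal branches (one per taxon) followed by the internal branches."""
    return [frozenset({t}) for t in sorted(TAXA)] + INTERNAL_BRANCHES


def root_allowed(side: Side, derived_group: Side) -> bool:
    """A branch is excluded when it lies strictly inside the derived group,
    i.e. when one side of its bipartition is a proper subset of that group."""
    other = TAXA - side
    return not (side < derived_group or other < derived_group)


def candidate_roots(derived_groups: Iterable[Side]) -> List[Side]:
    """Branches not excluded by any of the supplied polarizations."""
    groups = list(derived_groups)
    return [b for b in all_branches()
            if all(root_allowed(b, g) for g in groups)]


remaining = candidate_roots(DERIVED.values())
```

Under these assumptions the neomuran polarization alone removes only the two terminal neomuran branches, whereas all four polarizations together leave a single candidate: the branch separating Chlorobacteria from everything else.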

Figure 2. Evolutionary relationships among the four major kinds of cell. The horizontal red arrow indicates the position of the universal root as inferred from the first protein paralogue trees, i.e. between neomura and eubacteria. To determine whether the root is really there or within eubacteria, as suggested instead by many paralogue trees for metabolic enzymes, we must correctly polarize the direction of the negibacteria/posibacteria transition that took place in bacteria that had already evolved flagella. As argued in detail in the text, flagellar evolution and wall/envelope evolution both strongly favour a transition from negibacteria to posibacteria (continuous black arrow), not from posibacteria to negibacteria (broken red arrow). This places the root within Negibacteria and shows that the ancestral cell had two bounding membranes, not just one as traditionally assumed. A negibacterial root also fits the fossil record, which shows that Negibacteria are more than twice as old as eukaryotes [1, 129]. As negibacteria are the only prokaryotes that use sunlight to fix carbon dioxide this is also the only position that would have allowed the first ecosystems to have been based on photosynthesis, without which extensive evolution might have been impossible. Posibacteria, archaebacteria and eukaryotes were probably all ancestrally heterotrophs, whereas negibacteria are likely to have been ancestrally photosynthetic and diversified by evolving all the known types of photosystem and major antenna pigments.

Of the five major transitions shown by green bars in Fig. 2, the prokaryote-eukaryote transition was analyzed in considerable detail before [27,29], as were the eubacterial to neomuran and the neomuran to archaebacterial transitions [1]. The first transition, from non-life to negibacteria, i.e. the origin of the first cell, has also been considered in detail [30,31]. Since those papers were published major advances have been made in rooting the eukaryote tree of life [3,4,28], which have important implications for the universal tree. It is now generally accepted that all extant anaerobic eukaryotes had ancestors with aerobic mitochondria [32] and highly probable that the root lies between unikonts and bikonts [3,4,28]. Thus the last common ancestor of eukaryotes was a sexual aerobe with mitochondrion, and probably also a cilium and capacity to make pseudopodia and dormant cysts. The fact that the eukaryote cenancestor had mitochondria, which arose from enslaved α-proteobacteria [33], means that eukaryotes must have evolved long after eubacteria, which must have diversified to produce proteobacteria and α-proteobacteria before the first eukaryote. This raises a severe problem for the common, but seldom critically evaluated, assumption that the root lies between neomura and eubacteria (red arrow Fig. 2); on that widespread assumption [15,16] eukaryotes would have originated in the very first bifurcation on the neomuran side of the tree. Given that hypothetical position of the root and the topology of the tree, the basal eubacterial group would have been posibacteria; negibacteria would probably not have evolved by the time of the primary neomuran bifurcation, whereas proteobacteria and α-proteobacteria would each have arisen much later still. Such a later origin of α-proteobacteria than eukaryotes is now untenable. Bayesian relaxed molecular clock analyses calibrated by multiple palaeontological dates for 143 proteins [34,35] and for 18S rRNA [36] suggest that the eukaryote cenancestor was only 0.9–1.1 Gy old, whereas the fossil record indicates that eubacteria are at least 2.8 and probably about 3.5 Gy old [1,37]. Thus there is now a very strong temporal and evolutionary incompatibility between the now well-established chimaeric and aerobic nature of the oldest eukaryote and the widespread (and, I have argued, false [1]) assumption that neomura are as ancient as eubacteria. There is no fossil evidence whatever that archaebacteria are older than eubacteria – or even as old as them; given the extensive phylogenetic evidence that archaebacteria are sisters of eukaryotes, it is now very hard indeed to escape the conclusion that neomura were derived from eubacteria, not the reverse, and that the universal root lies in eubacteria not between eubacteria and neomura.

Here I use transition analysis arguments that are entirely independent of the fossil record to show that this is indeed the case and that both the tree topology and the root shown on Fig. 2 are correct. I provide the first detailed analysis of the negibacteria to posibacteria transition, which unambiguously polarizes it in that direction, and argue that Posibacteria evolved from the new phylum Eurybacteria, established here (Table 2). I give new evidence for the monophyly of Posibacteria, for the derived nature of Actinobacteria compared with Endobacteria, and a new argument from proteasome evolution that also places the universal root within eubacteria and thus excludes it from the eubacteria/neomura junction. I also analyze 13 transitions within negibacteria (the eight shown on Fig. 2 plus five less important ones within gracilicutes) in sufficient detail unambiguously to root the tree, and map other characters onto the resulting tree. Given this root, sequence trees, cladistic trees, the fossil record, and polarizations deduced by transition analysis are all congruent and thus mutually reinforcing. I also argue that a root within negibacteria is ecologically plausible but any other position is not. One paragraph is first necessary to summarize the conclusions from the previous polarization of the neomuran revolution [1].

Table 2. The 10 phyla (=divisions) of the kingdom Bacteria* recognized here

The neomuran revolution

Morphological fossil evidence that eubacteria are several times older than eukaryotes plus strong phylogenetic evidence that archaebacteria are holophyletic sisters of eukaryotes (together comprising the clade neomura [29]), not their paraphyletic ancestors, strongly indicate that archaebacteria are much younger than eubacteria [1]. Transition analysis showed that 19 major changes in the immediate common ancestor of neomura can all be polarized in the direction from eubacteria to neomura, most by strong selective arguments, none making sense in reverse [1]. These numerous coevolving changes constitute the 'neomuran revolution', the second most important change in cell organization apart from the immediately following origin of eukaryotes [1]. Most of the 19 (now 20) neomuran innovations are explicable as consequences of stronger cotranslational protein secretion associated with the replacement of murein cell walls by cotranslationally-synthesized N-linked glycoproteins (neomura means new walls), or of the simultaneous replacement of eubacterial DNA gyrase by core histones [1]. Both key innovations were arguably adaptations to thermophily [1]. For want of space these very detailed arguments are not repeated or even summarized here; nor shall I repeat my detailed discussion of the fossil record and the weakness of claims from it of an early origin for neomura [1]. The best attempt since then to date the primary divergence of eukaryotes using sequence trees multiply calibrated by the fossil record [34,35] is consistent with my argument that eukaryotes are well over a billion years younger than eubacteria [1]. The recent discovery of histone genes in crenarchaeotes [38] eliminates one line of 'evidence' for claims that archaebacteria are ancestral to eukaryotes rather than their sisters by supporting my contention that histones were already present in the last common ancestor of archaebacteria and of neomura [1]. This considerably strengthens the thesis that the large differences in DNA-handling enzymes of neomura, compared with eubacteria, were caused by rapid coevolutionary adaptation to the origin of histones in the neomuran cenancestor [1].

Methods

The main methods used were transition analysis and congruence testing as outlined above. BLAST and examination of resulting alignments and domain identifications by CDD were frequently used to check homology among potentially related sequences and to extend the literature information about the distribution of key characters across phyla. All BLAST results mentioned were by simple P-BLAST, except for those for Omp85, which additionally used PSI-BLAST in an unsuccessful attempt to detect more divergent homologues in Chlorobacteria. In many cases I used several phylogenetically divergent queries and also reciprocal BLASTs of rather low-scoring hits; in some cases reciprocal BLAST was dramatically better at picking up strong relationships. BLAST hits with E values above 10 were considered to lack detectable homology.
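
The filtering logic described above can be sketched as follows; this is a hedged illustration and not the procedure actually used, which was carried out by inspecting BLAST output. Only the cutoff of 10 comes from the text. The Hit record, the function names and the sequence identifiers are hypothetical, and the reciprocal-best-hit criterion is a common convention assumed here as one possible reading of 'reciprocal BLAST'.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One BLAST hit reduced to the fields needed here (hypothetical record)."""
    query: str
    subject: str
    evalue: float


E_VALUE_CUTOFF = 10.0  # hits with E values above this lack detectable homology


def detectable(hits: Iterable[Hit], cutoff: float = E_VALUE_CUTOFF) -> List[Hit]:
    """Keep only hits whose E value does not exceed the cutoff."""
    return [h for h in hits if h.evalue <= cutoff]


def best_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """For each query, keep the hit with the lowest E value."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.query not in best or h.evalue < best[h.query].evalue:
            best[h.query] = h
    return best


def reciprocal_pairs(forward: Iterable[Hit], reverse: Iterable[Hit],
                     cutoff: float = E_VALUE_CUTOFF
                     ) -> List[Tuple[str, str, float, float]]:
    """Pairs (query, subject, forward E, reverse E) in which each sequence
    is the other's best detectable hit."""
    fwd = best_hits(detectable(forward, cutoff))
    rev = best_hits(detectable(reverse, cutoff))
    pairs = []
    for query, hit in fwd.items():
        back = rev.get(hit.subject)
        if back is not None and back.subject == query:
            pairs.append((query, hit.subject, hit.evalue, back.evalue))
    return sorted(pairs)


# Made-up example: a weak forward hit that is much stronger in reverse.
forward = [Hit("qA", "sX", 2.0), Hit("qA", "sY", 15.0), Hit("qB", "sZ", 0.5)]
reverse = [Hit("sX", "qA", 1e-8), Hit("sZ", "qC", 1e-3)]
pairs = reciprocal_pairs(forward, reverse)
```

In the made-up data the forward hit of qA on sX is weak (E = 2.0) but the reverse search recovers the same pair with a far lower E value, which is the kind of case in which a reciprocal search picks up a relationship more clearly.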

Results and discussion

To orient the reader in the following complex discussion, Fig. 3 indicates the 12 major transitions that will be discussed; five lesser transitions within Gracilicutes are also considered, making 17 in all (13 within eubacteria). I shall start with the evidence that actinobacteria are sisters of or ancestral to neomura, then work systematically down the tree to the root, discussing each transition in turn, and finally discuss overall implications of this new rooting. As Fig. 3 indicates, a major new line of evidence for polarizing the upper part of the tree concerns stepwise increases in complexity of the HslV and proteasomal proteases, both of which are absent from Chlorobacteria. Before explaining the logic, I provide a little background information about controlled proteolysis within hollow cylindrical macromolecular assemblies, which is essential for all life. I have attempted to present the following discussion in sufficient detail for specialists to check and criticize the validity of all the major points, but have shorn away as much detail as possible to expose the fundamental evolutionary points and to attempt to make the argument reasonably accessible to a broad audience. It is an analysis and synthesis, not a comprehensive review.

Figure 3. Key molecular cladistic characters that help root the tree of life. Green bars mark major evolutionary innovations. Those explained in detail in previous publications [1, 24, 26] are labelled in blue. Those introduced for the first time or discussed in more detail in the present paper are in red. The three most fundamental changes in cell structure (the origin of unibacteria by loss of the negibacterial outer membrane [1, 5]; the neomuran revolution involving novel chromatin and glycoprotein secretion and much coadaptive macromolecular evolution [1, 5, 29, 62]; and the origin of the eukaryote cell [5, 27, 62]) are marked by thicker bars. So also are the three major transitions, whose key importance and decisiveness for rooting the tree of life are explained here for the first time: the origins of the proteasome, of flagella, and of Omp85 for insertion of OM β-barrel proteins. The three major kinds of cell from the viewpoint of their having fundamentally distinct membrane topology (eukaryotes, unibacteria, negibacteria) [5, 29, 56, 62] are shown by thumbnail sketches (isoprenoid ether lipids in red, outer membranes in blue). Thumbnail sketches also illustrate the inferred times of origin of two key cylindrical macromolecular assemblies (the OM β-barrel protein Omp85 and HslVU/proteasome ATP-dependent regulated proteases) and the two-step increased complexity of the latter. Negibacterial taxa are shown in black, Posibacteria in orange, and neomuran taxa in brown. Gracilicutes comprise four negibacterial phyla with either a very thin peptidoglycan layer or no peptidoglycan at all in their cell envelope: Proteobacteria, Planctobacteria, Spirochaetae, Sphingobacteria (Table 1 explains the formal bacterial taxon names used here for precision and brevity). Evidence for the relatively late dating of the neomuran revolution was explained in detail previously [1].
Note that although Chlorobacteria and Endobacteria are shown as holophyletic, either or both might actually be paraphyletic; I suspect that Endobacteria may be paraphyletic as the most divergent actinobacterium has endospores, but think that Chlorobacteria are probably not. Conversely, it is uncertain whether actinobacteria are paraphyletic as shown or holophyletic; see text – further work is needed to decide. For simplicity, five additional polarizations within Gracilicutes that are also discussed are not shown; see the more comprehensive Fig. 7 for them and additional characters mapped onto the tree. Note that the ~2.8 Gy date for the origin of cyanobacteria is based solely on hopanoid biomarkers; since no earlier organic deposits have been found that are sufficiently well preserved and with enough extractable hydrocarbons for such biomarker analysis, this is a minimum date (though its validity also depends on the assumption that such hydrocarbons have not migrated vertically in the rocks since being formed, which is hard to test).

Intra-cylinder ATP-dependent proteolysis (protein digestion)

Three different families of ring-shaped or cylindrical macromolecular assemblies have evolved to allow controlled ATP-dependent proteolysis in cells [39]. I shall argue that two of them, ClpP protease and Lon protease [40], had evolved prior to the bacterial cenancestor, whereas HslVU [39] evolved only after the divergence of Chlorobacteria and higher organisms. In all three cases the proteolytic site is inside a hollow cylinder, in its central part as far away from entry channels as possible, which maximally protects external proteins from digestion unless they are actively pulled inside with the help of an associated ATP-dependent chaperone that recognizes only the correct proteins for destruction. In the Lon protease the chaperone and protease activities are part of a single large tripartite multifunctional polypeptide chain that is capable of self-assembly. Its N-terminal region is important for this assembly; its middle part has the ATPase/chaperone function; and its C-terminus has the protease activity. The protease and ATPase moieties each independently assemble into hexameric rings and it is thought that they then form a two-tiered hexamer with the digestive site on the inside. By contrast, in ClpP and HslVU/proteasomes, the chaperones and proteases are distinct and much smaller polypeptides coded by evolutionarily unrelated genes (but are confusingly given similar names despite this). Each assembles as a hollow ring and the whole assembly is formed by an ATPase ring sticking to each end of the protease ring/cylinder, in a suitable position to monitor substrate entry.

Lon is present right across the living world but not found in every species [40]; soluble LonA proteases are eubacterial or mitochondrial, whilst archaebacteria only have membrane-bound ones (LonB) with an extra membrane-spanning domain inserted within the ATPase domain. ClpP is present throughout eubacteria and in all chloroplasts, but not in any eukaryotes without plastids; this suggests that it was lost prior to the origin of eukaryotes but regained by photosynthetic eukaryotes when a cyanobacterium was enslaved to make chloroplasts. It is also absent from all archaebacteria except Pyrobaculum and Methanosarcina; as the latter is known to have acquired vast numbers of genes from eubacteria by lateral transfer [41] it is probable that ClpP was lost in the neomuran not the eukaryote ancestor (Fig. 3), and that both archaebacteria reacquired it by lateral transfer; proper phylogenetic analysis is needed to test this.

ClpP protease is a ring with 7-fold symmetry [40], whereas its unrelated chaperone ATPase ring (ClpX or A [42]) has 6-fold symmetry, being made from six monomers. HslV protease has six subunits (Fig. 4), as does its unrelated chaperone ring HslU, which allosterically activates it [43]. The proteasomal proteolytic cylinder has 7-fold symmetry and its unrelated ATPase chaperone 6-fold symmetry. However, sequence analysis indicates a rather complex pattern of relationships. Although ClpP, HslV, and proteasomal proteases are all very distantly related, ClpP serine protease belongs to a different superfamily (acyl-CoA decarboxylase/isomerase) from proteasome α- and β-subunits and HslV (threonine NTN hydrolases) [19] and cannot therefore be their ancestor. Thus the heptameric proteasomal protease is much more closely related to the hexameric HslV, not to the ClpP protease, which has a fundamentally different tertiary protein-folding pattern. The ClpX and HslU chaperones are closely related members of the AAA+ ATPase superfamily; thus they probably either had a common ancestor or one evolved from the other. The proteasomal chaperones are also AAA+ ATPases but belong to a different family, being related to the ATPase domain of the 3-domain membrane-inserted protease FtsH of eubacteria and chloroplasts; in fact the ATPase components of all ATP-dependent proteases, including Lon and FtsH, are AAA+ ATPases that assemble as hexamers, like other still more distant members of that family.

Figure 4. Schematic longitudinal sections through the two-tier HslV and the four-tier bacterial 20S proteasome core particle. Red dots are proteolytic active centres. Thumbnail sketches on the left of the main figure are cross sections through the proteolytic chamber showing respectively their 6-fold and 7-fold symmetry. Evolution from the 12-mer HslV to the 28-mer proteasome by duplication to form α- and β-subunits forming heptameric rings is shown by the arrow; loss of proteolytic activity by the new α-subunit (black) coupled with a new ability to stack onto the β-subunits would have expanded the digestive cavity radially and longitudinally and kept potentially vulnerable external proteins further away from the proteolytic centres. Changed dimensions and shape of the α-subunit's ATPase binding surface probably favoured replacement of the HslU ATPase ring by a different one. Hypothetical evolution in the reverse direction by loss of the α-subunits would have created a less efficient purely β-subunit 14-mer that might have lost any ability to bind an ATPase ring through adapting to α-subunit binding instead and with a broader digestive cavity and entry pore more likely to digest the wrong proteins. It is unlikely that it could have survived purifying selection long enough to reduce its symmetry to 6-fold and find a new ATPase partner to bind and thus generate HslVU. No selective advantage for simplification of a proteasome to HslV is apparent. Subunit shapes simplified from [199].

Proteasomes are hollow cylindrical organelles for intracellular digestion of denatured proteins, found in neomura and advanced actinobacteria (Actinomycetales) only. They have a 15 nm long hollow cylindrical core, the 20S proteasome, with internal proteolytic activity: additional ATP-dependent chaperone structures at either end feed denatured proteins into it for digestion. In all proteasomes the central core has 7-fold rotational symmetry and four tiers of seven protein subunits (Fig. 4). In actinobacteria and archaebacteria the central core has only two kinds of protein: two inner tiers of identical proteolytic β-subunits (threonine proteases) and outer ones of the evolutionarily related non-proteolytic α-subunits (Fig. 4). This notable differentiation in function of the α- and β-subunits and associated change in their symmetry during the evolution of the threonine NTN protein hydrolases is the crux of my argument in the next section for polarizing the direction of evolution. In eukaryotes the core is far more complex, each protease subunit being different; this complication arose by repeated gene duplication during the origin of eukaryotes. In eukaryotes, each end is capped by a complex 'base' of several different proteins, including 6 different, but related, AAA+ ATPase chaperones, and a multiprotein lid open at one side to allow denatured proteins entry [44]. Adding 'base' and lid created a functional 26S proteasome of 31–41 different proteins (Fig. 5) [45]. Actinobacterial and archaebacterial proteasomes are much simpler: ends are terminated by a ring of six identical, but directly related, chaperone AAA+ ATPase proteins, so bacterial proteasomes are built of only three different proteins of two evolutionary groups.

Figure 5. Proteasome evolution showing step-wise increase in complexity, first to the HslV ring protease, then to the 20S proteasome, and lastly to the 26S proteasome; the two major transitions in proteasome structure important for polarizing the tree are marked by grey bars. Blue bars mark four other important evolutionary transitions that also congruently polarize the tree. HslV has 6-fold symmetry (a 2-tiered ring of 12 identical subunits) and arose from a monomeric NTN hydrolase, probably just before Hadobacteria diverged. HslV rings interact with an unrelated chaperone ATPase, HslU, also having 6-fold ring symmetry, like the ClpX chaperone, from which it arguably evolved, and virtually all AAA+ ATPase proteins, which originated in a burst of gene duplications prior to the last common ancestor of all life [19]. The 4-tier proteolytic core of the 6-tiered 20S proteasome evolved in a common ancestor of neomura and Actinomycetales (jointly proteates) of the subphylum Actinobacteria by another gene duplication that generated its catalytic β- and non-catalytic α-subunits from HslV, with an associated symmetry change to 7-fold: all four rings forming the core of the proteasomal cylinder have 7 subunits, but the 6-fold-symmetric HslU was replaced by another hexameric ATPase ring from a different AAA+ family to make the proteasome 'base' (red in the two-colour sketch of the archaebacterial proteasome at the top left). Glycobacteria [1] comprise all the typical negibacteria with OM lipopolysaccharide, i.e. all negibacterial phyla listed in Table 2 except Hadobacteria and Chlorobacteria.

HslV to proteasome differentiation polarizes the evolutionary transition

I argue here that the proteasome 20S core particle evolved from the simpler HslV, not the reverse. If this evolutionary polarization is correct, it excludes the root of the universal tree from a clade comprising neomura and actinomycete actinobacteria (Fig. 5), the only organisms that have the shared derived character of a proteasome with distinct but evolutionarily related α- and β-subunits, only one of which is enzymatically active. My argument does not depend on the sometimes-controversial fossil evidence [1] or on archaebacteria being holophyletic not paraphyletic [1]. My analysis, if correct, establishes the universal root within eubacteria, in agreement with paralogue trees for metabolic enzymes [1], confirming that archaebacteria are highly derived, not a primary domain of life, and that long-standing interpretations of early life assuming a molecular clock for rRNA have been grossly misleading [1,46].

As explained above, HslV is a single protein, evolutionarily related to both the α- and the β-subunits of the proteasomal digestive core. Twelve HslV molecules are arranged as two tiers of six identical subunits. In its active form it has an HslU ATPase ring at each end. Thus the 24-molecule HslVU protease is markedly simpler than, yet partially evolutionarily related to, the actinomycete/archaebacterial proteasome. The simplest interpretation of the evolutionary origin of proteasomes is that the core proteasome originated from HslV protease by a gene duplication that made functionally distinct α- and β-subunits arranged as a four-tier core rather than as a two-tier core as in HslVU. This increased the length of the protective cylinder, and the associated increase in the number of subunits per ring to seven increased the diameter of its hollow lumen, thus expanding the proteolytic chamber in both directions. These concerted changes thus increased the capacity of the proteasome to digest larger proteins and to protect cytosolic proteins from accidental digestion compared with the simpler and smaller HslVU.
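The subunit bookkeeping behind this comparison can be checked with a few lines of arithmetic. The sketch below is purely illustrative: it encodes each assembly as a list of ring sizes taken from the description above and from Fig. 4, and the variable and function names are hypothetical.

```python
# Ring compositions as described in the text: each list gives the number of
# subunits in successive tiers, read from one end of the cylinder to the other.
HSLV = [6, 6]              # two tiers of six identical HslV subunits
HSLVU = [6] + HSLV + [6]   # active form: an HslU ATPase ring at each end
CORE_20S = [7, 7, 7, 7]    # alpha, beta, beta, alpha heptameric rings


def subunit_count(tiers):
    """Total number of polypeptide chains in a stacked-ring assembly."""
    return sum(tiers)


def symmetry(tiers):
    """Rotational symmetry shared by all tiers, or None if the tiers differ."""
    return tiers[0] if len(set(tiers)) == 1 else None


if __name__ == "__main__":
    for name, tiers in [("HslV", HSLV), ("HslVU", HSLVU), ("20S core", CORE_20S)]:
        print(name, subunit_count(tiers), symmetry(tiers))
```

The totals reproduce the 12-mer HslV, the 24-molecule HslVU and the 28-mer core proteasome, and make explicit that the transition changes both the number of tiers (two to four) and the ring symmetry (6-fold to 7-fold).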

I suggest that the increased diameter of the core caused problems with its previous association with HslU, so that this was replaced by a larger more distantly related AAA+ ATPase ring to form the cap attached to each end of the 20S proteasome. In archaebacteria the cap ring is a hexameric ATPase [47] that is related to the ATPase domain only of FtsH protease; its homologue in actinomycetes is a similar hexamer also of identical protein subunits, but interaction with the 20S core has not yet been directly demonstrated [48]. FtsH is very conserved and found in all eubacteria, including actinobacteria where it coexists with the putative cap ring; thus unlike HslVU it did not disappear when proteasomes evolved by being directly converted into the cap ring. Instead, gene duplication, with only one copy losing its N-terminal membrane-insertion domain and C-terminal protease domain, was probably involved. However, neomura have partially related proteins with two separate ATPase domains, which in eukaryotes form a hexameric ATPase (Cdc48) responsible for chaperoning proteins out of the ER lumen for degradation. Cdc48 seems more closely related to the proteasome cap, which in the ancestral eukaryote became differentiated into a heteromeric structure by gene duplication and divergence, than it is to FtsH, which was probably lost in the ancestral neomuran. Since related two-domain ATPases are also found in a sprinkling of Posibacteria and even a few negibacteria higher in the tree (apparently not in Chlorobacteria), such proteins (rather than FtsH) might have been ancestral to the proteasomal ring ATPase; phylogenetic analysis of each domain is needed to establish the precise evolutionary relationships among them.
The main point for this paper is that the ATPase regulatory cap of the proteasome originated from a different AAA+ ATPase from HslU and its origin was a complex process involving gene duplication, domain deletion, and the origin of a novel ability to bind the newly arisen α-subunits of the 20S proteasome. It was not a simple process of molecular transformation with retention of all main functions. Moreover, as for the proteasomal proteolytic core, there was a further increase in regulatory ATPase complexity, involving even more extensive gene duplication, to make the eukaryote 26S proteasome.

An important point is that the α- and β-subunits of the proteasome appear to have diverged from HslV in opposite but mutually complementary directions. Four functions present simultaneously in HslV are partitioned between them. The β-subunits retained the threonine proteolytic active centre at the N-terminus and the capacity to assemble into a homomeric two-tier ring (now of 14 rather than 12 subunits). But they lost the distally constricted inner rim that narrows the ends of HslV to prevent entry of unfolded proteins into the proteolytic cavity (Fig. 4) and the capacity to bind to the regulatory ATPase ring. At the same time they acquired a new ability to bind the α-subunit ring by the same region of the molecule that lost the distal constriction. It is very likely that these two changes came about by a concerted remodelling of this region of the polypeptide chain. By contrast, the α-subunits lost the proteolytic centre and ability to form two-tier homomeric rings, but retained the distal constriction and the ability to bind an ATPase ring, albeit a different one as argued above. Thus it was the opposite end of the molecule, away from the ATPase-binding site, that was mainly modified in α-subunits.

It is well known that evolution can involve simplification as well as stepwise increases in complexity. Therefore, the fact that one can see functional advantages in the proposed increase of complexity from smaller and simpler HslV to larger and more complex 20S proteasomes, though adaptively much more plausible than evolution in the reverse direction, for which no selective advantage is apparent, is not in itself proof that evolution occurred in that direction. How can we rule out the alternative theoretical possibility of evolution in the reverse direction from the 20S proteasome to HslV by simplification? The clinching argument concerns the differentiation in function between the proteasome α- and β-subunits.

Though logically possible, direct reversal is mechanistically and evolutionarily highly unlikely. It would entail the loss of the non-catalytic α-subunits that serve as a toroidal adaptor for binding the two-tiered proteolytic β-subunit rings to the terminal ATPase rings. Such loss would generate an intermediate two-tiered β-subunit 14-mer without a narrowly constricted protein entry channel or any ability to bind regulatory ATPase rings. Thus it would be very harmful by digesting proteins that it should not and would be strongly selected against. Three major changes would be needed to convert such a defective β-subunit into HslV. The probability that it could simultaneously change its symmetry from 7-fold to 6-fold, evolve a narrow entry channel, and evolve an ability to bind an ATPase ring in the short time before the mutant strain was rapidly eliminated by such adverse selection is negligible. Thus simple reversal would in practice be evolutionarily impossible. The fact that mutant Thermoplasmas without proteasomes can survive, unless subjected to heat shock, does not contradict this argument. Nor does the fact that proteasomes appear to have been lost by a few actinomycetes that are endoparasites of animals. Simple loss of an entire structure has been observed repeatedly in evolution, but reversal of evolution of a complex highly differentiated structure to form a more generalized and simpler one closely mimicking an ancestral state has, as far as I am aware, never been clearly documented. Thus evolution of HslVU from 20S proteasomes is so improbable that we can safely polarize the actual evolutionary change in the opposite direction.

The transition from bacterial proteasomes to eukaryotic 26S proteasomes involved even more complex changes and differentiation among the different subunits, so it could not have occurred in the opposite direction either. The actinobacterial/archaebacterial proteasome is undoubtedly ancestral to the eukaryotic one, not the reverse. The far greater complexity of 26S proteasomes is associated with the origin of ubiquitin, unknown in bacteria but present in all eukaryotes as the most conserved protein of all. Ubiquitin is covalently attached to proteins to target them for destruction by 26S proteasomes; the lid includes proteins helping to recognize the polyubiquitin tags, remove them and push the target protein into the proteasomal digestive lumen. Clearly the extra complexity of base and lid coevolved with the origin of ubiquitin tagging. The greater heterogeneity of the eukaryote proteasome core reflects the greater diversity of substrates that need digesting compared with bacteria.

Arguing that HslVU evolved from proteasomes would leave totally unanswered how 20S proteasomes evolved. If HslV were not the ancestor of the α- and β-subunits, what is? There are no other candidates. Polarizing the tree in the direction shown in Figs 4, 5 explains the origin of proteasomes from HslV in a gradual way that is mechanistically and evolutionarily plausible. Polarizing it in the opposite direction totally fails to explain the origin of proteasomes and postulates changes that are mechanistically and selectively unreasonable, and is thus doubly defective scientifically. Thus mechanistic, selective, and phylogenetic arguments all unambiguously polarize the direction of evolution from HslVU to the more complex 20S proteasome with larger digestive cavity and more strongly bound ATPase caps, not the reverse. This important evolutionary step took place prior to the last common ancestor of all Actinomycetales, as proteasomes are found in all free-living actinomycete genomes so far sequenced, spread right across the 16S rRNA tree [49] and are absent only in a few parasites, almost certainly secondary losses such as are widespread in parasites – perhaps allowed by their greater degree of buffering from environmental heat shocks inside animal bodies. It could have taken place at any time between then and the origin of actinobacteria themselves, about twice as early, judging from 16S rRNA trees [50]. The exact timing is uncertain as genomes of earlier diverging actinobacteria (Bifidobacterium, Symbiobacterium [51]) lack both proteasome and HslV genes. Presumably one or other was present in their common ancestor shared with Actinomycetales, and has been lost since they diverged. Since, as discussed below, there have probably been many losses of HslVU within eubacteria, but proteasome loss has never been clearly demonstrated among free-living bacteria, losses of HslV seem more likely. 
If proteasomes have never been lost from free-living bacteria, they evolved only in the immediate common ancestor of Actinomycetales, and thus may be only half as old as actinobacteria. If that is correct and proteasomes have always been vertically inherited, neomura must be more closely related to Actinomycetales (as several other characters such as cholesterol biosynthesis also suggested [1]), making Actinobacteria paraphyletic. However, these parsimony arguments are not decisive evidence for actinobacterial paraphyly. We need more data on early diverging actinobacteria; finding either HslV or proteasomes among them would clarify this. The glycosyltransferases discussed below that support a posibacterial ancestry for neomuran N-linked oligosaccharide biosynthesis are found in Lactobacillales (Endobacteria) but not Actinomycetales; it is thus likely that characters relevant to neomuran origins have been differentially lost in different posibacterial lineages since the origin of neomura from a posibacterium. The important point is that multiple lines of evidence show that either actinomycetes or endobacteria are their nearest eubacterial relatives.

This argument polarizing the evolutionary direction from HslV to the 20S actinomycete core proteasome, not the reverse, uses paralogue rooting – but in a novel way not suffering from the usual tree-reconstruction artefacts: it stresses not sequence trees but two successive increases in complexity of quaternary protein structure – from monomeric NTN hydrolases to hexameric HslV to 14-meric core proteasomes with sharply differentiated functions and 3D structures for the α- and β-subunits. This polarization provides strong evidence that actinomycetes and neomura together form a clade, which I designate 'proteates' because the proteasome with core 7-fold symmetry is its synapomorphy, and thus excludes the root of the tree of life from anywhere within proteates. One can hardly suppose that the complex proteasome core was the ancestral state for all life and that monomeric NTN hydrolases ultimately evolved from it via HslV and two progressive simplifications involving a change of chaperone partner and then its loss. Yet the 'standard model' of bacterial evolution assuming a root between archaebacteria and eubacteria must assume just that and specifically put it between archaebacteria and actinomycetes (to explain their sharing proteasomes). Proteasome evolution excludes the root from proteates (neomura plus actinomycetes) but does not positively locate it. To do this we must polarize several other evolutionary transitions (Fig. 3), as explained below.

The red herring of lateral gene transfer might be raised against the above interpretation. Gille et al. [52] assumed that proteasome genes were laterally transferred from archaebacteria to the common ancestor of actinomycetes. However, they presented no phylogenetic analysis to support this assumption; unpublished trees give no support for lateral transfer, but as the α- and β-subunits and HslV proteins are very divergent and with too long branches for satisfactory phylogenetic analysis, such a possibility cannot be excluded with total confidence (J. Archibald pers. comm.). However there is no positive reason to invoke the total replacement of HslVU by three foreign genes; possibly Gille et al. did so through being unaware of the evidence of a vertical relationship between actinobacteria and neomura and the likelihood that actinobacteria are much older than archaebacteria [1], making the assumed lateral transfer temporally impossible if it is assumed into their cenancestor (though possibly more likely if it were into the ancestor of Actinomycetales alone). Furthermore, assuming lateral transfer from archaebacteria leaves the origin of archaebacterial proteasomes themselves totally unexplained, and ignores the undoubted homology between HslV and proteasomal subunits, and is thus untenable for three independent reasons. Given this homology, there had to be a transition between HslV and proteasomes at some stage.

HslVU is found in Endobacteria (i.e. low-GC Gram-positives plus mycoplasmas, spiroplasmas: see Table 2) and four phyla of Negibacteria [52,53]: proteobacteria, spirochaetes, Sphingobacteria, and many Eurybacteria (e.g. Heliobacteria, Thermotoga, but absent from Fusobacteria, which presumably lost it). HslVU is absent from the two entirely non-flagellate bacterial phyla (Chlorobacteria, Cyanobacteria) that are among the best candidates for early diverging life. But this absence is not itself a strong argument for considering them to be primitive, for it is likely that HslVU can be lost evolutionarily. If, as Figs 3 and 5 suggest, HslVU evolved prior to the origin of Hadobacteria, it must have been lost by Cyanobacteria. Its absence from Clostridiales and mycoplasmas suggests loss within Endobacteria. HslVU is currently unknown in Planctobacteria, which for reasons discussed below are unlikely to be at the base of the tree, and thus may have been lost by them. HslV is also absent from Hadobacteria except for Thermus, but as its HslV has a highest BLAST hit to Thermotoga and an HslU with highest hit to Aquifex, it might have been a thermophilically adaptive lateral acquisition from these unrelated hyperthermophiles. If that were true, cyanobacteria and Deinococcus need not have lost it, as HslV may have originated after Hadobacteria and Cyanobacteria arose, not just before Hadobacteria as shown on Fig. 3.

Interestingly, trypanosomatid protozoa and Apicomplexa retained proteobacterial HslVU in their mitochondria as well as proteasomes in the cytosol and nucleus – the only known organisms with both [52,53]. The fact that no bacteria are known to harbour both HslV and proteasomes is consistent with HslV having evolved directly into proteasomes.

Given the position of the root of the tree deduced from Omp85 evolution, as explained below, the earliest diverging phylum, Chlorobacteria, lacks HslVU. It is therefore likely that they never possessed it and that it evolved in the last common ancestor of all other bacteria, as shown on Figs 3 and 5. The absence of HslVU from Chlorobacteria, though probably the primitive state consistent with the rooting shown, is – I stress – not the primary reason for that rooting, merely a very minor corroboration, given the likelihood that HslVU was lost several times within negibacteria.

In sum, there were three successive increases in complexity: first from an ancestral monomer threonine protease to hexameric HslV, thus increasing the proteolytic repertoire of the common ancestor of eubacteria other than chlorobacteria; then to a 14-mer of two proteins in the actinomycete/archaebacterial 20S core proteasome with an expanded digestive cavity and differentiated function of its α- and β-subunits; thirdly to the markedly more internally differentiated eukaryotic 26S proteasome with expanded proteolytic scope and selectivity. The two latter compellingly polarize the tree of life from non-proteates to proteates and from unibacteria to eukaryotes respectively (Figs 3, 5), and therefore place its root within or among the other eubacterial groups.
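The exclusion logic used here can be expressed as a simple set operation: every polarized transition marks a clade defined by a derived character, and the root cannot lie within any such clade. The following sketch is a deliberately coarse illustration under stated assumptions. The partition of life into ten groups is a simplification made for this example (in particular "other Actinobacteria" is a hypothetical label for the early diverging actinobacteria lacking proteasomes), and the function name is invented.

```python
# Coarse, assumed partition of cellular life into groups named in the text.
GROUPS = {
    "Chlorobacteria", "Hadobacteria", "Cyanobacteria", "Gracilicutes",
    "Eurybacteria", "Endobacteria", "other Actinobacteria",
    "Actinomycetales", "Archaebacteria", "Eukaryotes",
}

# Clades defined by a derived character whose polarity is argued in the text.
DERIVED_CLADES = {
    "20S proteasome (proteates)": {"Actinomycetales", "Archaebacteria", "Eukaryotes"},
    "26S proteasome (eukaryotes)": {"Eukaryotes"},
}


def root_candidates(groups, derived_clades):
    """Groups in or among which the root may still lie, i.e. those that
    belong to no clade defined by a derived character."""
    excluded = set().union(*derived_clades.values())
    return sorted(groups - excluded)


if __name__ == "__main__":
    print(root_candidates(GROUPS, DERIVED_CLADES))
```

With only the two proteasomal transitions entered, the candidates that remain are all eubacterial groups outside the proteates; each further polarized transition would be added as another entry in `DERIVED_CLADES` and would narrow the list accordingly.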

Before explaining why the root must be within Negibacteria, I will briefly map onto the tree three main peptidases that further digest the peptide products of the cylindrical ATP-dependent proteases: tricorn peptidases [54], tetrahedral (TET) peptidases [55], and TPP proteases [55]. All are multimeric with a central digestive cavity, but each with unique structures dissimilar from the cylindrical enzymes discussed above. Tricorn peptidases are the most phylogenetically widespread; they were probably present in the prokaryote cenancestor but lost by the ancestral eukaryote at the origin of the 26S proteasome. TET peptidases were probably also lost then and occur only in prokaryotes, mostly those apparently lacking tricorn – for which they may substitute. The statement that TET is more widespread than tricorn [55] seems mistaken, but I agree that tricorn is more ancient. As tricorn needs protein cofactors but TET does not, TET could be acquired by lateral transfer and substitute for tricorn more easily than the reverse; phylogenetic analysis is needed to see if its scattered distribution arose thus, and not by differential loss. Tricorn is a complex two-domain protein with both domains present from Chloroflexus (Chlorobacteria) to archaebacteria. BLAST reveals an additional stand-alone paralogue of the C-terminal proteolytic domain only in taxa ranging from Cyanobacteria to Endobacteria; this appears to be absent from Actinobacteria and archaebacteria and perhaps was lost when 20S proteasomes evolved. TPP peptidases are large proteins, like tricorn, but restricted to eukaryotes. BLAST indicates that their proteolytic domain is homologous to the much smaller subtilisin proteases of endobacteria and some negibacteria; the stronger hits to endobacteria fit the topology of Fig. 3; TPP could have evolved from a smaller posibacterial protease by adding a domain.

Membranome evolution: from negibacteria to posibacteria

For understanding cell evolution we must consider not only genomes but also evolution of the membranome: the set of different genetic membranes that make the cohering supramolecular framework for cell structure [56]. Bacteria fall into two very distinct subkingdoms with respect to cell envelope structure: Negibacteria, all with a double envelope with an outer membrane lying outside the cytoplasmic membrane, and Unibacteria in which the cytoplasmic membrane is typically the only membrane. Proteins of the cytoplasmic membrane are always bundles of α-helices and are inserted directly into it by the SecYE translocon. In most negibacteria outer membrane proteins (Omps) are never α-helix bundles, but almost always β-barrels, some of which form large hydrophilic pores in it, e.g. porins; Omps are translocated across the cytoplasmic membrane by SecYE and then insert specifically into the outer membrane. Of the 10 bacterial phyla (Table 1) only two (Archaebacteria, Posibacteria) are Unibacteria: the rest, which include the majority of bacteria, are all Negibacteria [1].

The most fundamental question about the origin of the first cell [30] is whether it had just one membrane, like most Posibacteria, as usually assumed, or two surface membranes like all negibacteria (most bacteria), as Blobel [57] and I [30,37] argued. The negibacterial double envelope is so complex that it must have arisen only once. I previously argued that the origin of the first cell is easier to understand in simple selectively advantageous stages if it was a negibacterium with two membranes [31]; that obcell theory simultaneously explained the origin of the genetic code, the first cell, and the negibacterial outer membrane. That detailed transition analysis from precellular life to the first cell will not be discussed again here, but the origin of the posibacterial cell wall can now be better understood than before because of advances in understanding its biogenesis and also in the wall and membrane structure of the Selenobacteria (see Table 2), which I shall argue are probably ancestral to all Posibacteria.

Fig. 6 contrasts cell envelope structure in posibacteria and negibacteria. In Posibacteria, except for the almost entirely parasitic Mollicutes (mycoplasmas and spiroplasmas, which lost murein walls) the murein peptidoglycan layers are very thick and are attached to the cytoplasmic membrane by covalently attached lipoproteins with their lipid tails embedded in the outer leaflet of the phospholipid bilayer. In negibacteria the murein is usually much thinner and attached instead to the outer membrane (OM) by covalently attached murein lipoproteins with their lipid tails embedded in the inner phospholipid leaflet of the OM lipid bilayer; unlike in mycoplasmas, lipoproteins are retained in negibacteria even when murein is lost (most Planctobacteria). In Chlorobacteria and Hadobacteria the outer leaflet of the OM bilayer is also simple phospholipid, but in all six other phyla it is lipopolysaccharide (within Spirochaetes a greatly modified version is present in Leptospira [58], whereas the obligately parasitic spirochaetes have totally lost it; a few proteobacteria have simplified it to lipooligosaccharide). Unlike the cytoplasmic membrane, the OM is pierced by hollow cylindrical β-barrel porin proteins that allow small molecules to diffuse freely across it [59]. At intervals the OM is in direct and strong adhesive contact with the inner membrane at points known as Bayer's patches where there is a hole in the thin murein wall. As OM proteins and lipids are all synthesized by enzymes associated with the inner, cytoplasmic membrane, they have to be transported to the OM secondarily, through the periplasm for proteins [59] and probably via the Bayer's patches for lipids. Posibacteria entirely lack both the OM and this transport machinery. The OM and Bayer's patch structure can have evolved only once in prokaryote history as its structural and biogenetic complexity is so great. 
Transition analysis asks: would it have been easier for a negibacterium to lose the OM (evolution from bottom to top in Fig. 6) and make its wall thicker, or for a posibacterium simultaneously to add an OM to a cell without one, make the wall thinner, invent machinery for export of both lipids and proteins to it, and make the proteins that would make this complex system function (evolution from top to bottom in Fig. 6)?

Figure 6. Contrasting cell envelope structure in posibacteria and negibacteria. OM phospholipids, and when present possibly also lipopolysaccharides (LPS), may pass from their site of synthesis in the cytoplasmic membrane to the OM at the Bayer's patch contact sites, but this is not proven and only one protein (Imp) needed for LPS export is yet known. During its biosynthesis murein is secreted across the cytoplasmic membrane by isoprenol carriers. Lipoprotein (LP) is cotranslationally synthesised in both groups. Conversion of a negibacterial wall to a posibacterial wall as shown would be very much simpler than the reverse, requiring only a mutation causing sudden murein hypertrophy that could have broken the OM away from the Bayer's patches, preventing further lipid transfer and OM regrowth, plus the origin of sortases with a novel recognition system for covalently attaching murein lipoproteins (MLP) to the wall. As the negibacteria most closely related to Posibacteria (Eurybacteria) are glycobacteria with much more complex OM, secretion, and import mechanisms than Chlorobacteria (which lack lipopolysaccharide, most porins, Omp85, type I, II, and III secretion machinery, and probably the LolDE lipoprotein release mechanism, of more advanced bacteria), evolution in the reverse direction of such a complex OM in one step from a posibacterium would be practically impossible (see text) and immensely more difficult than the stepwise increase in its complexity possible with a chlorobacterial root of the tree. As the transitional stage between negibacteria and posibacteria had flagella, adding an outer membrane to a posibacterium and evolving a lipid export mechanism in one step would be even more complicated and improbable, as flagellar biogenesis would have had to be conserved and modified at the same time (see Fig. 8). No satisfactory mechanistic explanation has ever been given of how it could possibly have occurred.

As negibacteria can evolve murein walls predominantly much thicker than usual (e.g. Deinococcus), while still retaining thinner Bayer's patch regions to allow the OM to grow, I argued that if such a negibacterium mutated its wall growth machinery so as suddenly to increase its thickness still more dramatically, the wall could overnight become so thick as to break away the OM from its attachments to the cytoplasmic membrane at the Bayer's patches [29,30]. Thereafter there would be no biophysical mechanism for newly made OM lipids to diffuse in a continuous bilayer to regenerate the lost OM, so the OM would be permanently lost and the cell could never reacquire one. Most OM proteins would become useless and their genes inevitably degenerate and be deleted. The new unimembranous bacterium was the first posibacterium – the ancestor of all Posibacteria and ultimately neomura also. Thus the initial step of the transition from a negibacterium to a posibacterium could have been very simple mechanistically; loss of the outer membrane could have occurred by a single mutation causing murein hypertrophy. Murein lipoproteins that originally linked the OM to the murein could be retained for linking the thicker wall instead to the cytoplasmic membrane and modified as necessary; a key modification would be the longer retention of the signal peptide to anchor them to the cytoplasmic membrane at least until after they were cross-linked to the murein. As discussed below, all posibacteria have related machinery for achieving this, which establishes their monophyly. In negibacteria the signal sequence must be cleaved after protein secretion to allow the lipoprotein to move to and diffuse within the OM bilayer (with the help of periplasmic chaperones [60]) prior to being cross-linked to murein.

Evolution in the opposite direction from a posibacterium would have required numerous mutations in at least dozens of genes to evolve a lipid, protein, and lipoprotein export machinery; as the closest negibacterial relatives of Endobacteria are Selenobacteria with the exceedingly complex lipopolysaccharide, this would also have had to evolve at that juncture! Of course, this machinery had to have evolved sometime. The key question is: was it mechanistically easier to do so suddenly by the saltatory addition of an extra membrane to a unibacterial cell? Or is it evolutionarily more understandable if it arose more gradually over many generations, and did so in three distinct stages: (1) forming a simple outer membrane with no lipopolysaccharide by differentiation between two pre-existing membranes as in the obcell theory of the origin of negibacteria to make the first Chlorobacteria [30,31], and (2) later becoming more complex by adding the Omp85 mechanism for inserting OM β-barrel proteins in the common ancestor of Hadobacteria and all other life-forms and (3) then evolving impermeable lipopolysaccharide and associated complex secretion/import machinery in the common ancestor of Cyanobacteria and all other life-forms (Fig. 3)? I have long considered a transition from posibacterium to negibacterium to be so difficult mechanistically as to be almost impossible in practice. Apparently only one person has ever tried to suggest how it might have happened: Dawes [61] suggested that the OM could have evolved from the forespore membrane that encloses the spores of typical endospore forming Gram-positives (Teichobacteria) prior to their germination. However this has never seemed plausible to me, as the engulfing forespore membrane could only have been retained as an OM if Bayer's patches and their lipid export machinery and porins all evolved in one cell generation; failing that, such a hopeful monster would immediately have lost the OM again. 
The problem is even greater than that, as the transitional intermediate between negibacteria and posibacteria must have had flagella (Fig. 3). So flagella that originally supposedly evolved in posibacteria would have had immediately to penetrate the saltatorily formed OM, which they now do with the help of a lipoprotein L-ring (see discussion below on flagellar origins). It is hard to accept that the negibacterial mechanisms for both OM and flagellar biogenesis, including a key change in the mechanism of lipoprotein secretion, evolved saltatorily in a single cell generation. Therefore I have long rejected the widespread assumption that unibacteria are ancestral to negibacteria [1,5,29-31,56,62,63]. None of the thousands of implicit supporters of that majority view has ever tried to explain how such an exceedingly improbable transition might have occurred. The onus is on them to do so if they wish to continue to hold that view despite the extensive contrary arguments. Can Dawes' theoretically possible speculation be converted into an evolutionarily acceptable theory? I strongly doubt it; to me it is no more plausible than the other idea he discussed, that the nuclear envelope also evolved from the two forespore/spore membranes, which nobody accepts.
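The comparison made in the preceding paragraphs can be sketched as a simple tally of the novelties each direction of change would require. This is only an illustrative bookkeeping aid, not a method used in this paper: the two lists paraphrase items named in the text and are not exhaustive, and the function name and the equal weighting of every item are assumptions made for the sketch.

```python
# Toy tally of the novelties needed in each direction of the
# negibacterium/posibacterium transition. Items paraphrase the text;
# weighting every item equally is an assumption of this sketch.

NEGI_TO_POSI = [
    "murein hypertrophy mutation (breaks OM away from Bayer's patches)",
    "sortase-based attachment of lipoproteins to the thicker wall",
]

POSI_TO_NEGI = [
    "outer membrane added in one cell generation",
    "Bayer's patches",
    "lipid export machinery",
    "OM protein export machinery and porins",
    "changed mechanism of lipoprotein secretion",
    "lipopolysaccharide",
    "flagellar L-ring to penetrate the new OM",
]


def simpler_direction(gains_a, gains_b, label_a, label_b):
    """Return the label of the direction needing fewer novelties, or None on a tie."""
    if len(gains_a) == len(gains_b):
        return None
    return label_a if len(gains_a) < len(gains_b) else label_b


if __name__ == "__main__":
    winner = simpler_direction(
        NEGI_TO_POSI, POSI_TO_NEGI,
        "negibacterium -> posibacterium",
        "posibacterium -> negibacterium",
    )
    print(len(NEGI_TO_POSI), "vs", len(POSI_TO_NEGI), "novelties; simpler:", winner)
```

A raw count of this kind ignores the central point of the argument, which is that the posibacterium-to-negibacterium novelties would have to arise simultaneously, whereas the negibacterial envelope could have increased in complexity stepwise from a chlorobacterial root.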

However one aspect of his theory does seem correct: this is that the transition between negibacteria and posibacteria almost certainly occurred in an endospore-forming bacterium. This is strongly indicated by the fact that Selenobacteria (phylum Eurybacteria: Tables 1, 2; Fig. 7) have a fairly typical Gram-negative envelope with an outer membrane and thin sacculus [64], yet have endospores that are indistinguishable from those of Endobacteria [65-67]. Furthermore they strongly group with and appear to be paraphyletic to Endobacteria on rRNA and protein trees. Thus there is little doubt that the endospore-forming negibacterial Selenobacteria are specifically related to the posibacterial Endobacteria and probably also ancestral to them, in which case the transition did occur in the direction from negibacteria to posibacteria, shown in Figs 2, 3, 5 and 7 and first argued in detail two decades ago [29,30]. Woese, indeed, suggested this for Endobacteria only [68], but neither he nor others have yet accepted that it is true also for actinobacteria [29], as they do not branch with Endobacteria on rRNA trees, though sometimes they do so with Endobacteria plus Selenobacteria. (The Selenobacteria/Endobacteria branch is generally called the 'low-GC Gram positives', but this is very misleading in cell biological terms as Selenobacteria have typical negibacterial walls and negative or very weakly positive Gram-staining; moreover not all members of this branch are low in GC.) The negibacterial envelope ultrastructure of Selenobacteria such as the heterotrophic Selenomonas, Sporomusa, and the phototrophic Heliobacteria has been known for some time, which led me to exclude them from Posibacteria and to group them in the phylum Eurybacteria [69] with Fusobacterium, which also has a Gram-negative envelope, but unlike the others lacks flagella. 
Later however, through doubt whether their outer membranes were really related to those of negibacteria, I more conservatively included them and Thermotogales in Posibacteria [1]. Advances in envelope chemistry of Selenomonas clearly show that it is a genuine negibacterium, though with significant differences from other negibacteria, e.g. incorporation of cadaverine in its murein and absence of the Braun lipoprotein [64]. Genomic evidence discussed below for Thermotoga likewise indicates that its toga is a highly modified OM, so I now exclude it also from Posibacteria. The fact that its toga balloons away from the cell surface may be a consequence of its radically modified peptidoglycan [70] preventing murein lipoproteins from attaching it closely.

Figure 7. The rooted tree of life emphasizing key novelties and synapomorphies. Thumbnail sketches show major variants in cell morphology (microtubular skeleton red; peptidoglycan wall brown; outer membrane blue). The most likely root position is as shown; the possibility that it may lie within Chlorobacteria instead cannot yet be ruled out. Lowest level groups including or consisting entirely of photosynthetic organisms are in green or purple. The frequently misplaced hyperthermophilic eubacteria are in red; indel analysis confirms that Aquifex is a very divergent proteobacterium [79]. The new negibacterial infrakingdom Gracilicutes segregates four phyla from the other negibacteria. Planctobacteria probably lost or reduced murein twice, as free-living Verrucomicrobia have murein. Note that 12 synapomorphies support the earliest branching of Chlorobacteria. The fact that mitochondria were present in the cenancestral eukaryote and that their ancestors, α-proteobacteria, are a relatively recently derived subgroup of the eubacterial phylum Proteobacteria, proves that eubacteria must be significantly older than eukaryotes and decisively refutes suggestions that eubacteria may be derived from eukaryotes. As α-proteobacteria are nowhere near the root of the tree (irrespective of whether it is rooted beside or within chlorobacteria or, as some mistakenly think, between neomura and eubacteria), eukaryotes are substantially younger. The age of ~900 My for eukaryotes is based on a recent Bayesian analysis of 143 proteins multiply calibrated from the fossil record [35] and my own critical interpretation of the direct fossil record [129]. 
This tree, though constructed from rare discrete cladistic characters, is remarkably similar to a 31-protein, 191 species universal sequence tree published while this paper was being reviewed [175]; see responses to comments by referee 3 for discussion of the few differences, all but one (the position of Aquifex) in regions poorly supported on the sequence tree.

Thus the above evidence establishes the origin of Endobacteria from Selenobacteria, but what were the ancestors of Actinobacteria? Are they sisters of or derived from endobacteria, despite not grouping with them on many sequence trees, or did they evolve independently from a separate group of negibacteria? Clearly the proposed mechanism of OM loss by murein hypertrophy is mechanistically sufficiently simple that it might in principle have happened twice. However, as murein hypertrophy is likely also to disrupt cell division, viability would probably need to be simultaneously maintained by independent mutations in the septation machinery. Thus, although murein hypertrophy offers a biophysically very plausible mechanism for OM loss, the generation thus of fully viable offspring could have been evolutionarily very difficult (though much less so than a hypothetical transition from posibacterium to negibacterium) and thus relatively late and unique in history. One major feature of the biogenesis of the similarly thick walls of actinobacteria and endobacteria strongly favours a common origin. This is the possession of a universal mechanism for covalent anchoring of surface proteins to the cell wall of Posibacteria. This requires sortase enzymes, which are extracellular transpeptidases positioned in the cytoplasmic membrane. Surface protein precursors that enter the secretory pathway via N-terminal signal peptides have specific C-terminal sorting signals with an LPXTG motif or related recognition sequences, which stimulate sortase-mediated cleavage and the covalent attachment of their C-terminal end to murein peptidoglycan cross-bridges. Genomes of all Posibacteria encode multiple sortase genes, which have diversified to use multiple different substrate classes with different sorting signal motif sequences, and are involved in anchoring a diverse array of structures, including pili on the posibacterial surface [71]. 
Sortase diversity is greatest in Endobacteria, which have four different sortase classes; Actinobacteria have only two sortase classes, one shared with Endobacteria [71]. The chlorobacterium Chloroflexus has one protein with an N-terminal region homologous with sortases, but there is no evidence that it acts as a sortase. A small subclade of proteobacteria has one sortase-related enzyme, which does not fall into any of the five posibacterial sortase paralogue classes, plus a few proteins with a putative sortase recognition motif; it may be a very divergent sortase, but biochemical evidence for such a role is wanting [71]. Homologues of sortases are otherwise entirely unknown from negibacteria despite scores of complete genomes being now available.
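The C-terminal LPXTG sorting signal mentioned above lends itself to a simple pattern search. The following is a minimal sketch, assuming only the canonical motif (X being any residue); it does not cover the "related recognition sequences" used by other sortase classes, and the function name, the 50-residue tail window, and the toy sequence are all hypothetical choices made for illustration.

```python
import re

# Canonical sorting-signal motif: Leu-Pro-X-Thr-Gly, X = any residue.
# Variant motifs recognised by other sortase classes are NOT covered here.
LPXTG = re.compile(r"LP.TG")


def find_sorting_signals(protein_seq, tail_length=50):
    """Return (position, motif) pairs for LPXTG matches in the C-terminal tail.

    Positions are 0-based indices into the full sequence. The tail_length
    default is an arbitrary illustrative window, not a published threshold.
    """
    seq = protein_seq.upper()
    offset = max(0, len(seq) - tail_length)
    tail = seq[offset:]
    return [(offset + m.start(), m.group()) for m in LPXTG.finditer(tail)]


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real protein.
    toy = "MKKTAIAIAVALAGFATVAQA" + "S" * 30 + "LPETGEESNAGLLAAGLLLLGAARRKRK"
    print(find_sorting_signals(toy))
```

A motif hit alone is weak evidence: as noted above for the proteobacterial proteins with a putative sortase recognition motif, biochemical confirmation is still needed before inferring sortase-mediated anchoring.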

Phylogenetic analysis is needed to see whether the isolated proteobacterial sortase-like proteins could have been acquired by lateral transfer or are rare relics of a negibacterial ancestor of the posibacterial sortase family. Present evidence suggests that a major diversification of sortase enzymes, and possibly even the origin of the whole sortase-based protein attachment mechanism, took place in a common ancestor of Actinobacteria and Endobacteria at the time when murein thickening eliminated the outer membrane. Sortase family 3 [71] is a clear synapomorphy for Posibacteria. Overall the evidence is consistent with the view that Actinobacteria are either sisters of or derived from Endobacteria and that their failure to group together on most sequence trees is a phylogenetic artefact.

Although I have argued above (and present even more compelling arguments below based on flagellar evolution) that posibacteria evolved from negibacteria, it is important to note that it is not quite so evolutionarily difficult as I once thought [29,30] to add some kind of extra outer lipid membrane to a basically unimembranous cell. This is shown both by the case of the archaebacterium Ignicoccus discussed in a later section and by the presence in mycobacteria and corynebacteria (which form a related subgroup of actinobacteria) of a unique outer lipid layer. Although its structure is less well known than is the much simpler OM of negibacteria, it is clear that this 'mycomembrane' is chemically and structurally utterly different from the negibacterial OM and has evolved independently [72]. The lipids are not phospholipids but mycolates or corynomycolates [73,74] and the lipid layers are thicker. Polysaccharides are abundant outside it as are lipopolysaccharides, but these are chemically unrelated to and should not be confused with those of negibacteria. The major cell wall carbohydrates are arabinogalactans [73], as in plants. Although the protein channels that allow nutrient uptake through this thick impermeable layer are misleadingly called porins they are unrelated to the porins of Negibacteria in sequence or structure [75]. The most abundant mycobacterial porin MspA has a much longer cylindrical pore than negibacterial porins and no clear protein relatives in any other group [76,77]. Far from weakening the contrast between negibacteria and posibacteria, the existence of the non-homologous mycomembrane, which clearly evolved in response to the same selective pressures for impermeability as the negibacterial lipopolysaccharide (during the origin of glycobacteria – see Fig 5 legend – as discussed in detail in a later section) shows that such selective pressures do not necessarily produce a membrane with the specific properties of the glycobacterial/negibacterial OM. 
Thus the fact that all negibacteria except Chlorobacteria have a common mechanism for targeting their OM β-barrel proteins (discussed in detail later), which is not found in these actinobacteria or Ignicoccus is very strong evidence for their monophyly. What the existence of the mycomembrane does mean, however, is that arguments for the ancestral character of the negibacterial envelope must rest on the new polarizations within the tree discussed in this paper, not on the original argument based on the difficulty of adding a second membrane [29,30]. Thus it is not the number of membranes per se that is important but their structure and biogenetic mechanisms; this allows us easily to distinguish homology (within negibacteria) and analogous convergence (Ignicoccus and mycobacteria/corynebacteria). Despite such superficially similar convergence the distinction between negibacteria and posibacteria remains fundamental.

Monophyly of Posibacteria

The monophyly of Posibacteria plus Eurybacteria is weakly shown by some 16S rRNA and protein trees, but is often absent from single or multigene protein trees (but seldom more than weakly or moderately contradicted); commonly it is broken by a usually weak association of Cyanobacteria and Actinobacteria, which seems devoid of biological rationale but may reflect base-compositional similarities. The failure of posibacteria to form a clade on multigene trees might be taken as evidence that they each independently lost the outer membrane, but the shared sortase mechanism for covalently attaching lipoproteins to their thick murein walls discussed above renders this highly improbable. It seems more likely that Actinobacteria are excluded from the Endobacteria/Eurybacteria (usually misleadingly called the low GC Gram-positives) by their exceptionally high GC content. Perhaps also a systematically elevated rate of molecular evolution may draw them towards their true relatives, the neomura, which have very long branches on trees for all molecules drastically modified during the neomuran revolution [1]. I see no other way of reconciling the compelling evidence from cell wall and proteasome evolution with most sequence trees. If Endobacteria and Actinobacteria diverged almost immediately after the origin of posibacteria, such biases would probably overwhelm any historical signal for their relationship, a phenomenon known in eukaryotes [2,46] – if the bias is sufficiently strong, this artefact could even happen if Endobacteria are paraphyletic ancestors of actinobacteria, which might therefore be substantially younger than many sequence trees suggest; the contradictory branching order among the three glidobacterial phyla, Endobacteria/Eurybacteria, and Actinobacteria seen in different single and multigene trees is another reason for not taking any of them too seriously. 
Exoflagella without an L-ring [78] for binding the OM are a synapomorphy for Posibacteria, as are their single envelope membrane of acyl ester lipids and the sortase machinery.

Substantial shared deletions in chaperones Hsp90 and Hsp70 of all Posibacteria compared with all negibacteria except Eurybacteria, e.g. Fusobacterium and Thermotoga [79] suggest that Endobacteria, Actinobacteria, and Eurybacteria are all related (Fig. 8) and that sequence trees that group either Endobacteria or Actinobacteria with cyanobacteria or eobacteria that lack these deletions are artifactual. Actinobacteria and Endobacteria are also the only bacteria with resistant resting spores, except for myxobacteria; sporulation and spore germination programmes are so complex that spores probably evolved once only in their common ancestor [1]. These developmental programmes of endobacteria and actinobacteria should be compared in detail to check that they are synapomorphic for Posibacteria. That actinobacterial spores are exospores and less resistant than endobacterial endospores does not preclude a direct relationship; in fungi basidiomycete exospores and ascomycete endospores ultimately had a common origin. If posibacterial endospores and exospores are related, endospores must be the ancestral state, as the unique mode of origin of the endospore by a forespore cell engulfing its sister evolved prior to the divergence of the eurybacterial Selenobacteria and Endobacteria, and on sequence trees Endobacteria nest within Selenobacteria. This developmental mechanism by forespore engulfment is so complex that it is unlikely to have arisen convergently. As endospore formation existed prior to the loss of the OM and origin of the sortase machinery by the ancestral posibacterium, actinobacterial exospores must either have been derived from endobacterial endospores or, less likely, evolved independently.

Figure 8. Schematic comparison of the three different basal body structures of eubacterial flagella with the putative ancestral junctional pore complex and the related type III secretion injector. The exoflagella of Proteobacteria and Planctobacteria (Exoflagellata), Sphingobacteria, and Eurybacteria project through the outer membrane, with which they are associated by a lipoprotein L-ring (made of FlgH protein units). Spirochaetes have endoflagella within the periplasmic space that do not penetrate the outer membrane and thus need no L-ring. Exoflagella and spirochaete endoflagella both have a P-ring (made of FlgI protein units) thought to act as a bushing for free rotation within the thin peptidoglycan wall (sacculus). Both P-ring and L-ring are absent from the exoflagella of Posibacteria (Actinobacteria and Endobacteria). Posibacterial flagella would automatically have become external when the ancestral outer membrane was lost. The more complex multiprotein shaft of spirochaetes, clearly a derived character (see text) is shown by its greater thickness. If junctional pore complexes also use a basal type III secretion apparatus, flagella and type III injectors probably evolved from them independently. If junctional pore complexes lack type III secretion homologues, it is likely that they evolved during the origin of flagella only and that type III injectors evolved later in the ancestral exoflagellate by simplification of flagella (dashed arrow); see text for discussion. The diagram assumes that ExbB/TonB/OmpA only associated with the basal body of the flagella and evolved into the flagellar stator MotAB during the origin of flagella.

The recent demonstration that the earliest-diverging actinobacterium, the filamentous Symbiobacterium, has endospores and highest BLAST hits to Endobacteria for nearly half its conserved proteins [51] has now demolished the classical distinction between Endobacteria and Actinobacteria [1,49]. Symbiobacterium is a high-GC Gram-positive with posibacterial envelope structure [50]. Unless its strong grouping with Actinobacteria on 16S rRNA trees