The evolutionary origin of the eukaryotic cell represents an enigmatic, yet largely incomplete, puzzle. Several mutually incompatible scenarios have been proposed to explain how the eukaryotic domain of life could have emerged. To date, convincing evidence for these scenarios in the form of intermediate stages of the proposed eukaryogenesis trajectories is lacking, presenting the emergence of the complex features of the eukaryotic cell as an evolutionary deus ex machina. However, recent advances in the field of phylogenomics have started to lend support to a model that places a cellular fusion event at the basis of the origin of eukaryotes (symbiogenesis), involving the merger of an as yet unknown archaeal lineage that most probably belongs to the recently proposed ‘TACK superphylum’ (comprising Thaumarchaeota, Aigarchaeota, Crenarchaeota and Korarchaeota) with an alphaproteobacterium (the protomitochondrion). Interestingly, an increasing number of so-called ESPs (eukaryotic signature proteins) are being discovered in recently sequenced archaeal genomes, indicating that the archaeal ancestor of the eukaryotic cell might have been more eukaryotic in nature than presumed previously, and might, for example, have possessed primitive phagocytotic capabilities. In the present paper, we review the evolutionary transition from archaeon to eukaryote, and propose a new model for the emergence of the eukaryotic cell, the ‘PhAT (phagocytosing archaeon theory)’, which explains the emergence of the cellular and genomic features of eukaryotes in the light of a transiently complex phagocytosing archaeon.
The evolutionary origin of the eukaryotic cell represents an enigmatic, yet largely incomplete, puzzle. Despite the large number of mutually incompatible scenarios that have been proposed to explain its origin (for a recent overview, see ), some consensus has been reached about the following points: first, the LECA (last eukaryotic common ancestor) most probably contained the mitochondrial progenitor derived from endosymbiosis with an alphaproteobacterium; secondly, eukaryotic genomes have a chimaeric nature: genes for information storage and processing are archaea-related, and genes for metabolic or ‘operational’ processes are mostly bacterial in nature (but not necessarily derived from the mitochondrial progenitor); and finally, a significant fraction of the eukaryotic genes encode proteins that are restricted to eukaryotes, the so-called ESPs (eukaryotic signature proteins). Beyond this, the picture gets blurry. Currently, two major questions are of interest: (i) what was the nature of the host that took up the alphaproteobacterium, and (ii) when did eukaryotic complexity arise, before (‘mitochondria-late’) or after (‘mitochondria-early’) the endosymbiotic event that led to the establishment of the mitochondrion? Mitochondria-late models, of which the Archezoa model is the main protagonist, argue that eukaryotes gradually evolved from a lineage devoid of mitochondria (the proposed Archezoa) and eventually acquired mitochondria (Figures 1A and 1B). Hence such scenarios are compatible with the classical three-pronged classification of the domains of life. In contrast, mitochondria-early models argue that the eukaryotic lineage emerged from a symbiosis between an archaeon and a bacterium (the mitochondrial ancestor), and that this ‘symbiogenesis/fusion’ subsequently triggered the evolution of typical eukaryotic features [5,6] (Figures 1C and 1D).
Figure 1. The evolutionary relationships between the three domains of life
Although both models provide a fairly complete explanation, a smoking gun is missing: cellular intermediates that would support either scenario are thus far unknown to biology. The previously presumed amitochondriate eukaryotes (‘archezoans’) were shown to have contained mitochondria in their evolutionary past [5,8]. Conversely, evidence for the existence of intermediate cellular entities that would support fusion-like scenarios is also lacking. Obviously, in the absence of intermediate stages of eukaryotic evolution, the gap between extant prokaryotic and eukaryotic cell types remains puzzlingly large, and we can at best guess about the forces that supposedly triggered the emergence of the cellular complexity that is characteristic of eukaryotes.
Despite apparent methodological complications, the most recent efforts to shed light on this issue from a phylogenomics point of view seemingly lend support to symbiogenesis/fusion-like scenarios at the base of the eukaryotic origin. These studies are largely consistent with an extended version of the eocyte theory [10,11], which suggests that eukaryotes emerged from within the archaeal domain of life [12,13] or, more specifically, from within the recently proposed ‘TACK superphylum’ (comprising Thaumarchaeota, Aigarchaeota, Crenarchaeota and Korarchaeota) [14,15]. This idea is supported further by the fact that members of this archaeal group uniquely share a number of presumed ESPs, including bona fide orthologues of actin, tubulin, ESCRT (endosomal sorting complex required for transport) proteins [18–20] and ubiquitin protein modifier system components, as well as several eukaryote-type transcription and translation factors (see  for an overview).
Another argument in favour of such a scenario comes from an energetic viewpoint: the acquisition of mitochondria is believed to have released the eukaryotic progenitor from the bioenergetic constraints to which bacterial and archaeal cells are subjected, and to have allowed for an immense expansion in the number of genes, and hence for the emergence of cellular complexity. In the present paper, we elaborate on these theories, and integrate recent insights to propose a new variant of the fusion model to explain the emergence of the eukaryotic cell. Yet, before we do so, we first zoom in on the genomic differences between prokaryotes and eukaryotes, as these might reveal clues about the processes that occurred during the evolutionary ‘dark ages’ of the eukaryotic cell.
The evolutionary ‘dark ages’ of the eukaryotic cell
On the basis of comparative analyses of genome sequence data that have become available during the last decade, it has become increasingly clear that the evolutionary gap between prokaryotes and eukaryotes remains extremely large. Irrespective of the available hypotheses to explain this observation, the LECA is inferred to have been highly complex and to have contained most of the typical eukaryotic features, which, given the lack of evidence for pre-existing intermediate eukaryotic life forms, seem to have arisen within a relatively short timeframe. Although little is known about the period during which the prokaryote-to-eukaryote transition took place, it is clear that three evolutionary forces, i.e. gene duplication, HGT (horizontal gene transfer) and gene genesis, have played a major role in the emergence of eukaryotes. In this section, we briefly discuss these forces.
Massive gene duplication at the basis of the eukaryotic emergence
Gene duplication is widely considered to be a crucial mechanism for evolutionary innovation. The existence of a large number of pan-eukaryotic paralogues suggests that gene duplication has been a major driving force in eukaryotic evolution. For example, whereas many components of the eukaryotic core machinery are encoded by two or more gene copies, their prokaryotic counterparts are mostly encoded by a single gene. A detailed reconstruction of eukaryotic genome content has yielded three important lessons. First, the ancestral eukaryotic gene repertoire seems to have approximately doubled in size before the onset of the major eukaryotic radiations. Secondly, gene duplication seems to have played a primordial role in the emergence of eukaryotic features, as functional classes that are affiliated with ‘typical’ eukaryotic processes (protein fate, superstructure-forming proteins, etc.) have been subjected to extensive duplications. Thirdly, a significant part of the paralogous gene content of the ancestral eukaryote seems to be a result of lateral gene transfer, and was acquired, at least in part, via the endosymbiosis that gave rise to the mitochondrion (see below). A telling example of how duplication has shaped eukaryotic gene content is provided by the evolution of the Ras superfamily of small GTPases, which play pivotal roles in the process of cellular compartmentalization in eukaryotes [25,26]. This superfamily is characterized by several ancient duplication and gene-transfer events. The latter are exemplified by the Rho family of GTPases, which are ubiquitous regulators of phagocytosis in eukaryotes, and which most probably have a bacterial origin.
The origin of bacterial genes in eukaryotes
In addition to gene duplication, HGT events have played a major role in the emergence of the eukaryotic cell, as is evident from the significant number of genes of clearly bacterial origin that reside in eukaryotic genomes. On the basis of phylogenetic studies, part of these bacterial genes can be readily attributed to the alphaproteobacterial endosymbiont from which mitochondria evolved. Yet, the origin of the remaining set of bacterial genes in eukaryotic genomes, which seemingly lacks phylogenetic coherence, is a matter of debate. Whereas some argue that these genes are the result of additional gene-transfer events, for example, by means of additional transient endosymbiotic partners, or a result of gene influx via predation or parasitism, others envisage that HGT between prokaryotic lineages before, and after [1,32], the origin of mitochondria can readily explain the mosaic nature of bacterial genes in eukaryotes. Hence, whereas both lines of thought explicitly imply a major role for HGT in eukaryotic evolution, the main dispute centres on the question of whether it was a single event (the mitochondrial endosymbiosis) or multiple events that gave rise to these bacterial genes. Importantly, these scenarios are not mutually exclusive, and it is therefore tempting to think that each of them might have played its part in the emergence of eukaryotes.
Eukaryotic innovations at the basis of eukaryogenesis
Eukaryotes have evolved cellular structures and systems that lack a functional counterpart in prokaryotes. Examples of these include the characteristic endomembrane systems in eukaryotes, such as the nucleus, endoplasmic reticulum and Golgi apparatus, and the existence of a cytoskeleton that facilitates higher-order processes such as cellular motility, phagocytosis and vesicular trafficking. The proteins that are involved in these processes are dominated (although not exclusively, e.g. [27,33]) by ESPs, which generally comprise up to one-third of the gene content of extant eukaryotes. Whereas in the past it has been proposed that such proteins might have been contributed by an extinct donor lineage, this is not likely, as ESPs are involved in a wide variety of eukaryotic processes. Moreover, ESPs are continually being discovered in recently sequenced prokaryotic genomes.
Rather, the emergence of ESPs has been proposed to be the result of molecular innovation events that can be classified into three categories: (i) reuse of prokaryotic proteins and domains for the same biochemical function, but in a different context; (ii) emergence of new biochemical functions and protein superfamilies, but within existing protein folds; and (iii) domains with bona fide new folds, ‘invented’ during the early stages of eukaryotic evolution. The first category includes proteins for which prokaryotic homologues can be readily detected using sensitive sequence or structural homology detection algorithms, and is characterized by a drastic acceleration of sequence divergence. For the third category, it was found that many new domains had either α-helical or metal-chelate-supported structures. Because such structures are less dependent on complex hydrogen-bonding for their stability, it was hypothesized that domains that harbour them are more amenable to the evolution of new folds.
The ‘PhAT’ (phagocytosing archaeon theory) scenario for the origin of eukaryotic complexity
As outlined above, the lack of information about the early stages of eukaryotic evolution currently precludes the rationalization of detailed hypotheses to explain the emergence of the eukaryotic cell in its full complexity. Yet, in the light of continuous discovery of new pieces of this enigmatic puzzle, a revision of previous hypotheses seems justified. In this section, we propose the PhAT for the emergence of the eukaryotic cell, which differs from previous hypotheses in that it involves the emergence of a relatively sophisticated archaeal lineage that stands at the basis of the process of eukaryogenesis. Our hypothesis provides: (i) a rational explanation for the highly mosaic prokaryotic genome content of extant eukaryotes, (ii) an explanation for the significant evolutionary distance between archaeal and eukaryotic orthologous genes as opposed to that of Alphaproteobacteria and mitochondria, and (iii) an alternative explanation for the origin of the nucleus. The hypothesis can be broken down into a number of stages (Figure 2) as follows.
Figure 2. The PhAT scenario for the origin of the eukaryotic cell
1. A transient TACK archaeon
The starting point of the envisioned scenario is an ancestral archaeal lineage, probably belonging to the recently proposed ‘TACK superphylum’  that is supposed to have contained the full collection of ESPs that have currently been identified in archaea, such as actin [16,27], tubulin , the ubiquitin protein modifier machinery  and several proteins involved in transcription and translation. It is possible that this archaeal lineage was only transient .
2. Loss of cell wall
Subsequently, this ancient TACK lineage is envisaged to have lost its proteinaceous cell wall, allowing for the evolution of a more flexible actin-based cytoskeleton, as observed in plasma-like organisms. Ancient duplications of the archaeal actin gene could have been the basis of the emergence of this flexibility [27,37], allowing for the formation of branched actin polymers. Such actin-based structures could have facilitated the formation of cellular protrusions.
3. Primitive phagocytosis causing rampant HGT
Next, the cytoskeleton matures into a primitive phagocytosis machinery. Critically, the primordial uptake and digestion of other prokaryotic cells exposes the phagocytosing archaeon to increasing amounts of genomic DNA from the ingested cells. This would have resulted in rampant HGT to the archaeal genome, causing the emergence of a highly mosaic host genome. As a consequence of the high prevalence of HGT, the host genome destabilizes, for example due to detrimental gene dosage effects, causing the genetic material to evolve at accelerated rates. Moreover, this ephemeral elevation of the rate of sequence evolution also explains why the similarity between eukaryotic and archaeal genes is generally weak compared with that between alphaproteobacterial and mitochondrial genes, as the latter have not been exposed to such rate acceleration.
4. Evolution of a primitive nucleus to protect genomic integrity
In order to protect the genetic integrity of the host cell genome from further phagocytosed genetic material, a protective membrane boundary is formed via invagination events, giving rise to a primitive karyotic cell type. The protective membrane structure stabilizes the host genome and restores the evolutionary rates to normal. Now that phagocytosis has become an integrated part of its lifestyle, the organism continues to take up prokaryotic cells. One particular cell, probably an ancient alphaproteobacterium, is phagocytosed, but not digested, and establishes an endosymbiotic interaction with the host cell, similar to those that are often observed in microbial eukaryotes (e.g. amoebae or ciliates). Conceivably, the symbiotic interaction was already established before the ingestion of this protomitochondrial cell type.
5. Energy-induced genome expansion and innovation
Finally, the alphaproteobacterial endosymbiont reductively evolves into an ATP-generating organelle: the mitochondrion. In contrast with genes from ingested cells, the stable endosymbiotic interaction between the mitochondrion and host cell allows organelle–host gene transfer events. The newly provided surplus of energy allows the host cell to evolve cellular complexity , such as the maturation of the nucleus and additional endomembrane systems. Released from the evolutionary pressure to retain a small compact genome, the host genome is allowed to expand. This relaxed genomic stringency allows for genomic rearrangements and results in a playground for genomic innovation via recombination of existing genes, gene duplications and gene genesis events, allowing for the expansion of existing protein families and for the emergence of novel protein folds. In addition, with energy production now taking place at the mitochondrial membranes rather than the outer membrane, the volume of the cell is no longer restricted to typical prokaryotic dimensions.
In the present paper, we present a scenario (PhAT) for the origin of the eukaryotic cell, which has a primitive phagocytosing archaeon at its basis. Although this idea has been suggested before [27,40], PhAT represents the first explicit model that implements archaeal phagocytosis at the basis of the process of eukaryogenesis, as it provides an explanation for the origin of the nucleus and mitochondria, as well as for the mosaic bacterial gene content in eukaryotes. Although thought-provoking, the current model is as plausible (or implausible) as any other theory for the emergence of a eukaryotic cell that has been proposed in the past with respect to the lack of supporting evidence for intermediate eukaryotic life forms.
Whereas phylogenomic support for a fusion-like origin of the eukaryotic cell is growing steadily [12–15], the mosaic bacterial gene content in eukaryotic genomes is seemingly at odds with such models. In the past, this anomaly has been explained: (i) by the existence of multiple endosymbiotic partners, each of which contributed genetic material to the host cell [2,41], or (ii) by HGT among prokaryotes before, and after [1,32], the single ancient acquisition of the alphaproteobacterial ancestor of mitochondria. Although we acknowledge that HGT is a major driving force in prokaryotic genome evolution, we do not believe that the bacterial ‘pot-pourri’ of genes in eukaryotic genomes can be explained by a single ancient acquisition event (i.e. the mitochondrial ancestor). Indeed, differential losses notwithstanding, approximately 20% of ancestral eukaryotic genes present in LECA show signatures of multiple bacterial and archaeal origins. For example, several eukaryotic genes involved in key functions, such as glycolysis, central anaerobic metabolism and mitochondrial translation, do not display phylogenetic affiliation with currently sequenced Alphaproteobacteria. Hence, rather than invoking a single acquisition event, we propose that a significant part of these bacterial genes originate from phagocytotic ingestion of prokaryotes, similar to the ‘you are what you eat’ concept proposed by Doolittle. It should be noted, however, that in Doolittle's scenario, the phagocytosing cell was envisaged to be a fully fledged eukaryote, rather than an archaeon as in the PhAT scenario.
Traditionally, the origin of the nucleus has been explained either via an endosymbiosis event, such as in certain variants of the eocyte theory, or via membrane invagination events (‘autogenous karyogenesis’). The best-known autogenous model is the endokaryotic hypothesis, which envisages that the nucleus was formed to decouple the process of transcription from translation as a defence mechanism against an invasion of self-splicing introns. Under the PhAT scenario, we propose another variant of the autogenous model, which entails the emergence of the nucleus as a defence mechanism against phagocytosis-induced HGT. Prokaryotes have evolved several mechanisms both to promote (e.g. conjugation, gene transfer agents, DNA-containing vesicles) and to counter [CRISPR (clustered regularly interspaced short palindromic repeats) and restriction-modification systems] the exchange of genetic material. Apart from the fact that the known efficacy of CRISPR-based systems is thus far restricted to defence against viral and plasmid sequences, we envisage that the available defence systems were insufficient to protect the host cell against the destabilizing effects of phagocytosis-induced HGT, prompting the emergence of an alternative mechanism. Indeed, the fact that eukaryotic genomes do not encode these prokaryotic defence systems seems to support this idea, as does the notion that rates of HGT from prokaryotes to eukaryotes are much lower than those between prokaryotes. Archaeal genomes, in particular those of members of the TACK superphylum, encode a number of eukaryotic proteins that have the ability to affect membrane bending, and we envisage that such proteins have been instrumental in the initial formation of nucleus-like compartments.
Although seemingly far-fetched, the formation of ‘higher-order’ intracellular membrane systems has occurred several times independently in evolution, and is not restricted to eukaryotes: several prokaryotic lineages, including cyanobacteria (thylakoid membranes), members of the Planctomycetes–Verrucomicrobia–Chlamydiae superphylum (possibly also including Poribacteria), and, most recently, Archaea, are known to contain membrane structures, each equipped with specific functions. Therefore the emergence of the nucleus as a protective barrier against excessive HGT in the primitive phagocytosing archaeon should not be deemed implausible on these grounds. Moreover, despite its costs, phagocytosis is evidently a highly profitable activity, as shown by the plethora of phagocytic protists that exist in Nature.
The evolutionary transition from prokaryote to eukaryote has been a complex journey in which non-tree-like processes have probably played a decisive role [22,36]. Unfortunately, we currently lack the resolution that is needed to distinguish the processes that have been instrumental in this transition from the ephemeral ones. Still, with the recent advances in next-generation sequencing technologies, as well as in high-throughput techniques that allow for the genomic exploration of the vast uncharacterized microbial diversity (e.g. single-cell genomics, metagenomics), the coming years will see a vast expansion of new and exciting genomic data that will help us obtain this resolution. In addition, the genomic characterization of presumed intermediates in eukaryotic evolution, such as the recently discovered elusive ‘Myojin parakaryote’, will undoubtedly shed new light on the dark ages of the eukaryotic cell.
Molecular Biology of Archaea 3: An Independent Meeting held at the Max Planck Institute for Terrestrial Microbiology, Marburg, Germany, 2–4 July 2012. Organized and Edited by Sonja-Verena Albers (Max Planck Institute for Terrestrial Microbiology, Germany), Bettina Siebers (University of Duisburg-Essen, Germany) and Finn Werner (University College London, U.K.).
We apologize to those authors whose work we could not refer to owing to space constraints.
The work in T.E.'s laboratory is supported by the Swedish Research Council [grant number 621-2009-4813], and by European Research Council (ERC) [grant number 310039-PUZZLE_CELL] and Marie Curie European Reintegration Grant (ERG) [grant number 268259-RICKOCHET] grants from the European Union.