Archaeal viruses, or archaeoviruses, display a wide range of virion morphotypes. Whereas the majority of those morphotypes are unique to archaeal viruses, some are more widely distributed across different cellular domains. Tailed double-stranded DNA archaeoviruses are remarkably similar to viruses of the same morphology (order Caudovirales) that infect many bacterial hosts. They have, so far, only been found in one phylum of the archaea, the Euryarchaeota, which has led to controversial hypotheses about their origin. In the present paper, we describe the identification and analysis of a putative provirus present in the genome of a mesophilic thaumarchaeon. We show that the provirus is related to tailed bacterial and euryarchaeal viruses and encodes a full complement of proteins that are required to build a tailed virion. The recently discovered wide distribution of tailed viruses in Euryarchaeota and the identification of a related provirus in Thaumarchaeota, an archaeal phylum which might have branched off before the separation of Crenarchaeota and Euryarchaeota, suggest that an association of these viruses with Archaea might be more ancient than previously anticipated.
Viruses of Archaea, the third domain of life, exhibit impressive diversity [1,2]. On the basis of their various morphologies and genomes, these archaeoviruses have been classified into many different, often novel, families [1,3]. Their unique nature has inspired speculation on the evolution and diversification of viruses in general [1,4]. Whereas many viruses isolated from thermophilic archaea are not reminiscent of any known bacterial or eukaryotic virus, some morphotypes of archaeoviruses are also found in the other two domains. In particular, tailed ds (double-stranded) DNA archaeoviruses display morphologies that are indistinguishable from those of the well-characterized head-and-tail viruses infecting bacteria [5–7]. Until recently, these tailed viruses were known to be associated only with archaeal species belonging to two classes of the phylum Euryarchaeota, in contrast with the dominance of tailed viruses in bacteria. The narrow phylogenetic distribution, as well as the relatively high sequence similarity to tailed bacterial viruses, led to the suggestion that these archaeoviruses emerged in archaea as a result of a recent interdomain transfer from bacteria . However, with the accumulation of complete archaeal genome sequences, it recently became clear that tailed viruses are associated with organisms from the majority of euryarchaeal classes and that they co-evolved with their hosts .
Since tailed dsDNA viruses have only been found in Euryarchaeota, and not in the second major phylum of archaea, the Crenarchaeota, the study of archaeoviruses from additional phyla might provide clues about the ancestry of tailed viruses. Recently, a novel archaeal phylum, the Thaumarchaeota, has been proposed on the basis of comparative genomics and phylogenomic analyses of three genomes of ammonia-oxidizing archaea residing in marine and thermophilic environments [9,10]. These organisms represent an abundant and widespread group in terrestrial and marine environments [11,12]. Although these ammonia-oxidizers were long considered to be affiliated with Crenarchaeota, detailed analysis of information-processing genes revealed that they constitute a separate phylum that might have emerged before the divergence of Crenarchaeota and Euryarchaeota [9,10]. Interestingly, no proviruses can be identified in the two available thaumarchaeal genomes. In the present paper, we describe the first archaeal provirus identified in the genome of a thaumarchaeote that has recently been obtained in laboratory culture. Its structure and its similarity to known tailed bacterial and archaeal viruses shed light on the evolution and distribution of these entities in the three domains of life.
A draft genome sequence of the ammonia-oxidizing thaumarchaeon Candidatus ‘Nitrososphaera viennensis’ strain EN76 (M. Tourna, M. Stieglmeier, A. Spang, T. Rattei and C. Schleper, unpublished work) was obtained by 454 pyrosequencing (Genome Sequencer FLX system with GS FLX Titanium Methods, Roche) and subsequent assembly . Investigation of this draft genome sequence of Ca. ‘Nitrososphaera viennensis’ revealed a contig of ~24 kb encoding a relatively high proportion (9/30; 30%) of putative proteins with counterparts in bacterial and archaeal viruses (Figure 1A and see Supplementary Table S1 at http://www.biochemsoctrans.org/bst/039/bst0390082add.htm). This suggested that the region might represent a provirus integrated into the host chromosome, which we designate Nvie-Pro1.
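The reasoning above, a local enrichment of ORFs with viral counterparts flagging a candidate provirus, can be illustrated with a sliding-window scan over the ordered ORFs of a contig. This is a minimal sketch under stated assumptions, not the procedure used in this study: the per-ORF boolean flags, the window size and the threshold are all hypothetical.

```python
from typing import List, Tuple


def viral_fraction(has_viral_hit: List[bool]) -> float:
    """Fraction of ORFs in a region that have a homologue among viral proteins."""
    if not has_viral_hit:
        return 0.0
    return sum(has_viral_hit) / len(has_viral_hit)


def enriched_windows(has_viral_hit: List[bool], window: int = 10,
                     threshold: float = 0.25) -> List[Tuple[int, int, float]]:
    """Slide a fixed-size window along the ordered ORFs of a contig and return
    (start, end, fraction) for every window whose fraction of ORFs with viral
    hits is at least `threshold`. Coordinates are 0-based ORF indices, end
    exclusive. Window size and threshold are hypothetical defaults."""
    hits = []
    for start in range(len(has_viral_hit) - window + 1):
        frac = viral_fraction(has_viral_hit[start:start + window])
        if frac >= threshold:
            hits.append((start, start + window, frac))
    return hits


# Contig-level figure quoted in the text: 9 of 30 ORFs with viral counterparts.
# The True/False layout below is invented; only the counts match the text.
contig = [True, False, True, True, False] * 3 + [False] * 15
print(f"{viral_fraction(contig):.0%}")  # 30%
```

In practice the flags would come from similarity searches of each predicted protein against viral sequences; a window-based view simply makes the clustering of such hits explicit.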
Nvie-Pro1 is related to tailed bacterial and archaeal viruses
The large subunit of the terminase
Nvie-Pro1 was found to encode a homologue of the large subunit of the terminase (TerL) (protein Nvie-2; Supplementary Table S1). TerL is one of the hallmark proteins exclusively encoded by tailed bacterial and euryarchaeal dsDNA viruses of the order Caudovirales as well as eukaryotic herpesviruses [8,14]. TerL proteins are composed of two functionally distinct domains: the ATPase domain and the nuclease domain . The ATPase domain powers the translocation of the viral genomic DNA into empty procapsids, whereas the nuclease domain is responsible for cutting the concatemeric viral DNA into genome-length units. Both domains display sets of conserved motifs [14,15]. Alignment of the putative TerL from Nvie-Pro1 with homologues encoded by diverse bacterial, euryarchaeal and eukaryotic viruses showed that all of the motifs characteristic of TerL proteins are also conserved in the proviral sequence (Figure 1B and see Supplementary Figure S1 at http://www.biochemsoctrans.org/bst/039/bst0390082add.htm).
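Motif conservation of this kind is often checked by scanning sequences for a consensus pattern. As an illustrative assumption, the sketch below uses the generic Walker A (P-loop) consensus G-x(4)-G-K-[S/T], a common ATPase signature, rather than the specific TerL motif definitions of refs [14,15]; the query fragment is invented and is not the Nvie-2 sequence.

```python
import re

# Generic Walker A (P-loop) consensus, G-x(4)-G-K-[S/T]; used here only as an
# illustrative ATPase signature, not as the curated TerL motif set.
WALKER_A = re.compile(r"G.{4}GK[ST]")


def find_motif(seq, pattern=WALKER_A):
    """Return (1-based start, matched residues) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(seq.upper())]


# Invented protein fragment, not the Nvie-2 sequence.
fragment = "MKLVRGPRGSGKTTLLAE"
print(find_motif(fragment))  # [(6, 'GPRGSGKT')]
```

A regular expression captures only strict consensus matches; alignment-based comparison, as done for Supplementary Figure S1, is more sensitive for divergent homologues.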
Additionally, a BLAST search against the environmental sequence database at NCBI revealed several TerL homologues in marine metagenomic sequences. Notably, the Nvie-Pro1 TerL displays considerably higher sequence identity with proteins from the environmental database (e.g. ECU80075, 45% identity over 362 amino acids) than with proteins from known bacterial and euryarchaeal (pro)viruses (Supplementary Table S1). It is therefore possible that marine thaumarchaea are also infected by head-and-tail viruses, although we cannot exclude that these homologues belong to marine bacterial (pro)viruses.
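For readers less familiar with figures such as "45% identity over 362 amino acids", the sketch below computes percent identity from a pairwise alignment as identical columns divided by alignment length, with gap columns counted in the length (the convention BLAST uses for its "Identities" line). The aligned strings are toy examples, not the Nvie-2 or ECU80075 sequences.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two gapped, equal-length aligned sequences.
    Identical non-gap columns are divided by the full alignment length."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        raise ValueError("empty alignment")
    identical = sum(1 for a, b in zip(aln_a, aln_b) if a == b and a != "-")
    return 100.0 * identical / len(aln_a)


# Toy alignment: 5 identical columns out of 7 (one gap column).
print(round(percent_identity("MKV-LTA", "MRVELTA"), 1))  # 71.4
```

Because the denominator depends on alignment length, identity values are only comparable when the aligned regions have similar extents, which is why the text reports the alignment length alongside the percentage.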
A multifunctional MCP (major capsid protein)
In tailed dsDNA viruses and herpesviruses, virion assembly starts with the scaffolding-protein-dependent construction of an empty procapsid, which subsequently undergoes maturation by proteolytic cleavage of the scaffolding protein . The protease responsible for this maturation step is usually encoded immediately upstream of the genes for the scaffolding protein and the MCP . The majority of tailed dsDNA viruses of the order Caudovirales and herpesviruses encode a specific capsid maturation protease, which is structurally distinct from the known cellular proteases [18,19]. Interestingly, in tailed bacterial viruses the herpesvirus-like protease genes have rather frequently been replaced by genes encoding ClpP-like serine proteases, which retain both the genomic location and the role in capsid maturation . Besides the TerL homologue, Nvie-Pro1 also encodes a putative MCP related to those of viruses of the order Caudovirales and herpesviruses [8,20,21]; the MCP is another hallmark protein unique to this viral lineage . This putative MCP (Nvie-8; Supplementary Table S1) is 576 amino acids long and appears to be composed of at least two distinct domains.
The N-terminal part of the protein (residues 1–202) shares significant sequence similarity with chymotrypsin-like serine proteases (Figure 2). Notably, the three residues constituting the catalytic triad of chymotrypsin-like proteases are perfectly conserved in the Nvie-Pro1 protein (His44, Asp81 and Ser178; Figure 2). The position of the nvie-8 gene in the proviral genome, as well as the fusion of the protease domain to the MCP domain (see below), strongly suggest a role for this protease domain in capsid maturation of the virus that gave rise to Nvie-Pro1. Even though chymotrypsin-, ClpP- and herpesvirus-like proteases all belong to the serine protease superfamily, they have distinct structural folds and are believed to have originated independently . To the best of our knowledge, Nvie-8 represents the first example of a chymotrypsin-like maturation protease encoded by a (putative) head-and-tail (pro)virus.
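Checking that catalytic residues are retained amounts to testing whether the expected amino acid occupies each reported position. The minimal sketch below encodes the triad positions given in the text (His44, Asp81, Ser178); because the Nvie-8 sequence is not reproduced here, it is exercised on a hypothetical poly-Ala placeholder with the triad residues planted.

```python
from typing import Dict

# Catalytic triad of the Nvie-8 protease domain as reported in the text
# (1-based residue numbers).
NVIE8_TRIAD: Dict[int, str] = {44: "H", 81: "D", 178: "S"}


def triad_conserved(seq: str, triad: Dict[int, str] = NVIE8_TRIAD) -> bool:
    """True if each expected residue occupies its 1-based position in seq."""
    return all(1 <= pos <= len(seq) and seq[pos - 1].upper() == aa
               for pos, aa in triad.items())


# Hypothetical placeholder sequence: poly-Ala with the triad residues planted.
residues = ["A"] * 202
for pos, aa in NVIE8_TRIAD.items():
    residues[pos - 1] = aa
toy = "".join(residues)
print(triad_conserved(toy))                          # True
print(triad_conserved(toy[:177] + "A" + toy[178:]))  # False (S178A)
```

With an alignment rather than a single sequence, the positions would first have to be mapped from alignment columns back to ungapped residue numbers.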
Domain organization of the putative protease–MCP encoded by Nvie-Pro1
The C-terminal half of the protein is occupied by a putative MCP domain. The identity of this domain could not be deduced using conventional BLAST searches. We therefore exploited a more sensitive, structural-fold-recognition-based approach for distant homology prediction, submitting the protein sequence to the Structure Prediction Meta Server . Using this approach, the C-terminal domain (residues 283–576) was recognized as a homologue of the MCP of siphovirus HK97 (PDB code 3E8K) with a highly significant score of 177.75 (scores above 50 are considered to be significant) . To verify this prediction, we performed a homology-based structural modelling experiment. A three-dimensional model of the putative MCP of Nvie-Pro1 was generated with the MODELLER program  using the X-ray structure of the HK97 MCP gp5  as a template (Figure 2). The quality of the model was then assessed using ProSA-web  and compared with that of the template X-ray structure. The ProSA-web quality score (Z) for the Nvie-Pro1 model (Z=−5.24) was similar to that calculated for the template structure (Z=−5.88) and well within the range calculated for other experimentally determined structures of proteins of ~300 amino acids, which extends from −2 to −11.2 . The good quality of the model indicates that the C-terminal half of Nvie-8 can adopt the HK97 gp5-like topology without extensively violating known protein folding rules, and this domain is therefore likely to represent the MCP of Nvie-Pro1. Using a similar bioinformatic approach, we have shown previously that euryarchaeal tailed viruses also use the HK97-like structural fold for capsid construction . The observation that Nvie-Pro1 encodes both the TerL homologue and the HK97-like MCP strongly suggests that this thaumarchaeal provirus is related to tailed bacterial and euryarchaeal viruses.
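The comparison of Z-scores described above can be written down as an explicit check. In this sketch the Z values and the reference range for ~300-residue experimental structures are taken from the text, whereas the acceptance rule itself, and in particular the `max_delta` tolerance relative to the template, is a hypothetical illustration; ProSA-web only reports the scores and does not apply such a rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZRange:
    low: float   # most negative Z observed for experimental structures
    high: float  # least negative Z observed for experimental structures

    def contains(self, z: float) -> bool:
        return self.low <= z <= self.high


# Range quoted in the text for experimentally determined proteins of ~300 residues.
EXPERIMENTAL_300AA = ZRange(low=-11.2, high=-2.0)


def assess_model(z_model: float, z_template: float,
                 ref: ZRange = EXPERIMENTAL_300AA,
                 max_delta: float = 1.0) -> bool:
    """Hypothetical acceptance rule: the model Z lies within the experimental
    range and within `max_delta` of the template Z (max_delta is assumed)."""
    return ref.contains(z_model) and abs(z_model - z_template) <= max_delta


# Values reported for the Nvie-Pro1 MCP model and the HK97 gp5 template.
print(assess_model(-5.24, -5.88))  # True
```

Note that a Z-score inside the experimental range is necessary but not sufficient evidence of a correct fold; here it complements the independent fold-recognition score.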
Putative scaffolding domain
The scaffolding protein of tailed bacterial viruses is often fused either to the maturation protease (e.g. myovirus P2)  or to the MCP, as in the case of siphovirus HK97 . The presence of the protease and the MCP domains in the same polypeptide is infrequent, but not unprecedented. A similar domain organization has been reported recently for a protein encoded by bacterial virus Gifsy-2 . In the latter case, however, the protease domain is related to ClpP-like serine proteases. Interestingly, the region between the protease and MCP domains in the Gifsy-2 protein was suggested to play the role of a scaffolding protein. It is therefore tempting to speculate that the linker region (residues 202–282) between the N-terminal protease domain and the C-terminal MCP domain in Nvie-8 (Figure 2) might also perform a scaffolding function.
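The three-part organization of Nvie-8 can be expressed as a simple table of 1-based, inclusive domain boundaries. The sketch below uses the coordinates as quoted in the text (residue 202 is given both as the end of the protease domain and as the start of the linker) and slices a placeholder sequence of the stated length, since the real Nvie-8 sequence is not reproduced here.

```python
from typing import Dict, Tuple

# Domain boundaries of Nvie-8 as quoted in the text (1-based, inclusive).
NVIE8_DOMAINS: Dict[str, Tuple[int, int]] = {
    "protease": (1, 202),
    "linker (putative scaffold)": (202, 282),
    "MCP (HK97-like)": (283, 576),
}


def split_domains(seq: str,
                  domains: Dict[str, Tuple[int, int]] = NVIE8_DOMAINS
                  ) -> Dict[str, str]:
    """Cut a protein sequence into named domains given 1-based inclusive ranges."""
    out = {}
    for name, (start, end) in domains.items():
        if not 1 <= start <= end <= len(seq):
            raise ValueError(f"{name}: {start}-{end} outside 1-{len(seq)}")
        out[name] = seq[start - 1:end]  # 1-based inclusive -> Python slice
    return out


placeholder = "A" * 576  # hypothetical stand-in for the 576-residue Nvie-8
print({name: len(s) for name, s in split_domains(placeholder).items()})
# {'protease': 202, 'linker (putative scaffold)': 81, 'MCP (HK97-like)': 294}
```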
Genome synteny with tailed bacterial and euryarchaeal viruses
We have previously defined the set of genes conserved in tailed (pro)viruses infecting bacteria and euryarchaea . Careful examination of the putative gene products of Nvie-Pro1 revealed that the provirus encodes the entire protein complement required to build a functional head-and-tail virion (see Supplementary Table S1). We were able to identify genes for capsid assembly (MCP, portal) and maturation (putative prohead protease and a homologue of the Mu protein gpG), genome packaging (TerL) and tail formation (major tail protein, tail tape measure protein, baseplate and tail fibres). The sequence similarity of these gene products to their counterparts in bacterial and archaeal (pro)viruses was in the range 23–35% (see Supplementary Table S1). This low pairwise similarity indicates that Nvie-Pro1 is not closely related to any tailed (pro)virus characterized to date. Nevertheless, the order of these morphogenetic genes is remarkably syntenic with that of tailed bacterial viruses (Figure 1A). These observations suggest that the virus that gave rise to Nvie-Pro1 relied on capsid and tail assembly, maturation and genome-packaging strategies similar to those of tailed dsDNA viruses infecting bacteria and euryarchaea.
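Synteny despite low sequence similarity can be quantified by comparing gene orders as sequences of functional labels. One standard way to do this, sketched below, is the longest common subsequence (LCS) of the two orders computed by dynamic programming; the gene orders shown are simplified, hypothetical examples and do not reproduce Figure 1A.

```python
from typing import List, Sequence


def conserved_order(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Longest common subsequence of two gene-function orders."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Trace back one optimal subsequence from the start of both orders.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical, simplified morphogenetic gene orders (labels are illustrative).
phage = ["TerS", "TerL", "portal", "protease", "MCP",
         "MTP", "TMP", "baseplate", "fibre"]
provirus = ["TerL", "portal", "orfX", "protease", "MCP",
            "MTP", "TMP", "baseplate", "fibre"]
print(conserved_order(phage, provirus))
```

The LCS tolerates insertions and losses of individual genes but not rearrangements, which matches the intuition of colinear, syntenic modules.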
Defective or not?
In addition to the morphogenetic module necessary for building a head-and-tail virion, tailed euryarchaeal viruses usually encode modules for genome replication and, in the case of temperate viruses, also modules for integration into the host chromosome . However, not all euryarchaeal tailed viruses encode apparent genome-replication proteins. For example, the complete genome sequence of siphovirus ψM2, infecting the euryarchaeon Methanothermobacter marburgensis, did not reveal any candidate proteins for genome replication . Similarly, we were not able to identify any putative genome-replication module in Nvie-Pro1.
Tailed dsDNA viruses generally integrate their genomes into the cellular chromosome with the aid of serine or tyrosine recombinases, which are usually encoded by the virus. In fact, all tailed euryarchaeal (pro)viruses for which complete genome sequences are available, even those considered to be lytic (e.g. HF1 and HF2) , encode identifiable tyrosine recombinases of the phage integrase family . Nvie-Pro1, on the other hand, does not possess an apparent integrase gene. Furthermore, we could not identify putative attachment sites (such as direct repeats flanking the provirus), which are signatures of tyrosine-integrase-mediated recombination. Consequently, it is not possible to define the precise borders of Nvie-Pro1. We therefore took the open reading frames with numerous homologues in other archaea (green arrows in Figure 1A) to mark the termini of the provirus.
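Direct repeats flanking an integrated element are commonly sought by looking for identical stretches shared between the left and right flanks. The minimal sketch below does this with exact k-mers; the choice of k = 12 and the flank sequences are hypothetical, and a realistic search would also tolerate mismatches and consider partial att-site identity. It illustrates the kind of signal that was not found for Nvie-Pro1.

```python
from typing import Set


def shared_kmers(left: str, right: str, k: int = 12) -> Set[str]:
    """Exact k-mers present in both flanks: candidate direct repeats."""
    if k <= 0:
        raise ValueError("k must be positive")
    left, right = left.upper(), right.upper()
    left_kmers = {left[i:i + k] for i in range(len(left) - k + 1)}
    right_kmers = {right[i:i + k] for i in range(len(right) - k + 1)}
    return left_kmers & right_kmers


# Hypothetical flanks sharing one 12-bp direct repeat.
repeat = "ATGCGTACGTTA"
left_flank = "CCCC" + repeat + "GGGG"
right_flank = "TTTT" + repeat + "AAAA"
print(shared_kmers(left_flank, right_flank))  # {'ATGCGTACGTTA'}
```

Longer shared repeats would show up as several overlapping k-mers, which can then be merged into a single repeat interval.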
At the moment, it is not possible to predict with confidence whether the putative provirus is defective or not. On the one hand, the absence of directly associated modules for genome replication and integration would argue against Nvie-Pro1 being inducible. On the other hand, the lack of apparently disrupted genes and the presence of genes for all major structural proteins suggest that, in principle, Nvie-Pro1 might be capable of producing tailed virions. Yet another possibility is that Nvie-Pro1 represents a GTA (gene-transfer agent) rather than a provirus. GTAs resemble head-and-tail viruses in appearance but, unlike viruses, they do not encapsidate the genomic sequence that encodes their virus-like particles. Instead, GTAs carry random cellular DNA and transfer it horizontally from one cell to another . Morphogenetic GTA proteins are encoded on the cellular chromosome, where they are under the control of cellular promoters and transcriptional regulators . Nevertheless, the structural proteins are homologous with those of tailed dsDNA viruses . Notably, GTAs have been documented not only in bacteria, but also in the methanogenic euryarchaeon Methanococcus voltae . We have previously identified a cryptic provirus in the M. voltae genome and suggested that it might represent the genomic region encoding the GTA  observed by Bertani . As in the case of Nvie-Pro1, the cryptic provirus contains all major virion structural-protein-coding genes, but no genes for an integrase or genome-replication proteins.
Whatever the nature and function of Nvie-Pro1, its evolutionary relationship to tailed viruses of the order Caudovirales is hardly questionable. The identification of this provirus in a thaumarchaeal genome sheds more light on the evolution of tailed archaeal viruses and revives the question of their origin in Archaea. Given the high degree of morphological and genomic similarity between bacterial and archaeal tailed dsDNA viruses , the possibility of independent origins for these viruses in the two cellular domains can be ruled out with certainty. Consequently, two alternative routes for the origin of tailed archaeal viruses have been proposed . The first possibility is that the ancestor of tailed viruses predated the divergence of bacteria and archaea and that, as cellular organisms diversified into distinct domains of life, tailed viruses co-evolved and diversified with their hosts. The second scenario posits that tailed viruses emerged in archaea as a result of horizontal transfer across the domain boundary from bacteria . The latter hypothesis was in part based on the fact that, at the time, tailed viruses had been isolated only from archaeal species belonging to the classes Halobacteria and Methanobacteria, as opposed to the global distribution of tailed dsDNA viruses in bacteria. The observation that this morphotype corresponded to less than 1% of virus-like particles in a hypersaline environment was also considered to be supporting evidence for the horizontal-transfer hypothesis . However, our recent survey of archaeal proviruses related to tailed dsDNA viruses of the order Caudovirales has indicated that these viruses are also associated with members of the archaeal classes Methanococci and Methanomicrobia , as well as Archaeoglobi (M. Krupovic, unpublished work). It therefore appears that organisms belonging to the majority of classes of the phylum Euryarchaeota have tailed viruses associated with them.
Furthermore, comparative genome analysis indicated that tailed archaeal (pro)viruses tend to form groups that follow the taxonomic grouping of the cellular organisms that they infect, suggesting co-evolution of these tailed viruses with their hosts . The wide distribution of head-and-tail viruses in Euryarchaeota and the identification of Nvie-Pro1 in Thaumarchaeota, an archaeal phylum which might have branched off before the separation of Crenarchaeota and Euryarchaeota [9,10], suggest that the association of these viruses with archaea might be more ancient than previously anticipated (Figure 3). Although no relatives of Nvie-Pro1 are found in the two available genomes of marine Thaumarchaeota, the presence of several close homologues in the environmental GOS (Global Ocean Sampling Expedition) marine sequence database (Supplementary Table S1) suggests a high and still unexplored diversity.
Distribution of tailed dsDNA viruses in the domain Archaea
It has been suggested previously that structurally related viruses infecting hosts from different domains of life descend from a common ancestor that existed before the divergence of cellular organisms [22,34,35]. Such structurally related viruses have been proposed to form viral lineages, one of which unites tailed viruses of the order Caudovirales and eukaryotic herpesviruses [20,35]. The identification of Nvie-Pro1 in the thaumarchaeon Ca. ‘Nitrososphaera viennensis’ suggests that tailed viruses might have been present in Archaea from the very emergence of this cellular domain (Figure 3). If this is the case, then at the time of the last common ancestor of bacteria and archaea, the population of tailed viruses already consisted of individuals with different tail structures. Indeed, viruses with contractile and non-contractile tails (families Myoviridae and Siphoviridae respectively) have been isolated from both cellular domains . The evolutionary history of tailed viruses in Archaea is clearly far from simple. It appears to involve both vertical descent from a common ancestor shared with bacterial tailed viruses and horizontal gene exchange between bacterial and archaeal viruses. Multiple instances of interdomain transfer of tailed viruses from bacteria to archaea are highly unlikely, owing to the fundamentally different transcription and replication machineries in the two domains. It is perhaps more reasonable to envisage that bacterial tailed virus genes were transferred through recombination between archaeal tailed virus genomes and provirus-containing exogenous bacterial DNA that archaeal cells could have acquired from the environment in the course of natural transformation. Clearly, more genomic sequences of tailed (pro)viruses from archaeal species covering a wider phylogenetic range are required in order to understand the relationship of these viruses to their bacterial relatives.
Molecular Biology of Archaea II: A Biochemical Society Focused Meeting held at Robinson College, Cambridge, U.K., 16–18 August 2010. Organized and Edited by Stephen Bell (Oxford, U.K.) and Finn Werner (University College London, U.K.).
We thank Marion Engel and Michael Schloter for support in genome sequencing and assembly and Thomas Rattei for bioinformatic help on genome annotation.
This work was supported by the European Molecular Biology Organization [Long-Term Fellowship ALTF 347–2010 to M.K.] and the Austrian Academy of Sciences (DOC-fForte fellowship to A.S.).