The ongoing Ebola virus (also known as Zaire ebolavirus, a member of the Ebolavirus genus) outbreak in West Africa has so far resulted in >28000 confirmed cases, compared with previous Ebolavirus outbreaks that affected a maximum of a few hundred individuals. Hence, Ebolaviruses pose a much greater threat than we may have expected (or hoped). An improved understanding of the virus biology is essential to develop therapeutic and preventive measures and to be better prepared for future outbreaks by members of the Ebolavirus genus. Computational investigations can complement wet laboratory research for biosafety level 4 pathogens such as Ebolaviruses, for which wet experimental capacity is limited by the small number of appropriate containment laboratories. During the current West Africa outbreak, sequence data from many Ebola virus genomes became available, providing a rich resource for computational analysis. Here, we consider the studies that have already reported on the computational analysis of these data. A range of properties has been investigated, including Ebolavirus evolution and pathogenicity, prediction of microRNAs and identification of Ebolavirus-specific signatures. However, the accuracy of the results remains to be confirmed by wet laboratory experiments. Therefore, communication and exchange between computational and wet laboratory researchers is necessary to make maximum use of computational analyses and to iteratively improve these approaches.
The recent Ebola virus outbreak in West Africa has changed our perception of the threat posed by members of the Ebolavirus genus. It was of unprecedented size, resulting in 28639 confirmed cases and 11316 deaths as of 28th February 2016 (www.who.int). Previous Ebolavirus outbreaks had ranged from very few infected individuals to a few hundred cases, with the 2000 outbreak in Uganda being the largest, affecting 425 individuals and causing 224 deaths. The current outbreak started in Guinea in December 2013 and has (as of 15th March 2016) still not been declared over due to flare-ups (www.who.int). During this outbreak, evidence has emerged that Ebola viruses can persist and remain infective for a long time in immune-privileged body sites, which complicates effective outbreak control. Ebola viruses may persist for more than a year in the semen after recovery from acute disease, resulting in reported cases of sexual transmission [3–5]. Moreover, Ebola virus RNA was detected in vaginal fluids 33 months and in breast milk 16 months after acute disease (www.who.int) and shown to persist in the eye. The risk of new transmission from these persistent infections is not currently known; however, taken together, these findings caused concerns about future large outbreaks [1,7].
The West Africa Ebola virus outbreak also resulted in a dramatic increase in scientific interest in the disease, as indicated by a large rise in the number of Ebola-related scientific publications. As of 15th March 2016, the search term ‘Ebola’ resulted in 5055 hits in PubMed (http://www.ncbi.nlm.nih.gov/pubmed) including 3195 hits in the years 2014–2016 alone. More articles were published on Ebola in the last 3 years (63%) than in the 37 years from the discovery of the first two Ebolavirus species (Sudan viruses and Ebola viruses) in 1976 [8–10] until 2013. However, the number of original studies that investigate Ebolavirus biology is actually rather limited. The 5055 Ebola-related articles included 582 reviews, 290 editorials, 423 ‘News’ articles and 249 comments. The combination of ‘Ebola’ and ‘public health’ resulted in 2492 articles and the combination of ‘Ebola’ and ‘epidemiology’ in 1656 articles. Analysis of a sample containing the first 100 articles returned by the PubMed search for the term ‘Ebola’ revealed only 16 original research articles that investigated the virus biology or antiviral therapeutic strategies. The reason for this is the small number of high containment level laboratories that enable experimental work using Ebolaviruses. The use of computational approaches is a strategy to increase research capacity on pathogens that require biosafety level 4 containment such as Ebolaviruses. Here, we provide an overview of recent computational approaches to study Ebolavirus biology.
Ebolavirus terminology, family members and genome organization
The nomenclature in this manuscript follows the recommendations of Kuhn et al. The genus is Ebolavirus. It is only italicized if the name refers to the genus but not if it refers to physical viruses or virus parts or constituents such as proteins or genomes. The species are Zaire ebolavirus (type virus: Ebola virus, EBOV), Sudan ebolavirus (type virus: Sudan virus, SUDV), Bundibugyo ebolavirus (type virus: Bundibugyo virus, BDBV), Taï Forest ebolavirus (formerly Côte d'Ivoire ebolavirus; type virus: Taï Forest virus, TAFV) and Reston ebolavirus (type virus: Reston virus, RESTV). Ebolaviruses are negative strand RNA viruses and members of the Filoviridae that also include the genera Marburgvirus and Cuevavirus. All family members, with the exception of Reston viruses, cause haemorrhagic fever in humans, typically associated with a high mortality rate.
Ebolavirus genomes contain seven genes, which encode nine proteins. The proteins are RNA dependent RNA polymerase (L), nucleoprotein (NP), glycoprotein (GP), soluble GP (sGP), small soluble GP (ssGP) and a further four structural proteins that are called VP24, VP30, VP35 and VP40. The small number of proteins encoded by the genome requires many of them to be multifunctional; for example, different oligomeric forms of VP40 have roles in membrane trafficking, virus assembly and budding and in transcriptional regulation.
Availability of genomics data, sequence variation and virus evolution
The use of next generation sequencing for the characterization of isolates from the current West African outbreak [14–18], including the use of MinION nanopore sequencers, has substantially increased the available information about Ebola virus genetics and evolution. Such large datasets provide extensive opportunities to use computational analyses to study Ebolaviruses. The Ebola Nextstrain website (http://ebola.nextstrain.org) provided real-time analysis of Ebola virus evolution during the outbreak, adding data from the sequencing projects as they became available.
Initial sequencing of 99 Ebola virus genomes in Sierra Leone suggested a relatively high rate of evolution within the 2014 outbreak, approximately 2-fold greater than between different outbreaks. The analysis also suggested a single transmission from an Ebola virus reservoir followed by subsequent human-to-human transmission as the cause of the outbreak. However, later studies indicated lower rates, of the order of 1 × 10⁻³ substitutions per site per year, which is the rate observed in previous outbreaks. It is proposed that the initial study (Gire et al.) observed a higher rate of substitution because a short sampling time did not allow for purifying selection to remove mutations that were deleterious to viral function [14,17].
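To make the unit "substitutions per site per year" concrete, the following minimal sketch estimates a rate by regressing root-to-tip genetic distance on sampling date. This is a simple, commonly used approximation and is not claimed to be the method used in the studies discussed above. The function name `estimate_rate` and all numbers are hypothetical and chosen only for illustration.

```python
def estimate_rate(sample_times, root_to_tip):
    """Least-squares slope and intercept of root-to-tip distance vs. sampling time.

    sample_times: sampling dates in years (any common origin).
    root_to_tip:  genetic distances from the tree root, in substitutions per site.
    The slope is a crude estimate of the substitution rate per site per year.
    """
    if len(sample_times) != len(root_to_tip) or len(sample_times) < 2:
        raise ValueError("need at least two paired observations")
    n = len(sample_times)
    mean_t = sum(sample_times) / n
    mean_d = sum(root_to_tip) / n
    sxx = sum((t - mean_t) ** 2 for t in sample_times)
    if sxx == 0:
        raise ValueError("all samples share the same date; rate is undefined")
    sxy = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(sample_times, root_to_tip))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_t
    return slope, intercept


# Invented example values (NOT data from any cited study):
# four genomes sampled over nine months, time in years since the first sample.
times = [0.0, 0.25, 0.5, 0.75]
distances = [0.00100, 0.00125, 0.00150, 0.00175]

rate, intercept = estimate_rate(times, distances)
print(f"estimated rate: {rate:.2e} substitutions/site/year")
```

A regression of this kind is sensitive to the sampling window: over short intervals, mildly deleterious mutations that have not yet been removed by purifying selection inflate the slope, which is the explanation proposed above for the higher early estimate.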
Hoenen et al. sequenced Ebola viruses from infected individuals in Mali during the current outbreak. They identified very few non-synonymous amino acid changes, and the few present did not map to functional regions of Ebola virus proteins. From their findings they propose that the virus has been undergoing limited evolution during the outbreak and that there is no evidence of increased virulence or transmissibility. Phylogenetic analysis of further Ebola viruses from infected patients in Sierra Leone identified three different lineages, with one of the lineages consisting of seven sub-lineages. Carroll et al. sequenced 179 Ebola virus patient samples from Guinea; phylogenetic analysis identified two lineages (A and B). Lineage A was present earlier in the outbreak (not observed after July 2014) and is thought to have been contained by the outbreak response. In contrast, lineage B spread across Guinea, Sierra Leone and Liberia.
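The distinction between synonymous and non-synonymous changes used in these studies can be illustrated with a short sketch that compares two aligned coding sequences codon by codon using the standard genetic code. The function name `classify_substitutions` and the example sequences are hypothetical. The sketch assumes cDNA-style input (T rather than U), no gaps or ambiguity codes, and a length that is a multiple of three.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitutions(ref_cds, alt_cds):
    """Return (codon_number, ref_aa, alt_aa, kind) for every differing codon.

    Both inputs must be aligned, in-frame coding sequences of equal length.
    """
    if len(ref_cds) != len(alt_cds) or len(ref_cds) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    changes = []
    for i in range(0, len(ref_cds), 3):
        ref_codon = ref_cds[i:i + 3].upper()
        alt_codon = alt_cds[i:i + 3].upper()
        if ref_codon == alt_codon:
            continue
        ref_aa = CODON_TABLE[ref_codon]
        alt_aa = CODON_TABLE[alt_codon]
        kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
        changes.append((i // 3 + 1, ref_aa, alt_aa, kind))
    return changes


# Invented toy sequences (not Ebola virus data).
reference = "ATGGCTCTGAAA"  # M A L K
variant = "ATGGCCCTGAGA"    # M A L R

for change in classify_substitutions(reference, variant):
    print(change)
# (2, 'A', 'A', 'synonymous')
# (4, 'K', 'R', 'non-synonymous')
```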
Phylogenetic analysis of the sequences of 85 Ebola virus genomes from Guinea by Simon-Loriere et al. identified three lineages. Although the rate of substitutions is comparable with that in the other studies, they observed non-synonymous substitutions in the VP35, GP and L proteins. Some of the GP changes are in the mucin-like domain and it is proposed that they could alter glycosylation of GP or affect the shape of the virus. In VP35, mutations were identified in the interferon inhibitory domain. However, the functional implications of this sequence variation remain unclear.
Azarian et al. investigated Ebola virus evolution by focusing on changes in GP, the most variable of the Ebola virus proteins. They analysed the Ebola virus sequences from 65 genomes from a range of outbreaks and infections in both humans and great apes. Their analysis suggests that the variation observed is primarily a result of neutral genetic drift and therefore the emergence of strains with largely altered pathogenicity or transmission is unlikely.
Most recently, Quick et al. used MinION nanopore sequencers to perform ‘real-time’ sequencing in the field, with results available within 24 h. They sequenced 142 Ebola virus genomes during 2015. The nanopore sequencer has a higher error rate than other next generation sequencing approaches but is able to sequence very long DNA fragments. Quick et al. developed a bioinformatics pipeline to generate high accuracy genotypes from the sequence data, which was tested on Ebola virus samples from previous outbreaks. Analysis of their findings revealed that the isolates belonged to two main lineages, GN1 and SL3. GN1 had largely been confined to Guinea and was identified early in the outbreak, whereas SL3 originated in Sierra Leone before spreading to Guinea. Most importantly though, this study demonstrates the ability to use portable sequencers to quickly obtain genomic data in viral outbreaks, enabling effective surveillance.
Taken together, the available Ebola virus sequences do not suggest virus evolution towards easier human-to-human transmission or changes in virulence. However, the availability of a large number of sequences provides a basis for extensive computational analysis.
Other groups have performed epidemic modelling of Ebola. Walsh et al. considered outbreaks prior to 2014. Their phylogenetic analysis indicated that all outbreaks were epidemiologically related to the original outbreak in 1976. They proposed a wave-like spread of the virus from this source and, based on the distance between outbreaks, estimated that this spread progressed at a rate of ∼50 km/year. Many groups have used mathematical modelling of the 2014 outbreak to predict how the virus would spread (recently reviewed).
Prediction of microRNAs
Three studies have used bioinformatics approaches to predict miRNAs in Ebola viruses [22–24]. The studies used similar pipelines to first search for precursor miRNA stem–loop structures in the Ebola virus genomes, followed by predicting mature miRNAs from the precursors and modelling the structures of the miRNAs. Liang et al. identified three putative miRNAs and two putative precursor miRNAs. They designed plasmids that express the predicted miRNAs and silence reporter plasmids that express artificial target sequences.
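The first step of such pipelines, the search for stem–loop structures, can be illustrated with a deliberately simplified sketch. It scans an RNA sequence for a stretch whose reverse complement occurs a short loop distance downstream, allowing only perfect Watson–Crick pairing. This is a toy model and not the approach of the cited studies; real precursor prediction relies on RNA folding and additional filters. The function `find_hairpins`, its parameters and the example sequence are all hypothetical, and the input is assumed to contain only A, C, G and U (or T).

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string made up of A, C, G and U."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_hairpins(rna, stem_len=6, min_loop=3, max_loop=8):
    """Return (start, loop_length, 5' arm, loop, 3' arm) for perfect hairpins.

    A hit is reported when the 3' arm equals the reverse complement of the
    5' arm, i.e. the two arms can form a fully paired stem.
    """
    rna = rna.upper().replace("T", "U")
    hits = []
    for start in range(len(rna) - 2 * stem_len - min_loop + 1):
        left = rna[start:start + stem_len]
        target = reverse_complement(left)
        for loop in range(min_loop, max_loop + 1):
            right_start = start + stem_len + loop
            right = rna[right_start:right_start + stem_len]
            if len(right) < stem_len:
                break  # ran off the end of the sequence
            if right == target:
                loop_seq = rna[start + stem_len:right_start]
                hits.append((start, loop, left, loop_seq, right))
    return hits


# Invented toy sequence containing one 6 bp stem with a 4 nt loop.
toy_sequence = "AAGGCAUCGAAAGAUGCCAA"
hairpins = find_hairpins(toy_sequence)
print(hairpins)
```

Candidates found in this way would only be a starting point; the studies above go on to predict mature miRNAs from the precursors and to model their structures.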
Teng et al. predicted four pre-miRNAs and eight mature miRNAs. The mature miRNAs were predicted to have 138 potential target proteins in the human proteome. Analysis of the gene ontology functions associated with the target genes highlighted three associated biological processes: response to stimulus, immune system processes and multicellular organismal processes. Pathway analysis indicated that targeted genes have functions in the PDGF signalling, purine metabolism, PI3 kinase, Ras and NF-κB pathways. The enrichment of such functions in the targeted pathways suggests that they may be relevant to Ebola virus infection.
Liu et al. predicted one pre-miRNA and two mature miRNAs. Computational analysis was used to identify potential targets of these miRNAs. One of the miRNAs was predicted to target Karyopherin alpha 1, which has a role in interferon signalling through transport of STAT1 to the nucleus. The authors proposed that this miRNA has a role in preventing interferon signalling. VP24 is also known to bind to Karyopherins (see Predicting molecular determinants of Ebolavirus pathogenicity) to prevent interferon signalling, so this finding suggests that the virus has multiple mechanisms to block the immune response.
The studies propose that these miRNAs may be used by Ebola virus to down-regulate host cell gene expression and that they could be targets for diagnostics, as miRNAs have been shown to circulate in the blood. However, these initial results remain to be validated in vivo during Ebola virus infection. Further research on the presence and potential roles of miRNAs in Ebola virus is required, particularly to consider if they have a role in pathogenicity or offer evolutionary advantages.
Identifying Ebola virus specific signatures as therapeutic, diagnostic or vaccine targets
Developing vaccines and diagnostics and identifying druggable targets in viruses are important elements in the prevention, identification and treatment of infection. Silva et al. developed a method, EAGLE, to identify minimal relative absent words (RAWs), sequences that are present in one species (i.e. the pathogen) but not present in another (i.e. the host). Such sequences may be used in the identification of druggable targets and the development of vaccines and diagnostics. Silva et al. used EAGLE to identify RAWs present in 165 Ebola virus sequences but not in the human reference genome. With a word length of 12 (amino acid stretches), they identified three RAWs that map to conserved regions of Ebola virus proteins. One of the motifs is located in the N-terminal region of the NP protein and the other two are present in the L protein. These Ebola virus unique sequences present novel findings in the Ebola virus genome that could be exploited for the range of uses described above. Moreover, a very recent study identified potential B- and T-cell epitopes in conserved regions of GP.
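The idea behind relative absent words can be sketched in a few lines: collect all words of a fixed length k that occur in every pathogen sequence and subtract the words that occur anywhere in the host sequence. This is only an illustration of the concept and not the EAGLE implementation; it ignores the minimality criterion and would not scale to a full human genome. The function names and the toy sequences are hypothetical.

```python
def kmers(sequence, k):
    """Set of all substrings of length k in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def relative_absent_words(pathogen_seqs, host_seq, k):
    """Words of length k present in every pathogen sequence but absent from the host."""
    if not pathogen_seqs:
        raise ValueError("at least one pathogen sequence is required")
    shared = set.intersection(*(kmers(seq, k) for seq in pathogen_seqs))
    return shared - kmers(host_seq, k)


# Invented toy sequences (not real viral or human data).
host = "ACGTACGTTT"
pathogens = ["GGACGAAC", "TTACGAAG"]

raws = relative_absent_words(pathogens, host, k=4)
print(sorted(raws))
# ['ACGA', 'CGAA']
```

Requiring a word to occur in every pathogen sequence, as done here, is one way to restrict the output to conserved regions, which is the property that makes such words attractive as diagnostic or therapeutic targets.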
Predicting molecular determinants of Ebolavirus pathogenicity
Two recent studies [27,28] investigated the molecular causes of Ebola virus human pathogenicity using both sequence analysis and structural bioinformatics. Both utilized the knowledge that Reston viruses are the only Ebolaviruses that are not pathogenic in humans. They used the basic premise that the differences in the genomes between these species contain the differences responsible for the discrepancies in human pathogenicity. Whereas the approaches used were similar, their interpretations differed.
We compared 179 human pathogenic Ebolavirus genomes from all four human-pathogenic Ebolavirus species (Zaire ebolavirus, Sudan ebolavirus, Bundibugyo ebolavirus, Taï Forest ebolavirus) with 17 Reston virus genomes and focused on differences in protein coding regions. Initially, we identified specificity determining positions (SDPs) between the two groups of sequences. SDPs are positions that are differentially conserved between the two groups (i.e. conservation of one amino acid in one group and conservation of a different amino acid in the second group). A total of 189 SDPs were predicted in the seven main Ebola virus proteins. Subsequent structural modelling using Phyre2 [30,31] mapped 47 of the SDPs on to protein structures. Analysis of the potential effects of these differences suggested eight SDPs that were likely to have an effect on protein stability (2) or protein–protein interactions (6). Four of these SDPs were present in the protein VP24.
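The definition of an SDP given above can be expressed as a minimal sketch that scans two groups of aligned protein sequences column by column. It is an illustration of the definition only and not the prediction method used in the study. The function name, the `min_conservation` threshold and the toy alignment are hypothetical, and all sequences are assumed to come from the same alignment and therefore to have equal length.

```python
from collections import Counter


def specificity_determining_positions(group_a, group_b, min_conservation=1.0):
    """Return (position, residue_a, residue_b) for differentially conserved columns.

    A column qualifies when the most common residue in each group reaches
    min_conservation (fraction of sequences) and the two residues differ.
    Positions are 1-based; columns dominated by gaps ('-') are skipped.
    """
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("all sequences must have the same aligned length")
    sdps = []
    for pos in range(length):
        res_a, count_a = Counter(seq[pos] for seq in group_a).most_common(1)[0]
        res_b, count_b = Counter(seq[pos] for seq in group_b).most_common(1)[0]
        conserved_a = count_a / len(group_a) >= min_conservation
        conserved_b = count_b / len(group_b) >= min_conservation
        if conserved_a and conserved_b and res_a != res_b and "-" not in (res_a, res_b):
            sdps.append((pos + 1, res_a, res_b))
    return sdps


# Invented toy alignment (not real Ebolavirus sequences).
pathogenic = ["MKTLV", "MKTLV", "MKSLV"]
reston = ["MRTLI", "MRTLI"]

print(specificity_determining_positions(pathogenic, reston))
# [(2, 'K', 'R'), (5, 'V', 'I')]
```

In this toy example position 3 is not reported because it varies within the first group; lowering `min_conservation` would relax that requirement.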
VP24 is a multifunctional protein that has at least three roles: in the viral nucleocapsid, in the regulation of virus replication and in the antagonism of the host interferon response [32–35]. VP24 is known to inhibit interferon signalling by binding both karyopherin proteins and STAT1, preventing STAT1 nuclear localization and therefore blocking activation of the interferon response. Our analysis identified three SDPs located in the VP24-karyopherin α5 (KPNA5) binding site (Figure 1). Based on their agreement with residues that when mutated reduce Ebola virus VP24 binding with KPNA5, we proposed that these SDPs are likely to reduce Reston VP24 affinity for, and binding to, KPNA5. This diminished interaction would reduce the ability of Reston viruses to prevent interferon signalling and could be an explanation for the lack of human pathogenicity observed in Reston viruses. If this hypothesis is correct, a few mutations in VP24 may be sufficient to confer human pathogenicity to Reston viruses.
Figure 1: Amino acid differences between Ebola and Reston viruses
In contrast, Cong et al. identified positions in Ebolavirus proteins where there is greater conservation among the human pathogenic species than between Reston and the human pathogenic species. Similarly to our approach, this identified positions that were differentially conserved between Reston and the four human pathogenic species. They identified 215 differentially conserved positions and were able to map 43 of them on to protein structures or models.
Cong et al. also identified six differentially conserved positions in protein–protein interfaces. They propose that differences in the VP24 and KPNA5 interface may have an effect on immune suppression. However, they focus on the variability of the host proteins that Ebolaviruses interact with and consider how they vary between humans and non-human primates, as Reston viruses are pathogenic in non-human primates but not in humans. They find that host proteins that interact with VP24, VP30 and VP40 are very similar and thus these proteins may not have a role in the altered pathogenicity. In contrast, there is greater sequence diversity in the host interaction partners of GP and VP35. Cong et al. thus propose that two clusters of differentially conserved residues, one in the C-terminal region of GP (Figure 1), whose functional role is unclear, and one in VP35, may contribute to the differences in pathogenicity between the Ebolavirus species.
Whereas both studies use similar approaches, their conclusions differ extensively. We proposed that VP24 has an important role in Ebola virus pathogenicity, whereas Cong et al. suggest that GP and VP35 are most likely to be responsible for differences in human pathogenicity between Ebolaviruses. Experiments will be necessary to generate conclusive evidence and enable the improvement of computational approaches in an iterative process.
The recent Ebola virus outbreak highlighted the need to advance our understanding of the virus and how to prevent and treat infection. However, laboratory research is limited by the small number of containment laboratories that can work with biosafety level 4 pathogens such as the Ebolaviruses. With the increasing availability of omics (in particular genomics) data, there is the potential for computational approaches to further our understanding of Ebolavirus biology as outlined above. This has also been recognized by the International Society for Computational Biology (ISCB), which launched an award for computational biology research that advances our understanding of Ebola.
However, the benefit of computational approaches depends very much on the interaction with wet lab experimentalists that test the computationally generated hypotheses. This is not only crucial in order to translate computational results into applicable knowledge, but also to improve computational methods in an iterative approach to enhance their predictive power and accuracy. It will be important that wet lab researchers exploit available, published data to inform their experiments and report their results, which in turn should be used by computational researchers to improve their methods. Hence, we lobby for openness and curiosity on both sides in order to achieve maximal impact and to avoid isolated lines of research.
New Developments in Protein Structure Modelling for Biological and Clinical Research: Held at Charles Darwin House, London, U.K., 8 December 2015