Molecular function is the result of proteins working together, mediated by highly specific interactions. Maintenance and change of protein interactions can thus be considered one of the main links between molecular function and mutation. As a consequence, protein interaction datasets can be used to study functional evolution directly. In terms of constraining change, the co-evolution of interacting molecules is a very subtle process. This has implications for the signal being used to predict protein–protein interactions. In terms of functional change, i.e. the ‘rewiring’ of interaction networks, gene duplication is critically important. Interestingly, once duplication has occurred, the genes involved have different probabilities of being retained depending on how they were generated. In the present paper, we discuss some of our recent work in this area.
We now have extensive knowledge of the genetic change that underpins evolution in terms of both the specific mutational events involved and what happens to these mutations from generation to generation. We know that evolution is dependent on change at the molecular level, specifically on mutation in genome sequences. This change manifests in proteins when mutations occur in genes that code for amino acid sequences; mutations can also alter gene regulation (when genes are expressed and to what extent) or have an impact on genome structure, i.e. the number of copies of a gene and the arrangement of its exons. Whether evolutionary processes such as genetic drift and selection then result in the fixation of these mutations is related to their phenotypic consequences (or lack thereof), the extent of recombination, population sizes and antagonistic relationships between populations of the same and different species.
Alongside this study of gene sequences and genome evolution, there has been extensive characterization of proteins and their interactions. Importantly, proteins work together forming ‘functional modules’, groups of molecules that perform a function, be it structural, signalling, immune-related or metabolic. Specificity and strength of binding determine which proteins interact, defining the molecular phenotype. On this basis, we argue that protein interactions are a key determinant of molecular function, and it is the specificity of these interactions on which evolutionary change has the most impact. Protein interactions are thus central to understanding the link between genetic mutation and function.
What then is a mutation's influence on protein binding? For the most part, purifying selection will act to constrain mutational events that obviate binding in any way. However, an amino acid replacement that influences the optimal specificity of an interaction, if not immediately deleterious, can be mitigated by change either in the same protein or in the interacting partner. In other words, rather than a back mutation occurring, a subsequent mutation can compensate for the initial mutation (Figure 1). This reciprocal change is termed co-evolution, and it can occur either within one protein [6–10] or between two or more proteins that interact [5,11–14].
Figure 1. Diagrammatic representation of two interacting proteins: BAH domain of Orc1 (white) and the Orc1-interacting domain of Sir1 (yellow) from yeast
Co-evolution to maintain interactions
Recently generated large-scale protein interaction datasets (e.g. [15–17]) are permitting new insights into the constraints acting to maintain interactions. For example, changes in interactions between orthologues (so-called interlogs) can be used to compare functional evolution between species directly or even to reconstruct ancestral network configurations. In our work, we have used the numbers of interactions that a protein has, compared with the number of interactions a paralogous copy has, as a measure of functional change [20,21]:
where r is the shared interaction ratio, s is the number of interactions shared between the two proteins and n1 and n2 are the number of interactions for each of the two proteins. We therefore use the ratio of shared interactions as a direct measure of functional relatedness. Immediately after duplication, duplicates will have identical protein sequences and so will have the same interactions (r=1). As sequences diverge, binding specificities may change such that r approaches zero as duplicate proteins share fewer interactions. Protein interactions have also been used effectively as a measure of functional divergence in other studies [22–25].
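The ratio described above can be illustrated with a short sketch. The exact form of the equation is an assumption here: a Jaccard-style ratio, r = s / (n1 + n2 − s), is used because it satisfies the two properties stated in the text (r = 1 when the duplicates have identical interactions, and r approaching zero as fewer interactions are shared). The function name and the partner identifiers are hypothetical.

```python
def shared_interaction_ratio(partners_a, partners_b):
    """Shared interaction ratio r for a pair of paralogues.

    ASSUMPTION: r is computed as s / (n1 + n2 - s), i.e. shared
    interactions over all distinct interactions of the pair. Other
    normalizations are possible.
    """
    a, b = set(partners_a), set(partners_b)
    s = len(a & b)            # interactions shared by both proteins
    n1, n2 = len(a), len(b)   # interactions of each protein
    total = n1 + n2 - s       # distinct partners across the pair
    if total == 0:
        return None           # undefined when neither protein has interactions
    return s / total


# Hypothetical partner sets, for illustration only.
just_duplicated = shared_interaction_ratio({"P1", "P2", "P3"}, {"P1", "P2", "P3"})
diverged = shared_interaction_ratio({"P1", "P2", "P3"}, {"P3", "P4"})
fully_rewired = shared_interaction_ratio({"P1", "P2"}, {"P3", "P4"})
print(just_duplicated, diverged, fully_rewired)  # 1.0 0.25 0.0
```

Representing each protein's partners as a set keeps the calculation independent of how the interaction dataset is stored; self-interactions and interactions between the two paralogues themselves would need an explicit decision that this sketch does not make.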
This approach is different from previous computational studies in which asymmetric sequence divergence is used as a proxy for functional divergence (e.g. [26,27]). However, when we compare sequence divergence with shared interactions between small-scale and whole-genome duplicates (the latter chosen because there is no confounding signal from time in this set of genes), we find a lack of correlation (Figure 2). This indicates that there is not a simple relationship between sequence divergence and functional evolution.
Figure 2. Relationship between sequence divergence and shared interactions between paralogues generated by whole-genome (circles) and small-scale (triangles) duplication events
The lack of a clear relationship between sequence divergence and function has implications for using sequence diversity to infer protein interactions. For example, a commonly used approach, exemplified by MirrorTree [28,29], is to compare pairwise-distance matrices derived from amino acid sequence alignments of orthologues; significant correlations between the matrices are then used to infer probable co-evolution and thus interaction. However, when we tested the ability of this approach to detect protein interactions in a subset of the yeast proteome that comprised proteins involved in at least one physical interaction within a protein complex with known structure, we found that it performed poorly. Furthermore, restricting evolutionary signal to the interface or surface region of the protein did not improve predictions.
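The matrix comparison described above can be sketched as follows. This is a minimal illustration of the general idea, not the published MirrorTree implementation: it assumes each protein family is represented by pairwise distances between the same set of species, and it uses a plain Pearson correlation over the species pairs. All function names, species labels and distance values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def distance_vector(dist, species):
    """Flatten pairwise distances into a vector with a fixed species-pair order."""
    return [dist[(a, b)] for a, b in combinations(species, 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / sqrt(sxx * syy)


def mirrortree_score(dist_a, dist_b, species):
    """Correlation between the distance matrices of two protein families."""
    return pearson(distance_vector(dist_a, species),
                   distance_vector(dist_b, species))


# Toy distances between orthologues in four hypothetical species.
species = ["sp1", "sp2", "sp3", "sp4"]
dist_a = {("sp1", "sp2"): 0.1, ("sp1", "sp3"): 0.4, ("sp1", "sp4"): 0.5,
          ("sp2", "sp3"): 0.4, ("sp2", "sp4"): 0.5, ("sp3", "sp4"): 0.2}
# Family B evolves twice as fast as A but along the same tree.
dist_b = {pair: 2 * d for pair, d in dist_a.items()}
# Family C has an unrelated pattern of distances.
dist_c = {("sp1", "sp2"): 0.5, ("sp1", "sp3"): 0.1, ("sp1", "sp4"): 0.3,
          ("sp2", "sp3"): 0.2, ("sp2", "sp4"): 0.4, ("sp3", "sp4"): 0.1}

print(mirrortree_score(dist_a, dist_b, species))
print(mirrortree_score(dist_a, dist_c, species))
```

In this toy case, families A and B score highly simply because their distances are proportional, whether or not their products bind. A high score therefore reflects similar patterns of divergence, which is not in itself evidence of physical contact.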
Why is this? Why does sequence diversity not correlate well with binding? Principally this is because the main evolutionary signal being used is one of evolutionary rate, which is dependent on multiple factors. Expression level has a particularly strong effect on rate [31,32], and interacting proteins will tend to be expressed to the same extent, so their rates tend to be similar. This makes expression a good indicator of interaction; indeed, correlated expression is a better indicator of interactions than correlated sequence diversity. As a consequence, MirrorTree-like approaches can detect members of the same complex or functional module that do not actually come into contact [21,33,34].
In contrast, compensatory change is observed in only a handful of sites within a given protein [5,14]. This is probably because only a minority of amino acid residues contribute to binding. In our dataset of interacting proteins, we found that only 10–15% of residues are found within the binding interface. It is likely that even this is an overestimate of the number of residues that we would expect to co-evolve, since, for most proteins, only a minority of those residues found in the binding interface contribute significantly to the formation of the interaction.
In addition, any signal in sequence diversity is confounded by the evolutionary history of the proteins being examined. Indeed, the majority of sequence evolution (excluding horizontal gene transfer) of different genes in the same genome will be the result of their shared evolutionary histories, which track speciation history to a greater or lesser extent depending on factors such as lineage sorting. This shared history will particularly affect the branching order (topology) of gene trees, tending to make them similar whether their products interact or not; this is the reason the underlying evolutionary tree is often factored out of the analysis. Thus caution is warranted when interpreting the reasons for similarities between amino acid sequences from the same genomes. Crucially, the definition of co-evolution when applied to interacting proteins is a specific reference to reciprocal change in the associated proteins such that amino acid replacements become fixed because of compensatory effects. A working definition of co-evolution, in contrast with some suggestions, must explicitly exclude similarity from shared evolutionary histories.
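One way to picture what factoring out the underlying tree means is a partial correlation: the correlation between the distances of two protein families after the component explained by species-tree distances has been removed. The sketch below is an illustration under that assumption, not the specific correction used in any cited study. The function names and all numerical values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y after controlling for z (first-order partial)."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    denom = sqrt(max(0.0, (1 - r_xz ** 2) * (1 - r_yz ** 2)))
    if denom == 0:
        raise ValueError("x or y is perfectly explained by z")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical distances for six species pairs (illustrative values only).
species_tree = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # distances on the species tree
family_x = [1.1, 1.9, 3.1, 3.9, 5.1, 5.9]       # tracks the tree, plus own deviations
family_y = [1.1, 2.1, 2.9, 3.9, 5.1, 6.1]       # tracks the tree, unrelated deviations

raw = pearson(family_x, family_y)
corrected = partial_correlation(family_x, family_y, species_tree)
print(raw)        # high: both families follow the species tree
print(corrected)  # close to zero: nothing is shared beyond the common history
```

The toy values are chosen so that the two families share nothing except the species tree. The raw correlation is nevertheless high, which is the similarity from shared evolutionary history that a working definition of co-evolution has to exclude.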
Clearly, interacting proteins do exhibit co-evolution [10,14], and, to make reliable predictions, greater attention needs to be given to their structural context and other properties predictive of binding in vivo. Protein interactions take place between folded molecules, not linear sequences. Thus the key to understanding the relationship between evolutionary change and function is knowledge of the relationship between evolution and structural divergence. Specifically, we require knowledge of the relationship between evolutionary change in the protein structure, particularly the three-dimensional shape of the binding interface, and potential change of binding specificity. This relationship is complex, such that the evolution of binding specificity is to some degree independent of evolution over the entire sequence. In addition, we need to understand more about when and where proteins are in cells, as this has an impact on their opportunities to interact.
Duplication and ‘rewiring’
Duplication provides potentially redundant genetic material that is less constrained by purifying selection. This leads to accumulation of changes that occasionally can be functionally important. After a duplication event, if one of the copies of the gene does not become a pseudogene and get ‘lost’ from the genome, it may acquire a new function (neofunctionalization) or, if the ancestral gene was pleiotropic, partition the ancestral function with the other copy (subfunctionalization). Importantly, these evolutionary processes manifest as changes in interactions between proteins, termed rewiring events, which, along with changes in gene regulation and expression, have the potential to contribute to functional evolution.
Information from protein interaction datasets has again led to interesting insights into functional evolution. For example, we have explored the opportunity for different types of duplicate to contribute to evolution. Specifically, we found that whole-genome duplicates tend to share more interactions than small-scale duplicates, indicating that whole-genome duplication-derived genes tend to be more functionally similar than those derived from small-scale duplication events. Interestingly, this indicates that the differences between the two types of duplicate are not limited to the way in which they were generated.
Given that rates of small-scale gene duplication are higher than those of whole-genome duplication, there is clearly a huge difference in the probability of gene retention following a small-scale event compared with whole-genome duplication. This discrepancy provides compelling evidence that these different types of duplicate must experience different evolutionary pressures en route to retention. This has important implications for how new genes with novel protein functions arise within the genome. In particular, it indicates that there is potential for bias in the types of genes that contribute the most to functional innovation and the evolution of complexity. As a direct result of their greater chance of being retained, whole-genome duplicates will often contribute to functional innovation. Paradoxically, the processes such as balance and dosage that increase the probability of retention of genome duplicates may impose constraints on their functional evolution. Thus, although more frequently lost from the genome, the products of small-scale duplications will, when they are retained, have the potential to make a relatively larger contribution to innovation. Indeed, we have found that the different duplicate gene sets have a tendency to be involved in different functional categories, indicating that, despite their differences, both types of duplicate contribute significantly to functional evolution, but in subtly different ways.
Protein–protein interaction data represent the first step towards a detailed map of the interactome: the totality of interactions that occur within a cell. For the most part, however, such networks have been crude ball-and-stick graphs with no regard to the inherent modularity of the biological system (both spatial and temporal), let alone the inaccuracies in the available datasets and the fact that they are a simplistic representation of a complex system. In addition, interactions are usually presented as binary events when, in fact, they can be obligate, where the subunits associate soon after synthesis and remain associated until degraded, or transient, where the proteins associate and dissociate in order to function, and they lie on a continuum of specificity and promiscuity. Furthermore, through evolutionary time, the nature of these interactions varies with changes in their strength and specificity, with rewiring events representing functional evolution. In conclusion, the study of protein interaction data in a comparative context ultimately has the potential to lead to a more complete understanding of function, its evolution and the relationship between mutation and human disease.
Protein Evolution: Sequences, Structures and Systems: Biochemical Society Focused Meeting to commemorate the 200th Anniversary of Charles Darwin's birth held at the Wellcome Trust Conference Centre, Cambridge, U.K., 26–27 January 2009. Organized and Edited by Roman Laskowski (EMBL-EBI, Hinxton, U.K.), Michael Sternberg (Imperial College London, U.K.) and Janet Thornton (EMBL-EBI, Hinxton, U.K.).
Our research is supported by the Biotechnology and Biological Sciences Research Council [grant numbers BB/C515412/1, BB/F013337/1 and BB/F007620/1].