Molecular function is the result of proteins working together, mediated by highly specific interactions. Maintenance and change of protein interactions can thus be considered one of the main links between molecular function and mutation. As a consequence, protein interaction datasets can be used to study functional evolution directly. In terms of constraining change, the co-evolution of interacting molecules is a very subtle process, which has implications for the signal used to predict protein–protein interactions. In terms of functional change, that is, the ‘rewiring’ of interaction networks, gene duplication is critically important. Interestingly, once duplication has occurred, the probability that the resulting genes are retained depends on how they were generated. In the present paper, we discuss some of our recent work in this area.

Introduction

We now have extensive knowledge of the genetic change that underpins evolution, in terms of both the specific mutational events involved and what happens to these mutations from generation to generation. We know that evolution is dependent on change at the molecular level, specifically on mutation in genome sequences [1]. Such mutations can change proteins, when they occur in genes that code for amino acid sequences; alter gene regulation (when genes are expressed and to what extent); or affect genome structure, i.e. the number of copies of a gene and the arrangement of its exons. Whether evolutionary processes such as genetic drift and selection then result in the fixation of these mutations depends on their phenotypic consequences (or lack thereof), the extent of recombination, population sizes and antagonistic relationships between populations of the same and different species [2].

Alongside this study of gene sequences and genome evolution, there has been extensive characterization of proteins and their interactions [3]. Importantly, proteins work together forming ‘functional modules’ [4], groups of molecules that together perform a function, be it structural, signalling, immune-related or metabolic. The specificity and strength of binding determine which proteins interact, defining the molecular phenotype. On this basis, we argue that protein interactions are a key determinant of molecular function, and that it is on the specificity of these interactions that evolutionary change has the most impact. Protein interactions are thus central to understanding the link between genetic mutation and function.

What, then, is the influence of a mutation on protein binding? For the most part, purifying selection will act against mutations that disrupt binding in any way [5]. However, an amino acid replacement that alters the optimal specificity of an interaction, if not immediately deleterious, can be mitigated by a change either in the same protein or in its interacting partner. In other words, rather than a back mutation occurring, a subsequent mutation can compensate for the initial one (Figure 1). This reciprocal change is termed co-evolution, and it can occur either within one protein [6–10] or between two or more proteins that interact [5,11–14].

Figure 1
Diagrammatic representation of two interacting proteins: BAH domain of Orc1 (white) and the Orc1-interacting domain of Sir1 (yellow) from yeast

Interactions across the binding interface are highlighted in colour. From PDB code 1ZHI [47].

Co-evolution to maintain interactions

Recently generated large-scale protein interaction datasets (e.g. [15–17]) are permitting new insights into the constraints acting to maintain interactions. For example, changes in interactions between orthologues (so-called interologs) can be used to compare functional evolution between species directly [18] or even to reconstruct ancestral network configurations [19]. In our work, we have used the number of interactions that a protein shares with a paralogous copy, relative to the interactions of both proteins, as a measure of functional change [20,21]:

$$ r = \frac{s}{n_1 + n_2 - s} $$

where r is the shared interaction ratio, s is the number of interactions shared between the two proteins and n1 and n2 are the numbers of interactions for each of the two proteins. We therefore use the ratio of shared interactions as a direct measure of functional relatedness. Immediately after duplication, the two copies have identical protein sequences and so will have the same interactions (r=1). As the sequences diverge, binding specificities may change, so that r approaches zero as the duplicate proteins share fewer interactions. Protein interactions have also been used effectively as a measure of functional divergence in other studies [22–25].
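The shared interaction ratio can be illustrated with a short Python sketch. It assumes that r is the number of shared partners divided by the number of distinct partners of the two paralogues, s/(n1 + n2 − s); the function name and the protein names in the example are hypothetical, and the sketch does not address how an interaction between the two paralogues themselves should be counted.

```python
"""Shared interaction ratio between two paralogues (illustrative sketch)."""


def shared_interaction_ratio(partners_a: set[str], partners_b: set[str]) -> float:
    """Return r = s / (n1 + n2 - s) for two sets of interaction partners.

    Assumed definition: s is the number of partners common to both proteins,
    n1 and n2 are the numbers of partners of each protein, so the denominator
    counts every distinct partner exactly once.
    """
    s = len(partners_a & partners_b)
    n1, n2 = len(partners_a), len(partners_b)
    denominator = n1 + n2 - s
    if denominator == 0:
        # Neither protein has any recorded interaction, so r is undefined.
        raise ValueError("both proteins have no recorded interactions")
    return s / denominator
```

For example, paralogues with partners {"A", "B", "C"} and {"B", "C", "D"} share two of four distinct partners, giving r = 0.5; identical partner sets give r = 1 and disjoint sets give r = 0.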

This approach differs from previous computational studies in which asymmetric sequence divergence is used as a proxy for functional divergence (e.g. [26,27]). However, when we compare sequence divergence with shared interactions for small-scale and whole-genome duplicates (the latter being particularly informative because they all arose at the same time, so there is no confounding signal from time in this set of genes), we find a lack of correlation (Figure 2) [21]. This indicates that there is no simple relationship between sequence divergence and functional evolution.

Figure 2
Relationship between sequence divergence and shared interactions between paralogues generated by whole-genome (circles) and small-scale (triangles) duplication events

Sequence divergence is measured by non-synonymous substitutions. Shared interactions are measured by the ratio of interactions in common between pairs of paralogues [20,21].

The lack of a clear relationship between sequence divergence and function has implications for using sequence diversity to infer protein interactions. For example, a commonly used approach, exemplified by MirrorTree [28,29], compares pairwise distance matrices derived from amino acid sequence alignments of the orthologues of two proteins; a significant correlation between the matrices is taken to indicate probable co-evolution and thus interaction. However, when we tested the ability of this approach to detect protein interactions in a subset of the yeast proteome comprising proteins involved in at least one physical interaction within a protein complex of known structure, we found that it performed poorly [21]. Furthermore, restricting the evolutionary signal to the interface or surface region of the proteins did not improve predictions.
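The core of a MirrorTree-like comparison can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the published implementation: the distance matrices are supplied directly as nested dictionaries (in practice they would be computed from alignments of orthologues), every species is assumed to contribute one orthologue of each protein, and the species names and distances are hypothetical. No significance test is included.

```python
import math
from itertools import combinations


def from_pairs(species, values):
    """Build a symmetric distance matrix (dict of dicts) from upper-triangle values."""
    matrix = {s: {s: 0.0} for s in species}
    for (a, b), d in zip(combinations(species, 2), values):
        matrix[a][b] = matrix[b][a] = d
    return matrix


def pair_values(matrix, species):
    """Distances for every unordered species pair, always in the same order."""
    return [matrix[a][b] for a, b in combinations(species, 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mirrortree_score(dist_p, dist_q, species):
    """Correlate the orthologue distance matrices of two proteins, P and Q.

    Both matrices must cover the same species, so that corresponding
    entries compare the same pair of organisms.
    """
    return pearson(pair_values(dist_p, species), pair_values(dist_q, species))
```

With four hypothetical species and a matrix for Q whose distances are exactly twice those of P, the score is 1 up to rounding. As the text explains, a high score of this kind can arise from shared evolutionary rate (for example through expression level) rather than from direct contact.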

Why is this? Why does sequence diversity not correlate well with binding? Principally, this is because the main evolutionary signal being used is evolutionary rate, which depends on multiple factors [30]. Expression level has a particularly strong effect on rate [31,32], and because interacting proteins tend to be expressed to the same extent, their rates of evolution will tend to be similar whether or not they co-evolve. This makes expression a good indicator of interaction; indeed, correlated expression is a better indicator of interactions than correlated sequence diversity [21]. As a consequence, MirrorTree-like approaches can detect members of the same complex or functional module that do not actually come into contact [21,33,34].

In contrast, compensatory change is observed at only a handful of sites within a given protein [5,14]. This is probably because only a minority of amino acid residues contribute to binding. In our dataset of interacting proteins, only 10–15% of residues lie within the binding interface [21]. Even this is likely to overestimate the number of residues that we would expect to co-evolve since, for most proteins, only a minority of the residues in the binding interface contribute significantly to the formation of the interaction [35].
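To make the notion of an interface fraction concrete, the sketch below counts a residue as part of the interface if any of its atoms lies within a distance cutoff of any atom of the partner protein. This definition, the 5 Å default and the input format (residue identifier mapped to a list of atom coordinates) are assumptions chosen for illustration; the analysis in [21] may define the interface differently, for example through a change in solvent accessibility, and parsing coordinates from a PDB entry such as 1ZHI is not shown.

```python
import math

# Hypothetical input format: residue id -> list of (x, y, z) atom coordinates in Å.
Chain = dict[str, list[tuple[float, float, float]]]


def interface_residues(chain: Chain, partner: Chain, cutoff: float = 5.0) -> set[str]:
    """Residues of `chain` with at least one atom within `cutoff` Å of the partner.

    The atom-atom distance criterion and the 5 Å default are assumptions.
    """
    partner_atoms = [atom for atoms in partner.values() for atom in atoms]
    return {
        res_id
        for res_id, atoms in chain.items()
        if any(math.dist(a, b) <= cutoff for a in atoms for b in partner_atoms)
    }


def interface_fraction(chain: Chain, partner: Chain, cutoff: float = 5.0) -> float:
    """Fraction of the residues in `chain` that lie in the binding interface."""
    return len(interface_residues(chain, partner, cutoff)) / len(chain)
```

In a toy chain of ten single-atom residues spaced 10 Å apart along one axis, with a partner atom 4 Å from the first residue, only that residue is within the cutoff, giving a fraction of 0.1.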

In addition, any signal in sequence diversity is confounded by the evolutionary history of the proteins being examined. Indeed, most of the sequence evolution of different genes in the same genome (excluding horizontal gene transfer) reflects their shared evolutionary history, which tracks speciation history to a greater or lesser extent depending on factors such as lineage sorting [36]. This shared history particularly affects the branching order (topology) of gene trees, tending to make them similar whether or not their products interact; this is why the underlying evolutionary tree is often factored out of the analysis [37]. Caution is therefore warranted when interpreting the reasons for similarities between amino acid sequences from the same genomes. Crucially, when applied to interacting proteins, co-evolution refers specifically to reciprocal change in the associated proteins, such that amino acid replacements become fixed because of their compensatory effects. A working definition of co-evolution, in contrast with some suggestions [38], must therefore explicitly exclude similarity arising from shared evolutionary history.
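One simple way to factor out the shared history, loosely in the spirit of [37], is to subtract a reference species-tree distance from each protein distance before correlating. The sketch below is our simplification, not necessarily the published procedure (which may differ in how the reference tree is built and scaled); the species and all distances are hypothetical toy numbers. Two proteins whose distances both track the species distances look almost perfectly correlated before correction, but much less so afterwards.

```python
import math
from itertools import combinations


def from_pairs(species, values):
    """Build a symmetric distance matrix (dict of dicts) from upper-triangle values."""
    matrix = {s: {s: 0.0} for s in species}
    for (a, b), d in zip(combinations(species, 2), values):
        matrix[a][b] = matrix[b][a] = d
    return matrix


def pair_values(matrix, species):
    """Distances for every unordered species pair, always in the same order."""
    return [matrix[a][b] for a, b in combinations(species, 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def residual_distances(protein, species_tree, species):
    """Protein distance minus the reference species-tree distance, per species pair."""
    return [p - s for p, s in zip(pair_values(protein, species),
                                  pair_values(species_tree, species))]


def corrected_score(dist_p, dist_q, species_tree, species):
    """Correlate what remains after removing the shared species-tree signal."""
    return pearson(residual_distances(dist_p, species_tree, species),
                   residual_distances(dist_q, species_tree, species))
```

With a toy species tree of distances 1 to 6 and two proteins that each deviate from it by ±0.1 in different patterns, the uncorrected correlation is about 0.998, whereas the corrected score is about 0.33.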

Clearly, interacting proteins do exhibit co-evolution [10,14], and, to make reliable predictions, greater attention needs to be given to their structural context [39] and other properties predictive of binding in vivo. Protein interactions take place between folded molecules, not linear sequences. Thus the key to understanding the relationship between evolutionary change and function is knowledge of the relationship between evolution and structural divergence [40]. Specifically, we require knowledge of the relationship between evolutionary change in the protein structure, particularly the three-dimensional shape of the binding interface, and potential change of binding specificity. This relationship is complex, such that the evolution of binding specificity is to some degree independent of evolution over the entire sequence. In addition, we need to understand more about when and where proteins are in cells, as this has an impact on their opportunities to interact.

If compensatory change (co-evolution) reflects functional constraint, under what circumstances are novel interactions acquired? Of key importance are gene duplication and the reuse of domains, i.e. evolution's tendency to repurpose existing genes and proteins rather than evolve them de novo [41,42].

Duplication and ‘rewiring’

Duplication provides potentially redundant genetic material that is less constrained by purifying selection, allowing changes to accumulate, some of which can occasionally be functionally important. After a duplication event, unless one copy becomes a pseudogene and is lost from the genome, a duplicate may acquire a new function (neofunctionalization) or, if the ancestral gene was pleiotropic, the two copies may partition its functions between them (subfunctionalization) [42]. Importantly, these evolutionary processes manifest as changes in interactions between proteins [43], termed rewiring events, which, along with changes in gene regulation and expression [44], have the potential to contribute to functional evolution.

Information from protein interaction datasets has again led to interesting insights into functional evolution. For example, we have explored the opportunity for different types of duplicate to contribute to evolution [20]. Specifically, we found that whole-genome duplicates tend to share more interactions than small-scale duplicates, indicating that whole-genome duplication-derived genes tend to be more functionally similar than those derived from small-scale duplication events. Interestingly, this indicates that the differences between the two types of duplicate are not limited to the way in which they were generated (see also [25]).

Given that small-scale gene duplication occurs at a much higher rate than whole-genome duplication, there is clearly a huge difference in the probability of gene retention following a small-scale event compared with whole-genome duplication [20]. This discrepancy provides compelling evidence that the two types of duplicate must experience different evolutionary pressures en route to retention. This has important implications for how new genes with novel protein functions arise within the genome. In particular, it indicates that there is potential for bias in the types of genes that contribute most to functional innovation and the evolution of complexity. As a direct result of their greater chance of being retained, whole-genome duplicates will often contribute to functional innovation. Paradoxically, however, the processes that increase the probability of retention of genome duplicates, such as the maintenance of dosage balance [45], may impose constraints on their functional evolution [20]. Thus, although more frequently lost from the genome, the products of small-scale duplications will, when they are retained, have the potential to make a relatively larger contribution to innovation. Indeed, we have found that the two duplicate gene sets tend to be involved in different functional categories [20], indicating that, despite their differences, both types of duplicate contribute significantly to functional evolution, but in subtly different ways.

Conclusions

Protein–protein interaction data represent the first step towards a detailed map of the interactome: the totality of interactions that occur within a cell. For the most part, however, such networks have been crude ball-and-stick graphs that disregard the inherent modularity of the biological system (both spatial and temporal), the inaccuracies in the available datasets [46] and the fact that they are a simplistic representation of a complex system. In addition, interactions are usually presented as binary events when, in fact, they can be obligate, where the subunits associate soon after synthesis and remain associated until degraded, or transient, where the proteins associate and dissociate in order to function; they also lie on a continuum between specificity and promiscuity [3]. Furthermore, over evolutionary time, the nature of these interactions varies as their strength and specificity change, with rewiring events representing functional evolution. In conclusion, the study of protein interaction data in a comparative context ultimately has the potential to lead to a more complete understanding of function, its evolution and the relationship between mutation and human disease.

Protein Evolution: Sequences, Structures and Systems: Biochemical Society Focused Meeting to commemorate the 200th Anniversary of Charles Darwin's birth held at the Wellcome Trust Conference Centre, Cambridge, U.K., 26–27 January 2009. Organized and Edited by Roman Laskowski (EMBL-EBI, Hinxton, U.K.), Michael Sternberg (Imperial College London, U.K.) and Janet Thornton (EMBL-EBI, Hinxton, U.K.).

Funding

Our research is supported by the Biotechnology and Biological Sciences Research Council [grant numbers BB/C515412/1, BB/F013337/1 and BB/F007620/1].

References

1. Graur, D. and Li, W.-H. (2000) Fundamentals of Molecular Evolution, Sinauer Associates Inc., Sunderland, MA, U.S.A.
2. Hartl, D.L. and Clark, A.G. (2007) Principles of Population Genetics, Sinauer Associates Inc., Sunderland, MA, U.S.A.
3. Nooren, I.M. and Thornton, J.M. (2003) Diversity of protein–protein interactions. EMBO J. 22, 3486–3492
4. Hartwell, L.H., Hopfield, J.J., Leibler, S. and Murray, A.W. (1999) From molecular to modular cell biology. Nature 402, C47–C52
5. Mintseris, J. and Weng, Z. (2005) Structure, function, and evolution of transient and obligate protein–protein interactions. Proc. Natl. Acad. Sci. U.S.A. 102, 10930–10935
6. Pollock, D.D., Taylor, W.R. and Goldman, N. (1999) Coevolving protein residues: maximum likelihood identification and relationship to structure. J. Mol. Biol. 287, 187–198
7. Dutheil, J., Pupko, T., Jean-Marie, A. and Galtier, N. (2005) A model-based approach for detecting coevolving positions in a molecule. Mol. Biol. Evol. 22, 1919–1928
8. Wang, Z.O. and Pollock, D.D. (2007) Coevolutionary patterns in cytochrome c oxidase subunit I depend on structural and functional context. J. Mol. Evol. 65, 485–495
9. Socolich, M., Lockless, S.W., Russ, W.P., Lee, H., Gardner, K.H. and Ranganathan, R. (2005) Evolutionary information for specifying a protein fold. Nature 437, 512–518
10. Gloor, G.B., Martin, L.C., Wahl, L.M. and Dunn, S.D. (2005) Mutual information in protein multiple sequence alignments reveals two classes of coevolving positions. Biochemistry 44, 7156–7165
11. Moyle, W.R., Campbell, R.K., Myers, R.V., Bernard, M.P., Han, Y. and Wang, X. (1994) Co-evolution of ligand–receptor pairs. Nature 368, 251–255
12. Atchley, W.R., Wollenberg, K.R., Fitch, W.M., Terhalle, W. and Dress, A.W. (2000) Correlations among amino acid sites in bHLH protein domains: an information theoretic analysis. Mol. Biol. Evol. 17, 164–178
13. Chao, J.A., Patskovsky, Y., Almo, S.C. and Singer, R.H. (2008) Structural basis for the coevolution of a viral RNA–protein complex. Nat. Struct. Mol. Biol. 15, 103–105
14. Yeang, C.H. and Haussler, D. (2007) Detecting coevolution in and among protein domains. PLoS Comput. Biol. 3, e211
15. Gavin, A.C., Aloy, P., Grandi, P., Krause, R., Boesche, M., Marzioch, M., Rau, C., Jensen, L.J., Bastuck, S., Dumpelfeld, B. et al. (2006) Proteome survey reveals modularity of the yeast cell machinery. Nature 440, 631–636
16. Krogan, N.J., Cagney, G., Yu, H., Zhong, G., Guo, X., Ignatchenko, A., Li, J., Pu, S., Datta, N., Tikuisis, A.P. et al. (2006) Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440, 637–643
17. Kim, W.K., Krumpelman, C. and Marcotte, E.M. (2008) Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy. Genome Biol. 9 (Suppl. 1), S5
18. Sharan, R. and Ideker, T. (2006) Modeling cellular machinery through biological network comparison. Nat. Biotechnol. 24, 427–433
19. Pinney, J.W., Amoutzias, G.D., Rattray, M. and Robertson, D.L. (2007) Reconstruction of ancestral protein interaction networks for the bZIP transcription factors. Proc. Natl. Acad. Sci. U.S.A. 104, 20449–20453
20. Hakes, L., Pinney, J.W., Lovell, S.C., Oliver, S.G. and Robertson, D.L. (2007) All duplicates are not equal: the difference between small-scale and genome duplication. Genome Biol. 8, R209
21. Hakes, L., Lovell, S.C., Oliver, S.G. and Robertson, D.L. (2007) Specificity in protein interactions and its relationship with sequence diversity and coevolution. Proc. Natl. Acad. Sci. U.S.A. 104, 7999–8004
22. Wagner, A. (2002) Asymmetric functional divergence of duplicate genes in yeast. Mol. Biol. Evol. 19, 1760–1768
23. Baudot, A., Jacq, B. and Brun, C. (2004) A scale of functional divergence for yeast duplicated genes revealed from analysis of the protein–protein interaction network. Genome Biol. 5, R76
24. Conant, G.C. and Wolfe, K.H. (2006) Functional partitioning of yeast co-expression networks after genome duplication. PLoS Biol. 4, e109
25. Guan, Y., Dunham, M.J. and Troyanskaya, O.G. (2007) Functional analysis of gene duplications in Saccharomyces cerevisiae. Genetics 175, 933–943
26. Conant, G.C. and Wagner, A. (2003) Asymmetric sequence divergence of duplicate genes. Genome Res. 13, 2052–2058
27. Kellis, M., Birren, B.W. and Lander, E.S. (2004) Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428, 617–624
28. Pazos, F. and Valencia, A. (2001) Similarity of phylogenetic trees as indicator of protein–protein interaction. Protein Eng. 14, 609–614
29. Valencia, A. and Pazos, F. (2002) Computational methods for the prediction of protein interactions. Curr. Opin. Struct. Biol. 12, 368–373
30. Pal, C., Papp, B. and Lercher, M.J. (2006) An integrated view of protein evolution. Nat. Rev. Genet. 7, 337–348
31. Agrafioti, I., Swire, J., Abbott, J., Huntley, D., Butcher, S. and Stumpf, M.P. (2005) Comparative analysis of the Saccharomyces cerevisiae and Caenorhabditis elegans protein interaction networks. BMC Evol. Biol. 5, 23
32. Drummond, D.A., Raval, A. and Wilke, C.O. (2006) A single determinant dominates the rate of yeast protein evolution. Mol. Biol. Evol. 23, 327–337
33. Chen, Y. and Dokholyan, N.V. (2006) The coordinated evolution of yeast proteins is constrained by functional modularity. Trends Genet. 22, 416–419
34. Juan, D., Pazos, F. and Valencia, A. (2008) High-confidence prediction of global interactomes based on genome-wide coevolutionary networks. Proc. Natl. Acad. Sci. U.S.A. 105, 934–939
35. DeLano, W.L. (2002) Unraveling hot spots in binding interfaces: progress and challenges. Curr. Opin. Struct. Biol. 12, 14–20
36. Maddison, W.P. (1997) Gene trees in species trees. Syst. Biol. 46, 523–536
37. Pazos, F., Ranea, J.A., Juan, D. and Sternberg, M.J. (2005) Assessing protein co-evolution in the context of the tree of life assists in the prediction of the interactome. J. Mol. Biol. 352, 1002–1015
38. Pazos, F. and Valencia, A. (2008) Protein co-evolution, co-adaptation and interactions. EMBO J. 27, 2648–2655
39. Kann, M.G., Jothi, R., Cherukuri, P.F. and Przytycka, T.M. (2007) Predicting protein domain interactions from coevolution of conserved regions. Proteins 67, 811–820
40. Williams, S.G. and Lovell, S.C. (2009) The effect of sequence evolution on protein structural divergence. Mol. Biol. Evol. 26, 1055–1065
41. Orengo, C.A. and Thornton, J.M. (2005) Protein families and their evolution: a structural perspective. Annu. Rev. Biochem. 74, 867–900
42. Conant, G.C. and Wolfe, K.H. (2008) Turning a hobby into a job: how duplicated genes find new functions. Nat. Rev. Genet. 9, 938–950
43. Stumpf, M.P., Kelly, W.P., Thorne, T. and Wiuf, C. (2007) Evolution at the system level: the natural history of protein interaction networks. Trends Ecol. Evol. 22, 366–373
44. Babu, M.M., Luscombe, N.M., Aravind, L., Gerstein, M. and Teichmann, S.A. (2004) Structure and evolution of transcriptional regulatory networks. Curr. Opin. Struct. Biol. 14, 283–291
45. Papp, B., Pal, C. and Hurst, L.D. (2003) Dosage sensitivity and the evolution of gene families in yeast. Nature 424, 194–197
46. Hakes, L., Pinney, J.W., Robertson, D.L. and Lovell, S.C. (2008) Protein–protein interaction networks and biology: what's the connection? Nat. Biotechnol. 26, 69–72
47. Hou, Z., Bernstein, D.A., Fox, C.A. and Keck, J.L. (2005) Structural basis of the Sir1–origin recognition complex interaction in transcriptional silencing. Proc. Natl. Acad. Sci. U.S.A. 102, 8489–8494