We have examined the evolution of the genes at the major human β-defensin locus and the orthologous loci in a range of other primates and mammals. For the first time, these data allow us to examine selective episodes in the more recent evolutionary history of this locus as well as in the ancient past. We have used a combination of maximum-likelihood-based tests and a maximum-parsimony-based sliding window approach to give a detailed view of the varying modes of selection operating at this locus. We provide evidence for strong positive selection soon after the duplication of these genes within an ancestral mammalian genome. During the divergence of primates, however, variable selective pressures have acted on β-defensin genes in different evolutionary lineages, with episodes of both negative and, more rarely, positive selection. Positive selection appears to have been more common in the rodent lineage, accompanying the birth of novel rodent-specific β-defensin gene clades. Sites in the second exon have been subject to positive selection and, by implication, are important in functional diversity. A small number of sites in the mature human peptides were found to have undergone repeated episodes of selection in different primate lineages. Particular sites were consistently implicated by multiple methods at positions throughout the mature peptides. These sites are clustered at positions that are predicted to be important for the function of β-defensins.

Introduction

Antimicrobial peptides have a critical role in the vertebrate innate immune defence against microbes. These peptides have potential as therapeutics, and intelligent drug design relies on understanding how these molecules function. Defensins are generally cationic peptides that are produced as prepropeptides and can be divided into subclasses based on the distribution of the six canonical cysteines located in the mature peptide. Only two subclasses are shared between mouse and human: the α- and β-defensins. Both have six canonical cysteine residues, but they differ in the spacing of these residues and in the intramolecular disulphide bridges that are formed [1]. The antimicrobial activity of both α- and β-defensins in vivo is well established [2–5]. More recently, β-defensins have been shown to act as a link between adaptive and innate immunity [6], and may play important roles in cancer progression. This has stimulated great interest in the function and evolution of β-defensins in primate lineages [7].

β-Defensins demonstrate divergence in the mature peptide region, driven by positive selection

Genes that are involved in host defence often display high rates of genomic divergence and evidence for adaptive evolution. As seems to be the case with other proteins that are involved in the immune response, such as MHC molecules, immunoglobulins and α-defensins, this selection may be a response to the rapid evolution of pathogens [8,9]. In agreement with this, the four well-studied human β-defensins vary in their expression patterns as well as in their antimicrobial and antiviral activities [7]. We have investigated the eight transcribed human genes at the major 8p22-p23 β-defensin locus and 21 genes at the orthologous mouse locus [10]. In both mouse and human, β-defensin paralogues show little sequence similarity in the mature peptide region, and this divergence appears to have been driven by positive selection following duplication [11,12]. These genes show an unusual pattern of evolution, with rapid divergence between second exon sequences that encode the mature peptides matched by relative stasis in the first exons that encode signal peptides. However, these previous studies detected that positive selection acted during the more distant evolutionary history of this locus to produce a diverse cluster of paralogous genes that were apparently established early in mammalian evolution. The majority of the human genes have a clear orthologue in the mouse, but there has been species-specific clade expansion in the rodent [10]. Mouse genes that are adjacent on the chromosome are very similar to each other (>90% identity) in the first exon and can be divided into clear subgroups. The second exon sequences are, however, only approx. 30% identical.

It has been reported that both DEFB1 [13] and DEFB103 (formerly DEFB3) [14] have evolved neutrally in primate lineages with no evidence for positive selection. In contrast, there is circumstantial evidence to suggest that the evolution of primate DEFB4 (formerly DEFB2) genes has involved positive selection [15]. The selective forces operating on the other β-defensins at this locus in primate lineages have, until now, remained unknown.

Methods used to examine selective pressures

Previous studies have compared, pairwise and over the whole sequence, the rate of non-synonymous nucleotide change (Ka) with the rate of synonymous nucleotide change (Ks). If Ka exceeds Ks, this indicates positive selection, i.e. the rate of amino acid residue-changing mutation is higher than the neutral rate. This approach is subject to error when examining short molecules such as defensins. It is possible to use LRTs (likelihood ratio tests) to allow estimation of the Ka/Ks ratio (ω) at particular sites, rather than averaging it over the whole molecule [16]. However, Suzuki and Nei [17,18] found that positively selected amino acid sites are more reliably inferred by parsimony-based methods than by likelihood-based methods, with the latter prone to producing false positives.
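
The logic of an LRT between two nested models can be sketched as follows. This is a minimal illustration, not the PAML implementation: the log-likelihood values are hypothetical, and the choice of two degrees of freedom is an assumption (the text does not state which models were compared). The test statistic is twice the difference in log-likelihood, compared against a chi-square distribution.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (closed form, even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total

def likelihood_ratio_test(lnl_null, lnl_alt, df=2):
    """Return (statistic, p-value) for nested models; df=2 is an assumed default."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf_even_df(stat, df)

# Hypothetical log-likelihoods for a model without and with a class of sites where omega > 1.
stat, p = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0)
print(f"2*delta(lnL) = {stat:.2f}, p = {p:.4f}")
```

A small p-value only says that the data set as a whole is better described by the model allowing positively selected sites; it does not by itself identify which sites those are.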

A new approach based on maximum parsimony, SWAPSC (sliding window analysis procedure to detect selective constraints) [19], has been used to predict selective constraints conservatively from alignments, with special attention paid to reducing false positives. We have used this approach to analyse the sequences of primate and mouse β-defensins in the major chromosome 8 clusters.
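
The idea of a sliding window over codons can be illustrated with a deliberately crude sketch. It is not SWAPSC: it only counts synonymous and non-synonymous codon differences between two aligned sequences, with no ancestral reconstruction, no correction for multiple hits and no significance testing. The function names and the toy sequences are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def codon_changes(seq_a, seq_b):
    """Label each aligned codon pair: 'N' non-synonymous, 'S' synonymous, '-' identical.

    Assumes upper-case, gap-free, in-frame sequences of equal length.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame and of equal length")
    labels = []
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            labels.append("-")
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            labels.append("S")
        else:
            labels.append("N")
    return labels

def sliding_window_counts(labels, window=3):
    """Return (first_bp, last_bp, n_nonsyn, n_syn) for each window, stepping one codon."""
    results = []
    for start in range(len(labels) - window + 1):
        chunk = labels[start:start + window]
        results.append((3 * start + 1, 3 * (start + window),
                        chunk.count("N"), chunk.count("S")))
    return results

# Toy example: AAA->AGA changes Lys to Arg, CTG->CTA leaves Leu unchanged.
labels = codon_changes("ATGAAACTGTGC", "ATGAGACTATGC")
for row in sliding_window_counts(labels):
    print(row)
```

Working in short windows rather than over the whole gene is what allows a localized excess of non-synonymous change to be seen in a molecule that is otherwise conserved.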

An NJ (neighbour-joining) tree was constructed from the 97 aligned mouse and primate amino acid sequences using p-distance estimates (see Figure 1). This tree and an alignment of nucleotide sequences derived from the protein alignment were used in our evolutionary analyses. It should be noted that all 21 mouse genes analysed here are readily detectable on the orthologous rat chromosome 16 and that apparently mouse-specific clades in Figure 1 are therefore likely to be rodent-specific. The rodent-specific genes were not detected in a dog genome search, whereas orthologues of all eight primate genes were readily detectable in the dog genome. This is in agreement with data from Patil et al. [20].
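
For readers unfamiliar with p-distance, the following minimal sketch shows how it can be computed from aligned amino acid sequences: it is simply the proportion of compared sites at which two sequences differ. Skipping sites where either sequence has a gap is an assumption made here, since the treatment of gaps is not stated in the text, and the sequence names and residues are invented for illustration.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences (gapped sites skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue                      # assumed pairwise deletion of gapped sites
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared

def distance_matrix(alignment):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    names = sorted(alignment)
    return {(x, y): p_distance(alignment[x], alignment[y])
            for x in names for y in names if x < y}

# Hypothetical toy alignment, not real β-defensin sequences.
toy = {
    "seqA": "GIINTLQKYYCRVRGG",
    "seqB": "GIINTLQKYYCRVRGA",
    "seqC": "GI-NSLQRYYCKVRGA",
}
for pair, d in distance_matrix(toy).items():
    print(pair, round(d, 3))
```

A matrix of this kind is the input that a neighbour-joining algorithm clusters into a tree.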

Figure 1
Phylogenetic tree relating primate and mouse β-defensin proteins constructed using NJ

Branches with less than 50% bootstrap support have been collapsed. Primate species names are abbreviated as follows: Cercopithecus preussi (Preuss's monkey) (cp), Cercopithecus aethiops (vervet monkey) (ca), Cercopithecus erythrogaster (red-bellied monkey) (ce), Presbytis cristata (silvered langur) (pc), Presbytis obscurus (spectacled langur) (po), Presbytis melalophos (banded langur) (pm), Macaca mulatta (Rhesus macaque) (mm), Macaca fascicularis (crab-eating macaque) (mf), Papio anubis (olive baboon) (pa), Hylobates lar (lar gibbon) (hl), Hylobates moloch (silvery gibbon) (hm), Hylobates concolor (crested gibbon) (hc), Callithrix jacchus (common marmoset) (cj), Saguinus oedipus (cotton-top tamarin) (so), Pan troglodytes (chimpanzee) (pt), Gorilla gorilla (gorilla) (gg), Pongo pygmaeus (orangutan) (pp), Homo sapiens (human) (hs). The Cercopithecidae are represented by cp, ca, ce, pc, po, pm, mm, mf and pa; the Hylobatidae by hl, hm and hc; the Callitrichidae by cj and so; and the Hominidae by pt, gg, pp and hs. Note that sequences were not available from every species for each primate gene. Mouse genes are in lower case. Branches labelled with letters A–Q show significant (P<0.0001) evidence of positive selection, and asterisks indicate branches showing significant (P<0.0001) evidence of negative selection, as determined using SWAPSC.

Significant evidence of selection was sought using three different programs: PAML [16], ADAPTSITE [21] and SWAPSC [19]. The levels of divergence between many of the primate sequences in this dataset are very low, and occasionally zero. Nevertheless, the LRTs suggested that the best description of these data is a model incorporating many categories of variable ω, including one showing positive selection. However, using this dataset, PAML could not confidently (i.e. at greater than 95% confidence) identify the particular sites subject to positive selection.

Indication of particular evolutionary branches under selection

The SWAPSC analysis of all mouse and primate data also broadly reflected the LRT results: the dataset was estimated to contain sites subject to a wide range of Ka/Ks values, including a small number under positive selection. Specifically, 0.77% of the sites were estimated to be subject to positive selection and 1.15% to negative selection. The branches identified as under positive and negative selection are indicated in Figure 1 and reveal the dynamic evolutionary history of this locus. Of the eight primate genes examined, positive selection has played a role in the evolution of six, and negative selection has acted upon all eight. However, the 21 genes at the orthologous rodent locus appear to have less turbulent histories, with ten and four genes subject to positive and negative selection respectively. This leaves seven mouse genes lacking significant evidence of either positive or negative selection. It is also notable that the majority (seven out of 11) of mouse genes that have experienced detectable selection belong to apparently rodent-specific clades in Figure 1.

It is clear that the selective episodes that affect primate genes have involved relatively low values of Ka/Ks, with many periods of negative selection, while those that affect rodent genes have spanned a broader range of Ka/Ks, with few episodes of negative selection. More ancient branches seem to have involved the highest values, which is consistent with the view that the early stages of duplication and diversification among mammalian β-defensin paralogues involved strong selection. The later stages of evolution within mammalian groups, and particularly primates, seem to have involved less innovation. Figure 2 shows that the focus of most positive selection in rodent and ancient branches, but also of negative selection in primate branches, has been the first ∼120 bp of the alignment. These first 40 amino acids include the propeptide region of the peptide which is subsequently cleaved from the mature peptide, and the α-helix and first β-strand of the mature defensin peptide.

Figure 2
Selection across β-defensin sequence

Data for ω across primate (◆), mouse (□) and more ancient branches (△) for all significant positive and negative selection detected across the alignment in base pairs (midpoints of three codon windows) are shown. The positions of the β-strands and α-helix in DEFB1 are indicated. Reprinted from [10], with permission.

Indication of particular sites under selection

Figure 3 shows the raw data plotted for two of the branches demonstrating significant evidence for positive selection (the most recent and the oldest in the tree; see Figure 1). For each branch, the Ka and Ks measured at successive three-codon windows are shown. These graphs make clear that the sites of positive and negative selection identified as significant are likely to be only a subset of those actually subject to these forces. The recent branch (C) shows the changes between the last common ancestor of Cercopithecus aethiops (vervet monkey) DEFB106 and Papio anubis (olive baboon) DEFB106. Most regions of the molecule show little or no change, as expected over approx. 9.62 million years, but two consecutive windows (base pairs 40–48 and 43–51) demonstrate a significant excess of Ka over Ks.
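
The relationship between three-codon windows and the base-pair ranges quoted here can be made explicit with a small sketch. It assumes that base pairs and codons are both numbered from 1 at the start of the alignment and that the alignment is in frame; the function names are hypothetical.

```python
def window_span(first_codon, window=3):
    """Base-pair range (first_bp, last_bp) covered by a window starting at a 1-based codon."""
    first_bp = 3 * (first_codon - 1) + 1
    last_bp = 3 * (first_codon + window - 1)
    return first_bp, last_bp

def codons_for_span(first_bp, last_bp):
    """1-based codon indices covered by a base-pair range."""
    return list(range((first_bp - 1) // 3 + 1, (last_bp - 1) // 3 + 2))

# The two consecutive windows reported for branch C.
print(window_span(14), window_span(15))
print(codons_for_span(40, 48), codons_for_span(43, 51))
```

Under these assumptions the two windows correspond to codons 14–16 and 15–17, so they overlap at two of their three codons, which is why a single cluster of substitutions can appear as a significant signal in both.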

Figure 3
Substitution rates and selection measured in two branches of the tree relating mammalian β-defensins

Each graph shows Ka (●), Ks (○) and significant positive or negative selection across the sequence (midpoints of three codon windows) encoded by the second exon. The graphs display the analysis for branches in Figure 1, described in the text in detail. Branch C is a relatively recent branch of DEFB106 and branch F is an older branch of DEFB1, both demonstrating statistically significant episodes of positive selection (in rectangles), with the latter branch also showing negative selection (circled). The positions of the β-strands and α-helix in DEFB1 are indicated. Reprinted from [10], with permission.

The older branch shows changes between the last common ancestor of all primate DEFB1 and mouse Defb1 sequences and the ancestral primate DEFB1 sequence. This older branch concerns events approx. 40–92 million years ago, and shows greater variation in Ka and Ks across the sequences, although only two regions show significant evidence for positive selection (base pairs 4–12 and 67–75) and a further two for negative selection (base pairs 28–42 and 160–168).

There are many sites such as this where selection is detected similarly in all or most lineages, reflecting more ancient events in mammalian evolution. All except one of the conserved cysteine residues are implicated as being under negative selection in both primate and mouse lineages. Similarly, a small region at the extreme N-terminus of exon 2 predicted peptides (positions 2–4 in Figure 3) was found to be under positive selection in primate and mouse lineages. However, certain regions of these molecules have experienced positive and negative selection in different lineages. Arguably, it is these sites, where selection has at one time favoured a change but at another required stasis, that are likely to be the most potent in altering the functions of these proteins. These sites cluster mainly at a central region of the mature peptides (positions 36–40 in Figure 3), although other sites, often those that neighbour cysteine residues (positions 13–14, 24–25, 33 and 55–56), appear to have been subject to such opposing selection.

Position of sites of positive selection in the β-defensin molecule

It has been shown that primate and murine β-defensins share striking similarity at the level of secondary and tertiary structure, in spite of very low levels of sequence similarity [22]. The most reliably inferred sites of positive selection (those implicated by more than one method) were mapped to the known structures of the human DEFB1 and the mouse Defb7 mature peptides [10]. As discussed above, more sites demonstrate positive selection in the murine defensins than in the primate defensins; however, some clear similarities between the positions of positively selected sites are evident on the murine and primate structures [10]. It seems that sites within the triple β-strand so characteristic of these peptides are largely unaffected by positive selection. The few exceptional sites subject to positive selection found in the triple-stranded β-sheets that form the structural core of the β-defensins may represent alterations in the oligomerization of β-defensins. Both primate and rodent lineages show a large number of sites that are subject to positive selection within the N-terminal portion of the mature peptide. In both primate and murine peptides, two sites were located within a region which in DEFB1 and Defb7 forms an α-helix. Since regions of proteins within membranes are often helical, with surfaces covered with hydrophobic residues, we speculate that the α-helical section may be involved in anchoring the β-defensin to a bacterial cell wall. Thus the sites within the α-helix under positive selection may be significant in the specificity of β-defensins, with respect to either their antimicrobial or chemoattractant properties. The longest loop region of these peptides has a predominance of positively selected sites, which suggests that this part of the structure has a key functional role in these small peptides. If, as shown for the α-defensin HNP3 (human neutrophil peptide-3), the second β-strand is involved in oligomerization [23], many of these sites would be left exposed after dimer formation, suggesting a rapidly diverging exposed ‘skirt’ around the peptide. This is supported by the NMR data of Schibli et al. [24], whose structures of the human β-defensin DEFB3 suggest symmetrical dimer formation through β-strand 2 of the β-sheet.

Importance of the propeptide region?

A clear, but unexpected, result of the present analyses was that the propeptide region has been subject to significant positive selection in rodents and negative selection in primates. This has not been observed previously, and these observations imply strongly that the propeptide region is more important to β-defensin function than has previously been appreciated. It is commonly assumed that the prepropeptide region is cleaved as the mature peptide is secreted from the cell.

In conclusion, examination of the evolution of the β-defensin gene family has revealed the complexity of the selection pressures acting on these genes. We have identified putatively important sites for our understanding of β-defensin function and for novel antibiotic design. Further investigations of cleavage and the functional consequences of sequence changes in this region are merited.

Antimicrobial Peptides: Mediators of Innate Immunity in the Development of Anti-Infective, Therapeutic and Vaccination Strategies: Focused Meeting held at New Royal Infirmary, Edinburgh, U.K., 21 November 2005. Organized and edited by J.-M. Sallenave and J. Govan (Edinburgh, U.K.).

Abbreviations

     
  • Ka: rate of non-synonymous nucleotide change
  • Ks: rate of synonymous nucleotide change
  • LRT: likelihood ratio test
  • NJ: neighbour-joining
  • SWAPSC: sliding window analysis procedure to detect selective constraints

Research was supported by the U.K. MRC (Medical Research Council). The School of Chemistry and the EPSRC (Engineering and Physical Sciences Research Council) are thanked for funding the studentship of H.E. In addition, P.E.B. holds an Advanced Research Fellowship from the EPSRC. K.T. is funded by a Cystic Fibrosis Research Trust U.K. studentship awarded to J.R.D.

References

1. Ganz T. Comb. Chem. High Throughput Screen. 2005, vol. 8 (pg. 209-217)
2. Wilson C.L., Ouellette A.J., Satchell D.P., Ayabe T., Lopez-Boado Y.S., Stratman J.L., Hultgren S.J., Matrisian L.M., Parks W.C. Science 1999, vol. 286 (pg. 113-117)
3. Salzman N.H., Ghosh D., Huttner K.M., Paterson Y., Bevins C.L. Nature (London) 2003, vol. 422 (pg. 522-526)
4. Moser C., Weiner D.J., Lysenko E., Bals R., Weiser J.N., Wilson J.M. Infect. Immun. 2002, vol. 70 (pg. 3068-3072)
5. Morrison G., Kilanowski F., Davidson D., Dorin J. Infect. Immun. 2002, vol. 70 (pg. 3053-3060)
6. Yang D., Chertov O., Bykovskaia S.N., Chen Q., Buffo M.J., Shogan J., Anderson M., Schroder J.M., Wang J.M., Howard O.M., Oppenheim J.J. Science 1999, vol. 286 (pg. 525-528)
7. Lehrer R.I. Nat. Rev. Microbiol. 2004, vol. 2 (pg. 727-738)
8. Hughes A.L. Immunol. Rev. 2002, vol. 190 (pg. 161-168)
9. Hughes A.L., Ota T., Nei M. Mol. Biol. Evol. 1990, vol. 7 (pg. 515-524)
10. Semple C.A., Maxwell A., Gautier P., Kilanowski F.M., Eastwood H., Barran P.E., Dorin J.R. BMC Evol. Biol. 2005, vol. 5 pg. 32
11. Morrison G.M., Semple C.A., Kilanowski F.M., Hill R.E., Dorin J.R. Mol. Biol. Evol. 2003, vol. 20 (pg. 460-470)
12. Semple C.A., Rolfe M., Dorin J.R. Genome Biol. 2003, vol. 4 pg. R31
13. Del Pero M., Boniotto M., Zuccon D., Cervella P., Spano A., Amoroso A., Crovella S. Immunogenetics 2002, vol. 53 (pg. 907-913)
14. Boniotto M., Antcheva N., Zelezetsky I., Tossi A., Palumbo V., Verga Falzacappa M.V., Sgubin S., Braida L., Amoroso A., Crovella S. Biochem. J. 2003, vol. 374 (pg. 707-714)
15. Boniotto M., Tossi A., Del Pero M., Sgubin S., Antcheva N., Santon D., Masters J., Crovella S. Genes Immun. 2003, vol. 4 (pg. 251-257)
16. Yang Z. Comput. Appl. Biosci. 1997, vol. 13 (pg. 555-556)
17. Suzuki Y., Nei M. Mol. Biol. Evol. 2001, vol. 18 (pg. 2179-2185)
18. Suzuki Y., Nei M. Mol. Biol. Evol. 2002, vol. 19 (pg. 1865-1869)
19. Fares M.A. Bioinformatics 2004, vol. 20 (pg. 2867-2868)
20. Patil A.A., Cai Y., Sang Y., Blecha F., Zhang G. Physiol. Genomics 2005, vol. 23 (pg. 5-17)
21. Suzuki Y. J. Mol. Evol. 2004, vol. 59 (pg. 11-19)
22. Bauer F., Schweimer K., Kluver E., Conejo-Garcia J.R., Forssmann W.G., Rosch P., Adermann K., Sticht H. Protein Sci. 2001, vol. 10 (pg. 2470-2479)
23. Hill C.P., Yee J., Selsted M.E., Eisenberg D. Science 1991, vol. 251 (pg. 1481-1485)
24. Schibli D.J., Hunter H.N., Aseyev V., Starner T.D., Wiencek J.M., McCray P.B. Jr, Tack B.F., Vogel H.J. J. Biol. Chem. 2002, vol. 277 (pg. 8279-8289)