To take full advantage of the mouse as a model organism, it is essential to distinguish lineage-specific biology from what is shared between human and mouse. Investigations into genetic elements common to both species have been well served by the draft human and mouse genome sequences. More recently, the virtually complete euchromatic sequences of the two reference genomes have been finished. These reveal a high (∼5%) level of sequence duplications that had previously been recalcitrant to sequencing and assembly. Within these duplications lie large numbers of rodent- or primate-specific genes. In the present paper, we review the sequence properties of the two genomes, dwelling most on the duplications, deletions and insertions that separate each of them from their most recent common ancestor, approx. 90 million years ago. We consider the differences in gene numbers and repertoires between the two species, and speculate on their contributions to lineage-specific biology. Losses of ancient single-copy genes are rare, as are gains of new functional genes through retrotransposition. Instead, most changes to the gene repertoire have occurred in large multicopy families. It has been proposed that numbers of such ‘environmental genes’ rise and fall, and their sequences change, as adaptive responses to infection and other environmental pressures, including conspecific competition. Nevertheless, many such genes may be under little or no selection.

Introduction

During the Cretaceous Period, approx. 90 million years ago, a line of small mammals, probably similar in appearance to modern tree shrews, split into two lineages. One of these led to the primates, including Homo sapiens, the other to rodents such as the laboratory mouse, Mus musculus. Despite the obvious morphological differences between humans and mice, close observation over 100 years of mouse genetics has demonstrated the many close anatomical and physiological affinities between humans and mice [1,2].

Yet how are these species similar or different in their genes and genomes? How much has changed in these genomes in the last 90 million years, and how much has resisted change to preserve essential ancestral functions? If Darwinian adaptation has been responsible for the divergence between rodents and primates, would this have left a corresponding signature in their genomes? Answering these and other questions required the sequencing of the human and mouse genomes, followed by detailed comparisons pinpointing those nucleotides, exons, genes or chromosomal segments that have remained intact in both lineages since their last common ancestor.

The draft genome sequences of humans in 2001 [3] and mice in 2002 [4] were great boons for research in genomics, genetics and evolution. Yet, these draft sequences were never intended to be comprehensive. It was always realized that sequencing of the extremely repetitive and transposon-rich heterochromatin would be too technically challenging. However, because heterochromatin is believed to have low gene density and is largely silenced, its absence has been met with little complaint.

On the other hand, it was of great concern that a large amount of euchromatic sequence (>10% of human and >6.5% of mouse) was also found to be absent from the initial drafts. The process by which this additional sequence has been painstakingly added to genome assemblies is known as ‘finishing’, although even ‘finished’ euchromatin is not perfect and will contain a low level of missing and inaccurate sequence for some time to come [5].

The ‘finished’ human genome sequence was reported in 2004 [5] and a similar sequence for mice will be reported elsewhere soon [6]. The purpose of the present review is to provide a brief overview of the similarities and differences between these two genome sequences in the light of new-found sequence previously absent from draft assemblies.

Mind the gaps

The human genome assembly extends to 3.09 Gb (Table 1), containing 99% of the euchromatin and 94% of the entire genome [5]. This is a slightly larger genome assembly than for the mouse, which comprises approx. 2.66 Gb. Similarities between chromosome numbers and genome sizes belie the substantial rearrangement of the ancestral genome, particularly in the rodent lineage, with over 300 large (>300 kb) blocks of genes being reordered, and many more smaller-scale rearrangements observed [4]. Across the whole of the human genome, only 40% of sequence can be aligned to the mouse genome. Much of the remainder represents remnants of transposable elements that have been frequently inserted and deleted in each lineage, many of which have decayed beyond recognition.

Table 1
Properties of finished human and mouse reference genome assemblies

NCBI Builds 36.1 and 36 respectively.

Property                                Human           Mouse
Assembled genome size (Gb)              3.091           2.661
Segmentally duplicated sequence (Mb)    159.2 (5.52%)   126.0 (4.94%)
Interspersed repeats (Gb)               1.406           1.091
Number of gaps                          118             1218
Sequence in gaps (Mb)                   9.343           6.088
Number of gene models                   19042           20210
Coding sequence (%)                     1.07            1.27

Finishing the genome assemblies revealed that the draft assemblies were particularly deficient in segmental duplications, defined as >1 kb fragments of genomic sequence with high sequence identity (>90%) that map to multiple locations [7]. The repetitive nature of this sequence explains its recalcitrance to assembly, especially via the whole genome shotgun approach. Segmental duplications are now known to cover approx. 5% of both human and mouse euchromatin [8]. These newly discovered duplicated sequences would perhaps be only of passing interest, except that they contain a high density of duplicated protein-coding genes (see below).
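
The working definition given above translates directly into a filter over pairwise self-alignments of a genome. The Python sketch below is illustrative only: the `Alignment` record, its field names and the example values are hypothetical, and real detection pipelines also mask common repeats and merge overlapping hits, which is omitted here.

```python
from dataclasses import dataclass

MIN_LENGTH_BP = 1_000   # ">1 kb" threshold from the definition in the text
MIN_IDENTITY = 0.90     # ">90%" sequence identity threshold from the text


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record for one pairwise self-alignment of a genome."""
    chrom_a: str
    start_a: int
    chrom_b: str
    start_b: int
    length_bp: int
    identity: float  # fraction of identical aligned bases, between 0 and 1


def is_segmental_duplication(aln: Alignment) -> bool:
    """Apply the length and identity thresholds to a single alignment."""
    # A sequence trivially aligns to itself; require a second, distinct location.
    maps_elsewhere = (aln.chrom_a, aln.start_a) != (aln.chrom_b, aln.start_b)
    return (
        maps_elsewhere
        and aln.length_bp > MIN_LENGTH_BP
        and aln.identity > MIN_IDENTITY
    )


# Invented example alignments, for illustration only.
candidates = [
    Alignment("chr1", 100_000, "chr1", 250_000, 12_000, 0.97),  # passes both thresholds
    Alignment("chr2", 5_000, "chr9", 80_000, 800, 0.99),        # too short
    Alignment("chr3", 40_000, "chr7", 10_000, 5_000, 0.85),     # too divergent
]
segdups = [aln for aln in candidates if is_segmental_duplication(aln)]
print(len(segdups))
```

Only the first invented alignment satisfies both thresholds; the other two fail on length and on identity respectively.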

These same regions also tend to be highly variable in structure, including copy number, among unrelated human or mouse individuals [9–11]. The mouse sequence is from a single highly inbred individual female from the laboratory black 6 (C57BL/6) strain. However, the human reference genome represents the agglomeration of contributions from an anonymous panel of outbred individuals [3], although more than half the assembly stems from a single contributor. Each individual carries ∼0.5% copy number variant sequence, often in a heterozygous state [12].

The human genome sequence is thus not a consensus representing the most frequent variants. Instead, because of selection for size in choosing insert clones for sequencing, parts of the human reference genome may contain a systematic bias towards the incorporation of variable regions, especially those with high copy numbers. In any case, even for the mouse where copy number variants are seen in abundance between different inbred mouse strains [8], structural variation between individuals highlights the limitations of a single reference genome in representing an entire population.

Gene repertoires

Interest in relating gene numbers to organismal complexity has waned considerably since the discovery that the genome of the simple nematode Caenorhabditis elegans contains almost as many protein-coding genes as humans [13–15]. Nevertheless, the reduction of the human gene count from an initial 32000 in the draft human genome publication [3] to its current level of approx. 19000 shows the many inherent difficulties in gene predictions as well as the great progress that has been made since.

Initial gene counts were greatly inflated by the inclusion of fragmentary gene models, many of which represent the debris of non-functional pseudogenes. Spurious ORFs (open reading frames) present by chance in RNA transcripts were also misidentified even in the absence of either protein-coding potential or evolutionary conservation [15]. Disruptions to putative ORFs such as small insertions, deletions and frameshifts were difficult to distinguish from sequence errors that are often found among even bona fide gene predictions in a draft genome sequence. Finally, it has never been straightforward to distinguish one gene from its genomic neighbour, particularly as (non-coding or chimaeric) transcripts can often be found to span the two [16,17].

The current low estimates of gene number in the human genome [13–15] result from the exclusion of erroneous gene models that fail criteria for protein-coding potential and conservation. These criteria rely upon three widely held observations: that intron positions and especially phase are very well conserved within coding sequence, that confirmed cases of mammalian protein-coding genes recruited de novo from non-coding sequence are very rare, and that retrotransposition generally results in non-functional sequence with immediate loss of constraint. This last assumption appears to be the most fallible, since promoter elements are sometimes fortuitously present at the 5′-end of the integration site of retrotransposons. However, empirically, because only about one functional retrogene arises per million years, this results in only a small underestimation of ∼100 genes since the human–mouse divergence [18,19].

The great majority of genes are thus gained in each lineage not by de novo creation or retrotransposition, but instead by whole gene duplication, often in tandem copies; simultaneously, functional genes are lost by deletion or after disruptive mutations (‘pseudogenization’). The fates of genes on each of the human and mouse lineages are thus best described by gene births and deaths in the time since their last common ancestor approx. 90 million years ago. Figure 1 shows the rate of accumulation of extant genes in both Mus musculus and Homo sapiens lineages. The large numbers of the most recently duplicated genes largely reflect copy number variable genes: many of these are unlikely to be fixed in future populations (see below).

Figure 1
Mouse and human gene duplication as a function of age

These are extant genes from the two reference genome assemblies. Their age of duplication has been estimated phylogenetically from the ratio of the duplicates' branch length to the distance to the rodent–primate speciation node, and by assuming that the split occurred 90 million years ago.
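
Read literally, the dating procedure in this legend is a proportional scaling of branch lengths. The following Python sketch shows that reading; the function name and the example branch lengths are hypothetical, and the constant substitution rate along the lineage that the scaling implies is an assumption of the sketch, not a statement of the exact method used for the figure.

```python
SPLIT_AGE_MY = 90.0  # assumed age of the rodent-primate split, as in the text


def duplication_age_my(duplicate_branch_length: float,
                       speciation_node_distance: float,
                       split_age_my: float = SPLIT_AGE_MY) -> float:
    """Scale a duplicate's branch length to time (million years).

    Assumes substitutions accumulate at a constant rate, so that the ratio of
    branch lengths equals the ratio of elapsed times.
    """
    if speciation_node_distance <= 0:
        raise ValueError("distance to the speciation node must be positive")
    return split_age_my * duplicate_branch_length / speciation_node_distance


# Hypothetical branch lengths (substitutions per site), for illustration only.
age = duplication_age_my(duplicate_branch_length=0.05,
                         speciation_node_distance=0.30)
print(round(age, 1))
```

A duplicate whose branch is one-sixth of the distance to the speciation node is thus dated to one-sixth of 90 million years.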

Correspondence between mouse and human genes

 To a Well-Connected Mouse

 (Upon reading of the genetic closeness of mice and men.)

Wee, sleekit, cow'rin, tim'rous beastie,

Braw science says that at the leastie

We share full ninety-nine per cent

O'genes, where'er the odd ane went.

 © 2003 John Updike. Originally published in The New Yorker. All rights reserved.

This reworking of the 18th Century Robert Burns poem (“To a Mouse”) was sparked by the Nature editorial commentary that 99% of mouse genes have “direct counterparts in humans” [20]. This statement unfortunately gives the misleading impression that human and mouse genes mostly correspond, and that divergence in their genomes is thus largely either within the sequence of each corresponding gene, or in regulatory regions.

In fact, the mouse and human gene repertoires have been dramatically remodelled since the divergence of the lineages ∼90 million years ago: 20% (3852 of 19042) of human genes and 25% (5020 of 20210) of mouse genes are duplicate copies which have arisen since their last common ancestor (Figure 2). As for “where’er the odd ane went”, the Nature commentary reflects that in fewer than 1% of genes has every homologue become extinct in the other lineage. These are genes that had long persisted before the common ancestor and yet have latterly become dispensable. Such losses may reflect profound changes to the functional repertoire. An example of the loss of an erstwhile essential gene is that of EYS, whose disruption in humans results in a form of adolescent-onset blindness (retinitis pigmentosa), but whose loss approx. 80 million years ago seems to have little affected the rodents [21]. A more recent example in evolution involves the human-specific loss of the myosin heavy chain MYH16, which has been argued in a ‘less is more’ hypothesis to confer a selective benefit in increasing the cranial capacity [22].

Figure 2
Mouse genes have a higher synonymous nucleotide substitution rate (dS) and have accumulated more lineage-specific duplicates than human genes

(A) Mouse and human phylogeny drawn to the dS scale. (B) The number of 1:1 mouse and human orthologues (black) and the number of gene duplicates unique to each species (grey).

Approx. 15187 genes have remained unduplicated in both human and mouse lineages since their last common ancestor (Table 2). These ‘simple orthologues’ have also substantially conserved their sequences: their median amino acid and nucleotide identities are 88 and 85% respectively. These are the genes expected to convey much of the functional repertoire that is conserved among mammals and, more broadly, among other animals. It is thus appropriate that these simple orthologues are being specifically targeted for disruption in large-scale phenotypic screens in the mouse to illuminate conserved mammalian biology.

Table 2
Properties of human and mouse simple 1:1 orthologues

Properties are median values. dN, non-synonymous substitution; dS, synonymous substitution.

Property                              Value
Counts of 1:1 orthologues             15187
dN                                    0.057
dS                                    0.58
dN/dS ratio                           0.095
Amino acid sequence identity (%)      88.2
Coding sequence identity (%)          85.3
Aligned sequence length (codons)      434

The estimated gene count is substantially higher in mice (20210) than in humans (19042) because of the larger number of rodent-specific gene duplicates. This is almost entirely due to genes with roles in chemosensation, such as olfactory and vomeronasal receptors and pheromone genes. The cull of primate-specific chemosensation genes relative to murid rodents and other mammals seems to have accelerated in the old-world primates in the last ∼25 million years [23,24]. The contrast in gene repertoires perhaps reflects the divergent sensory requirements for mammals that differ in diet and behaviour.
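
The arithmetic behind the first sentence of this paragraph can be checked from figures already quoted in this review: the gene-model counts in Table 1 and the counts of lineage-specific duplicates given earlier in this section. The short Python sketch below only restates those numbers; the variable names are illustrative.

```python
# Gene-model counts (Table 1) and lineage-specific duplicate counts (main text).
genes = {"human": 19042, "mouse": 20210}
lineage_specific_duplicates = {"human": 3852, "mouse": 5020}

# Excess of mouse over human, in total genes and in lineage-specific duplicates.
gene_count_gap = genes["mouse"] - genes["human"]
duplicate_gap = (lineage_specific_duplicates["mouse"]
                 - lineage_specific_duplicates["human"])

# Share of each repertoire made up of lineage-specific duplicates.
duplicate_share = {
    species: lineage_specific_duplicates[species] / genes[species]
    for species in genes
}

print(gene_count_gap, duplicate_gap)
for species, share in duplicate_share.items():
    print(f"{species}: {share:.1%} of genes are lineage-specific duplicates")
```

On these figures the excess of mouse over human gene models equals the excess of mouse over human lineage-specific duplicates, which is the sense in which the higher mouse gene count is attributable to rodent-specific duplication.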

Gene duplications in mice are predominantly found in tandem copies of genomic sequence [8], and most probably arise via unequal crossover or non-allelic homologous recombination. In contrast, human duplicates are more often dispersed to different chromosomes in recombinations that were mediated by the primate-specific Alu-SINE (short interspersed element) retrotransposons [25,26].

Large genomic duplications are less frequent than smaller ones. Thus compact genes are more likely to be duplicated with a complete and intact ORF, as well as the accompanying promoter and other necessary non-coding regulatory sequence. Median sizes of human- or mouse-lineage-specific genes are thus approx. 4- and 6-fold smaller than those of simple orthologues (6.6 or 3.3 kb compared with 26.6 or 21.2 kb respectively); they have fewer exons (medians of five and three compared with nine for both mice and humans); and they are 5- and 12-fold (in humans or mice) more likely to contain only a single coding exon.

Unequal crossing-over can also generate novel functional genes, either via the formation of chimaeric genes or as partial duplicates (e.g. [27]). For example, a duplication of a fragment of the rodent Dlg5 gene spawned an extended family of testis-specific genes that have become extremely widespread in the mouse and rat genomes [28]. Partial duplications have also occasionally been observed within duplicated genes. The primate-specific LPA gene, for example, was formed initially by a complete duplication of PLG with a subsequent succession of further internal duplications of individual domains [29].

Innovative functions of lineage-specific genes

Duplicated genes form a far from random selection of all genes. Aside from the bias in gene sizes mentioned above, several functional categories are overrepresented among genes specific to either primate or rodent lineages. These include (i) chemosensation genes, which are more numerous in rodents, and genes associated with (ii) immunity or host defence (e.g. T-cell receptor genes), (iii) detoxification (e.g. cytochrome P450) and (iv) reproduction (e.g. pheromone or cancer-testis antigen genes [30]).

Many of these genes are of considerable biological interest and are found in regions of the genome rich in segmental duplications and interspersed repeats, and are thus only present in finished genome assemblies. The extreme repetitive nature of the Y chromosome has required particularly focused efforts to complete its sequence. Indeed, further copies of rapidly evolving gene families can be expected to be discovered as the remaining gaps in the human and mouse genomes are closed.

These functional biases might imply that gene duplication events are largely adaptive: expansions in the gene repertoire may be a response to environmental challenges from pathogens and parasites, or confer reproductive advantages over other individuals from the same species. Evidence of positive selection at specific codons [dN/dS ratio (ratio of non-synonymous to synonymous substitutions) significantly greater than 1] has often been cited for duplicated gene families (e.g. [31,32]).
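
As a rough illustration of how a dN/dS ratio is read, the Python sketch below labels a ratio relative to 1. It is a deliberate simplification: the function name, the labels and the `tolerance` parameter are hypothetical, the ratio here is a single gene-wide point estimate rather than a codon-specific one, and real analyses test whether the ratio is significantly greater than 1 using codon-substitution models.

```python
def selection_regime(dn: float, ds: float, tolerance: float = 0.1) -> str:
    """Label a dN/dS point estimate; no significance test is performed."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    omega = dn / ds
    if omega > 1 + tolerance:
        return "candidate positive selection"
    if omega < 1 - tolerance:
        return "purifying selection"
    return "consistent with neutrality"


# Median dN and dS of simple 1:1 orthologues (Table 2).
print(selection_regime(0.057, 0.58))
# Hypothetical values for a rapidly evolving duplicate, for illustration only.
print(selection_regime(0.30, 0.20))
```

A ratio well below 1, as for the simple orthologues, indicates that most amino acid changes have been removed by selection; a point estimate above 1 is only a starting point for the formal tests used in the studies cited.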

However, with the small mammalian (especially primate) effective population sizes, chance gene duplications may remain in the population for some time or proceed to fixation even when they are not advantageous, or are mildly deleterious. This would explain the recent provenance and hence short lifespans of gene duplicates (Figure 1), some of which are unfixed and copy-number-variable within human or mouse populations. The observed overrepresentation of functional categories may simply be because duplications of genes in other classes tend to be deleterious (in part due to stoichiometric constraints) and thus are preferentially purged from the population [33,34].

Even where evidence for positive selection is cited, this may instead be the result of loss of constraint in redundant gene copies after duplication. Other cases of apparent adaptive change can be due to biased gene conversion [35]. This is a bias in the recombination-associated repair process following double-strand breaks and can lead to the erroneous inference of positive selection [36,37]. It is suggested that biased gene conversion among adjacent tandemly duplicated genes may drive their accelerated evolution [38].

On the other hand, the short lifespan of many gene duplicates may point to the short period of advantage that novelty confers in a constantly changing adaptive landscape. A high turnover in the gene repertoire with many births and deaths would be seen in an evolutionary ‘arms race’ with pathogens [39]. There can also be positive selection among genes expressed in germline or stem cells [40–42] which confers no advantage on either the individual or population. Instead, gain-of-function mutations that, for example, favour clonal expansion of mutant spermatogonia [43], may underlie much strong diversifying selection.

Almost 40 years after Ohno [44] first proposed that gene duplications are the principal forces for the generation of novelty in evolution, the relative contributions of adaptation and genetic drift in maintaining our gene repertoires thus remain to be determined. The availability of further primate and rodent assembled genomes of high quality on one hand, and extensive allele frequency data for copy number variants in natural primate and rodent populations on the other, may be required to decide between these two evolutionary scenarios [45].

Concluding remarks

Strenuous efforts have now provided near-complete euchromatic genome sequence assemblies of humans and mice. These have revealed large numbers of lineage-specific genes lying within segmentally duplicated genomic regions that were previously recalcitrant to sequencing and assembly. Many of these novel genes are studied only rarely, partly because of their relatively late arrival into genome sequence assemblies, and partly due precisely to the absence of single orthologous genes in the other species: inferences from model organisms about lineage-specific human genes have to be treated with the greatest caution, while a case has to be made for the relevance of each mouse-specific gene to human physiology and disease. Nevertheless, if we are to fully appreciate the functional repertoire of our own species, and that of the mouse, our most important model organism, then these genes demand much further scrutiny in the future.

Protein Evolution: Sequences, Structures and Systems: Biochemical Society Focused Meeting to commemorate the 200th Anniversary of Charles Darwin's birth held at the Wellcome Trust Conference Centre, Cambridge, U.K., 26–27 January 2009. Organized and Edited by Roman Laskowski (EMBL-EBI, Hinxton, U.K.), Michael Sternberg (Imperial College London, U.K.) and Janet Thornton (EMBL-EBI, Hinxton, U.K.).

Abbreviations

ORF, open reading frame

Funding

We gratefully acknowledge support from the Medical Research Council.

References

1. Paigen K. One hundred years of mouse genetics: an intellectual history. I. The classical period (1902–1980). Genetics 2003, vol. 163 (pg. 1-7)
2. Paigen K. One hundred years of mouse genetics: an intellectual history. II. The molecular revolution (1981–2002). Genetics 2003, vol. 163 (pg. 1227-1235)
3. Lander E.S., Linton L.M., Birren B., Nusbaum C., Zody M.C., Baldwin J., Devon K., Dewar K., Doyle M., FitzHugh W., et al. Initial sequencing and analysis of the human genome. Nature 2001, vol. 409 (pg. 860-921)
4. Waterston R.H., Lindblad-Toh K., Birney E., Rogers J., Abril J.F., Agarwal P., Agarwala R., Ainscough R., Alexandersson M., An P., et al. Initial sequencing and comparative analysis of the mouse genome. Nature 2002, vol. 420 (pg. 520-562)
5. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 2004, vol. 431 (pg. 931-945)
6. Church D.M., Goodstadt L., Hillier L.W., Zody M.C., Goldstein S., She X., Bult C.J., Agarwala R., Cherry J.L., DiCuccio M., et al. Lineage-specific biology revealed by a finished genome assembly of the mouse. PLoS Biol. 2009, vol. 7 pg. e1000112
7. Eichler E.E., Clark R.A., She X. An assessment of the sequence gaps: unfinished business in a finished human genome. Nat. Rev. Genet. 2004, vol. 5 (pg. 345-354)
8. She X., Cheng Z., Zollner S., Church D.M., Eichler E.E. Mouse segmental duplication and copy number variation. Nat. Genet. 2008, vol. 40 (pg. 909-914)
9. Sebat J., Lakshmi B., Troge J., Alexander J., Young J., Lundin P., Maner S., Massa H., Walker M., Chi M., et al. Large-scale copy number polymorphism in the human genome. Science 2004, vol. 305 (pg. 525-528)
10. Iafrate A.J., Feuk L., Rivera M.N., Listewnik M.L., Donahoe P.K., Qi Y., Scherer S.W., Lee C. Detection of large-scale variation in the human genome. Nat. Genet. 2004, vol. 36 (pg. 949-951)
11. Tuzun E., Sharp A.J., Bailey J.A., Kaul R., Morrison V.A., Pertz L.M., Haugen E., Hayden H., Albertson D., Pinkel D., et al. Fine-scale structural variation of the human genome. Nat. Genet. 2005, vol. 37 (pg. 727-732)
12. McCarroll S.A., Kuruvilla F.G., Korn J.M., Cawley S., Nemesh J., Wysoker A., Shapero M.H., de Bakker P.I., Maller J.B., Kirby A., et al. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat. Genet. 2008, vol. 40 (pg. 1166-1174)
13. Goodstadt L., Ponting C.P. Phylogenetic reconstruction of orthology, paralogy, and conserved synteny for dog and human. PLoS Comput. Biol. 2006, vol. 2 pg. e133
14. Goodstadt L., Heger A., Webber C., Ponting C.P. An analysis of the gene complement of a marsupial, Monodelphis domestica: evolution of lineage-specific genes and giant chromosomes. Genome Res. 2007, vol. 17 (pg. 969-981)
15. Clamp M., Fry B., Kamal M., Xie X., Cuff J., Lin M.F., Kellis M., Lindblad-Toh K., Lander E.S. Distinguishing protein-coding and noncoding genes in the human genome. Proc. Natl. Acad. Sci. U.S.A. 2007, vol. 104 (pg. 19428-19433)
16. Gerstein M.B., Bruce C., Rozowsky J.S., Zheng D., Du J., Korbel J.O., Emanuelsson O., Zhang Z.D., Weissman S., Snyder M. What is a gene, post-ENCODE? History and updated definition. Genome Res. 2007, vol. 17 (pg. 669-681)
17. Denoeud F., Kapranov P., Ucla C., Frankish A., Castelo R., Drenkow J., Lagarde J., Alioto T., Manzano C., Chrast J., et al. Prominent use of distal 5′ transcription start sites and discovery of a large number of additional exons in ENCODE regions. Genome Res. 2007, vol. 17 (pg. 746-759)
18. Marques A.C., Dupanloup I., Vinckenbosch N., Reymond A., Kaessmann H. Emergence of young human genes after a burst of retroposition in primates. PLoS Biol. 2005, vol. 3 pg. e357
19. Kaessmann H., Vinckenbosch N., Long M. RNA-based gene duplication: mechanistic and evolutionary insights. Nat. Rev. Genet. 2009, vol. 10 (pg. 19-31)
20. Gunter C., Dhand R. Human biology by proxy. Nature 2002, vol. 420 pg. 509
21. Abd El-Aziz M.M., Barragan I., O'Driscoll C.A., Goodstadt L., Prigmore E., Borrego S., Mena M., Pieras J.I., El-Ashry M.F., Safieh L.A., et al. EYS, encoding an ortholog of Drosophila spacemaker, is mutated in autosomal recessive retinitis pigmentosa. Nat. Genet. 2008, vol. 40 (pg. 1285-1287)
22. Stedman H.H., Kozyak B.W., Nelson A., Thesier D.M., Su L.T., Low D.W., Bridges C.R., Shrager J.B., Minugh-Purvis N., Mitchell M.A. Myosin gene mutation correlates with anatomical changes in the human lineage. Nature 2004, vol. 428 (pg. 415-418)
23. Shi P., Zhang J. Comparative genomic analysis identifies an evolutionary shift of vomeronasal receptor gene repertoires in the vertebrate transition from water to land. Genome Res. 2007, vol. 17 (pg. 166-174)
24. Liman E.R. Use it or lose it: molecular evolution of sensory signaling in primates. Pflugers Arch. 2006, vol. 453 (pg. 125-131)
25. Bailey J.A., Liu G., Eichler E.E. An Alu transposition model for the origin and expansion of human segmental duplications. Am. J. Hum. Genet. 2003, vol. 73 (pg. 823-834)
26. Kim P.M., Lam H.Y., Urban A.E., Korbel J.O., Affourtit J., Grubert F., Chen X., Weissman S., Snyder M., Gerstein M.B. Analysis of copy number variants and segmental duplications in the human genome: evidence for a change in the process of formation in recent evolutionary history. Genome Res. 2008, vol. 18 (pg. 1865-1874)
27. Wang X., Zhang J. Remarkable expansions of an X-linked reproductive homeobox gene cluster in rodent evolution. Genomics 2006, vol. 88 (pg. 34-43)
28. Spiess A.N., Walther N., Muller N., Balvers M., Hansis C., Ivell R. SPEER: a new family of testis-specific genes from the mouse. Biol. Reprod. 2003, vol. 68 (pg. 2044-2054)
29. Lawn R.M., Schwartz K., Patthy L. Convergent evolution of apolipoprotein(a) in primates and hedgehog. Proc. Natl. Acad. Sci. U.S.A. 1997, vol. 94 (pg. 11992-11997)
30. Emes R.D., Goodstadt L., Winter E.E., Ponting C.P. Comparison of the genomes of human and mouse lays the foundation of genome zoology. Hum. Mol. Genet. 2003, vol. 12 (pg. 701-709)
31. Karn R.C., Clark N.L., Nguyen E.D., Swanson W.J. Adaptive evolution in rodent seminal vesicle secretion proteins. Mol. Biol. Evol. 2008, vol. 25 (pg. 2301-2310)
32. Jackson M., Watt A.J., Gautier P., Gilchrist D., Driehaus J., Graham G.J., Keebler J., Prugnolle F., Awadalla P., Forrester L.M. A murine specific expansion of the Rhox cluster involved in embryonic stem cell biology is under natural selection. BMC Genomics 2006, vol. 7 pg. 212
33. Dopman E.B., Hartl D.L. A portrait of copy-number polymorphism in Drosophila melanogaster. Proc. Natl. Acad. Sci. U.S.A. 2007, vol. 104 (pg. 19920-19925)
34. Nguyen D.Q., Webber C., Ponting C.P. Bias of selection on human copy-number variants. PLoS Genet. 2006, vol. 2 pg. e20
35. Marais G. Biased gene conversion: implications for genome and sex evolution. Trends Genet. 2003, vol. 19 (pg. 330-338)
36. Berglund J., Pollard K.S., Webster M.T. Hotspots of biased nucleotide substitutions in human genes. PLoS Biol. 2009, vol. 7 pg. e26
37. Galtier N., Duret L., Glemin S., Ranwez V. GC-biased gene conversion promotes the fixation of deleterious amino acid changes in primates. Trends Genet. 2009, vol. 25 (pg. 1-5)
38. Hurst L.D. Evolutionary genomics: a positive becomes a negative. Nature 2009, vol. 457 (pg. 543-544)
39. Dawkins R., Krebs J.R. Arms races between and within species. Proc. R. Soc. London Ser. B 1979, vol. 205 (pg. 489-511)
40. Laukaitis C.M., Heger A., Blakley T.D., Munclinger P., Ponting C.P., Karn R.C. Rapid bursts of androgen-binding protein (Abp) gene duplication occurred independently in diverse mammals. BMC Evol. Biol. 2008, vol. 8 pg. 46
41. Birtle Z., Goodstadt L., Ponting C. Duplication and positive selection among hominin-specific PRAME genes. BMC Genomics 2005, vol. 6 pg. 120
42. Johnson M.E., Viggiano L., Bailey J.A., Abdul-Rauf M., Goodwin G., Rocchi M., Eichler E.E. Positive selection of a gene family during the emergence of humans and African apes. Nature 2001, vol. 413 (pg. 514-519)
43. Goriely A., McVean G.A., Rojmyr M., Ingemarsson B., Wilkie A.O. Evidence for selective advantage of pathogenic FGFR2 mutations in the male germ line. Science 2003, vol. 301 (pg. 643-646)
44. Ohno S. Evolution by gene duplication. 1970, Heidelberg, Springer-Verlag
45. Perry G.H., Yang F., Marques-Bonet T., Murphy C., Fitzgerald T., Lee A.S., Hyland C., Stone A.C., Hurles M.E., Tyler-Smith C., et al. Copy number variation and evolution in humans and chimpanzees. Genome Res. 2008, vol. 18 (pg. 1698-1710)