Complement control protein modules (CCPs) occur in numerous functionally diverse extracellular proteins. Also known as short consensus repeats (SCRs) or sushi domains, each CCP contains approximately 60 amino acid residues, including four consensus cysteines participating in two disulfide bonds. Varying in length and sequence, CCPs adopt a β-sandwich type fold and have an overall prolate spheroidal shape with N- and C-termini lying close to opposite poles of the long axis. CCP-containing proteins are important as cytokine receptors and in neurotransmission, cell adhesion, blood clotting, extracellular matrix formation, haemoglobin metabolism and development, but CCPs are particularly well represented in the vertebrate complement system. For example, factor H (FH), a key soluble regulator of the alternative pathway of complement activation, is made up entirely of a chain of 20 CCPs joined by short linkers. Collectively, therefore, the 20 CCPs of FH must mediate all its functional capabilities. This is achieved via collaboration and division of labour among these modules. Structural studies have illuminated the dynamic architectures that allow FH and other CCP-rich proteins to perform their biological functions. These are largely the products of a highly varied set of intramolecular interactions between CCPs. The CCP can act as building block, spacer, highly versatile recognition site or dimerization mediator. Tandem CCPs may form composite binding sites or contribute to flexible, rigid or conformationally ‘switchable’ segments of the parent proteins.
The sequences of numerous extracellular proteins contain repeats that fold into structures likened to beads on a string. Each repeat corresponds to a domain or ‘module’. Multiple modules are often arranged in tandem, and inter-module interactions are dominated by contacts between pairs of modules that are neighbours within the sequence. Thus diverse protein architectures, and an even wider diversity of functions, arose during evolution from combinatorial use of a relatively small set of module-types. Small and compact, most modules proved straightforward targets for structure determination. More understanding is needed of how the sum of the interactions between component modules determines the shape and flexibility of the parent protein. Armed with this knowledge, we could engineer novel proteins, designed to possess a range of properties, from simple building blocks.
The complement control protein module (CCP) is an intriguing example of such a building block. An unusual property is that long, uninterrupted strings of tandem CCPs occur in some proteins (Figure 1). How multiple CCPs are organized within these functionally sophisticated proteins is critical to their function. For illustration, we focus below on factor H (FH), a key regulator of the complement system, made up entirely of 20 CCPs.
Figure 1. The RCAs are architecturally diverse CCP-rich proteins
The complement control protein module
The existence of CCPs, alternatively named ‘sushi domains’, was inferred from the multiple imperfectly repeating segments, of ∼60 residues, observed in sequences of the regulators of complement activation (RCA) protein family (Figure 1) [7,8] that includes FH. Initially termed short consensus repeats (SCRs), these have a cysteine near either end (CysI and CysIV) and two additional near-invariant cysteines (CysII and CysIII). Disulfide bonds link CysI to CysIII and CysII to CysIV. A tryptophan occurs in the sequence between CysIII and CysIV. The consensus sequence also includes several glycines, prolines and hydrophobic residues. Other proteins of the complement system contain CCPs, as do cytokine receptors and proteins involved in development, neurotransmission, cell adhesion, blood clotting, the extracellular matrix and haemoglobin metabolism.
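The cysteine numbering and disulfide pairing described above can be made concrete with a short Python sketch. It is illustrative only: the function name `consensus_disulfides` and the toy sequence are hypothetical, and the sketch assumes the input is a single, already-delimited module containing exactly four cysteines, which real sequences need not satisfy.

```python
from typing import List, Tuple


def consensus_disulfides(module_sequence: str) -> List[Tuple[int, int]]:
    """Return 0-based index pairs for the CysI-CysIII and CysII-CysIV bonds.

    Assumption: the input is one module with exactly four cysteines,
    numbered CysI..CysIV in sequence order. Anything else is rejected
    rather than guessed at.
    """
    cys_positions = [
        index
        for index, residue in enumerate(module_sequence.upper())
        if residue == "C"
    ]
    if len(cys_positions) != 4:
        raise ValueError(f"expected 4 cysteines, found {len(cys_positions)}")
    cys1, cys2, cys3, cys4 = cys_positions
    # Consensus pairing stated in the text: I-III and II-IV.
    return [(cys1, cys3), (cys2, cys4)]


# Toy (non-biological) sequence used only to show the indexing.
TOY_MODULE = "ACAAACAAACAWAC"
TOY_BONDS = consensus_disulfides(TOY_MODULE)
```

For the toy sequence the cysteines sit at indices 1, 5, 9 and 13, so the two bonds pair index 1 with 9 and index 5 with 13. Raising an error on any other cysteine count is a deliberate choice to avoid silently mis-assigning bonds in atypical modules.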
Early circular dichroism (CD) and Fourier transform-infrared spectroscopy predicted β-structures for the 20 CCPs of FH. Subsequently, NMR was used to determine the structure of a recombinant version of the sixteenth CCP of human FH (FH 16) [11,12]. Many 3D structures of CCPs have been determined since then and used to model hundreds more. CCPs (see example of FH 9 in Figure 1) have a β-sandwich type of structure, smaller and less regular than examples of this fold found in immunoglobulins or fibronectin. CCPs approximate to prolate ellipsoids, with N- and C-termini lying at opposing poles of the longest axis. A small hydrophobic core, containing the side chains of the tryptophan and other conserved residues, is bounded by the disulfide linkages.
The numbers of β-strands and their positions within the sequence vary, as does the presence of loops, bulges and insertions. Up to eight strands may occur: β-strand 1 includes the N-terminus and CysI; β-strand 2 follows the consensus glycine at 8–10 positions beyond CysI; β-strands 3, 4 and 5 occur within a ‘hXhGXXhXhXCIIXXG↑hXhXG’ motif (h is a hydrophobic residue, X is any residue, CII denotes CysII and ↑ is a possible insertion); β-strand 6 precedes (and may include) CysIII; β-strand 7 includes the consensus tryptophan; and β-strand 8 includes CysIV and the residues on either side.
How each CCP is attached to its neighbours is critical to function. Many CCP-containing proteins have multiple binding sites, and binding sites often span neighbouring CCPs [15–17]. Cross-talk between simultaneously occupied binding sites may underpin biological function. Unsurprisingly, many examples of multiple-CCP proteins and protein fragments have been studied using structural (see below and Figure 2 for examples) and biophysical techniques [19–21].
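As an illustration of how the shorthand motif for β-strands 3–5 can be read, the following Python sketch translates it into a regular expression. Several choices here are assumptions rather than anything stated in the text: the set of residues counted as hydrophobic (`HYDROPHOBIC`), the upper bound on the insertion (`MAX_INSERT`) and the helper names are all hypothetical, and CysII is written as a plain `C`. A pattern this loose is not a substitute for a curated domain profile.

```python
import re
from typing import Optional, Tuple

# Assumption: the text does not define which residues count as "h";
# this set is an illustrative choice.
HYDROPHOBIC = "AVILMFWY"
# Assumption: hypothetical upper bound on the insertion marked by the arrow.
MAX_INSERT = 10

# Motif from the text, with CysII written as a plain C.
CCP_CORE_MOTIF = "hXhGXXhXhXCXXG↑hXhXG"


def motif_to_regex(motif: str) -> str:
    """Translate the shorthand motif into a regular-expression string.

    h -> one residue from HYDROPHOBIC, X -> any one residue,
    arrow -> an optional insertion of 0..MAX_INSERT residues,
    any other symbol -> that literal residue.
    """
    parts = []
    for symbol in motif:
        if symbol == "h":
            parts.append(f"[{HYDROPHOBIC}]")
        elif symbol == "X":
            parts.append(".")
        elif symbol == "↑":
            parts.append(f".{{0,{MAX_INSERT}}}")
        else:
            parts.append(re.escape(symbol))
    return "".join(parts)


CCP_CORE_REGEX = re.compile(motif_to_regex(CCP_CORE_MOTIF))


def find_core_motif(sequence: str) -> Optional[Tuple[int, int]]:
    """Return the (start, end) span of the first match, or None."""
    match = CCP_CORE_REGEX.search(sequence.upper())
    return match.span() if match else None
```

Calling `find_core_motif` on a candidate sequence returns the span of the first stretch compatible with the motif under these assumptions. Because `X` and the insertion accept any residue, false positives are to be expected.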
Figure 2. A variety of CCP–CCP junctions occur in the RCAs
Each intermodular junction is composed of the linker (3–8 residues long) along with proximal loops and strands belonging to the two CCPs. Covalent bonds and non-covalent interactions within the linker, interactions between the proximal loops or strands of neighbouring modules and interactions between loops or strands and the linker, all contribute. As exemplified below and in Figure 2, junctions vary widely in the surface area buried between modules, their intermodular geometry and their flexibility.
Functional aspects, exemplified by CCPs in the RCAs
As mentioned above, CCPs are abundant in the RCA protein family, which protects host tissue from damage by the complement system [22,23]. The RCAs have been extensively characterized, illuminating structure–function relationships for CCP-containing proteins. Perhaps the most studied of all CCP-containing proteins is the key soluble RCA, FH.
By way of context, ‘complement’ is a set of blood proteins that ensure rapid clearance or destruction of pathogens and other hazards. Crucially, to avoid damage, complement must be selectively and vigorously suppressed on healthy self-cells and tissues. Suppression is largely attributable to the RCAs. Whether attached to membranes (Figure 1) or circulating in plasma (like FH), RCAs operate by preventing amplification of C3b. C3b is the key activation-specific proteolytic fragment of the abundant inactive precursor C3. Some C3b is generated spontaneously and continuously (via the ‘alternative pathway’) whereas larger amounts can be produced in response to immune complexes (‘classical’ pathway) or bacterial polysaccharides (‘lectin’ pathway). Irrespective of provenance, nascent C3b undergoes a domain rearrangement and binds covalently to nearby surfaces. It can then experience rapid amplification. This requires a positive-feedback loop in which C3b binds factor B (B), whereupon B is cleaved forming a C3b–Bb complex; the short-lived C3b–Bb converts additional C3 to C3b. Any blood-exposed surface can potentially become coated in C3b. Consequences include inflammation, opsonophagocytosis and cytolytic destruction. Crucially, where sufficient RCAs are already resident on the surface or (like FH) are recruited from circulation, they interact with C3b and suppress its amplification. Figure 1 illustrates the diverse sizes, shapes and architectures of RCAs created from strings of multiple CCPs.
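The qualitative behaviour of this positive-feedback loop, and of its suppression by regulators, can be caricatured with a toy discrete-time model in Python. It is emphatically not a kinetic model of complement: the function name and all parameters (`seed_rate`, `gain`, `regulator_strength`) are hypothetical, dimensionless quantities chosen for illustration, and the sketch ignores C3 depletion, the separate binding and cleavage steps involving B, and the intrinsic decay of C3b–Bb.

```python
from typing import List


def simulate_c3b(
    steps: int,
    regulator_strength: float,
    gain: float = 0.5,
    seed_rate: float = 1.0,
) -> List[float]:
    """Toy model of surface C3b over discrete time steps.

    Per step (all quantities are hypothetical and dimensionless):
      + seed_rate                  spontaneous C3b deposition
      + gain * c3b                 amplification by existing C3b
      - regulator_strength * c3b   inactivation promoted by regulators
    """
    c3b = 0.0
    history = []
    for _ in range(steps):
        c3b = c3b + seed_rate + gain * c3b - regulator_strength * c3b
        c3b = max(c3b, 0.0)  # an amount cannot become negative
        history.append(c3b)
    return history


# Surface with no regulators: the amount grows without bound.
unregulated = simulate_c3b(steps=20, regulator_strength=0.0)
# Surface where regulation outweighs amplification: a low plateau is reached.
regulated = simulate_c3b(steps=50, regulator_strength=0.9)
```

In this caricature, whenever `regulator_strength` exceeds `gain` the amount of C3b settles at `seed_rate / (regulator_strength - gain)`, whereas with weaker regulation it grows geometrically. This mirrors, in a very simplified way, the contrast between protected and unprotected surfaces described above.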
Factor H, 20 CCPs working in concert
The soluble 155-kDa RCA FH is uniquely simple (Figure 1), consisting entirely of CCPs. It is, however, functionally sophisticated [30–33]. Sophistication arises from its numerous CCPs of diverse sequence. Of particular interest, since it forms the basis for self versus non-self discrimination by complement, is that FH, a soluble protein, binds C3b with a context-dependent affinity. It binds particularly well to C3b on self-surfaces, thereby competing with binding of B and suppressing the initial step of C3b–Bb formation. C3b-bound FH then recruits factor I (FI) and assists FI to cleave C3b to an inactive form. Additionally, FH accelerates irreversible decay of C3b–Bb. Thus FH very effectively inhibits C3b amplification on self-surfaces. Crucially, on a foreign surface FH is normally unable to suppress C3b amplification; it has less affinity for C3b in this context and less ability to accelerate C3b–Bb decay.
Pathogens display FH-capturing molecules in an attempt to evade complement [37,38]. FH has probably evolved strategies to resist such bacterial hijack. Thus FH may circulate in a semi-quiescent conformation that binds relatively weakly to C3b (albeit sufficiently to suppress complement in fluid phase). Anchoring of FH to its surface by a bacterium may therefore capture a conformation that affords little protection. It has been suggested that FH adjusts its conformation only in response to specific molecular patterns that are unique to self-surfaces. In this way, FH reserves its full C3b-binding affinity for when and where it is needed and denies it to most bacteria. In a further round of the evolutionary arms race, however, some bacterial proteins can overcome this resistance.
Collectively, the 20 CCPs of FH must mediate all these functional capabilities. Two FH regions, CCPs 1–4 and 19–20, bind to non-overlapping sites on C3b (and in the case of CCPs 19–20, to C3b degradation products), as confirmed by crystal structures [40,41] (Figure 1). The C-terminal CCP also contains a site for binding glycosaminoglycans and sialic acids, and CCPs 6–8 contain an additional self-recognition site that recognizes glycosaminoglycans [43,44]. The C-terminal C3b-binding site may not be available in circulating quiescent FH [41–46], which can therefore bind C3b only via CCPs 1–4.
Simultaneous occupation of both ‘pattern-reading’ sites (CCPs 6–8 and CCP 20, which are also the sites exploited for binding by many bacterial proteins) could unmask the C-terminal C3b-binding site, facilitating the bivalent complex with C3b needed to ensure suppression of its amplification. This ability to transition between quiescent and active forms of FH has been attributed to its central CCPs, which feature longer linkers and shorter modules than average for FH. CCP 13 is the shortest CCP in the RCAs and is structurally distinct from other CCPs.
The architecture of complement factor H
EM and scattering studies indicated that FH is extended and flexible [48,49], consistent with CCPs being connected head-to-tail with at least some inter-modular conformational mobility. This is in line with early NMR-based structural studies of the CCP pair FH 15–16. An extended arrangement of the two CCPs, sharing a small interface, was inferred from the dearth of nuclear Overhauser effects (NOEs) between residues in different modules. A lack of chemical shift differences between either module on its own and the same module when part of the pair also suggested flexibility.
NMR-based studies of two overlapping recombinant module pairs, FH 1–2 and FH 2–3, revealed more inter-modular and module-linker NOEs, implying greater rigidity than in FH 15–16. A ‘merged’ model of FH 1–3 suggested a rod-like structure, 105 Å (1 Å=0.1 nm) long with limited mobility. The inter-modular geometry of FH 1–3 resembles that of the first three CCPs of C3b-binding sites in complement receptor type 1 (CR1 1–3, 8–10 and 15–17) and decay-accelerating factor (DAF 2–4; Figure 1). Moreover, it is broadly conserved in a crystal structure of FH 1–4 bound to C3b. This structure revealed a tilt between CCPs 3 and 4 that is also observed between the CCPs 3 and 4 of membrane cofactor protein (MCP; Figure 2) and of a four-CCP RCA, Vaccinia complement protein, from vaccinia virus.
A crystal structure of FH 6–8 (in complex with an analogue of the glycosaminoglycans) was solved, as were structures of FH 7 alone and FH 6–7 in complex with a FH-binding protein from Neisseria meningitidis. These structures indicate an extended arrangement and limited flexibility between these three CCPs.
Taken together, a crystal structure and SAXS analysis of FH 18–20 and both crystal and solution structures of FH 19–20 suggest that CCP 18 is flexibly connected to a less flexible, extended segment consisting of the C-terminal two modules, which have also been solved in complex with C3d and a sialic acid.
Although binding sites in CCPs 1–4, 6–8 and 19–20 were all rod-like, the atypical stretch of CCPs in the central region of FH could cause FH to bend back on itself and provide the combination of stiffness and motion required for a conformational switch that modulates the accessibility of the N- and C-terminal C3b-binding sites. Larger inter-modular angles occur here than elsewhere, according to NMR-derived structures of recombinant module-pairs, CCP 10–11, CCP 11–12 and CCP 12–13 (Figure 2). High-resolution structures of module-pairs were combined with SAXS studies of longer fragments, leading to a model in which CCPs 8 and 9 are flexibly attached to a compact segment consisting of CCPs 10–15 (Figure 1).
Strand-swaps, intrinsic disorder and dimerization
Although each SCR in a protein sequence typically corresponds to a distinct CCP and each CCP interacts predominantly with CCPs that are immediate neighbours within the same polypeptide chain, there are notable exceptions. For example, the N-terminal CCP of the gamma-aminobutyric acid-B receptor does not appear to fold compactly (when expressed as a pair with the second CCP). In the interleukin-2 (IL2) receptor α-chain a 42-residue linker connects two ‘predicted CCPs’, inferred from the presence of SCRs. The crystal structure was unexpected (Figure 3). The two predicted modules have swapped their β-strands 1 and 2 or, looked at in another way, one CCP (and the 42-residue linker) forms an insertion between strands 3 and 4 of a ‘parent’ CCP. The inserted CCP (to which IL2 binds) and parent CCPs are arranged ‘tail-to-tail’ and are kinked, with numerous interactions between them yielding rigidity.
Figure 3. Strand swaps and dimerization
The haemoglobin-scavenging protein, haptoglobin (Hg), has one weakly predicted CCP that lacks CysIII. In a structure of an Hg dimer (Figure 3), solved in complex with two haemoglobin molecules, CCPs from opposing subunits of Hg are arranged head-to-tail and side-by-side with swapped strands. Whereas CysII and CysIV form a canonical disulfide within each chain, the two CysI residues form an inter-chain disulfide. The segment corresponding (in other CCPs) to β-strands 1–3 is swapped, whereas β-strands 3 and 4 form a continuous strand. The resulting very rigid dimer, with its haemoglobin cargo, can cross-link two macrophage scavenging receptors, probably facilitating endocytosis. Another tight dimer interface is formed by FHR (FH-related protein) 1 CCPs 1–2, although no strands are swapped (Figure 3). Two CCPs from opposing subunits sit side-by-side and antiparallel. Hydrophobic residues from the end of β-strand 2, from β-strand 4 and from an insertion between β-strands 5 and 6 of CCP 1 interact with a set of hydrophobic residues in and around β-strand 1 of CCP 2. The C-terminal CCPs of each subunit are near-identical with FH CCPs 19 and 20; dimerization of FHR 1 creates an effective competitor of FH binding to surfaces that could modulate FH's regulatory activity.
Returning finally to FH, the above-mentioned examples lend credence to circumstantial evidence suggesting that FH CCP 14 may also be atypical. Unlike other CCP pairs (and unlike CCP 10–15), neither CCP 13–14 nor CCP 14–15 could be produced recombinantly and stably in a form suitable for structural work. This was interpreted in terms of CCP 14 requiring more than its immediate neighbours for stability. This led to the suggestion that this CCP could provide a ‘soft’ point in FH; its structure is readily perturbed, helping the molecule switch between active and quiescent conformations, as discussed earlier. More studies of the central region of FH are needed to investigate this possibility.
Recent work in the Barlow lab discussed herein was funded by the Wellcome Trust [grant number 081179] and the BBSRC UK [grant number BB/I007946/1].
Repetitive, Non-Globular Proteins: Nature to Nanotechnology: Held at the University of York, U.K., 30 March 2015–1 April 2015.