The role of pro-domains in human growth factors and cytokines

Many growth factors and cytokines are produced as larger precursors, containing pro-domains, that require proteolytic processing to release the bioactive ligand. These pro-domains can be significantly larger than the mature domains and can play an active role in the regulation of the ligands. Mining the UniProt database, we identified almost one hundred human growth factors and cytokines with pro-domains. These are spread across several unrelated protein families and vary in both their size and composition. The precise role of each pro-domain varies significantly between the protein families. Typically they are critical for controlling bioactivity and protein localisation, and they facilitate diverse mechanisms of activation. Significant gaps in our understanding remain for pro-domain function — particularly their fate once the bioactive ligand has been released. Here we provide an overview of pro-domain roles in human growth factors and cytokines, their processing, regulation and activation, localisation as well as therapeutic potential.


Introduction
Regulation of biological processes is critical for the correct functioning of an organism. Proteins are often at the heart of this regulation, and the mechanisms used for controlling their activities are manifold. One such mechanism is the inclusion of a pro-domain or pro-peptide (see Table 1 for the nomenclature we use). This is by no means an unusual mode of regulation, as almost 1 in 25 human proteins are known to contain a pro-peptide or pro-domain (756 of the 20 394 annotated human proteins in UniProt, about 3.7%, contain the 'pro-domain' and 'pro-peptide' keywords). These proteins are initially produced as inactive precursor polypeptides, which are subsequently rendered active in a spatially and temporally appropriate way through proteolytic cleavage between the pro-domain and the mature growth factor (GF). Such activation of precursor proteins through proteolytic processing is well characterised for proteases, which are often produced as inactive precursors, also called zymogens, with the pro-domain often serving as both a folding chaperone and an inhibitor [1], but it is less appreciated in the context of growth factors.
Intercellular signalling is a process by which different parts of a multicellular organism communicate with each other. This is typically achieved with the aid of secreted proteins (growth factors, cytokines and hormones), which are released by the cell that synthesises them. They then diffuse to the target cell that presents the appropriate receptor, triggering a signalling cascade. Pro-domain function is relatively well studied in the Transforming Growth Factor β (TGF-β) family of growth factors; however, numerous other proteins contain pro-domains of both known and unknown function. In this review, we take a broader look at all families of human growth factors and cytokines and examine how prevalent pro-domains are, what they look like in sequence and how they contribute to the biological functions of these proteins.

Survey of growth factor pro-domains
We searched the UniProt protein database (www.uniprot.org, version 2020_06) with the keywords 'pro-domain' and 'pro-peptide' to find secreted human signalling proteins (growth factors, cytokines and hormones) that are synthesised as larger precursors. As annotation can be variable and nomenclature is not always consistent, we then expanded the search manually, looking for other members of already identified protein families and for other similar proteins we expected to be in the list. We limited the search to folded proteins, leaving out peptide hormones, which are not expected to have a defined three-dimensional structure in isolation. We also excluded possible splice variants from this analysis.
The results of this data gathering are shown schematically in Figure 1. The most identifiable feature is that pro-domains vary greatly in size. In many cases the pro-domains are as big as or even much bigger than the mature ligand, with epidermal growth factor (EGF) being the most extreme in this respect; EGF is synthesised as a protein of more than 1200 residues, but the bioactive mature GF is only 52 residues. Pro-domains or -peptides are more often found in the N-terminal part of the protein, but they can also be C-terminal, or the mature domain can be flanked by removable parts at both ends. The insulin family is unique in containing a so-called C-peptide in the middle of the precursor, resulting in disulfide-linked dimers containing the N- and C-terminal parts of the precursor. Precursors of the EGF and tumour necrosis factor α (TNF-α) family proteins and stem cell factor (SCF) contain well-predicted transmembrane segments and can consequently exist both as soluble ligands and as bioactive membrane-bound proteins [2][3][4]. Some of the interleukins lack a signal peptide, which would be expected in secreted proteins, as they are initially produced intracellularly until processed by proteases and then secreted [5]. Hepatocyte growth factor (HGF) is in a class of its own: it is made as an inactive pro-HGF and activated proteolytically in the extracellular space, but the two parts remain attached by a disulfide bond, with both parts needed for bioactivity [6]. This diversity alone provokes questions about the function and role of the precursor forms and pro-domains in these proteins, yet surprisingly little is known compared with the function of the mature GFs.
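The precursor layouts described above (N-terminal, C-terminal, flanking on both sides, or an internal C-peptide as in insulin) can be expressed as a small classification routine. The following Python sketch is illustrative only: the `Segment` type, the functions `classify_layout` and `pro_to_mature_ratio`, and all residue coordinates in the usage note are hypothetical and are not taken from UniProt or from Figure 1. Signal peptides and transmembrane segments are ignored for simplicity.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """A stretch of the precursor, using 1-based inclusive residue numbers."""
    kind: str   # "pro" for removable parts, "mature" for bioactive parts
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def classify_layout(segments: List[Segment]) -> str:
    """Describe where the removable parts sit relative to the mature part(s)."""
    ordered = sorted(segments, key=lambda s: s.start)
    kinds = [s.kind for s in ordered]
    mature_idx = [i for i, k in enumerate(kinds) if k == "mature"]
    pro_idx = [i for i, k in enumerate(kinds) if k == "pro"]
    if not mature_idx or not pro_idx:
        return "no pro-domain"
    first_m, last_m = mature_idx[0], mature_idx[-1]
    # A removable part lying between two mature chains (insulin-like C-peptide).
    if any(first_m < i < last_m for i in pro_idx):
        return "internal"
    before = any(i < first_m for i in pro_idx)
    after = any(i > last_m for i in pro_idx)
    if before and after:
        return "flanking"
    return "N-terminal" if before else "C-terminal"


def pro_to_mature_ratio(segments: List[Segment]) -> float:
    """Total removable length divided by total mature length."""
    pro = sum(s.length for s in segments if s.kind == "pro")
    mature = sum(s.length for s in segments if s.kind == "mature")
    return pro / mature
```

With made-up coordinates, `classify_layout([Segment("pro", 1, 100), Segment("mature", 101, 152), Segment("pro", 153, 200)])` returns `"flanking"`, and a mature-pro-mature arrangement returns `"internal"`. The ratio function gives a simple way to compare how much of each precursor is removed, which is the size contrast the survey highlights for EGF.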
While we have not done a comprehensive analysis of sequence conservation in the pro-domains, it is clear that these are less conserved across species than the mature domains are. Also, while the domain structure, as shown in Figure 1, shows apparent conservation within a protein family, at the sequence level the conservation of pro-domains can be low or even non-existent. This is perhaps not surprising, as the mature GFs often signal through shared receptors while the pro-domains and -peptides have been able to diverge and play more specialised roles in these proteins.
In the following, we will discuss some of the growth factor families which are produced in a precursor form and for which information about processing and the role of pro-domains is available.

The TGF-β family
The TGF-β family of proteins contains large pro-domains, typically 2-3 times larger than the mature ligands, which are known to play a key role in the activity of the mature growth factors [7]. It is perhaps the best studied group in this respect and the one for which most structural information is available. The genes for these growth factors encode a pro-domain of 210-432 residues and a mature GF ligand of 100-120 residues. Typically, the precursor is processed by furin-like proteases in the Golgi complex, and cells secrete a processed pro-form. Several structures of pro-forms of TGF-β family proteins have been determined experimentally and have helped to understand the role of the pro-domains (Figure 2). While they all contain similar structural elements, the architecture of each complex varies significantly. The N-terminal part of the pro-domain, the so-called fore-arm, forms a long α-helix that interacts with the GF in the so-called wrist epitope where the type I receptors bind. The pro-domain then continues over the type II receptor site of the GF, forming a lasso-like structure, followed by a globular 'arm' domain that is bound to the mature GF. The pro-TGF-β1 structure shows a closed complex with arm domains linked by a disulfide in a bowtie-like extension, enclosing the mature ligand and masking all receptor-binding sites (Figure 2A) [8,9]. Structures of pro-activin A, pro-myostatin and pro-BMP9 have revealed a different architecture, with the globular arm domains of the pro-domains pointing [10][11][12]. Structural details of these complexes can be correlated with the level of inhibitory effect of the pro-domains [13].

Table 1. Nomenclature
Pro-domain: A removable domain distinct from the mature GF, dispensable for bioactivity but possibly playing some functional role, most typically inhibition while still tethered to the GF.
Pro-peptide: A pro-domain that is not predicted to form a globular, folded structure on its own.
Precursor: Uncleaved form of the growth factor with pro- and mature domains still as a single polypeptide.
Pro-form or pro-complex: Full-length protein including both pro- and mature domains, e.g. pro-activin A, cleaved by a protease. The mature and pro-domains remain non-covalently attached.

The prototypical member of this family, TGF-β1, is known to be secreted as a furin-cleaved but still inactive pro-form, the latent-TGF-β1, which can be activated by mechanical force [14]. Myostatin (also called growth and differentiation factor (GDF)8) is an example of a protein that is secreted in an unprocessed form [15] and remains inhibited by the pro-domain even after cleavage at the furin site [10]. Full activation of myostatin is achieved only after secondary proteolysis by Tolloid-like metalloproteases in the ECM, as for the closely related GDF11 [16,17]. The pro-form of bone morphogenetic protein 4 (BMP-4) also requires activation by a secondary cleavage of the pro-domain but, in this case, it occurs at a second furin-like cleavage site that is N-terminal to the primary poly-basic cleavage site [18]. BMP-3A, BMP-3B, GDF-6 and GDF-7 all possess a tribasic cleavage site in addition to the poly-basic cleavage site, which generates proteins of different sizes [19][20][21]. Recently, a conserved site for MMP cleavage and GF activation has been found for BMP-7, further widening the range of activation mechanisms [22].
Many furin-processed TGF-β growth factors remain complexed with their pro-domains, although the pro-domains do not always inhibit their activity. Examples of this include activins, BMP-4, BMP-5, BMP-7 and BMP-9 [11,23]. Recent structures of pro-BMP-9 and pro-BMP-10 in complex with the extracellular domain of the ALK1 receptor demonstrate that pro-domain and ALK1 binding to the mature GF ligands are not mutually exclusive [12]. The same has been observed with pro-BMP-7, where the type I receptor and the pro-domain can bind simultaneously, while type II receptors will displace the pro-domain [24].
In addition to influencing GF bioactivity, the pro-domains often appear to play a role in protein localisation. Latent-TGF-βs are the best characterised in this way. These proteins associate with components of the extracellular matrix (ECM) and are disulfide-bonded to latent-TGF-β binding proteins (LTBPs) or leucine-rich repeat-containing protein 32 (LRRC32, also known as GARP; Figure 2) [25][26][27][28]. These inactive, but furin-cleaved, proteins can be activated by diverse mechanisms, e.g. by mechanical forces, proteolysis and interaction with other proteins. Interaction with the ECM at one end of the pro-domain and integrin binding to the RGD motif at the other end can induce a mechanical force that distorts the pro-domain, releasing the bioactive mature dimer [14] (Figure 2). Thrombospondin-1 can bind to both mature and latent forms of TGF-β and activate the latent form both in vitro and in vivo [29,30]. The BMP-4 pro-domain interacts with fibrillin, targeting the complex to the extracellular matrix where the growth factor is stored and stabilised [31]. Pro-domains of other members of the TGF-β superfamily can also interact with components of the ECM at the cell surface. These include the pro-domains of myostatin, which interact with perlecan [23], and those of BMP-4, -5, -7, -10 and GDF-5, which interact with fibrillin-1 and fibrillin-2 [23,31]. Activin pro-domains are known to bind to heparan sulfates as a potential mechanism for ECM retention [32]. In addition to affecting localisation, interaction with ECM components is very likely to affect activation as well. While the pro-activin A complex in solution has the same bioactivity as the isolated mature domain, it is likely that pro-domain interactions with heparan sulfate or ECM proteins will affect the dissociation of the pro-mature complex [11].
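Candidate furin-like processing sites of the kind discussed above can be located in a precursor sequence by scanning for basic motifs. The sketch below assumes the commonly cited minimal furin consensus R-X-[K/R]-R, with cleavage after the final arginine. That motif is not stated in this review, real proprotein convertases are less strict than a four-residue pattern, and the function name and the toy sequence in the usage note are hypothetical. A hit is therefore only a candidate site, not evidence of processing.

```python
import re
from typing import List, Tuple

# Assumed minimal consensus: Arg, any residue, Lys or Arg, Arg (R-X-[KR]-R).
# A lookahead is used so that overlapping motifs within a poly-basic stretch
# are all reported rather than only the first one.
FURIN_LIKE = re.compile(r"(?=(R[A-Z][KR]R))")


def find_furin_like_sites(sequence: str) -> List[Tuple[int, str]]:
    """Return (cleavage_position, motif) pairs for a one-letter sequence.

    cleavage_position is the 1-based number of the last motif residue,
    i.e. the residue after which cleavage would be predicted to occur.
    """
    seq = sequence.upper()
    sites = []
    for match in FURIN_LIKE.finditer(seq):
        motif = match.group(1)
        # match.start() is 0-based, so start + 4 is the 1-based last residue.
        last_residue = match.start() + len(motif)
        sites.append((last_residue, motif))
    return sites
```

For the made-up sequence `"MAAARSKRGLEE"`, `find_furin_like_sites` returns `[(8, "RSKR")]`. Reporting overlapping matches is a deliberate choice here, since the text describes precursors with more than one basic cleavage site in close proximity.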

GDNF family
The glial-derived neurotrophic factor (GDNF) sub-family of TGF-β-like growth factors contains 5 members. While their mature GFs are very similar to classical TGF-β proteins, they have significantly smaller pro-domains, typically just 55-75 residues. These are not predicted to contain a folded domain, and little is known of their structure. GDF15, which has only recently been shown to belong to this group [33], contains a pro-peptide of 154 residues, but it is not known if it forms a stable structure on its own. Pro-GDF15 has been shown to be enriched at the ECM in contrast with the processed mature form, suggesting a localisation role for the GDF15 pro-domain, with processing possibly affecting circulating levels of the growth factor [34,35].

PDGF/NGF-like growth factors
Platelet-derived growth factors (PDGFs) and the nerve growth factor (NGF) family of GFs contain relatively small pro-peptides of 56-120 residues, which are processed by furin-like proprotein convertases. Both NGF and brain-derived neurotrophic factor (BDNF) pro-peptides are known to be necessary for the secretion of the proteins [36]. The NGF pro-peptide has been characterised structurally, but it appears not to make well-defined interactions with the mature GF even in its uncleaved precursor form, and both the pro-domain and the mature form are biologically active but display different effects on the target neurons [37]. Pro-NGF has been crystallised in complex with the ectodomain of its receptor p75, but no interpretable electron density was present for the pro-peptide [38]. In contrast, the BDNF pro-peptide has been shown to interact with the mature domain with nanomolar affinity. This interaction is enhanced by pro-peptide variants arising from genetic polymorphisms, but structural details are missing [39]. Additionally, both NGF and BDNF pro-peptides have been shown to have independent bioactivity. Pro-NGF and pro-BDNF promote apoptosis through p75 and sortilin receptors [40,41], but specific receptors or mechanisms of signalling for the pro-peptides have not been described [37,42]. A polymorphism in the BDNF pro-domain, resulting in a Val66Met substitution, is known to affect BDNF processing and secretion and is associated with psychiatric disorders [43].

EGF family
All members of the EGF superfamily, including neuregulins, heparin-binding EGF (HB-EGF), TGF-α and amphiregulin, are membrane-bound proteins in their precursor form, with the mature domain located on the extracellular side. They vary significantly in size, and the pro-domains show little similarity to one another. Except for amphiregulin, EGF-like proteins can also be secreted without the pro-domains [44]. The mature EGF ligands are released through a process called 'ectodomain shedding', in which typically furin-like and/or ADAM-family proteases cleave the mature ligand from the membrane-bound precursor. This process is best characterised for HB-EGF (reviewed in [45]). In addition to release of the extracellular GF, the C-terminal cytoplasmic part of the HB-EGF precursor is also released and translocates to the nucleus [46]. The proteolytic release of EGF is not needed for bioactivity, as high molecular weight forms of the protein are also bioactive and can compete with mature EGF for receptor binding [3,47]. The function of the almost 1000-residue pro-domain of EGF is unknown, apart from containing the transmembrane helix.

TNF-α family
The TNF-α family members are type II transmembrane proteins with a short cytoplasmic N-terminal pro-domain and the active ligand domain on the extracellular side. Ectodomain shedding releases the trimeric mature TNF-α, which can activate its receptor, but the membrane-bound form can also signal [2]. The processing of the precursor was identified as a potential mechanism of TNF-α control, and metalloprotease inhibitors [48,49] helped to identify the processing enzyme TACE (TNF-α converting enzyme, or ADAM17) [50,51].

Interleukin-1 family
Of the interleukins, only members of the IL-1 family contain a pro-domain. Of these, IL-1β, IL-18 and IL-33, which are all involved in the innate immune system, are particularly interesting as they are strongly up-regulated in a number of human inflammatory diseases [52]. Targeting each of them is therefore being explored as a treatment approach, with three therapeutics neutralising IL-1β already approved [53]. They all have pro-domains of ∼100 residues, but they lack a signal peptide and hence are not directly secreted from cells. Instead, they are at first localised in the nucleus [54] or move from the cytoplasm to the nucleus upon a stimulus such as cell stress. The release of the mature domain by proteolysis [5] can occur intracellularly or, within vesicles, by calpains or caspases [55]. Different mechanisms of secretion of the mature interleukins have been suggested, including lysosomal vesicles, microvesicles and exosomes [56]. The pro-domains contain a helix-loop-helix motif that is typically found in transcription factors, and they have been shown to bind to chromatin, where they modulate gene expression when overexpressed in vitro [57][58][59]. Nuclear localisation may play a part in the regulation of activity of IL-1 family members, in particular IL-33, as the deletion of the DNA-binding pro-domain results in constitutive activity [60,61]. The precursors appear to have some bioactivity without processing, but maximal activity is found in the mature protein [62]. The purpose of the pro-domains therefore appears to be not only to inhibit activity as such but also to regulate release and bioavailability. It is not known what happens to the DNA-binding pro-domain after the mature cytokine has been released.
We have summarised the general features of the different families we have discussed in Figure 3, to illustrate the diversity in localisation and processing of these proteins.

Pro-domain relevance in medicine and drug discovery
A classic example of the use of a pro-domain as a biomarker is insulin. Insulin is produced as a single polypeptide chain, with a pro-domain located between the two active chains, A and B. This connecting peptide, or C-peptide, is removed by pro-hormone convertases prior to secretion and is released simultaneously and in stoichiometric amounts with active insulin, which is formed from the disulfide-linked A and B chains. Insulin C-peptide has been widely used as a proxy for secreted insulin due to its higher serum stability, yet it does also appear to have bioactivity of its own [63].
Some of the GFs have been demonstrated, at least in animal models of disease, to have potential for human therapy [64,65]. Particularly notable examples are GDNF and the related family member Neurturin for the treatment of Parkinson's disease [66], Neurturin, as a potential key growth hormone in pancreas development, for diabetes, and GDF15 for obesity. The pharmacological agent is typically the mature GF, but mature GFs frequently have unfavourable pharmacological properties that require engineering, in particular a short serum half-life, which in turn significantly limits exposure [67]. Precursors could be considered as alternatives, with different properties and potentially a more controlled release of the mature domain. Pro-domains could also be used, natively or in engineered form, as targeting signals to deliver the GFs to the part of the body where their bioactivity is needed. Indeed, engineered forms of BMP-2, vascular endothelial growth factor-A (VEGF-A) and platelet-derived growth factor (PDGF) with artificial ECM-binding domains, even if not pro-domains in these cases, display improved bioavailability and signalling, leading to enhanced tissue regeneration in animal models [68,69].
Pro-domains have also inspired a new class of engineered antibody variants, pro-bodies. These are based on masking the binding site of an antibody with a bespoke inhibitory domain that is tethered to the antibody via a protease-cleavable linker. The linker is cleaved only in tissue where the cognate protease is present, leading to release of the masking domain and thereby rendering the pro-body able to bind its cognate antigen. The same principle has also been applied to make variants of cytokines such as IL-2 that are conditionally active only in tissue that expresses the proteases that release their engineered inhibitory domain [70].
Understanding of pro-domain function can aid in the development of novel therapies that aim to inhibit the mature domain by targeting the proteolytic processing or release of the mature GFs as potential points of intervention. Instead of capturing the free GF after it has been released from the pro-complex and before it binds its receptor, inhibition of ligand activation could be a far more effective strategy. As the GF domains are often highly conserved, antibodies targeting a complex of more diverse pro-domains with mature GFs might also achieve selectivity more easily.
For example, myostatin negatively regulates skeletal muscle growth, and its inhibition can treat muscle-wasting conditions [71]. Antibodies directly targeting myostatin have been described but were disappointing in clinical studies. Instead, a recent report described antibody-mediated inhibition of the proteolytic activation of pro-myostatin. The phage-display-selected antibody, SRK-015, binds to the precursor form of myostatin, at the arm region of the pro-domain, and protects the protein from proteolytic processing and GF activation [72]. Similarly, antibodies have been generated against TGF-β1 that bind the precursor protein and inhibit its activation, thereby inhibiting TGF-β1 [73].

Figure 3. Examples of the diverse cellular localisations and activation processes for representative growth factor and cytokine precursors. Colouring as in Figure 1.

Pro-domains are also potential inhibitors of GFs themselves. Peptides derived from the first α-helix of the myostatin pro-domain are relatively potent inhibitors of myostatin [74,75], and the myostatin pro-domain fused to an antibody Fc part can increase muscle mass in mdx mice, a mouse model for Duchenne muscular dystrophy [76,77].
Activin A and B pro-domains, despite their lower intrinsic ability to inhibit bioactivity, have been shown to be potent inhibitors of their respective ligands, especially when driven to dimerise by an Fc domain or by disulfides inspired by latent-TGF-β [78][79][80].

Discussion
Growth factor research has historically focussed on mature domains and their pharmacological effects. For TGF-β proteins the importance of the pro-domain for secretion and for ECM localisation has long been described, but for most members of this family the biological roles of their pro-domains, if any, are still to be discovered. Peptide hormones are often synthesised as larger precursors which are then processed to yield multiple different hormones with distinct activities, and it is not inconceivable that the same is the case with growth factors. Could a pro-domain or -peptide act as a ligand for a yet unidentified receptor? For example, neuropilins recognise peptides with a C-terminal arginine, which is present in all pro-domains that have been cleaved by furin-like proteases [81]. Activation of the pro-complexes and ECM-retained pro-growth factors is also still relatively poorly understood. These can act as fast-release depositories of bioactivity, but details of how and when they are activated are still largely unknown.
One of the reasons for the limited research, compared with studies on mature ligands, lies in the lack of tools. We have very few antibodies against pro-domains, reducing the possibilities to study their processing, localisation and fate in tissues and organisms. With antibodies we could also see whether pro-domains are useful as serum biomarkers, as insulin C-peptide is. With the availability of high-quality antigens for antibody production, this scarcity of tools will hopefully be addressed and the biological roles of pro-domains elucidated in more detail. The multiplicity of regulation and control of growth factors through their pro-domains and their processing might yet provide a treasure trove of new discoveries on the function and control of growth factor activity as part of tissue homeostasis and disease.

Perspectives
• Precursors and pro-domains are a greatly understudied aspect of growth factors and cytokines.
• Pro-domains play an important role in the regulation of their mature, active growth factors, but precise mechanisms are still often unresolved.
• Therapeutically, pro-domains and precursor forms of growth factors offer possibilities both for modulating the properties of growth factors used as therapeutic agents and for inhibiting the GFs.

Competing Interests
The authors declare that there are no competing interests associated with the manuscript.

Open Access Statement
Open access for this article was enabled by the participation of University of Cambridge in an all-inclusive Read & Publish pilot with Portland Press and the Biochemical Society under a transformative agreement with JISC.

Author Contributions
M.R. and R.X.Z. collected sequences from UniProt and annotated the data. M.R., R.X.Z. and M.H. analysed the data and compiled them into Figure 1. All authors wrote and edited the manuscript.