Recombinant proteins have been extensively employed as therapeutics for the treatment of various critical and life-threatening diseases and as industrial enzymes in high-value industrial processes. Advances in genetic engineering and synthetic biology have broadened the horizon of heterologous protein production using multiple expression platforms. Selection of a suitable expression system depends on a variety of factors ranging from the physicochemical properties of the target protein to economic considerations. For more than 40 years, Escherichia coli has been an established organism of choice for protein production. This review aims to provide a stepwise approach for any researcher embarking on the journey of recombinant protein production in E. coli. We present an overview of the challenges associated with heterologous protein expression, fundamental considerations connected to the protein of interest (POI) and designing expression constructs, as well as insights into recently developed technologies that have contributed to this ever-growing field.

Ever since the Food and Drug Administration (FDA) approved the first recombinant protein for therapeutic use in 1982, Escherichia coli has been a workhorse for recombinant protein production in both academia and industry. Despite huge advances in other expression systems, the production of heterologous recombinant proteins in microbial expression systems remains simpler and less expensive than in alternative systems such as mammalian cell culture [1]. E. coli offers various advantages such as comparatively easy genetic manipulation, simple growth media, rapid cell growth, simple fermentation processes, virus-free product, high product yields, and cost-effective production [1]. The science behind recombinant protein production seems straightforward; in practice, however, multiple factors can impose hurdles. As Sun Tzu says in The Art of War, ‘know the enemy and know yourself’, because if you do not then there is a high chance of failure. Hence, the starting point for any expression should be to know your protein.

The protein and its properties

This review will focus on the production of soluble proteins or soluble fragments of transmembrane (TM) or membrane-associated proteins. For additional issues connected with the production of TM proteins, see [2–4]. Often the protein of interest (POI) is a eukaryotic protein. This can cause additional problems including codon usage, post-translational modifications (PTMs) and issues related to protein folding.

For an overview of the full workflow, see Figure 1. The starting point for any protein expression is to define the protein you wish to make, taking into account possible splice variants, signal sequences, TM helices, and PTMs found in the natural protein. While protein databases such as UniProt [5] are an excellent starting point for looking at these, it is always worthwhile doing additional bioinformatics analysis (Table 1).

While bioinformatics approaches are powerful, they are only predictions, so gathering a consensus from multiple independent bioinformatics approaches or looking for validation through experimental means (e.g. from published literature) is always worthwhile. For example, human cytotoxic T-lymphocyte antigen 4 (CTLA-4) is an obligate dimer and requires N-glycosylation of Asn78 and Asn110 for dimerization [6]. As this PTM cannot be made in E. coli, spending a little time getting to know your protein can save a lot of heartache later on. In essence, without the use of synthetic biology approaches (see below), the only eukaryotic-like PTM that E. coli performs is disulfide bond formation in the periplasm [7].

It is also often worthwhile using bioinformatics approaches, e.g. JPred [8], to look for both domain boundaries and predicted intrinsically disordered protein (IDP) regions. Expressing a construct that is too short and misses an essential part of a domain, e.g. a β-strand, is always going to result in failure, while expressing a construct that is too long and includes flexible regions prone to proteolysis is likely to result in either heterogeneity or loss of a purification tag. Proteins with large IDP regions are often problematic to make as they are prone to degradation. However, it should be remembered that many IDP regions gain structure upon interaction with other molecules, e.g. upon protein complex formation (such as ACTR and the nuclear co-activator binding domain (NCBD)) [9], and so co-expression of a partner may help considerably in obtaining the protein in a stable and soluble form.

Before cloning the gene for the protein you want, it is worth considering how you are going to subsequently purify it, as this may affect the construct you want to express. The most powerful first step in the purification of soluble proteins is affinity chromatography (if possible). This can exploit either the endogenous properties of the protein, e.g. immobilized-ligand or substrate mimic chromatography (e.g. Cibacron Blue F3GA [10] or cyclic peptide-based ligands [11]), or the addition of a tag to aid purification, e.g. a maltose-binding protein (MBP)-tag, a glutathione-S-transferase (GST)-tag or, most commonly, a hexahistidine tag (His-tag) allowing the use of immobilized metal affinity chromatography (IMAC). For an overview of possible affinity tags, refer to [12]. If the structure of your protein or something closely related is available, it is worthwhile looking at the accessibility of the N- and C-termini to see if any added tag is likely to be disruptive to the structure, e.g. if the protein termini are buried. Alternatively, structure prediction programs such as Phyre2 [13] could be used. While very useful and widely used, N-terminal His-tags may increase the heterogeneity of your final product due to variable (phospho)gluconoylation of the N-terminus [14].

Depending on the end use of the protein, you may want to be able to remove the affinity tag after purification by proteolysis. Enzymes with broad specificity can sometimes be used; for example, trypsin can remove both an N-terminal tag and the C-peptide from insulin derivatives [15]. Usually, however, removal of affinity tags is mediated through more highly specific proteases such as TEV (consensus site ENLYFQ↓G/S) and Factor Xa (consensus site IE/DGR) [12]. Care should be taken over the source of the protease; for example, recombinant bovine Factor Xa is reported to have a different specificity from recombinant human Factor Xa [16,17]; see also the MEROPS database for other proteases [18]. Most proteases have specificity for sequences both before and after the site of cleavage, so one or more amino acids from the cleavage site are often left on the mature protein. In addition, proteases cannot access buried cleavage sites, so the cleavage site is often placed in a flexible linker region (usually glycine/serine-rich), which may add more residues to the mature protein.
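Before ordering a construct, it can be useful to check whether the chosen protease would also cut inside the POI and which residues it would leave behind. The following is a minimal Python sketch, assuming only the two consensus sites quoted above; the function name `find_cleavage_sites` and the example sequence are hypothetical, and real protease specificity is broader than a single regular expression can capture.

```python
import re

# Consensus sites as quoted in the text. The integer is the offset from the
# start of the match to the first residue AFTER the scissile bond.
PROTEASE_SITES = {
    "TEV": (re.compile(r"ENLYFQ(?=[GS])"), 6),   # ENLYFQ | G/S
    "Factor Xa": (re.compile(r"I[ED]GR"), 4),    # I-E/D-G-R |
}


def find_cleavage_sites(sequence):
    """Return {protease: [0-based index of the first residue after each cut]}."""
    seq = sequence.upper()
    return {
        name: [m.start() + offset for m in pattern.finditer(seq)]
        for name, (pattern, offset) in PROTEASE_SITES.items()
    }


# Hypothetical construct: His-tag, linker, TEV site, then an illustrative POI
# that happens to contain an internal Factor Xa consensus (IEGR).
construct = "MHHHHHHSSGENLYFQGSPEPTIDEIEGRAK"
sites = find_cleavage_sites(construct)
print(sites)
# The product of TEV cleavage starts with the G of the recognition site.
print(construct[sites["TEV"][0]:])
```

In this made-up example the TEV product retains the glycine from the recognition site, and the internal IEGR motif would argue against choosing Factor Xa for tag removal.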

Fusion partners can be used not only to aid purification but also as solubilization tags. Such tags, which are often small, highly soluble, and stable proteins, can aid not only in the solubilization of the final product but also in the solubilization of folding intermediates. If a eukaryotic protein has more than one N-glycan per 100 amino acids, a solubilization tag may be essential to produce it in a soluble form in E. coli. Commonly used solubilization tags include MBP (which doubles as an affinity purification tag), thioredoxin, SUMO, or Fh8. For solubilization tags, there needs to be a balance: if they help too little, soluble protein may not be obtained. Conversely, if they solubilize too much, false positives may arise where the final product is soluble despite the POI not being correctly folded. This balance often has to be achieved by trial and error.
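The "one N-glycan per 100 amino acids" rule of thumb can be roughly approximated from sequence alone by counting N-X-S/T sequons (X being any residue except proline). The sketch below is an illustration under that assumption: sequons only mark potential sites, so annotated glycosylation data (e.g. from UniProt) should take precedence where available. The function name and example sequence are hypothetical.

```python
import re

# N-X-S/T sequon, where X is any residue except proline. The lookahead lets
# overlapping sequons (e.g. "NNSS") be counted separately.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_density(sequence):
    """Return (number of sequons, sequons per 100 residues)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    count = len(SEQUON.findall(seq))
    return count, 100.0 * count / len(seq)


# Illustrative fragment: NGS and NKT are sequons, NPT is not (X = Pro).
count, per_100 = sequon_density("NGSANPTNKT")
print(count, per_100)
if per_100 > 1.0:
    print("Above one potential N-glycan per 100 residues: consider a solubilization tag")
```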

Even with careful selection of domain boundaries and possible solubilization tags, not all eukaryotic proteins can fold to a native state in E. coli. This is linked to issues of protein folding, PTMs, and/or the protein being part of an unknown obligate complex. E. coli contains a wide range of molecular chaperones (e.g. GroEL/ES, DnaK, Skp) and ten peptidyl-prolyl cis-trans isomerases, so issues related to protein folding are usually linked with one of the following: (i) translation rates (see below); (ii) oxidative folding, i.e. the formation of disulfide bonds; (iii) the protein having an essential PTM which E. coli cannot perform; (iv) the protein having a buried prosthetic group which wildtype E. coli cannot make or which becomes limiting (in some cases this can be solved by the addition of the moiety to the growth media); (v) rare cases where a specialized folding factor is involved in folding the protein, e.g. to express a hyperthermophilic α-amylase from Pyrococcus furiosus (a hyperthermophilic archaeon) in E. coli, the co-expression of small heat shock protein (sHSP) or chaperonin (HSP60) from P. furiosus was found to be essential [19]. For an overview of alternate expression platforms and genetic engineering approaches available to carry out PTMs in heterologous proteins, refer to [20].

Native disulfide bond formation is the most common issue, and there are three approaches to deal with it. Firstly, the protein could be allowed to form aggregates, or inclusion bodies, of misfolded/unfolded protein. Inclusion bodies are relatively easy to purify, and the protein can then be refolded in vitro [21,22]. Secondly, the protein could be targeted to the periplasm via the addition of an N-terminal periplasmic signal sequence. Here there is machinery for native disulfide formation [7], and while this is a powerful technique, both the Sec secretion system and the folding apparatus in the periplasm can easily be overwhelmed, so (extreme) care must be taken [23]. Thirdly, an engineered strain could be used that removes disulfide bond reducing pathways from the cytoplasm [24,25], or adds oxidative folding catalysts, reviewed in [26]. This can be combined with the TAT-secretion system for exporting folded proteins to the periplasm, e.g. [27,28]. Similar synthetic biology approaches also allow other PTMs to be made in the cytoplasm, for example mucin-type O-glycosylation in E. coli [29].

Finally, it should be remembered that the cytoplasm of E. coli contains methionine aminopeptidase, which can remove the initiating methionine [30], depending on the subsequent amino acids (e.g. serine, alanine, cysteine, proline, or glycine at P1′ preferred, Pro at P2′ inhibits), with engineered systems extending the list, e.g. [31]. This also combines with the N-end rule for protein clearance from a cell. For E. coli, proteins with an N-terminal Arg, Lys, Leu, Phe, Tyr, or Trp can be rapidly degraded [32], but this depends on the context of the N-terminal and subsequent amino acids [33,34].
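The two N-terminal rules described above can be written down as simple checks. This is a simplified sketch that encodes only the residues listed in the text: the names `initiator_met_removed` and `n_end_rule_risk` are hypothetical, the residue sets are not exhaustive, and the context dependence reported in [33,34] is not modelled.

```python
# Residues listed in the text as preferred at P1' by methionine aminopeptidase.
METAP_FAVOURED_P1 = set("SACPG")
# N-terminal residues listed in the text as destabilizing in E. coli.
N_END_DESTABILIZING = set("RKLFYW")


def initiator_met_removed(sequence):
    """Predict removal of the initiating Met from the P1' and P2' residues."""
    seq = sequence.upper()
    if len(seq) < 3 or seq[0] != "M":
        raise ValueError("expected a sequence of at least 3 residues starting with M")
    p1_prime, p2_prime = seq[1], seq[2]
    # Pro at P2' inhibits removal even when P1' is favourable.
    return p1_prime in METAP_FAVOURED_P1 and p2_prime != "P"


def n_end_rule_risk(mature_sequence):
    """Flag an exposed N-terminal residue that may trigger rapid degradation."""
    return mature_sequence.upper()[0] in N_END_DESTABILIZING


print(initiator_met_removed("MSKGE"))   # Ser at P1', Lys at P2'
print(initiator_met_removed("MSPGE"))   # Pro at P2' inhibits
print(n_end_rule_risk("LPEPTIDE"))      # N-terminal Leu exposed, e.g. after proteolysis in vivo
```

Because the residues that favour Met removal are not themselves on the destabilizing list, the second check mainly matters for N-termini exposed by other processing events.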

After all these considerations, if no purified protein is obtained, a simple troubleshooting sodium dodecyl sulfate/polyacrylamide gel electrophoresis (SDS/PAGE) analysis may quickly help elucidate the possible issues (Figure 2). SDS/PAGE analysis can be complemented by other techniques including mass spectrometry, Western blotting, activity assays for the POI etc.

The gene and its properties

Once details of the protein construct are finalized, it is time to turn your attention to the gene. Just as much care must be taken over it as over the protein construct, or yields may be low. One important concept that is often forgotten in protein expression is cellular homeostasis, or everything in balance. Too often a high-copy number plasmid is used with a strong promoter, but this will invariably result in less protein than could be produced, as too many cellular resources are put into making plasmid deoxyribonucleic acid (DNA) and messenger RNA (mRNA), and the mRNA produced is far in excess of the capacity of the translation apparatus (Figure 3).

A multitude of genetic engineering strategies have been developed over the years to enable efficient cloning of protein expression constructs [35,36]. While industry often integrates genes into the bacterial chromosome to avoid the problem of plasmid loss during large-scale fermentation, the academic approach more usually uses plasmids for expression as they are faster and cheaper to use. Plasmid selection for protein production is based on (i) copy number, which depends on the origin of replication of the plasmid (Table 2); (ii) promoter (Table 3); (iii) selection marker (Table 4). There is a balance between plasmid copy number and promoter strength (Figure 3) to maximize cellular resources going into protein production. This balance also depends on the media, with chemically defined minimal media being more sensitive to alterations in copy number and promoter strength, in particular when either is excessively high. Recent advances in synthetic biology have led to growth-decoupled recombinant protein production through the co-expression of Gp2, a bacteriophage-derived peptide inhibitor of E. coli ribonucleic acid (RNA) polymerase [37]. This approach allows metabolic resources to be redirected so that they are used exclusively to produce the POI.

The plasmid is not the only decision to make; the source of the gene is also important. For decades, the normal source of the gene for the POI was the original organism, e.g. a complementary DNA (cDNA) library obtained by reverse transcription-polymerase chain reaction (RT-PCR) from an mRNA pool (to avoid introns). While this can be fast, cheap and efficient, it can give rise to problems connected with differences in translation initiation and codon usage between prokaryotes and eukaryotes.

While eukaryotic ribosomes bind to the cap at the 5′ end of the mRNA and then move down the mRNA until they initiate translation from the first AUG codon with a Kozak sequence in front of it, prokaryotic ribosomes bind to a sequence on the mRNA known as the Shine–Dalgarno (SD) sequence or ribosome-binding site (rbs; Figure 4). The rbs is usually located 5–13 nucleotides [38] upstream of the initiating AUG (optimal distance 5–6 nucleotides [39]) and is complementary to the 3′ end of the 16S ribosomal RNA. In E. coli, this sequence is AGGAGGU [40]. The requirement for a distinct rbs has two consequences for eukaryotic protein expression in E. coli. Firstly, an rbs must be present before the initiating AUG. This may be present in the plasmid outside the multicloning site, but care should be taken that it is within the correct distance and that there are no other AUG trinucleotides from which translation could initiate. Secondly, this nucleotide sequence should not appear inside the gene of interest. An internal rbs will either result in the generation of a second protein (if there is an AUG at the correct distance from it) or result in translation stalling, as a ribosome binds to this site and prevents translation through it. Because of this, care must be taken in the codons used for Gly–Gly pairs (i.e. not GGA–GGU), Arg–Arg pairs (i.e. not AGG–AGG), and sequences around Glu (GAG), including Glu–Glu pairs (GAG–GAG). AGG and GGA codons are rarely used by E. coli (see below), so in practice care with codon optimization to avoid an internal rbs mostly relates to sequences around Glu (Q/K/E-E or E-V).
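A coding sequence can be screened for internal rbs-like motifs before a gene is ordered. The sketch below is a minimal illustration with two assumptions that are not from the text: it flags only the six-nucleotide core AGGAGG of the SD sequence, and it measures spacing as the number of nucleotides between the end of that core and the A of a downstream AUG. The function name `find_internal_rbs` and both example sequences are hypothetical.

```python
import re

SD_CORE = "AGGAGG"  # core of the AGGAGGU sequence quoted in the text (assumption)


def find_internal_rbs(cds, min_gap=5, max_gap=13):
    """Return [(sd_start, [aug_starts])] for each SD-like motif in a coding sequence.

    aug_starts lists AUG triplets (in any frame) whose A lies min_gap..max_gap
    nucleotides after the end of the motif; an empty list suggests stalling
    rather than internal initiation.
    """
    rna = cds.upper().replace("T", "U")
    hits = []
    # A zero-width lookahead finds overlapping occurrences of the motif.
    for match in re.finditer(r"(?=" + SD_CORE + ")", rna):
        sd_end = match.start() + len(SD_CORE)
        augs = [
            i
            for i in range(sd_end + min_gap, sd_end + max_gap + 1)
            if rna[i:i + 3] == "AUG"
        ]
        hits.append((match.start(), augs))
    return hits


# Lys-Gly-Gly encoded as AAA-GGA-GGT creates an internal SD-like motif with a
# downstream in-frame AUG; recoding the first Gly as GGC removes it.
print(find_internal_rbs("ATGAAAGGAGGTACCGAAATGTAA"))
print(find_internal_rbs("ATGAAAGGCGGTACCGAAATGTAA"))
```

Partial matches to the SD sequence can also bind the ribosome, so a stricter screen might score complementarity to the 16S rRNA 3′ end instead of requiring an exact match.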

Codon usage is not equally distributed among the available codons; it varies considerably between organisms (Table 5) and correlates with the corresponding transfer RNA (tRNA) levels [41]. mRNA that contains multiple rare codons can exhibit translation stalling and mRNA degradation, reviewed in [42]. Codon usage issues can be examined by bioinformatic approaches, e.g. Graphical Codon Usage Analyzer [43]. One method to prevent this problem is the overexpression of rare tRNAs [44,45], e.g. from pLysSRARE [46]. For more detailed insights into codon usage, refer to [47]. The more usual approach now is the use of synthetic genes that can be codon optimized for the expression host, while simultaneously avoiding internal rbs, internal restriction sites, and factors that influence mRNA structure and stability [48,49]. As prices have dropped rapidly, a synthetic gene can cost less than the labor and material costs associated with cloning a gene from a cDNA library.
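A first-pass check for rare codons needs only a list of codons considered rare in the expression host. In the sketch below, the function name `rare_codon_report` is hypothetical and the example set contains just AGG and GGA, the two codons this review itself describes as rarely used by E. coli; in practice the set should come from a codon usage table for the chosen strain.

```python
def rare_codon_report(cds, rare_codons):
    """Locate rare codons in an in-frame coding sequence (DNA or RNA)."""
    seq = cds.upper().replace("U", "T")
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a non-zero multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # 0-based codon indices at which a rare codon occurs.
    positions = [i for i, codon in enumerate(codons) if codon in rare_codons]
    return {
        "n_codons": len(codons),
        "rare_positions": positions,
        "rare_fraction": len(positions) / len(codons),
    }


# Illustrative set only (assumption): extend from a host codon usage table.
EXAMPLE_RARE = {"AGG", "GGA"}

report = rare_codon_report("ATGAGGGGAAAATAA", EXAMPLE_RARE)
print(report)
```

Runs of adjacent rare codons (as at codon positions 1 and 2 in this made-up sequence) are the cases most worth recoding or rescuing with supplementary tRNAs.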

Synthetic genes can also help mitigate the potentially deleterious effects of one other difference between eukaryotic and prokaryotic protein translation: translation rate. In prokaryotes such as E. coli, transcription and translation are coupled, with transcription rates of approx. 50 nucleotides/s and translation rates of approx. 16 amino acids/s [50]. In contrast, translation in eukaryotes is slower, with a rate of approx. 3 amino acids/s [51]. Protein folding has evolved in parallel with these translation rates; hence, when a eukaryotic protein is expressed in E. coli, the rate of translation may be faster than the rate of folding, and for multidomain proteins this can be a serious issue (Figure 5). This can be mitigated by modulation of translation rate [52], codon usage harmonization [53], or the use of rarer codons just after domain boundaries to cause ribosome stalling [54] (Figure 5).
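The practical consequence of these rates is easiest to see with a little arithmetic. The sketch below uses only the approximate rates quoted above; the 300-residue domain is an arbitrary illustrative size, and real elongation rates vary with codon, temperature and growth conditions.

```python
E_COLI_TRANSLATION_AA_PER_S = 16.0     # approx. rate quoted in the text [50]
EUKARYOTE_TRANSLATION_AA_PER_S = 3.0   # approx. rate quoted in the text [51]
E_COLI_TRANSCRIPTION_NT_PER_S = 50.0   # approx. rate quoted in the text [50]


def synthesis_time_s(n_residues, rate_aa_per_s):
    """Seconds needed to translate n_residues at a constant elongation rate."""
    return n_residues / rate_aa_per_s


domain_length = 300  # illustrative domain size (assumption)
t_ecoli = synthesis_time_s(domain_length, E_COLI_TRANSLATION_AA_PER_S)
t_euk = synthesis_time_s(domain_length, EUKARYOTE_TRANSLATION_AA_PER_S)
print(f"E. coli: {t_ecoli:.2f} s, eukaryote: {t_euk:.2f} s, ratio: {t_euk / t_ecoli:.1f}x")

# Coupling check: 50 nt/s of transcription corresponds to about 16.7 codons/s,
# which matches the E. coli translation rate.
print(f"{E_COLI_TRANSCRIPTION_NT_PER_S / 3:.1f} codons/s transcribed")
```

On these numbers, a domain that would have had roughly 100 s to begin folding co-translationally in its native host emerges from the E. coli ribosome in under 20 s.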

A specialized ribosome system aimed specifically at the expression of the POI in E. coli by modifying the SD sequence of the mRNA and corresponding anti-SD sequence of the 16S rRNA was first reported by Hui and De Boer in 1987 [55]. Alternative ribosome systems such as the orthogonal riboswitch system [56], the RiboTite system [57], and the Ribo-T system [58] have been reported since. The riboswitch system allows tunable co-expression of multiple genes in a dose-dependent response to small synthetic molecules while the RiboTite system, which builds on the riboswitch technology, has been shown to harmonize protein translation rates with protein secretion [59]. The Ribo-T system employs an engineered hybrid rRNA composed of both small and large subunit rRNA sequences, in which short RNA linkers covalently link the subunits into a single translating unit [58]. This orthogonal ribosome–mRNA system is capable of supporting bacterial growth even in the absence of wildtype ribosomes and its improved tethered version has been reported recently [60].

Another difference between eukaryotic and prokaryotic protein translation can be an advantage for recombinant protein production. Many prokaryotic genes are expressed in operons, where a single promoter results in the production of multiple proteins from a single mRNA that has an rbs before the initiating AUG of each (Figure 4). This allows both the co-expression of subunits that form complexes and the co-expression of ancillary factors that may be required for the protein to reach the native conformation.

Strains and media for small-scale expression screening

Once a suitable construct for protein expression has been generated, the next step is to express the protein. This again requires rational choices to be made. E. coli is a remarkably diverse bacterial species, with only approx. 20% of the genome common to all strains [61]. It can be broadly split into four subgroupings based on their initial isolation: K-12 strains, B-strains, C strains, and W strains [61]. Many K-12 and B-strains are used for recombinant protein production (Table 6). Some POIs show strong strain dependence, often for unclear reasons, so we routinely test any new protein in at least one K-12 and one B-strain. Similarly, there is a wide variety of media choices, which can be broadly split into rich media (which contain yeast extract and/or another mixed source of peptides such as tryptone) and chemically defined or minimal media (where there are often only 1–3 carbon sources and a single nitrogen source). Again, some POIs show strong media dependence for production, so we routinely test any new protein in at least one rich medium and one chemically defined medium. While Luria–Bertani (LB) medium used to be the default for academic protein production, it has been largely superseded by media that allow higher-density cultures to be obtained, as higher cell mass usually results in higher protein yields. In particular, the use of auto-induction media, e.g. [62], both facilitates the screening of multiple POIs and allows culture densities typically 10× higher than LB. Additionally, an alternate growth medium for recombinant protein production in E. coli which allows the controlled release of substrates, thereby mimicking fed-batch process conditions at a small scale, has been reported [63].

In addition to strain and media, the temperature of the culture post-induction can play a key role in the yield of the folded protein. This effect probably arises both from the change in relative hydrophobicity with temperature and from the slower rate of protein translation at lower temperature [64], which helps avoid exceeding the capacity of the folding machinery. If you choose to use a non-autoinducing medium, the concentration of inducer (e.g. isopropyl β-d-1-thiogalactopyranoside (IPTG)) and the timing and length of induction can also significantly influence the yields of folded protein and may need optimization.
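The small-scale screen described in this section (at least one K-12 and one B-strain, one rich and one chemically defined medium, and more than one post-induction temperature) can be laid out as a simple full-factorial grid. The sketch below is illustrative only: the strain and medium labels are placeholders and the temperatures are assumed values, not recommendations from this review.

```python
from itertools import product

# Placeholder labels (assumptions); substitute the actual strains and media.
strains = ["K-12 strain", "B-strain"]
media = ["rich", "chemically defined"]
temperatures_c = [37, 30, 20]  # assumed post-induction temperatures


def screening_conditions(strains, media, temperatures_c):
    """Enumerate every strain x medium x temperature combination."""
    return [
        {"strain": s, "medium": m, "temperature_c": t}
        for s, m, t in product(strains, media, temperatures_c)
    ]


conditions = screening_conditions(strains, media, temperatures_c)
print(len(conditions), "cultures to set up")
for number, condition in enumerate(conditions, start=1):
    print(number, condition)
```

With non-autoinducing media, inducer concentration and induction time could be added as further axes, at the cost of a rapidly growing number of cultures.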

Once small-scale screening experiments have concluded positively and you have chosen your expression construct and strain, you may want to scale up the production and purification of your protein depending on the end use. For an extensive overview of upstream and downstream process development strategies for production of heterologous proteins in E. coli, refer to [1,65].

  • E. coli is an excellent host for recombinant protein production in both academia and industry.

  • A rational approach is required for successful protein production. Understanding, or predicting with bioinformatics tools, the biophysical characteristics of the protein is essential.

  • Correct identification of domain boundaries, signal sequences, TM regions, obligate oligomeric complex formation, and PTMs is critical.

  • It is equally important to consider genetic and translation factors, such as codon usage, the nature and position of the rbs and differences between prokaryotic and eukaryotic translation rates.

  • Other factors such as the strain and media used also impact protein yield, but they cannot compensate for poor planning.

The authors declare that there are no competing interests associated with the manuscript.

This work was supported by the European Union’s Horizon 2020 Research and Innovation Programme under Marie Sklodowska-Curie [grant number 642937].

L.W.R. conceived the article. All authors contributed to the writing.

cDNA, complementary DNA
DNA, deoxyribonucleic acid
IDP, intrinsically disordered protein
LB, Luria–Bertani
MBP, maltose-binding protein
mRNA, messenger RNA
POI, protein of interest
PTM, post-translational modification
rbs, ribosome-binding site
SD, Shine–Dalgarno
SDS/PAGE, sodium dodecyl sulfate/polyacrylamide gel electrophoresis
TM, transmembrane
tRNA, transfer RNA

1.
Tripathi
N.K.
and
Shrivastava
A.
(
2019
)
Recent developments in bioprocessing of recombinant proteins: expression hosts and process development
.
Front. Bioeng. Biotechnol.
7
,
420
[PubMed]
2.
Karyolaimos
A.
,
Ampah-Korsah
H.
,
Zhang
Z.
and
de Gier
J.-W.
(
2018
)
Shaping Escherichia coli for recombinant membrane protein production
.
FEMS Microbiol. Lett.
365
,
152
3.
Schlegel
S.
,
Hjelm
A.
,
Baumgarten
T.
,
Vikström
D.
and
de Gier
J.-W.W.
(
2014
)
Bacterial-based membrane protein production
.
Biochim. Biophys. Acta. Mol. Cell Res.
1843
,
1739
1749
4.
Errey
J.C.
and
Fiez-Vandal
C.
(
2020
)
Production of membrane proteins in industry: the example of GPCRs
.
Protein Expr. Purif.
169
,
105569
[PubMed]
5.
The UniProt Consortium
(
2019
)
UniProt: a worldwide hub of protein knowledge
.
Nucleic Acids Res.
47
,
D506
D515
[PubMed]
6.
Darlington
P.J.
,
Kirchhof
M.G.
,
Criado
G.
,
Sondhi
J.
and
Madrenas
J.
(
2005
)
Hierarchical regulation of CTLA-4 dimer-based lattice formation and its biological relevance for T cell inactivation
.
J. Immunol.
175
,
996
1004
[PubMed]
7.
Manta
B.
,
Boyd
D.
and
Berkmen
M.
(
2019
)
Disulfide bond formation in the periplasm of Escherichia coli
.
EcoSal Plus
8
,
[PubMed]
8.
Drozdetskiy
A.
,
Cole
C.
,
Procter
J.
and
Barton
G.J.
(
2015
)
JPred4: a protein secondary structure prediction server
.
Nucleic Acids Res.
43
,
W389
W394
[PubMed]
9.
Demarest
S.J.
,
Martinez-Yamout
M.
,
Chung
J.
,
Chen
H.
,
Xu
W.
,
Jane Dyson
H.
et al.
(
2002
)
Mutual synergistic folding in recruitment of cbp/p300 by p160 nuclear receptor coactivators
.
Nature
415
,
549
553
[PubMed]
10.
Subramanian
S.
and
Ross
P.D.
(
1984
)
Dye-ligand affinity chromatography: The interaction of cibacron blue f3GA® with proteins and enzyme
.
Crit. Rev. Biochem. Mol. Biol.
16
,
169
205
11.
Kish
W.S.
,
Roach
M.K.
,
Sachi
H.
,
Naik
A.D.
,
Menegatti
S.
and
Carbonell
R.G.
(
2018
)
Purification of human erythropoietin by affinity chromatography using cyclic peptide ligands
.
J. Chromatogr. B Anal. Technol. Biomed. Life Sci.
1085
,
1
12
12.
Young
C.L.
,
Britton
Z.T.
and
Robinson
A.S.
(
2012
)
Recombinant protein expression and purification: A comprehensive review of affinity tags and microbial applications
.
Biotechnol. J.
7
,
620
634
[PubMed]
13.
Kelley
L.A.
,
Mezulis
S.
,
Yates
C.M.
,
Wass
M.N.
and
Sternberg
M.J.E.
(
2015
)
The Phyre2 web portal for protein modeling, prediction and analysis
.
Nat. Protoc.
10
,
845
858
[PubMed]
14.
Geoghegan
K.F.
,
Dixon
H.B.F.
,
Rosner
P.J.
,
Hoth
L.R.
,
Lanzetti
A.J.
,
Borzilleri
K.A.
et al.
(
1999
)
Spontaneous α-N-6-phosphogluconoylation of a “His tag” in Escherichia coli: the cause of extra mass of 258 or 178 Da in fusion proteins
.
Anal. Biochem.
267
,
169
184
[PubMed]
15.
Castellanos-Serra
L.R.
,
Hardy
E.
,
Ubieta
R.
,
Vispo
N.S.
,
Fernandez
C.
,
Besada
V.
et al.
(
1996
)
Expression and folding of an interleukin-2-proinsulin fusion protein and its conversion into insulin by a single step enzymatic removal of the C-peptide and the N-terminal fused sequence
.
FEBS Lett.
378
,
171
176
[PubMed]
16.
Ludeman
J.P.
,
Pike
R.N.
,
Bromfield
K.M.
,
Duggan
P.J.
,
Cianci
J.
,
Le Bonniec
B.
et al.
(
2003
)
Determination of the P′1, P′2 and P′3 subsite-specificity of factor Xa
.
Int. J. Biochem. Cell Biol.
35
,
221
225
[PubMed]
17.
Bianchini
E.P.
,
Louvain
V.B.
,
Marque
P.E.
,
Juliano
M.A.
,
Juliano
L.
and
Le Bonniec
B.F.
(
2002
)
Mapping of the catalytic groove preferences of factor Xa reveals an inadequate selectivity for its macromolecule substrates
.
J. Biol. Chem.
277
,
20527
20534
[PubMed]
18.
Rawlings
N.D.
,
Barrett
A.J.
,
Thomas
P.D.
,
Huang
X.
,
Bateman
A.
and
Finn
R.D.
(
2018
)
The MEROPS database of proteolytic enzymes, their substrates and inhibitors in 2017 and a comparison with peptidases in the PANTHER database
.
Nucleic Acids Res.
46
,
D624
D632
[PubMed]
19.
Peng
S.
,
Chu
Z.
,
Lu
J.
,
Li
D.
,
Wang
Y.
,
Yang
S.
et al.
(
2016
)
Co-expression of chaperones from P. furiosus enhanced the soluble expression of the recombinant hyperthermophilic α-amylase in E. coli
.
Cell Stress Chaperones
21
,
477
484
[PubMed]
20.
Amann
T.
,
Schmieder
V.
,
Faustrup Kildegaard
H.
,
Borth
N.
and
Andersen
M.R.
(
2019
)
Genetic engineering approaches to improve posttranslational modification of biopharmaceuticals in different production platforms
.
Biotechnol. Bioeng.
116
,
2778
2796
[PubMed]
21.
Alibolandi
M.
and
Mirzahoseini
H.
(
2011
)
Chemical assistance in refolding of bacterial inclusion bodies
.
Biochem. Res. Int.
2011
,
631607
[PubMed]
22.
Kaur
J.J.
,
Kumar
A.
and
Kaur
J.J.
(
2018
)
Strategies for optimization of heterologous protein expression in E. coli: Roadblocks and reinforcements
.
Int. J. Biol. Macromol.
106
,
803
822
[PubMed]
23.
Simmons
L.C.
and
Yansura
D.G.
(
1996
)
Translational level is a critical factor for the secretion of heterologous proteins in Escherichia coli
.
Nat. Biotechnol.
14
,
629
634
[PubMed]
24.
Lobstein
J.
,
Emrich
C.A.
,
Jeans
C.
,
Faulkner
M.
,
Riggs
P.
and
Berkmen
M.
(
2012
)
SHuffle, a novel Escherichia coli protein expression strain capable of correctly folding disulfide bonded proteins in its cytoplasm
.
Microb. Cell Fact.
11
,
1
[PubMed]
25.
Bessette
P.H.
,
Åslund
F.
,
Beckwith
J.
and
Georgiou
G.
(
1999
)
Efficient folding of proteins with multiple disulfide bonds in the Escherichia coli cytoplasm
.
Proc. Natl. Acad Sci. U.S.A.
96
,
13703
13708
[PubMed]
26.
Saaranen
M.J.
and
Ruddock
L.W.
(
2019
)
Applications of catalyzed cytoplasmic disulfide bond formation
.
Biochem. Soc. Trans.
47
,
1223
1231
[PubMed]
27.
Matos
C.F.R.O.R.O.
,
Robinson
C.
,
Alanen
H.I.
,
Prus
P.
,
Uchida
Y.
,
Ruddock
L.W.
et al.
(
2014
)
Efficient export of prefolded, disulfide-bonded recombinant proteins to the periplasm by the Tat pathway in Escherichia coli CyDisCo strains
.
Biotechnol. Prog.
30
,
281
290
[PubMed]
28.
Alanen
H.I.
,
Walker
K.L.
,
Lourdes Velez Suberbie
M.
,
Matos
C.F.R.O.
,
Bönisch
S.
,
Freedman
R.B.
et al.
(
2015
)
Efficient export of human growth hormone, interferon α2b and antibody fragments to the periplasm by the Escherichia coli Tat pathway in the absence of prior disulfide bond formation
.
Biochim. Biophys. Acta Mol. Cell Res.
1853
,
756
763
29.
Mueller
P.
,
Gauttam
R.
,
Raab
N.
,
Handrick
R.
,
Wahl
C.
,
Leptihn
S.
et al.
(
2018
)
High level in vivo mucin-type glycosylation in Escherichia coli
.
Microb. Cell Fact.
17
,
168
[PubMed]
30.
Wingfield
P.T.
(
2017
)
N‐terminal methionine processing
.
Curr. Protoc. Protein Sci.
88
,
6.14.1
6.14.3
31.
Liao
Y.-D.
,
Jeng
J.-C.
,
Wang
C.-F.
,
Wang
S.-C.
and
Chang
S.-T.
(
2004
)
Removal of N-terminal methionine from recombinant proteins by engineered E. coli methionine aminopeptidase
.
Protein Sci.
13
,
1802
1810
[PubMed]
32.
Tobias
J.W.
,
Shrader
T.E.
,
Rocap
G.
and
Varshavsky
A.
(
1991
)
The N-end rule in bacteria
.
Science
254
,
1374
1377
33.
Erbse
A.
,
Schmidt
R.
,
Bornemann
T.
,
Schneider-Mergener
J.
,
Mogk
A.
,
Zahn
R.
et al.
(
2006
)
ClpS is an essential component of the N-end rule pathway in Escherichia coli
.
Nature
439
,
753
756
[PubMed]
34.
Schuenemann
V.J.
,
Kralik
S.M.
,
Albrecht
R.
,
Spall
S.K.
,
Truscott
K.N.
,
Dougan
D.A.
et al.
(
2009
)
Structural basis of N-end rule substrate recognition in Escherichia coli by the ClpAP adaptor protein ClpS
.
EMBO Rep.
10
,
508
514
[PubMed]
35.
Celie
P.H.N.
,
Parret
A.H.A.
and
Perrakis
A.
(
2016
)
Recombinant cloning strategies for protein expression
.
Curr. Opin. Struct. Biol.
38
,
145
154
[PubMed]
36.
Ou
B.
,
Garcia
C.
,
Wang
Y.
,
Zhang
W.
and
Zhu
G.
(
2018
)
Techniques for chromosomal integration and expression optimization in Escherichia coli
.
Biotechnol. Bioeng.
115
,
2467
2478
[PubMed]
37.
Stargardt
P.
,
Feuchtenhofer
L.
,
Cserjan-Puschmann
M.
,
Striedner
G.
and
Mairhofer
J.
(
2020
)
Bacteriophage inspired growth-decoupled recombinant protein production in Escherichia coli
.
ACS Synth. Biol.
9
,
1336
1348
[PubMed]
38.
Chen
H.
,
Bjerknes
M.
,
Kumar
R.
and
Jay
E.
(
1994
)
Determination of the optimal aligned spacing between the shine - dalgarno sequence and the translation initiation codon of escherichia coli m RNAs
.
Nucleic Acids Res.
22
,
4953
4957
[PubMed]
39.
Shepard
H.M.
,
Yelverton
E.
and
Goeddel
D.V.
(
1982
)
Increased synthesis in E. coli of fibroblast and leukocyte interferons through alterations in ribosome binding sites
.
DNA
1
,
125
131
[PubMed]
40.
Shine
J.
and
Dalgarno
L.
(
1974
)
The 3′ terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites
.
Proc. Natl. Acad. Sci. U.S.A.
71
,
1342
1346
[PubMed]
41.
Ikemura
T.
(
1981
)
Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes
.
J. Mol. Biol.
146
,
1
21
[PubMed]
42. Boël, G., Letso, R., Neely, H., Price, W.N., Wong, K.H., Su, M. et al. (2016) Codon influence on protein expression in E. coli correlates with mRNA levels. Nature 529, 358–363
43. Fuhrmann, M., Hausherr, A., Ferbitz, L., Schödl, T., Heitzer, M. and Hegemann, P. (2004) Monitoring dynamic expression of nuclear genes in Chlamydomonas reinhardtii by using a synthetic luciferase reporter gene. Plant Mol. Biol. 55, 869–881
44. Kleber-Janke, T. and Becker, W.M. (2000) Use of modified BL21(DE3) Escherichia coli cells for high-level expression of recombinant peanut allergens affected by poor codon usage. Protein Expr. Purif. 19, 419–424
45. Lipinszki, Z., Vernyik, V., Farago, N., Sari, T., Puskas, L.G., Blattner, F.R. et al. (2018) Enhancing the translational capacity of E. coli by resolving the codon bias. ACS Synth. Biol. 7, 2656–2664
46. Novy, R., Drott, D., Yaeger, K. and Mierendorf, R. (2001) Overcoming the codon bias of E. coli for enhanced protein expression. inNovations 12, 1–3
47. Komar, A.A. (2016) The Yin and Yang of codon usage. Hum. Mol. Genet. 25, R77–R85
48. Chemla, Y., Peeri, M., Heltberg, M.L., Eichler, J., Jensen, M.H., Tuller, T. et al. (2020) A possible universal role for mRNA secondary structure in bacterial translation revealed using a synthetic operon. Nat. Commun. 11, 1–11
49. Lenz, G., Doron-Faigenboim, A., Ron, E.Z., Tuller, T. and Gophna, U. (2011) Sequence features of E. coli mRNAs affect their degradation. PLoS ONE 6, e28544
50. Dennis, P.P. and Bremer, H. (2008) Modulation of chemical composition and other parameters of the cell at different exponential growth rates. EcoSal Plus 3
51. Riba, A., Di Nanni, N., Mittal, N., Arhné, E., Schmidt, A. and Zavolan, M. (2019) Protein synthesis rates and ribosome occupancies reveal determinants of translation elongation rates. Proc. Natl. Acad. Sci. U.S.A. 116, 15023–15032
52. Siller, E., DeZwaan, D.C., Anderson, J.F., Freeman, B.C. and Barral, J.M. (2010) Slowing bacterial translation speed enhances eukaryotic protein folding efficiency. J. Mol. Biol. 396, 1310–1318
53. Angov, E., Hillier, C.J., Kincaid, R.L. and Lyon, J.A. (2008) Heterologous protein expression is enhanced by harmonizing the codon usage frequencies of the target gene with those of the expression host. PLoS ONE 3, e2189
54. Zhang, G. and Ignatova, Z. (2011) Folding at the birth of the nascent chain: coordinating translation with co-translational folding. Curr. Opin. Struct. Biol. 21, 25–31
55. Hui, A. and De Boer, H.A. (1987) Specialized ribosome system: preferential translation of a single mRNA species by a subpopulation of mutated ribosomes in Escherichia coli. Proc. Natl. Acad. Sci. U.S.A. 84, 4762–4766
56. Dixon, N., Robinson, C.J., Geerlings, T., Duncan, J.N., Drummond, S.P. and Micklefield, J. (2012) Orthogonal riboswitches for tuneable coexpression in bacteria. Angew. Chem. Int. Ed. 51, 3620–3624
57. Morra, R., Shankar, J., Robinson, C.J., Halliwell, S., Butler, L., Upton, M. et al. (2016) Dual transcriptional-translational cascade permits cellular level tuneable expression control. Nucleic Acids Res. 44, 21
58. Orelle, C., Carlson, E.D., Szal, T., Florin, T., Jewett, M.C. and Mankin, A.S. (2015) Protein synthesis by ribosomes with tethered subunits. Nature 524, 119–124
59. Horga, L.G., Halliwell, S., Castiñeiras, T.S., Wyre, C., Matos, C.F.R.O., Yovcheva, D.S. et al. (2018) Tuning recombinant protein expression to match secretion capacity. Microb. Cell Fact. 17, 199
60. Carlson, E.D., d'Aquino, A.E., Kim, D.S., Fulk, E.M., Hoang, K., Szal, T. et al. (2019) Engineered ribosomes with tethered subunits for expanding biological function. Nat. Commun. 10, 1–13
61. Lukjancenko, O., Wassenaar, T.M. and Ussery, D.W. (2010) Comparison of 61 sequenced Escherichia coli genomes. Microb. Ecol. 60, 708–720
62. Studier, F.W. (2005) Protein production by auto-induction in high density shaking cultures. Protein Expr. Purif. 41, 207–234
63. Ukkonen, K., Neubauer, A., Pereira, V.J. and Vasala, A. (2017) High yield of recombinant protein in shaken E. coli cultures with enzymatic glucose release medium EnPresso B. Methods Mol. Biol. 1586, 127–137
64. Rosano, G.L. and Ceccarelli, E.A. (2014) Recombinant protein expression in Escherichia coli: advances and challenges. Front. Microbiol. 5, 172
65. Tripathi, N.K. (2016) Production and purification of recombinant proteins from Escherichia coli. Chem. Biol. Eng. Rev. 3, 116–133
66. Artimo, P., Jonnalagedda, M., Arnold, K., Baratin, D., Csardi, G., de Castro, E. et al. (2012) ExPASy: SIB bioinformatics resource portal. Nucleic Acids Res. 40, W597–W603
67. Terpe, K. (2006) Overview of bacterial expression systems for heterologous protein production: from molecular and biochemical fundamentals to commercial systems. Appl. Microbiol. Biotechnol. 72, 211–222
68. Marschall, L., Sagmeister, P. and Herwig, C. (2017) Tunable recombinant protein expression in E. coli: promoter systems and genetic constraints. Appl. Microbiol. Biotechnol. 101, 501–512
69. Benson, D.A., Cavanaugh, M., Clark, K., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J. et al. (2013) GenBank. Nucleic Acids Res. 41, D36–D42
70. Almagro Armenteros, J.J., Tsirigos, K.D., Sønderby, C.K., Petersen, T.N., Winther, O., Brunak, S. et al. (2019) SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat. Biotechnol. 37, 420–423
71. Freudl, R. (2018) Signal peptides for recombinant protein secretion in bacterial expression systems. Microb. Cell Fact. 17
72. Tsirigos, K.D., Peters, C., Shu, N., Käll, L. and Elofsson, A. (2015) The TOPCONS web server for consensus prediction of membrane protein topology and signal peptides. Nucleic Acids Res. 43, W401–W407
73. Mazola, Y., Chinea, G. and Musacchio, A. (2011) Integrating bioinformatics tools to handle glycosylation. PLoS Comput. Biol. 7, e1002285
74. Monigatti, F., Gasteiger, E., Bairoch, A. and Jung, E. (2002) The Sulfinator: predicting tyrosine sulfation sites in protein sequences. Bioinformatics 18, 769–770
75. Blom, N., Gammeltoft, S. and Brunak, S. (1999) Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J. Mol. Biol. 294, 1351–1362
76. Gasteiger, E., Gattiker, A., Hoogland, C., Ivanyi, I., Appel, R.D. and Bairoch, A. (2003) ExPASy: the proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res. 31, 3784–3788
77. Bolivar, F., Rodriguez, R.L., Betlach, M.C. and Boyer, H.W. (1977) Construction and characterization of new cloning vehicles. I. Ampicillin-resistant derivatives of the plasmid pMB9. Gene 2, 75–93
78. Vieira, J. and Messing, J. (1982) The pUC plasmids, an M13mp7-derived system for insertion mutagenesis and sequencing with synthetic universal primers. Gene 19, 259–268
79. Yanisch-Perron, C., Vieira, J. and Messing, J. (1985) Improved M13 phage cloning vectors and host strains: nucleotide sequences of the M13mp18 and pUC19 vectors. Gene 33, 103–119
80. Hershfield, V., Boyer, H.W., Yanofsky, C., Lovett, M.A. and Helinski, D.R. (1974) Plasmid ColE1 as a molecular vehicle for cloning and amplification of DNA. Proc. Natl. Acad. Sci. U.S.A. 71, 3455–3459
81. Eun, H.-M. (1996) Marker/reporter enzymes. In Enzymology Primer for Recombinant DNA Technology. Academic Press, San Diego
82. Chang, A.C. and Cohen, S.N. (1978) Construction and characterization of amplifiable multicopy DNA cloning vehicles derived from the P15A cryptic miniplasmid. J. Bacteriol. 134, 1141–1156
83. Shafferman, A. and Helinski, D.R. (1983) Structural properties of the beta origin of replication of plasmid R6K. J. Biol. Chem. 258, 4083–4090
84. Cohen, S.N. and Chang, A.C.Y. (1977) Revised interpretation of the origin of the pSC101 plasmid. J. Bacteriol. 132
85. Hasunuma, K. and Sekiguchi, M. (1977) Replication of plasmid pSC101 in Escherichia coli K12: requirement for dnaA function. Mol. Gen. Genet. 154, 225–230
86. Sutcliffe, J.G. (1978) Nucleotide sequence of the ampicillin resistance gene of Escherichia coli plasmid pBR322. Proc. Natl. Acad. Sci. U.S.A. 75, 3737–3741
87. Schwarz, S., Kehrenberg, C., Doublet, B.B. and Cloeckaert, A. (2004) Molecular basis of bacterial resistance to chloramphenicol and florfenicol. FEMS Microbiol. Rev. 28, 519–542
88. Jelenić, S. (2003) Controversy associated with the common component of most transgenic plants - Kanamycin resistance marker gene. Food Technol. Biotechnol. 41, 183–190
89. Møller, T.S.B., Overgaard, M., Nielsen, S.S., Bortolaia, V., Sommer, M.O.A., Guardabassi, L. et al. (2016) Relation between tetR and tetA expression in tetracycline resistant Escherichia coli. BMC Microbiol. 16, 39
90. Ramirez, M.S. and Tolmasky, M.E. (2010) Aminoglycoside modifying enzymes. Drug Resist. Updat. 13, 151–171
91. Ali, S.A. and Chew, Y.W. (2015) FabV/triclosan is an antibiotic-free and cost-effective selection system for efficient maintenance of high and medium-copy number plasmids in Escherichia coli. PLoS ONE 10, e0129547
92. Fiedler, M. and Skerra, A. (2001) proBA complementation of an auxotrophic E. coli strain improves plasmid stability and expression yield during fermenter production of a recombinant antibody fragment. Gene 274, 111–118
93. Velur Selvamani, R.S., Telaar, M., Friehs, K. and Flaschel, E. (2014) Antibiotic-free segregational plasmid stabilization in Escherichia coli owing to the knockout of triosephosphate isomerase (tpiA). Microb. Cell Fact. 13, 58
94. Vidal, L., Pinsach, J., Striedner, G., Caminal, G. and Ferrer, P. (2008) Development of an antibiotic-free plasmid selection system based on glycine auxotrophy for recombinant protein overproduction in Escherichia coli. J. Biotechnol. 134, 127–136
95. Dong, W.R., Xiang, L.X. and Shao, J.Z. (2010) Novel antibiotic-free plasmid selection system based on complementation of host auxotrophy in the NAD de novo synthesis pathway. Appl. Environ. Microbiol. 76, 2295–2303
96. Cranenburgh, R.M., Lewis, K.S. and Hanak, J.A.J. (2004) Effect of plasmid copy number and lac operator sequence on antibiotic-free plasmid selection by operator-repressor titration in Escherichia coli. J. Mol. Microbiol. Biotechnol. 7, 197–203
97. Ohashi-Kunihiro, S., Hagiwara, H., Yohda, M., Masaki, H. and Machida, M. (2006) Construction of a positive selection marker by a lethal gene with the amber stop codon(s) regulator. Biosci. Biotechnol. Biochem. 70, 119–125
98. Rosano, G.L., Morales, E.S. and Ceccarelli, E.A. (2019) New tools for recombinant protein production in Escherichia coli: a 5-year update. Protein Sci. 28, 1412–1422
99. Dumon-Seignovert, L., Cariot, G. and Vuillard, L. (2004) The toxicity of recombinant proteins in Escherichia coli: a comparison of overexpression in BL21(DE3), C41(DE3), and C43(DE3). Protein Expr. Purif. 37, 203–206
100. Vijayendran, C., Polen, T., Wendisch, V.F., Friehs, K., Niehaus, K. and Flaschel, E. (2007) The plasticity of global proteome and genome expression analyzed in closely related W3110 and MG1655 strains of a well-studied model organism, Escherichia coli-K12. J. Biotechnol. 128, 747–761
101. Marisch, K., Bayer, K., Scharl, T., Mairhofer, J., Krempl, P.M., Hummel, K. et al. (2013) A comparative analysis of industrial Escherichia coli K-12 and B strains in high-glucose batch cultivations on process-, transcriptome- and proteome level. PLoS ONE 8, e70516

Author notes

* These authors contributed equally to this work.

This is an open access article published by Portland Press Limited on behalf of the Biochemical Society and distributed under the Creative Commons Attribution License 4.0 (CC BY).