The inherent hydrophobicity of membrane proteins is a major barrier to membrane protein research and understanding. Their low stability and solubility in aqueous environments coupled with poor expression levels make them a challenging area of research. For many years, the only way of working with membrane proteins was to optimise the environment to suit the protein, through the use of different detergents, solubilising additives, and other adaptations. However, with innovative protein engineering methodologies, the membrane proteins themselves are now being adapted to suit the environment. This mini-review looks at the types of adaptations which are applied to membrane proteins from a variety of different fields, including water solubilising fusion tags, thermostabilising mutation screening, scaffold proteins, stabilising protein chimeras, and isolating water-soluble domains.
Proteins tend to be classified into one of two broad categories: globular cytoplasmic proteins and integral membrane proteins (IMPs). This simple classification is defined by a protein's intrinsic ability to produce a stable, functional structure in either a hydrophilic or hydrophobic environment. Analysis of published genomes suggests that between 15% and 30% of the identified open reading frames encode proteins which would fit into the latter, membrane-embedded, category [2,3]. Yet, despite there seemingly being so many membrane proteins available, these hydrophobic proteins lag significantly behind their water-soluble counterparts in terms of characterisation. For instance, less than 1% of the resolved protein structures deposited in the Protein Data Bank (PDB) are membrane proteins. This statistic belies the significant effort and strides made in membrane protein research over the past decades.
Understanding a protein in terms of function usually requires an understanding of its three-dimensional structure, and the current paucity of membrane protein structures produces a knowledge gap which has an impact on a variety of research areas in which membrane proteins play a key role. For instance, membrane proteins are prime targets for therapeutics due to their important roles as key mediators of molecular transport into and out of the cell, as well as in other crucial processes such as cell signalling. The knowledge gap in membrane protein research is a major bottleneck in the development of future biomedical and biotechnological advances.
The main barrier to accelerated membrane protein research is their intrinsic hydrophobicity. This single factor produces difficulties at every stage of the protein production pipeline. Membrane proteins have, by their very nature, a predominance of alkyl and aryl residues on their surface which make hydrophobic interactions with the surrounding lipid. This hydrophobic interaction is the main driver of their folding and stability in the lipid environment. Most analytical, characterisation, and structural techniques, such as X-ray crystallography, NMR, and to an increasing extent electron microscopy, rely on relatively large quantities of protein which are soluble in aqueous conditions. As most membrane proteins are produced at very low levels in their natural host, they are difficult to isolate from native membranes. High levels of expressed protein in easily amplified cells such as Escherichia coli, which can then be adequately solubilised, are a necessity.
Expressed membrane proteins must be incorporated into the cell membrane so that they assume the correct fold and retain their functional state. The small amount of available membrane relative to the total volume of the cell places a limitation on the amount of viable membrane protein that any cell can produce. In addition, many membrane proteins produce large insoluble aggregates, inclusion bodies, within the cytoplasm. To maximise the amount of the target protein in the membrane, the expression conditions (typically host strain, growth medium, induction temperature and duration) will require extensive, time-consuming optimisation to achieve sufficient levels of protein for downstream processing, and in many cases, this may never be realised. This is especially true for membrane transporters. Overexpression of transport proteins can produce an unwanted increase in the movement of molecules and ions across the cell membrane, causing significant toxicity and a reduction in culture growth and protein yield.
Membranes harbouring the target protein must be harvested from a cell lysate and solubilised so that the protein can be purified under aqueous conditions. Conventional solubilisation strategies focus on the addition of detergents [8–10]. These amphiphilic molecules self-assemble in solution, forming micelle structures which can encompass the membrane proteins, burying the hydrophobic surface residues in the core of the micelle, and maintaining the protein's structure close to that present in the natural lipid environment. Finding a detergent which offers good levels of solubilisation while retaining the structure of the protein requires painstaking optimisation and trial-and-error testing of a range of detergents with different chain lengths and head groups [9,12,13]. In addition, the pH of the buffer, salinity levels, and the presence or absence of various other additives will all play a role in determining the final stability of the membrane protein [13,14]. A protein which is only stable and functional in a narrow window of environmental conditions is not only difficult to handle and characterise in the laboratory, but also has limited potential for use in biomedical or biotechnological applications. New additives to replace the use of detergents, such as nanodiscs [15,16], SMALPs, bicelles, and lipid-like peptides, are now emerging.
With the advent of modern genetic engineering techniques and the ability to alter genetic sequences easily and cheaply, there are ways to overcome many of these challenges for membrane protein production by manipulating the protein itself. This mini-review looks at some of the options available and provides examples of where changes to the membrane protein itself have been used with excellent results.
Isolating water-soluble domains from a membrane protein
One approach to the investigation of membrane proteins is to decide if the membrane-spanning region of the protein is actually required at all. Many membrane proteins possess water-soluble domains which may be exposed to the cytoplasm, periplasm, or even extracellularly. These domains are often sites of catalytic activity. The activity of such domains can provide an easily assayable function for structure–function investigations. Working on a water-soluble domain overcomes many of the inherent challenges of membrane proteins, with improved expression levels, greater stability, and ease of characterisation. A recent example of a truncated membrane protein investigated in this way is the phosphoethanolamine transferase MCR-1 from Enterobacteriaceae. The crystal structure of this protein is helping to elucidate the mechanisms behind certain types of antibiotic resistance.
From a protein design perspective, the biggest obstacle is correctly determining the domain boundaries. This is a straightforward procedure where sequence alignments to well-characterised protein families exist. However, for some protein groups, this can be anything but a trivial exercise. Truncating the protein too early may remove stabilising residues, while truncating too late may mean that additional, potentially destabilising residues are fused to the domain of interest. High-throughput solubility screening strategies can help. The expression of soluble proteins by random incremental truncation (ESPRIT) is one such method. This library-based approach, requiring very little background knowledge of the protein of interest, uses robotic screening to analyse potentially thousands of clones.
In essence, the gene encoding the protein of interest is enzymatically degraded, a nucleotide at a time, to produce a spectrum of sequences which become incrementally smaller. Each library member is fused to sequences encoding a C-terminal biotin acceptor peptide and an N-terminal poly-histidine tag. The biotinylation peptide acts as a solubility reporter, as soluble proteins will be efficiently biotinylated in vivo. Only proteins displaying a positive signal for both the N- and C-terminal tags will be (a) soluble, (b) in the desired reading frame, and (c) free from proteolytic degradation (Figure 1A). The ESPRIT methodology was used to investigate the phosphatase domain of the sporulation protein SpoIIE from Bacillus subtilis. SpoIIE has an N-terminal membrane-spanning domain with 10 transmembrane helices (Figure 1B). The remainder of the protein is displayed to the inside of the spore compartment during stage III of the sporulation pathway, where it forms part of a sigma factor cascade, enabling the spore and mother cell compartments to establish different fates [24,25]. By truncating the spoIIE gene incrementally, the domain boundaries were isolated and soluble protein purified, something that was difficult to achieve from analysis of the sequence alone (Figure 1B). Based on this technique, the crystal structure of a domain-swapped dimer of the phosphatase was determined (Figure 1C), confirming that it belonged to the PP2C family of protein phosphatases and allowing an intramolecular signalling surface to be identified.
Figure 1. Obtaining water-soluble fragments of SpoIIE using the ESPRIT methodology.
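The reading-frame requirement of an incremental truncation library can be illustrated with a minimal Python sketch. It assumes a simplified model in which nucleotides are removed one at a time from the 5' end of the coding sequence, so that only truncations removing a multiple of three nucleotides stay in frame with the flanking tags. The toy sequence and the function names are hypothetical and are not taken from the ESPRIT protocol itself.

```python
def incremental_truncations(gene: str) -> list[tuple[int, str]]:
    """Return (nucleotides_removed, remaining_sequence) for every 5' truncation."""
    return [(n, gene[n:]) for n in range(len(gene))]


def in_frame(nucleotides_removed: int) -> bool:
    """A truncation keeps the original reading frame only if whole codons are removed."""
    return nucleotides_removed % 3 == 0


# Hypothetical 18-nucleotide toy sequence (six codons), for illustration only.
toy_gene = "ATGGCTAAAGGTCTGGAA"

library = incremental_truncations(toy_gene)
in_frame_members = [(n, seq) for n, seq in library if in_frame(n)]

print(f"{len(in_frame_members)} of {len(library)} truncations remain in frame")
```

In this simplified model, roughly one in three library members can be in frame, which is one reason a readout reporting on both termini is useful for discarding unproductive clones before solubility is even considered.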
Producing water-soluble IMPs with solubilising fusion tags
A recent development in membrane protein engineering is to retain the native membrane protein sequence, but introduce a fusion tag which overcomes the insolubility of the membrane protein in aqueous solution. Mizrachi et al. have pioneered such an approach [28,29], termed SIMPLEx (solubilisation of IMPs with high levels of expression). They employ an amphipathic protein fusion partner, added to the membrane protein termini. In this case, the C-terminal lipid-binding domain of apolipoprotein was selected due to its intrinsic amphipathic nature and its inherent structural flexibility. To prevent undesired incorporation of the membrane protein in the inner cell membrane of E. coli, a decoy protein, such as maltose-binding protein (MBP), was added to the N-terminus of the IMP. Having such a large soluble protein at the N-terminus blocks transfer of the IMP to the membrane, ensuring efficient solubilisation via the apolipoprotein fusion tag. Using this methodology, a range of different IMPs have been produced without the need for additional solubilising additives (detergents, for example). The list of IMPs currently explored with this technique extends to cytochrome B5, the ethidium multidrug resistance protein E (EmrE), Claudin-1 and Claudin-3, DsbB, and OmpX. Significantly, IMPs from both bacterial and mammalian systems have been tested with equal success. However, careful characterisation is required to ensure that the membrane protein retains the correct structure and function, and that it has not been unintentionally compromised by the addition of the solubilising tag. The ability to produce human IMPs in this way opens up the approach for drug discovery applications, where the challenge of producing adequate quantities of purified membrane protein is a major hurdle.
This methodology also has wider applications. For instance, the SIMPLEx approach has been used to produce a water-soluble form of the bacterial protein DsbB [29,32]. This enzyme, in the presence of ubiquinone, is capable of catalysing the oxidation of the thiol-disulfide oxidoreductase partner enzyme DsbA within the bacterial periplasm [33,34] (Figure 2A). DsbA is exported to the periplasmic space via a defined N-terminal export sequence. DsbB is a membrane-bound enzyme which inserts into the inner membrane and thereby limits the oxidising activity of the protein to the periplasm. By producing a water-soluble form of DsbB via the addition of the apolipoprotein and decoy fusion tags, this protein can be overexpressed to high levels within the cytoplasm. In that study, the partner protein, DsbA, had its periplasmic export sequence removed, meaning it too was retained within the cytoplasm. This allowed the DsbB–DsbA disulfide bond catalysis pathway, which was previously limited to the periplasm, to be fully reconstituted within the cytoplasm. The modified form of DsbB was still able to interact with DsbA, as shown by cross-linking and small-angle X-ray scattering (SAXS) studies (Figure 2B). The recompartmentalised pathway was also enzymatically active in the cytoplasm, displaying disulfide bond forming capability against several different substrate proteins. Excitingly, this result demonstrates that by engineering IMPs in this way, enzymatic activities previously only associated with lipid membranes can be transplanted into different cellular compartments and non-native environments without the need for additional solubilising additives.
Figure 2. Using the SIMPLEx methodology to develop a cytoplasmic DsbA/DsbB system.
A slightly different approach to generating water-soluble membrane proteins is the introduction of multiple residue substitutions, where typically hydrophobic surface residues are switched for hydrophilic ones. This approach is often guided by computational modelling to pick the most appropriate substitutions and has been successfully used to generate various water-soluble forms of membrane proteins including KcsA and the Ca2+ pump regulating protein phospholamban [36,37].
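The effect of swapping hydrophobic surface residues for hydrophilic ones can be illustrated with a short Python sketch that compares the mean Kyte-Doolittle hydropathy of a sequence before and after substitution. This is only a crude proxy: the published designs rely on structure-based modelling, which is not reproduced here. The toy helix sequence, the chosen positions, and the function names are all hypothetical.

```python
# Kyte-Doolittle hydropathy scale (positive values are more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence: str) -> float:
    """Average Kyte-Doolittle hydropathy over all residues of the sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def substitute(sequence: str, changes: dict[int, str]) -> str:
    """Apply substitutions given as {zero_based_position: new_residue}."""
    residues = list(sequence)
    for position, new_residue in changes.items():
        residues[position] = new_residue
    return "".join(residues)


# Hypothetical transmembrane-like helix and hypothetical lipid-facing positions.
toy_helix = "LIVALLFAVLGLLIA"
surface_swaps = {1: "K", 5: "E", 8: "Q", 12: "K"}

soluble_variant = substitute(toy_helix, surface_swaps)
print(f"wild type: {mean_hydropathy(toy_helix):.2f}")
print(f"variant:   {mean_hydropathy(soluble_variant):.2f}")
```

A lower mean hydropathy does not by itself show that a variant is soluble or correctly folded, so a score of this kind would at most help to shortlist candidates for experimental testing.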
Conformational thermostabilisation of G protein-coupled receptors
One of the highest priority IMP classes for structural and biophysical characterisation is the G protein-coupled receptors (GPCRs). These proteins are found within the membranes of eukaryotic cells. GPCRs are involved in a huge array of different cellular processes. These IMPs feature seven transmembrane spanning helices (TMS) which enable signal transduction across a cell membrane. These proteins feature specific binding sites for ligands, which stimulate the GPCR. The interacting ligands which activate the receptors can be the natural ligand or a synthetic one. This ability to elicit a GPCR response with synthetic molecules makes these proteins prime targets for pharmaceuticals, and they represent the site of action for ∼35% of existing drugs.
One of the main difficulties in realising more GPCR crystal structures is the inherent structural flexibility in these proteins. They display a highly dynamic structure, with a large degree of conformational flexibility. For a protein to crystallise, it needs to assume a regular, ordered assembly of repeating protein units. Flexibility within the protein structure means that each protein has the potential to assume one of a large number of possible conformations; this makes the crystals unlikely to form or to have the long-range order required for X-ray diffraction studies. An additional problem to overcome is that GPCRs often have quite short loops connecting the different TMS. The lack of significant hydrophilic regions means a shortage of protein–protein contacts in the crystal lattice, leading to instability of the protein crystal and poor diffraction quality. However, there are protein engineering methodologies which can help in both of these regards.
An innovative approach is termed conformational thermostabilisation. This methodology has the ability to produce a stabilised GPCR with a favoured conformation, making it ideal for crystallisation and structural characterisation [43,45]. The technique relies on a programme of single residue substitutions, typically switched to alanine, systematically introduced at each position along the protein sequence [46,47]. Each variant is then expressed in cells, detergent solubilised, and heated to just above the melting temperature (Tm) of the native protein. Using a radiolabelled version of a ligand which binds to the GPCR allows researchers to easily and rapidly determine if the radioligand remains bound to the detergent-solubilised protein after heating. This works because the ligand-binding site of GPCRs tends to be formed from residues displayed on various parts of the protein. If the protein is heated and subsequently denatures, the specific architecture of the protein responsible for producing the binding site would be lost. Therefore, the binding/non-binding of the specific radioligand gives a very convenient indicator of thermostability.
Preparing, expressing, and screening these mutations (potentially hundreds of constructs for a single GPCR) is a large undertaking, but using high-throughput techniques and automation where necessary can reduce some of the experimental burden. An alternative strategy is to generate random mutations by processes such as error-prone PCR and screen for binding on the pool of variants using fluorescent ligands and fluorescence-activated cell sorting equipment [49,50]. In both cases, by determining the individual substitutions which give rise to the largest increase in Tm, the most stabilising mutations can be collected within a single construct. However, the most elegant addition to this procedure is the use of a specific agonist or antagonist during the thermal screening steps in order to find mutations which stabilise the protein in an active or inactive state. There are challenges to this technique, namely that it requires functional, well-expressing membrane proteins, as well as ready access to a specific ligand, and the infrastructure to produce and screen a large number of single residue substitutions. However, the rewards appear to make up for these difficulties, with numerous examples of structures obtained through this approach [51–55].
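The bookkeeping behind such a screen can be sketched in Python as follows: every non-alanine position is substituted with alanine, and the variants that retain more ligand binding after heating than the wild type are shortlisted for combination into a single construct. The sequence, the retained-binding fractions, and the function names are invented for illustration, and native alanine positions are simply skipped here as a simplification.

```python
def alanine_scan(sequence: str) -> list[tuple[str, str]]:
    """Return (label, variant_sequence) for each non-alanine position, e.g. ('L2A', ...)."""
    variants = []
    for index, residue in enumerate(sequence):
        if residue == "A":
            continue  # simplification: native alanines are left untouched
        label = f"{residue}{index + 1}A"
        variants.append((label, sequence[:index] + "A" + sequence[index + 1:]))
    return variants


def shortlist(retained: dict[str, float], wild_type: float, top_n: int) -> list[str]:
    """Rank variants that retain more ligand binding after heating than the wild type."""
    better = [(fraction, label) for label, fraction in retained.items() if fraction > wild_type]
    better.sort(reverse=True)
    return [label for _, label in better[:top_n]]


toy_sequence = "MLAGV"  # hypothetical five-residue stretch
variants = alanine_scan(toy_sequence)

# Invented fractions of radioligand binding remaining after the heating step.
retained_binding = {"M1A": 0.45, "L2A": 0.72, "G4A": 0.30, "V5A": 0.61}
candidates = shortlist(retained_binding, wild_type=0.50, top_n=2)
print(candidates)
```

Individually stabilising substitutions are not guaranteed to be additive, so a combined construct assembled from such a shortlist would still need to be tested experimentally.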
Computational advances are also helping to ease some of these limitations. Using different strategies ranging from free energy calculations to machine learning allows a suite of candidate mutations to be rapidly screened for stability in silico [56,57]. To reduce the range of potential mutants, most systematic mutagenesis strategies only substitute each residue for alanine, but it has been estimated that, in ∼30% of cases, alanine may not produce the optimum stabilisation. A computational approach allows each residue to be swapped for any desired amino acid, thereby allowing a much greater mutational scope to be explored. Computational methodologies appear extremely promising, particularly with the increase in GPCR structures from which to build realistic homology models. Owing to the reductions in both cost and time that such methods can bring, many laboratories not equipped for high-throughput screening would have the potential to benefit from a thermostabilising methodology.
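The difference in mutational scope can be made concrete with a small Python sketch that enumerates single substitutions for a toy sequence. It only counts candidates; no stability prediction is attempted, and the sequence and function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def alanine_only(sequence: str) -> list[str]:
    """Single substitutions to alanine at every non-alanine position."""
    return [f"{aa}{i + 1}A" for i, aa in enumerate(sequence) if aa != "A"]


def all_single_substitutions(sequence: str) -> list[str]:
    """Every position swapped for each of the 19 other standard amino acids."""
    return [
        f"{aa}{i + 1}{new}"
        for i, aa in enumerate(sequence)
        for new in AMINO_ACIDS
        if new != aa
    ]


toy_sequence = "MLAGV"  # hypothetical five-residue stretch
print(len(alanine_only(toy_sequence)), "alanine-only variants")
print(len(all_single_substitutions(toy_sequence)), "variants with any amino acid")
```

For a sequence of N residues, the full single-substitution space contains 19N variants compared with at most N for an alanine scan, which is why exploring it experimentally is far more demanding than ranking candidates in silico.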
Stabilisation through protein chimeras
GPCRs, due to their prevalence as therapeutic drug targets, are at the forefront of membrane protein innovations. Over the past decade, various teams of researchers have made chimeric proteins, whereby a loop connecting two TMS is exchanged for a stable, soluble protein [58–60]. By incorporating the new protein into the IMP, a significant increase in hydrophilicity is achieved. If chosen carefully, the new fusion partner can promote the formation of protein crystals. Key requirements for judicious selection of fusion partners are that the protein is extremely stable, is already structurally characterised, and that the N- and C-termini are located closely together to allow them to be inserted into a loop of an IMP without distorting the arrangement of the transmembrane helices. A recent example of this is the Orexin receptor, OX2R. This GPCR is a member of the rhodopsin family and forms part of the sleep-wake cycle in mammals, where it binds to neuropeptides of the central nervous system. Yin et al. prepared a chimera of OX2R by substituting the interconnecting loop 3 (IL3) with PGS, a glycogen synthase from Pyrococcus abyssi. This 196 residue protein is extremely stable, as it is derived from a thermophilic organism, has a high-resolution crystal structure, and features termini less than 10 Å apart. The chimeric protein was crystallised and yielded a crystal structure of OX2R at 2.5 Å resolution (Figure 3). The binding interaction with the anti-insomnia drug Suvorexant was revealed, and novel potential binding sites for future agonists were identified via computational docking to the OX2R model. A limitation of this approach is that the protein is missing one loop, which may have unintended consequences for protein conformation or ligand interaction. This is often minimised by selecting loops which are less likely to form part of the ligand-binding site.
Figure 3. Using a stabilising protein chimera to obtain the crystal structure of OX2R.
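The requirement that a fusion partner's termini lie close together can be checked from a known structure, as in the minimal Python sketch below. It assumes that the coordinates of the N- and C-terminal residues (for example, their Cα atoms, in Å) have already been extracted from a structure file. The coordinates, the function names, and the use of 10 Å as a default cut-off are illustrative assumptions; the text above reports only that the PGS termini are less than 10 Å apart and does not state a general threshold.

```python
import math

Coordinate = tuple[float, float, float]


def termini_distance(n_term: Coordinate, c_term: Coordinate) -> float:
    """Euclidean distance (in Å) between the two terminal positions."""
    return math.dist(n_term, c_term)


def suits_loop_insertion(n_term: Coordinate, c_term: Coordinate,
                         max_distance: float = 10.0) -> bool:
    """True if the termini are closer together than the chosen cut-off (assumed value)."""
    return termini_distance(n_term, c_term) < max_distance


# Invented coordinates for two hypothetical fusion partner candidates.
compact_partner = {"n_term": (12.0, 4.0, 7.0), "c_term": (15.0, 8.0, 7.0)}
extended_partner = {"n_term": (0.0, 0.0, 0.0), "c_term": (30.0, 0.0, 0.0)}

for name, partner in (("compact", compact_partner), ("extended", extended_partner)):
    distance = termini_distance(partner["n_term"], partner["c_term"])
    verdict = suits_loop_insertion(partner["n_term"], partner["c_term"])
    print(f"{name}: {distance:.1f} Å, suitable = {verdict}")
```

Terminal proximity is only one of the criteria listed above; stability and the availability of a prior structure would need to be assessed separately.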
Membrane protein chimeras with green fluorescent protein (GFP) have been used to find membrane protein constructs with improved stability, expression, or membrane insertion [63–67]. The intrinsic fluorescence of GFP makes it a natural choice of a reporter protein, which can be easily assayed using fluorescence detection. In these cases, GFP (or a form of split GFP to minimise any stabilising effect from the GFP itself) is typically fused to one of the termini of the membrane protein, allowing the protein to be tracked during various stages of its production and purification. This approach can, for instance, be coupled to targeted or random mutagenesis of the protein to find variants with improved expression levels and detergent stability [63,67].
If the part of the membrane protein which is of interest is one of the interconnecting loop regions, and the transmembrane spanning portion is simply an anchor, then there are opportunities to transplant the desired loop and graft it onto a scaffold protein. This approach has been used to investigate the loop regions of membrane proteins from magnetotactic bacteria. These bacteria are able to produce crystals of magnetic iron minerals (magnetite or greigite) within specialised membrane-bound organelles in their cells termed magnetosomes [68–70]. The magnetosomes harbour a range of different proteins, embedded within the membrane, which enable the production of the crystals [71,72]. Some of these proteins are found tightly bound to the crystal, interacting via the loops between TMS. In one example, the loop region from MamC was transplanted onto MBP. This allowed the crystal structure of the loop to be obtained, which revealed a helical conformation, from which a proposed magnetite interaction site was derived. Protein produced using this approach was tested in in vitro magnetite formation reactions to ascertain whether it retained native functionality. Inspection of the synthesised magnetite nanocrystals showed that the protein was able to influence the properties of synthetic magnetite nanoparticles.
Membrane proteins are some of the most challenging biomolecules to produce, isolate, and characterise, and yet they often perform some of the most interesting and useful roles within the protein function repertoire. This makes them a rich source of reagents for use in fields ranging from pharmaceuticals to nanoparticle formation. Their inherent hydrophobicity and their dependence on lipid environments for stability create obstacles to their wider use and exploitation. However, advances within the field of protein engineering are providing new tools to overcome these difficulties. Proteins can be edited to yield water-soluble domains, fused to solubilising protein tags to produce water-soluble variants, or have their stability enhanced to aid in crystallisation or detergent compatibility. The ingenuity of protein engineers has led to many innovations, and it is truly an exciting time to be working in the area of membrane protein research.
The Author declares that there are no competing interests associated with this manuscript.