ATP-binding cassette (ABC) transporters, although ubiquitous in biology, often feature a subunit that is found primarily in bacteria and archaea. This subunit, the substrate-binding protein (SBP), is a key determinant of the substrate specificity and high affinity of ABC uptake systems in these organisms. Most prokaryotes have many SBP-dependent ABC transporters that recognize a broad range of ligands, from metal ions to amino acids, sugars and peptides. Herein, we review the structure and function of a number of more unusual SBPs, including the SBP of an ABC transporter involved in the transport of rare furanose forms of sugars and an SBP that has evolved to specifically recognize the bacterial cell wall-derived murein tripeptide (Mtp). Both examples illustrate that subtle changes in binding-site architecture, including changes in side chains not directly involved in ligand co-ordination, can significantly alter substrate range in novel and unpredictable ways.
ATP-binding cassette (ABC) transporters are ubiquitous in bacteria and function predominantly in substrate uptake rather than efflux. A common feature of these uptake systems is an additional subunit, the substrate-binding protein (SBP), which is involved in initial recognition of the substrate and delivers it to the membrane-bound subunits; these use the binding and hydrolysis of ATP to catalyse concentrative uptake into cells.
The SBPs, or periplasmic binding proteins (PBPs) as they were originally known, were the first components of ABC uptake systems to be characterized in prokaryotes, because they were easily released in periplasmic extracts from Gram-negative bacteria. What we now know as ABC transporters were initially called ‘osmotic shock-sensitive’ transporters, because release of the PBP abolished transport activity, demonstrating its essential role in transporter function. Early biochemical studies of these proteins date back to the 1970s, for example on the galactose-binding protein, the ribose-binding protein and the oligopeptide-binding protein. After the discovery of similar proteins in Gram-positive bacteria and archaea, which lack a periplasm and in which the SBPs are tethered to the cell membrane, the term PBP fell out of favour and has progressively been replaced by SBP.
Structures of SBPs were solved relatively soon after their discovery, from the 1970s onwards, with pioneering work from the groups of Quiocho [5,6] and Mowbray [7,8]. These early structures revealed that SBPs contain two domains linked by a hinge and that the ligand is captured between these domains through a ‘Venus flytrap’ binding mechanism.
In 1985, Chris Higgins et al., by sequencing the genes encoding SBPs and their flanking genes, discovered a linked gene encoding an ATPase subunit. ABC transporter operons usually contain between three and five genes, encoding the SBP, the membrane domain(s) and the ATPase domain(s). Both the membrane and the ATPase domains can be either homodimers or heterodimers.
Extensive work on the structure and function of numerous bacterial ABC transporters has revealed many variations on the basic function of the SBP. Some SBPs also function as part of the bacterial chemotaxis system, transducing signals from ligand binding to the chemosensory apparatus. The same SBP fold is also used inside the cell, forming the ligand-binding domain of many transcription factors, including the classical Lac repressor. Although a single SBP usually works with a single ABC transporter, there are now many examples of multiple different SBPs working with the same membrane and ATPase subunits; these SBPs are often encoded at genetic loci separate from the rest of the transporter and are called ‘orphan’ SBPs.
Because SBPs are generally soluble and stable proteins, there are now biochemical and structural data for hundreds of different SBPs from a diverse range of bacteria. Tam and Saier published the first extensive review of these proteins, revealing clearly distinct subgroups of SBPs that did not share obvious sequence homology; however, all of them appeared to conform to Quiocho's Venus flytrap analogy, as they retain a common mixed α-helix/β-sheet structure with a hinge of some form. Berntsson et al. introduced a simple structural classification of SBPs into six large clusters, which serves as a convenient and useful way to group bacterial SBP-dependent transporters. Of note, one of these clusters, cluster E, contains SBPs that function not with ABC transporters but with secondary transporters of the tripartite ATP-independent periplasmic (TRAP) and tripartite tricarboxylate transporter (TTT) families [14,15].
In this review, we examine two of the clusters from the Berntsson classification, clusters B and C, looking at selected recent structures from within these groups, and assess our ability to predict transporter specificity accurately from the primary amino acid sequence of an SBP.
A furanose sugar-specific ABC transporter
Some of the first SBPs identified were from Escherichia coli and were specific for monosaccharides, including the ribose-binding protein (RbsB), the galactose/glucose-binding protein (MglB) and the arabinose-binding protein (AraF). These are known to function as part of regular ABC transporters in this organism and are defined as cluster B SBPs.
In an attempt to characterize some of the remaining unstudied ABC uptake systems in E. coli K-12, we cloned and expressed the product of the ytfQ gene, a predicted cluster B SBP encoded within the ytfQRTyjfF operon. Using native ES–MS, the protein was shown to purify with a pre-bound ligand of 180 mass units, consistent with a hexose sugar, a typical substrate for a cluster B SBP. Binding experiments with a range of monosaccharides, assessed by measuring changes in the intrinsic tryptophan fluorescence of the protein, revealed that two sugars, D-galactose and L-arabinose, bound with affinities (Kd) of 1.7 and 1.3 μM, respectively. Because E. coli already has well-studied ABC transporters for both of these sugars, the Mgl and Ara systems, respectively, the discovery of this additional system was rather perplexing. The structure of YtfQ was then solved and was found to contain an endogenous ligand carried over from the E. coli expression host. This was the hexose sugar D-galactose but, remarkably, the protein was bound to the rare five-membered furanose ring form of the sugar, D-galactofuranose. In solution, D-galactose is 92% in the pyranose form and only 8% in the furanose form; this small fraction nonetheless appears to be significant enough for bacteria that are scavenging carbon, and competing with many other microorganisms, to have evolved a dedicated second system. This second system is expressed under similar conditions to the Mgl system, whose SBP, MglB, binds D-galactopyranose, so that the bacterium can use all of the D-galactose present in solution.
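The mass argument above is simple arithmetic, and a short calculation makes it explicit. The sketch below is illustrative only: it assumes standard average atomic masses (C 12.011, H 1.008, O 15.999 Da), compares a hexose (C6H12O6, e.g. D-galactose in either ring form) with a pentose (C5H10O5, e.g. L-arabinose), and checks that the pyranose (92%, MglB) and furanose (8%, YtfQ) fractions quoted above together cover the whole D-galactose pool. The helper names are hypothetical and not part of any published analysis.

```python
# Illustrative sketch: why a ~180 Da bound ligand points to a hexose, not a pentose.
# Assumption: standard average atomic masses in Da (not values taken from the study).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def average_mass(composition: dict[str, int]) -> float:
    """Return the average molecular mass (Da) for an elemental composition."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


HEXOSE = {"C": 6, "H": 12, "O": 6}   # e.g. D-galactose (pyranose or furanose ring)
PENTOSE = {"C": 5, "H": 10, "O": 5}  # e.g. L-arabinose

hexose_mass = average_mass(HEXOSE)    # about 180.2 Da
pentose_mass = average_mass(PENTOSE)  # about 150.1 Da

# Ring-form distribution of D-galactose in solution, as quoted in the text.
PYRANOSE_FRACTION = 0.92  # recognized by MglB (Mgl system)
FURANOSE_FRACTION = 0.08  # recognized by YtfQ
covered_fraction = PYRANOSE_FRACTION + FURANOSE_FRACTION

if __name__ == "__main__":
    print(f"hexose:  {hexose_mass:.1f} Da")
    print(f"pentose: {pentose_mass:.1f} Da")
    print(f"D-galactose covered by both systems: {covered_fraction:.0%}")
```

A pentose such as L-arabinose would give a ligand roughly 30 Da lighter, so the observed 180 mass units fits a hexose; note that mass alone cannot distinguish the pyranose from the furanose ring form, which is why the crystal structure was needed.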
Might we have predicted this binding specificity from the sequence of YtfQ? Examining the overall structures of YtfQ and five other related cluster B monosaccharide-binding proteins reveals that they share a very similar overall fold (Figure 1A). Closer inspection of the ligand-binding site reveals a number of amino acids that are well conserved across this family and are indicative of monosaccharide binding. These include the conserved asparagine, aspartic acid and arginine residues at the bottom of the binding pocket (Figure 1B). A conserved aromatic residue sits behind these three amino acids and forms a hydrophobic surface that packs against the non-polar portions of the sugar. Despite this ‘fingerprint’ of a monosaccharide-binding pocket, the rest of the binding sites are much less conserved and contain specific adaptations for the different sugar structures. Comparing YtfQ with its closest sequence and structural homologue, the D-ribopyranose-binding protein RbsB, reveals a specific adaptation in YtfQ: a phenylalanine in RbsB is replaced by an arginine (Arg17) in YtfQ (Figure 1B). The introduced positive charge forms a salt bridge with Asp90, which alters the conformation of Asp90 so that it now points away from the binding pocket. The space created is where the more extended furanose form of the sugar sits. However, one would have great difficulty predicting this change from sequence analysis alone. Overall, although almost half of the binding pocket is conserved, the sugars all sit in different conformations within their respective binding pockets, making it very difficult to predict ab initio what a particular binding pocket will recognize beyond its being a monosaccharide.
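The limitation described above can be made concrete with a toy residue-fingerprint check. In this sketch, the position labels and residue assignments are hypothetical placeholders, not real RbsB or YtfQ numbering; the only features borrowed from the text are a conserved Asn/Asp/Arg triad backed by an aromatic residue, and a second-shell position that differs (Phe versus Arg), loosely mirroring the RbsB/YtfQ difference at Arg17. Both hypothetical sites pass the fingerprint, so the check can say "monosaccharide" but cannot tell a pyranose-binding pocket from a furanose-binding one.

```python
# Toy fingerprint check for a cluster B monosaccharide-binding pocket.
# All position labels and residues below are hypothetical, for illustration only.
CONSERVED_TRIAD = {"pocket_base_1": "N", "pocket_base_2": "D", "pocket_base_3": "R"}
AROMATIC = frozenset("FWY")


def has_monosaccharide_fingerprint(site: dict[str, str]) -> bool:
    """True if the Asn/Asp/Arg triad and a backing aromatic residue are all present."""
    triad_present = all(site.get(pos) == aa for pos, aa in CONSERVED_TRIAD.items())
    aromatic_present = site.get("aromatic_backing") in AROMATIC
    return triad_present and aromatic_present


def predict_ligand_class(site: dict[str, str]) -> str:
    """The fingerprint supports a ligand class, not a specific sugar or ring form."""
    if has_monosaccharide_fingerprint(site):
        return "monosaccharide"
    return "unknown"


# Two hypothetical sites that differ only at a second-shell position (Phe vs Arg).
site_rbsb_like = {"pocket_base_1": "N", "pocket_base_2": "D", "pocket_base_3": "R",
                  "aromatic_backing": "F", "second_shell": "F"}
site_ytfq_like = {**site_rbsb_like, "second_shell": "R"}

if __name__ == "__main__":
    for name, site in [("RbsB-like", site_rbsb_like), ("YtfQ-like", site_ytfq_like)]:
        print(name, "->", predict_ligand_class(site))
```

Both sites print "monosaccharide": the second-shell substitution that actually reshapes the pocket is invisible to a fingerprint of this kind.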
Interestingly, there is now evidence for another furanose-specific binding protein, a D-ribofuranose-binding SBP from Hahella chejuensis, suggesting that furanose-specific ABC transporters may exist for any sugar that has a significant proportion of its furanose form present in solution.
Figure 1. Structural comparisons of cluster B monosaccharide SBPs
Cluster C SBPs bind the largest ligands for bacterial ABC transporters
We have examined some members of the cluster B SBPs and can generally conclude that the substrates of transporters bearing SBPs of this type are likely to be monosaccharides. Cluster C proteins are the largest of all the SBPs, up to ∼70 kDa, and the majority of them bind peptides. The extra mass corresponds to an additional domain, which enables these proteins to bind larger ligands than other SBPs. For example, E. coli has a number of cluster C SBPs that bind peptides of different lengths, including OppA, the SBP of the oligopeptide transporter, and DppA, the SBP of the dipeptide transporter (Figure 2A). The pioneering work of Tame and colleagues revealed at the structural level how the Opp oligopeptide transporter can bind peptides with high affinity yet recognize effectively any peptide of a particular length, between 3 and 5 amino acids for OppA. The structure of OppA from Salmonella enterica serovar Typhimurium revealed that the SBP contacts the main-chain atoms of the peptide ligand rather than its side chains [18–20]. The side chains instead sit in water-filled cavities that provide the structural flexibility to accommodate any amino acid, which is exactly what a bacterium scavenging peptides requires. This model has been slightly refined by an elegant study that used MS to identify the peptides bound to E. coli OppA purified from cells grown on a peptide-containing medium. Analysis of these peptides, which were between 2 and 5 amino acids long, revealed an apparent preference for positively charged peptides. The structure of the protein was also solved in the open, unliganded form, revealing a negatively charged surface in the binding cleft; this also supports the idea that the protein, although still a general peptide-binding protein, has a preference for positively charged ligands.
The Gram-positive bacterium Lactococcus lactis has an Opp system that can transport much larger peptides than the E. coli system, of up to 18 residues. How can the same protein scaffold bind such large ligands? Although it has an overall similar cluster C fold, L. lactis OppA has a large cavity within the protein that allows it to accommodate much larger peptides, an adaptation that is probably only possible in these larger cluster C proteins.
Figure 2. Structural comparisons of cluster C SBPs
Whereas peptides appear to be the major cluster C ligands, Cuneo et al. published a remarkable paper showing that a cluster C SBP from Thermotoga maritima is involved in the transport of oligosaccharides. The protein, TM0031, binds cellulose-derived oligosaccharides ranging from two rings (cellobiose) to five rings (cellopentaose). This binding site also gains structural flexibility from having one part of the site that specifically binds the disaccharide portion (Figure 2A); when the pentasaccharide binds, the other three sugar rings sit further into the binding pocket, in a large water-filled cavity where they make few specific contacts with the protein.
So, cluster C SBPs bind peptides and saccharides that bacteria scavenge from their environment, but another early structure of a cluster C SBP apparently contained a totally different and much smaller ligand. This was NikA, the E. coli nickel-binding protein that forms part of a nickel ABC transporter used by this bacterium to acquire the metal under anaerobic conditions for hydrogenase enzymes. The first structure of NikA suggested that the protein bound Ni(II) ions directly, in contrast with all other cluster C SBPs in terms of ligand size and binding mode (Figure 2A). Through the work of Cavazza and colleagues [26,27], it became clear that free nickel is actually unable to bind; instead, nickel binds in complex with a chelator, or nickelophore, which in some cases can be two histidine amino acids, so that the bound ligand is similar in size to those of dipeptide-binding SBPs. However, very recent structural data for NikA proteins from a range of other bacteria suggest a variety of binding modes, with more or less of the nickel co-ordination coming from the protein or from the nickelophore.
MppA is a peptide-binding protein that has evolved unique specificity for the murein tripeptide
One of the cluster C SBPs encoded in the E. coli K-12 genome is an orphan SBP, MppA, which was shown by Park et al. to be involved in recycling of peptidoglycan fragments during growth and, from genetic evidence, was thought to function with the membrane components of the Opp system. Bacterial cell walls are very strong owing to peptide cross-links between the glycan strands of peptidoglycan. As bacterial cells grow and divide, they must remodel this peptidoglycan, and bacteria such as E. coli are known to shed fragments of turned-over peptidoglycan into the growth medium. These fragments include the tripeptide L-alanyl-γ-D-glutamyl-meso-diaminopimelate, called murein tripeptide (Mtp), which has an unusual structure for a tripeptide as it contains a D-glutamic acid residue that is linked to the next amino acid through its γ-carboxyl group (a γ-linkage). We solved the structure of MppA to determine how a protein that is very similar to OppA now recognizes only a single ligand. The results revealed subtle changes in the binding site that have resulted in selectivity for structural features of Mtp. First, the structure revealed that this unusual D-amino acid-containing peptide can adopt a 3D structure similar to that of a regular tripeptide, because the γ-linkage extends the peptide from the D-glutamic acid; indeed, superpositions of OppA with the tripeptide lysine-glutamate-lysine (KEK) bound and of MppA with Mtp bound look very similar (Figure 2B). This superposition revealed a key change in the binding mode that leads to specificity. In OppA, the side chain of the second amino acid of the ligand, glutamate, sits in a water-filled cavity, but in MppA there is no normal amino acid side chain in this position; instead, the α-carboxylate group of the D-glutamic acid sits in this pocket.
A substitution away from the binding site, in which a glutamic acid in OppA (Glu276) is replaced by an alanine in MppA (Ala274), breaks a salt bridge and releases Arg402 in MppA to fill the ‘water-filled cavity’ and form a direct salt bridge with the ligand (Figure 2B), thereby creating a specific interaction with the ligand, something that is avoided in OppA. The arginine is present in both proteins but functions in two different ways. Previously, the structure of PrgZ from Enterococcus faecalis, which binds a specific hydrophobic heptapeptide (Figure 2A), was solved; again, small changes in the binding pocket explain how an OppA-like protein has evolved specificity for a single peptide.
From this brief survey of two of the clusters of ABC SBPs, it is clear that we still have much to discover about the diversity of ABC transporters in bacteria, some of which have hundreds of different ABC uptake systems that allow them to use a plethora of different growth substrates in complex environments. We can conclude that sequence-based analysis can place a protein into one of the clusters, and that this can give a likely prediction of the overall ‘class’ of substrate that an uncharacterized ABC transporter might recognize. However, as we have seen, it is currently almost impossible to predict accurately the exact substrates that are bound. Analysis of the genomic context of an ABC transporter often helps, by identifying linked genes that give clues to function, but this is not informative in many cases. As has recently happened for TRAP transporter SBPs, a systematic structural genomics programme to solve the structures of hundreds of new SBPs is now a real possibility for ABC transporters, and would allow the true diversity of their biological functions to be assessed systematically. There are many surprises yet to be revealed.
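The two-level outcome described above (a reasonably reliable substrate class, but an unreliable exact substrate) can be summarized as a small lookup. The sketch below is hypothetical: it assumes the Berntsson cluster has already been assigned by some upstream sequence-based method that is not shown, and its table only restates what this review reports for clusters B, C and E; the other clusters are deliberately left unfilled rather than guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubstratePrediction:
    cluster: str
    likely_class: str
    notable_cases: tuple[str, ...]
    # Per the text, the exact substrate cannot currently be predicted from sequence.
    exact_substrate_predictable: bool = False


# Restates only what this review reports; clusters A, D and F are not covered here.
CLUSTER_SUMMARY: dict[str, tuple[str, tuple[str, ...]]] = {
    "B": ("monosaccharides",
          ("D-galactofuranose (YtfQ)", "D-ribofuranose (Hahella chejuensis SBP)")),
    "C": ("peptides",
          ("cello-oligosaccharides (TM0031)",
           "nickel-nickelophore complexes (NikA)",
           "murein tripeptide only (MppA)",
           "a specific hydrophobic heptapeptide (PrgZ)")),
    "E": ("ligands of TRAP/TTT secondary transporters, not ABC systems", ()),
}


def predict_substrate_class(cluster: str) -> SubstratePrediction:
    """Map a pre-assigned Berntsson cluster label to a likely substrate class."""
    key = cluster.strip().upper()
    likely_class, notable = CLUSTER_SUMMARY.get(key, ("not covered in this review", ()))
    return SubstratePrediction(cluster=key, likely_class=likely_class, notable_cases=notable)


if __name__ == "__main__":
    for cluster in ("B", "C", "E", "A"):
        prediction = predict_substrate_class(cluster)
        print(cluster, "->", prediction.likely_class,
              "| exact substrate predictable:", prediction.exact_substrate_predictable)
```

Every prediction carries exact_substrate_predictable=False, and the notable_cases field lists exactly the kinds of surprises (furanose sugars, cello-oligosaccharides, nickel complexes, a single murein tripeptide) that a class-level lookup would miss.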
This article is dedicated to the memory of the late Professor Amy Davidson, an exceptional scientist, colleague and leader in the bacterial ABC transporter community.
This work was funded by the Biotechnology and Biological Sciences Research Council [PhD studentship to R.S.P.H.] and the Higher Education Commission Pakistan for a studentship to A. Maqbool.
ATP binding cassette transporters: from mechanism to organism: Held at University of Chester, U.K., 16–18 April 2015.
Present address: John Innes Centre, Colney Lane, Norwich, Norfolk, NR4 7UH
Present address: Department of Chemistry, California Institute of Technology, 1200 East California Boulevard, Pasadena, CA 91125, U.S.A.