The single-molecule approach seeks to understand molecular mechanisms by observing biomolecular processes at the level of individual molecules. These methods have led to a growing understanding that many processes exhibit a diversity of behaviours, representing a multitude of pathways. Fully characterising this diversity requires that a sufficient number of observations be recorded. The need for large numbers of observations to adequately sample distributions, subpopulations, and rare events presents a significant challenge for single-molecule techniques, which by their nature do not typically provide high throughput. This review discusses several emerging techniques that address this issue, either by combining nanolithographic approaches, such as zero-mode waveguides and DNA curtains, with single-molecule fluorescence microscopy, or by drastically increasing the throughput of force-based approaches such as magnetic tweezers and laminar-flow techniques. These methods not only allow large volumes of single-molecule data to be collected in single experiments, but have also improved ease of use, accessibility, and the automation of data analysis.
If we consider the molecular processes that are central to life, such as DNA replication, transcription, translation, and cell division, intricate, precise, and highly evolved mechanisms come to mind. These processes have been subject to tremendous selection pressure over aeons, resulting in highly efficient mechanisms with little room for error. Textbooks often depict these processes moving like clockwork, without detours or off-pathway transitions. However, a picture is now emerging of dynamic molecular pathways which, rather than being shepherded deterministically through a linear series of events, are more free-form than any textbook model. On careful consideration, such a picture makes sense: kinetic pathways are ultimately governed by the thermodynamic boundary conditions of the system, an intricate interplay of interaction strengths, concentrations, avidity effects, and catalytic rates of the molecular species involved. Together, these define a reaction free-energy landscape of peaks and troughs that may favour a preferred pathway but can also allow many different routes (Figure 1). Stochasticity thus has a pervasive influence, and complicated, multi-faceted processes require plasticity to achieve robustness. Experimentally resolving the mechanistic details of such pathways presents a special challenge. In traditional ensemble experiments, which measure the behaviour of a population of molecules as a whole, the underlying distributions representing the diversity of kinetic behaviour are hidden. In contrast, single-molecule techniques follow the reaction trajectories of individual molecules and have tremendous power to resolve the fine details of multi-step pathways. Such an approach allows distributions of pathways to be measured, rather than merely distributions of quantifiable endpoint parameters.
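A toy model makes the point about hidden pathways concrete. The sketch below assumes a hypothetical state that can be left by two competing first-order routes with rate constants k₁ and k₂. Each molecule takes route i with probability kᵢ/(k₁ + k₂), yet the lifetime of the state is exponential with the total rate k₁ + k₂ whichever route is taken, so an ensemble decay curve reports only the sum, whereas per-molecule trajectories also reveal the branching ratio.

```python
import random

def simulate_branching(rates, n_molecules, seed=0):
    """Race between competing first-order pathways leaving one state.

    Each pathway i fires after an exponential waiting time with rate rates[i];
    the first to fire determines the route. Returns (fraction taking each
    route, mean lifetime of the state).
    """
    rng = random.Random(seed)
    counts = [0] * len(rates)
    total_lifetime = 0.0
    for _ in range(n_molecules):
        waits = [rng.expovariate(k) for k in rates]
        first = min(waits)
        counts[waits.index(first)] += 1
        total_lifetime += first
    return [c / n_molecules for c in counts], total_lifetime / n_molecules

rates = [3.0, 1.0]  # hypothetical rate constants (s^-1) for two routes
fractions, lifetime = simulate_branching(rates, 20000)
print("route fractions:", fractions, "expected:", [k / sum(rates) for k in rates])
print("mean lifetime:", lifetime, "expected:", 1 / sum(rates))
```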
For complex biological processes, heterogeneity is typically observed within single-molecule trajectories as a function of time, as well as across many single molecules in an ensemble; these two types of heterogeneity are not necessarily equivalent. Static disorder describes the heterogeneity present across molecules and complexes that does not change through time, while dynamic disorder describes the changing behaviour of single-molecule trajectories through time. Truly high-throughput single-molecule techniques should be capable of observing large numbers of molecules, each for a sufficient duration so that a representative sampling of both static and dynamic disorder can be obtained.
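A minimal sketch of how static disorder can be detected: if every molecule behaved identically, the spread of per-molecule mean dwell times would be set by sampling noise alone (for exponential dwells, about τ/√n for n dwells per molecule). The rates, the two-subpopulation model, and the numbers of molecules and dwells below are hypothetical, and dynamic disorder (rates changing within a trajectory) is not modelled.

```python
import random
import statistics

def per_molecule_means(molecule_rates, dwells_per_molecule, seed=0):
    """Mean dwell time of each molecule; each molecule keeps one fixed rate."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(k) for _ in range(dwells_per_molecule))
        for k in molecule_rates
    ]

n_mol, n_dwell = 200, 50
identical = [1.0] * n_mol                                   # no static disorder
two_states = [0.5 if i % 2 else 2.0 for i in range(n_mol)]  # hypothetical subpopulations

spread_identical = statistics.stdev(per_molecule_means(identical, n_dwell))
spread_static = statistics.stdev(per_molecule_means(two_states, n_dwell))
# Exponential dwells have a standard deviation equal to their mean (1 s here),
# so sampling noise alone gives a spread of about 1/sqrt(n_dwell), roughly 0.14 s.
print(spread_identical, 1 / n_dwell ** 0.5, spread_static)
```

A spread well above the sampling-noise expectation points to molecule-to-molecule heterogeneity; obtaining it reliably requires both many molecules and enough dwells per molecule.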
Figure 1. Multi-pathway nature of biochemical processes.
When a process comprises a diversity of pathways, obtaining a representative sample may require hundreds of single-molecule trajectories. Most single-molecule techniques, however, have limited throughput, typically following the trajectories of at most a handful of molecules at once. Setting up these experiments can be time-consuming and technically demanding, often resulting in a woefully low rate of experimental success and low overall throughput. Data processing then becomes an exercise in separating signal from noise, which can introduce biases into the analysis and interpretation that may not be discernible once curated results are published. High-throughput approaches promise to address these problems by improving the signal-to-noise ratio and by making automated analysis viable, and will hence be crucial to the more widespread adoption of single-molecule techniques [2,3]. Single-molecule techniques include a variety of methods for following the reaction trajectories of individual fluorescently labelled molecules, such as single-molecule FRET, as well as force-based mechanical techniques including atomic force spectroscopy and optical/magnetic tweezers [4–7]. In general, fluorescence-based techniques are particularly suited to real-time measurement of protein stoichiometries, exchange events, conformational changes, and enzymatic catalysis, and many of them can be applied both in vivo and in vitro. Although force-based techniques have a somewhat more limited scope, they are particularly suited to studying protein–nucleic acid interactions and the dynamics of molecular motors, topics of particular interest in the field of single-molecule biophysics.
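To put a number on "hundreds of trajectories", assume each trajectory independently follows a rare pathway with probability p. The chance of observing that pathway at least once in n trajectories is 1 − (1 − p)ⁿ, so the smallest adequate n follows directly. This sketch covers detection only; characterising the pathway's kinetics would require many more events.

```python
import math

def trajectories_needed(p, confidence=0.95):
    """Smallest n such that P(at least one occurrence in n trajectories) >= confidence.

    Assumes trajectories are independent and each follows the rare pathway
    with the same probability p.
    """
    return math.ceil(math.log(1 - confidence) / math.log1p(-p))

for p in (0.05, 0.01):
    print(p, trajectories_needed(p))  # 59 for p = 0.05, 299 for p = 0.01
```

A pathway taken by 1% of molecules therefore needs roughly 300 trajectories just to be seen once with 95% probability.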
Nucleic acids provide an ideal stage on which to observe single-molecule processes and nucleic acid-focussed systems have led the way since the inception of the single-molecule field. This focus on nucleic acids continues with the advent of high-throughput techniques; however, improvements to throughput are also being made in other generally applicable and specialised methods, particularly through improvement in optics and sensors [8–10]. In this review, we will discuss fluorescence and force-based single-molecule methods with a particular focus on platforms based on nucleic acids and how innovative approaches to increasing their throughput have contributed to significant new findings.
Single-molecule real-time DNA sequencing
The reigning champion of high-throughput single-molecule techniques was developed with a commercial motivation. Pacific Biosciences' single-molecule real-time (SMRT) DNA-sequencing platform can simultaneously follow nucleotide incorporation events by many thousands of DNA polymerases, allowing rapid whole-genome sequencing [11–14]. As a commercial product, the system is standardised and automated, including reagent injection, downstream data processing, and analysis. Rapid DNA sequencing is achieved by multiplexing the sequencing-by-elongation reaction into discrete zero-mode waveguides (ZMWs), nanostructured reaction wells arrayed in a thin metal layer over a glass substrate (Figure 2). In the current generation of SMRT sequencers, there are one million ZMWs per reaction cell. Each ZMW has dimensions smaller than the wavelength of the illuminating laser light and an effective observation volume in the zeptolitre (10⁻²¹ l) range. Because a millimolar concentration corresponds to roughly one molecule (about 0.6) per zeptolitre, this very small observation volume allows fluorophore-labelled substrates to be present at high concentrations while keeping the number of labelled molecules within the illuminated volume at any instant small.
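The occupancy argument can be checked with a short calculation. The mean number of molecules in a volume V at concentration C is λ = C·N_A·V and, assuming molecules are randomly and independently distributed, the number present at any instant is Poisson-distributed. The 1 fL value used for comparison is only an illustrative order of magnitude for a diffraction-limited confocal volume, not a figure from the text.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1

def mean_occupancy(conc_molar, volume_litres):
    """Mean number of molecules in a volume at a given molar concentration."""
    return conc_molar * AVOGADRO * volume_litres

def poisson_pmf(k, lam):
    """Probability of finding exactly k molecules when the mean is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

ZEPTOLITRE = 1e-21
FEMTOLITRE = 1e-15  # illustrative order of a confocal observation volume (assumption)

lam_zmw = mean_occupancy(1e-3, ZEPTOLITRE)       # 1 mM in 1 zL
lam_confocal = mean_occupancy(1e-6, FEMTOLITRE)  # 1 µM in 1 fL
print(f"{lam_zmw:.2f} molecules per zL at 1 mM")       # 0.60
print(f"P(empty) = {poisson_pmf(0, lam_zmw):.2f}")     # 0.55
print(f"{lam_confocal:.0f} molecules per fL at 1 µM")  # 602
```

The same Poisson expression applies if polymerases load into ZMWs randomly: with a mean of one per well, only poisson_pmf(1, 1.0) = e⁻¹ ≈ 37% of wells would contain exactly one enzyme.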
Figure 2. Zero-mode waveguides for single-molecule reactions.
The SMRT sequencing reaction is carried out by a highly processive, modified φ29 DNA polymerase tethered to the surface at the bottom of the ZMW, with an average of one polymerase per ZMW. In the presence of all four dNTPs, each labelled at the terminal phosphate with a spectrally distinct fluorophore, the polymerase replicates the target DNA; because the label sits on the terminal phosphate, it is released with the pyrophosphate upon incorporation, leaving an unmodified DNA product. The use of ZMWs allows dNTPs to be present at concentrations that support replication at rates of ∼1–2 nt/s with minimal background fluorescence, enabling rapid elongation and thus rapid sequencing. Under laser illumination, each nucleotide incorporation event produces a burst of fluorescence whose colour identifies the incorporated nucleotide. Multi-megapixel image sensors record the sequence of incorporated nucleotides for each polymerase-containing ZMW, and a consensus sequence is then generated by an automated data-processing pipeline.
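A heavily simplified sketch of turning fluorescence bursts into base calls: each of four hypothetical channels is thresholded, runs of consecutive above-threshold frames are treated as incorporation pulses, and bases are emitted in the order their pulses start. The channel-to-base mapping, threshold, and minimum pulse length are assumptions; production base-calling software is considerably more sophisticated and is not reproduced here.

```python
def call_bases(traces, threshold, min_frames=2):
    """Convert four-colour intensity traces from one ZMW into a base sequence.

    traces: dict mapping a base letter to a list of per-frame intensities
            (a hypothetical one-channel-per-base assignment).
    A pulse is a run of >= min_frames consecutive frames above threshold;
    shorter runs are discarded as noise.
    """
    pulses = []  # (start_frame, base)
    for base, trace in traces.items():
        start = None
        # A trailing -inf sentinel closes a pulse that runs to the last frame.
        for i, value in enumerate(list(trace) + [float("-inf")]):
            if value > threshold and start is None:
                start = i
            elif value <= threshold and start is not None:
                if i - start >= min_frames:
                    pulses.append((start, base))
                start = None
    pulses.sort()  # order pulses by the frame in which they begin
    return "".join(base for _, base in pulses)

traces = {
    "A": [0, 9, 9, 0, 0, 0, 0, 0, 0, 0],
    "C": [0, 0, 0, 0, 8, 8, 8, 0, 0, 0],
    "G": [0, 0, 0, 0, 0, 0, 0, 0, 9, 9],
    "T": [0, 0, 7, 0, 0, 0, 0, 0, 0, 0],  # one-frame spike, rejected as noise
}
sequence = call_bases(traces, threshold=5)
print(sequence)  # ACG
```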
The single-molecule resolution, the ability to use high concentrations of fluorophore-tagged ligands, and the massive throughput of the Pacific Biosciences sequencing platform have also made it a valuable tool for basic research. Puglisi and colleagues have described how the SMRT sequencer can be modified for highly multiplexed studies of biomolecular interactions [16,17], and have used this system to study protein–RNA interaction dynamics. A recent study used the ZMW platform to investigate translation initiation at the cricket paralysis virus (CrPV) internal ribosome entry site (IRES), a structured element of the CrPV mRNA that circumvents the requirement for translation initiation factors. The IRES therefore provides a minimal system for studying translation initiation, free of the multiple pathways associated with the various eukaryotic initiation factors. By labelling the IRES, tRNA, and the 40S and 60S ribosomal subunits with fluorophores of different colours, the authors performed multicolour experiments in the ZMW array to follow assembly of the 80S ribosome on the surface-immobilised IRES, as well as the association of labelled tRNA. As in the sequencing reaction, the arrival and persistence of any labelled component at the ZMW is registered as a step-wise increase in fluorescence, and multiple colours can be monitored simultaneously. This approach allows association and dissociation rate constants to be determined, and was used to compare the probabilities of the various possible modes of assembly. Functional ribosome assembly was found to occur either through sequential recruitment of the 40S and 60S subunits by the IRES or through direct binding of pre-formed 80S ribosomes, with the Mg²⁺ concentration controlling the balance between these two pathways.
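Rate constants are typically obtained from dwell times in trajectories like these. A minimal sketch, assuming single-exponential dwell distributions and pseudo-first-order binding with the labelled ligand in excess: the maximum-likelihood estimate of k_off is the reciprocal of the mean bound dwell, and k_on is the reciprocal of the mean unbound dwell divided by the ligand concentration. The concentration and "true" rate constants used to simulate data are hypothetical.

```python
import random
import statistics

def rate_constants(bound_dwells, unbound_dwells, ligand_conc):
    """Estimate k_on (M^-1 s^-1) and k_off (s^-1) from dwell times in seconds.

    Assumes exponential dwells and pseudo-first-order binding at a constant
    ligand concentration ligand_conc (M).
    """
    k_off = 1.0 / statistics.fmean(bound_dwells)  # MLE for an exponential
    k_on = 1.0 / (statistics.fmean(unbound_dwells) * ligand_conc)
    return k_on, k_off

rng = random.Random(1)
conc = 50e-9                     # 50 nM labelled ligand (hypothetical)
true_kon, true_koff = 2e7, 0.5   # hypothetical rate constants
bound = [rng.expovariate(true_koff) for _ in range(2000)]
unbound = [rng.expovariate(true_kon * conc) for _ in range(2000)]
k_on, k_off = rate_constants(bound, unbound, conc)
print(f"k_on ≈ {k_on:.2e} M^-1 s^-1, k_off ≈ {k_off:.2f} s^-1, K_D ≈ {k_off / k_on:.1e} M")
```

Real data also suffer from missed events shorter than the frame time and from photobleaching, both of which truncate the dwell distributions and would need correcting.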
By using two different mRNA templates, one in which the first codon of the 0 frame encodes phenylalanine and another in which the first codon of the +1 frame does so, the authors measured the preference of tRNAPhe-Cy3-Phe for the 0 or +1 frame at the start of elongation. Binding of the tRNA to the 0 frame was faster and more efficient than binding to the +1 frame, and in both cases elongation factors were required for efficient binding. Although the ribosomal complex used by the authors represents a minimal system for eukaryotic translation, multiple pathways for ribosome assembly and the initiation of elongation were observed. This study would not have been possible without the high-throughput ZMW approach, which allowed these different reaction pathways and conditions to be explored.
Total internal reflection fluorescence (TIRF) microscopy is one of the most successful and widely used techniques in single-molecule biophysics. The strength of this method lies in the selective illumination of only a thin layer of sample close to the surface of a glass substrate [19,20]. Because fluorophores further from the surface are not excited, background is greatly reduced and individual fluorophores can be imaged with high signal-to-background ratios. By tagging proteins of interest with fluorophores, stoichiometries, interactions, exchange events, and many other parameters can be tracked in vitro or in vivo. A range of imaging techniques and experimental strategies make TIRF microscopy supremely adaptable. In protein–DNA studies, linear DNA molecules are typically tethered to the surface of a microscope coverslip through a binding partner that is randomly distributed on the surface, such as streptavidin in the case of biotin-functionalised DNA. An alternative strategy is the DNA curtain technique, in which long DNA templates are aligned in rows on the surface (Figure 3). This approach offers two advantages: a high surface density of DNA, which increases throughput, and much more precise localisation of fluorescent proteins along the DNA sequence. The method was pioneered by the laboratory of Eric Greene and relies on nanofabrication approaches such as electron-beam and UV lithography [2,21–23]. In brief, these lithographic techniques are used to create chromium diffusion barriers in rows on a coverslip surface. A flow cell is then assembled on top of the coverslip, and the surface is coated with a lipid bilayer to which DNA molecules are tethered via biotinylated phospholipids. When buffer flow is applied, the lipid-anchored DNA molecules are pushed along the fluid bilayer until they reach the diffusion barriers, where they align to form curtains.
The spacing between diffusion barriers is only slightly greater than the length of the extended DNA, maximising the number of curtain rows that can be imaged in a single field of view. DNA curtains are particularly well suited to observing sequence-dependent interactions: because the tether point of each DNA template is known with high accuracy, the position of a bound protein can be mapped onto the DNA sequence from its distance to the tether point. This technique has become a mainstay for the Greene and Finkelstein groups, and has been used in many studies of protein–DNA interactions [2,21–44].
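The mapping from image position to DNA sequence can be sketched as a linear conversion, assuming the DNA is uniformly extended between its tether point and its other end. This is a simplification: flow-stretched DNA tethered at one end is under higher tension near the tether, so real analyses may need a calibrated, non-linear mapping. The extension and spot position below are hypothetical; 48,502 bp is the length of λ-phage DNA.

```python
def position_to_bp(distance_um, extended_length_um, total_bp):
    """Map a fluorescent spot's distance from the tether point to a DNA coordinate.

    Assumes uniform extension along the molecule (see caveat in the text).
    """
    if not 0 <= distance_um <= extended_length_um:
        raise ValueError("spot lies outside the DNA molecule")
    return round(distance_um / extended_length_um * total_bp)

LAMBDA_BP = 48_502  # length of lambda-phage DNA in base pairs
# Hypothetical measurement: DNA extended to 13.0 µm, protein spot at 6.5 µm.
print(position_to_bp(6.5, 13.0, LAMBDA_BP))  # 24251
```

Under the same assumption, localisation precision converts to sequence precision by the same factor, total_bp / extended_length_um, i.e. a few kilobase pairs per micrometre here.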
Figure 3. DNA curtains to multiplex single-molecule observations.
DNA curtains are particularly well suited to identifying how DNA-interacting proteins search for their target sequences and sites, balancing the need for speed in the search phase with stability in the target-bound phase. A recent study investigated how the eukaryotic mismatch repair complex Msh2–Msh3 finds its target sites on crowded DNA, in the presence of nucleosomes and other protein roadblocks. By using doubly tethered DNA curtains of λ-phage DNA (48.5 kbp long) and Msh2–Msh3 labelled with a quantum dot, the authors could visualise and track the diffusion of the complex on DNA. To evaluate whether the protein diffuses on DNA by sliding (translocation in continuous contact with the DNA) or hopping (correlated dissociation and reassociation), the ionic strength was varied to control the strength of electrostatic interactions with the DNA. The diffusion coefficient increased with ionic strength in a manner characteristic of a mechanism of alternating hopping and sliding. In the presence of nucleosomes or other protein roadblocks, the Msh2–Msh3 complex was observed to bypass the roadblock and continue its search. In contrast, a previous DNA curtain study showed that the diffusion coefficient of another mismatch repair complex, Msh2–Msh6, does not depend on ionic strength, indicative of a sliding mechanism. Without hopping, this complex was unable to bypass roadblocks. The ability to monitor large numbers of DNA molecules simultaneously was crucial in allowing the authors to visualise the interplay between the repair complexes and nucleosomes, and to dissect how the mode of protein translocation dictates its ability to bypass nucleosomal roadblocks.
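Diffusion coefficients such as those compared across ionic strengths are commonly extracted from the mean-squared displacement (MSD) of a tracked particle, which for one-dimensional diffusion along DNA grows as MSD(τ) = 2Dτ. The sketch below fits that relation through the origin to a simulated trajectory; D_true, the frame interval, and the trajectory length are hypothetical. Real tracks also carry localisation error, which adds a constant offset to the MSD, so a fit with a free intercept is often preferable.

```python
import random

def msd(positions, max_lag):
    """Time-averaged mean-squared displacement for lags of 1..max_lag frames."""
    result = []
    for lag in range(1, max_lag + 1):
        disp = [positions[i + lag] - positions[i] for i in range(len(positions) - lag)]
        result.append(sum(d * d for d in disp) / len(disp))
    return result

def diffusion_coefficient_1d(positions, dt, max_lag=5):
    """Least-squares fit of MSD(tau) = 2*D*tau through the origin (1D diffusion)."""
    taus = [lag * dt for lag in range(1, max_lag + 1)]
    m = msd(positions, max_lag)
    return sum(mi * t for mi, t in zip(m, taus)) / (2 * sum(t * t for t in taus))

# Synthetic 1D random walk: hypothetical D = 0.05 µm^2/s, 10 frames per second.
D_true, dt = 0.05, 0.1
rng = random.Random(0)
x = [0.0]
for _ in range(20000):
    x.append(x[-1] + rng.gauss(0.0, (2 * D_true * dt) ** 0.5))
D_est = diffusion_coefficient_1d(x, dt)
print(f"D ≈ {D_est:.3f} µm^2/s")
```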
One drawback of the DNA curtain technique is the expense and difficulty of the lithographic fabrication methods involved. A possible improvement is the use of soft lithography, in which a finely patterned, reusable elastomeric stamp is used to deposit linking molecules such as streptavidin onto the flow cell surface. Although soft lithography has been used to pattern surfaces in single-molecule studies, including for ordered DNA tethering [48–52], it has not yet been used to produce DNA curtains.
Force-based single-molecule methods rely on applying a stretching force to a biopolymer, such as DNA, and reading out the length of the molecule in response to that force or to proteins acting on the polymer. Many force-based methods use wide-field microscopy for data collection and are readily scaled up by maximising the size of the field of view and the density of the species of interest. One such method is the tethered-bead flow-stretching technique [1,53]. Here, antibody-functionalised microbeads are tethered to the functionalised surface of a flow cell via a long double-stranded DNA molecule. A constant flow rate applies a controllable drag force in the low piconewton range to the beads, which are imaged with a very high signal-to-noise ratio by dark-field microscopy (Figure 4). Movement of a bead reports on changes in DNA length caused by interconversion between dsDNA and ssDNA, compaction by DNA-binding factors, or the formation and release of topological features such as loops and coils. Because the beads are spherically symmetric, their centroid positions can be tracked by sub-pixel fitting with a precision of ∼1–10 nm [53–56]. This precision is achievable even with low-magnification optics, provided that the image of each bead spans multiple pixels on the image sensor. Brownian motion of the beads, mechanical drift, and small fluctuations in the flow rate are the main sources of experimental uncertainty.
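Sub-pixel localisation works because a bead's image spreads over many pixels, so its centre can be estimated from all of them together. A minimal stand-in for the fitting used in practice is the background-subtracted, intensity-weighted centroid shown below; the synthetic Gaussian spot, its width, and the known flat background are assumptions, and real trackers typically fit a 2D Gaussian or use cross-correlation, with camera noise setting the achievable precision.

```python
import math

def gaussian_spot(size, x0, y0, sigma, amplitude=1000.0, background=100.0):
    """Synthetic, noise-free bead image: a 2D Gaussian on a flat background (pixel units)."""
    return [[background + amplitude * math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))
             for x in range(size)] for y in range(size)]

def centroid(image, background):
    """Background-subtracted, intensity-weighted centroid (x, y) in pixels."""
    total = sx = sy = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            w = max(value - background, 0.0)  # clip negative residuals to zero
            total += w
            sx += w * x
            sy += w * y
    return sx / total, sy / total

img = gaussian_spot(15, x0=7.3, y0=6.8, sigma=1.5)
cx, cy = centroid(img, background=100.0)
print(f"centre estimate: ({cx:.3f}, {cy:.3f}) px")  # true centre (7.3, 6.8)
```

Converting the pixel estimate to nanometres only requires the effective pixel size at the sample, which is why low-magnification optics with many pixels per bead remain usable.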
Figure 4. High-throughput force-based single-molecule methods.
The mechanism of DNA replication is a well-studied example of how a multi-protein machine supports a complicated series of enzymatic events, and the flow-stretching technique is well suited to observing its dynamics [57–64]. Because the two strands of duplex DNA are antiparallel, one nascent daughter strand, the leading strand, is extended continuously in the direction of replication-fork progression, while the lagging strand is extended discontinuously in the opposite direction, as Okazaki fragments. Each Okazaki fragment is initiated from an RNA primer synthesised by the primase subunit and is extended by the polymerase. When the replisome is first assembled on DNA and then subjected to rapid dilution, leading- and lagging-strand replication persist well beyond a single Okazaki-fragment length, implying that, in the absence of free polymerase in solution, a mechanism exists to recycle the lagging-strand polymerase and use it to synthesise multiple Okazaki fragments [65–67]. In the classic ‘trombone-loop’ model of DNA replication, the lagging-strand polymerase remains physically coupled to the replication fork, forcing the DNA to loop out between the helicase and the lagging-strand polymerase. Following the completion of each Okazaki fragment, the loop is released and a new cycle begins with recycling of the polymerase to a newly synthesised primer. While this model was shown to describe the behaviour of the replication complex under conditions of assembly followed by rapid dilution, it had not been thoroughly tested in the physiologically relevant context of free polymerase in solution.
To understand the coordination between leading- and lagging-strand DNA replication, Duderstadt et al. created a DNA template for the flow-stretching technique with a bead attached to each end and a replication fork located in the middle and bound to the flow cell surface. One bead reports on lagging-strand dynamics, whereas the other reports on the sum of leading- and lagging-strand dynamics, which can be deconvolved. The low frequency of the replication events being studied, and the scarcity of DNA templates with a bead bound at each end, necessitated scaling this method up to high throughput. Using a low-magnification telecentric lens coupled to a high-megapixel camera, the authors imaged an ultra-wide field of view (5.3 mm × 3.5 mm) containing tens of thousands of beads, each of which could be monitored simultaneously. Key steps of the data analysis were automated, including identification of bead pairs and the fitting of line segments to their trajectories using a statistical technique called change-point detection. This workflow ensured that the data could be processed efficiently and that sources of bias were minimised. In a demonstration of the plastic, multi-pathway nature of DNA replication, Duderstadt et al. observed that, in the presence of free polymerase in solution, Okazaki fragment synthesis by the bacteriophage T7 replisome could occur either within replication loops or in their absence, with the majority of lagging-strand replication taking place outside replication loops. Moreover, both modes of DNA replication were observed within individual trajectories of processive replication reactions. While looping was observed during most cycles of Okazaki fragment synthesis, the information provided by both the leading- and lagging-strand beads indicated that most of these were priming loops, in which DNA loops out between the helicase and primase subunits during primer synthesis.
The detailed study of the extremely complex dynamic behaviour of the replication machinery relied critically on the high throughput of the flow-stretching method and would have been exceedingly challenging with any other technique.
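Change-point detection comes in many variants, and the details of the published pipeline are not reproduced here. As a stand-in, the sketch below uses optimal partitioning: it chooses the set of straight-line segments that minimises the total squared residual plus a fixed penalty per segment, found exactly by dynamic programming with prefix sums. The synthetic trace (a flat stretch, a constant-rate ramp, then a plateau), the noise level, and the penalty are assumptions.

```python
import random

def _prefix(values):
    """Cumulative sums with a leading zero, so sum(values[i:j]) = p[j] - p[i]."""
    out = [0.0]
    for v in values:
        out.append(out[-1] + v)
    return out

def linear_changepoints(t, y, penalty, min_len=3):
    """Optimal partitioning of (t, y) into straight-line segments.

    Minimises total squared residual + penalty * (number of segments).
    Returns the indices at which new segments start (excluding 0).
    """
    n = len(t)
    S_t, S_y = _prefix(t), _prefix(y)
    S_tt = _prefix([a * a for a in t])
    S_ty = _prefix([a * b for a, b in zip(t, y)])
    S_yy = _prefix([b * b for b in y])

    def sse(i, j):
        """Residual sum of squares of a least-squares line through points i..j-1."""
        m = j - i
        st, sy = S_t[j] - S_t[i], S_y[j] - S_y[i]
        ctt = (S_tt[j] - S_tt[i]) - st * st / m
        cty = (S_ty[j] - S_ty[i]) - st * sy / m
        cyy = (S_yy[j] - S_yy[i]) - sy * sy / m
        return cyy - (cty * cty / ctt if ctt > 0 else 0.0)

    inf = float("inf")
    best = [0.0] + [inf] * n  # best[j]: optimal cost for the first j points
    last = [0] * (n + 1)      # start index of the final segment in that optimum
    for j in range(min_len, n + 1):
        for i in range(0, j - min_len + 1):
            if best[i] == inf:
                continue
            cost = best[i] + sse(i, j) + penalty
            if cost < best[j]:
                best[j], last[j] = cost, i
    changepoints, j = [], n
    while j > 0:  # backtrack through the segment starts
        j = last[j]
        if j > 0:
            changepoints.append(j)
    return sorted(changepoints)

# Synthetic bead trace: flat, then a ramp at 2 units/frame, then a plateau.
rng = random.Random(0)
t = list(range(150))
truth = [0.0 if i < 50 else 2.0 * (i - 50) if i < 100 else 100.0 for i in t]
y = [v + rng.gauss(0.0, 1.0) for v in truth]
cps = linear_changepoints(t, y, penalty=50.0)
print(cps)  # true change points are at frames 50 and 100
```

The slope of each fitted segment then gives a rate. The penalty controls the trade-off between fit and complexity: too small and noise is split into spurious segments, too large and genuine rate changes are merged.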
A related technique, magnetic tweezers, has similarly been shown to be amenable to scaling up to high throughput [3,69–71], and its application to multi-pathway processes has been described in detail. In this method, permanent magnets apply a force to superparamagnetic beads attached to nucleic acid polymers in a flow cell and, as in the flow-stretching technique, the beads are imaged and tracked by wide-field microscopy. To achieve high throughput, Dekker and colleagues use a relatively large field of view (400 µm × 300 µm) imaged with a high-resolution 12-megapixel camera. As in the flow-stretching experiments, sub-pixel fitting means that the beads can be tracked at low magnification without sacrificing resolution. With hundreds of tethered microbeads in the field of view and a fast frame rate of 58 Hz, tracking the beads over a long experiment could create a bottleneck in data analysis. To speed up this step, Cnossen et al. developed GPU-based tracking software that uses the NVIDIA CUDA platform for parallelisation and tracks all the beads in real time. The high throughput of this technique has been used to study error incorporation by a viral RNA-dependent RNA polymerase (RdRP). In this experiment, beads were tethered to double-stranded RNA, which was converted into single-stranded RNA by the RdRP, lengthening the RNA tether and causing a commensurate movement of the bead. An unbiased, automated dwell-time analysis was applied in which the dwell times of the polymerase in consecutive 10-nt windows were quantified and their distributions characterised under a range of conditions. Distinct peaks were present in the dwell-time distribution, indicating different elongation pathways with differing error rates.
These include brief dwell times associated with elongation with no pausing, intermediate dwell times associated with pausing and error incorporation, and longer dwell times caused by backtracking events. The statistical significance of this analysis critically relied on access to the hundreds of trajectories collected using this high-throughput technique.
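The windowed dwell-time analysis can be sketched as a first-passage calculation along a position-versus-time trace. The sketch assumes positions already converted to nucleotides, a trajectory starting at 0 nt, and a hypothetical synthetic trace containing a 10 s pause; in a real analysis, dwell times from hundreds of trajectories would be pooled into the distributions described above.

```python
def window_dwell_times(times, positions, window=10):
    """Dwell times in consecutive `window`-nt windows along one trajectory.

    Assumes the trajectory starts at 0 nt at times[0]. The dwell in window k
    is the first-passage time to (k+1)*window nt minus that to k*window nt,
    so brief backtracks are absorbed into the dwell of the window containing them.
    """
    passages = [times[0]]
    boundary = window
    for t, x in zip(times, positions):
        while x >= boundary:  # a single frame may cross several boundaries
            passages.append(t)
            boundary += window
    return [b - a for a, b in zip(passages, passages[1:])]

# Hypothetical trace: 2 nt/s, a pause at 14 nt from t = 8 s to 17 s, then 2 nt/s again.
times = list(range(31))  # 1 s frames
positions = [2 * t if t <= 7 else 14 if t <= 17 else 14 + 2 * (t - 17) for t in times]
dwells = window_dwell_times(times, positions)
print(dwells)  # [5, 15, 5, 5]
```

Windowing over 10 nt averages out single-nucleotide steps that the tether extension cannot resolve, at the cost of mixing several steps into each dwell.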
The growth of the single-molecule biophysics field, and the adoption of its techniques by researchers entering it, depend on the development of high-throughput, adaptable, and reliable techniques. The nucleic acid-focussed techniques reviewed here can certainly provide high throughput; however, there are still gains to be made in adaptability, expense, and ease of use. Integrated systems for single-molecule experiments and data analysis are starting to become available as commercial packages, for example the acoustic force spectroscopy system produced by Lumicks [73,74]. Single-molecule methods would also benefit from a greater consensus on standards for statistical methods and reporting. In contrast with many established techniques in biochemistry, such as gel electrophoresis and chromatography, single-molecule methods produce data that must be extensively processed, analysed, and interpreted, and this is especially true of high-throughput techniques. As the field matures, it will be crucial to develop conventions for statistical methods and transparency. One example is the interpretation of histograms of measured single-molecule kinetic parameters. It is common practice to fit experimentally obtained distributions with a sum of multiple underlying distributions, each corresponding to an ostensibly distinct subpopulation; however, statistical tests justifying the assumption of multi-modality are often missing. While high-throughput techniques can provide the statistical power needed to justify multi-modality, they can also increase the temptation to over-fit. In future, it may also become desirable for raw data sets, or randomly selected subsets of them, to be routinely made available online as Supplementary data, along with the tools and instructions needed to replicate published analyses.
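As one simple example of the kind of test meant here, competing models of a dwell-time distribution can be compared with an information criterion such as the Bayesian information criterion (BIC = k ln n − 2 ln L, lower is better), which penalises each extra parameter. The sketch fits one- and two-component exponential models by maximum likelihood (the latter by expectation–maximisation) to hypothetical simulated dwell times. BIC is only one option; likelihood-ratio tests with bootstrapped null distributions are a common alternative, and no single criterion settles the question on its own.

```python
import math
import random

def fit_one_exp(x):
    """MLE for a single exponential; returns (log-likelihood, number of parameters)."""
    rate = len(x) / sum(x)
    return sum(math.log(rate) - rate * xi for xi in x), 1

def fit_two_exp(x, iters=200):
    """EM fit of a two-component exponential mixture; returns (log-likelihood, n_params)."""
    mean = sum(x) / len(x)
    w, k1, k2 = 0.5, 2.0 / mean, 0.5 / mean  # deliberately separated starting rates
    for _ in range(iters):
        # E-step: responsibility of component 1 for each dwell
        r = []
        for xi in x:
            a = w * k1 * math.exp(-k1 * xi)
            b = (1 - w) * k2 * math.exp(-k2 * xi)
            r.append(a / (a + b))
        # M-step: responsibility-weighted maximum-likelihood updates
        s1 = sum(r)
        s2 = len(x) - s1
        w = s1 / len(x)
        k1 = s1 / sum(ri * xi for ri, xi in zip(r, x))
        k2 = s2 / sum((1 - ri) * xi for ri, xi in zip(r, x))
    ll = sum(math.log(w * k1 * math.exp(-k1 * xi) + (1 - w) * k2 * math.exp(-k2 * xi))
             for xi in x)
    return ll, 3

def bic(ll, n_params, n):
    """Bayesian information criterion; lower values are preferred."""
    return n_params * math.log(n) - 2 * ll

# Hypothetical dwell times drawn from an equal mixture of rates 10 s^-1 and 1 s^-1.
rng = random.Random(2)
dwells = [rng.expovariate(10.0) if rng.random() < 0.5 else rng.expovariate(1.0)
          for _ in range(1000)]
n = len(dwells)
scores = {
    "one component": bic(*fit_one_exp(dwells), n),
    "two components": bic(*fit_two_exp(dwells), n),
}
print(scores)
```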
F.R.H. acknowledges support from the Australian Research Council [DP150100956]. E.M. acknowledges support from the Netherlands Foundation for Fundamental Research on Matter [12CMCE03]. A.M.v.O. acknowledges support by the Australian Research Council [DP150100956 and FL140100027].
The Authors declare that there are no competing interests associated with the manuscript.