Synthetic biology aims to apply engineering principles to the design and modification of biological systems and to the construction of biological parts and devices. The ability to programme cells by providing new instructions written in DNA is a foundational technology of the field. Large-scale de novo DNA synthesis has accelerated synthetic biology by offering custom-made molecules at ever decreasing costs. However, for large fragments and for experiments in which libraries of DNA sequences are assembled in different combinations, assembly in the laboratory is still desirable. Biological assembly standards allow DNA parts, even those from multiple laboratories and experiments, to be assembled together using the same reagents and protocols. The adoption of such standards for plant synthetic biology has been cohesive for the plant science community, facilitating the application of genome editing technologies to plant systems and streamlining progress in large-scale, multi-laboratory bioengineering projects.
With their ability to directly utilize CO2 and sunlight to produce a diverse range of organic compounds, plants and algae are attractive chassis for the sustainable generation of chemicals and materials. The use of plant cells has been widely demonstrated for the manufacture of pharmaceuticals and other high-value products [1–9]. Plant cells are generally not hosts of human pathogens, making them particularly attractive for the manufacture of vaccines and therapeutics. Many plant secondary metabolites are highly desirable but are often in low abundance and too complex for chemical synthesis. Plant metabolic pathways have therefore been successfully re-engineered for bio-production in yeast, for which industrial fermentation processes are well established [11,12]. Recently, however, large-scale facilities have been constructed to enable manufacturing in vascular plants at industrial scales. For example, Leaf Biopharmaceutical Incorporated produced the ZMapp therapeutic for Ebola in Nicotiana benthamiana, a relative of tobacco indigenous to Australia, at Kentucky Processing, and Medicago Incorporated has constructed a facility in Research Triangle Park, North Carolina, where it is developing plant-produced vaccines for influenza and other viruses [2,13].
The use of recombinant DNA technologies in crop plants has been particularly significant. So-called genetically modified (GM) crops are now grown on hundreds of millions of hectares of land, with significant economic and social benefits being recorded alongside ongoing opposition from various groups [14–16]. Even non-food crops such as pest-resistant biotech cotton have indirectly contributed to food security by raising household income levels and improving access to more nutritious food.
Many photosynthetic algae provide the combined advantages of unicellularity and conversion of solar to chemical energy. Although some algae are challenging to culture and recalcitrant to current genome engineering techniques, the many unique metabolites and rapid growth rates found among their diversity make them attractive production systems [17–19]. There are already a number of companies exploiting algae for biofuels and other consumer products, and research centres such as the Algae R&D Facility at University of California, San Diego provide capacity to grow genetically engineered strains to large, pre-commercial scales.
Despite the relatively low growth requirements and other advantages conferred by plant and algal chassis, technical bottlenecks have lessened their attractiveness for industrial biotechnology. These include delivery of DNA through the cell walls, expression of foreign genes from the nuclear and organelle genomes, selection and regeneration of transformed cells, and the construction of complex, heterogeneous DNA molecules with features that enable the specialized delivery methods required for such organisms. All of these hurdles have been overcome to some extent, but recent successes in improving the efficiency of DNA assembly have created pressure to further improve the downstream steps [21–23].
The need for reliable DNA assembly
The first reports of the successful introduction of foreign DNA into plant chromosomes were published in the early 1980s but, for the next 30 years, the vast majority of reported experiments typically introduced just a single gene of interest along with a selectable marker gene, largely due to the inefficient and difficult nature of assembling more complex constructs. Although efficient DNA assembly was a technical barrier extending beyond the plant sciences, the relatively large construct sizes required for genetic engineering in plants exacerbated the problem. Regulatory sequences required for the correct expression of each coding sequence in a plant cell are often several kb in length and, therefore, even the assembly of just two or three genes can result in a construct of tens of kb.
The ability to assemble genetic material into single and multigene constructs has been a field of study since the discovery and first use of restriction endonucleases over 40 years ago. The past decade has seen several advancements that have helped shape the field of synthetic biology. Biological standards have defined ways in which DNA parts can be assembled together using the same reagents and protocols, the first such standard being the BioBrick [24,25]. More recently, however, a number of different methods that allow multiple DNA fragments to be assembled in parallel have removed many of the bottlenecks surrounding the assembly of large DNA fragments [26–34]. It therefore follows that the application of biological standards to parallel assembly methods would bring even greater benefits to communities of users.
Robust modular systems
Many plant scientists rapidly adopted technologies based on Type IIS restriction enzymes, which cut outside of their recognition sequences, an approach known widely as Golden Gate cloning [26–28,35–41]. This was both because of the inherent technical benefits such methods confer and because Golden Gate cloning originated in a plant-focused laboratory. Plasmid vectors suitable for the insertion of DNA into plant nuclear genomes using Agrobacterium tumefaciens as a shuttle chassis have therefore been available to the research community for a number of years [26,41]. One of the key benefits of Type IIS-mediated assembly is that it can be modularized to allow the production of standard parts that can be assembled together in the desired order simply by mixing the intact plasmids housing the parts together with an acceptor plasmid and an enzyme cocktail [21,23,41,42]. Requiring only the mixing of liquids, the process is automatable, conferring an advantage over overlap-dependent methods that require the amplification of linear fragments with bespoke primers for each assembly prior to the assembly reaction. However, as Type IIS methods spread and gained popularity in the community, the interoperability of basic parts was soon lost as individual laboratories varied and extended the modules outlined in the first publications. By 2014, six years after the original publication, several consortia of plant scientists had adopted shared standards but were unable to interface with those outside of their syndicates, and a number of individual labs were using independent systems and could not share parts with other users.
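The defining property of Type IIS enzymes, cutting at a fixed distance outside the recognition sequence, can be illustrated with a short sketch. It assumes the commonly documented BsaI geometry (recognition sequence GGTCTC, cutting one nucleotide downstream on the top strand and five on the bottom strand, which leaves a 4-nt overhang); these offsets and the example sequence are not taken from the text above, and the function name is hypothetical.

```python
# Illustrative sketch: find BsaI sites in a linear sequence and report the
# 4-nt overhang each one would expose. Assumes the cut geometry GGTCTC(N1/N5).
BSAI_SITE = "GGTCTC"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def bsai_overhangs(seq: str) -> list[tuple[int, str, str]]:
    """Return (site position, strand, overhang) tuples, overhangs in top-strand orientation."""
    seq = seq.upper()
    hits = []
    # Sites on the top strand: one spacer nucleotide, then the 4-nt overhang downstream.
    start = seq.find(BSAI_SITE)
    while start != -1:
        cut = start + len(BSAI_SITE) + 1
        if cut + 4 <= len(seq):
            hits.append((start, "+", seq[cut:cut + 4]))
        start = seq.find(BSAI_SITE, start + 1)
    # Sites on the bottom strand appear as GAGACC on the top strand; the overhang
    # lies upstream, again separated from the site by one spacer nucleotide.
    rc_site = reverse_complement(BSAI_SITE)
    start = seq.find(rc_site)
    while start != -1:
        cut_end = start - 1
        if cut_end - 4 >= 0:
            hits.append((start, "-", seq[cut_end - 4:cut_end]))
        start = seq.find(rc_site, start + 1)
    return sorted(hits)


# A made-up part flanked by convergent BsaI sites.
example = "GGTCTCAGGAGTTTTCCCCAATGTGAGACC"
print(bsai_overhangs(example))  # [(0, '+', 'GGAG'), (24, '-', 'AATG')]
```

Because the overhangs are determined by the sequence next to the site rather than by the site itself, the designer chooses them freely, which is what allows parts to be joined in a defined order.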
A common genetic syntax
To address the lack of interoperability and the consequential replication of efforts produced by parallel adoption of different standards, we led the establishment of a common genetic syntax for the exchange of DNA parts for Type IIS-mediated assembly. This defined 12 fusion sites at the boundaries of the functional elements that comprise a eukaryotic gene as well as describing the features of the plasmids that house standard parts (Figure 1A). These features are the minimal requirements for interoperability and allow assembly of parts into complete transcriptional units in a single digestion–ligation reaction employing the BsaI restriction endonuclease and a T4 ligase (Figure 1B). Use of parts in the common syntax does not limit users to any one standard or plasmid toolkit for assembly of multigene constructs, however (Figure 2). Parts are compatible with two widely used Type IIS plasmid systems: the Golden Gate Modular Cloning (MoClo) Toolbox and the GoldenBraid2.0 (GB2.0) system. Alternatively, transcriptional units can be joined using adapters to supply positional information, for example those described in Biopart Assembly Standard for Idempotent Cloning (BASIC), which also utilizes BsaI, or by using an overlap-dependent, ligation-independent assembly method such as Gibson isothermal assembly (Figure 2).
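The way fusion sites encode part order can be sketched in a few lines. The example below is a simplified model, not the published syntax: it uses only three parts, and the four overhang values are illustrative assumptions that should be checked against the published standard before use. The names Part and assembly_order are hypothetical.

```python
from typing import NamedTuple


class Part(NamedTuple):
    name: str
    left: str   # 4-nt fusion site exposed at the 5' end after digestion
    right: str  # 4-nt fusion site exposed at the 3' end after digestion


def assembly_order(parts: list[Part], acceptor_left: str, acceptor_right: str) -> list[str]:
    """Chain parts from the acceptor's left fusion site to its right fusion site."""
    by_left: dict[str, Part] = {}
    for part in parts:
        if part.left in by_left:
            raise ValueError(f"two parts share the left fusion site {part.left}")
        by_left[part.left] = part
    order, site = [], acceptor_left
    while site != acceptor_right:
        part = by_left.pop(site, None)
        if part is None:
            raise ValueError(f"no part starts with fusion site {site}")
        order.append(part.name)
        site = part.right
    if by_left:
        raise ValueError(f"unused parts: {sorted(p.name for p in by_left.values())}")
    return order


# Parts can be supplied in any order, as in a one-pot reaction.
# Overhang values are illustrative assumptions.
parts = [
    Part("terminator", "GCTT", "CGCT"),
    Part("promoter", "GGAG", "AATG"),
    Part("coding sequence", "AATG", "GCTT"),
]
print(assembly_order(parts, "GGAG", "CGCT"))
# ['promoter', 'coding sequence', 'terminator']
```

The order is fixed entirely by matching overhangs, so any laboratory that uses the same fusion sites can exchange parts without redesigning them.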
Figure 1. The common syntax for exchange of interoperable DNA parts for plants
Figure 2. Options for the assembly of multigene constructs
For parts to be compatible with MoClo or GB2.0 systems, they must be free of recognition sites for additional Type IIS enzymes. MoClo and GB2.0 have mutually exclusive advantages for multigene assembly: MoClo allows multiple genes to be assembled in a single step but, to do so, requires an extensive suite of acceptors and end-linkers. In contrast, GB2.0 iteratively assembles transcriptional units from a relatively small plasmid toolkit, but can only assemble two-by-two. Both systems can be used for continuous assembly: MoClo uses ‘Level M’ and ‘Level P’ plasmids and switches between BsaI and BpiI whereas GB2.0 uses ‘α’ and ‘Ω’ plasmids and switches between BsaI and BsmBI (Figure 2). The plant common syntax therefore declares BsaI sites to be illegal and BpiI and BsmBI sites to be undesirable.
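A candidate part can be screened for these sites before it is synthesized or cloned. The following is a minimal sketch, assuming the commonly documented recognition sequences for the three enzymes (BsaI GGTCTC, BpiI GAAGAC, BsmBI CGTCTC), which are not given in the text above. It should be applied to the internal sequence of the part, excluding the flanking BsaI sites used to release it; the function name screen_part is hypothetical.

```python
# Recognition sequences are assumptions drawn from standard enzyme references.
SITES = {
    "BsaI": ("GGTCTC", "illegal"),
    "BpiI": ("GAAGAC", "undesirable"),
    "BsmBI": ("CGTCTC", "undesirable"),
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def screen_part(seq: str) -> dict[str, list[str]]:
    """List enzymes with a recognition site on either strand, grouped by severity."""
    seq = seq.upper()
    report: dict[str, list[str]] = {"illegal": [], "undesirable": []}
    for enzyme, (site, severity) in SITES.items():
        if site in seq or reverse_complement(site) in seq:
            report[severity].append(enzyme)
    return report


# Made-up sequences for illustration.
print(screen_part("ATGGAAGACTTTAA"))  # {'illegal': [], 'undesirable': ['BpiI']}
print(screen_part("ATGGAGACCTAA"))    # {'illegal': ['BsaI'], 'undesirable': []}
```

Sites reported as illegal would need to be removed, for example by a synonymous substitution in a coding sequence, whereas undesirable sites only restrict which multigene assembly systems the part can be used with.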
As well as engendering interoperability in the research community, we also recognized that standards for basic, interoperable parts would enable infrastructure projects such as the establishment of a parts repository for plants and accommodation of the syntax in software packages that facilitate construct design. We also introduced a Universal Acceptor Plasmid (Addgene #63674) in the pSB1C3 plasmid backbone, commonly known as the ‘iGEM shipping backbone’. In principle, parts cloned in this backbone would be compatible with processes at the Registry of Standard Biological Parts (http://parts.igem.org) at the International Genetically Engineered Machine (iGEM) foundation, which currently mainly houses BioBricks, while remaining compatible with Type IIS assembly.
Applications in collaborative research programmes
The authors of the common syntax for plants came from 27 institutes in eight countries, led by researchers from OpenPlant (http://openplant.org), a publicly funded synthetic biology research centre developing foundational technologies, interdisciplinary applications and shared resources for plant synthetic biology. The Plant Engine Network, a European Cooperation in Science and Technology (COST) action, brought together a larger network of scientists that was key to starting the conversation on standards for plants. Authors also included researchers from several large-scale, multi-laboratory plant biotechnology projects that have adopted biological standards and modular assembly methods to facilitate combinatorial DNA assembly and allow consortia members to exchange interoperable parts. Two of these projects aim to improve the photosynthetic capacity of leaves in order to increase the supply of carbohydrates to developing seeds, thus raising the ceiling on maximum yield. Many cultivated cereals, including rice and wheat, use C3 photosynthesis in which ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) catalyses the fixation of atmospheric CO2. At moderate temperatures, this reaction is efficient. However, particularly in countries within the tropical zone where temperatures are higher and population increases are outstripping gains in productivity, it is desirable to increase yield. At temperatures of 30°C and greater, O2 competes with CO2 at the RuBisCO active site and the enzyme also catalyses an oxygenation reaction that results in the loss of 25% of carbon atoms [43–47]. The Realizing Increased Photosynthetic Efficiency (RIPE) project (http://ripe.illinois.edu) aims to increase yield by improving C3 photosynthesis whereas the C4 Rice Project (http://c4rice.irri.org) aims to engineer C3 crops to use C4 photosynthesis, concentrating CO2 around RuBisCO in a cellular compartment relatively free of the competing O2 [48–50].
Another large-scale plant-engineering project aims to improve cereal yields specifically in developing African nations where lack of infrastructure and prohibitive costs make it difficult for farmers to apply fertilizers at the quantities required to sufficiently improve yields. The Engineering Nitrogen Symbiosis for Africa project (ENSA; https://www.ensa.ac.uk) is attempting to transfer the symbiosis found between legumes and rhizobia to cereal crops [51,52]. After becoming established in root nodules, rhizobia use a nitrogenase complex to fix elemental nitrogen into ammonia, which they supply to plants in exchange for organic acids. A large part of the nodulation-signalling pathway is in common with that used for signalling with arbuscular mycorrhizal fungi and already exists in cereals. The ENSA project is therefore aiming to augment this signalling pathway with a mechanism to allow perception of rhizobia in order to establish an association sufficient to provide some fixed nitrogen to the plant.
All of these ambitious projects will eventually require complex, synthetic constructs to be inserted into plant genomes and, potentially, additional modifications of endogenous genes. Current work, however, requires the identification and testing of large numbers of DNA parts and devices, often in different combinations and in experiments performed across collaborating laboratories. This work is being aided by the adoption of engineering principles such as standardization, modularization and interoperability.
Applications in genome engineering
Parts adhering to the plant common syntax have been used to facilitate the construction of plasmid constructs for engineering and mutating endogenous sequences found in plant genomes. Mutations resulting from imperfect repair of double-strand breaks at selected genetic loci induced by RNA-guided Cas9, a protein associated with the clustered regularly interspaced short palindromic repeats (CRISPR) loci that confer adaptive immunity in bacteria and Archaea, have been reported in a wide range of eukaryotic species [53,54]. Requiring only expression of the Cas9 nuclease and a small, easily recoded guide RNA for each genetic target, this technology is easy to implement. Indeed, the ease with which the system could be used for multiplex genome engineering (the targeting of multiple genomic sites in a single experiment) was recognized from the outset. Several methods to achieve the co-expression of multiple guide RNAs have been demonstrated, including the use of ribozymes and transfer RNAs for polycistronic expression [55,56]. The simplest approach, however, is to include a separate transcriptional unit for each guide RNA. Such constructs require the assembly of multiple repeated short fragments, which is problematic for overlap-dependent DNA assembly methods. Type IIS restriction endonucleases have therefore been adopted for the construction of such plasmids [57–60]. Several such experiments have produced standard parts in the plant common syntax in order to utilize existing regulatory parts and multigene assembly systems. There is now a significant toolkit of standard parts that can be applied to multiplex genome editing in a range of diverse plant species [61,62].
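Why repeated guide RNA units trouble overlap-dependent methods can be shown with a toy model. In the sketch below, each unit is built from the same promoter and scaffold and differs only in its spacer, so every unit carries identical terminal sequences and homology alone cannot determine their order. All sequences are arbitrary placeholders rather than real regulatory elements or guide scaffolds, and the 20-nt overlap length and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Placeholder sequences, not real promoter or guide RNA scaffold sequences.
PROMOTER = "ACGTTGCAAGCTTGACCTAGGATC"
SCAFFOLD = "TTGACCGGTAACGTTAGCCATGCA"


def guide_unit(spacer: str) -> str:
    """One guide RNA transcriptional unit: promoter, target-specific spacer, scaffold."""
    return PROMOTER + spacer.upper() + SCAFFOLD


def shared_termini(units: dict[str, str], overlap: int = 20) -> dict[str, list[str]]:
    """Group unit names by identical terminal sequences of length `overlap`."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, seq in units.items():
        groups["5' " + seq[:overlap]].append(name)
        groups["3' " + seq[-overlap:]].append(name)
    # Keep only termini that occur in more than one unit.
    return {end: names for end, names in groups.items() if len(names) > 1}


# Three hypothetical 20-nt spacers.
spacers = ["GATTACAGATTACAGATTAC", "CCGGAATTCCGGAATTCCGG", "TGCATGCATGCATGCATGCA"]
units = {f"guide{i}": guide_unit(s) for i, s in enumerate(spacers, start=1)}
for end, names in shared_termini(units).items():
    print(end, names)
```

Both termini are reported as shared by all three units. In a Type IIS assembly the order is instead set by short, designer-chosen fusion sites flanking each unit, so the repeated internal sequence does not matter.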
Storing and sharing standard parts
The more that any individual standard part is used and the more diverse the systems in which its function is tested, the more valuable it becomes. The ideal, therefore, is to make biological parts, along with data on their functionality, accessible to a community of users who can collect further data. At present the Registry of Standard Biological Parts at the iGEM foundation and Addgene (https://www.addgene.org), a non-profit organization that archives and distributes plasmids from academic laboratories, are the largest repositories of biological parts, and both operate in every world region. The iGEM registry mainly houses sequence and characterization data for BioBricks generated in its annual collegiate synthetic biology competition. The Addgene database houses the sequences of plasmids from academic laboratories, linking to the peer-reviewed publications that describe their creation. Addgene does not currently curate compatibility with DNA assembly standards, house performance data or link plasmids to studies or data produced by users other than the initial depositor. For biological standards to fulfil their promise and for their adopters to reap the benefits of interoperable and exchangeable characterized parts, communities of users need access to sequence and functional data for catalogues of parts. There is also a pressing need for standards and infrastructure for recording new data collected by subsequent users.
This work was supported by the U.K. Biotechnology and Biological Sciences Research Council (BBSRC) and Engineering and Physical Sciences Research Council (EPSRC) Synthetic Biology Research Centre ‘OpenPlant’ award [grant number BB/L014130/1]; and The Gatsby Foundation.
Synthetic Biology UK 2015: Held at Kingsway Hall Hotel, London, U.K., 1–3 September 2015
Present address: The Genome Analysis Centre, Norwich Science Park, Norwich, NR4 7UH and The John Innes Centre, Norwich Science Park, Norwich, Norfolk, NR4 7UH.