A synthetic biology workflow is composed of data repositories that provide information about genetic parts, sequence-level design tools to compose these parts into circuits, visualization tools to depict these designs, genetic design tools to select parts to create systems, and modeling and simulation tools to evaluate alternative design choices. Data standards enable the ready exchange of information within such a workflow, allowing repositories and tools to be connected from a diversity of sources. The present paper describes one such workflow that utilizes, among others, the Synthetic Biology Open Language (SBOL) to describe genetic designs, the Systems Biology Markup Language to model these designs, and SBOL Visual to visualize these designs. We describe how a standard-enabled workflow can be used to produce types of design information, including multiple repositories and software tools exchanging information using a variety of data standards. Recently, the ACS Synthetic Biology journal has recommended the use of SBOL in their publications.
Reproducibility is a critical and growing issue in synthetic biology. Substantial effort is often required to design a new biological system, with input from many researchers with different backgrounds, including biology, mathematics, computer science, physics, and chemistry. Extracting information in order to reuse or build upon the contributions made by these researchers, however, is often extremely challenging. At present, information about genetic circuit design is often incomplete or buried in textual descriptions. Even scientific publications often fail to fully convey this information: designs are often available only as visual depictions that provide abstract representations or as unannotated sequences, and frequently some of the genes or gene products are not even captured, making it nearly impossible to reuse these designs. Capturing DNA sequences is key as a first step, but this information may also not be available, may require lengthy and error-prone manual lookups based on gene identifiers, or may only be derivable by search and extraction of the partial sequences given in forward and reverse primers. Even then, deriving exact sequences of designs may be impossible when full information about the final design, such as its exact assembly process, cloning strategy, or the spacer sequences between constituent genes and their components, is not clearly specified.
Further complicating matters, experimental measurements may vary between different laboratories due to the differences in sequences, chassis organisms, or the lack of information about experimental conditions. Even single nucleotide differences between sequences in the design itself or the host chassis can significantly change the functionality of genetic circuits. Notably, modifications in non-coding sequences can strongly affect the rates of transcription and translation processes, resulting in unexpected behaviors [2–5]. As the scale and the complexity of designs increase, these problems bring more challenges.
As synthetic biology continues to develop as an engineering discipline, practitioners are grappling with these problems and adopting the same sort of strategies that enable management of complexity in every mature engineering discipline, such as standardization, abstraction, modularity, and automation. Applications are created through design-build-test cycles and automation is key to achieving faster cycles for commercialization. There are already a wide variety of available computational tools that can be used in different stages of design, manufacturing, testing, and analysis. Often, tools are specialized in performing specific functions, and synthetic biology engineers need to flexibly co-ordinate the operation of these tools in complex design workflows. As a result, computational access to and exchange of information without any loss is crucial. Finally, the use of standards to capture design information also enhances reproducibility and reusability, effectively allowing the products of one workflow to be consumed by other workflows. Computational access particularly facilitates the storage and retrieval of these designs, making them ever more accessible. Practitioners can therefore more easily find designs that are created by other practitioners in a timely manner, make modifications or reuse them, and electronically share their new designs and data.
Synthetic Biology Open Language
The Synthetic Biology Open Language (SBOL) is one of the key technologies that can support the emerging standards-driven approach to synthetic biology engineering workflows. SBOL is a free and open community standard for the description and exchange of biological designs, supported by a diverse international community of researchers. This standard provides a ‘common core’ set of relatively abstract representations of biological structure, function, and sequence, with a focus on abstraction and composition, and is broadly applicable across a wide range of workflow elements. Critically, SBOL also supports machine-interpretable links between this shared core and more specialized representations, such as numerical models, protocol automation scripts, LIMS tracking, and measurement data, allowing SBOL to serve as a ‘hub’ for linking together a wide range of more specialized tools and processes without loss of information as shown in Figure 1.
Central role of SBOL in the synthetic biology design, build, test cycle.
The development of SBOL was motivated by the shortcomings of prior standards, such as FASTA and GenBank, with respect to describing the engineering of biological systems. These prior standards focus on the recording and annotation of natural nucleic acid or protein sequence data, which have different challenges and requirements from those of the engineering of novel human-designed biological constructs. For example, the description of engineered systems requires the representation of the abstraction and composition of (at least partially) modular components. To serve these needs, in 2008 the SBOL community developed an initial draft standard called PoBol, which evolved into the SBOL 1 standard [10,11], focusing on the genetic structure of engineered DNA sequences. SBOL 1 recently evolved into the SBOL 2 standard [12,13], which represents both the structure and function of genetic designs as depicted in Figure 2.
SBOL extends beyond prior sequence-centric formats like FASTA and GenBank to enable modular, hierarchical representations of both structure and function of a genetic design.
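As a rough illustration of what a typed, linked representation adds over a flat sequence record, the sketch below parses a small hand-written SBOL 2 RDF/XML fragment using only the Python standard library. The part name, the example.org URIs, and the bases are invented for illustration; the namespace, the DnaRegion type, and the Sequence Ontology role URI follow the SBOL 2 conventions as we understand them. In practice an SBOL library would normally be used instead of raw XML parsing.

```python
import xml.etree.ElementTree as ET

# Namespaces used by SBOL 2 RDF/XML serializations.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "sbol": "http://sbols.org/v2#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Hand-written example: one promoter part linked to its sequence.
# The example.org URIs, display IDs, and bases are hypothetical.
SBOL_DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:sbol="http://sbols.org/v2#">
  <sbol:ComponentDefinition rdf:about="http://example.org/pExample/1">
    <sbol:displayId>pExample</sbol:displayId>
    <sbol:type rdf:resource="http://www.biopax.org/release/biopax-level3.owl#DnaRegion"/>
    <sbol:role rdf:resource="http://identifiers.org/so/SO:0000167"/>
    <sbol:sequence rdf:resource="http://example.org/pExample_seq/1"/>
  </sbol:ComponentDefinition>
  <sbol:Sequence rdf:about="http://example.org/pExample_seq/1">
    <sbol:displayId>pExample_seq</sbol:displayId>
    <sbol:elements>ttgacagctagctcagtcctaggtataatgctagc</sbol:elements>
    <sbol:encoding rdf:resource="http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html"/>
  </sbol:Sequence>
</rdf:RDF>
"""


def summarize_parts(xml_text):
    """Return one dict per ComponentDefinition: id, role URIs, and sequence."""
    root = ET.fromstring(xml_text)
    # Index sequences by URI so that parts can be resolved to their bases.
    sequences = {
        seq.get(RDF_ABOUT): seq.findtext("sbol:elements", namespaces=NS)
        for seq in root.findall("sbol:Sequence", NS)
    }
    parts = []
    for cd in root.findall("sbol:ComponentDefinition", NS):
        seq_ref = cd.find("sbol:sequence", NS)
        seq_uri = seq_ref.get(RDF_RESOURCE) if seq_ref is not None else None
        parts.append({
            "id": cd.findtext("sbol:displayId", namespaces=NS),
            "roles": [r.get(RDF_RESOURCE) for r in cd.findall("sbol:role", NS)],
            "sequence": sequences.get(seq_uri),
        })
    return parts


parts = summarize_parts(SBOL_DOC)
```

Unlike a FASTA record, the part here carries a machine-readable role and a link to a separately identified sequence object, which is what allows the same sequence or part to be reused across designs.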
Complementary to this data model, the SBOL Visual standard provides a common visual language for communication about engineering biological constructs, much as diagram languages for electrical engineering [14,15] and architecture [16,17] do in those fields. SBOL Visual (SBOLv) [18,19] enables diagrams for SBOL 1 constructs and is in the process of being extended and integrated with Systems Biology Graphical Notation (SBGN) to support the functional representations of SBOL 2 as well. SBOL Visual is formally related to the SBOL data representation by means of the Sequence Ontology (SO), which is used by the SBOL data model to designate the roles of components as shown in Figure 3. Namely, each glyph in SBOL Visual is mapped to one or more ontology terms, enabling automatic computational mapping from SBOL data models to diagrams, by selecting for each component the most specific glyph whose term covers the component's role or roles and by organizing these glyphs according to the sequence and order relationships specified in the data model.
Link between SBOLv and SBOL data.
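The glyph selection rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used by any particular tool: the Sequence Ontology identifiers are real, but the is_a hierarchy is flattened for brevity, and the glyph names and function names are hypothetical.

```python
# Toy is_a hierarchy over Sequence Ontology terms (child -> parent).
# The term IDs are real SO identifiers, but the hierarchy is flattened
# for illustration and is not a faithful copy of the ontology.
SO_PARENT = {
    "SO:0000167": "SO:0005836",  # promoter -> regulatory_region
    "SO:0000139": "SO:0005836",  # ribosome_entry_site -> regulatory_region
    "SO:0000141": "SO:0005836",  # terminator -> regulatory_region
    "SO:0005836": "SO:0000110",  # regulatory_region -> sequence_feature
    "SO:0000316": "SO:0000110",  # CDS -> sequence_feature
}

# Hypothetical glyph table: ontology term -> glyph name.
GLYPH_FOR_TERM = {
    "SO:0000167": "promoter",
    "SO:0000316": "cds",
    "SO:0000141": "terminator",
    "SO:0000110": "unspecified",  # generic fallback glyph
}


def ancestors(term):
    """Yield the term itself, then each ancestor, most specific first."""
    while term is not None:
        yield term
        term = SO_PARENT.get(term)


def glyph_for_roles(roles):
    """Pick the glyph whose term is the fewest is_a steps above any role."""
    best = None  # (steps up the hierarchy, glyph name)
    for role in roles:
        for steps, term in enumerate(ancestors(role)):
            if term in GLYPH_FOR_TERM:
                if best is None or steps < best[0]:
                    best = (steps, GLYPH_FOR_TERM[term])
                break  # the first hit is the most specific for this role
    return best[1] if best else None


def diagram(components):
    """components: list of (start position, role list); glyphs in 5'-to-3' order."""
    ordered = sorted(components, key=lambda c: c[0])
    return [glyph_for_roles(roles) for _, roles in ordered]
```

For example, a component annotated only as a ribosome entry site has no glyph of its own in this toy table, so the lookup walks up the hierarchy and falls back to the generic glyph.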
Supporting reproduction and reuse with SBOL
To support effective reproduction and reuse, practitioners must not only have the capability to represent information about engineered biological organisms, but must also use those capabilities to encode enough information of the right types to enable others to reproduce or build upon their results. In mature engineering fields, this typically takes the form of formalized datasheets, such as the component datasheets used in electronics or CAD components used in mechanical engineering. Although biological organism engineering aspires to this level of rigor, in practice the field has not yet attained that level of maturity. In other areas of biology, the challenges of reproduction and reuse are addressed with a variety of minimum information standards, which aim to at least ensure that enough information on protocol and context is included that a practitioner can determine whether an attempt to reproduce or reuse works as expected. For example, Minimum Information About a Microarray Experiment establishes minimum information standards for reporting on microarray experiments, and MIFlowCyt establishes minimum information standards for reporting on flow cytometry experiments. By making it easier to compare the products of different efforts, such minimum information standards have significantly improved data quality and accelerated discovery in the areas in which they have been established.
Similarly, reproduction and reuse of genetic constructs could be accelerated by establishing a reporting standard for the minimum information about a genetic construct. Such minimum information about a genetic construct or collection of constructs needs to include at least the following:
The full sequence of all of the ‘base’ components used in a genetic construct. For example, a library made by combining pairs of promoters and coding sequences would need to include the full sequence of every promoter and every coding sequence.
Information sufficient to unambiguously determine the sequence of every complete construct. For example, the promoter/coding-sequence library would record all combinations made, but not necessarily the sequence of each combination, if that can be determined from the combination and the sequences of the individual components.
Identification of the role played by each significant designed feature. For example, explicitly recording that each promoter is, in fact, a promoter.
Identification of identities between construct components, such as by the composition of subcomponents. For example, it should be easy to tell if two promoter/coding-sequence constructs share the same promoter.
The assembly method used, if any, for composing smaller components into larger components, and any effects this is expected to have on the resulting sequence.
Any required additional modifications of the base sequence, such as methylation.
The vector or integration point used for transformation of the host organism. For example, a plasmid used to deliver a construct to bacteria, or the location targeted for CRISPR-based integration into a chromosome.
An unambiguous identification of the host organism for the construct, sufficient for determining genome and other relevant features.
The core representations of SBOL readily support most of this information, while the remainder can be linked to SBOL representations via the annotation mechanisms provided by SBOL, and an effort is ongoing within the SBOL community to formalize these recommendations.
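To make the checklist concrete, the following sketch records the promoter/coding-sequence library example as a small data structure and reports which items are still missing. It is an illustration only: the class names, field names, and sequences are hypothetical, and it assumes scarless assembly so that a construct's sequence is the plain concatenation of its parts.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    name: str
    role: str       # designed role, e.g. "promoter" or "CDS"
    sequence: str   # full base sequence of the part


@dataclass
class ConstructLibrary:
    parts: dict                 # part name -> Part
    combinations: list          # tuples of part names in 5'-to-3' order
    assembly_method: str = ""
    modifications: list = field(default_factory=list)  # e.g. ["methylation"]
    vector_or_integration_site: str = ""
    host_organism: str = ""

    def sequence_of(self, combination):
        # Assumption: scarless assembly, so the construct is a plain
        # concatenation of its parts. Real methods may add scars or spacers.
        return "".join(self.parts[name].sequence for name in combination)

    def shared_parts(self, combo_a, combo_b):
        # Parts are shared if both constructs reference the same part name.
        return set(combo_a) & set(combo_b)

    def missing_information(self):
        problems = []
        for name, part in self.parts.items():
            if not part.sequence:
                problems.append(f"part {name}: no sequence")
            if not part.role:
                problems.append(f"part {name}: no role")
        for combo in self.combinations:
            for name in combo:
                if name not in self.parts:
                    problems.append(f"construct {combo}: unknown part {name}")
        for attr in ("assembly_method", "vector_or_integration_site", "host_organism"):
            if not getattr(self, attr):
                problems.append(f"{attr} not specified")
        return problems


# Toy library; all names and sequences are made up.
library = ConstructLibrary(
    parts={
        "pA": Part("pA", "promoter", "TTGACA"),
        "pB": Part("pB", "promoter", "TATAAT"),
        "gfp": Part("gfp", "CDS", "ATGAGC"),
        "rfp": Part("rfp", "CDS", "ATGGCG"),
    },
    combinations=[("pA", "gfp"), ("pA", "rfp"), ("pB", "gfp")],
    assembly_method="scarless (assumed)",
    host_organism="Escherichia coli (strain omitted in this toy example)",
)
```

Here only the combinations are stored, and each full sequence is derived on demand, mirroring the point that the sequence of every construct need not be recorded if it can be determined from its components.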
Already, journals have shown interest in using SBOL to improve the ability of readers to reproduce and reuse elements of the papers they publish. In 2016, ACS Synthetic Biology became the first journal to formally embrace SBOL as a means of enhancing reproduction and reuse of synthetic biology research, with a workflow including validation and review of submitted designs and their deposit into a design repository linked with the paper and with interfaces for access by both humans and genetic design automation tooling as shown in Figure 4. As minimum information standards are established and adopted, they can integrate with such workflows in order to improve the ability of the research community to reproduce and to build upon one another's results. In parallel, we may expect such standards to provide a basis for the development of a wide variety of new capabilities, services, and business models in the industrial community, much as shared standards have already done in other communities, such as software, electronics, and mechanical systems.
ACS Synthetic Biology workflow for integration of published articles with machine-readable SBOL representations of the biological constructs described by those articles.
Software support for SBOL
Leveraging these libraries, many software applications that support the SBOL standard have been developed, as illustrated in Table 1. These tools can be loosely divided into data repositories for storing genetic design information, sequence editors, visualization tools, genetic design compilers, and modeling and simulation tools. Many of these applications actually cover more than one of these functions. While most of these tools support either SBOL 1 or SBOLv, an increasing number of tools supporting SBOL 2 are being released. The rest of this section provides a brief description of some of these software tools. More detailed descriptions can be found in Supplementary Material.
An up-to-date list is maintained at http://sbolstandard.org. The Function column indicates whether the tool is a (R)epository, (S)equence design tool, (G)enetic circuit design tool, (M)odeling and simulation tool, or a (V)isualization tool. The SBOL column indicates whether it supports SBOL (1), (2), or (v)isual.
Several data repositories have been developed that can store genetic design information using the SBOL data standard. ICE is an open-source software tool that provides a web-based platform to register and manage DNA parts, and an instance of this platform is used as the ACS Synthetic Biology Registry. SynBioHub is an open-source repository built upon the SBOL Stack RDF database back-end, and it provides both a user-friendly web-based front-end and programmatic access via either libSBOLj or a RESTful API. SBOLme is a web-based open-access repository that has recently been developed to promote the use of SBOL for metabolic engineering applications. The first release of SBOLme contains annotated SBOL parts of 28 437 chemical compounds, 6883 enzyme classes, 9909 metabolic reactions, and 3 173 238 proteins from 3908 different organisms. Finally, the Virtual Parts Repository supports CAD tools by providing readily accessible modular and reusable models of biological components that can be individually joined together for simulation.
Sequence editors are software tools for the design of DNA, RNA, and/or protein sequences. The task of designing sequences incorporates the manipulation, composition, and annotation of sequences. There are many tools developed or being developed with these functions; we highlight here a few with the best SBOL support, while more are described in Supplementary Material. Eugene enables the specification of rules in order to automatically enumerate composited designs based on biological knowledge. The Joint BioEnergy Institute (JBEI) develops DeviceEditor to visually design combinatorial DNA constructs based on part types (e.g. promoter, CDS, and terminator), VectorEditor for a graphical preview of the design, and j5 for DNA assembly design automation. SBOLDesigner is a modular sequence design tool that combines the SBOL 2 data model with SBOLv symbols to construct genetic designs hierarchically using parts fetched from SBOL data repositories. The Build-Optimization Software Tools (BOOST) enable the design of DNA sequences in order to maximize the success rate of their synthesis via codon optimization, verification of sequence constraints, and decomposition into synthesizable blocks.
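The general idea of rule-constrained combinatorial design can be sketched as follows. This is plain Python rather than Eugene syntax, and the part names, template, and rule are all hypothetical; it only illustrates enumerating every combination of parts by type and discarding those that violate a rule.

```python
from itertools import product

# Hypothetical part library, grouped by part type.
LIBRARY = {
    "promoter": ["pStrong", "pWeak"],
    "rbs": ["rbsA", "rbsB"],
    "cds": ["gfp", "rfp"],
    "terminator": ["t1"],
}

# Order of part types along the construct, 5' to 3'.
TEMPLATE = ["promoter", "rbs", "cds", "terminator"]


def no_strong_promoter_with_rbs_a(design):
    """Example rule (made up): pStrong must not be combined with rbsA."""
    return not ("pStrong" in design and "rbsA" in design)


def enumerate_designs(template, library, rules):
    """Yield every combination of parts that satisfies all rules."""
    for design in product(*(library[part_type] for part_type in template)):
        if all(rule(design) for rule in rules):
            yield design


designs = list(enumerate_designs(TEMPLATE, LIBRARY, [no_strong_promoter_with_rbs_a]))
```

Without rules this toy library yields eight designs; the single rule removes the two that pair pStrong with rbsA.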
SBOLv defines a set of agreed symbols to denote commonly used genetic elements and best practices for how biological designs should be visualized. Many point-and-click genetic design tools have adopted these symbols (see Table 1), and several dedicated pieces of software are now available to simplify the process of generating compliant diagrams. One of the first tools to help automate the production of standardized SBOLv diagrams was Pigeon, which converts a textual input description of a genetic construct into a diagram where each part is represented by its associated SBOLv symbol. Highly customized SBOLv diagrams can be created by using the DNAplotlib computational toolkit. VisBOL is a web-based tool that in addition to supporting the Pigeon syntax can also convert directly from an SBOL 2 document into SBOLv symbols. Finally, SBOL Visual symbols have been adopted into the widely used general graph visualization toolkit, Graphviz.
Genetic circuit design involves constructing biological systems that implement logical functions similar to those found in electronic circuits. Circuit designers usually build circuits by connecting parts or modules found in a library to form larger and more complex constructs. Many tools have been developed that attempt to assist engineers in genetic circuit design. Proto BioCompiler takes in specifications of computations, transforms them into a data-flow representation of the computation to be carried out by the biological organism, then selects parts from a genetic library, and finally optimizes the circuit design. iBioSim adapts a graph-based technology mapping procedure from digital electronic circuit design to map a specified genetic regulatory model into a network of genetic gates specified using SBOL. Finally, Cello provides a platform where users can describe the desired function of their genetic circuit using Verilog, a hardware description language commonly used to specify electronic circuits, and then translate it into a directed acyclic graph of connected 2-input NOR and NOT gates implementing the logic.
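The translation of a logic specification into NOR and NOT gates can be illustrated with a small rewriting sketch. This is not Cello's algorithm or input format: expressions are given as nested Python tuples rather than Verilog, the function names are hypothetical, and no gate-count optimization is attempted beyond cancelling double negations. It relies only on the identities a OR b = NOT(NOR(a, b)) and a AND b = NOR(NOT a, NOT b).

```python
from itertools import product


def _negate(expr):
    """Wrap expr in NOT, cancelling a double negation if present."""
    if isinstance(expr, tuple) and expr[0] == "not":
        return expr[1]
    return ("not", expr)


def to_nor(expr):
    """Rewrite and/or/not expression trees to use only nor and not nodes."""
    if isinstance(expr, str):      # an input signal name
        return expr
    op, *args = expr
    args = [to_nor(a) for a in args]
    if op == "not":
        return _negate(args[0])
    if op == "or":                 # a OR b == NOT(NOR(a, b))
        return _negate(("nor", args[0], args[1]))
    if op == "and":                # a AND b == NOR(NOT a, NOT b)
        return ("nor", _negate(args[0]), _negate(args[1]))
    raise ValueError(f"unsupported operator: {op}")


def evaluate(expr, env):
    """Evaluate an expression tree given a dict of input values."""
    if isinstance(expr, str):
        return env[expr]
    op, *args = expr
    vals = [evaluate(a, env) for a in args]
    if op == "not":
        return not vals[0]
    if op == "and":
        return vals[0] and vals[1]
    if op == "or":
        return vals[0] or vals[1]
    if op == "nor":
        return not (vals[0] or vals[1])
    raise ValueError(f"unsupported operator: {op}")


def gates_used(expr):
    """Return the set of gate types appearing in an expression tree."""
    if isinstance(expr, str):
        return set()
    op, *args = expr
    return {op}.union(*(gates_used(a) for a in args))


def equivalent(expr_a, expr_b, inputs):
    """Check two expressions agree on every combination of input values."""
    return all(
        evaluate(expr_a, dict(zip(inputs, values)))
        == evaluate(expr_b, dict(zip(inputs, values)))
        for values in product([False, True], repeat=len(inputs))
    )


# Example: XOR expressed with and/or/not, then rewritten.
xor = ("or", ("and", "a", ("not", "b")), ("and", ("not", "a"), "b"))
xor_nor = to_nor(xor)
```

The rewritten tree forms a directed acyclic graph of NOR and NOT nodes, and the truth-table comparison confirms that it computes the same function as the original expression.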
Finally, SBOL allows for the association of genetic circuit designs with computational models. The most commonly used data standard for models of biological systems is the Systems Biology Markup Language (SBML). SBML models can be analyzed using a large selection of different analysis methods including deterministic and stochastic simulation, flux balance analysis, and stochastic model checking. To facilitate the construction of SBML models, a converter from SBOL into SBML has been developed. It is also possible to begin with an SBML model annotated with SBOL and produce an SBOL description for the genetic design. Given an SBML model for a genetic design, it is then possible to analyze this model using a variety of SBML modeling tools including those optimized for genetic circuit design, such as iBioSim [50,51], Tellurium, and TinkerCell.
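To illustrate the difference between deterministic and stochastic simulation, the sketch below simulates the simplest model of gene expression, constitutive production at rate k and first-order degradation at rate d, in both ways. It does not read SBML or use any of the tools named above; the model, parameter values, and function names are assumptions chosen for illustration, with forward Euler integration for the deterministic case and Gillespie's direct method for the stochastic case.

```python
import random


def simulate_ode(k, d, t_end, dt=0.01):
    """Forward Euler integration of dP/dt = k - d * P, starting from P = 0."""
    p = 0.0
    for _ in range(int(round(t_end / dt))):
        p += dt * (k - d * p)
    return p


def simulate_ssa(k, d, t_end, seed=0):
    """Gillespie direct method for the same model; returns the final count."""
    rng = random.Random(seed)
    t, n = 0.0, 0
    while True:
        a_prod = k          # propensity of the production reaction
        a_deg = d * n       # propensity of the degradation reaction
        a_total = a_prod + a_deg
        t += rng.expovariate(a_total)   # waiting time to the next reaction
        if t > t_end:
            return n
        if rng.random() * a_total < a_prod:
            n += 1          # production fired
        else:
            n -= 1          # degradation fired


# Hypothetical parameters: steady state k / d = 100 molecules.
deterministic = simulate_ode(k=10.0, d=0.1, t_end=100.0)
stochastic = simulate_ssa(k=10.0, d=0.1, t_end=100.0, seed=1)
```

The deterministic run converges to the steady state k/d, whereas each stochastic run ends at an integer molecule count that fluctuates around that value.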
A standard-enabled workflow for synthetic biology
A key design principle in the development of SBOL is that it would not attempt to cover all aspects of genetic design, but rather it would leverage existing standards whenever possible. A key example of this is the use of SBML for modeling. To pursue this goal, SBOL recently joined the COMBINE (COmputational Modeling in BIology NEtwork) community of standards. COMBINE is an open community initiative to co-ordinate the development of standards and formats for systems and synthetic biology. Figure 5 depicts a complete synthetic biology computational design workflow that leverages COMBINE standards. This workflow assumes that data required for design must come from a variety of data repositories. While some are SBOL repositories, others store their information in other formats such as GenBank or BioPAX, another COMBINE standard. Converters can be utilized to translate this knowledge into SBOL to be utilized during sequence design using any of the sequence editors and visualization tools described earlier. Next, genetic modeling, analysis, and design tools can be utilized to construct and evaluate complete genetic designs. These models would be constructed using a COMBINE modeling language such as SBML or CellML, and their analyses should be encoded using the Simulation Experiment Description Markup Language (SED-ML). Next, SBOLv only represents DNA constructs, so a visualization standard such as SBGN could be leveraged to represent the biochemical aspects of the design. Finally, each of these files can be packaged together, shared, and distributed using a COMBINE Archive. Throughout, the data conversions required by this standard-enabled workflow are enabled by the use of common ontologies, such as the BioPAX Ontology, the SO, and the Systems Biology Ontology (SBO) with URIs taken from identifiers.org, whenever possible.
A standard-enabled workflow for synthetic biology using COMBINE standards.
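The final packaging step can be sketched with the Python standard library, since a COMBINE Archive is a ZIP file containing a manifest that lists each entry and its format. The file names and contents below are placeholders, and the format identifiers follow the identifiers.org/combine.specifications pattern as we understand it; the exact values should be checked against the COMBINE Archive specification before use.

```python
import io
import zipfile
from xml.sax.saxutils import quoteattr

# Assumed prefix for format identifiers; verify against the specification.
SPEC = "http://identifiers.org/combine.specifications/"


def build_archive(files):
    """files: dict mapping file name -> (format key, bytes). Returns ZIP bytes."""
    # The manifest lists the archive itself, the manifest, and every file.
    entries = [(".", SPEC + "omex"), ("./manifest.xml", SPEC + "omex-manifest")]
    entries += [("./" + name, SPEC + fmt) for name, (fmt, _) in files.items()]

    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<omexManifest xmlns="{SPEC}omex-manifest">']
    lines += [f"  <content location={quoteattr(loc)} format={quoteattr(fmt)}/>"
              for loc, fmt in entries]
    lines.append("</omexManifest>")

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.xml", "\n".join(lines))
        for name, (_, data) in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# Placeholder contents standing in for real design, model, and experiment files.
archive_bytes = build_archive({
    "design.xml": ("sbol", b"<!-- SBOL design would go here -->"),
    "model.xml": ("sbml", b"<!-- SBML model would go here -->"),
    "experiment.sedml": ("sed-ml", b"<!-- SED-ML description would go here -->"),
})
```

Writing `archive_bytes` to a file with an `.omex` extension would yield a single shareable package holding the design, its model, and the simulation description together.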
Standards are an important enabler for data sharing and reproducibility in synthetic biology. Collaborations within the COMBINE community are essential to create new workflows enabled by these standards. The ultimate goal of these collaborations is a complete standard-enabled workflow for synthetic biology. For more information about SBOL, please see our website, http://www.sbolstandard.org/, and our YouTube channel, which includes several demonstrations of the standard-enabled workflow that we are developing.
API, application programming interface
BOOST, Build-Optimization Software Tools
COMBINE, COmputational Modeling in BIology NEtwork
CRISPR, clustered regularly interspaced short palindromic repeats
JBEI, Joint BioEnergy Institute
LIMS, Laboratory Information Management System
RDF, resource description framework
SBGN, Systems Biology Graphical Notation
SBML, Systems Biology Markup Language
SBOL, Synthetic Biology Open Language
SED-ML, Simulation Experiment Description Markup Language
This material is based on work supported by the National Science Foundation under grant nos CCF-1218095 and DBI-135604. T.E.G. is supported by BrisSynBio, a Biotechnology and Biological Sciences Research Council and Engineering and Physical Sciences Research Council Synthetic Biology Research Centre [BB/L01386X/1]. G.M. and A.W. have been supported by the Engineering and Physical Sciences Research Council (EPSRC) [grant EP/J02175X/1]. J.A.M. is supported by FUJIFILM DioSynth Biotechnologies. J.B. is supported, in part, by the National Science Foundation Expeditions in Computing Program Award #1522074 as part of the Living Computing Project. E.O. is supported under Contract No. DE-AC02-05CH11231 by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility. This document does not contain technology or technical data controlled under either the U.S. International Traffic in Arms Regulations or the U.S. Export Administration Regulations.
The Authors declare that there are no competing interests associated with the manuscript.