The discipline of molecular biology has become increasingly important in recent times for the process of drug discovery. We describe the impact of molecular biology across the whole process of drug discovery and development, including (i) the identification and validation of new drug targets, (ii) the development of molecular screens to find new candidate drugs, and (iii) the generation of safety data and competences leading to enhanced clinical efficacy. We also speculate on emerging developments in drug discovery where it seems likely that molecular biology will play an even more vital role in the generation of future therapies.
From the mid-1970s when the first human genes were cloned, molecular biology has promised to transform the generation of new medicines. The last 30 years have seen not only the complete integration of this discipline into modern small-molecule drug discovery, but also the rise of a whole range of novel therapeutics.
Recent advances in automation, miniaturization and bioinformatics have moved molecular biology from the analysis of one gene at a time to an understanding of whole genomes and biological systems. Similar advances have transformed the early part of drug discovery into a rapid, massively parallel process where millions of potential therapeutics can be evaluated for multiple biological activities in a few weeks. In the last decade, a powerful synergy has developed between the high-throughput methods that typify modern molecular biology and those used in drug discovery, and these will be discussed in this review.
Figure 1 is a schematic diagram of modern drug discovery and identifies points in the process where molecular biology plays a significant role.
The impact of molecular biology on the drug-discovery process
Identification and validation of new drug targets
For most of the 20th century, drug discovery was based on observation of the pharmacological properties of compounds and did not require knowledge of the molecular identity of the drug target. In contrast, the modern process typically starts by identifying potential targets that participate in the disease process. The completion of the Human Genome Project has vastly increased the number of potential drug targets from the 500 or so that existed in 1996. However, with 20000–25000 possible candidates, target selection now represents a significant challenge. Microarrays (e.g. Affymetrix GeneChip® microarrays) have, with increasing fidelity, defined the expressed portion of the genome and allowed those genes dysregulated in disease to be identified. In a similar way, proteomic methods can identify abnormally expressed proteins; however, both approaches share the problem that the observed changes may be a consequence rather than a cause of the disease process. Bioinformatics can help to sift the terabytes of data in public and proprietary databases and to extrapolate protein function in order to identify the most plausible candidates. Nevertheless, considerable experimental data may still be required to fully validate candidate targets identified in this way.
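As an illustration of how expression data can be used to flag dysregulated genes, the following minimal Python sketch compares mean expression between control and disease samples and keeps genes whose log2 fold change exceeds a threshold. The gene names, intensity values, function name and the two-fold threshold are all hypothetical and chosen only for illustration; a real analysis would also apply normalization, statistical testing and multiple-testing correction, none of which is shown here.

```python
import math
from statistics import mean


def dysregulated_genes(control, disease, min_abs_log2_fc=1.0):
    """Return {gene: log2 fold change} for genes whose mean expression in
    disease differs from control by at least the given log2 threshold."""
    hits = {}
    # Only genes measured in both groups can be compared.
    for gene in control.keys() & disease.keys():
        log2_fc = math.log2(mean(disease[gene]) / mean(control[gene]))
        if abs(log2_fc) >= min_abs_log2_fc:
            hits[gene] = log2_fc
    return hits


# Hypothetical normalized intensities (arbitrary units), three samples per group.
control = {
    "GENE_A": [100.0, 110.0, 90.0],
    "GENE_B": [200.0, 210.0, 190.0],
    "GENE_C": [80.0, 80.0, 80.0],
}
disease = {
    "GENE_A": [400.0, 420.0, 380.0],  # raised in disease
    "GENE_B": [205.0, 195.0, 200.0],  # unchanged
    "GENE_C": [20.0, 20.0, 20.0],     # lowered in disease
}

hits = dysregulated_genes(control, disease)
for gene in sorted(hits):
    print(f"{gene}: log2 fold change = {hits[gene]:+.2f}")
```

Note that a gene passing such a filter is only associated with the disease state; as discussed above, the change may be a consequence rather than a cause.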
More focused methods that analyse smaller numbers of gene products, and that have also benefited from advances in automation and miniaturization, continue to be used to identify potential new drug targets. For example, approaches making use of yeast two-hybrid experiments and MS can be used to explore signalling pathways that are important in disease and to identify points that are amenable to therapeutic intervention. The last century of drug discovery has identified many compounds with interesting biological activities, some with drug-like properties, where the molecular target remains unknown. Proteomic methods, such as high-resolution MS coupled with photoaffinity labelling, can not only identify novel drug targets but also provide new insights into biological processes. Finally, human genetics has started to make an impact on the identification of the multiple genes that contribute to common diseases. Although this approach remains lengthy and costly, and only a limited number of potential targets have been identified in this way, it can provide the strongest causal link between a gene and disease.
With the huge increase in potential targets has come an increasing demand to establish that a potential target is plausibly involved in the disease process. Even the very first stages of drug discovery are expensive, and no organization can afford to flood its drug-hunting capability with targets that have little association with the disease. The process of target validation has become an established discipline, and its objective is to show that modulating the function of a given gene product has a potential therapeutic outcome. These experiments are typically performed in vitro at an early stage, but move to in vivo studies, which provide increasing confidence in disease linkage, as the project becomes more advanced. Modulation of the gene in vitro can be achieved in a number of ways that mimic the effect of a drug, for example by using antisense or siRNA (short interfering RNA) to knock down the RNA, or antibodies or aptamers to reduce the function of the protein. Some of these methods can also be used in vivo, and a significant improvement in our understanding of gene function has come from mouse transgenics (gene addition) and knockouts (gene deletion).
Molecular screens to find new candidate drugs
Target-driven drug-discovery programmes have changed the way in which we perform large-scale screening for active compounds [1,2]. Building assays around known targets enables us to generate target material relatively quickly from cheap, flexible, scalable and reliable cell expression systems. This avoids a dependence on more expensive, more variable and technically limiting material derived from native cells and tissues. There are two requirements for heterologous drug target expression: (i) use of cell types with low or no expression of the drug target and functionally related proteins, and (ii) use of systems delivering high-level expression of the specific target. Often the default approach is to efficiently transfect canonical mammalian cell lines [e.g. HEK-293 (human embryonic kidney), CHO (Chinese-hamster ovary) or HeLa] with mammalian expression vectors driving high-level target expression from viral promoters [e.g. CMV (cytomegalovirus) promoter or SV40 (simian virus 40) promoter]. Using mammalian cell hosts offers the advantage that they possess native systems that may be required for function (e.g. protein folding and processing or pathway coupling). An alternative approach is to use non-mammalian high-expression systems, e.g. baculovirus-transduced insect cells, Escherichia coli and yeast. These systems offer a different compromise: potentially much stronger and cleaner target expression in a biologically less relevant background.
To reduce the risk of undesirable compound effects, it is essential for a drug-discovery project to identify and then develop compound selectivity for the desired target over other proteins sharing significant sequence identity (e.g. homologues and paralogues). Ideally, this is done while maintaining potency against relevant species equivalents (orthologues) and target variant forms such as splice variants and common polymorphic forms. Advances in bioinformatics, systems biology and data accessibility have enabled us to rapidly identify these target-related sequences, facilitating the development of parallel screening tools and assays [5,11]. Crucially, the genomes have been sequenced for human and for other mammalian species used in safety and efficacy models (e.g. rodent, dog and primate).
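One simple way to express the selectivity described above is as a ratio of potencies measured in parallel assays. The sketch below assumes potency is reported as an IC50 value (an assumption; the text does not specify a potency measure) and divides each off-target IC50 by the IC50 at the intended target. The protein names, the IC50 values and the 100-fold selectivity criterion are all hypothetical and serve only to illustrate the calculation.

```python
def fold_selectivity(ic50_nm, target):
    """Map each other protein to IC50(protein) / IC50(target).

    Ratios above 1 mean the compound is more potent at the intended target.
    """
    target_ic50 = ic50_nm[target]
    return {
        name: value / target_ic50
        for name, value in ic50_nm.items()
        if name != target
    }


# Hypothetical IC50 values (nM) from parallel assays for a single compound.
ic50_nm = {
    "TARGET": 5.0,
    "PARALOGUE_1": 500.0,
    "PARALOGUE_2": 50.0,
    "RAT_ORTHOLOGUE": 10.0,
}

ratios = fold_selectivity(ic50_nm, "TARGET")

# Hypothetical project criterion: at least 100-fold selectivity over paralogues.
insufficient = sorted(
    name
    for name, ratio in ratios.items()
    if name.startswith("PARALOGUE") and ratio < 100.0
)

for name in sorted(ratios):
    print(f"{name}: {ratios[name]:.0f}-fold")
print("Paralogues below criterion:", insufficient)
```

For orthologues the aim is the opposite: a ratio close to 1 indicates that potency is maintained in the species used for safety and efficacy models.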
Molecular biology has long explored the link between form and function through mutagenesis, and this approach has been used in drug discovery to interrogate the structure–activity relationship between target and compound. Target sequences can be altered in a directed way, and the interaction of the resulting mutant proteins with compounds can be assessed in assays or via three-dimensional structure determination. This information helps to optimize drug design and has been used to support alternative approaches to high-throughput compound screening, such as compound class subset screening and ‘in silico’ compound screening, where active compounds have been successfully identified without actually running a physical screen.
Safety and clinical efficacy
Compound safety assessment has benefited from developments in transcriptomics following the emergence of high-throughput microarray technologies and real-time quantitative PCR (e.g. Applied Biosystems TaqMan® Gene Expression Assays). These and related technologies have been used to examine the impact of reference compounds on local or global gene transcription in biopsies taken from safety models or clinical samples. The cellular effects of candidate drugs can then be profiled at the transcription level, and potential safety issues can be identified. In an analogous fashion, disease-relevant transcriptional and proteomic ‘biomarkers’ can be identified that are known to be associated with either the disease or successful treatment. Biomarker tools can be excellent indicators of drug efficacy and are especially useful in human proof-of-principle studies as a precursor to very expensive, complex and long clinical trials.
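Real-time quantitative PCR data of this kind are commonly summarized as a relative expression value. The sketch below uses the widely used comparative threshold-cycle (2^-ΔΔCt) calculation, which is not described in this review and is shown only as one possible approach; it assumes amplification efficiency close to 100% for both the gene of interest and the reference gene. The function name and the Ct values are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change in target transcript level (treated versus control),
    normalized to a reference gene, using the 2^-ddCt calculation."""
    # Normalize the gene of interest to the reference gene in each sample.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # A lower Ct means more template, hence the negative exponent.
    return 2.0 ** -(delta_treated - delta_control)


# Hypothetical Ct values: the target gene crosses threshold three cycles
# earlier after compound treatment, while the reference gene is unchanged.
fold = relative_expression(
    ct_target_treated=22.0, ct_ref_treated=18.0,
    ct_target_control=25.0, ct_ref_control=18.0,
)
print(f"Relative expression after treatment: {fold:.1f}-fold")
```

In this example a three-cycle shift corresponds to an eight-fold induction of the transcript by the compound.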
The genetic modification of animal models is becoming an increasingly common approach to study drug efficacy and safety both in vivo and ex vivo. The generation of transgenic and knockout mice has benefited from the completion of the human and mouse genome-sequencing efforts. Although often highly informative, this approach is not without caveats, as embryonic lethality, developmental compensation from related genes and the pleiotropic effects of the mutation may complicate analysis. However, the recent development of conditional, temporal and tissue-specific target-gene-expression technology allows modulation in adult animals and should ultimately yield better-quality target validation, compound safety and efficacy data to support the development of drugs.
Molecular biology has also been used to develop a screening strategy for project compounds that act as substrates for the drug-metabolizing enzymes, the cytochromes P450, commonly known as CYPs. The cloning and expression of the CYPs known to metabolize drugs has facilitated the systematic screening of project compounds for CYP activity in metabolic stability assays. More recently, it has become obvious that a similar screening competence needs to be established for the many drug uptake and efflux transporters that are known to have a major impact on drug absorption, distribution, metabolism, excretion and toxicity (ADMET). These efforts are driven by an acceptance that CYPs and drug transporters can cause variable drug disposition and clinical response, which can lead to uncertainty and compound attrition. Furthermore, in the clinic, both CYPs and drug transporters can lead to undesirable combinatorial effects, or ‘drug–drug interactions’, when different drugs are co-administered in patients. Other factors have emerged that add even further complexity to ADMET studies, including: (i) the discovery of nuclear receptors that can be activated by many drugs and that lead to the altered regulation of many CYPs and drug transporters, (ii) the identification of functionally modifying SNPs (single nucleotide polymorphisms) in all these molecular classes, and (iii) the potential for complex pharmacogenomics relating to the inheritance of these various SNPs. Although currently presenting a significant challenge, the discovery of these sources of variability should actually be seen as a boon. A better understanding of the relative impacts of CYPs, transporters, nuclear receptors and SNPs on the ADMET properties of a drug may well reduce some of the current uncertainty, leading ultimately to reductions in development time and in attrition during clinical development.
Perhaps the most profound impact of molecular biology on the discovery of new drugs is that it facilitates a target-driven approach. This has many advantages over the historical approach of identifying active compounds in a therapeutically relevant biological assay with no real knowledge of their target or mechanism. The fundamental questions that we ask of any compound during its discovery and development are ‘does it work?’ and ‘is it safe?’. Building a drug-discovery programme around a known target enables us to identify, with greater precision and accuracy, new drugs that prove to be highly potent and efficacious in humans, while being reassured of a low risk of toxicity.
A truism of the drug-discovery process is that it is constantly changing, responding to advances in our disease knowledge and technology. The growth of biotechnology as a force is an example, and the successful development of recombinant and antibody therapeutics for the treatment of diseases such as arthritis and cancer may herald a boom in so-called biologicals. Furthermore, the prospects for successfully delivering personalized medicine (drugs matched to specific patient groups) look bright following the successful launch and uptake of the first pharmacogenomic drugs. Developing drugs around the premise of personalized medicine would require yet further molecular biology input, especially in the areas of human genetics and bioinformatics.
Over recent years, the discipline of molecular biology has played an increasingly important and sophisticated role in the process of drug discovery, and it seems likely to play an even more vital role in the generation of future therapies.
Recombinant DNA Technology for the 21st Century: Focused Meeting held at AstraZeneca, Loughborough, U.K., 21–22 November 2005. Organized by M. Dyson (Wellcome Trust Sanger Institute), J. Sayers (Sheffield, U.K.) and A. Wallace (AstraZeneca, U.K.). Edited by J. Sayers.