Defects at the level of the pre-mRNA splicing process represent a major cause of human disease. Approximately 15–50% of all human disease mutations have been shown to alter the functioning of basic and auxiliary splicing elements. These elements are required to ensure proper processing of pre-mRNA molecules, and their disruption leads to misprocessing of the pre-mRNA and to disease. Splicing is a complex process, with much still to be uncovered before we are able to accurately predict whether a reported genomic sequence variant (GV) represents a splicing-associated disease mutation or a harmless polymorphism. Furthermore, even when a mutation is correctly identified as affecting the splicing process, there still remains the difficulty of providing an exact evaluation of the potential impact on disease onset, severity and duration. In this review, we provide a brief overview of splicing diagnostic methodologies, from in silico bioinformatics approaches to wet lab in vitro and in vivo systems to evaluate splicing efficiencies. In particular, we provide an overview of how the latest developments in high-throughput sequencing can be applied to the clinic, and are already changing clinical approaches.
Alterations at the pre-mRNA splicing level play a major role in the development of many human genetic diseases, both monogenic diseases and complex diseases such as cancer and neurodegeneration [1–8]. In general, splicing mutations associated with human disease principally affect the classic splicing pathway, normally referred to as the major spliceosome. Disease-associated defects have also been described for the minor spliceosome, which accounts for less than 0.5% of all introns in humans [9,10].
These observations highlight the pivotal role played by many splicing events in ensuring correct development and maintaining correct protein expression levels in a living organism. The emerging picture from studies conducted so far is that, depending on the complexity of the affected genes (size and number of introns compared with exons), mutations that lead to splicing dysfunction and hence genomic defects may represent up to 50% of all described variations [11,12].
The connection between splicing research and clinical practice initially focused on identifying splicing mutations from a background of sequence variants of unknown significance (VUS) [13,14]. Since then, and with the advent of new sequencing techniques, the list of splicing mutations described in the literature has grown enormously, and is expected to increase even further as our RNA-sequencing (RNA-seq) abilities improve [15–18]. In clinical practice, accurate identification of pathogenic mutations is fundamental, particularly with regard to diagnostic, prognostic, therapeutic and ethical issues. In addition, knowledge of how sequence variants affect RNA processing will allow more mutations to be discovered, reducing the number of ‘missing mutations’ and improving the diagnostic return from genomics.
Before discussing these issues, it is important to explain why splicing processes are difficult to predict. It is well documented that normal splicing pathways are extremely complex, and often our knowledge of the splicing process is not enough to predict the effect of a sequence variant on the basis of the primary gene sequence alone [20–22]. This has generated a huge research effort in this field, working towards a detailed understanding of the dynamics of pre-mRNA processing.
The spliceosome is the cellular machinery responsible for pre-mRNA splicing. It is highly dynamic and one of the most complicated macromolecular machines present in cells. Its fundamental role is that of “splicing” the nascent RNA so that the sequences that will eventually be part of the mature mRNA (“exons”) are joined together, whereas the intervening sequences (“introns”) are discarded. Initially, it was believed that introns were simply degraded by the cell, as the price that higher eukaryotes pay for the splicing process. Now, we know that introns play several regulatory roles, such as acting as precursors for several non-coding RNA (ncRNA) molecules (miRNAs, lncRNAs or small nucleolar RNAs) and regulating translation or virus latency.
Although the existence of the spliceosome has been known for several decades, it is only recently that high-resolution structural models of whole working spliceosomes have become available [25–29]. From these studies, it is clear how the highly dynamic and modular structure of the spliceosome has reconciled the three major requirements of its functioning: to be able to recognize a huge variety of splicing signals, to be differentially regulated depending on cellular and developmental contexts and yet maintain the high precision required to generate functional mRNAs .
This lack of structural insight, however, did not hamper research into how the spliceosome works within cells. In fact, several genetic and biochemical approaches have provided considerable insight into spliceosome function. For example, it is clear that splicing fidelity by the spliceosome is achieved in several ways. At the protein level, the action of DExD/H-box ATPases plays a major role in discarding sub-optimal substrates, whereas at the substrate level, several factors present on pre-mRNA sequences help the spliceosome to correctly and specifically distinguish the exons that need to be included in the mature mRNA from the introns to be discarded.
The first and possibly most important feature of an exon is the inherent strength of the basic splicing signals [33–36]. Good splicing signals are often essential, as they need to compete with numerous cryptic splice sites that we know are present in almost every pre-mRNA molecule [37,38]. However, even with well-defined basic splice sites, their effective use can often be modified by the presence of enhancer and silencer sequences localized within the exon or nearby intronic sequence [20,39–43]. The importance of these sequences is based on their ability to bind a huge number of RNA-binding proteins (RBPs) that can either help or hinder the binding of the basic splicing factors to the splice site. These RBPs mostly belong to the hnRNP and SR family of nuclear factors [44,45] and use similar mechanisms to influence splice site selection either in a positive or in a negative manner .
From a functional point of view, the presence of enhancers and silencers can explain much of the alternative splicing processes that are a common occurrence within eukaryotic genes [34,47–49]. When the expression of these factors is turned on (or off), their presence or absence can be used to modulate exon inclusion or skipping in a cell-specific, developmental or time-specific manner [50–54].
Unfortunately, even with this detailed knowledge of these basic and auxiliary sequence elements, in many instances we are still unable to say whether a given exon will be included or not in a particular biological context. In fact, several other hard-to-define factors can considerably modify the final outcome, including the following:
The local complexity of genomic architecture (i.e. presence of exon/intron clusters etc.) 
ncRNA-binding sites 
Activation of extracellular signalling pathways 
Presence of xenobiotics in the surrounding environment 
In conclusion, even for accomplished researchers in the RNA splicing field it is usually hard to predict how the combinatorial effect of all these factors will determine whether a particular exon will be recognized by the splicing machinery and to what extent [68–70].
Because of this, the use of splicing information for diagnostic, prognostic or therapeutic purposes has been hard. Nonetheless, recent progress has been made in bringing the basic scientific knowledge of splicing mechanisms to the clinical setting. In this review, we provide an overview of traditional methodologies used to achieve this aim and how these efforts have shifted and are likely to change further in the near future due to the introduction of sophisticated high-throughput approaches. Most importantly, we will discuss how classical methodologies and new high-throughput approaches can be combined for maximum benefit and bring splicing research closer to the clinic (Figure 1).
Comparison of classical and high-throughput methodologies in splicing research
Most human genes are subject to alternative splicing. Through selective inclusion/skipping of exonic sequence, this process can give rise to multiple mRNA and protein variants from a single gene, giving eukaryotic cells the advantage of proteome expansion. Importantly, alternative splicing events are highly coordinated within a cell, and numerous changes in independent genes become part of splicing programs that regulate some of the most important and basic processes in cells, ranging from proliferation to migration and death [73,74]. The consequence of this highly complex situation is that alterations in the alternative splicing of one gene may not simply result in the loss of a particular protein isoform, but are likely to alter the function of a much wider network of interactions required to maintain correct protein homoeostasis.
Splicing analysis in clinical diagnostics
The investigation of splicing changes in disease has been shaped by the development of analytical techniques. Initially, this was through the use of simple RT-PCR amplification techniques that amplify single exon inclusion in specific genes. This is still very much in use today with the development of quantitative approaches (RT-qPCR) and single-cell analysis (SCA) bringing a new level of precision to the field. In parallel, the study of pathological splicing events through the use of minigenes or in vitro splicing systems has helped enormously to understand the rules that govern spliceosomal function in normal and pathological conditions. These techniques have helped to solve some common practical problems such as the impossibility of always obtaining primary RNA samples from patients. However, these do not provide a measure of global splicing changes. This has been addressed in the last two decades by the development of microarray technologies and by improvements in RNA-seq technology. Finally, a common need for all approaches is the use of substantial bioinformatics data analyses.
RT-PCR methods to directly analyse splicing defects
Ever since the first human genome was sequenced, it has been easy to design appropriate primers and set up routine RT-PCR for potential mutations. These primers can be used to amplify total RNA from patients and controls and to see whether splicing alterations occur in the presence of exonic or intronic mutations.
Whereas this is still useful in some cases today, in others patient RNA may not be available. In addition, available tissue from patients is often limited to blood leucocytes from which the RNA is prepared; consequently, RNA splicing of tissue-specific genes (for example, those expressed in brain or heart) cannot be visualised. Furthermore, aberrant transcripts in living cells are often degraded by nonsense-mediated decay (NMD). NMD is an mRNA surveillance mechanism that controls the post-transcriptional quality of the mRNA transcribed by the eukaryotic cell and selectively degrades transcripts that contain premature termination codons (PTCs) [76,77]. Degradation by NMD can be blocked in stable cell lines from patient lymphoblasts following the addition of antibiotics such as anisomycin/gentamicin. However, even when these cells become available, this course of action represents a considerable investment of time and resources and is not used in routine analysis. Finally, direct detection of aberrant RNA transcripts involves careful analysis of the patient's allele composition in order to unambiguously assign the aberrant products to a specific mutation (for example, ruling out the presence of different splicing mutations on different alleles).
Real-time quantitative PCR and single-cell analysis
Classical RT-PCR approaches are limited in their ability to quantify aberrant splicing levels correctly, especially when RNA isoforms of widely different lengths and compositions have to be amplified. This means that the information provided by the traditional approach is often qualitative rather than quantitative. Strategies that involve the co-amplification of known fragments have therefore been used to improve quantitative analysis using traditional RT-PCR approaches.
This problem was solved following the development of quantitative PCR methods (RT-qPCR). This methodology enables a precise approach to direct detection by providing the ability to measure absolute quantities of transcripts from patient and control cells. RT-qPCR approaches usually use either SYBR Green or gene-specific probes. The limitations of this system include RNA quality, experimental design and validation strategies. For this reason, good-practice guidelines for the use of this technique have been prepared for researchers and clinicians.
Quantitative PCR also allows analysis of splicing isoform production at the single-cell level. This is important, as even in apparently homogeneous cell populations there may be substantial differences from cell to cell. Indeed, a study of alternative splicing regulation at the single-cell level has recently revealed that precise control of this process is principally due to conserved sequences in the flanking intronic regions of exons.
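As an illustration of how RT-qPCR readouts are commonly turned into numbers, the sketch below applies the widely used comparative Ct (2^-ΔΔCt) calculation to an aberrant isoform measured in patient and control cells. Note that this is a relative rather than an absolute quantification, and that the function name, the Ct values and the assumption of perfect amplification efficiency (a doubling per cycle) are illustrative choices rather than details taken from the studies cited here.

```python
def relative_expression(ct_target_patient, ct_ref_patient,
                        ct_target_control, ct_ref_control,
                        efficiency=2.0):
    """Fold change of a target isoform in patient versus control cells.

    Uses the comparative Ct method: each target Ct is first normalised to a
    reference (housekeeping) transcript, then patient is compared to control.
    efficiency=2.0 assumes perfect doubling per cycle (an assumption).
    """
    delta_ct_patient = ct_target_patient - ct_ref_patient
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_patient - delta_ct_control
    return efficiency ** (-delta_delta_ct)


# Hypothetical Ct values: the isoform appears two cycles later in the patient.
fold_change = relative_expression(26.0, 18.0, 24.0, 18.0)
print(fold_change)  # 0.25, i.e. a four-fold reduction relative to control
```

In practice, amplification efficiencies are measured for each primer pair rather than assumed, which is one of the points addressed by the good-practice guidelines mentioned above.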
In parallel with these direct approaches, researchers in the splicing field have also developed methodologies that look at the splicing process in an experimentally controlled environment. Two methods are widely used for this: in vitro splicing methods and cell-based minigene systems.
In vitro splicing systems
In in vitro assays, radioactively labelled, pre-formed RNA molecules transcribed with bacteriophage T7/SP6 polymerases are incubated in the presence of nuclear extracts from particular tissues/cell types. The resulting spliced products can then be resolved on a denaturing polyacrylamide gel and easily sequenced to determine their identity. The biggest advantage of in vitro splicing is that the kinetics and intermediates of this process, such as lariat formation, can be easily evaluated. From a biochemical point of view, in vitro splicing can be performed in a highly controlled manner. This provides considerable flexibility in testing the effect of different reaction conditions such as ionic concentration, pH and temperature. For these reasons, in vitro splicing systems have played a fundamental role in elucidating the basic mechanistic aspects of the splicing reaction, as reviewed elsewhere. Moreover, in vitro splicing systems have been successfully used as an ideal substrate for SELEX-type analyses to find the best binding consensus motifs of RNA-binding proteins or for testing inhibitory compounds for potential therapeutic purposes. Finally, it should be noted that in vitro synthesized splicing substrates have been adapted to visualize the splicing of single pre-mRNA molecules in total cell populations. This has allowed researchers to study the differences that exist within apparently homogeneous cell populations.
Unfortunately, in vitro splicing systems have never been used extensively in the clinical or diagnostic field, except for the earliest characterizations of splicing defects [89,90]. There are many reasons for this: first, the necessity to transcribe pre-mRNA in vitro imposes an upper limit on the sequence length that can be analysed (approximately 1000–2000 nucleotides). This makes the system unable to contain long or complex genomic regions. In addition, because in vitro substrates are synthesized prior to the splicing reaction, they do not allow observation of the potential effects of factors such as RNA polymerase II processivity, although some particular experimental settings have successfully managed to combine these features (for example, transcription and splicing), thus mimicking in vivo conditions [91–93].
Minigene splicing systems
An important methodological development in the study of alternative splicing is minigene-based technologies. This approach was initially described more than 30 years ago and permits relatively fast mapping of splicing regulatory regions within exons and nearby flanking regions [95,96].
A minigene is a plasmid-based construct that produces a relatively small pre-mRNA molecule that can be processed by the spliceosome following its transfection into a cell line of interest. The exon or exons to be studied can be amplified from wild-type and affected individuals and cloned into the plasmid together with their short intronic flanking regions. Once splicing has occurred, the resulting mRNAs can be amplified by RT-PCR with primers specifically designed to amplify only the minigene-processed transcripts. Whereas in many cases the genomic sequence inserted will contain just the exon that is suspected to be affected by the splicing mutation and a portion of its flanking intronic regions, every minigene has to be specifically made on a case-by-case basis and can include more relevant sequence. In order to observe in vivo mutational effects such as intron retention or multiple exon skipping events, it is important to keep the genomic context as close as possible to the original gene. However, the construction of such minigenes is often time consuming. Therefore, in order to make this process easier, several researchers have prepared easy-to-assemble minigenes such as pSpliceExpress that use recombination-based methods to insert any sequence of interest.
Minigene systems are very versatile and can be adapted to a great variety of experimental systems. For example, reporter systems are now available that can be used to specifically examine the effect of cis-acting elements present within exons or upstream/downstream introns on splicing efficiency, thus mimicking one of the original strong points of in vitro splicing systems. In addition, fluorescence-based minigene systems have been adapted for many purposes such as in vivo evaluation of alternative splicing events in transgenic mice or cultured cells [99–102] or for accurate quantification of alternative splicing ratios.
The minigene system, however, suffers from the drawback that, as the exon is present in a heterologous context, it may not necessarily recapitulate the exact splicing profile of the endogenous gene. Although comparative studies have shown that minigenes do often replicate the splicing profile of the endogenous gene [104,105], it is nonetheless good practice to attempt to confirm results with patient RNA whenever possible. Another drawback is that splicing intermediates are difficult to detect, as the final RT-PCR analysis only gives information regarding unprocessed or mature mRNA species.
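When reading a minigene RT-PCR gel, it helps to know in advance which band sizes correspond to exon inclusion and exon skipping. The short sketch below computes these expected amplicon lengths; the function name and all lengths are hypothetical and would need to be replaced by those of the actual construct and primer pair.

```python
def minigene_products(upstream_len, exon_len, downstream_len):
    """Expected RT-PCR amplicon sizes (bp) for a single-exon minigene.

    upstream_len / downstream_len: amplified vector-derived exonic sequence
    (including primers) on either side of the test exon in the spliced mRNA.
    exon_len: length of the test exon cloned into the minigene.
    """
    return {
        "inclusion": upstream_len + exon_len + downstream_len,
        "skipping": upstream_len + downstream_len,
        # Skipping removes exon_len nucleotides; the downstream reading
        # frame is preserved only if that length is a multiple of three.
        "skipping_keeps_frame": exon_len % 3 == 0,
    }


# Hypothetical construct: 120 bp and 150 bp of flanking exonic sequence,
# with an 87 bp test exon.
print(minigene_products(120, 87, 150))
# {'inclusion': 357, 'skipping': 270, 'skipping_keeps_frame': True}
```

Bands of any other size would point to events such as cryptic splice site use or intron retention, which is one reason the products are usually sequenced.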
Finally, it should be noted that minigenes can be useful in combination with high-throughput screening approaches in order to discover new drugs/effectors capable of modulating the splicing process. An example of how this can be achieved is schematically shown in Figure 2.
Uniting classical and HTS approaches to investigate splicing regulation
Using bioinformatics to predict pre-mRNA splicing mutations in disease
The advent of sophisticated bioinformatic approaches has represented a paradigm shift in our ability to analyse and compare “big data”, or even simply to set up easily accessible data repositories of gene splicing information that can be easily consulted by clinicians.
Over the years, several in silico approaches have been made available to researchers. For an excellent review describing the basic principles of many of these programs, the reader is referred to the work of Hartmann et al.
Many programs target prediction of the basic cis-acting elements in the definition of an exon (5′ss, 3′ss and branch-point). The best-known examples are MaxEntScan, NNsplice, AST, Spliceport, Spliceview, HBond, Automated Splice Site Analyses, NetGene2, Human Splicing Finder and Cryp-Skip. All these servers provide a useful indication of whether the donor, acceptor and branch-site are well defined compared with ideal consensus sequences.
In parallel, several web servers such as ESEfinder, RESCUE-ESE [118,119], ExonScan [107,119,120], PESE [21,121] or ESRsearch focus on disruption or creation of splicing regulatory elements (SREs). Other applications are dedicated to the search for splicing factor binding motifs in a sequence of interest, such as Sfmap or SpliceAid2/SpliceAid-F [124,125]. Recently, a new generation of programs for evaluating the possible disruption of SRE–exonic sequences has made its appearance, including QUEPASA and HEXploring, as have machine-learning computational studies of RNA-seq data across different human tissues.
For all these, a key question is the degree of reliance that one can place on the bioinformatics approach alone and the inter-dependency of one aspect of the splice mechanism with another, with regards to the routine identification of possible splicing spoilers.
As the donor and acceptor elements tend to be reasonably conserved in human genes, the programs that evaluate their relative strengths seem to be more successful than those that aim to target the much more loosely conserved SRE elements. Programs such as MaxEntScan and NNsplice that take into account nucleotide dependencies within donor site sequences have been found to be the best predictors of cryptic splice site activation in disease-causing mutations [37,130], and efficient algorithms to predict, for example, the likely consequences of substitutions in the 5′ splice site are currently being developed. For this reason, the future trend in bioinformatics prediction programs is to integrate, on a single platform, as much information as possible on global splicing signals. Currently, the most widely used example is the Sroogle interface, which allows users to obtain information on splice-site or enhancer/silencer disruption from a single interface.
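To make the idea of comparing a site with the "ideal consensus" concrete, the following minimal sketch counts how many of the nine positions of a candidate donor site agree with the commonly quoted 5′ss consensus (C/A)AG|GT(A/G)AGT. This is deliberately much simpler than MaxEntScan or NNsplice: it treats every position as equally important and ignores the dependencies between positions that those programs model, so it should be read as a teaching example only. The function name and example sequences are illustrative.

```python
# IUPAC codes needed for the consensus: M = A or C, R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}

# Donor consensus covering the last 3 exonic and first 6 intronic bases.
DONOR_CONSENSUS = "MAGGTRAGT"


def donor_match(site):
    """Number of positions (0-9) at which a 9-nt site matches the consensus."""
    site = site.upper().replace("U", "T")  # accept RNA or DNA input
    if len(site) != len(DONOR_CONSENSUS):
        raise ValueError("expected 3 exonic + 6 intronic nucleotides")
    return sum(base in IUPAC[code] for base, code in zip(site, DONOR_CONSENSUS))


wild_type = "CAGGTAAGT"  # hypothetical site matching the consensus fully
variant = "CAGATAAGT"    # same site with a G>A change at intronic position +1

print(donor_match(wild_type))  # 9
print(donor_match(variant))    # 8
```

The example also shows why a simple match count is inadequate on its own: the variant loses only one point, yet it destroys the almost invariant GT dinucleotide, a change that position-aware scoring schemes penalise far more heavily.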
However, programs that evaluate SRE disruption/creation should be used with great care. Programs such as ESEfinder have worked reasonably well in some research contexts and in bioinformatics studies [134,135] but rather poorly in others [136–139], and in some cases their outputs have become the subject of lively scientific debate [140–143]. A similar situation has been described for RESCUE-ESE [138,139,144,145]. One important question with regard to the use of these programs in diagnostic analysis is the degree of overlap (beyond chance occurrence) between the motifs defined by each of them. The answer is variable. In fact, although there is significant overlap between the RESCUE-ESE and PESE computational outputs, there is very little between ESEfinder and RESCUE-ESE or PESE, the only exception being ASF/SF2 motifs and PESE. This suggests that, for now, no significant advantage is gained by scoring possible mutations with multiple types of programs. For a more detailed explanation of these issues, the reader is referred to the recent review by Chasin. In summary, the combinatorial use of all these resources still represents the best chance of “predicting” putative splicing mutations, whether in conserved or less conserved regions [130,147].
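One way to ask whether two motif collections overlap "beyond chance occurrence" is a hypergeometric test, sketched below for hexamer sets drawn from the 4^6 = 4096 possible hexamers. This is an assumed, generic approach rather than the analysis used in the studies cited above, and the set sizes and overlap in the example are hypothetical.

```python
from math import comb


def expected_overlap(universe, size_a, size_b):
    """Mean number of shared motifs if both sets were chosen at random."""
    return size_a * size_b / universe


def overlap_p_value(universe, size_a, size_b, observed):
    """Hypergeometric probability of an overlap of at least `observed`.

    Models set B as size_b motifs drawn without replacement from `universe`
    possible motifs, of which size_a belong to set A.
    """
    upper = min(size_a, size_b)
    favourable = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(observed, upper + 1)
    )
    return favourable / comb(universe, size_b)


HEXAMERS = 4 ** 6  # 4096 possible hexamers

# Hypothetical example: sets of 200 and 300 hexamers sharing 40 motifs.
print(expected_overlap(HEXAMERS, 200, 300))
print(overlap_p_value(HEXAMERS, 200, 300, 40))
```

A small probability indicates more sharing than random selection would produce; it does not by itself show that either set predicts splicing outcomes correctly.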
For RNA secondary structure, two types of program are available. The first is based on energy minimization considerations derived from in vitro RNA folding studies, whereas the second uses an evolutionary approach based on sequence conservation [149,150]. Web-based programs that use energy minimization approaches, such as mFold, have the advantage that they are usually fast, only require knowledge of the sequence of interest and are based on highly optimized energy minimization parameters and folding algorithms. However, they also present several caveats. For example, the average number of predicted secondary structures for an RNA sequence grows rapidly with length, making it difficult to distinguish the most probable one. Even more dangerously, predictions are highly dependent on the nucleotide “window” chosen for the analysis (for example, changing the window by even a few nucleotides can profoundly change the output structure/energy). Not taking this into account has led to controversial reports with regard to understanding the pathogenic molecular mechanism underlying mutations in the NF-1 [153–156] and hprt genes [157,158]. These studies have highlighted the difficulty of assigning significance to correlations between in silico predicted changes in secondary structure and mutations.
Sequence comparison approaches such as those provided by the web-based pFold program allow identification of RNA secondary structures that are presumably maintained by evolutionary pressures. The result provides a single consensus structure, thus avoiding a potential bias introduced by too many arbitrary choices by the researcher. However, in order to use these programs, prior knowledge of sequences that are already known to fold into a functionally important structure is required. Any mistake in this choice can be fatal for prediction accuracy. Moreover, the output is highly dependent on the correct alignment of the input sequences.
One final issue that should be discussed is the recent appearance of databases that are partially or totally dedicated to collecting mRNA splicing defects. The most publicized example is the Human Gene Mutation Database (HGMD). This database acts as a repository of general pathological gene mutations. Other databases are also being established along these lines, such as the Alternative Splicing Mutation Database (ASMD) [161,162]. In addition, for particular aberrant splicing events such as cryptic splice site activation, researchers are also referred to the recently established DBASS3 and DBASS5 databases [37,38]. Finally, there is a growing list of locus-specific databases that are exclusively focused on a particular gene of interest such as CFTR or HPRT. A comprehensive list of these specific databases is maintained by the Human Genome Variation Society (HGVS). Although none of these databases contain predictive information, they are an easy reference source for clinicians, and represent an invaluable data-mining tool for the development of new in silico approaches, together with the numerous resources primarily concerned with collecting alternative splicing data from normal pre-mRNA processing events, such as ASTD [165,166], ASAP2, ASPicDB, HASD, SpliceDisease and HEXevent.
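The window dependence described above can be explored with a toy folding routine. The sketch below uses the classic Nussinov dynamic-programming algorithm, which simply maximizes the number of nested base pairs; it is a simplified stand-in for the thermodynamic energy minimization performed by programs such as mFold, not a reimplementation of them. The function names, the minimum loop length of three nucleotides and the example sequence are illustrative assumptions.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (Nussinov algorithm).

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]  # best[i][j]: optimum for seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j is left unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


def window_scan(seq, center, half_widths):
    """Fold windows of different sizes centred on the same position."""
    return {
        w: max_base_pairs(seq[max(0, center - w): center + w + 1])
        for w in half_widths
    }


print(max_base_pairs("GGGAAACCC"))  # 3: a three-pair stem closing an AAA loop

# Hypothetical sequence: the pairing available to the central nucleotide
# changes with the size of the window submitted for folding.
sequence = "GGGAAACCC" * 3
print(window_scan(sequence, center=13, half_widths=[4, 8, 13]))
```

Because each window is folded in isolation, pairing partners that lie just outside its boundaries are invisible to the algorithm, which is the same effect that makes energy-based predictions sensitive to the chosen window.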
Global approaches to splicing research in the clinic
In recent years, technological advances have allowed investigation of splicing and splicing abnormalities in a more global manner. Looking at RNA splicing on a global scale has facilitated studies that would otherwise have been impossible using methodologies that measure a single gene (or a few genes) at a time. For example, these techniques have allowed determination of the biological relevance of single splicing factors in different organs or measurement of the effect of RNA Pol II elongation on global alternative splicing levels. Most importantly, at a clinical level, these approaches have allowed researchers and clinicians to identify functionally important changes in complex diseases from cancer to neurodegeneration. To date, global analysis of alternative splicing levels in different cells/tissues has been made possible by two main approaches: (1) high-throughput RNA-seq and (2) splicing-junction or exon-array techniques.
Microarray methodology was initially described in 1995 by Schena et al. It consists of a series of DNA oligos spotted at extremely high density on a solid surface. With regard to RNA splicing, investigating intron/exon usage in a huge number of genes has been made easier by the availability of arrays that contain sequences spanning predicted splicing junctions, thus greatly aiding identification of changes in their usage (for an example, see De Conti et al.). Nowadays, the major limitation of this technique is that it can only detect pre-annotated splicing events, whereas being able to detect unannotated splicing events will be critically important in a disease setting. Thus, RNA-seq is quickly replacing arrays as the method of choice.
Next generation sequencing and RNA sequencing
The technological advances that have allowed deep sequencing of large pools of DNA and RNA preparations from cells and individuals represent one of the most exciting breakthroughs in recent years. Thanks to the introduction of massively parallel sequencing techniques, our ability to obtain huge amounts of DNA/RNA information has completely revolutionized the way we look at clinical specimens, particularly with regard to alternative splicing and the clinical impact of other forms of RNA (for example, all the various forms of ncRNAs).
In addition to dramatically improving our sequencing capacity, the great advantage of RNA seq is the fact that one does not need to know the composition and status of the genome beforehand. This means that RNA seq can detect the existence of new transcripts that were not known to exist before, or detect new splicing isoforms that could not have been predicted on the basis of primary sequence analysis.
This possibility has been especially exploited by developing mapping tools that aim to identify the “reads” that span splice junctions [71,180,181]. These approaches have facilitated the identification of new splicing isoforms produced by eukaryotic genomes and have highlighted how most genes expressed in our cells are subject to alternative splicing. As a result, RNA-seq has greatly increased our knowledge with regard to the evolutionary importance of alternative splicing, the mechanisms through which some very large introns may be removed by sequential splicing reactions that are currently referred to as “recursive splicing”, and the expression/function of new classes of transcripts such as circular RNAs.
Alterations of protein-coding genes are not the only important factor when considering disease. Other forms of RNA, for example ncRNAs, have also been shown to be important in disease. As a result, emerging data are rich in different levels of gene and disease modulation affected by these molecules. RNA-seq and bioinformatics can identify miRNA target sites in transcripts, and some of the variants uncovered by such wide-ranging studies may affect these targets. In fact, miRNA modulation is almost always partial, making its effects difficult to identify. This is a subject that needs extensive population analysis and, in the future, a catalogue of target site variants will be helpful in the clinical setting.
Therefore, decisions such as which sequencing platform should be used and how many samples should be sequenced (as determined by power calculations) depend on the question being asked of the method, and this is where careful planning by experimenters is needed.
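Once junction-spanning reads have been counted, exon usage is commonly summarized as a "percent spliced in" (PSI) value. The sketch below shows one simple way of computing it for a cassette exon; the function name and read counts are hypothetical, and dedicated tools apply further corrections (for example, for read length and junction mappability) that are omitted here.

```python
def percent_spliced_in(upstream_inclusion, downstream_inclusion, skipping):
    """PSI (0-100) of a cassette exon from junction-spanning read counts.

    upstream_inclusion / downstream_inclusion: reads spanning the two
    junctions that join the exon to its neighbours (averaged, because an
    included exon contributes two junctions and a skipped one only one).
    skipping: reads spanning the junction that bypasses the exon.
    Returns None when no informative reads are available.
    """
    inclusion = (upstream_inclusion + downstream_inclusion) / 2
    total = inclusion + skipping
    if total == 0:
        return None
    return 100 * inclusion / total


# Hypothetical counts for the same exon in control and patient samples.
control_psi = percent_spliced_in(40, 60, 50)
patient_psi = percent_spliced_in(10, 14, 88)
print(control_psi)                # 50.0
print(patient_psi)                # 12.0
print(patient_psi - control_psi)  # -38.0, i.e. reduced exon inclusion
```

A difference in PSI between patient and control samples flags a candidate splicing defect, which can then be confirmed with the RT-PCR or minigene approaches described earlier.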
High-throughput approaches and the clinic
A new era of genetics and genomics has begun in clinical practice, transferring new genomic technology directly to the bedside for immediate patient benefit. Clinical exomes are already in regular use and full genome sequencing is close to being integrated through large projects such as the U.K. 100,000 Genomes Project (https://www.genomicsengland.co.uk).
However, when genes are sequenced, many sequence variants are seen. The challenge is determining which are clinically important and which are benign. It is estimated that genomes will yield >10 million sequence variants of unknown clinical significance between one individual and another. Although some variants have obvious possible effects on the sequence of the proteins expressed, many mutations, even in the coding regions, are predicted by software tools to be silent and thus would not appear to affect any of the protein sequences expressed. As discussed, such variants could nonetheless cause disease through disruption of the splicing process, and examination of the transcriptome will be a useful adjunct to these high-throughput analyses, adding value to the genomic data sets generated and changing the interpretation of results for patients.
Information on a gene's expression levels will be useful for diagnostics and will inform us of a gene's possible role, particularly if this gene has been newly associated with a disease. ‘New’ genes associated with disorders will be a frequent occurrence as genomic analyses and their interpretation are improved by transcriptomics.
For example, if the transcriptome database shows that an unknown gene's expression levels are dramatically higher in cancer cells than in healthy cells, the unknown gene may play a role in cell growth. In addition, we know that transcriptomes can be useful as biomarkers.
At present, several transcriptome resources exist that can be freely consulted for this purpose, including those in NIH programs such as the Genotype-Tissue Expression Project (GTEx) (http://gtexportal.org/home/) and the Encyclopedia of DNA Elements (ENCODE) (https://www.genome.gov/encode/). GTEx is creating a catalogue of human gene expression in a variety of different tissues. ENCODE researchers aim to characterize and understand the working parts of the genome, including the transcriptome. In addition, both Novartis and the European Molecular Biology Laboratory have well-established gene expression databases that are useful for the interpretation and comparison of transcriptomes obtained from patients.
The introduction of high-throughput RNA-seq techniques has revolutionized our ability to study splicing events that occur in normal cells and the way they may be affected by disease-associated mutations or conditions. These techniques have allowed researchers, for the first time, to go beyond the analysis of one or a limited set of genes and to be able to look at the splicing process in a global manner. If anything, one of the major problems encountered so far with regards to these techniques is the huge amount of information that can be obtained from each single experiment.
Notwithstanding their growing usefulness, all these approaches still show a considerable degree of unreliability with regard to the system analysed and the bioinformatics methods used to interpret the millions of RNA sequences generated by each assay. As a consequence, their results can only be one part of the final picture and may have to be verified by “wet lab” experiments using more traditional splicing-based methodologies. For these reasons, future efforts should be devoted to optimizing RNA-seq and data processing, and to establishing a systematic RNA diagnostic pipeline that incorporates all these aspects. This will represent an important and useful adjunct, maximizing the interpretation of genomic testing.
This work was supported by the Telethon under the research “Identification of new therapeutic agents for the treatment of Glycogenosis type 2 due to the common splicing mutation c.-32-13T>G.” [grant number GGP14192].
Abbreviations
CFTR, cystic fibrosis transmembrane conductance regulator
ENCODE, Encyclopedia of DNA Elements
GTEx, Genotype-Tissue Expression Project
lncRNA, long noncoding RNA
RNA-seq, RNA sequencing
SELEX, Systematic Evolution of Ligands by Exponential Enrichment
SRE, splicing regulatory element