High-throughput, genome-wide analytical technologies are now commonly used in all fields of medical research. The most commonly applied of these technologies, gene expression microarrays, have been shown to be both accurate and precise when properly implemented. For over a decade, microarrays have provided novel insight into many complex human diseases. Microarray-based discovery can be classified into three components, listed in order of accomplishment: biomarker detection, disease (sub)classification and identification of causal mechanisms. Within the respiratory system, the application of microarrays has achieved significant success in all components, particularly with respect to lung cancer. Numerous studies over the last half-decade have applied this technology to the characterization of non-malignant respiratory diseases, animal models of respiratory disease and normal developmental processes. Studies of obstructive lung diseases by many groups, including our own, have yielded not only disease biomarkers, but also some novel putative pathogenic mechanisms. We have successfully used an integrative genomics approach, combining microarray analysis with human genetics, to identify susceptibility genes for COPD (chronic obstructive pulmonary disease). Interestingly, we find that the assessment of quantitative phenotypic variables enhances gene discovery. Our studies contribute to the identification of obstructive lung disease biomarkers, provide data associated with disease phenotypes and support the use of an integrated approach to move beyond marker identification to mechanism discovery.

Introduction

The completion of genome sequencing of specific organisms, combined with technological advances in the capability to detect changes in genome-wide expression at both the mRNA and protein levels, has heralded a new era in our quest for understanding the complex pathways and mechanisms in living organisms. In the last decade, there has been significant advancement in microarray technology with the development of many different platforms which are being used for analysing gene expression, genotyping and other applications [1,2] (Figure 1). According to the central dogma of molecular biology, genomic DNA is first transcribed into mRNA, which thereafter is translated into protein. Proteins play critical roles in most intra- and extra-cellular activities, including enzymatic, regulatory and structural functions. However, the relative difficulty of measuring expression at the protein level, together with the availability of high-throughput methods (expression microarrays) for detecting individual mRNAs, has led to the wide use of microarrays to simultaneously measure the expression of all mRNAs in a sample, also called the transcriptome [3]. Like most classical methods for analysis of gene expression at the mRNA level, the basic principle of microarray technology is complementary hybridization of nucleotides as explained by the Watson–Crick double helical model of DNA. Microarrays measure transcriptomic modifications that, either at the single gene level or collectively in multiple genes, lead to changes in protein expression. In fact, it is unusual for changes in the level of a specific mRNA not to be accompanied by changes in the protein level for that gene. Furthermore, although not all changes in protein expression and function levels are captured at the steady-state level of that particular mRNA, their downstream effects are captured by the full transcriptome.

Figure 1
Overview of the utility of gene expression microarray technology in lung disease biomarker and therapeutic target discovery

Multiple factors such as experimental design, laboratory methods and analytical approach contribute to data accuracy. Application of the technology can be broadly distinguished as class discovery, class prediction and mechanism discovery as defined by distinct analytical methods and experimental goals. Integrating the study of normal developmental processes, animal models of disease and human genetic epidemiology, with an emphasis upon quantitative variable analyses, can assist in the goal of identifying causal genes/pathways as we have described for SERPINE2 in COPD [55].

In the early part of the decade, when the use of gene expression microarrays was growing exponentially (see Figure 2), there was considerable concern and debate regarding the precision of the technology. For instance, Tan et al. [4] reported divergence in microarray-based gene expression measurements. Similar observations, and the experiences of many investigators, resulted in concerns being raised regarding the potential utility of microarrays. This was met by an independent and thorough evaluation of the technology by the MAQC (MicroArray Quality Control) project [5]. The MAQC project was developed by the FDA (Food and Drug Administration), the EPA (Environmental Protection Agency) and the NIST (National Institute of Standards and Technology), in association with commercial stakeholders and academic laboratories. The purpose was to evaluate the accuracy of microarray technology, to provide quality-control tools and to develop guidelines for microarray data analysis. The MAQC project involved quantification of gene expression levels using seven microarray platforms tested at three independent sites, with five replicates at each location. Although each microarray platform studied had different performance characteristics, they generated comparable results with up to 95% concordance with regard to defining differential expression. We have recently completed a study showing similar cross-platform concordance with time-series data (R. Du, K. G. Tantisira, V. J. Carey, S. Bhattacharya, S. Metje, B. J. Klanderman, R. Gaedigk, R. Lazarus, A.T. Kho, T. J. Mariani, J. S. Leeder and S.T. Weiss, unpublished work). These data unequivocally demonstrated extremely high levels of gene expression microarray measurement precision.
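Concordance in defining differential expression can be illustrated with a minimal sketch that scores the overlap between the gene lists called on two platforms. The exact metric used by the MAQC project is not specified here, so the definition below (shared genes divided by the size of the shorter list), the function name and the placeholder gene names are assumptions for illustration only.

```python
def concordance(genes_a, genes_b):
    """Percentage overlap between two lists of differentially expressed genes."""
    set_a, set_b = set(genes_a), set(genes_b)
    if not set_a or not set_b:
        return 0.0
    shared = set_a & set_b
    # Divide by the shorter list so that complete overlap scores 100%.
    return 100.0 * len(shared) / min(len(set_a), len(set_b))


# Hypothetical gene lists called as differentially expressed on two platforms.
platform_1 = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
platform_2 = ["GENE_A", "GENE_B", "GENE_C", "GENE_E"]
print(concordance(platform_1, platform_2))  # 75.0
```

Other denominators (for example, the union of the two lists) give a stricter score, so the choice should be reported alongside the result.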

Figure 2
Analysis of growth in the application of microarray technology as defined by published research articles (A) or publicly deposited datasets (B)

(A) The number of all research articles (black line) and lung-specific research articles (grey line) published each year, together with the relative number of articles focusing on lung cancer (black bars) or non-cancerous lung tissue (grey bars). (B) The number of all datasets (black line) and lung-specific datasets (grey line) deposited in the National Center for Biotechnology Information Gene Expression Omnibus each year, together with the relative number of the datasets focusing on lung cancer (black bars) or non-cancerous lung tissue (grey bars). Although the number of lung-related research articles initially grew at the same pace as all microarray research articles, there seems to have been a slowing in the pace of lung-related publications over the last 5 years. Likewise, the number of lung-specific datasets publicly deposited prior to 2004 was representative of all datasets as a whole, but the rate of growth of lung-related datasets has slowed substantially since then. Note publicly deposited datasets may not be entirely representative of all data being studied and published, dependent on journal-specific rules for data deposition.

There is no doubt that the technology has suffered setbacks due to rapid growth. Even though all microarray platforms apply the same basic principle of complementary hybridization, diverse probe designs have led to a multitude of problems in quantitative data acquisition and management. In retrospect, we can conclude that significant limitations in the implementation of microarray technology have led to poor performance. We can categorize these limitations as primarily including deficiencies in: (i) experimental design/study size, (ii) analytical methods and (iii) probe sequences.

Possibly the most important, and most overlooked, aspect of successful expression array analysis is appropriate experimental design, including an adequately powered sample size. A question often asked by investigators is how many samples to study. Optimal sample sizes can be determined using modified power calculations [7]. The primary objectives of a microarray experiment are usually class comparison, class discovery or class prediction [8]. For class comparison and class prediction studies, a large number of biological replicates (not technical replicates) are recommended, whereas for class discovery, technical replicates from the same individual provide a better assessment of disease classification. Conversely, for inbred animal models (e.g. mouse), where genetic heterogeneity is limited and exposures are controlled, technical replicates (and pooling of biological replicates) are preferred. Irrespective of the study objectives, the use of improper control samples that are derived from tissues of origin other than that of the treatment samples leads to erroneous predictive classifications due to confounding of the samples [9].
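How sample size depends on effect size, variability and the number of genes tested can be sketched with a textbook two-group normal approximation. This is not the tool described in [7]; the Bonferroni correction, the function name and the default thresholds are assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect, sd, n_genes, alpha=0.05, power=0.8):
    """Approximate biological replicates needed per group for one gene.

    effect  : expression difference to detect (e.g. in log2 units)
    sd      : assumed per-gene standard deviation in the same units
    n_genes : number of genes tested, used for a Bonferroni correction
    """
    alpha_per_test = alpha / n_genes          # assumed Bonferroni adjustment
    z = NormalDist()                          # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha_per_test / 2)
    z_beta = z.inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) * sd / effect) ** 2
    return ceil(n)


# A single test: detect a difference of one standard deviation.
print(samples_per_group(effect=1.0, sd=1.0, n_genes=1))  # 16
# The same difference when 10000 genes are tested at once.
print(samples_per_group(effect=1.0, sd=1.0, n_genes=10000))
```

Under these assumptions, the required number of biological replicates rises sharply once genome-wide multiple testing is taken into account, which is one reason small studies tend to be underpowered.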

Although there are established protocols for laboratory procedures leading to the generation of data, there is no clear consensus on a ‘best’ method for data analysis. Early analytical approaches relied heavily on a non-statistical measure of expression differences (fold change) [10] that was repeatedly shown to lack sensitivity and specificity. For instance, we showed that differential expression defined statistically, using measurement precision in technical replicates, is not directly proportional to fold change [11]. Other statistical methods, such as standard t tests, were recognized as being subject to problems of multiple testing resulting from repeated measures in a limited number of samples. Numerous complex and/or specific mathematical and statistical approaches have been subsequently developed and applied to microarray data (see [12] and references therein). Like experimental design, the methods for statistical analysis should be chosen based on the structure and distribution of the data and the objective of the study. Fortunately, the MAQC project and user consensus have pointed to a limited number of preferred analytical approaches that are recognized as being robust and effective.
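One widely used way to address the multiple-testing problem is to control the false discovery rate with the Benjamini-Hochberg procedure. The sketch below is a generic illustration; it is not stated to be the approach used in the studies cited above, and the function name and example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene P values from a two-class comparison.
raw = [0.01, 0.04, 0.03, 0.5]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw P = {p:.3f}, adjusted P = {q:.3f}")
```

Genes would then be called differentially expressed when the adjusted value falls below a chosen threshold, rather than by fold change alone.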

Another long-appreciated source of ‘noise’ or technical variability within expression arrays is inaccuracy of probe sequences, which may not match the transcripts they are intended to measure. This was first widely appreciated due to a ‘manufacturing’ (annotation) error made by the leading commercial source of the technology, which led to a large number of arrays/experiments that inadvertently measured sense transcripts [13]. However, this proved to be no more than a minor setback in the evolution of a powerful and complex technology. More recently, it has been appreciated that sequence databases evolve, resulting in transient (in)accuracy of probes [14]. We have previously shown improved accuracy and cross-platform comparisons when accounting for these inaccuracies [15].

Technological accomplishments

Having survived these temporary setbacks and limitations, microarrays have developed into standard tools for high-throughput analysis of gene expression, and continue to grow in information quality and in new applications. The evolution of microarray technology has been a gradual process. The technological principles involving combinatorial chemistry have been in development since the late 1960s with the works of R. Bruce Merrifield (Nobel Prize in Chemistry, 1984), with arrays in their current forms first appearing in the early-to-mid-1990s. Both Stephen Fodor and Mark Schena developed the early prototypes of cDNA and oligonucleotide arrays almost simultaneously [1,2]. While Fodor, in collaboration with Lubert Stryer of Stanford University, received a small business innovation research grant from the NIH (National Institutes of Health) and went on to establish Affymetrix Inc., Schena pioneered the cDNA technology under the guidance of Pat Brown at Stanford University [16].

By the late 1990s, the power and potential of microarray technology were fully appreciated and it was being applied in an effort to develop novel descriptions of diseased states. Novel genes and pathways, previously not implicated in the pathophysiology of a certain disease, may emerge from microarray studies to provide new theories regarding the disease process and potential therapeutic drug targets. The spread in use of the technology was unprecedented (Figure 2A), with exponential growth in the number of publications reporting results from its application in the early part of this decade. Parallel growth was initially observed in the lung biology and disease research community, predominantly by those focused on lung cancer (50–60% of publications each year), although over the last few years, its use in lung research has apparently lagged. Regardless of the sample studied, expression microarray application to disease can be classified under three broad topics: biomarker discovery or class prediction, disease subclassification or class discovery and uncovering the disease mechanism.

Biomarker discovery

Expression arrays have been widely used to predict the state or ‘class’ of an unknown sample using pre-existing information. Cancer is one of the human diseases most studied by expression-profiling technology (Figure 2). Initial microarray studies were heavily focused on identifying gene expression markers for human cancer phenotypes distinguishing them from normal samples. Golub et al. [8] described the potential of expression array data to predict disease using the distinctions between human AML (acute myeloid leukaemia) and ALL (acute lymphoblastic leukaemia) as their experimental model. Although these diseases are capable of being discriminated by other means (such as cytology), this study served as proof-of-principle that the technology could define expression markers informative for the diseased state. This approach has been widely applied to many diseases or previously recognized disease subtypes including breast cancers, cutaneous malignant melanoma, diffuse large B-cell lymphoma, colon cancer, leukaemia and ovarian carcinomas (see [17] and references therein). In the pulmonary system, numerous human disease states and animal models of disease have been subjected to expression profiling in an effort to define disease biomarkers and/or class predictors [18].

Detection of genomic signatures from individual studies has provided a wealth of information, but these data are limited for pathological diagnosis unless validated externally. One major limitation to this goal is the small number of samples in individual studies, particularly for human studies where there is a high degree of both intra- and inter-population variability. This ultimately results in disease biomarkers that are population dependent, rather than having global applicability. This is true with regard to studies of lung diseases, where biomarkers for acute and chronic diseases, including severe asthma and COPD (chronic obstructive pulmonary disease), and environmental exposures have been identified [19]. We have recently presented an analysis of lung tissue gene expression in subjects with COPD [20]. We used a novel combination of discrete and quantitative variable analysis to determine differential expression. Inclusion of quantitative phenotypes helped in providing a set of robust COPD markers, which successfully predicted disease in an independent COPD population [20]. Admittedly, this success is much more of an exception than the rule. However, with an increasing number of microarray datasets being deposited in the public domain [21,22] (Figure 2B), real opportunity exists for more reliable information to be generated through the integration of multiple, independently generated datasets focusing on the same biological paradigm. To facilitate this concept, MIAME (Minimum Information About a Microarray Experiment) was developed, which serves as the standard information required to accompany microarray data to ensure correct interpretation of the data and independent verification of analysis of results [23].

Many groups have proposed approaches for data integration across platforms and laboratories (for example see [24, 25] and references therein). Meta-analysis approaches have also been applied to validate results from different studies, which, in certain cases, have proved to be successful [17]. With a large number of microarray studies on breast cancer, great promise existed to leverage these data to identify a disease biomarker that could be used as a true diagnostic tool. Indeed, van't Veer et al. [26] identified a gene expression signature strongly predictive of patients with either poor or good prognosis. The gene-expression biomarker has been used as a predictor of the outcome of disease in young patients with breast cancer in combination with standard prediction tools based on clinical and histological criteria [27]. A 70-gene marker chip has been developed and tested for diagnosis of breast cancer that outperformed all clinical variables in predicting the likelihood of distant metastases within 5 years [27]. This represents the first clinically approved gene expression microarray test for molecular-based therapy.

Disease subclassification

The identification of cancer subtypes can sometimes be a difficult process, as it relies on the subjective interpretation of both clinical and histopathological observations with the aim of classifying samples in currently accepted subtypes based on the tissue of origin of the tumour. This sometimes is hampered by a lack of clinical information or unclear classification of samples based on histology. Alternatively, some diseases present with similar histopathological features of clearly distinct origin or with variable prognoses. Most early microarray studies involved marker distinction between normal and aberrant (diseased) tissues. However, much enthusiasm has been generated regarding the ability of gene expression microarrays to clarify disease subclasses, and even identify previously unappreciated classes with distinct aetiologies, prognoses and/or therapeutic responses. This potential is particularly relevant for cancer, as was first described by Alizadeh et al. [28] for B-cell lymphoma. Microarray analysis has led to the identification of molecular classification of several human malignant tumours based on pathological parameters, namely stage, recurrence, prognostic outcome or therapy response [29] in breast cancer [26,27,30], cervical lymph node metastasis in oral squamous cell cancer [31], non-small-cell lung cancers and clear cell renal carcinoma [33]. One of the most extensively analysed microarray datasets involves a comparison of leukaemia subtypes [8], which has also been used for identifying predictors for therapy response [34]. Every new discovery of disease subtype molecular markers creates a path for the development of molecular-targeted therapies.

The prognostic and discovery potential of microarrays for disease subclassification is possibly best exemplified by NSCLC (non-small-cell lung carcinoma), the most common form of the most frequent cause of cancer death in the world. Lung carcinomas are a heterogeneous collection of tumours characterized by a large number of chromosomal and structural abnormalities. NSCLCs can be subclassified into adenocarcinomas (the most common), squamous cell carcinomas and large-cell carcinomas [35]. Bhattacharjee et al. [36] demonstrated the ability to define NSCLC subtypes based on gene expression profiles and first identified ‘molecular’ subclasses of adenocarcinoma. These observations have been replicated in numerous other studies [37,38], with the common limitation of the identification of population-dependent markers (as described above). In the absence of robust markers, meta-analysis has been implemented, with limited success. Potti et al. [39] identified meta-genes to predict therapy response in patients with early-stage non-small-cell lung cancer. In contrast, Ramaswamy et al. [40] re-analysed multiple datasets of tumour expression profiles to identify a set of gene expression markers that can be used as a multiclass cancer diagnosis tool.

In addition to lung cancer, disease subtype classification has also been attempted in other lung disorders such as COPD [41], pulmonary fibrosis [42] or asthma [43]. For instance, two laboratories have recently described markers for severe drug-resistant asthma in lung epithelial cells [44] and in peripheral blood cells [45]. In an interesting study, Kho et al. [46] used the organ-specific development transcriptome as a basis for subclass discovery. Molecular subclassification promises the hope of defining class-specific mechanisms and routes for therapeutic intervention.

Causal mechanisms

Even though gene-based signature sets have been developed, the causal mechanisms are still unclear. Initially, it was hoped that defining expression biomarkers of disease would lead directly to causality. We now appreciate that analytical methods and experimental designs focusing on class prediction are typically not efficient at uncovering disease mechanisms. One means used to apply microarrays to uncover the disease mechanism was intuitive: the use of animal modelling. Although such models have been developed in an effort to determine the effects of candidate genes, they can also be used for gene discovery. Such studies are common throughout the literature including those concerning the lung, particularly for models of lung inflammation and allergic hypersensitivity. Many of these studies suffer from the limitations described above for studies of clinical specimens (namely, small study size, ambiguous analytical methods etc.), but have provided tremendous insights nonetheless. For instance, expression profiling from animal models has provided insights into COPD pathogenesis [47]. Additionally, Novershtern et al. [48] have recently developed a signature gene set for asthma by integrating information from genome-wide expression and protein studies of animal models. The integration of multiple genomic approaches, such as combining animal models and genetics, has been particularly useful in uncovering mechanistic genes/pathways. In terms of respiratory disease, we recognize the study by Karp et al. [49] of an allergic hypersensitivity model of asthma as an early example of this approach.

Most microarray studies are limited to comparison analysis, aiming to identify genes with a change in expression between two classes. In contrast, DeRisi et al. [50] published a study of the yeast life cycle that initiated the use of microarrays in time-series studies. These types of experiments have typically been used for biological discovery in ‘normal’ samples, and not for the identification of disease biomarkers. However, another successful approach for defining a causal mechanism has been through the integration of human genetics and genome-wide expression, such as that used by Blackshaw et al. [51]. Here, they used transcriptomic profiles of normal developing eye tissue to identify biologically relevant candidate causal genes within loci linked to human eye diseases. Collectively, results from these and similar studies have provided novel insights into the mechanisms of disease pathogenesis, and raise expectations of developing therapeutic targets from the application of DNA microarrays to complex diseases.

As described above, multiple groups have attempted to identify candidate genes for COPD, a complex human disease probably influenced by a genetic component (α1-antitrypsin deficiency), an environmental component (cigarette smoking) and gene-by-environment interactions [41,52–54]. We recently reported the identification of a COPD susceptibility gene through the integration of human genetics with gene expression profiling of normal lung development and diseased lung tissue. Like Blackshaw et al. [51], we used transcriptomic profiles of organ development to inform us of biologically relevant candidate genes within a disease-linked locus. We identified SERPINE2 [serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 2], a homologue of the only known COPD susceptibility gene, and went on to show that expression of this gene was aberrant in lung tissue from COPD subjects [55]. Of particular note, SERPINE2 expression was not a robust class prediction marker, but was highly correlated with multiple quantitative measures of lung function in multiple datasets where quantitative phenotypes for COPD were available (Figure 3). Within the disease-linked locus, this was uniquely true of SERPINE2. Multiple groups, including our own, have subsequently demonstrated significant associations between SERPINE2 gene variants and COPD phenotypes [55,56] and aberrant SERPINE2 expression in COPD lung tissue [57]. We have further discovered that deficiency in SERPINE2 in the mouse leads to the development of COPD-related lung histopathology (S. Srisuma and T.J. Mariani, unpublished work). We believe that such an integrative genomics approach helps us to expedite candidate gene identification, and that the use of quantitative variables may be particularly useful to uncover causal disease mechanisms and pathways.
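The idea of relating expression to a quantitative phenotype, rather than to a case/control label, can be illustrated with a simple correlation. The statistic used in the studies cited above is not specified here, so the Pearson coefficient, the function name and the toy values below are assumptions for illustration and do not represent real data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between expression values and a phenotype."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy values only: expression of one gene and a lung function measure
# (e.g. FEV1 as a percentage of predicted) in five hypothetical subjects.
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
lung_function = [40.0, 55.0, 60.0, 75.0, 90.0]
print(round(pearson_r(expression, lung_function), 3))
```

A gene can show a strong correlation of this kind across subjects while still separating cases from controls poorly, which is the distinction drawn above for SERPINE2.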

Figure 3
Using quantitative disease phenotypes for gene expression biomarker discovery

We initiated gene discovery for COPD in a genetically linked locus (chromosome 2q) in two independent datasets [41,58]. We focused on identifying gene expression changes associated with quantitative phenotypic variables characteristic of COPD such as forced airflow, diffusing capacity of the lung for carbon monoxide (DLCO) and total lung capacity (TLC). A significant positive correlation is indicated in black, while a significant negative correlation is indicated in grey. Note that TLC is inversely proportional to the other phenotypic variables. While there was evidence for association of multiple genes within the locus in each dataset, there was a single gene, SERPINE2, consistently associated with multiple quantitative phenotypic variables in both datasets. FEF 25–75%, forced expiratory flow at 25–75% of vital capacity; FEV1, forced expiratory volume in 1 s; FVC, forced vital capacity.

Summary

Gene expression microarrays have come a long way from being a complex technology (sometimes poorly applied) in biomedical research and are now a critical component of state-of-the-art research on disease discovery, therapeutic responsiveness and pathogenesis. The tools used for statistical analysis, data mining and archiving have steadily improved, and objective criteria demonstrate a high level of precision for the technology when applied appropriately. Numerous studies in respiratory medicine have provided a wealth of data describing biological paradigms from normal development to lung cancer. Although the primary application of the technology has been to identify biomarkers for disease, the technology has been successful in the identification of disease subtypes and molecular diagnostic predictions. We have here listed only a select few instances of the wide-ranging capabilities of this high-throughput technology. With proper planning and experimental design, it has been shown to discover or further resolve complex disease mechanisms to levels previously unimaginable. When used in combination with animal models and genetic studies, particularly focusing on quantitative variable analysis, it has provided unexpected power to identify disease mechanisms. Even though we have seen considerable advancement in the technology, in order to achieve its full potential, genome-wide expression studies have to co-evolve in a multidisciplinary approach that includes a combination of well-developed machine-learning algorithms and systems biology approaches. This can only be achieved when clinicians, surgeons, pathologists, epidemiologists, bioinformaticians and molecular biologists undertake a well-co-ordinated effort to properly plan and conduct every step of the process starting from experimental design to validation of the results.

Biochemical Basis of Respiratory Disease: Biochemical Society Focused Meeting held at AstraZeneca, Loughborough, U.K., 5–6 March 2009. Organized and Edited by Colin Bingle (Sheffield, U.K.) and Alan Wallace (AstraZeneca, U.K.).

Abbreviations

     
  • COPD: chronic obstructive pulmonary disease
  • MAQC: MicroArray Quality Control
  • NSCLC: non-small-cell lung carcinoma
  • SERPINE2: serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 2

Funding

This work was supported by grants from the National Institutes of Health [grant numbers HL071885 and ES014372] and the Flight Attendant Medical Research Institute.
