Abstract
Gene expression profiling data can be used in toxicology to assess both the level and impact of toxicant exposure, aligned with a vision of 21st century toxicology. Here, we present a whole blood-derived gene signature that can distinguish current smokers from either nonsmokers or former smokers with high specificity and sensitivity. Such a signature, measured in a surrogate tissue (whole blood), may help in monitoring smoking exposure as well as discontinuation of exposure when the primarily impacted tissue (e.g., lung) is not readily accessible. The signature consisted of LRRN3, SASH1, PALLD, RGL1, TNFRSF17, CDKN1C, IGJ, RRM2, ID3, SERPING1, and FUCA1. Several members of this signature have been previously described in the context of smoking. The signature translated well across species and could distinguish mice exposed to cigarette smoke from mice exposed to air only or withdrawn from cigarette smoke exposure. Finally, the small signature of only 11 genes could be converted into a polymerase chain reaction-based assay that could serve as a marker to monitor compliance with a smoking abstinence protocol.
Introduction
Cigarette smoke exposure has a negative impact on human health and is linked to the development of several fatal diseases. 1 The response to cigarette smoke exposure has been monitored by widely used biomarkers such as the levels of “nicotine equivalents” 2 in urine or the metabolites of the tobacco-specific lung carcinogen 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone. 3 However, the use of these biomarkers is associated with several drawbacks, including their short half-life and interindividual differences in metabolism. 4,5 Hair nicotine and cotinine have been used to monitor smoke exposure among infants and adults and were found to be more precise measures of exposure than urine cotinine levels. 6–8 Salivary cotinine concentrations have also been proposed as a noninvasive biomarker for environmental tobacco smoke exposure in children. 9 While hair and saliva nicotine and cotinine measurements may provide accurate verification of nonsmoking status and a useful measure of secondhand smoke exposure, they are restricted to a single constituent of cigarette smoke. Moreover, such biomarkers do not offer insights into the biological mechanisms that are impacted in response to cigarette smoke exposure, a feature advocated by 21st century toxicity testing principles. 10
New technologies, such as whole genome microarrays, have therefore been incorporated into toxicity testing to increase efficiency and to provide a more data-driven approach to exposure response assessment. 11 Several studies have described chemical-specific gene expression profiles associated with the adverse effects of active substances in various tissues, 12–14 and the large airway transcriptome of smokers has been well characterized. In line with the field of injury concept, molecular changes in response to smoke exposure can be detected even when no histological abnormalities are visible. 15–19
Sample acquisition from the primary site of exposure (e.g., the airways) is usually invasive and is therefore not convenient for exposure assessment and monitoring. As a minimally invasive alternative, peripheral blood sampling can be employed in the general population to establish systemic biomarkers. Several blood-based biomarkers of potential harm have been proposed, including those related to inflammation, oxidative stress, and platelet activation. 20 A more global picture of the impacted biology can be achieved by gene expression studies, and various exposures have been shown to alter gene expression profiles in blood. 21–23 Indeed, several studies have shown that gene expression in blood can distinguish between subject populations, such as patients with early-stage non-small cell lung cancer from those with non-malignant lung disease, 24,25 subjects with chronic obstructive pulmonary disease (COPD) from healthy smokers, 26 and even smokers with no detectable disease from nonsmokers. 27–33
A transcriptome-based exposure response signature could be as simple as the presence or absence of expression of a single gene or, more likely, could be characterized by the expression levels of a collection of genes that together determine the classification. It is therefore distinct from a molecular biomarker, which is generally based on differentially expressed genes (DEGs) between case and control populations.
Because of the large interindividual variation expected in human populations, signatures should be robust and well designed, maintaining high specificity (Sp) and sensitivity (Se) across independent subject cohorts, laboratories, and nucleic acid extraction methods. However, many published signatures lack proper validation, so that their reported performance is overly optimistic. 34,35
In the interest of developing more robust and predictive gene signatures, the Industrial Methodology for PROcess Verification in Research (IMPROVER) 36,37 aimed to identify the best classification pipeline for outcome prediction based on microarray data. However, despite the success of the IMPROVER Diagnostic Signature Challenge (DSC), the development of computational methodologies that can be robust and versatile in clinical applications remains challenging. 38,39
The aim of the present study was to identify a whole blood-based gene signature for current smokers (CS) with the potential to distinguish between subjects who smoked and those who had stopped smoking (former smokers (FS)) or never smoked (nonsmokers (NS)). Taking advantage of the lessons learnt from the IMPROVER DSC, we developed a new methodology based on a prediction model that uses high fold-change genes extracted from publicly available gene expression datasets that profiled samples from CS and NS or FS in the same tissue of interest. Preselecting genes with high fold-changes in several independent studies has the potential to enhance the robustness of the signatures across studies. The validations were performed using independent datasets.
To assess the impact of exposures on human health, several experimental models other than clinical studies are regularly used. Rodent studies, which are the most common, have both strengths and limitations, but the more translatable the signatures, the better they serve predictive toxicology and disease research. To address this need, we show that the whole blood-based signature can discriminate smoke-exposed mice from sham-exposed mice and even from mice that were withdrawn from smoke exposure.
Materials and methods
Generation of the smoker whole blood transcriptome dataset
BLD-SMK-01
We have produced a blood gene expression dataset, BLD-SMK-01, from PAXgene blood samples obtained from a banked repository (BioServe Biotechnologies Ltd, Beltsville, Maryland, USA) based on well-defined inclusion criteria. At the time of sampling, the subjects were between 23 and 65 years of age. Subjects with a disease history and those taking prescription medications were excluded. CS had smoked at least 10 cigarettes daily for at least 3 years. FS had ceased smoking at least 2 years prior to sampling and before quitting had smoked at least 10 cigarettes daily for at least 3 years. CS and NS were matched by age and gender. A total of 31 blood samples were obtained from CS, 30 from NS, and 30 from FS.
QASMC study
The Queen Ann Street Medical Center (QASMC) clinical case–control study was conducted at The Heart and Lung Centre (London, UK) according to Good Clinical Practice and was registered at ClinicalTrials.gov with the identifier NCT01780298. It aimed to identify a biomarker or a panel of biomarkers that would enable differentiation between subjects with COPD (CS with a ≥10 pack-year smoking history at GOLD Stage 1 or 2) and three comparative groups of matched subjects: NS, FS, and CS. Sixty subjects were enrolled in each group (240 subjects in total). The additional goals of this study were to assess standard biomarkers of inflammation and to compare inflammatory cell responses and selected markers of inflammation in blood, induced sputum, and nasal samples. The 240 subjects included males (58%) and females (42%) aged between 40 and 70 years. Subjects in the comparison groups were matched by ethnicity, gender, and age (within 5 years) with the recruited COPD subjects. Blood samples were sent to AROS Applied Biotechnology AS (Aarhus, Denmark) for processing and hybridization to Affymetrix Human Genome U133 Plus 2.0 GeneChips (Affymetrix, Santa Clara, California, USA), as described below.
RNA isolation
Total RNA (including microRNA) was isolated using the PAXgene blood miRNA kit (catalog number 763134; Qiagen, Germany) according to the manufacturer’s instructions. The concentration and purity of the RNA samples were determined using an ultraviolet spectrophotometer (NanoDrop ND1000; Thermo Fisher Scientific, Waltham, Massachusetts, USA) by measuring the absorbance at 230, 260, and 280 nm. RNA integrity was further checked using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, California, USA). Only RNA samples with an RNA integrity number >6 were processed for further analysis.
RNA preparation and Affymetrix hybridization
Target cDNA for Affymetrix 3′ expression arrays was prepared from 50 ng of RNA using the Ovation® Whole Blood Reagent and Ovation® RNA Amplification System V2 (NuGEN, San Carlos, California, USA). The quantity of complementary DNA (cDNA) was measured with a NanoDrop® 1000 or 8000 spectrophotometer (Thermo Fisher Scientific) or a SpectraMax® 384Plus microplate reader (Molecular Devices, Santa Clara, California, USA). The cDNA quality was determined by assessing the size of unfragmented cDNA using an Agilent 2100 Bioanalyzer. The size distribution of the final fragmented and biotinylated product was also monitored using electropherograms. After labeling, the cDNA fragments were hybridized to the GeneChip® Human Genome U133 Plus 2.0 Array (Affymetrix) according to the manufacturer’s guidelines. Samples for target preparation were fully randomized for the Affymetrix gene expression microarray. After the chip scans were inspected for artifacts, the data were processed through a standard quality control pipeline. Briefly, raw data files were read using the ReadAffy function of the affy package 40 from the Bioconductor (R 3.1.2 and Bioconductor 3.2) suite of microarray analysis tools 41 available for the R statistical environment. 42
Population level analysis
For the population-level analysis (i.e., the study of average fold-changes), the data were subsequently normalized using GC robust multi-array average (GCRMA). Background correction and quantile normalization were used to generate microarray expression values 43 from all arrays passing quality control checks. For the individual sample prediction model, the data were normalized with MAS5. 44 An overall linear model was fitted, and for each comparison of interest raw p values were generated for each probe set on the expression array based on moderated t-statistics (Smyth, 2004). The Benjamini–Hochberg false discovery rate method was used to correct for multiple testing.
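To make two of the population-level steps concrete, the sketch below shows quantile normalization and the Benjamini–Hochberg adjustment in plain Python. It is an illustration only, not the GCRMA or limma code used in the study: GCRMA background correction and the moderated t-statistics are omitted, the input layout (one list of probe values per array) is an assumption, and ties are broken by sort order rather than averaged.

```python
from typing import List


def quantile_normalize(arrays: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize arrays, each given as a list of probe values.

    Every value is replaced by the mean, across arrays, of the values sharing
    its rank, so all arrays end up with the same distribution. Ties are broken
    by sort order (a simplification).
    """
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value over all arrays
    reference = [
        sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n_probes)
    ]
    normalized = []
    for a in arrays:
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return BH-adjusted p values (false discovery rate) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p value so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Applied to the p values [0.01, 0.04, 0.03, 0.20], `benjamini_hochberg` returns approximately [0.04, 0.0533, 0.0533, 0.20]; the pass from the largest p value downward keeps the adjusted values in the same order as the raw ones.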
Individual sample prediction modeling
To achieve robustness in the prediction model, independent gene expression datasets from blood (GSE15289) and peripheral blood mononuclear cells (PBMCs; GSE42057) were obtained from the National Center for Biotechnology Information Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/gds/?term=GEO) and processed as described above (see Table 1). The Norwegian Women and Cancer (NOWAC) study (GSE15289) 45 dataset comprised whole blood samples from postmenopausal women aged between 48 and 63 years and included 211 NS and 74 CS. The Bahr et al. (GSE42057) 46 dataset included PBMC samples collected from 42 control subjects and 94 subjects with COPD of varying severity; all subjects were non-Hispanic Caucasians and either CS or FS. These datasets were used to identify genes that exhibited large changes in average expression between CS and NS (GSE15289) or between CS and FS (GSE42057). These average expression changes were used to guide signature extraction as follows:
Overall summary of the studies used to build, validate, or apply smoke exposure response signatures.
CS: current smokers; NS: nonsmokers; FS: former smokers; COPD: chronic obstructive pulmonary disease; QC: quality control; PBMC: peripheral blood mononuclear cells; HS: Homo sapiens; Mm: Mus musculus.
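$L_1$ and $L_2$ were the sets of the $M$ ($M = 1000$ in our study) highest fold-change genes from the two independent datasets (GSE15289 and GSE42057, respectively). The $L_1$ and $L_2$ lists were then used for a priori filtering of the training dataset as follows. For each $N$ from 1 to $M$, the performance of a linear discriminant analysis (LDA) 47 model was evaluated using 5-fold cross-validation (repeated 100 times) on the training data restricted to the genes shared by the top $N$ entries of both lists,

$$G(N) = L_1^{(N)} \cap L_2^{(N)},$$

where $L_i^{(N)}$ denotes the $N$ highest fold-change genes of $L_i$; performance was summarized by the Matthews correlation coefficient, $\mathrm{MCC}(N)$. The value of $N$ for which $\mathrm{MCC}(N)$ was maximal was selected:

$$N^{*} = \arg\max_{1 \le N \le M} \mathrm{MCC}(N).$$

The core gene list for the signature was defined as $G(N^{*})$, and the model was built by computing an LDA model on the core gene list.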
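The gene-preselection loop can be sketched in Python as follows. This is a minimal sketch under stated assumptions: genes are ranked by absolute log2 fold-change, the candidate set at step N is the overlap of the top N genes of both lists (consistent with the 77 genes shared by the two top-1000 lists), and `cv_mcc` is a hypothetical callable standing in for the repeated 5-fold cross-validated MCC of an LDA model on the training data.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def top_fold_change_genes(log2_fc: Dict[str, float], m: int) -> List[str]:
    """Rank genes by absolute log2 fold-change (an assumption) and keep the top m."""
    return sorted(log2_fc, key=lambda g: abs(log2_fc[g]), reverse=True)[:m]


def select_core_genes(
    l1: Sequence[str],
    l2: Sequence[str],
    cv_mcc: Callable[[List[str]], float],  # hypothetical: repeated CV MCC of an LDA
) -> Tuple[int, List[str]]:
    """Return (N*, G(N*)), where G(N) is the set of genes shared by the top-N
    entries of both lists and N* maximizes the cross-validated MCC."""
    best_n, best_genes, best_mcc = 0, [], float("-inf")
    cache: Dict[Tuple[str, ...], float] = {}
    for n in range(1, min(len(l1), len(l2)) + 1):
        top2 = set(l2[:n])
        genes = [g for g in l1[:n] if g in top2]
        if not genes:
            continue  # no shared gene yet, nothing to model
        key = tuple(genes)
        if key not in cache:  # G(N) often repeats for consecutive N
            cache[key] = cv_mcc(genes)
        if cache[key] > best_mcc:  # strict '>' keeps the smallest N on ties
            best_n, best_genes, best_mcc = n, genes, cache[key]
    return best_n, best_genes
```

Because the overlap often stays the same for consecutive values of N, caching the cross-validated MCC per gene set avoids refitting identical models; the strict comparison keeps the smallest N when several values tie.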
The datasets were centered prior to learning and testing so that the LDA model had a zero intercept.
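A zero-intercept LDA on centered data can be illustrated with the following pure-Python sketch. The helper names are hypothetical, a small ridge term (an assumption not stated in the text) keeps the pooled within-class covariance invertible, and each dataset would be centered on its own before scoring. With balanced classes, centering places the midpoint between the two class means at zero, so the sign of the score alone gives the predicted class.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def center(rows: Matrix) -> Matrix:
    """Subtract each gene's (column's) mean, computed within this dataset only."""
    n = len(rows)
    means = [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]
    return [[x - mu for x, mu in zip(r, means)] for r in rows]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_lda_direction(x: Matrix, y: Sequence[int], ridge: float = 1e-6) -> List[float]:
    """Return w = S_w^-1 (mu_1 - mu_0), S_w being the pooled within-class covariance.

    y holds 1 for CS and 0 for the comparator class.
    """
    p = len(x[0])
    groups = {c: [r for r, lab in zip(x, y) if lab == c] for c in (0, 1)}
    mu = {c: [sum(r[j] for r in g) / len(g) for j in range(p)] for c, g in groups.items()}
    sw = [[0.0] * p for _ in range(p)]
    for c, g in groups.items():
        for r in g:
            d = [r[j] - mu[c][j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    sw[i][j] += d[i] * d[j]
    dof = len(x) - 2
    for i in range(p):
        for j in range(p):
            sw[i][j] /= dof
        sw[i][i] += ridge  # small ridge keeps S_w invertible (an assumption)
    return solve(sw, [mu[1][j] - mu[0][j] for j in range(p)])


def lda_scores(x: Matrix, w: List[float]) -> List[float]:
    """Zero-intercept score on centered data: > 0 suggests CS, < 0 suggests NCS."""
    return [sum(a * b for a, b in zip(r, w)) for r in x]
```

In use, `w` would be fitted once on the centered training set (restricted to the selected genes), and each independent dataset would be passed through `center` before `lda_scores` is applied.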
Taqman® quantitative reverse transcription-PCR assay
Reverse transcription reactions with 500 ng of starting RNA were performed using the iScript™ cDNA Synthesis Kit (catalog number 170-8890; Bio-Rad, Hercules, California, USA) according to the manufacturer’s instructions, and the cDNAs were diluted to 10 ng/µL. A commercial Universal Human Reference RNA (UHR; catalog number 740000; Agilent Technologies) was used as a calibrator to allow reliable comparison of data across multiple experiments and instruments. The probes used in the Taqman® assays spanned exons, and five housekeeping genes (B2M, GAPDH, FARP1, A4GALT, and GINS2) were chosen for the data normalization step. The quantitative polymerase chain reaction (qPCR) step was carried out using Taqman® assays and TaqMan® Fast Advanced Master Mix (catalog number 444963, Life Technologies, Carlsbad, California, USA). Briefly, cDNAs were diluted to allow the application of 1.25 ng per well in a 384-well plate. In parallel, a master mix of Taqman® assay reagents and Taqman® Fast Advanced Master Mix was prepared for each assay, and the final reaction volume was 10 µL. qPCR was run on a ViiA 7 instrument (Life Technologies), and the automatic baseline and default threshold cycle (Ct) settings were applied for analysis. Ct values were normalized for each gene (by subtraction), first to the corresponding UHR Ct values and then to the GAPDH housekeeping gene values, yielding the so-called ΔΔCt values.
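The two-step normalization can be written as a small function. This sketch assumes that Ct values are available per gene for the sample and for the UHR calibrator, and it uses GAPDH as the reference gene, as stated in the text. Whether ΔΔCt values were then converted to relative quantities (e.g., 2^−ΔΔCt) is not stated, so the function returns ΔΔCt directly.

```python
from typing import Dict


def delta_delta_ct(
    sample_ct: Dict[str, float],
    uhr_ct: Dict[str, float],
    reference_gene: str = "GAPDH",
) -> Dict[str, float]:
    """Return ΔΔCt for every target gene of one sample.

    Step 1: subtract the UHR calibrator Ct of the same gene.
    Step 2: subtract the calibrator-normalized value of the reference gene.
    """
    vs_uhr = {gene: sample_ct[gene] - uhr_ct[gene] for gene in sample_ct}
    ref = vs_uhr[reference_gene]
    return {gene: v - ref for gene, v in vs_uhr.items() if gene != reference_gene}
```

For example, with hypothetical sample Ct values of 28.0 (LRRN3) and 18.0 (GAPDH) and UHR Ct values of 30.0 and 19.0, the function returns a ΔΔCt of −1.0 for LRRN3. Subtracting the UHR first and GAPDH second gives the same result as the more common ΔCt(sample) − ΔCt(calibrator) order, because only subtractions are involved.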
Results
The establishment, validation, and translation of the exposure signature leveraged many datasets. Table 1 summarizes the studies involved in developing, validating, or applying these signatures.
Exposure signature establishment
Following RNA extraction and quality checking of the raw data files from the BLD-SMK-01 study, 82 samples remained for analysis, of which 28 were CS, 28 NS, and 26 FS. The population level transcriptomics analysis of BLD-SMK-01 samples revealed that there were no DEGs between NS and FS in whole blood (Figure 1), suggesting that it would be difficult to distinguish between them using the blood transcriptome. Conversely, many DEGs were identified between CS and NS or FS (Figure 1). Therefore, the signature was extracted based only on CS and NS samples from the BLD-SMK-01 study. The FS group was kept aside for validation purposes.

Volcano plots for the DEGs in BLD-SMK-01. The plots show the estimated log2(fold-change) against −log10(adjusted p value). p values were computed based on moderated t-statistics and were adjusted by the Benjamini–Hochberg method. Left panel: comparison of gene expression profiles between CS and NS. Middle panel: comparison of gene expression profiles between CS and FS. Right panel: comparison of gene expression profiles between FS and NS. DEGs: differentially expressed genes; CS: current smokers; NS: nonsmokers; FS: former smokers.
By applying the statistical modeling methodology for individual sample prediction described in the Materials and methods section, we obtained a prediction model based on the following core genes: LRRN3, SASH1, PALLD, RGL1, TNFRSF17, and CDKN1C. The 5-fold cross-validation MCC of this model was 0.77 (Se = 0.91; Sp = 0.85) when classifying CS samples versus NS samples.
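For reference, the Matthews correlation coefficient (MCC), sensitivity, and specificity can be computed from a confusion matrix as in this sketch, with CS treated as the positive class. The counts in the usage note are hypothetical and are not the study's cross-validation results.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; defined as 0 when a marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of CS samples predicted as CS."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-CS samples predicted as non-CS."""
    return tn / (tn + fp)
```

For instance, `mcc(tp=25, fp=4, tn=24, fn=3)` is about 0.75. Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it less sensitive to unequal class sizes.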
The core genes in the signature were among those exhibiting high fold-changes in both the NOWAC (GSE15289) and Bahr et al. (GSE42057) studies. The predictions based on the core genes improved on the performance of an LDA model based on all 77 genes common to the 1000 highest fold-change genes of those two datasets (Se = 0.73; Sp = 0.81). When we studied predictive models obtained by leveraging each list of high fold-change genes individually, IGJ, RRM2, ID3, SERPING1, and FUCA1 were repeatedly identified as potential candidates in signatures with high specificity and sensitivity. These five genes were also among those with a high fold-change in the blood transcriptomes of both the NOWAC (CS vs. NS) and Bahr et al. (CS vs. FS) studies, and they were added to the core gene signature to form an extended signature. The cross-validation MCC of the model based on the extended signature (LRRN3, SASH1, PALLD, RGL1, TNFRSF17, CDKN1C, IGJ, RRM2, ID3, SERPING1, and FUCA1) was 0.73 (Se = 0.88; Sp = 0.84) when classifying CS versus NS. The genes in the core and extended signatures are further described in Table 2.
Extended blood-based smoking signature and known function of the gene product.a
PBMC: peripheral blood mononuclear cell.
a Core signature genes are shown in bold.
We compared our results with the cross-validation results of a model obtained by learning a sparse signature from BLD-SMK-01 alone (i.e., without using the two public datasets). We applied an approach comparable to that used by the best-performing team of the IMPROVER DSC. 38,39 The 5-fold cross-validation performance of this model reached Sp = 0.96 and Se = 0.93 in predicting smokers versus NS, slightly above the performance of the models based on the core and extended signatures.
Although the cross-validation sensitivity and specificity of the extended-signature model (Se = 0.88; Sp = 0.84) were slightly lower than those of the model obtained without using independent datasets (Se = 0.93; Sp = 0.96), its range of application was wider because of its robustness, as demonstrated by its application to independent studies and by the translatability of the signature to mice, as shown below.
Verification of the exposure response signature in independent studies
To validate the core and extended signatures, we used the FS group from the BLD-SMK-01 dataset as well as the blood dataset from the QASMC study. After quality checks of the QASMC transcriptomics samples, 52 COPD, 58 CS, 58 FS, and 59 NS CEL files were available for prediction. To evaluate prediction performance, the samples were stratified into two groups: CS (COPD and healthy smokers) and non-CS (NCS), the latter comprising both FS and NS. This grouping enabled us to evaluate the robustness of the signature with respect to COPD status. Each centered dataset was predicted using the model built on either the core gene signature or the extended signature. The classification performance on the QASMC study confirmed that the model was robust regardless of COPD status (Se = 0.90, Sp = 0.90 for the core signature and Se = 0.90, Sp = 0.91 for the extended signature; Table 3, Figure 2).
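The stratification into CS and NCS and the sign-based decision rule can be sketched as follows. The group labels are hypothetical strings, and treating a score of exactly zero as NCS is an assumption (the text defines only positive and negative scores). For a test set containing a single class, such as the BLD-SMK-01 FS group, only one of the two rates is defined, and the correct classification rate equals the specificity.

```python
import math
from typing import Dict, List, Sequence

# Assumption: QASMC COPD subjects count as current smokers (they were CS with COPD)
CURRENT_SMOKER_GROUPS = {"CS", "COPD"}


def to_cs_label(groups: Sequence[str]) -> List[int]:
    """Map study groups to 1 (CS, including COPD) or 0 (NCS: FS and NS)."""
    return [1 if g in CURRENT_SMOKER_GROUPS else 0 for g in groups]


def evaluate_scores(scores: Sequence[float], groups: Sequence[str]) -> Dict[str, float]:
    """Se and Sp of the rule 'score > 0 means CS' (a zero score is treated as NCS)."""
    truth = to_cs_label(groups)
    pred = [1 if s > 0 else 0 for s in scores]
    tp = sum(t == 1 and p == 1 for t, p in zip(truth, pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(truth, pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(truth, pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(truth, pred))
    # A single-class test set (e.g. FS only) leaves one of the rates undefined
    se = tp / (tp + fn) if tp + fn else math.nan
    sp = tn / (tn + fp) if tn + fp else math.nan
    return {"Se": se, "Sp": sp}
```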
Prediction results using independent datasets (BLD-SMK-01 (FS) and QASMC) for various signatures.a
CS: current smokers; NCS: non-current smokers; FS: former smokers; LDA: linear discriminant analysis.
a Results of an LDA model on the set of genes described in Beineke et al. (2012) are reported in the far right column. Both the core and extended signatures led to higher specificities and sensitivities than the signature derived from BLD-SMK-01 samples alone and the signature model based on the set of genes from Beineke et al.

LDA scores for the training set (BLD-SMK-01, CS and NS) and test samples (BLD-SMK-01 FS and QASMC). A positive score is predictive of CS status, while a negative score indicates NCS status. LDA: linear discriminant analysis; CS: current smokers; NS: nonsmokers; FS: former smokers; NCS: non-current smokers.
The effects of additional covariates such as gender and age were also examined. Both BLD-SMK-01 and QASMC studies were balanced with respect to gender and age. No significant association between age or gender and smoking status was present, as indicated by:
BLD-SMK-01: χ² test (gender vs. smoking status) p = 1; t-test (age vs. smoking status) p = 0.8.
QASMC: χ² test (gender vs. smoking status) p = 0.9; t-test (age vs. smoking status) p = 0.46.
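As an illustration of the gender check, the sketch below computes a Pearson χ² test of independence for a 2 × 3 table (gender by CS, NS, and FS). It is not necessarily the test implementation used in the study: it applies no continuity correction and relies on the fact that, with 2 degrees of freedom, the upper-tail p value is exactly exp(−χ²/2). A table with more groups, such as the four QASMC groups, would need a general χ² distribution function.

```python
import math
from typing import Sequence, Tuple


def chi2_independence_2x3(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pearson chi-square test of independence for a 2 x 3 contingency table.

    Rows: gender; columns: smoking groups. With (2 - 1) * (3 - 1) = 2 degrees
    of freedom, the upper-tail p value has the closed form exp(-statistic / 2).
    """
    if len(table) != 2 or any(len(row) != 3 for row in table):
        raise ValueError("expected a 2 x 3 table")
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.exp(-stat / 2.0)
```

For a hypothetical table [[20, 10, 10], [10, 20, 10]], the statistic is 20/3 (about 6.67) and p = exp(−10/3), about 0.036.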
Each gene in the signature was also tested for association with gender and age in BLD-SMK-01. None of the genes showed analysis of variance p values < 0.05, except for PALLD, which showed a minor gender effect.
Previous work on smoking signatures from whole blood samples includes a study by Beineke et al. based on blood datasets from smokers and NS without cardiovascular disease. 62 We were unable to use the microarray data from that study to further validate the prediction performance of our signature because of the incompatible array platform (Agilent). In that study, the authors reported a five-gene signature (LRRN3, CLDND1, MUC1, GOPC, and LEF1) used in conjunction with age and gender as covariates in a logistic model, with a cross-validated Sp of 0.95 and Se of 0.79. The signature model was further validated in 180 independent subjects (from the PREDICT study, registered at ClinicalTrials.gov with the identifier NCT00500617) based on quantitative reverse transcription (qRT)-PCR measurements, with an Se of 0.63 and an Sp of 0.94. Our core and extended signatures outperformed the signature described by Beineke and co-workers in terms of the Se and Sp values of the LDA model (Table 3).
Performance of the signature in a rodent inhalation study
Longitudinal clinical and epidemiological observations are critical for linking the exposure response signature to adverse outcomes in humans. Because of easier sampling and shorter times to disease, animal models are often used to elucidate exposure effects and disease mechanisms in primary tissues. To determine the translatability of our blood-based smoking exposure response signature into an animal model that manifests important aspects of human smoking-related emphysema, 63,64 we used blood samples from C57BL/6 mice exposed to cigarette smoke for 7 months. The study also included sham-exposed animals and animals exposed to cigarette smoke for 2 months and then switched to fresh air (Phillips et al., 2015). 65 For each group, blood samples were collected from 10 animals at 2, 3, 5, and 7 months. For the cigarette smoke-exposed and sham arms, samples were also collected at 4 months.
The exact model equation developed for the human samples did not perform well, possibly because five of the genes were expressed at very low levels in mouse samples. We therefore verified that the remaining six genes of the extended signature (LRRN3, PALLD, ID3, IGJ, RRM2, and FUCA1) were still able to discriminate between exposed and non-exposed mice based on the blood transcriptome. To this end, we retrained an LDA model on the mouse blood transcriptomics data. Interestingly, when applied to the human samples, the models based on these six genes performed only slightly worse than the human-derived signature models (correct classification rates: BLD-SMK-01 FS = 0.77, QASMC CS = 0.89, QASMC NCS = 0.79; Figure 3 and Table 4).

LDA scores of the signature trained on the exposed and sham mouse blood samples collected at months 2, 3, 4, 5, and 7. A positive score is predictive of CS status, while a negative score indicates NCS status. The predominantly negative scores of the cessation arm samples indicate that the signature no longer detects the exposure effect after cessation. LDA: linear discriminant analysis; CS: current smokers; NCS: non-current smokers.
Cross-validation results (5-fold cross-validation repeated 10 times independently) from the LDA model derived from mouse blood sample transcriptomics and associated prediction results.
CS: current smoker; NCS: non-current smoker; LDA: linear discriminant analysis.
We also retrained a logistic regression model and an LDA model on the set of genes reported by Beineke et al. 62 Although these models performed reasonably in cross-validation, they failed to translate to mice (data not shown).
Validation of the exposure response signature by PCR-based assay
To determine whether the discovered signature could be translated into a qRT-PCR-based exposure biomarker, the expression levels of the extended signature genes were measured in a subset of 20 randomly selected human samples (10 CS and 10 NS). An LDA model was trained on the normalized qRT-PCR data (see Materials and methods section) and assessed by 10-fold cross-validation (repeated 1000 times; 10-fold was chosen because of the small sample size), yielding an Sp of 0.85 and an Se of 0.96. Applying the same procedure to the core signature yielded an Sp of 0.8 and a lower Se of 0.62 (Table 5).
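Repeated k-fold cross-validation on a small sample can be organized as below. This sketch assumes that the folds are stratified by class, which the text does not state; with 10 CS and 10 NS samples and k = 10, every test fold then holds one sample of each class. The per-split Se and Sp would be computed with the trained LDA model and averaged over all splits, which is also an assumption about how the results were summarized.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def stratified_folds(labels: Sequence[int], k: int, rng: random.Random) -> List[List[int]]:
    """Split sample indices into k folds, dealing each class out round-robin."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


def repeated_cv_splits(
    labels: Sequence[int], k: int = 10, repeats: int = 1000, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for `repeats` independent k-fold partitions."""
    rng = random.Random(seed)
    for _ in range(repeats):
        for test in stratified_folds(labels, k, rng):
            held_out = set(test)
            train = [i for i in range(len(labels)) if i not in held_out]
            yield train, sorted(test)
```

Passing an explicit seed makes the repetitions reproducible, so the reported averages can be regenerated exactly.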
Cross-validation (10-fold cross-validation repeated 1000 times independently) results for LDA model of normalized qRT-PCR data.
CS: current smoker; NS: nonsmoker; LDA: linear discriminant analysis; qRT-PCR: quantitative reverse transcription polymerase chain reaction.
Discussion
Compared with single-molecule measurements, gene expression profiling provides a more global and complete view of the biological processes in normal and pathological situations. When the expression trends of multiple genes are taken together, it is also possible to derive a signature or classifier for a given physiological state, ranging from an exposure response to a disease state. While the primarily affected tissue offers samples that more accurately represent the normal, exposed, or pathological state, it is often not realistic to classify subjects using tissue biopsies. Because blood can be sampled with minimally invasive techniques, blood-based signatures hold great promise for biomarker discovery. 35,66 In this study, we derived a whole blood-based diagnostic signature that can serve as a biomarker for the smoking exposure response.
No significant association between age or gender and the genes in the blood-based signature was observed. Although age was an important covariate in two of the public datasets (GSE15289 and GSE42057), in which CS were on average older than NS or FS, this covariate was not included in the predictor because it had no significant association with smoking status in the BLD-SMK-01 study. The core and extended signatures were also robust with respect to inter-study and interindividual variations as well as COPD status. They were validated in two independent datasets with high specificity and sensitivity and outperformed the signature reported by Beineke et al. 62
Several genes present in our signature have been reported previously in the context of peripheral blood and smoking (LRRN3, 29,62,67 CDKN1C, PALLD, 28 SASH1, 68 and SERPING1 33 ) or smoking-related disease (FUCA1 68 ). LRRN3 expression was increased in CS compared with NCS, and LRRN3 overexpression has been reported in other smoker signatures from whole blood; 45,62 LRRN3 has also been shown to be relatively hypomethylated in CS and relatively hypermethylated following smoking cessation. 67 LRRN3 encodes an orphan receptor that is essential for neural development 48; however, information about its function in immune cells is very limited. 49
Although our signature performed well on mouse blood samples, we observed that the coefficients of the mouse model and the human models did not correlate positively. A closer look revealed that the most prominent gene in the extended signature, LRRN3, which is overexpressed in blood samples from smokers, was downregulated in mice exposed to smoke. This may be due to differences in white blood cell (WBC) numbers between smoke-exposed mice and humans. The total WBC count has been shown to be higher in healthy smokers than in NS. 20,69,70 Moreover, LRRN3 has been implicated in CD8+ T cell activation, 71 a cell population that is increased in smokers compared with NS and decreased upon smoking cessation, according to a study based on cell type-specific antibodies and flow cytometry. 72 Analysis of the cell populations in the mice used in this study showed no change in the relative numbers or types of circulating WBCs in smoke-exposed mice compared with sham mice (data not shown).
The use of rodent models is essential in predictive toxicology for testing new chemical compounds and evaluating disease endpoints, neither of which is feasible in human subjects. Ideally, an exposure-response biomarker measured in a surrogate tissue and translatable between humans and experimental animals could aid our understanding of the link between the biomarker and the extent of damage in the target tissue. Our blood signature was translated to a rodent model with high specificity and sensitivity and, as in humans, could be used as a biomarker of the smoking exposure response to complement commonly used exposure markers, such as nicotine metabolites and blood carboxyhemoglobin levels.
Finally, a small signature, such as described here, allows the use of qRT-PCR. While Affymetrix gene expression profiling is a powerful technology to establish gene signatures, it is not the method of choice for using the signature in practical applications. When there is no need to follow the expression changes of the entire genome, the assay could potentially be developed into a kit with considerable savings in cost and time.
Conclusion
In conclusion, our systems toxicology approach enabled the construction of a robust whole blood-based smoker gene signature based on 11 genes that could distinguish CS from NCS with high sensitivity and specificity. The signatures presented in this study will not only allow us to monitor the smoking exposure response in humans but should also permit the translation of the exposure response to a preclinical system.
Footnotes
Conflict of interest
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The study was fully funded by PMI and all authors are employees of PMI.
