Sage Journals: Discover world-class research

Abstract

In this paper, we consider the problem of calibrating diagnostic rules based on high-resolution mass spectrometry data subject to a limit of detection. The limit of detection reflects the limited ability of instruments to measure low-concentration proteins. As a consequence, peak intensities below the limit of detection are often reported as missing during the quantification step of proteomic analysis. We propose the use of censored data methodology to handle spectral measurements in the presence of a limit of detection, recognizing that these measurements are left-censored for low-abundance proteins. We replace the incomplete spectral measurements with estimates of the expected intensity and use those as input to a prediction model. To correct for lack of information and measurement uncertainty, we combine this approach with borrowing of information through the addition of an individual-specific random effect. We present different ways of using this formulation for prediction purposes and show how it may also allow for variable selection. We evaluate the proposed methods by comparing their predictive performance with that achieved using the complete information and with that of alternative methods for dealing with the limit of detection.
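As a rough illustration of the idea of replacing below-limit measurements with an estimate of their expected intensity, the following minimal sketch treats a single peak in isolation. It is not the authors' model: it assumes the (log-)intensities of that peak are normally distributed, ignores covariates and the individual-specific random effect, and fits the mean and standard deviation with a simple EM iteration before imputing E[Y | Y < LOD]. All function names, the simulated data, and the parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

_STD_NORMAL = NormalDist()  # standard normal, used for its pdf and cdf


def truncated_moments(mu, sigma, lod):
    """Mean and variance of N(mu, sigma^2) conditional on the value being below lod."""
    z = (lod - mu) / sigma
    lam = _STD_NORMAL.pdf(z) / _STD_NORMAL.cdf(z)
    mean = mu - sigma * lam
    var = sigma ** 2 * (1.0 - z * lam - lam ** 2)
    return mean, max(var, 0.0)  # guard against tiny negative rounding error


def fit_left_censored_normal(observed, n_censored, lod, n_iter=200):
    """EM estimates of (mu, sigma) when n_censored values are only known to be below lod."""
    mu, sigma = fmean(observed), pstdev(observed)  # naive starting values
    if n_censored == 0:
        return mu, sigma
    n = len(observed) + n_censored
    total_obs = sum(observed)
    for _ in range(n_iter):
        # E-step: expected value and variance of a censored observation
        m, v = truncated_moments(mu, sigma, lod)
        # M-step: update the mean, then the variance around the new mean
        mu_new = (total_obs + n_censored * m) / n
        ss = sum((y - mu_new) ** 2 for y in observed)
        ss += n_censored * (v + (m - mu_new) ** 2)
        mu, sigma = mu_new, math.sqrt(ss / n)
    return mu, sigma


def impute_below_lod(values, lod):
    """Replace None entries (below lod) by the fitted E[Y | Y < lod]."""
    observed = [y for y in values if y is not None]
    n_censored = len(values) - len(observed)
    mu, sigma = fit_left_censored_normal(observed, n_censored, lod)
    fill, _ = truncated_moments(mu, sigma, lod)
    return [fill if y is None else y for y in values], (mu, sigma)


def simulate(n=2000, mu=10.0, sigma=2.0, lod=9.0, seed=0):
    """Hypothetical data: normal intensities, with values below lod reported as None."""
    rng = random.Random(seed)
    draws = [rng.gauss(mu, sigma) for _ in range(n)]
    return [y if y >= lod else None for y in draws]


if __name__ == "__main__":
    data = simulate()
    completed, (mu_hat, sigma_hat) = impute_below_lod(data, lod=9.0)
    print(f"estimated mean {mu_hat:.2f}, estimated sd {sigma_hat:.2f}")
```

In this simplified setting every censored value of a peak receives the same imputed intensity; a regression formulation with covariates or random effects, as described in the abstract, would instead give each individual its own expected value.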

Keywords

Clinical mass spectrometry-based proteomics, limit of detection, censored regression, borrowing of information, prediction, variable selection

Adapting censored regression methods to adjust for the limit of detection in the calibration of diagnostic rules for clinical mass spectrometry proteomic data
