
Abstract

The ordinary least squares (OLS) method is routinely used to estimate the unknown concentration of nucleic acids in a given solution by means of calibration. However, when outliers are present it could appear sensible to resort to robust regression methods.

We analyzed data from an External Quality Control program concerning quantitative real-time PCR and we found that 24 laboratories out of 40 presented outliers, which occurred most frequently at the lowest concentrations.

In this article we investigated and compared the performance of the OLS method, the least absolute deviation (LAD) method, and the biweight MM-estimator in real-time PCR calibration via a Monte Carlo simulation. Outliers were introduced by replacement contamination. When contamination was absent, the coverages of the OLS and MM-estimator intervals were acceptable and their widths small, whereas LAD intervals had acceptable coverages at the expense of greater widths. In the presence of contamination we observed a trade-off between width and coverage: OLS performance deteriorated; the MM-estimator interval widths remained short, but at the cost of reduced coverage; and LAD interval widths were consistently larger, with coverages acceptable at the nominal level.

Keywords

Calibration Least absolute deviation MM-estimator Real-time PCR Robust regression

Introduction

Controlled calibration (1), i.e. inverse prediction, makes it possible to determine the unknown concentration of a particular substance in a given solution. Typically, several dilutions of a solution are prepared to cover a required range of known concentrations, and samples of these dilutions are used as a training dataset (standard preparations) to determine the relationship between concentration and the measured assay response. This relationship (standard curve) is then used to estimate the concentration in the “unknown” samples from their measured responses. The assay considered here is the real-time polymerase chain reaction (PCR), a molecular technique widely used to quantify nucleic acid content in a broad range of clinical and research applications.

For a detailed presentation of the rationale of the real time PCR assay the reader is referred to references 2-4. It suffices to say here that the basic equation of real-time PCR kinetics enables writing a linear “standard” relationship between the value of threshold ct (defined as the fractional cycle where a threshold amount of amplified cDNA is produced) and the explanatory variable (expressed as the logarithm of the starting nucleic acid concentration of the l-th standard concentration). The ordinary least squares (OLS) method is used to estimate both the standard regression line and the concentration of the unknown samples, along with the pertinent confidence intervals. However, the validity assumptions of the OLS method may fail when outlying observations are present; these may be vertical outliers only, i.e. outliers in the response variable (y-outliers), since the range of the explanatory variable is under the experimenter's control. If there is a relationship between the size of outliers and the samples' concentration, one possible approach consists in modeling the variance as a function of the mean (5). Alternatively, it could appear sensible to resort to robust regression methods, in particular to the MM-estimation procedure, combining robustness and efficiency (6, p. 119). As a matter of fact, the “[…] robust prediction procedure must maintain a balancing act: they must extract enough information from the relevant part of the data without being unduly influenced by outliers. Such intervals should maintain the nominal coverage while being reasonably accurate.” (7, p. 130)

This article reports the results of a study aiming at investigating and comparing the performance of the OLS method, the least absolute deviation (LAD) method (8), and the biweight MM-estimator (9) in real-time PCR calibration via a Monte Carlo simulation.

The paper is organized as follows: the first section reports the findings of a preliminary analysis justifying the features of the simulation scheme adopted; the materials and methods section provides a systematic presentation of calibration with the 3 different regression methods; the results section presents the simulation results, while the concluding remarks are reported in the discussion section.

Preliminary Considerations

An analysis of the second phase of an Italian project of External Quality Control concerning quantitative real-time PCR and involving 40 Italian laboratories was performed in 2011 (10).

The standard curve was based on 6 concentrations (l=1, 2, 3, 4, 5, 6) respectively containing 10², 10³, 10⁴, 10⁵, 10⁶, and 10⁷ copies/5 μL. Three replicates of ct were assayed for each standard concentration (j=1, 2, 3) as well as for the unknown concentration sample (K=3), so that a total of 21 values was supplied by each laboratory.

The statistical model corresponding to the standard curve is a simple linear regression:

y_{lj} = β_{0} + β_{1} x_{l} + ε_{lj}

[1]

where y_lj specifies the value of ct measured for the j-th replication of the l-th standard concentration, x_l defines the logarithm of the starting nucleic acid concentration of the l-th standard, and ε_lj is the random error component assumed to be independently and identically normally distributed, with a mean of zero and constant error variance σ².

The validity assumptions of the OLS method were investigated by means of robust single-case diagnostics (lmrob procedure in software R). Observations whose absolute robust standardized residuals exceeded 2.5 were labeled “vertical outliers” (11). The findings can be summarized as follows: (i) 24 laboratories out of 40 presented vertical outliers; (ii) the maximum number of outliers per laboratory was 2; (iii) outliers occurred most frequently at the lowest concentrations (10², 10³, and 10⁴ copies/5 μL); however, we were unable to identify a reliable relationship between variance and mean, owing to the low frequency of outlying observations in each laboratory. These findings supported the structure adopted in the Monte Carlo simulation.

The Monte Carlo Scheme

The Monte Carlo simulation mimicked the design adopted in the Italian project of External Quality Control. We fixed 6 standard concentrations, each having 3 replicates, and an unknown sample, also having 3 replicates. The true regression coefficients and the standard deviation values adopted in the simulation correspond to those used in a previous paper (12). In particular, the simulation starting value for β₁ was -3.637, being the median value of the b₁ distribution obtained from the data of the participants in a European External Quality Assessment (EQA) program for quantitative real-time PCR assays (13). The starting value for β₀ was 43.814, pertaining to the laboratory whose b₁ was the median value. Note that the efficiency corresponding to β₁ is E = 10^(-1/β₁) = 10^(1/3.637) = 1.883, slightly lower than the ideal value of 2. The standard deviation was σ=0.2 or σ=0.7, approximately corresponding to, respectively, the 10th and 90th centiles of the distribution of the estimated standard deviations of the EQA participants.
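As a quick arithmetic check, the efficiency implied by a standard-curve slope can be computed directly. The sketch below assumes the usual relation E = 10^(-1/β₁) with x in log10 units; the function name is ours and purely illustrative.

```python
import math


def amplification_efficiency(slope: float) -> float:
    """Efficiency implied by a standard-curve slope, assuming E = 10 ** (-1 / slope)."""
    return 10.0 ** (-1.0 / slope)


# Slope used as the simulation starting value for beta_1.
efficiency = amplification_efficiency(-3.637)

# Slope that would correspond to perfect doubling at every cycle (E = 2).
ideal_slope = -1.0 / math.log10(2.0)

print(round(efficiency, 3), round(ideal_slope, 3))
```

Under this relation, a slope steeper (more negative) than the ideal one corresponds to an efficiency below 2.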

According to the terminology used by Rousseeuw and Leroy (14, p. 117), outliers were introduced by “replacement contamination”, which, for the sake of brevity, will hereafter be referred to simply as contamination (see also ref. 6, p. 19). Namely, the majority of errors were randomly sampled from the basic normal distribution (bulk) N(0, σ²), and a portion of them was sampled from the contaminating normal distribution N(0, (cσ)²).

Pivotal elements of the Monte Carlo simulation were:

i) contamination modalities: a) all concentrations, b) lowest concentrations only (10², 10³, and 10⁴ copies/5 μL), consistent with the preliminary analysis findings;

ii) contamination percentage: 0%, 5%, 10%, and 20%;

iii) contamination intensity: c=2, 4, 6, 8;

iv) calibration points: x₀=2.5, 4.5, 6.5.

A total of 126 scenarios were explored.

For each scenario M=1,500 independent data sets were simulated. For each data set, 21 random numbers (18 for the standard concentrations [ε_lj] and 3 for the unknown sample [ε_0k]) were generated. A given percentage of contamination (λ) was chosen; a random number was generated from a uniform distribution U(0,1) and if it was ≤1 - λ, ε_lj was sampled from the bulk distribution, otherwise ε_lj was sampled from the contaminating normal distribution. The ct values were then computed as follows: y_lj = β₀ + β₁x_l + ε_lj for the standard concentrations, and as y_0k = β₀ + β₁x₀ + ε_0k for the unknown sample. Random numbers were obtained using the rnorm function of software R and by setting the seed equal to the clock time.
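The data-generating step just described can be sketched as follows. This is a minimal illustration, not the authors' R code: it covers contamination modality a) only, assumes x is the base-10 logarithm of the copy number (so the standards sit at 2, ..., 7), and uses a fixed seed for reproducibility instead of the clock time. All function names are hypothetical.

```python
import random


def draw_error(sigma, lam, c, rng):
    """One error: bulk N(0, sigma^2) if U(0,1) <= 1 - lam, else N(0, (c*sigma)^2)."""
    if rng.random() <= 1.0 - lam:
        return rng.gauss(0.0, sigma)
    return rng.gauss(0.0, c * sigma)


def simulate_dataset(beta0, beta1, x_levels, replicates, x0, m, sigma, lam, c, rng):
    """Simulate ct values for the standards (y) and for the unknown sample (y0)."""
    x = [level for level in x_levels for _ in range(replicates)]
    y = [beta0 + beta1 * xi + draw_error(sigma, lam, c, rng) for xi in x]
    y0 = [beta0 + beta1 * x0 + draw_error(sigma, lam, c, rng) for _ in range(m)]
    return x, y, y0


# One data set: 6 standards x 3 replicates, unknown at x0 = 2.5 with 3 replicates,
# sigma = 0.2, 10% contamination, intensity c = 4.
rng = random.Random(2024)  # fixed seed: an assumption made here for reproducibility
x, y, y0 = simulate_dataset(43.814, -3.637, [2, 3, 4, 5, 6, 7], 3, 2.5, 3, 0.2, 0.10, 4, rng)
print(len(x), len(y), len(y0))
```

Repeating the call M times per scenario, each time fitting the three estimators, would give the Monte Carlo replicates; modality b) would additionally restrict the contaminated draws to the three lowest concentrations.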

Remark 1

It was assumed that sources producing errors in a biological assay act similarly on standard preparations and unknown samples having the same concentrations. Therefore, we used the same contamination strategy, in terms of intensity and percentage of contamination, for both the standard preparations and the unknown samples.

Materials and Methods

In matrix notation, the model [1] can be compactly written as:

y = Xβ + ε

where y (response) and ε (random errors) are two vectors of size n=18, β is a parameter vector of size p=2 to be estimated and X is the design matrix of size (18×2) (including intercept).

Define:

b estimate of β;

predicted value: $\hat{y}_{i} = x_{i}^{'} b$, where $x_{i}^{'}$ is the i-th row of the matrix X and the prime denotes transpose;

residual: $e_{i} = y_{i} - \hat{y}_{i}$, estimate of ε_i, the i-th component of ε.

OLS Estimator

Validity assumption: ε is a vector of independent and identically distributed normal random variables ε ∼ N(0,σ²I).

The OLS estimates minimize the loss function:

\sum_{i=1}^{n} e_{i}^{2} = \min

and they are obtained by solving the equations:

\sum_{i=1}^{n} e_{i} x_{i} = 0

[2]

namely: $b = (X'X)^{-1} X'y$, with the estimated covariance matrix: $\widehat{cov}(b) = s^{2} (X'X)^{-1}$

where s², the estimate of σ², is $s^{2} = \frac{e'e}{n-p}$ [3], with n-p = 16 degrees of freedom (d.f.).

It is known that $b \sim N(β, σ^{2} {(X′X)}^{- 1})$ .

Calibration

The equation $\hat{y} = x^{'} b = b_{0} + b_{1} x$ can be used to predict ${\hat{y}}_{0}$ for a known x₀. Calibration enables predicting the unknown ${\hat{x}}_{0}$ , for a given y₀. The same approach applies to ${\bar{y}}_{0}$ . Namely:

\hat{x}_{0} = \frac{\bar{y}_{0} - b_{0}}{b_{1}} = \bar{x} + \frac{\bar{y}_{0} - \bar{y}}{b_{1}}

[4]

where $\bar{y}$ and $\bar{x}$ are, respectively, the means of the response and the explanatory variable; they are obtained from the “standard”; ${\bar{y}}_{0}$ is obtained from the unknown sample.

Note that the ratio-type estimate is biased because, in general: $E(\hat{x}_{0}) \neq \frac{E(\bar{y}_{0} - b_{0})}{E(b_{1})} = x_{0}$

However, the confidence interval for x₀ may be obtained by the exact method relying on the application of Fieller's theorem (15) (see ref. 16 for details), or by resorting to the approximate variance of ${\hat{x}}_{0}$ computed by the delta method through a first order Taylor series expansion. Namely:

\begin{array}{l} var(\hat{x}_{0}) = \frac{σ^{2}}{β_{1}^{2}} \left\{ \left[ \frac{(\hat{x}_{0} - \bar{x})^{2}}{Dev_{xx}} + \frac{1}{n} \right] + \frac{1}{m} \right\} \\ = \frac{1}{n} \frac{σ^{2}}{β_{1}^{2}} \left\{ \left[ \frac{(\hat{x}_{0} - \bar{x})^{2}}{s_{xx}} + 1 \right] + \frac{n}{m} \right\} \end{array}

[5]

where $Dev_{xx} = \sum_{i=1}^{n} (x_{i} - \bar{x})^{2}$, $s_{xx} = Dev_{xx}/n$, and m is the number of observations of the unknown sample.

It can be shown (17, p. 287) that the approximate confidence limits given by the delta method are, for most purposes, valid approximations to the exact confidence limits given by Fieller's theorem when $g = \frac{s^{2} t^{2}_{df, 1-\frac{α}{2}}}{Dev_{xx} b_{1}^{2}} < 0.1$, where $t_{df, 1-\frac{α}{2}}$ is the corresponding quantile of Student's t distribution.

The (1-α)100% approximate confidence interval of x₀ is:

\hat{x}_{0} \pm t_{df, 1-\frac{α}{2}} \sqrt{\widehat{var}(\hat{x}_{0})}

[6]

where $\widehat{var}(\hat{x}_{0})$ is obtained from [5], substituting σ² with s² (given by [3]), and β₁ with b₁.
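The OLS calibration steps, from the fit of the standard curve to the approximate interval [6], can be sketched in a few lines. This is an illustrative implementation under the formulas above, not the authors' code; the function name is ours, the toy data are noise-symmetric by construction, and the quantile t with 16 d.f. at 0.95 (about 1.746, from standard tables) is passed in as an assumption.

```python
import math


def ols_calibration(x, y, y0, t_quantile):
    """Return (x0_hat, (lower, upper), g) for a straight-line standard curve."""
    n, m = len(x), len(y0)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dev_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / dev_xx
    b0 = y_bar - b1 * x_bar
    # Residual variance with n - p = n - 2 degrees of freedom, as in [3].
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y0_bar = sum(y0) / m
    x0_hat = (y0_bar - b0) / b1  # equation [4]
    # Delta-method variance [5] with sigma^2 -> s2 and beta_1 -> b1.
    var_x0 = (s2 / b1 ** 2) * ((x0_hat - x_bar) ** 2 / dev_xx + 1 / n + 1 / m)
    half_width = t_quantile * math.sqrt(var_x0)
    # Criterion g for trusting the delta method instead of Fieller's theorem.
    g = s2 * t_quantile ** 2 / (dev_xx * b1 ** 2)
    return x0_hat, (x0_hat - half_width, x0_hat + half_width), g


# Toy data: 6 standards x 3 replicates with symmetric offsets around the true line.
offsets = (0.2, 0.0, -0.2)
x = [level for level in range(2, 8) for _ in offsets]
y = [43.814 - 3.637 * level + d for level in range(2, 8) for d in offsets]
y0 = [43.814 - 3.637 * 2.5 + d for d in offsets]
x0_hat, (lower, upper), g = ols_calibration(x, y, y0, t_quantile=1.746)
print(x0_hat, lower, upper, g)
```

Returning g alongside the interval makes it easy to check, data set by data set, that the delta-method approximation is being used within its stated range (g < 0.1).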

Remark 2

The term $σ^{2} \left( \frac{(\hat{x}_{0} - \bar{x})^{2}}{Dev_{xx}} + \frac{1}{n} \right)$ represents the variability of $\hat{y}_{0}$, which is a function of the parameter estimates and is estimated from the information provided by the “standard” preparations. On the other hand, the second term $\frac{σ^{2}}{m}$ makes allowance for the variability of the mean of the m observations of the unknown sample around $x_{0}^{'} β$, i.e. $var(\bar{y}_{0}) = var(\bar{ε}_{0})$.

LAD Estimator

The LAD regression is a robust method particularly useful when error distributions are heavy tailed or asymmetrical. The LAD estimator is defined by:

\sum_{i=1}^{n} | e_{i} | = \min

[7]

Unlike OLS, there are in general no explicit expressions for LAD estimates; however, very fast algorithms to compute them are now available (for example the function rq in package quantreg of R software and the command call lav in proc iml of SAS software).

Regardless of the algorithm adopted, the LAD regression line must, by definition, pass through 2 data points; hence, the number of non-zero residuals is n-2.

Consider equation [1] in the special case of “regression” when explanatory variables are absent. It becomes: y_j= β₀+ ε_j. In the context of OLS regression, it is well known that b₀=mean of the sample data. On the other hand, in LAD regression, owing to the minimization criterion [7], it results that b₀=median of the sample data.
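The mean/median correspondence can be checked numerically on a small sample. The sketch below uses made-up ct values with one outlying observation and a brute-force grid search; it is an illustration only, not an algorithm one would use in practice.

```python
import statistics


def sum_abs_dev(values, b):
    """LAD criterion for the intercept-only model."""
    return sum(abs(v - b) for v in values)


def sum_sq_dev(values, b):
    """OLS criterion for the intercept-only model."""
    return sum((v - b) ** 2 for v in values)


# Hypothetical ct replicates; the last value plays the role of a vertical outlier.
ct = [25.1, 25.3, 25.2, 25.4, 29.0]
median_ct = statistics.median(ct)
mean_ct = statistics.mean(ct)

# Brute-force search for the minimizer of each criterion on a fine grid.
grid = [25.0 + 0.01 * i for i in range(401)]
best_abs = min(grid, key=lambda b: sum_abs_dev(ct, b))
best_sq = min(grid, key=lambda b: sum_sq_dev(ct, b))
print(median_ct, best_abs, mean_ct, best_sq)
```

The grid minimizer of the absolute-deviation criterion lands on the sample median, which the outlier does not move, while the least-squares minimizer lands on the sample mean, which the outlier pulls upward.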

Let ν be the median of the random variable Y with probability density function f(Y) and $\tilde{y}$ the median of the n-sized sample. According to Freeman (18, p. 195) it can be shown that, for n large, $\tilde{y}$ is normally distributed with mean ν and variance: $var(\tilde{y}) = \frac{1}{4 [f(ν)]^{2} n} = \frac{τ^{2}}{n}$.

When the distribution is symmetrical, the mean and median coincide.

For Y normally distributed with location parameter ν and standard deviation σ:

τ = \frac{1}{2 (\frac{1}{σ \sqrt{2 π}})} = \sqrt{\frac{π}{2}} σ = 1.253 σ

When Y is normal, the ratio $\frac{τ}{σ} = 1.253$ ; this means that the sample median is less efficient than the sample mean for estimating the location parameter of the population.
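The constant 1.253 follows directly from τ = 1/(2f(ν)) evaluated at the normal density. A short numerical check, with σ = 0.2 chosen only as an example value and the helper name being ours:

```python
import math
from statistics import NormalDist


def tau_from_density(density_at_median: float) -> float:
    """tau = 1 / (2 f(nu)), the large-sample scale of the sample median."""
    return 1.0 / (2.0 * density_at_median)


sigma = 0.2
tau = tau_from_density(NormalDist(mu=0.0, sigma=sigma).pdf(0.0))
ratio = tau / sigma                       # equals sqrt(pi / 2)
relative_efficiency = (sigma / tau) ** 2  # var(mean) / var(median) for large n
print(round(ratio, 3), round(relative_efficiency, 3))
```

The same ratio suggests why, in the absence of contamination, intervals based on τ can be expected to be roughly 1.25 times wider than those based on σ.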

The parameter τ is estimated according to McKean and Shrader (19) as used by call lav in proc iml of SAS software.

The LAD regression is to the OLS regression what the sample median is to the sample mean. For instance, both the sample mean and the OLS estimators are determined and influenced by the whole set of observations, whereas the sample median and the LAD regression estimates are determined by only a subset of observations.

The estimated covariance matrix of b_LAD is: $\widehat{cov}(b_{LAD}) = τ^{2} (X'X)^{-1}$ (8, p. 82); moreover, Bassett and Koenker (20) showed that b_LAD is consistent and asymptotically ∼N(β, τ²(X'X)^-1). Later on, Dielman and Pfaffenberger (21), studying the sampling distribution of the LAD estimates via Monte Carlo simulation, concluded that the sampling distribution appeared to be normal for samples of size 20 to 30 with normal or contaminated normal errors.

Coherently, the analogue of equation [4] is obtained by substituting $\bar{y}_{0}$ with $\tilde{y}_{0}$, and the analogue of equation [5] by substituting σ² with τ².

M and MM estimators

In M-estimation the vector b is chosen so that $\sum_{i=1}^{n} ρ(e_{i})$ is as small as possible, where ρ is a suitable criterion (loss function). OLS and LAD can be seen as two particular cases of M-estimation in which ρ(e) = e² and ρ(e) = |e| respectively.

However, in order to be scale equivariant, the M-estimator must satisfy:

\sum_{i = 1}^{n} ρ (\frac{e_{i}}{s}) = \min

where s is now a robust scale estimate.

By setting the partial derivatives with respect to b to 0, the M-estimator is obtained by solving the equations:

\sum_{i=1}^{n} ρ'\left(\frac{e_{i}}{s}\right) x_{i} = \sum_{i=1}^{n} ψ\left(\frac{y_{i} - x_{i}^{'} b}{s}\right) x_{i} = 0

These equations, unlike those in [2], are nonlinear in the coefficients and may be solved by iteratively reweighted least squares (IRLS). Define the weight function $w\left(\frac{e_{i}}{s}\right) = \frac{ψ\left(\frac{e_{i}}{s}\right)}{\frac{e_{i}}{s}}$ (22, p. 84).

Provided that initial robust estimates s₀ of the scale factor and b₀ of the regression coefficients are available, the IRLS proceeds as follows:

1) Define the scaled residuals $\frac{e_{i}}{s}$ and the corresponding weights w_i;

2) Compute the updated estimate b of β by the weighted least squares regression method;

3) Iterate 1) and 2) until convergence.

Among the several ρ functions proposed in the literature, we considered the popular bisquare weight (also called biweight) function (9). The corresponding ψ function is $ψ(z) = z \left[1 - \left(\frac{z}{k}\right)^{2}\right]^{2} I(|z| \leq k)$, where k=4.685 ensures an efficiency of 0.95 (6, p. 30).
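The IRLS steps with the bisquare weight can be sketched for the straight-line model as follows. This is a minimal illustration under simplifying assumptions: the scale s₀ is held fixed, and the starting values b_init and s₀ are supplied by the caller rather than computed by S-estimation as in a full MM-estimator. All function names are hypothetical.

```python
def bisquare_weight(z, k=4.685):
    """w(z) = psi(z) / z = [1 - (z/k)^2]^2 for |z| <= k, and 0 otherwise."""
    if abs(z) > k:
        return 0.0
    return (1.0 - (z / k) ** 2) ** 2


def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = b0 + b1 * x."""
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return yw - b1 * xw, b1


def irls_bisquare(x, y, b_init, s0, k=4.685, tol=1e-10, max_iter=100):
    """Iterate steps 1) and 2) until the coefficients stop changing."""
    b0, b1 = b_init
    for _ in range(max_iter):
        weights = [bisquare_weight((yi - b0 - b1 * xi) / s0, k) for xi, yi in zip(x, y)]
        new_b0, new_b1 = weighted_line_fit(x, y, weights)
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Toy standards lying exactly on the line, plus one gross vertical outlier.
x = [level for level in range(2, 8) for _ in range(3)]
y = [43.814 - 3.637 * xi for xi in x]
y[0] += 5.0
b0, b1 = irls_bisquare(x, y, b_init=(43.0, -3.5), s0=0.2)
print(b0, b1)
```

In this toy example the outlier's scaled residual exceeds k, so its weight is zero and the fit is determined by the remaining points.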

The estimated covariance matrix of b_bi is: $\widehat{cov}(b_{bi}) = \hat{v} (X'X)^{-1}$,

where

\hat{v} = s^{2} \frac{ave_{i} [ψ(e_{i}/s)^{2}]}{\{ave_{i} [ψ'(e_{i}/s)]\}^{2}} \frac{n}{n-p}

[8]

Moreover, asymptotically b_bi~N(β,v(X'X)^-1) (6, p. 100).
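Expression [8] is straightforward to evaluate once ψ and its derivative are available. The sketch below assumes the bisquare ψ, whose derivative is ψ'(z) = (1 - u²)(1 - 5u²) with u = z/k inside the interval |z| ≤ k; the residuals and scale are made-up numbers and the function names are ours.

```python
def psi_bisquare(z, k=4.685):
    """Bisquare psi function: z * [1 - (z/k)^2]^2 for |z| <= k, else 0."""
    if abs(z) > k:
        return 0.0
    return z * (1.0 - (z / k) ** 2) ** 2


def psi_prime_bisquare(z, k=4.685):
    """Derivative of the bisquare psi: (1 - u^2)(1 - 5 u^2), u = z/k, for |z| <= k."""
    if abs(z) > k:
        return 0.0
    u2 = (z / k) ** 2
    return (1.0 - u2) * (1.0 - 5.0 * u2)


def v_hat(residuals, s, p=2, k=4.685):
    """Evaluate expression [8] for given residuals and robust scale s."""
    n = len(residuals)
    ave_psi_sq = sum(psi_bisquare(e / s, k) ** 2 for e in residuals) / n
    ave_psi_prime = sum(psi_prime_bisquare(e / s, k) for e in residuals) / n
    return s ** 2 * ave_psi_sq / ave_psi_prime ** 2 * n / (n - p)


# Hypothetical residuals and scale, for illustration only.
residuals = [0.10, -0.20, 0.30, -0.10, 0.05, -0.15]
robust_v = v_hat(residuals, s=0.2)
# With a very large tuning constant psi(z) = z and psi'(z) = 1, so [8] reduces
# to the OLS variance estimate e'e / (n - p).
ols_like_v = v_hat(residuals, s=0.2, k=1e9)
print(robust_v, ols_like_v)
```

The limiting case with a very large k is a convenient sanity check, since it should reproduce the OLS estimate [3].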

It is worth noticing that Yohai (23) shows that an MM estimator is obtained when the initial estimates s₀ and b₀ of the previously outlined IRLS process are computed by means of the S-estimation procedure (6, pp. 129-130), so that the high breakdown point property and efficiency are important features of the MM-estimator.

The analogue of equation [4], i.e. ${\hat{x}}_{0 bi}$ , is obtained by substituting y₀ with ${\bar{y}}_{0 bi}$ , the biweight location estimate of y₀ (6, pp. 36-37). To compute the latter we adopted s₀ as estimate of the dispersion parameter.

It can be shown that, under some regularity conditions (24) on f(y) and ψ:

{\hat{x}}_{0 bi} \sim AN (x_{0}, var {({\hat{x}}_{0})}_{bi})

With regard to $var(\hat{x}_{0})_{bi}$, Müller and El Shaarawi (25, p. 33) give the analogue of equation [5] for an M-estimator.

In Remark 2 the meaning of the two components of expression [5] was underlined; in the OLS approach their estimates are based on the unbiased efficient estimate of σ² given by [3]. In the M-estimation setting, Fisher and Horn (7, pp. 132-133) suggested using $\hat{v}$ as given in [8] and computed in the last iteration (final estimate) to estimate the variability of the prediction component $\hat{y}_{0}$.

Moreover, to estimate the variance of ε, the authors suggested using:

s_{bi}^{2} = \frac{n k^{2} s_{0}^{2} \sum_{i=1}^{n} \left(\frac{e_{i}}{k s_{0}}\right)^{2} \left[1 - \left(\frac{e_{i}}{k s_{0}}\right)^{2}\right]^{4}}{\left\{\sum_{i=1}^{n} \left[1 - \left(\frac{e_{i}}{k s_{0}}\right)^{2}\right] \left[1 - 5 \left(\frac{e_{i}}{k s_{0}}\right)^{2}\right]\right\} \max\left\{1, -2 + \sum_{i=1}^{n} \left[1 - \left(\frac{e_{i}}{k s_{0}}\right)^{2}\right] \left[1 - 5 \left(\frac{e_{i}}{k s_{0}}\right)^{2}\right]\right\}}

where the summations are over those i with $\left| \frac{e_{i}}{k s_{0}} \right| < 1$ (see ref. 26, p. 108, for details). The residuals used here are obtained from the S-estimation fit (initial estimate) in the first step of the MM-estimator.

Consequently $var {({\hat{x}}_{0})}_{bi}$ is estimated by:

\widehat{var}(\hat{x}_{0})_{bi} = \frac{1}{n} \left\{ \frac{\hat{v}}{b_{1}^{2}} \left[ \frac{(\hat{x}_{0} - \bar{x})^{2}}{s_{xx}} + 1 \right] + \frac{n s_{bi}^{2}}{m b_{1}^{2}} \right\}

To compute the approximate confidence interval of x₀ the expression [6] is used, after substituting ${\hat{x}}_{0}$ with ${\hat{x}}_{0 bi}$ and $\hat{var ({\hat{x}}_{0})}$ with $\hat{var {({\hat{x}}_{0})}_{bi}}$ .

Coverage Assessment

Confidence intervals were computed at the confidence level 1-α=0.9. The estimated coverage probability is the proportion of the M confidence intervals that include the true “concentration value”, defined as the calibration point in “The Monte Carlo Scheme” subsection. A possible criterion for considering an estimated coverage acceptable is that it should fall inside the interval $p \pm 2SE(p) = p \pm 2 \sqrt{\frac{p(1-p)}{M}}$, i.e. between 88.48% and 91.52% (27).
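The acceptance band and the coverage estimate are simple to compute. The sketch below uses hypothetical helper names and leaves the multiplier as a parameter; as far as we can tell from the arithmetic, the quoted limits of 88.48 and 91.52 correspond to a multiplier of 1.96, while a multiplier of exactly 2 gives roughly 88.45 and 91.55.

```python
import math


def acceptance_band(p, n_sim, multiplier):
    """Band p +/- multiplier * sqrt(p (1 - p) / n_sim) for an estimated coverage."""
    se = math.sqrt(p * (1.0 - p) / n_sim)
    return p - multiplier * se, p + multiplier * se


def estimated_coverage(intervals, true_value):
    """Proportion of (lower, upper) intervals that contain the true value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)


low_196, high_196 = acceptance_band(0.9, 1500, 1.96)
low_2, high_2 = acceptance_band(0.9, 1500, 2.0)
print(round(100 * low_196, 2), round(100 * high_196, 2))
print(round(100 * low_2, 2), round(100 * high_2, 2))

# Two made-up intervals for a calibration point x0 = 2.5: only the first covers it.
print(estimated_coverage([(2.4, 2.6), (2.6, 2.7)], 2.5))
```

In a full simulation, `estimated_coverage` would be applied to the M intervals obtained for each scenario and method, and the result compared with the band.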

Results

First, we observed that in our study the highest value for the threshold of the criterion g mentioned in the “calibration” subsection is 0.03 (in the scenario with σ=0.7, contamination percentage 20, contamination intensity c=8, contamination modality a). This justifies the use of the approximate delta method to compute the confidence interval for x₀.

Secondly, it is worth noting that the regression parameter coverages are all acceptable and included in the interval (88.48-91.52) (data not shown).

Four tables report the results of the study in terms of coverage and width of the confidence intervals obtained by OLS, LAD, and MM-estimator. Tables I and II report the results from the contamination of all concentration levels, with σ=0.2 in Tab. I and σ=0.7 in Table II. Tables III and IV report the results from the contamination occurring only at the three lowest concentration levels, with σ=0.2 in Tab. III and σ=0.7 in Table IV. Asterisks specify confidence intervals, the coverage of which cannot be considered acceptable according to the criterion for coverage assessment adopted (see “Coverage assessment” subsection).

We will first consider Tab. I and in particular the scenario where x₀=2.5, which is the most critical from a technical point of view. As expected, in the absence of contamination the coverage of the OLS interval is at the nominal level and its width is the smallest among those of the procedures considered; however, OLS performance deteriorates in the presence of contamination. In fact, although the width tends to increase as contamination increases, the coverage tends to decrease and become unacceptable. With regard to the robust methods, when contamination is absent the MM-estimator performance is similar to that of OLS in terms of both coverage and width, whereas the LAD interval has an acceptable coverage at the expense of a greater width (about 1.25 times the OLS one, as expected). In the presence of contamination we can observe a trade-off between width and coverage. The MM-estimator seems resistant to contamination, as the interval widths remain short, but this is associated with a reduction in coverage. In contrast, LAD interval widths are consistently larger than those of the MM-estimator, so that their coverages are acceptable at the nominal level, i.e. coverage acceptability is paid for in terms of imprecision.

Table I

COVERAGES AND WIDTHS OF THE CONFIDENCE INTERVALS OBTAINED BY OLS, MM-ESTIMATOR, LAD. CONTAMINATION AFFECTS ALL CONCENTRATION LEVELS AND σ=0.2

			OLS			MM-estimator			LAD
x₀	c	Contamination	Coverage (%)	Width	Relative width	Coverage (%)	Width	Relative width	Coverage (%)	Width	Relative width
x₀=2.5		0%	90.00	0.128	0.948	89.93	0.135	1	90.60	0.161	1.193
	c=2	5%	89.13	0.134	0.964	88.33	0.139	1	90.00	0.167	1.201
		10%	89.47	0.141	0.972	89.40	0.145	1	90.40	0.175	1.207
		20%	89.33	0.158	1.013	90.67	0.156	1	91.47	0.187	1.199
	c=4	5%	88.73	0.145	1.036	88.60	0.140	1	89.73	0.170	1.214
		10%	86.13^*	0.173	1.177	87.13^*	0.147	1	90.33	0.180	1.224
		20%	87.93^*	0.237	1.411	85.60^*	0.168	1	88.87	0.205	1.220
	c=6	5%	88.13^*	0.155	1.107	89.00	0.140	1	89.47	0.171	1.221
		10%	89.40	0.225	1.510	89.87	0.149	1	91.27	0.186	1.248
		20%	86.07^*	0.329	1.982	84.93^*	0.166	1	88.33	0.210	1.265
	c=8	5%	87.40^*	0.159	1.128	89.80	0.141	1	90.13	0.176	1.248
		10%	84.73^*	0.265	1.828	87.47^*	0.145	1	88.53	0.184	1.269
		20%	85.20^*	0.429	2.569	86.60^*	0.167	1	89.80	0.219	1.311
x₀=4.5		0%	88.53	0.117	0.936	89.13	0.125	1	90.67	0.150	1.200
	c=2	5%	90.67	0.123	0.961	90.07	0.128	1	90.93	0.154	1.203
		10%	88.27^*	0.129	0.985	87.40^*	0.131	1	89.93	0.158	1.206
		20%	88.60	0.145	1.000	88.60	0.145	1	89.93	0.170	1.172
	c=4	5%	88.87	0.133	1.031	88.00^*	0.129	1	89.27	0.157	1.217
		10%	87.80^*	0.161	1.193	87.13^*	0.135	1	90.20	0.163	1.207
		20%	86.67^*	0.212	1.395	86.87^*	0.152	1	90.47	0.189	1.243
	c=6	5%	88.00^*	0.141	1.085	89.20	0.130	1	91.13	0.159	1.223
		10%	87.27^*	0.197	1.470	87.60^*	0.134	1	89.33	0.168	1.254
		20%	86.67^*	0.310	2.013	87.07^*	0.154	1	90.20	0.197	1.279
	c=8	5%	86.13^*	0.150	1.181	88.47^*	0.127	1	90.07	0.158	1.244
		10%	85.20^*	0.247	1.857	88.13^*	0.133	1	90.73	0.170	1.278
		20%	86.93^*	0.392	2.613	85.80^*	0.150	1	89.67	0.196	1.307
x₀=6.5		0%	89.80	0.129	0.949	89.33	0.136	1	89.60	0.161	1.184
	c=2	5%	89.93	0.136	0.965	89.13	0.141	1	90.27	0.169	1.199
		10%	89.60	0.143	0.986	88.93	0.145	1	89.73	0.173	1.193
		20%	89.80	0.159	1.019	89.33	0.156	1	90.53	0.187	1.199
	c=4	5%	89.93	0.148	1.042	88.87	0.142	1	91.47	0.172	1.211
		10%	88.47^*	0.174	1.176	88.27^*	0.148	1	90.13	0.180	1.216
		20%	87.40^*	0.238	1.425	86.47^*	0.167	1	89.67	0.202	1.210
	c=6	5%	87.93^*	0.156	1.106	86.87^*	0.141	1	89.67	0.171	1.213
		10%	86.33^*	0.220	1.507	88.47^*	0.146	1	90.67	0.182	1.247
		20%	87.27^*	0.340	2.012	84.13^*	0.169	1	88.67	0.214	1.266
	c=8	5%	87.93^*	0.163	1.156	88.00^*	0.141	1	90.33	0.175	1.241
		10%	87.33^*	0.275	1.858	88.13^*	0.148	1	90.53	0.190	1.284
		20%	85.87^*	0.430	2.575	86.60^*	0.167	1	88.80	0.217	1.299

^* specifies confidence intervals the coverage of which cannot be considered acceptable.

Similar considerations apply to the remaining calibration points (x₀=4.5, x₀=6.5) and to the findings reported in Table II.

Table II

COVERAGES AND WIDTHS OF THE CONFIDENCE INTERVALS OBTAINED BY OLS, MM-ESTIMATOR, LAD. CONTAMINATION AFFECTS ALL CONCENTRATION LEVELS AND σ=0.7

			OLS			MM-estimator			LAD
x₀	c	Contamination	Coverage (%)	Width	Relative width	Coverage (%)	Width	Relative width	Coverage (%)	Width	Relative width
x₀=2.5		0%	89.67	0.447	0.943	89.00	0.474	1	90.07	0.569	1.200
	c=2	5%	90.80	0.473	0.967	89.33	0.489	1	91.47	0.591	1.209
		10%	88.53	0.494	0.972	87.73^*	0.508	1	88.67	0.599	1.179
		20%	90.80	0.557	1.015	89.93	0.549	1	91.13	0.649	1.182
	c=4	5%	87.33^*	0.511	1.039	89.47	0.492	1	91.00	0.599	1.217
		10%	88.40^*	0.616	1.182	88.80	0.521	1	89.73	0.633	1.215
		20%	88.07^*	0.827	1.414	85.60^*	0.585	1	89.93	0.713	1.219
	c=6	5%	86.80^*	0.557	1.123	88.60	0.496	1	90.33	0.612	1.234
		10%	87.60^*	0.758	1.463	88.87	0.518	1	90.13	0.654	1.263
		20%	86.67^*	1.150	1.993	86.67^*	0.577	1	89.60	0.743	1.288
	c=8	5%	89.47	0.589	1.200	89.47	0.491	1	90.47	0.609	1.240
		10%	86.07^*	0.962	1.879	86.53^*	0.512	1	89.40	0.649	1.268
		20%	86.13^*	1.512	2.630	85.47^*	0.575	1	89.53	0.763	1.327
x₀=4.5		0%	90.40	0.410	0.940	89.33	0.436	1	90.60	0.526	1.206
	c=2	5%	90.20	0.431	0.960	88.80	0.449	1	91.00	0.544	1.212
		10%	90.40	0.453	0.981	89.93	0.462	1	91.60	0.558	1.208
		20%	89.13	0.510	1.006	88.80	0.507	1	90.93	0.606	1.195
	c=4	5%	88.53	0.468	1.040	88.93	0.450	1	91.87	0.555	1.233
		10%	86.33^*	0.557	1.183	86.27^*	0.471	1	88.27	0.583	1.238
		20%	86.60^*	0.795	1.483	85.67^*	0.536	1	88.93	0.660	1.231
	c=6	5%	89.47	0.504	1.100	89.53	0.458	1	90.60	0.556	1.214
		10%	86.80^*	0.706	1.489	88.40^*	0.474	1	90.80	0.592	1.249
		20%	86.53^*	1.080	2.034	85.20^*	0.531	1	89.00	0.681	1.282
	c=8	5%	87.60^*	0.535	1.186	89.07	0.451	1	90.87	0.568	1.259
		10%	86.53^*	0.845	1.817	88.93	0.465	1	90.67	0.582	1.252
		20%	86.33^*	1.422	2.772	83.13^*	0.513	1	87.67	0.685	1.335
x₀=6.5		0%	89.67	0.449	0.949	89.13	0.473	1	89.53	0.565	1.195
	c=2	5%	88.47	0.472	0.961	87.87^*	0.491	1	90.27	0.592	1.206
		10%	90.07	0.498	0.982	88.47^*	0.507	1	90.87	0.612	1.207
		20%	89.20	0.556	1.011	88.20^*	0.550	1	89.67	0.649	1.180
	c=4	5%	88.80	0.507	1.039	88.73	0.488	1	90.07	0.596	1.221
		10%	88.67	0.629	1.214	88.40^*	0.518	1	90.47	0.628	1.212
		20%	89.27	0.846	1.424	88.53	0.594	1	90.40	0.728	1.226
	c=6	5%	87.27^*	0.548	1.121	87.67^*	0.489	1	88.93	0.599	1.225
		10%	86.47^*	0.770	1.486	87.67^*	0.518	1	90.27	0.642	1.239
		20%	86.53^*	1.155	1.971	85.87^*	0.586	1	89.67	0.743	1.268
	c=8	5%	87.87^*	0.556	1.149	88.73	0.484	1	89.80	0.607	1.254
		10%	86.00^*	0.971	1.904	87.67^*	0.510	1	90.47	0.649	1.273
		20%	85.07^*	1.523	2.630	85.20^*	0.579	1	88.73	0.757	1.307

^* specifies confidence intervals the coverage of which cannot be considered acceptable.

With reference to Tab. III, it is important to note that x₀=2.5 and x₀=3.5 explore a situation where the unknown samples are expected to be contaminated, while x₀=5.5 and x₀=6.5 explore a situation where the unknown samples are expected to be uncontaminated, consistent with Remark 1. As far as x₀=2.5 and x₀=3.5 are concerned, the findings are similar to those in Table I. We now focus on x₀=5.5 and x₀=6.5. Here the unknown sample is uncontaminated; however, the variance of its response location measures ($\bar{y}_{0}$, $\tilde{y}_{0}$, $\bar{y}_{0bi}$) is estimated from the regression line, which is penalized by possibly contaminated data at the lowest concentrations, and is therefore presumably greater than the value one would estimate in the absence of contamination. This implies an overestimation of the variance of $\hat{x}_{0}$ and, consequently, an improper increase in the confidence interval width for all the methods considered.

Table III

COVERAGES AND WIDTHS OF THE CONFIDENCE INTERVALS OBTAINED BY OLS, MM-ESTIMATOR, LAD. CONTAMINATION AFFECTS 3 CONCENTRATION LEVELS AND σ=0.2

			OLS			MM-estimator			LAD
x₀	c	Contamination	Coverage (%)	Width	Relative width	Coverage (%)	Width	Relative width	Coverage (%)	Width	Relative width
x₀=2.5	c=2	5%	86.60^*	0.131	0.949	86.73^*	0.138	1	89.27	0.167	1.210
		10%	88.20^*	0.136	0.971	87.33^*	0.140	1	89.13	0.168	1.200
		20%	85.47^*	0.144	0.993	85.60^*	0.145	1	88.47^*	0.173	1.193
	c=4	5%	86.60^*	0.136	1.000	88.13^*	0.136	1	90.20	0.167	1.228
		10%	84.33^*	0.148	1.042	86.40^*	0.142	1	89.20	0.171	1.204
		20%	81.20^*	0.175	1.174	84.80^*	0.149	1	89.60	0.181	1.215
	c=6	5%	85.40^*	0.139	1.015	87.67^*	0.137	1	90.00	0.168	1.226
		10%	81.93^*	0.157	1.106	85.60^*	0.142	1	88.27^*	0.174	1.225
		20%	77.93^*	0.226	1.537	83.07^*	0.147	1	86.13^*	0.185	1.259
	c=8	5%	83.53^*	0.142	1.029	88.47^*	0.138	1	89.67	0.170	1.232
		10%	80.13^*	0.172	1.229	87.67^*	0.140	1	88.60	0.173	1.236
		20%	77.27^*	0.273	1.845	82.93^*	0.148	1	85.73^*	0.188	1.270
x₀=3.5	c=2	5%	88.53	0.123	0.946	87.73^*	0.130	1	89.73	0.153	1.177
		10%	87.00^*	0.126	0.962	87.13^*	0.131	1	88.47^*	0.154	1.176
		20%	84.40^*	0.135	0.993	84.73^*	0.136	1	88.53^*	0.161	1.184
	c=4	5%	87.27^*	0.129	0.985	88.33^*	0.131	1	90.33	0.157	1.198
		10%	84.13^*	0.137	1.030	86.87^*	0.133	1	88.67	0.158	1.188
		20%	80.73^*	0.162	1.157	82.93^*	0.140	1	87.13^*	0.169	1.207
	c=6	5%	86.00^*	0.129	1.000	87.87^*	0.129	1	89.67	0.155	1.202
		10%	83.87^*	0.147	1.122	87.67^*	0.131	1	89.53	0.160	1.221
		20%	78.73^*	0.213	1.555	83.00^*	0.137	1	86.33^*	0.174	1.270
	c=8	5%	84.13^*	0.131	1.016	87.13^*	0.129	1	90.00	0.158	1.225
		10%	81.60^*	0.154	1.176	86.67^*	0.131	1	89.87	0.160	1.221
		20%	74.80^*	0.251	1.859	80.87^*	0.135	1	85.67^*	0.174	1.289
x₀=5.5	c=2	5%	90.67	0.124	0.954	91.13	0.130	1	91.93^*	0.157	1.208
		10%	92.53^*	0.125	0.962	90.53	0.130	1	92.27^*	0.158	1.215
		20%	93.20^*	0.134	0.985	90.60	0.136	1	92.13^*	0.163	1.199
	c=4	5%	91.73^*	0.129	0.985	89.47	0.131	1	91.20	0.159	1.214
		10%	93.47^*	0.138	1.053	90.20	0.131	1	91.73^*	0.160	1.221
		20%	95.27^*	0.163	1.190	91.13	0.137	1	93.20^*	0.169	1.234
	c=6	5%	91.20	0.129	1.008	88.13	0.128	1	90.40	0.154	1.203
		10%	94.20^*	0.144	1.091	90.40	0.132	1	92.73^*	0.161	1.220
		20%	97.67^*	0.212	1.547	93.47^*	0.137	1	93.80^*	0.171	1.248
	c=8	5%	91.67^*	0.131	1.016	89.53	0.129	1	91.27	0.156	1.209
		10%	94.67^*	0.152	1.160	90.47	0.131	1	92.47^*	0.163	1.244
		20%	97.47^*	0.264	1.927	90.67	0.137	1	93.73^*	0.175	1.277
x₀=6.5	c=2	5%	90.80	0.132	0.957	89.00	0.138	1	91.13	0.166	1.203
		10%	92.13^*	0.136	0.971	91.33	0.140	1	90.87	0.167	1.193
		20%	93.33^*	0.142	0.979	91.60^*	0.145	1	93.00^*	0.174	1.200
	c=4	5%	92.00^*	0.137	0.993	89.93	0.138	1	90.60	0.168	1.217
		10%	94.07^*	0.147	1.058	91.33^*	0.139	1	92.87^*	0.172	1.237
		20%	96.40^*	0.178	1.203	93.07^*	0.148	1	94.00^*	0.181	1.223
	c=6	5%	91.40	0.138	1.007	89.20	0.137	1	90.20	0.166	1.212
		10%	94.93^*	0.157	1.113	91.40	0.141	1	92.67^*	0.175	1.241
		20%	97.27^*	0.222	1.510	92.00^*	0.147	1	92.67^*	0.183	1.245
	c=8	5%	92.47^*	0.141	1.029	89.73	0.137	1	91.40	0.168	1.226
		10%	94.27^*	0.163	1.164	90.07	0.140	1	91.27	0.174	1.243
		20%	98.00^*	0.271	1.844	92.87^*	0.147	1	93.53^*	0.187	1.272

^* indicates confidence intervals whose coverage cannot be considered acceptable.
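The "Relative width" columns in the table above appear to express each method's interval width as a ratio to the MM-estimator width, since the MM-estimator column is identically 1. A minimal Python sketch of that normalization follows, using the widths from the row x₀=5.5, c=2, 5%. The function name `relative_widths` and the `reference` argument are hypothetical, introduced only for illustration.

```python
def relative_widths(widths, reference="MM"):
    """Divide each method's interval width by the reference method's width."""
    ref = widths[reference]
    return {method: round(w / ref, 3) for method, w in widths.items()}

# Widths from the row x0=5.5, c=2, 5% contamination in the table above.
row = {"OLS": 0.124, "MM": 0.130, "LAD": 0.157}
print(relative_widths(row))  # {'OLS': 0.954, 'MM': 1.0, 'LAD': 1.208}
```

The ratios reproduce the tabulated values for this row (0.954 and 1.208). Other rows may differ in the last digit, because the tabulated widths are themselves rounded.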

Similar considerations apply to the scenarios reported in Table IV.

Table IV

COVERAGES AND WIDTHS OF THE CONFIDENCE INTERVALS OBTAINED BY OLS, MM-ESTIMATOR, LAD. CONTAMINATION AFFECTS 3 CONCENTRATION LEVELS AND σ=0.7

			OLS			MM-estimator			LAD
			Coverage	Width	Relative width	Coverage	Width	Relative width	Coverage	Width	Relative width
x₀=2.5	c=2	5%	88.33	0.462	0.953	88.20^*	0.485	1	89.27	0.573	1.181
		10%	88.20^*	0.474	0.965	87.73^*	0.491	1	89.27	0.584	1.189
		20%	85.07^*	0.502	0.998	83.40^*	0.503	1	87.07^*	0.614	1.221
	c=4	5%	86.80^*	0.484	0.988	88.33^*	0.490	1	90.47	0.601	1.227
		10%	85.07^*	0.519	1.051	87.13^*	0.494	1	89.80	0.608	1.231
		20%	81.13^*	0.645	1.226	82.07^*	0.526	1	87.20^*	0.648	1.232
	c=6	5%	84.80^*	0.493	1.012	88.07^*	0.487	1	89.27	0.590	1.211
		10%	83.27^*	0.566	1.153	86.53^*	0.491	1	88.53	0.606	1.234
		20%	78.20^*	0.757	1.476	81.47^*	0.513	1	85.80^*	0.650	1.267
	c=8	5%	84.33^*	0.498	1.029	87.40^*	0.484	1	89.20	0.587	1.213
		10%	82.67^*	0.571	1.175	86.67^*	0.486	1	89.40	0.598	1.230
		20%	75.33^*	0.942	1.836	79.87^*	0.513	1	84.67^*	0.651	1.269
x₀=3.5	c=2	5%	89.27	0.434	0.954	88.33^*	0.455	1	90.00	0.549	1.207
		10%	87.13^*	0.446	0.970	87.27^*	0.460	1	89.27	0.546	1.187
		20%	85.80^*	0.470	0.987	86.00^*	0.476	1	89.20	0.568	1.193
	c=4	5%	85.27^*	0.447	0.991	86.87^*	0.451	1	88.93	0.547	1.213
		10%	86.33^*	0.484	1.041	88.40^*	0.465	1	89.53	0.569	1.224
		20%	79.93^*	0.577	1.190	81.47^*	0.485	1	86.87^*	0.590	1.216
	c=6	5%	87.67^*	0.457	1.009	89.80	0.453	1	91.00	0.551	1.216
		10%	82.87^*	0.513	1.106	86.93^*	0.464	1	87.93^*	0.572	1.233
		20%	77.67^*	0.752	1.544	81.80^*	0.487	1	84.73^*	0.607	1.246
	c=8	5%	84.67^*	0.462	1.018	87.67^*	0.454	1	88.20^*	0.551	1.214
		10%	80.67^*	0.538	1.170	85.87^*	0.460	1	87.20^*	0.565	1.228
		20%	78.93^*	0.915	1.926	83.40^*	0.475	1	85.73^*	0.611	1.286
x₀=5.5	c=2	5%	91.00	0.431	0.951	89.60	0.453	1	90.87	0.542	1.196
		10%	91.13	0.441	0.965	89.73	0.457	1	91.27	0.551	1.206
		20%	92.80^*	0.471	0.983	90.40	0.479	1	92.00^*	0.573	1.196
	c=4	5%	91.00	0.451	0.983	90.00	0.459	1	90.73	0.561	1.222
		10%	94.33^*	0.492	1.051	90.93	0.468	1	91.80^*	0.565	1.207
		20%	95.60^*	0.589	1.217	91.07	0.484	1	92.13^*	0.595	1.229
	c=6	5%	92.93^*	0.462	1.007	90.73	0.459	1	91.73^*	0.562	1.224
		10%	94.67^*	0.517	1.136	90.20	0.455	1	92.07^*	0.561	1.233
		20%	97.60^*	0.724	1.502	92.67^*	0.482	1	94.00^*	0.611	1.268
	c=8	5%	92.47^*	0.462	1.009	90.33	0.458	1	91.67^*	0.558	1.218
		10%	95.40^*	0.538	1.159	91.60^*	0.464	1	93.00^*	0.578	1.246
		20%	97.73^*	0.902	1.879	92.80^*	0.480	1	93.67^*	0.615	1.281
x₀=6.5	c=2	5%	91.40	0.462	0.959	90.47	0.482	1	91.27	0.584	1.212
		10%	91.93^*	0.473	0.969	91.60^*	0.488	1	91.53^*	0.577	1.182
		20%	92.80^*	0.501	0.990	92.40^*	0.506	1	92.47^*	0.611	1.208
	c=4	5%	90.27	0.482	0.994	87.80^*	0.485	1	89.47	0.582	1.200
		10%	93.67^*	0.519	1.046	90.67	0.496	1	92.07^*	0.602	1.214
		20%	94.87^*	0.626	1.208	91.73	0.518	1	92.13^*	0.632	1.220
	c=6	5%	92.87^*	0.490	1.006	89.93	0.487	1	91.00	0.593	1.218
		10%	94.73^*	0.553	1.122	91.47	0.493	1	91.73^*	0.602	1.221
		20%	97.33^*	0.793	1.537	92.60^*	0.516	1	92.53^*	0.656	1.271
	c=8	5%	92.13^*	0.495	1.027	88.87	0.482	1	89.87	0.593	1.230
		10%	93.67^*	0.581	1.171	89.93	0.496	1	91.20	0.620	1.250
		20%	97.40^*	0.982	1.933	91.13	0.508	1	93.53^*	0.642	1.264

^* indicates confidence intervals whose coverage cannot be considered acceptable.
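One common way to judge whether a simulated coverage is acceptable is to check whether it falls inside a Monte Carlo error band around the nominal level. The sketch below illustrates that idea only. The 90% nominal level, the 1500 replicates, and the 1.96 multiplier are assumptions chosen for illustration and are not taken from this section, so its flags need not match the asterisks in the tables for borderline values. The names `coverage_band` and `is_acceptable` are hypothetical.

```python
import math

def coverage_band(nominal=0.90, n_sim=1500, z=1.96):
    """Return (low, high) percent bounds: nominal +/- z binomial standard errors.

    All three defaults are illustrative assumptions, not values from the paper.
    """
    se = math.sqrt(nominal * (1.0 - nominal) / n_sim)
    return 100 * (nominal - z * se), 100 * (nominal + z * se)

def is_acceptable(coverage_pct, **kwargs):
    """True if the observed coverage (in percent) lies inside the band."""
    low, high = coverage_band(**kwargs)
    return low <= coverage_pct <= high

low, high = coverage_band()
print(f"band: [{low:.2f}, {high:.2f}]")
for cov in (74.80, 90.00, 98.00):  # coverages reported in the tables
    print(cov, is_acceptable(cov))
```

Under these assumptions the band is roughly 88.5% to 91.5%, so a coverage of 90.00 passes while 74.80 and 98.00 do not.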

Discussion

In real-time PCR calibration, it seems reasonable to postulate a heterogeneous distribution of measurement errors; hence, robust regression methods should be used to estimate x₀ and to compute its confidence interval. However, selecting the most appropriate robust procedure requires a thorough investigation of the features of the process that generates the measurements in each laboratory.

As pointed out by one of the anonymous reviewers, in recent years different quantification methods have been developed as alternatives to the Ct method, which is used under the condition of constant efficiency. Nevertheless, we are aware that the Ct method is still used in several laboratories, so our results could be helpful in these cases.

Footnotes

Financial Support: None.

Conflict of Interest Statement: The authors declare that no conflict of interest exists.

Meeting Presentation: This work was presented at the VII SISMEC National Congress, Rome, Italy, 25-28 September 2013.

Performance of Robust Regression Methods in Real-Time Polymerase Chain Reaction Calibration
