A novel Log penalty in a path seeking scheme for biomarker selection

Abstract

Biomarker selection, or feature selection, from survival data is a topic of considerable interest. Various survival analysis approaches for biomarker selection have recently been developed; however, current methods face growing challenges in handling high-dimensional, small-sample problems. We propose a novel Log-sum regularization estimator within the accelerated failure time (AFT) model for predicting cancer patient survival time with a few biomarkers. The approach is implemented in a path seeking algorithm to speed up solving the Log-sum penalized problem. Additionally, the control parameter of the Log-sum penalty is tuned using the Bayesian information criterion (BIC). The results indicate that, compared with other $\ell_{1}$ type regularization methods, our proposed approach achieves good biomarker selection performance on both simulated and real datasets.

Keywords

Biomarker selection; Log-sum regularization; path seeking scheme; Bayesian information criterion

1. Introduction

Biomarker selection, or feature selection, from survival data is a topic of considerable interest. Regularization methods are a group of feature selection methods that embed a penalty into the learning procedure, so that model fitting and feature selection happen in a single process. In this way, regularization methods can reduce overfitting. The $\ell_{0}$-norm, i.e., the total number of non-zero elements in a vector, is the most direct measure of sparsity, but minimizing it is a combinatorial optimization problem that is computationally intractable for high-dimensional data. Thus, alternative sparsity-promoting functions that find sparse solutions more efficiently are desirable. The most popular alternative is to replace the $\ell_{0}$-norm with the $\ell_{1}$-norm penalty, known as the Lasso (least absolute shrinkage and selection operator) [1]. Other $\ell_{1}$ type regularizations can also take the place of the $\ell_{0}$-norm, such as the smoothly clipped absolute deviation (SCAD) [2], the minimax concave penalty (MCP) [3] and the group lasso [4]. Moreover, a general sparse representation approach solves a linear representation system with $\ell_{p}$-norm minimization, especially with $p=$ 0.1, 1/2, 2/3 or 0.9 [5, 6]. Candes et al. [7] proposed the Log-sum penalty, which approximates the $\ell_{0}$-norm much better than other penalties, by iteratively reweighting the $\ell_{1}$-norm. As shown in Fig. 1, these five state-of-the-art penalties all satisfy the properties of sparsity and continuity, and the Log-sum penalty induces sparser solutions than the other four. Unlike the Lasso and Elastic net, which give biased estimates, the $\ell_{1/2}$, SCAD and Log-sum penalties are nearly unbiased, i.e., they yield nearly unbiased estimates when the true coefficient is large. Chartrand and Yin [8] then used this iteratively reweighted $\ell_{1}$ minimization algorithm for sparse signal recovery. Xia et al. [9] proposed multiple linear regression with the Log-sum penalty, based on a thresholding representation theory, for drug discovery. We take advantage of the strong sparsity of the Log-sum penalty and employ it to select key factors in survival data without any prior information.

Figure 1.

Various penalty functions under an orthonormal design: (a) Lasso, (b) $\ell_{1/2}$, (c) Elastic net, (d) SCAD, (e) Log-sum.
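To make the penalty shapes discussed in the Introduction concrete, the following Python sketch evaluates the five penalties of Fig. 1 as functions of a single coefficient $t$. The parameter values (`LAM`, `ALPHA`, `A_SCAD`, `XI`) and the Elastic net parametrization are illustrative assumptions, not values taken from this paper, and the curves in Fig. 1 may use different settings.

```python
import numpy as np

# Illustrative settings (assumptions, not values from the paper).
LAM = 1.0      # common regularization weight
ALPHA = 0.5    # Elastic net mix: ALPHA * |t| + (1 - ALPHA) * t**2
A_SCAD = 3.7   # SCAD shape parameter, a common default
XI = 0.01      # Log-sum constant xi from Eq. (5)


def lasso(t):
    return LAM * np.abs(t)


def l_half(t):
    return LAM * np.sqrt(np.abs(t))


def elastic_net(t):
    return LAM * (ALPHA * np.abs(t) + (1 - ALPHA) * np.square(t))


def scad(t):
    t, a, lam = np.abs(t), A_SCAD, LAM
    middle = (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    return np.where(t <= lam, lam * t,
                    np.where(t <= a * lam, middle, lam ** 2 * (a + 1) / 2))


def log_sum(t):
    # LAM * [log(|t| + XI) - log(XI)]: the Eq. (5) term shifted by a constant
    # so that the penalty is 0 at t = 0; the shift does not move the minimizer.
    return LAM * np.log1p(np.abs(t) / XI)


if __name__ == "__main__":
    grid = np.array([0.0, 0.01, 0.5, 1.0, 2.0, 5.0])
    for name, pen in [("Lasso", lasso), ("l1/2", l_half), ("Elastic net", elastic_net),
                      ("SCAD", scad), ("Log-sum", log_sum)]:
        print(f"{name:12s}", np.round(pen(grid), 3))
```

With `XI = 0.01`, the Log-sum value rises steeply between 0 and 0.01 and then grows only logarithmically, which is the behaviour that makes it a closer surrogate for the $\ell_{0}$-norm than the Lasso.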

For survival analysis, there are two main types of survival model, namely the Cox proportional hazards model [10] and the accelerated failure time (AFT) model [11]. The Cox model is usually used to predict the hazard rate of a disease, whereas the AFT model estimates the survival time of patients directly by regressing the logarithm of survival time on the key risk predictors [12]. Because its interpretation is similar to that of standard regression, the AFT model is an attractive alternative to the Cox proportional hazards model for censored failure time data [12]. Furthermore, the log-linear form of the AFT model increases its robustness to model misspecification and yields narrower confidence intervals for the regression coefficients [13]. Following the penalized Cox model with the $\ell_{1}$-norm [14], Datta et al. [15] combined the AFT model with the $\ell_{1}$-norm to predict failure time outcomes. Other penalized AFT models aim to obtain more accurate and sparser predictors for survival analysis, such as the AFT model with bridge penalization [16] and the $\ell_{1/2}$-norm AFT model [17]. In this article, we employ a path seeking scheme to accelerate solving the Log-sum penalized AFT model for predicting patient survival time with fewer biomarkers.

The rest of this article is organized as follows: Section 2 introduces the Log-sum penalized AFT model. In Section 3, we implement the proposed Log-sum penalty in the path seeking scheme. Finally, we discuss the experimental results in Section 4 and draw conclusions in Section 5.

2. Log-sum penalized AFT model

Suppose the survival data consist of ${h}$ patients $(\tau_{i},\delta_{i},X_{i})_{i=1}^{h}$, where $\mathbf{T}=(\tau_{1},\tau_{2},\ldots,\tau_{h})^{T}$ is the vector of observed survival times and $\tau_{i}=\min(t_{i},c_{i})$; here $t_{i}$ is the true survival time and $c_{i}$ is the time to the first censoring event (e.g., study conclusion, date of final follow-up) for sample ${i}$. The censoring indicator $\delta_{i}$ is defined as

$\displaystyle\delta_{i}=\begin{cases}0, & \text{if the survival time is right censored},\\ 1, & \text{if the event is observed (completed time)}.\end{cases}$

${X}_{i}$ denotes the gene expression data of the $i$-th patient, i.e., ${X}_{i}=(x_{i1},x_{i2},\ldots,x_{ik})$, where ${k}$ is the number of genes.

The accelerated failure time (AFT) model defines the survival time $\tau_{i}$ as follows:

$\displaystyle\tau_{i}=\exp\left(\beta_{0}+X_{i}\beta^{T}+\varepsilon_{i}\right)$ (1)

where $\beta\in\mathbb{R}^{k}$ is the coefficient vector of the ${k}$ variables, $\beta_{0}$ is the intercept, and $\varepsilon_{i}\sim N(0,1)$ is an independent random error. In this article, we employ the mean imputation method [15], which converts each observed survival time $\tau_{i}$, censored or not, into an estimated log survival time $G(\tau_{i})$:

$\displaystyle G(\tau_{i})=\delta_{i}\log(\tau_{i})+(1-\delta_{i})\{\widehat{S}(\tau_{i})\}^{-1}\sum_{t_{(r)}>\tau_{i}}\log(t_{(r)})\,\Delta\widehat{S}(t_{(r)})$ (2)

where $t_{(1)}<t_{(2)}<\cdots$ are the distinct observed event times in ascending order, indexed by $r$, and $\Delta\widehat{S}(t_{(r)})$ is the size of the jump of the Kaplan-Meier estimator $\widehat{S}$ at time $t_{(r)}$ [18]. For uncensored samples ($\delta_{i}=1$), $G(\tau_{i})$ is simply $\log(\tau_{i})$; for censored samples, it is the Kaplan-Meier estimate of the conditional mean of the log survival time given survival beyond $\tau_{i}$. We can then fit the AFT model directly by standard least squares, minimizing the loss function $L(\beta)$:

$\displaystyle L(\beta)=\frac{1}{h}\sum_{i=1}^{h}\left(y_{i}-\sum_{j=0}^{k}\beta_{j}x_{ij}\right)^{2}$ (3)

where $y_{i}=G(\tau_{i})$ is the estimated log survival time from Eq. (2), i.e., the survival times $\tau_{i}$ are logarithmically transformed into $y_{i}$, and $x_{i0}=1$ so that $\beta_{0}$ acts as the intercept.
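The imputation in Eq. (2) and the loss in Eq. (3) can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the helper names are hypothetical, ties are handled by the usual Kaplan-Meier product-limit rule, and the largest observation is treated as an event so that $\widehat{S}$ reaches zero and the conditional mean in Eq. (2) is well defined (a common convention that the text does not specify).

```python
import numpy as np


def kaplan_meier(times, events):
    """Product-limit estimate: distinct event times and S_hat just after each."""
    event_times = np.unique(times[events == 1])
    surv = np.empty(len(event_times))
    s = 1.0
    for r, t in enumerate(event_times):
        n_at_risk = np.sum(times >= t)                     # at risk just before t
        n_events = np.sum((times == t) & (events == 1))
        s *= 1.0 - n_events / n_at_risk
        surv[r] = s
    return event_times, surv


def mean_impute_log_times(times, events):
    """Eq. (2): y_i = G(tau_i) for every sample."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int).copy()
    events[np.argmax(times)] = 1        # assumption: largest time treated as an event
    event_times, surv = kaplan_meier(times, events)
    jumps = np.concatenate(([1.0], surv[:-1])) - surv   # Delta S_hat(t_(r)) > 0
    y = np.log(times)                   # uncensored: G(tau_i) = log(tau_i)
    for i in np.flatnonzero(events == 0):
        later = event_times > times[i]
        if not later.any():
            continue
        r = np.searchsorted(event_times, times[i], side="right") - 1
        s_i = 1.0 if r < 0 else surv[r]                  # S_hat(tau_i)
        y[i] = np.sum(np.log(event_times[later]) * jumps[later]) / s_i
    return y


def ls_loss(beta0, beta, X, y):
    """Eq. (3), with the intercept beta0 playing the role of beta_0 (x_i0 = 1)."""
    return np.mean((y - beta0 - X @ beta) ** 2)


if __name__ == "__main__":
    tau = np.array([1.0, 2.0, 3.0, 4.0])
    delta = np.array([1, 0, 1, 1])      # the second sample is right censored
    print(mean_impute_log_times(tau, delta))
```

Uncensored samples keep $\log(\tau_{i})$, while each censored sample receives a Kaplan-Meier weighted average of the log event times observed after it.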

3. Implementation of Log-sum penalized AFT model

Regularization methods reduce the overfitting of a learning procedure by adding a penalty term; the general regularized estimator can therefore be written as

$\displaystyle\widehat{\beta}(\lambda)=\mathop{\text{argmin}}_{\beta}\{L(\beta)+\lambda P(\beta)\}$ (4)

where $\beta\in\mathbb{R}^{k}$ is the vector of covariate coefficients, $\lambda>0$ is a control parameter, $L(\beta)$ is the loss term and $P(\beta)$ is the penalty term. Larger values of $\lambda$ impose heavier penalties on the regression coefficients, so fewer variables are included in the model, and vice versa. Generalized cross-validation [19] has been widely used to choose an appropriate value of the control parameter. Huang et al. [20] used a modified Akaike information criterion (AIC) to choose the tuning parameter. Wang and Song [21] used the Bayesian information criterion (BIC) for tuning parameter selection in the AFT model with the adaptive Lasso. Friedman [22] proposed the generalized path seeking scheme, which obtains the control parameter from the componentwise ratios of the gradient of the loss function to the gradient of the regularization term. For squared-error loss, this scheme is much faster than general convex optimizers [22].

For the regularization term $P(\beta)$, many penalties have been proposed to bridge the gap between $\ell_{0}$ and $\ell_{1}$ minimization. The Log-sum penalty function was originally introduced in [23] for basis selection, where Log-sum based methods were reported to be uniformly superior to conventional $\ell_{1}$ type methods. Applied to survival analysis, the Log-sum sparsity-encouraging penalty leads to the estimator

$\displaystyle\widehat{\beta}=\mathop{\text{argmin}}_{\beta}\left\{L(\beta)+\lambda\sum_{j=1}^{k}\log(|\beta_{j}|+\xi)\right\}$ (5)

where $\xi>0$ ensures that the function is well defined. In particular, the Log-sum penalty function behaves like the $\ell_{0}$-norm as $\xi\to 0$ [24]. In this article, we employ the path seeking method [22] to solve the Log-sum penalized AFT model by constructing a path directly and successively in parameter space. Let $\upsilon$ measure the length along the path; the step size $\Delta\upsilon>0$ is chosen so that the loss decreases by 1% at each step:

$\displaystyle\frac{L(\widehat{\beta}(\upsilon))-L(\widehat{\beta}(\upsilon+\Delta\upsilon))}{L(\widehat{\beta}(\upsilon))}=0.01$ (6)
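For the squared-error loss of Eq. (3), Eq. (6) can be solved in closed form when a single coefficient $\beta_{j}$ moves by $\Delta\upsilon$ in the direction of $\text{sign}(\lambda_{j})$, which equals $\text{sign}(\varphi_{j})$ because the penalty gradient in Eq. (8) is positive. The loss then changes by $-\Delta\upsilon\,|\varphi_{j}|+\Delta\upsilon^{2}q_{j}$, where $\varphi_{j}$ is the negative gradient of Eq. (7) and $q_{j}=\frac{1}{h}\sum_{i}x_{ij}^{2}$, so Eq. (6) is a quadratic in $\Delta\upsilon$. The sketch below takes its smaller root; the name `step_size` is hypothetical, and the fallback used when a 1% decrease is unattainable along coordinate $j$ (an exact line search) is our assumption, not part of Eq. (6).

```python
import numpy as np


def step_size(X, resid, j, frac=0.01):
    """Delta-upsilon solving Eq. (6) for a move of coefficient j.

    resid = y - X @ beta (intercept absorbed); moving beta_j by Delta in the
    direction sign(varphi_j) lowers L by Delta*|varphi_j| - Delta**2 * q_j.
    """
    loss = np.mean(resid ** 2)                       # L(beta), Eq. (3)
    g = abs(2.0 * np.mean(X[:, j] * resid))          # |varphi_j|, Eq. (7)
    q = np.mean(X[:, j] ** 2)
    disc = g ** 2 - 4.0 * q * frac * loss
    if disc < 0:
        # A decrease of `frac` is out of reach along this coordinate:
        # take the exact line-search step instead (an assumption).
        return g / (2.0 * q)
    return (g - np.sqrt(disc)) / (2.0 * q)           # smaller root of the quadratic
```

For example, with `X = np.ones((2, 1))` and `resid = np.ones(2)`, the returned step lowers the loss from 1 to 0.99, up to rounding.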

Define

$\displaystyle\varphi_{j}(\upsilon)=-\left[\frac{\partial L(\beta)}{\partial\beta_{j}}\right]_{\beta=\widehat{\beta}(\upsilon)}=-\left[\frac{\partial}{\partial\beta_{j}}\frac{1}{h}\sum_{i=1}^{h}\left(y_{i}-\sum_{l=0}^{k}\beta_{l}x_{il}\right)^{2}\right]_{\beta=\widehat{\beta}(\upsilon)}=\left[\frac{2}{h}\sum_{i=1}^{h}x_{ij}\left(y_{i}-\sum_{l=0}^{k}\beta_{l}x_{il}\right)\right]_{\beta=\widehat{\beta}(\upsilon)}$ (7)

$\displaystyle\phi_{j}(\upsilon)=\left[\frac{\partial\sum_{l=1}^{k}\log(|\beta_{l}|+\xi)}{\partial|\beta_{j}|}\right]_{\beta=\widehat{\beta}(\upsilon)}=\left[\frac{\partial\log(|\beta_{j}|+\xi)}{\partial|\beta_{j}|}\right]_{\beta=\widehat{\beta}(\upsilon)}=\left[\frac{1}{|\beta_{j}|+\xi}\right]_{\beta=\widehat{\beta}(\upsilon)}$ (8)

$\displaystyle\lambda_{j}(\upsilon)=\frac{\varphi_{j}(\upsilon)}{\phi_{j}(\upsilon)}$ (9)

where $\lambda_{j}(\upsilon)$ is the value of $\lambda$ in Eq. (4) corresponding to $\upsilon$; it is the ratio of the loss function gradient $\varphi_{j}(\upsilon)$ of $L(\beta)$ in Eq. (3) to the penalty function gradient $\phi_{j}(\upsilon)$ with respect to $|\beta_{j}|$. Because $\lambda$ does not have to be estimated separately, this path seeking scheme accelerates solving the Log-sum penalized problem. The parameter $\xi$ is chosen by ten-fold cross-validation. The details of the procedure for the Log-sum penalty are given in Algorithm 1.

Algorithm 1 Path seeking algorithm for the Log-sum penalty
1.	Initialize: $\upsilon=0$, $\{\widehat{\beta}_{j}(0)=0\}_{j=1}^{k}$
2.	repeat
3.	Compute $\{\lambda_{j}(\upsilon)\}_{j=1}^{k}$ by Eqs. (7)–(9)
4.	$S=\{j\mid\lambda_{j}(\upsilon)\cdot\widehat{\beta}_{j}(\upsilon)<0\}$
5.	if $S$ is empty then
6.	$j^{*}=\mathop{\text{argmax}}_{j}|\lambda_{j}(\upsilon)|$
7.	else
8.	$j^{*}=\mathop{\text{argmax}}_{j\in S}|\lambda_{j}(\upsilon)|$
9.	end if
10.	$\widehat{\beta}_{j^{*}}(\upsilon+\Delta\upsilon)=\widehat{\beta}_{j^{*}}(\upsilon)+\Delta\upsilon\cdot\text{sign}(\lambda_{j^{*}}(\upsilon))$
11.	$\{\widehat{\beta}_{j}(\upsilon+\Delta\upsilon)=\widehat{\beta}_{j}(\upsilon)\}_{j\neq j^{*}}$
12.	$\upsilon\leftarrow\upsilon+\Delta\upsilon$
13.	until $\lambda_{j}(\upsilon)=0$ for all $j$

First, we initialize the path; then, at each step, we compute the vector $\lambda(\upsilon)$ by Eqs. (7)–(9). Next, we identify the set $S$ of non-zero coefficients $\widehat{\beta}_{j}(\upsilon)$ whose sign is opposite to that of their corresponding $\lambda_{j}(\upsilon)$. Usually this set is empty, and the coefficient corresponding to the largest component of $\lambda(\upsilon)$ in absolute value is selected. If one or more $\lambda_{j}(\upsilon)\cdot\widehat{\beta}_{j}(\upsilon)<0$, the coefficient with the largest $|\lambda_{j}(\upsilon)|$ within this subset is selected instead. The selected coefficient $\widehat{\beta}_{j^{*}}(\upsilon)$ is then incremented by a small amount in the direction of the sign of its corresponding $\lambda_{j^{*}}(\upsilon)$, with all other coefficients left unchanged, producing the solution at the next path point $\upsilon+\Delta\upsilon$. Iterations continue until all components of $\lambda(\upsilon)$ are zero.
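A compact Python sketch of Algorithm 1 follows. It is an illustration under several assumptions that the text does not fix: the unpenalized intercept is handled by centering $X$ and $y$, the step size solves Eq. (6) in closed form for the squared-error loss (with an exact line search when a 1% decrease is impossible), and, because $\lambda(\upsilon)$ only approaches zero numerically, the loop stops when $\max_{j}|\lambda_{j}|$ falls below a tolerance or after `max_steps` iterations. The function name `log_sum_path` and its defaults are hypothetical.

```python
import numpy as np


def log_sum_path(X, y, xi=0.1, frac=0.01, max_steps=2000, tol=1e-8):
    """Sketch of Algorithm 1; returns (beta0, beta, path of beta vectors)."""
    h, k = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean       # centering absorbs the intercept
    beta = np.zeros(k)                    # step 1: beta_j(0) = 0
    resid = yc.copy()
    path = [beta.copy()]
    for _ in range(max_steps):
        varphi = 2.0 / h * (Xc.T @ resid)            # Eq. (7)
        lam = varphi * (np.abs(beta) + xi)           # Eqs. (8)-(9)
        if np.max(np.abs(lam)) < tol:                # step 13: all lambda_j ~ 0
            break
        S = np.flatnonzero(lam * beta < 0)           # step 4
        cand = S if S.size else np.arange(k)         # steps 5-9
        j = cand[np.argmax(np.abs(lam[cand]))]
        g, q = abs(varphi[j]), np.mean(Xc[:, j] ** 2)
        disc = g ** 2 - 4.0 * q * frac * np.mean(resid ** 2)   # Eq. (6)
        step = g / (2.0 * q) if disc < 0 else (g - np.sqrt(disc)) / (2.0 * q)
        beta[j] += step * np.sign(lam[j])            # step 10; others unchanged
        resid -= step * np.sign(lam[j]) * Xc[:, j]
        path.append(beta.copy())
    beta0 = y_mean - x_mean @ beta
    return beta0, beta, np.array(path)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 5))
    y = 1.0 + 3.0 * X[:, 0] + 0.01 * rng.standard_normal(50)
    beta0, beta, path = log_sum_path(X, y)
    print(np.round(beta, 3), round(beta0, 3), path.shape)
```

Each row of `path` is one point $\widehat{\beta}(\upsilon)$ on the path; in practice a criterion such as cross-validation or BIC could be used to decide where along the path to stop, rather than running to convergence.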

4. Numerical experiments

4.1 Simulated datasets

To simulate the high-dimensional, small-sample property of gene expression data, we assume 20 nonzero coefficients among ${k}=$ 2000 variables, with sample sizes ${h}=$ 90 and 300, under different correlation and noise settings, based on the following model:

$\displaystyle Y=\sum_{u=1}^{20}\beta_{u}X_{u}+\sigma\varepsilon$ (10)

where $Y=(y_{1},y_{2},\ldots,y_{h})$ denotes the vector of log-transformed survival times $y_{i}=\log(\tau_{i})$ in Eq. (3), without censored data; $\varepsilon$ is independent random noise generated from the normal distribution $N(0,1)$; $\sigma$ controls the noise strength; and the coefficients of the relevant features are specified as

$\displaystyle\beta=(\underbrace{2,\ldots,2}_{5},-2,1.5,-1.7,2.5,-1.8,\underbrace{4,\ldots,4}_{10},\underbrace{0,\ldots,0}_{1980})$

The entries of $\mathbf{X}$ are generated from an array $tmp_{ij}$ ($i=1,\ldots,h$; $j=0,\ldots,k$) of independent standard normal variables:

$\displaystyle x_{ij}=\sqrt{\varrho}\times tmp_{i0}+\sqrt{1-\varrho}\times tmp_{ij}$ (11)

where the correlation coefficient $\varrho$ is set to 0.1 or 0.3 in our experiments. Since each $x_{ij}$ has unit variance and any two distinct predictors share only the common term $tmp_{i0}$, the correlation between any two predictors equals $\varrho$.
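The simulation design of Eqs. (10)-(11) can be reproduced, as a sketch, with NumPy's `default_rng`; the function name `simulate` and its `seed` argument are our own, and placing the 20 relevant features in the first 20 columns follows the order of the coefficient vector above.

```python
import numpy as np


def simulate(h, rho, sigma, k=2000, seed=None):
    """One simulated data set following Eqs. (10)-(11)."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(k)
    beta[:20] = [2.0] * 5 + [-2.0, 1.5, -1.7, 2.5, -1.8] + [4.0] * 10
    tmp = rng.standard_normal((h, k + 1))    # tmp[:, 0] is the shared term tmp_i0
    X = np.sqrt(rho) * tmp[:, [0]] + np.sqrt(1.0 - rho) * tmp[:, 1:]   # Eq. (11)
    Y = X @ beta + sigma * rng.standard_normal(h)                        # Eq. (10)
    return X, Y, beta


if __name__ == "__main__":
    X, Y, beta = simulate(h=90, rho=0.1, sigma=1.0, seed=0)
    print(X.shape, Y.shape, np.count_nonzero(beta))
```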

The sensitivity and specificity of each procedure are calculated as follows:

$\displaystyle\textit{sensitivity}=\frac{\#\ \textit{correctly selected genes}}{\#\ \textit{non-zero in}\ \beta}=\frac{\#\ \textit{correctly selected genes}}{20}$ (12)

$\displaystyle\textit{specificity}=\frac{\#\ \textit{correctly rejected genes}}{\#\ \textit{zero in}\ \beta}=\frac{\#\ \textit{correctly rejected genes}}{1980}$ (13)
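Eqs. (12)-(13) translate directly into code. In this sketch a gene counts as selected when its estimated coefficient is non-zero; the function name and the optional threshold `tol` are hypothetical additions.

```python
import numpy as np


def sensitivity_specificity(beta_true, beta_hat, tol=0.0):
    """Eqs. (12)-(13); a gene is 'selected' when |beta_hat_j| > tol."""
    truth = np.asarray(beta_true) != 0
    chosen = np.abs(np.asarray(beta_hat)) > tol
    sensitivity = np.sum(truth & chosen) / np.sum(truth)      # 20 in the simulation design
    specificity = np.sum(~truth & ~chosen) / np.sum(~truth)   # 1980 in the simulation design
    return sensitivity, specificity
```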

The optimal value of $\xi$ is selected under ten-fold cross-validation by minimizing the Bayesian information criterion (BIC), defined as

$\displaystyle\text{BIC}_{\xi}=-\frac{2}{h}\log(\text{mse}(\beta))-\frac{\log(h)}{2}df_{\xi}$ (14)

where $h$ is the total number of observations, $df_{\xi}$ is the number of nonzero parameters, and $\text{mse}(\beta)$ is the mean squared error, defined by

$\displaystyle\text{mse}(\beta)=\frac{1}{h}\sum_{i=1}^{h}(\tau_{i}-\widehat{\tau}_{i})^{2}$ (15)

where the predicted value is $\widehat{\tau}_{i}=\exp(\sum_{j=0}^{k}\beta_{j}x_{ij})$. In our simulations and application, the optimal $\xi$ is searched over a grid of values.
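The grid search over $\xi$ can be sketched as below. Two simplifications are assumptions of this sketch: the cross-validation folds are omitted, and the criterion is the textbook Gaussian form $\text{BIC}=h\log(\text{mse})+df\,\log(h)$ (smaller is better), which is swapped in here rather than transcribed from Eq. (14). The `fit` argument is a hypothetical callable returning the intercept and coefficients for a given $\xi$; `mse_time_scale` follows Eq. (15).

```python
import numpy as np


def mse_time_scale(tau, X, beta0, beta):
    """Eq. (15): error on the original time scale, tau_hat = exp(beta0 + X beta)."""
    tau_hat = np.exp(beta0 + X @ beta)
    return np.mean((tau - tau_hat) ** 2)


def gaussian_bic(mse, df, h):
    """Textbook Gaussian BIC (smaller is better); not a transcription of Eq. (14)."""
    return h * np.log(mse) + df * np.log(h)


def select_xi(xi_grid, fit, tau, X):
    """Grid search over xi; `fit(xi)` is a hypothetical callable -> (beta0, beta)."""
    h = X.shape[0]
    scores = []
    for xi in xi_grid:
        beta0, beta = fit(xi)
        df = np.count_nonzero(beta)                 # df_xi: nonzero parameters
        scores.append(gaussian_bic(mse_time_scale(tau, X, beta0, beta), df, h))
    return xi_grid[int(np.argmin(scores))], scores
```

In a full version, `fit` could wrap a Log-sum path solver trained on the cross-validation training folds, with the mse computed on the held-out folds.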

We also employ the concordance index (CI, or c-index) to evaluate the predictive accuracy of the survival models. The CI can be interpreted as the fraction of all pairs of subjects whose predicted survival times are correctly ordered among all pairs that can actually be ordered, i.e., pairs in which the subject with the shorter observed time experienced the event. It can therefore be written as:

$\displaystyle ci(\beta)=\frac{\sum_{i}\sum_{j}1(\widehat{\tau}_{i}<\widehat{\tau}_{j}\ \text{and}\ \tau_{i}<\tau_{j}\ \text{and}\ \delta_{i}=1)}{\sum_{i}\sum_{j}1(\tau_{i}<\tau_{j}\ \text{and}\ \delta_{i}=1)}$ (16)
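Eq. (16) can be computed for all pairs at once with NumPy broadcasting. This sketch counts tied predictions as discordant, as the strict inequality in Eq. (16) implies; many implementations instead give tied predictions half credit. The function name `c_index` is our own.

```python
import numpy as np


def c_index(tau, delta, tau_hat):
    """Concordance index following Eq. (16).

    A pair (i, j) is comparable when tau_i < tau_j and subject i had an
    observed event (delta_i = 1); it is concordant when also tau_hat_i < tau_hat_j.
    """
    tau, delta, tau_hat = map(np.asarray, (tau, delta, tau_hat))
    comparable = (tau[:, None] < tau[None, :]) & (delta[:, None] == 1)
    concordant = comparable & (tau_hat[:, None] < tau_hat[None, :])
    return concordant.sum() / comparable.sum()
```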

Table 1

Results of different penalized methods on the simulated data

$\sigma$	$\varrho$	Penalty	Sensitivity ($h=$ 90)	Specificity ($h=$ 90)	MSE ($h=$ 90)	CI ($h=$ 90)	Sensitivity ($h=$ 300)	Specificity ($h=$ 300)	MSE ($h=$ 300)	CI ($h=$ 300)
1.0	0.1	Lasso	0.758	0.896	87.924	0.808	0.850	0.971	74.175	0.908
		$\ell_{1/2}$	0.856	0.908	57.750	0.863	0.900	0.978	29.825	0.913
		Elastic net	0.782	0.731	149.340	0.638	0.800	0.754	117.393	0.720
		SCAD	0.845	0.891	67.254	0.792	0.900	0.977	43.502	0.872
		Log-sum	0.902	0.953	54.615	0.871	0.950	0.986	23.845	0.928
	0.3	Lasso	0.684	0.786	98.183	0.782	0.762	0.864	64.536	0.858
		$\ell_{1/2}$	0.793	0.894	62.786	0.809	0.850	0.935	59.747	0.872
		Elastic net	0.912	0.726	174.570	0.800	0.999	0.757	141.602	0.722
		SCAD	0.760	0.882	74.009	0.765	0.800	0.945	47.681	0.885
		Log-sum	0.841	0.931	58.892	0.847	0.872	0.978	46.893	0.918
1.5	0.1	Lasso	0.721	0.860	71.004	0.643	0.800	0.967	54.601	0.762
		$\ell_{1/2}$	0.840	0.903	63.549	0.725	0.910	0.972	40.219	0.887
		Elastic net	0.750	0.683	198.627	0.569	0.850	0.705	179.118	0.641
		SCAD	0.767	0.889	69.073	0.703	0.850	0.965	51.669	0.847
		Log-sum	0.880	0.935	62.720	0.804	0.950	0.982	38.358	0.915
	0.3	Lasso	0.617	0.756	119.453	0.691	0.650	0.861	95.235	0.811
		$\ell_{1/2}$	0.750	0.857	82.944	0.714	0.800	0.928	60.812	0.849
		Elastic net	0.860	0.605	213.905	0.571	0.959	0.621	185.917	0.645
		SCAD	0.751	0.879	86.892	0.609	0.800	0.937	68.816	0.638
		Log-sum	0.800	0.917	65.092	0.784	0.850	0.974	48.973	0.890

The simulated experiments are repeated 100 times. Table 1 shows that the Log-sum penalty solved with the path seeking algorithm achieves a lower MSE and a higher CI than the other penalties in every setting. It also attains the highest specificity in every setting and a higher sensitivity than Lasso, $\ell_{1/2}$ and SCAD; only Elastic net reaches a higher sensitivity when $\varrho=$ 0.3, at the cost of much lower specificity. The performance of the Log-sum penalty improves with increasing sample size; for example, when $\sigma=$ 1.0 and $\varrho=$ 0.1, Log-sum has a lower MSE and a higher CI with $h=$ 300 than with $h=$ 90. Elastic net, which includes an $\ell_{2}$-norm term, selected the largest number of genes on the synthetic data and performed poorly; e.g., when $\sigma=$ 1.0, $\varrho=$ 0.3 and $h=$ 300, Elastic net selected nearly all 20 non-zero coefficients (i.e., 20 $\times$ 0.999), but it also selected about 481 irrelevant coefficients (i.e., 1980 $\times$ 0.243). As the correlation $\varrho$ among genes increases, at every noise level $\sigma$, the $\ell_{1}$-norm (Lasso) cannot distinguish the key genes well, while the other $\ell_{1}$ type penalties, especially the Log-sum penalty, remain effective.

4.2 Real datasets

To further demonstrate the performance of these regularization methods, we compare our proposed method with the other four penalties on the GSE22210 microarray dataset from NCBI's Gene Expression Omnibus (GEO). This breast cancer dataset includes 1,452 genes and 167 samples [25]. We randomly split the dataset into a training set of about two-thirds of the samples (117 samples) and a test set of the remaining 50 samples. Table 2 shows that the Log-sum penalty achieves the best survival time prediction (lowest MSE and highest CI) while selecting fewer genes than the other $\ell_{1}$ type regularization methods.

Table 2
Results on the GSE22210 dataset

Penalty	# Selected genes	MSE	CI
Lasso	46	21.023	0.776
$\ell_{1/2}$	23	25.271	0.783
Elastic net	159	33.331	0.809
SCAD	33	22.754	0.791
Log-sum	15	16.814	0.821

Table 3

Genes selected by the different penalized methods on the GSE22210 dataset

	Lasso	$\ell_{1/2}$	Elastic net	SCAD	Log-sum
1	XIST	IL1B	SERPINB2	IL1B	XIST
2	LAT	XIST	XIST	XIST	LAT
3	IL1B	HLA-DQA2	IMPACT	CCND1	DNASE1L1
4	DNASE1L1	TGFA	IL1B	HLA-DQA2	IL1B
5	NFKB1	CDKN1A	LAT	NFKB1	BCL2L2
6	HDAC9	GNMT	CCND1	PTHR1	MEST
7	BCL2L2	LAT	NFKB1	GNMT	NFKB1
8	ESR2	BCL2L2	TGFA	LAT	DAB2IP
9	AFP	HDAC9	HLA-DQA2	CD44	ESR2
10	LAMC1	CD44	RASGRF1	ESR2	APC

As seen in Table 3, some genes, such as XIST, LAT and IL1B, are selected by all methods. Loss of XIST RNA from the inactive X chromosome has been associated with the basal-like subtype of invasive breast cancer [26]. LAT, short for Linker for Activation of T cells, plays a crucial role in TCR-mediated signaling pathways, and the adoptive transfer of T cells appears to be a promising new treatment for various types of cancer [27]. Collado-Hidalgo et al. [28] provided evidence that polymorphisms in IL1B increase the treatment-triggered production of proinflammatory cytokines, which in turn may contribute to persistent fatigue in breast cancer survivors. Some genes are selected only by Log-sum, such as MEST, DAB2IP and APC. MEST, also known as paternally expressed gene 1 (PEG1), frequently shows loss of imprinting in invasive breast carcinomas [29]. DAB2IP is a bona fide tumor suppressor that is frequently silenced by promoter methylation in aggressive human tumors [30]. Furthermore, aberrant methylation of the APC gene is frequent in breast cancers [31].

5. Conclusion

In this paper, we propose a novel Log-sum regularization estimator for the AFT model, solved within the path seeking scheme. Compared with other $\ell_{1}$ type penalties, the results on both simulated and real datasets indicate that the proposed Log-sum penalty can effectively predict patient survival time with fewer biomarkers. We therefore expect it to be an effective tool for gene selection in high-dimensional biological data.


Acknowledgments

The authors thank Dr. Zi-Yi Yang for excellent technical assistance. This work was supported by the Macau Science and Technology Development Fund (Grant no. 003/2016/AFJ) of Macao SAR of China and the China NSFC project (Contract no. 61661166011).

Conflict of interest

None to report.

References

1. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B (Methodological). 1996; 267-288.

2. Fan. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association. 2001; 96(456): 1348-1360.

3. Zhang. Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics. 2010; 38(2): 894-942.

4. Yuan, Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2006; 68(1): 49-67.

5. Lyu, Lin, She, Zhang. A comparison of typical Lp minimization algorithms. Neurocomputing. 2013; 119: 413-424.

6. Chang, Zhang. L1/2 regularization: A thresholding representation theory and a fast solver. IEEE Transactions on Neural Networks and Learning Systems. 2012; 23(7): 1013-1027.

7. Candes, Wakin, Boyd. Enhancing sparsity by reweighted L1 minimization. Journal of Fourier Analysis and Applications. 2008; 14(5): 877-905.

8. Chartrand, Yin. Iteratively reweighted algorithms for compressive sensing. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2008). IEEE; 2008. pp. 3869-3872.

9. Xia, Wang, Meng, Yao, Chai, Liang. Descriptor Selection via Log-Sum Regularization for the Biological Activities of Chemical Structure. International Journal of Molecular Sciences. 2017; 19(1): 30.

10. Cox. Regression models and life-tables. In: Breakthroughs in Statistics. Springer; 1992. pp. 527-541.

11. Kalbfleisch, Prentice. The statistical analysis of failure time data. vol. 360; John Wiley & Sons; 2011.

12. Wei. The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. Statistics in Medicine. 1992; 11(14-15): 1871-1879.

13. Hutton, Monaghan. Choice of parametric accelerated life and proportional hazards models for survival data: asymptotic results. Lifetime Data Analysis. 2002; 8(4): 375-393.

14. Tibshirani. The lasso method for variable selection in the Cox model. Statistics in Medicine. 1997; 16(4): 385-395.

15. Datta, Le-Rademacher, Datta. Predicting patient survival from microarray data by accelerated failure time modeling using partial least squares and LASSO. Biometrics. 2007; 63(1): 259-271.

16. Huang. Variable selection in the accelerated failure time model via the bridge method. Lifetime Data Analysis. 2010; 16(2): 176-195.

17. Chai, Liang, Liu. The L1/2 regularization approach for survival analysis in the accelerated failure time model. Computers in Biology and Medicine. 2015; 64: 283-290.

18. Datta. Estimating the mean life time using right censored data. Statistical Methodology. 2005; 2(1): 65-69.

19. Craven, Wahba. Smoothing noisy data with spline functions. Numerische Mathematik. 1978; 31(4): 377-403.

20. Huang, Xie. Regularized Estimation in the Accelerated Failure Time Model with High-Dimensional Covariates. Biometrics. 2006; 62(3): 813-820.

21. Wang, Song. Adaptive Lasso variable selection for the accelerated failure models. Communications in Statistics-Theory and Methods. 2011; 40(24): 4372-4386.

22. Friedman. Fast sparse regression and classification. International Journal of Forecasting. 2012; 28(3): 722-738.

23. Coifman, Wickerhauser. Entropy-based algorithms for best basis selection. IEEE Transactions on Information Theory. 1992; 38(2): 713-718.

24. Rao, Kreutz-Delgado. An affine scaling methodology for best basis selection. IEEE Transactions on Signal Processing. 1999; 47(1): 187-200.

25. Holm, Hegardt, Staaf, Vallon-Christersson, Jönsson, Olsson, et al. Molecular subtypes of breast cancer are associated with characteristic DNA methylation patterns. Breast Cancer Research. 2010; 12(3): R36.

26. Richardson, Wang, De Nicolo, Brown, Miron, et al. X chromosomal abnormalities in basal-like human breast cancer. Cancer Cell. 2006; 9(2): 121-132.

27. June. Adoptive T cell therapy for cancer in the clinic. The Journal of Clinical Investigation. 2007; 117(6): 1466-1476.

28. Collado-Hidalgo, Bower, Ganz, Irwin, Cole. Cytokine gene polymorphisms and fatigue in breast cancer survivors: Early findings. Brain, Behavior, and Immunity. 2008; 22(8): 1197-1200.

29. Pedersen, Dervan, Broderick, Harrison, Miller, Delany, et al. Frequent loss of imprinting of PEG1/MEST in invasive breast cancer. Cancer Research. 1999; 59(21): 5449-5451.

30. Di Minin, Bellazzo, Dal Ferro, Chiaruttini, Nuzzo, Bicciato, et al. Mutant p53 reprograms TNF signaling in cancer cells through interaction with the tumor suppressor DAB2IP. Molecular Cell. 2014; 56(5): 617-629.

31. Virmani, Rathi, Sathyanarayana, Padar, Huang, Cunnigham, et al. Aberrant methylation of the adenomatous polyposis coli (APC) gene promoter 1A in breast and lung carcinomas. Clinical Cancer Research. 2001; 7(7): 1998-2004.
