Abstract

This study evaluates the performance of a set of machine learning techniques in predicting the prognosis of Hodgkin’s lymphoma using clinical factors and gene expression data. Analysed samples from 130 Hodgkin’s lymphoma patients included a small set of clinical variables and more than 54,000 gene features. Machine learning classifiers included three black-box algorithms (k-nearest neighbour, Artificial Neural Network, and Support Vector Machine) and two methods based on intelligible rules (Decision Tree and the innovative Logic Learning Machine method). Support Vector Machine clearly outperformed any of the other methods. Among the two rule-based algorithms, Logic Learning Machine performed better and identified a set of simple intelligible rules based on a combination of clinical variables and gene expressions. Decision Tree identified a non-coding gene (XIST) involved in the early phases of X chromosome inactivation that was overexpressed in females and in non-relapsed patients. XIST expression might be responsible for the better prognosis of female Hodgkin’s lymphoma patients.

Keywords

artificial neural network, cancer prognosis, Decision Tree, Hodgkin’s lymphoma, Logic Learning Machine, Support Vector Machine

Introduction

Hodgkin’s lymphoma (HL) is a haematological malignancy accounting for about 10 per cent of all lymphoma cases in Western countries.^1,2 HL is composed of two distinct disease entities: classical HL, which accounts for about 95 per cent of the whole disease burden and is characterized by the presence of malignant multinucleated giant Reed–Sternberg cells, and nodular lymphocyte predominant HL, characterized by a neoplastic population of larger cells with folded lobulated nuclei.³

In recent decades, advances in radiation treatments and chemotherapy have greatly increased the survival rates of HL patients. Nonetheless, to date, about 5–10 per cent of patients are refractory to initial treatment and 10–30 per cent will relapse despite having achieved an initial complete remission.⁴

IPS (International Prognostic Score) is a prognostic index based on the combination of seven recognized prognostic factors for HL (namely, age ⩾45 years, stage IV, male sex, white blood cell count ⩾15,000 cells/µL, lymphocyte count <600 cells/µL, albumin <4.0 g/dL, haemoglobin <10.5 g/dL).⁵ IPS was demonstrated to be predictive of patient outcome in multivariable analysis. For instance, patients with five or more factors were found to have a 5-year progression-free survival of 42 per cent, while patients with no negative prognostic factors had an 84 per cent probability of being free from progression at 5 years from diagnosis.⁵ However, despite the fairly good performance of IPS, the identification of new prognostic variables for HL patients is highly desirable to potentially increase patient survival and reduce treatment toxicity.⁴ For this purpose, numerous studies have been carried out in the last few years and many new putative prognostic markers have been identified.^6,7 Among such studies, a large microarray experiment identified a set of 271 genes differentially expressed between relapsed and non-relapsed patients.⁸ Furthermore, the same study was able to associate a macrophage gene expression signature with primary treatment failure, although this latter finding was questioned by further investigations.^9,10

This study is aimed at evaluating the performance of a set of supervised machine learning techniques, including the recently proposed Logic Learning Machine (LLM) method, in predicting the prognosis of HL patients using clinical and gene expression data from the large data set by Steidl et al.⁸

Materials and methods

Database description

Data were downloaded from GDS4222.soft, a microarray database stored in the GEO repository¹¹ at http://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS4222. Data included information from 130 samples of classical HL and 54,675 gene expression features.⁸

Table 1 describes patient characteristics available in GDS4222.soft. Clinical and demographic variables included the following: relapse at any time after therapy (n = 38, 29.2%), gender (male: 56.2%; female: 43.8%), stage at diagnosis (Stage I: 12.3%; Stage II: 51.5%; Stage III: 22.3%; Stage IV: 13.8%) and IPS. The latter was aggregated into two categories, according to Steidl et al.⁸: high score, corresponding to IPS > 3 (24.6%), and low score, corresponding to IPS ≤ 3 (75.4%). More details about patient selection, characteristics at diagnosis, assessment of disease status, primary line treatment, and methods for gene expression analysis, including data pre-processing and normalization, have been reported elsewhere.⁸

Table 1.

Demographic and clinical characteristics of 130 patients with Hodgkin’s lymphoma included in the analyses.

Patient characteristics	N	%
Gender
Males	73	56.2
Females	57	43.8
Ann Arbor stage at diagnosis
I	16	12.3
II	67	51.5
III	29	22.3
IV	18	13.8
International Prognostic Score
Low (≤3)	98	75.4
High (>3)	32	24.6
Follow up status after treatment
Relapse	38	29.2
No Relapse	92	70.8

Supervised data mining methods

A set of supervised machine learning techniques was selected in order to predict HL patient prognosis. They included three common methods based on black-box algorithms (k-nearest neighbour classifier, kNN; Artificial Neural Network, ANN; and Support Vector Machine, SVM) and two methods based on intelligible threshold-based rules (Decision Tree, DT, and the innovative LLM method). Standard classification based on the IPS score was also performed. Relapse at any time after therapy was considered as the outcome, while all the available variables were used as input data. Accuracy measures included the total proportion of correctly classified samples (total accuracy) and the proportion of correct classifications among relapsed patients (sensitivity) and among non-relapsed patients (specificity).
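The three accuracy measures can be illustrated with a short Python sketch. The function name and argument names are hypothetical; the counts are those reported for IPS in Table 2 (14 of 38 relapsed and 74 of 92 non-relapsed patients correctly classified).

```python
def accuracy_measures(correct_relapsed, n_relapsed,
                      correct_non_relapsed, n_non_relapsed):
    """Return (total accuracy, sensitivity, specificity) as percentages."""
    # Sensitivity: correct classifications among relapsed patients.
    sensitivity = 100.0 * correct_relapsed / n_relapsed
    # Specificity: correct classifications among non-relapsed patients.
    specificity = 100.0 * correct_non_relapsed / n_non_relapsed
    # Total accuracy: all correct classifications over all patients.
    total = 100.0 * (correct_relapsed + correct_non_relapsed) / (
        n_relapsed + n_non_relapsed)
    return total, sensitivity, specificity


# Counts taken from the IPS row of Table 2.
ips_total, ips_sens, ips_spec = accuracy_measures(14, 38, 74, 92)
```

Rounded to one decimal, the three values reproduce the IPS row of Table 2.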

In order to control the overfitting bias, accuracy estimates of each supervised analysis were obtained by cross-validation. Due to the rather small sample size of our data set, the leave-one-out procedure was adopted.¹² Finally, a comparison between the set of intelligible rules generated by LLM and DT was also performed.
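The leave-one-out procedure can be sketched as follows. This is a minimal illustration, not the Rulex Analytics implementation: `fit` and `predict` are hypothetical callables, and the majority-class classifier and toy data are stand-ins chosen only to keep the example self-contained.

```python
def leave_one_out(samples, labels, fit, predict):
    """Return one held-out prediction per sample."""
    predictions = []
    for i in range(len(samples)):
        # Train on every sample except the i-th one.
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        # Classify the single sample that was left out.
        predictions.append(predict(model, samples[i]))
    return predictions


# Hypothetical stand-in classifier: always predicts the majority training class.
def fit_majority(train_x, train_y):
    return max(set(train_y), key=train_y.count)


def predict_majority(model, x):
    return model


# Toy data (not from GDS4222): one sample of class 0, four of class 1.
toy_x = [[0.1], [0.9], [1.1], [1.0], [0.8]]
toy_y = [0, 1, 1, 1, 1]
loo_predictions = leave_one_out(toy_x, toy_y, fit_majority, predict_majority)
```

Each sample is classified by a model that never saw it during training, so the resulting accuracy estimate is not inflated by fitting and testing on the same patient.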

All the analyses were carried out using Rulex Analytics, a software suite developed and commercialized by Rulex Inc (http://www.rulex-inc.com).

kNN

Consider a training set S including n input–output pairs (x_j, y_j), with j = 1,…, n, where the output value y_j can be one of q possible classes, labelled by an integer A_i with i = 1,…, q. To classify any subject, described by an input vector x, the nearest k samples (with respect to x) in the training set S, according to a suitable distance measure, are considered. Then, the subject x is associated with the class A_i that characterizes the majority of the k-nearest samples.¹³

In the present investigation, the set of values {1, 3, 5} was adopted for k and the standard Euclidean distance was employed, after having normalized the components of the input vector x to reduce the effect of biases possibly caused by unbalanced domain intervals in different input variables.
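A minimal Python sketch of this classifier is given below. The normalization scheme is not specified in the text; min-max rescaling is assumed here for illustration, and the function names and toy values are hypothetical.

```python
import math
from collections import Counter


def min_max_normalize(rows):
    """Rescale every column to [0, 1] so that no variable dominates the distance."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]  # guard constant columns
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


def knn_predict(train_x, train_y, x, k):
    """Majority class among the k training samples closest to x (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda j: math.dist(train_x[j], x))
    votes = Counter(train_y[j] for j in order[:k])
    return votes.most_common(1)[0][0]


# Toy data: two variables on very different scales; the last row is the query.
rows = [[1.0, 100.0], [2.0, 110.0], [8.0, 900.0], [9.0, 950.0], [1.5, 120.0]]
scaled = min_max_normalize(rows)
train_x, query = scaled[:4], scaled[4]
train_y = [0, 0, 1, 1]
pred_k1 = knn_predict(train_x, train_y, query, k=1)
pred_k3 = knn_predict(train_x, train_y, query, k=3)
```

Odd values of k, such as the set {1, 3, 5} used in this study, avoid tied votes in a two-class problem.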

ANN

ANN is a connectionist model formed by the interconnection of simple units, called neurons, arranged in layers. The first layer receives the input vector x , whereas the remaining layers receive their inputs from the previous one. Each neuron computes a weighted sum of the inputs and applies a proper activation function to obtain the output value that will be propagated to the following layer. The last layer produces the output class y to be assigned to x . Weights for each neuron form the set of parameters for the ANN and are estimated by suitable optimization techniques.¹³

In this study, one intermediate layer was used, and the number of hidden neurons was allowed to vary from one to three. The nets were trained by means of the Levenberg–Marquardt version of the back propagation algorithm.¹³
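The forward computation of such a network can be sketched as follows. The activation functions are not stated in the text, so a hyperbolic tangent for the hidden neurons and a logistic output are assumed here; the weights are arbitrary illustrative values, and the Levenberg–Marquardt training step is not shown.

```python
import math


def ann_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Output of a network with one hidden layer and a single output neuron."""
    # Each hidden neuron: weighted sum of the inputs, then tanh (assumed).
    hidden = [math.tanh(sum(w * v for w, v in zip(ws, x)) + b)
              for ws, b in zip(hidden_weights, hidden_biases)]
    # Output neuron: weighted sum of hidden outputs, then logistic (assumed).
    z = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return 1.0 / (1.0 + math.exp(-z))


# Two inputs, two hidden neurons, arbitrary weights.
score = ann_forward(
    x=[0.5, -1.0],
    hidden_weights=[[1.0, 0.0], [0.0, 1.0]],
    hidden_biases=[0.0, 0.0],
    output_weights=[1.0, 1.0],
    output_bias=0.0,
)
predicted_class = 1 if score >= 0.5 else 0
```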

SVM

SVM is a non-probabilistic binary linear classifier based on the identification of an optimal hyperplane of separation between two classes. Given a training set, the classifier selects a subset of l input vectors x_j in the training set S, called support vectors, and their corresponding outputs y_j ∈ {−1, 1}. The class y for any input vector x is given by

y = \operatorname{sgn} \left( \sum_{j = 1}^{l} y_{j} \alpha_{j} K (x_{j}, x) + b \right)

where the coefficients α_j and the offset b are evaluated by the training algorithm.

K(·,·) is a kernel function used to perform a non-linear classification by constructing an optimal hyperplane in a high-dimensional projected space. Both a linear and a radial basis kernel function were tested on the GDS4222.soft data set. As shown in the Results section, in this case the linear kernel (which produces a linear classification) proved more robust with respect to overfitting. This is because the classification problem is unbalanced (38 patients relapsed, while 92 did not) and, moreover, the number of input attributes far exceeds the number of training samples. Training was performed using the LIBSVM library, which is included in the Rulex Analytics software.
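The decision function can be sketched in Python as follows. This only evaluates the formula for given support vectors and coefficients; it does not reproduce LIBSVM training. The value of gamma, the toy support vectors and the convention of assigning a zero score to class +1 are assumptions made for illustration.

```python
import math


def linear_kernel(u, v):
    """K(u, v) = u . v"""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma=0.5):
    """K(u, v) = exp(-gamma * ||u - v||^2); gamma is an assumed value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svm_decide(x, support_vectors, ys, alphas, b, kernel):
    """Sign of sum_j y_j * alpha_j * K(x_j, x) + b."""
    score = sum(y * a * kernel(sv, x)
                for sv, y, a in zip(support_vectors, ys, alphas)) + b
    return 1 if score >= 0 else -1


# Toy model with two support vectors (illustrative values only).
svs = [[1.0, 1.0], [-1.0, -1.0]]
ys = [1, -1]
alphas = [0.5, 0.5]
label_linear_a = svm_decide([2.0, 0.5], svs, ys, alphas, 0.0, linear_kernel)
label_linear_b = svm_decide([-1.0, -2.0], svs, ys, alphas, 0.0, linear_kernel)
label_rbf_a = svm_decide([2.0, 0.5], svs, ys, alphas, 0.0, rbf_kernel)
```

Only the kernel changes between the linear and the radial basis variants; the support vectors, coefficients and offset enter the formula in the same way.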

DT

A DT is a graph where each node is associated with a condition based on an attribute of the input vector x (e.g. x_i > 5) and each leaf corresponds to an assignment for a specified output class. By following the path from the root to a leaf, a simple intelligible rule can be easily identified.¹³ DT is generated by adopting a ‘divide-and-conquer’ approach that provides disjoint rules. At each iteration, a new node is added to the DT by choosing the condition that best subdivides the training set S according to a specific measure of goodness.

In the present investigation, the information gain I_G (also called ‘the smallest maximum entropy’) was employed as goodness indicator function. In more detail, given a set Q and a partition in q subsets Q₁,…, Q_q, the information gain of Q with respect to the partition \{Q_{j}\} is defined by

I_{G} (Q, Q_{j}) = - \sum_{j = 1}^{q} \frac{| Q_{j} |}{| Q |} \log_{2} \frac{| Q_{j} |}{| Q |}

where | . | indicates the number of elements in a set.

In our study, q = 2: the index j = 1 identifies the subset of non-relapsed patients and j = 2 the subset of relapsed ones.
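As a worked example, the expression above can be evaluated for the partition of the 130 patients into 92 non-relapsed and 38 relapsed. The function name is hypothetical; the computation follows the formula exactly as written, which has the form of a Shannon entropy in bits.

```python
import math


def partition_measure(subset_sizes):
    """-sum_j (|Q_j| / |Q|) * log2(|Q_j| / |Q|) for subsets of the given sizes."""
    total = sum(subset_sizes)
    return -sum((n / total) * math.log2(n / total)
                for n in subset_sizes if n > 0)


# 92 non-relapsed and 38 relapsed patients (Table 1).
whole_set_value = partition_measure([92, 38])
# A perfectly balanced two-class partition gives the maximum of 1 bit.
balanced_value = partition_measure([65, 65])
```

For the 92/38 split the value is about 0.87, slightly below the maximum of 1 reached by a balanced partition.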

Finally, the pessimistic error pruning technique was adopted to reduce the complexity of the final DT and to increase its generalization ability. Briefly, let p be the error rate associated with a node s in a DT; all nodes and leaves below s are erased if the error p_{-s} associated with the node immediately below s exceeds the following quantity

p + 1.96 \sqrt{\frac{p (1 - p)}{n}}

where n represents the number of samples to be classified at the node s.¹⁴
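The pruning criterion can be illustrated numerically. The sketch below assumes that the threshold is the upper bound p + 1.96·sqrt(p(1 − p)/n); the function names and the example values are hypothetical.

```python
import math


def pruning_threshold(p, n, z=1.96):
    """Upper bound p + z * sqrt(p * (1 - p) / n) on the error rate at a node."""
    return p + z * math.sqrt(p * (1.0 - p) / n)


def should_prune(p, n, p_below):
    """True if the error of the node below exceeds the threshold at the node."""
    return p_below > pruning_threshold(p, n)


# Example: a node with error rate 0.20 estimated on 50 samples.
threshold = pruning_threshold(0.20, 50)
prune_high = should_prune(0.20, 50, p_below=0.35)
prune_low = should_prune(0.20, 50, p_below=0.25)
```

With p = 0.20 and n = 50 the threshold is about 0.31, so a subtree with error 0.35 would be pruned, while one with error 0.25 would be kept.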

LLM

LLM is an innovative method of supervised analysis based on an efficient implementation of the Switching Neural Network model,^15,16 which is associated with a classifier g( x ), described by a set of intelligible rules of the following type: if ‹premise› then ‹consequence›. The ‹premise› statement represents a logical product (AND) of conditions on the components of the input vector x and ‹consequence› provides a class assignment for the output y.
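The structure of such rules can be sketched in Python. The representation below (a list of conditions combined by AND, followed by a consequence) is only an illustration and not the Rulex data model; the variable name `gene_a`, the thresholds and the first-match policy for choosing among rules are assumptions.

```python
import operator

# Supported comparison operators for a single condition.
OPS = {
    ">": operator.gt,
    "<=": operator.le,
    "in": lambda value, allowed: value in allowed,
}


def premise_holds(conditions, patient):
    """Logical product (AND) of all conditions on the patient's variables."""
    return all(OPS[op](patient[name], ref) for name, op, ref in conditions)


def classify(rules, patient, default=None):
    """Return the consequence of the first rule whose premise holds (assumed policy)."""
    for conditions, consequence in rules:
        if premise_holds(conditions, patient):
            return consequence
    return default


# Hypothetical rules: if <premise> then <consequence>.
rules = [
    ([("stage", "in", {1, 2}), ("gene_a", ">", 2.0)], "no relapse"),
    ([("gene_a", "<=", 2.0)], "relapse"),
]
out_1 = classify(rules, {"stage": 2, "gene_a": 3.1})
out_2 = classify(rules, {"stage": 4, "gene_a": 1.0})
out_3 = classify(rules, {"stage": 4, "gene_a": 3.0})
```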

The general procedure employed to train an LLM passes through the following steps:

1. Discretization. Continuous and integer variables are properly discretized to reduce their variability, thus increasing the efficiency of the training algorithm and the accuracy of the resulting set of rules.

2. Binarization. Nominal and (discretized) ordered variables are coded into binary strings by adopting a suitable mapping that preserves ordering and distances.

3. Logic synthesis. Starting from the binarized version of the training set S, which can be viewed as a portion of a truth table, reconstruct the and–or expression of a consistent monotone Boolean function.

4. Rule generation. Transform every logical product of the and–or expression into an intelligible rule.

A valid and efficient way of performing Step 1 consists in adopting the attribute-driven incremental discretization (ADID),^17,18 which reduces the complexity of the input vector x while preserving the information included in the training set S concerning class discrimination. For each continuous or discrete input attribute, ADID is able to find a collection of separating points that lower its variability while maintaining its classification power. The core of ADID consists of an incremental algorithm that adds iteratively the cut-off scoring the highest value of a proper quality measure based on the capability of separating patterns of different classes. Smart updating procedures enable ADID to efficiently get an optimal discretization. Usually, ADID produces a minimal set of cut-offs for separating all the patterns belonging to different classes.¹⁶

Then, the (inverse) only-one coding¹⁵ is adopted at Step 2 to transform the training set S into a collection of binary strings that can be viewed as a portion of the truth table of a monotone Boolean function. Here, for each (binarized version of a) pattern x in S, the output is the class y, possibly coded in binary form if there are more than two classes.

To ensure a good generalization ability, the logic synthesis (Step 3) is performed via an optimized version of the Shadow Clustering (SC) algorithm,¹⁶ a proper technique for reconstructing monotone Boolean functions starting from a partially defined truth table. In contrast with methods based on a divide-and-conquer approach, SC adopts an aggregative policy, that is, at any iteration some patterns (coded in binary form) belonging to the same output class are clustered to produce an intelligible rule. A suitable heuristic approach is employed to generate implicants (rules) exhibiting the highest covering and the lowest error; a trade-off between these two different objectives generally leads to final models showing a good accuracy.

The training algorithm for LLM requires the definition of a single parameter ϵ, the maximum error that can be scored by each generated rule. In all our trials, we used the value ϵ = 0.

Results

Table 2 summarizes the performance of standard clinical classification, based on the IPS index, and that of the selected supervised methods in leave-one-out cross-validation. IPS correctly classified 68 per cent of total patients, with 37 per cent sensitivity and 80 per cent specificity.

Table 2.

Comparison between standard clinical classification by IPS score and the selected methods of supervised analysis in leave-one-out cross-validation.

Classification method	Global accuracy		Sensitivity		Specificity
Classification method	N	%	N	%	N	%
Standard clinical classification
IPS	88	67.7	14	36.8	74	80.4
Black-box methods
kNN
k = 1	97	74.6	17	44.7	80	87.0
k = 3	84	64.6	7	18.4	77	83.7
k = 5	90	69.2	6	15.8	84	91.3
ANN
One hidden neuron	92	70.8	17	44.7	75	81.5
Two hidden neurons	93	71.5	17	44.7	76	82.6
Three hidden neurons	91	70.0	13	34.2	78	84.8
SVM
RBF kernel	92	70.8	0	0.0	92	100
Linear kernel	106	81.5	21	55.3	85	92.4
Rule-based methods
DT	85	65.4	18	47.4	67	72.8
LLM	91	70.0	17	44.7	74	80.4

kNN: k-Nearest Neighbour classifier; ANN: Artificial Neural Network; SVM: Support Vector Machine; RBF: Radial Basis Function; LLM: Logic Learning Machine; DT: Decision Tree; IPS: International Prognostic Score; N: number of patients correctly classified.

Among the three black-box methods, the best performance was achieved by SVM with linear kernel (global accuracy = 82%, sensitivity = 55%, specificity = 92%). kNN with k = 1 also outperformed the standard clinical classification (global accuracy = 75%, sensitivity = 45%, specificity = 87%), whereas models with higher k values showed a poor performance and, in particular, a very low sensitivity. With regard to ANN, the model with two hidden neurons showed the highest performance, which lay between that of IPS and that of kNN with k = 1 (global accuracy = 72%, sensitivity = 45%, specificity = 83%).

Of the two rule-based methods considered, LLM showed the better performance (global accuracy = 70% vs 65% for DT), even though its sensitivity was slightly lower (45% vs 47%).

When the analysis was repeated on the whole data set, LLM selected 25 rules that included a minimum of two and a maximum of six conditions; the corresponding covering ranged between 2.2 and 53.3 per cent.

Table 3 shows the rules generated by LLM after the exclusion of those with a low coverage (<20%). This restriction was made in order to reduce the effect of outliers, thus allowing a more reliable comparison with DT after the pruning procedure.

Table 3.

Classification rules identified by the Logic Learning Machine on the whole data set.

No.	Relapse	Condition 1	Condition 2	Condition 3	Condition 4	Covering %
1	No	Stage 1 or 2	MS4A3 > 1.729	MUC5AC > 2.701	–	53.3
2	No	Stage 3	MS4A3 > 1.729	RPS8 > 11.432	–	51.1
3	No	Stage 2 or 4	RPS8 > 11.432	MUC5AC > 2.701	–	40.2
4	No	Female gender	MS4A3 > 1.729	–	–	38.0
5	No	Stage 1 or 2	RPS8 > 11.432	DMD > 1.989	–	34.8
6	No	MS4A3 > 1.729	DMD > 1.989	MUC5AC > 2.701	Low IPS	34.8
7	No	Male gender	Stage 2	MUC5AC > 2.701	–	29.3
8	No	Female gender	Stage 3	DMD > 1.989	–	27.2
9	No	Male gender	Stage 3	RPS8 > 11.432	MUC5AC > 2.701	26.1
10	Yes	Male gender	Stage 3 or 4	MUC5AC ≤ 2.701	–	26.3
11	Yes	Stage 2 or 4	MS4A3 ≤ 1.729	MUC5AC ≤ 2.701	–	26.3
12	Yes	Male gender	Stage 1 or 4	RPS8 ≤ 11.432	–	21.1
13	Yes	Stage 3	RPS8 ≤ 11.432	DMD ≤ 1.989	MUC5AC ≤ 2.701	21.1

IPS: International Prognostic Score.

The 13 out of 25 rules with at least 20 per cent of covering are shown.
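The covering of a rule can be computed as in the following sketch. Covering is not formally defined in the text; it is assumed here to be the percentage of patients in the rule's output class who satisfy its premise, which is consistent with values such as 26.3 per cent (10 of 38 relapsed patients) in Table 3. The patient records are invented for illustration and are not taken from GDS4222.

```python
def covering(premise, patients, outcomes, target):
    """Percentage of patients with outcome `target` that satisfy the premise."""
    in_class = [p for p, y in zip(patients, outcomes) if y == target]
    covered = sum(1 for p in in_class if premise(p))
    return 100.0 * covered / len(in_class)


# Rule 4 of Table 3: female gender AND MS4A3 > 1.729 -> no relapse.
def rule_4(patient):
    return patient["gender"] == "F" and patient["MS4A3"] > 1.729


# Hypothetical patients (illustrative values only).
patients = [
    {"gender": "F", "MS4A3": 2.10},
    {"gender": "F", "MS4A3": 1.90},
    {"gender": "M", "MS4A3": 2.40},
    {"gender": "F", "MS4A3": 1.20},
    {"gender": "M", "MS4A3": 1.00},
]
outcomes = ["no relapse", "no relapse", "no relapse", "no relapse", "relapse"]
rule_4_covering = covering(rule_4, patients, outcomes, target="no relapse")
```

In this toy sample two of the four non-relapsed patients satisfy rule 4, giving a covering of 50 per cent.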

All the LLM rules included at least one clinical or demographic characteristic of patients. On the whole, LLM identified four features relevant for classification, all inversely associated with the occurrence of relapse (namely, MS4A3, RPS8, DMD and MUC5AC). With regard to clinical conditions, advanced stages (3 and 4) were more often associated with relapse, but with some exceptions (e.g. rule 2, condition 1). IPS was included in only one rule (no. 6), and as expected, a low value corresponded to the absence of relapse. Finally, gender was included in 6 out of 13 rules. Among the four rules identifying relapsed patients, two included males (no. 10 and no. 12, respectively, condition 1), whereas females were never selected.

Figure 1 shows the classifier obtained by DT. Classification was performed by seven rules that involved gene expression only (namely, XIST, EPOR, GPR82, AV719529 and KIAA1430). A prediction of relapse was associated with low values of XIST and GPR82 and high values of AV719529 and KIAA1430.

Figure 1.

Classification obtained by DT on the whole data set. Percentages indicate the covering of each rule.

Discussion

Despite advances in therapeutic treatment, about 20 per cent of HL patients eventually die, whereas a similar proportion is likely to be over-treated.⁸ The large availability of new potential tumour markers for HL prognosis, including genome-wide gene expression data, might contribute to the improvement of the performance of IPS in predicting patient survival.^7,8,19

Many supervised methods of data analysis are available to exploit and combine information from new tumour markers and clinical prognostic factors. In particular, ANN, kNN and the more recent SVM have shown a high accuracy in predicting survival of cancer patients when applied to gene expression data in many different clinical settings.^20–26 However, such algorithms are usually referred to as ‘black-box’ methods since classification is made through a mathematical formula that makes it difficult to evaluate the biological and clinical role of variables included in the analysis. Conversely, algorithms based on intelligible threshold rules, like DT and the recently proposed LLM, can provide useful information for a better understanding of tumour biology and for addressing therapeutic approaches.^18,23

The good performance of LLM compared to that of common supervised techniques was demonstrated in a set of biomedical studies.^18,27,28 However, unlike DT,^20–29 LLM has never been applied for classification purposes to large databases of highly correlated features, such as microarray gene expression data. In this study, in agreement with results from previous investigations, LLM showed a performance quite similar to that of some common competing black-box methods (ANN and kNN), but lower than that of SVM.

LLM outperformed DT and was able to combine information from clinical variables with expression values from a small panel of selected genes. In particular, stage and gender were in some cases associated in the same rule, but never associated with IPS (Table 3). Since IPS is constructed using clinical variables that also include stage and gender,⁵ this finding suggests that LLM tends to reject redundant information. Furthermore, a low IPS score, a low stage at diagnosis and female gender were more often associated with a good prognosis, in agreement with knowledge from previous investigations.⁵

Taken together, these results suggest that the combination of clinical data and gene expression features could provide useful information for assessing the prognosis of HL patients. This observation is in agreement with previous studies on different malignancies, indicating that clinical information can enrich microarray data in identifying a suitable classifier for the prediction of cancer survivability.^23,30,31

The genes selected by LLM were all different from those identified by DT, and they also differed from the 30 most relevant features identified by the original analysis. However, MUC5AC and EPOR were also included in the complete list of differentially expressed genes reported by Steidl et al.⁸ The four genes identified by LLM were all under-expressed in relapsed patients. MS4A3 (membrane-spanning 4-domains subfamily A member 3) is localized in 11q12 and encodes a membrane protein probably involved in signal transduction.³² Interestingly, MS4A3 belongs to the same membrane-spanning 4-domains gene subfamily as MS4A4, which was recognized to be associated with HL prognosis in previous investigations.³³ RPS8 is localized in 1p34.1-p32 and encodes a ribosomal protein that is a component of the 40S subunit.³⁴ DMD (dystrophin) is located at Xp21.2 and is a highly complex gene, containing at least eight independent, tissue-specific promoters and two polyA-addition sites.³⁵ Finally, MUC5AC is located in 11p15.5³⁶ and encodes a protein (mucin) involved in secretion of gastrointestinal mucosa.

With regard to DT, genes with a known function (http://www.ncbi.nlm.nih.gov/gene) include the following: XIST (X inactive specific transcript), which is a non-coding gene located in Xq13.2, involved in the inactivation of the X chromosome in human females,³⁷ and EPOR, located in 19p13.3-p13.2, which encodes an erythropoietin receptor.³⁸ Moreover, GPR82, localized in Xp11.4, encodes a protein of unknown function that is a member of a family of proteins containing seven transmembrane domains and transducing extracellular signals through heterotrimeric G proteins.³⁹ Interestingly, and partly consistent with our observation of a higher relapse probability among subjects with low XIST expression, XIST was demonstrated to activate apoptosis in T lymphoma cells via ectopic inactivation of the X chromosome.⁴⁰ In our data, XIST was strongly overexpressed among females (data not shown), thus potentially providing a new insight into the biological mechanism underlying the better prognosis commonly observed among females.

The results of our study are subject to some limitations. In particular, we selected the GDS4222 data set because, at least to our knowledge, it was among the biggest publicly available gene expression databases including information about prognosis of HL patients. However, as a whole, its sample size (130 patients, including 38 relapsed) was too small to allow definitive conclusions to be drawn, and all findings reported in our study need confirmation by other independent investigations. Furthermore, the sensitivity of every applied method (including SVM) was unsatisfactory (<60%). In the presence of unbalanced outcomes, as in our study, rules extracted from LLM can be weighted to improve their accuracy.¹⁷ Exploiting this property, we tried to reclassify a posteriori the patients under study by assigning a 1:10,000 weight in favour of relapsed outcome, but even in this further analysis sensitivity never reached 60 per cent (data not shown), indicating that the limit of 60 per cent for sensitivity is difficult to exceed for any of the considered methods. The lack of potentially relevant clinical information (e.g. absolute lymphocyte count, age at diagnosis and first line treatment) and the poor measure of the outcome, which did not include time-to-event values, could have contributed to lowering the sensitivity of our study. Moreover, we performed all the analyses without applying any pre-filtering technique to the data under study. Previous investigations have demonstrated that the performance of supervised methods can be enhanced by applying pre-filtering and feature selection methods, which can reduce overfitting.^41–43 Their effect on LLM classification has not been investigated yet.

Conclusion

By combining clinical information and gene expression data, LLM provided simple intelligible rules that could contribute to the knowledge of HL biology and help to guide therapeutic approaches.

The role of genes identified by both LLM and DT in the clinical course of HL patients should be investigated in further studies. In particular, the higher expression of XIST in patients with a good outcome and among females might be related to the still unknown factors favouring the better prognosis of female patients with HL.

Footnotes

Acknowledgements

Stefano Parodi is a research fellow of the Italian MIUR Flagship project ‘InterOmics’.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship and/or publication of this article.

References

1. Parodi, Stagnaro. Hodgkin’s Disease Worldwide – Incidence, Mortality, Survival, Prevalence and Time Trend. New York: Nova Science Publisher, 2009, pp. 1–8.
2. Banerjee. Recent advances in the pathobiology of Hodgkin’s Lymphoma: potential impact on diagnostic, predictive, and therapeutic strategies. Adv Hematol 2011; 2011: 439456.
3. Swerdlow, Campo, Harris, et al. WHO Classification of Tumours of Haematopoietic and Lymphoid Tissues. 4th ed. Lyon: IARC Press, 2008.
4. Ansell. Hodgkin Lymphoma: 2014 update on diagnosis, risk-stratification, and management. Am J Hematol 2014; 89: 771–779.
5. Hasenclever, Diehl. A prognostic score for advanced Hodgkin’s disease: international prognostic factors project on advanced Hodgkin’s disease. New Engl J Med 1998; 339: 1506–1514.
6. King, Howard, Bagg. Hodgkin Lymphoma: pathology, pathogenesis, and a plethora of potential prognostic predictors. Adv Anat Pathol 2014; 21: 12–25.
7. Cuccaro, Bartolomei, Cupelli, et al. Prognostic factors in Hodgkin lymphoma. Mediterr J Hematol Infect Dis 2014; 6: e2014053.
8. Steidl, Lee, Shah, et al. Tumor-associated macrophages and survival in classic Hodgkin’s lymphoma. New Engl J Med 2010; 362: 875–885.
9. Azambuja, Natkunam, Biasoli, et al. Lack of association of tumor-associated macrophages with clinical outcome in patients with classical Hodgkin’s lymphoma. Ann Oncol 2012; 23: 736–742.
10. Sánchez-Espiridión, Martin-Moreno, Montalbán, et al. Immunohistochemical markers for tumor associated macrophages and survival in advanced classical Hodgkin’s lymphoma. Haematologica 2012; 97: 1080–1084.
11. Edgar, Domrachev, Lash. Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 2002; 30: 207–210.
12. Dudoit, Fridlyand. Classification in microarray experiments. In: Speed (ed.) Statistical analysis of gene expression microarray data. Boca Raton, FL: Chapman & Hall, 2003, pp. 93–158.
13. Michie, Spiegelhalter, Taylor. Machine learning: neural and statistical classification. Chichester: Ellis Horwood, 1999.
14. Quinlan. C4.5 programs for machine learning. San Francisco, CA: Morgan Kaufmann Publishers, 1992.
15. Muselli. Switching neural networks: a new connectionist model for classification. In: Apolloni, Marinaro, Nicosia, et al. (eds) Lecture notes in computer science (vol. 3931). Berlin: Springer-Verlag, 2006, pp. 23–30.
16. Muselli, Ferrari. Coupling logical analysis of data and shadow clustering for partially defined positive Boolean function reconstruction. IEEE T Knowl Data En 2011; 23: 37–50.
17. Ferrari, Muselli. Maximizing pattern separation in discretizing continuous features for classification purposes. In: Proceeding of the 2010 international joint conference on neural networks (IJCNN), Barcelona, 18–23 July 2010.
18. Cangelosi, Muselli, Parodi, et al. Use of attribute driven incremental discretization and logic learning machine to build a prognostic classifier for neuroblastoma patients. BMC Bioinformatics 2014; 15(Suppl. 5): S4.
19. Montalbán, García, Abraira, et al. Influence of biologic markers on the outcome of Hodgkin’s lymphoma: a study by the Spanish Hodgkin’s lymphoma study group. J Clin Oncol 2004; 22: 1664–1673.
20. Chen, et al. A five-gene signature and clinical outcome in non-small-cell lung cancer. New Engl J Med 2007; 356: 11–20.
21. Chen, Chiu. Risk classification of cancer survival using ANN with gene expression data from multiple laboratories. Comput Biol Med 2014; 48: 1–7.
22. Shi, Beauchamp, Zhang. A network-based gene expression signature informs prognosis and treatment for colorectal cancer patients. PLoS One 2012; 7: e41292.
23. Cruz, Wishart. Applications of machine learning in cancer prediction and prognosis. Cancer Inf 2007; 2: 59–77.
24. Sørlie, Perou, Fan, et al. Gene expression profiles do not consistently predict the clinical treatment response in locally advanced breast cancer. Mol Cancer Ther 2006; 5: 2914–2918.
25. Lisboa, Taktak. The use of artificial neural networks in decision support in cancer: a systematic review. Neural Netw 2006; 19: 408–415.
26. Barrier, Lemoine, Boelle, et al. Colon cancer prognosis prediction by gene expression profiling. Oncogene 2005; 24: 6155–6164.
27. Muselli, Costacurta, Ruffino. Evaluating switching neural networks through artificial and real gene expression data. Artif Intell Med 2009; 45: 163–171.
28. Muselli. Extracting knowledge from biomedical data through Logic Learning Machines and Rulex. EMBnet J 2012; 18: 56–58.
29. Irshad, Bansal, Castillo-Martin, et al. A molecular signature predictive of indolent prostate cancer. Sci Transl Med 2013; 5: 202ra122.
30. Futschik, Reeve, Kasabov. Evolving connectionist systems for knowledge discovery from gene expression data of cancer tissue. Artif Intell Med 2003; 28: 165–189.
31. Miyake, Fujisawa. Prognostic prediction following radical prostatectomy for prostate cancer using conventional as well as molecular biological approaches. Int J Urol 2013; 20: 301–311.
32. Adra, Lelias, Kobayashi, et al. Cloning of the cDNA for a hematopoietic cell-specific protein related to CD20 and the beta subunit of the high-affinity IgE receptor: evidence for a family of proteins with four membrane-spanning regions. Proc Natl Acad Sci U S A 1994; 91: 10178–10182.
33. Steidl, Connors, Gascoyne. Molecular pathogenesis of Hodgkin’s lymphoma: increasing evidence of the importance of the microenvironment. J Clin Oncol 2011; 29: 1812–1826.
34. Davies, Fried. The structure of the human intron-containing S8 ribosomal protein gene and determination of its chromosomal location at 1p32-p34.1. Genomics 1993; 15: 68–75.
35. Zimowski, Fidziańska, Holding, et al. Two mutations in one dystrophin gene. Neurol Neurochir Pol 2013; 47: 131–137.
36. Guyonnet Duperat, Audie, Debailleul, et al. Characterization of the human mucin gene MUC5AC: a consensus cysteine-rich domain for 11p15 mucin genes? Biochem J 1995; 305: 211–219.
37. Weakley, Wang, Yao, et al. Expression and function of a large non-coding RNA gene XIST in human cancer. World J Surg 2011; 35: 1751–1756.
38. Lisowska, Frackowiak, Mikosik, et al. Changes in the expression of transcription factors involved in modulating the expression of EPO-R in activated human CD4-positive lymphocytes. PLoS One 2013; 8: e60326.
39. Lee, Nguyen, Lynch, et al. Discovery and mapping of ten novel G protein-coupled receptor genes. Gene 2011; 275: 83–91.
40. Agrelo, Souabni, Novatchkova, et al. SATB1 defines the developmental context for gene silencing by Xist in lymphoma and embryonic cells. Dev Cell 2009; 16: 507–516.
41. Bala, Huang, Vafaie, et al. Hybrid learning using genetic algorithms and decision trees for pattern classification. In: Proceedings of the 14th international joint conference on Artificial intelligence, Montreal, WI, 20 August 1995, pp. 719–724. San Francisco, CA: Morgan Kaufmann Publishers.
42. Hsu. Genetic wrappers for feature selection in decision tree induction and variable ordering in Bayesian network structure learning. Inform Sciences 2004; 163: 103–122.
43. Hajiloo, Rabiee, Anooshahpour. Fuzzy support vector machine: an efficient rule-based classification technique for microarrays. BMC Bioinformatics 2013; 14(Suppl. 13): S4.

Logic Learning Machine and standard supervised methods for Hodgkin’s lymphoma prognosis using gene expression data and clinical variables
