We present new techniques for applying a Bayesian network learning framework to the
problem of classifying gene expression data. The focus on classification permits us to develop
techniques that address, in several ways, the complexities of learning Bayesian networks.
Our classification model reduces the Bayesian network learning problem to the problem of
learning multiple subnetworks, each consisting of a class label node and its set of parent
genes. We argue that this classification model is more appropriate for the gene expression
domain than are other structurally similar Bayesian network classification models, such as
Naive Bayes and Tree Augmented Naive Bayes (TAN), because our model is consistent with
prior domain experience suggesting that a relatively small number of genes, taken in different
combinations, is required to predict most clinical classes of interest.
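To make the model concrete, the following sketch shows one way such a subnetwork
classifier could be realized: a class-label node whose parents are a small set of discretized
genes, classified by the class posterior estimated from a smoothed conditional probability
table. This is our own minimal illustration, not the paper's implementation; the class name,
the smoothing parameter alpha, and the data layout are all assumptions.

```python
from collections import defaultdict

class SubnetworkClassifier:
    """Minimal class-node-plus-parent-genes model (illustrative sketch).

    The class label C has a small set of parent genes; we estimate
    P(C | parent configuration) from discretized expression data with
    a Dirichlet (add-alpha) smoother so that unseen configurations
    never yield zero probabilities.
    """

    def __init__(self, parent_genes, alpha=1.0):
        self.parents = parent_genes          # indices of the parent genes
        self.alpha = alpha                   # pseudocount for smoothing
        self.counts = defaultdict(lambda: defaultdict(float))
        self.classes = set()

    def fit(self, X, y):
        # X: rows of discretized expression values; y: class labels.
        for row, label in zip(X, y):
            config = tuple(row[g] for g in self.parents)
            self.counts[config][label] += 1
            self.classes.add(label)
        return self

    def predict(self, row):
        config = tuple(row[g] for g in self.parents)
        seen = self.counts[config]
        total = sum(seen.values()) + self.alpha * len(self.classes)
        # Posterior over classes given the parent-gene configuration.
        post = {c: (seen[c] + self.alpha) / total for c in self.classes}
        return max(post, key=post.get)
```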
Within this framework, we consider two approaches to identifying parent sets that are supported by
the gene expression observations and any other currently available evidence. One approach
employs a simple greedy algorithm to search the universe of all genes; the second approach
develops and applies a gene selection algorithm whose results are incorporated as a prior
to enable an exhaustive search for parent sets over a restricted universe of genes.
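As an illustration of the first approach, a greedy parent-set search might look like the
sketch below. The scoring function here is a plain smoothed log-likelihood standing in for
the paper's Bayesian score, and the stopping rule and the max_parents cap are assumptions
of ours, not details taken from the paper.

```python
import math
from collections import Counter, defaultdict

def log_score(X, y, parents, alpha=1.0):
    """Smoothed log-likelihood of the class labels given a candidate
    parent set; a stand-in for the paper's Bayesian scoring metric."""
    classes = set(y)
    counts = defaultdict(Counter)
    for row, label in zip(X, y):
        counts[tuple(row[g] for g in parents)][label] += 1
    score = 0.0
    for by_class in counts.values():
        total = sum(by_class.values()) + alpha * len(classes)
        for n in by_class.values():
            score += n * math.log((n + alpha) / total)
    return score

def greedy_parents(X, y, max_parents=3):
    """Greedily add the single gene that most improves the score,
    stopping when no addition helps (assumed stopping rule)."""
    n_genes = len(X[0])
    parents, best = [], log_score(X, y, [])
    while len(parents) < max_parents:
        gains = [(log_score(X, y, parents + [g]), g)
                 for g in range(n_genes) if g not in parents]
        if not gains:
            break
        top, g = max(gains)
        if top <= best:
            break
        parents, best = parents + [g], top
    return parents
```

The second approach replaces the search over all genes with an exhaustive enumeration of
small parent sets drawn from the restricted universe produced by the gene selection step.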
Two other significant contributions are the construction of classifiers from multiple competing
Bayesian network hypotheses and algorithmic methods for normalizing and binning gene
expression data in the absence of prior expert knowledge.
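In the absence of expert knowledge, binning can be driven entirely by the data. The sketch
below uses simple per-gene quantile cut points as a stand-in for the paper's normalization
and binning algorithms, whose actual details are not reproduced here.

```python
def quantile_bins(values, n_bins=3):
    """Cut points that split one gene's expression values into
    (roughly) equal-frequency bins."""
    ordered = sorted(values)
    return [ordered[len(ordered) * k // n_bins] for k in range(1, n_bins)]

def discretize(X, n_bins=3):
    """Replace each continuous expression value with its bin index,
    computing cut points per gene from the data alone."""
    n_genes = len(X[0])
    cuts = [quantile_bins([row[g] for row in X], n_bins)
            for g in range(n_genes)]
    return [[sum(v >= c for c in cuts[g]) for g, v in enumerate(row)]
            for row in X]
```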
Our classifiers are developed under a cross-validation regimen and then validated on corresponding out-of-sample test
sets. The classifiers attain a classification rate in excess of 90% on out-of-sample test sets
for two publicly available datasets. We present an extensive compilation of results reported
in the literature for other classification methods run against these same two datasets. Our
results are comparable to, or better than, any we have found reported for these two datasets
when a train-test protocol as stringent as ours is followed.
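For completeness, a classifier built from multiple competing Bayesian network hypotheses
can be read as a score-weighted vote over the fitted subnetworks, roughly as sketched below.
This reconstruction reuses the hypothetical SubnetworkClassifier from the first sketch; the
paper's actual aggregation rule may differ.

```python
import math

def ensemble_predict(models, scores, row):
    """Score-weighted vote over competing subnetwork hypotheses.

    models: fitted SubnetworkClassifier instances (first sketch)
    scores: their (log-)scores on the training data
    row:    one discretized expression profile
    """
    # Shift by the max log-score so the exponentials stay finite.
    weights = [math.exp(s - max(scores)) for s in scores]
    votes = {}
    for model, w in zip(models, weights):
        label = model.predict(row)
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)
```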