Abstract
Deep Neural Networks (DNNs) have powerful recognition abilities and can classify many kinds of objects. Although DNN models can reach very high accuracy, even beyond the human level, they are regarded as black boxes that lack interpretability. During training, DNNs automatically extract abstract features from high-dimensional data such as images. However, the extracted features are usually mapped into a representation space that is not aligned with human knowledge. In some applications, such as medical diagnosis, interpretability is necessary. To align the representation space with human knowledge, this paper proposes a class of DNNs, termed Conceptual Alignment Deep Neural Networks (CADNNs), which produce interpretable representations in their hidden layers. In CADNNs, some hidden neurons are selected as conceptual neurons and trained to extract human-formed concepts, while the remaining hidden neurons, called free neurons, are trained without constraints. All hidden neurons contribute to the final classification result. Experiments demonstrate that CADNNs match the accuracy of standard DNNs despite the extra constraints on the conceptual neurons, and also reveal that in some cases the free neurons learn concepts aligned with human knowledge.
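The following is a minimal sketch of the architecture described above, assuming PyTorch. The single hidden layer, the sigmoid activation, the binary concept targets, and the trade-off weight lam are illustrative assumptions for this sketch, not the paper's exact formulation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CADNN(nn.Module):
        """Hidden layer split into conceptual and free neurons."""
        def __init__(self, in_dim, n_concept, n_free, n_classes):
            super().__init__()
            self.hidden = nn.Linear(in_dim, n_concept + n_free)
            self.out = nn.Linear(n_concept + n_free, n_classes)
            self.n_concept = n_concept

        def forward(self, x):
            h = torch.sigmoid(self.hidden(x))
            # The first n_concept units are the "conceptual neurons",
            # supervised to match human-labeled concepts; the remaining
            # units are the "free neurons", trained without constraints.
            concepts = h[:, : self.n_concept]
            # All hidden neurons feed the final classifier.
            logits = self.out(h)
            return logits, concepts

    def cadnn_loss(logits, concepts, y, concept_targets, lam=1.0):
        # Classification loss plus an alignment penalty on the
        # conceptual neurons; lam is a hypothetical trade-off weight.
        cls = F.cross_entropy(logits, y)
        align = F.binary_cross_entropy(concepts, concept_targets)
        return cls + lam * align

Under these assumptions, the alignment term only constrains the conceptual slice of the hidden layer, while the classification loss back-propagates through all hidden neurons, so the free neurons remain unconstrained except by the task itself.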
