Abstract
This position paper discusses relationships among hybrid neural-symbolic models, dual-process theories, and cognitive architectures. It provides some historical background and argues that dual-process (implicit versus explicit) theories have significant implications for developing neural-symbolic (neurosymbolic) models. Furthermore, computational cognitive architectures can help to disentangle issues concerning dual-process theories and thus help the development of neural-symbolic models (in this way as well as in other ways).
Introduction
In this relatively short position paper, I will briefly discuss hybrid neural-symbolic models, dual-process theories, and cognitive architectures, as well as the mutually beneficial relationships among them and their relevance to each other. I will first provide some historical background. Based on that, I will argue that dual-process (implicit versus explicit) theories have significant implications for developing neural-symbolic (or neurosymbolic) models. Furthermore, through computational means, cognitive architectures can help to disentangle complex issues concerning dual-process theories and thus help with the further development of hybrid neural-symbolic models (in this as well as other ways). The present position paper inevitably reflects, of course, a personal perspective resulting from my own experiences, and is not meant to be comprehensive or complete (nor technically detailed).
Neural networks and deep learning are good at many things, but they have shortcomings and limitations, despite their popularity, as has long been known (e.g., difficulties with certain types of extrapolation, generalization, abstraction, explanation, and so on; see, e.g., very early arguments at the beginning of “new connectionism”, such as Fodor and Pylyshyn [11]; Marcus [27]; Pinker and Prince [33]). But, as many have noticed, neural networks and symbolic models have some complementary strengths. Thus, combining them in some way naturally seems a good idea, and the resulting models can be advantageous – for instance, they can be more expressive, more versatile, or otherwise more capable. Hence the idea of hybrid neural-symbolic (or neurosymbolic) models was in place, transcending many heated and conflicting claims about the advantages and disadvantages of each paradigm (Sun [40,45]).
This idea immediately harkens back to the 1990s, when hybrid models were first actively explored.
Despite the fact that this important (or even crucial) idea was very much ignored or downplayed early on by the mainstreams on both the neural and the symbolic side, by now, even some strong advocates for symbolic AI have suggested that hybrid neural-symbolic models should indeed be pursued (e.g., Kautz [21]). At the same time, some strong advocates for neural networks and deep learning have also started to see that (e.g., LeCun [24]). When people from the two opposing camps converge onto the same idea (along with people who have always been on that ground somewhere in the middle), the likelihood of this idea leading to major advances has just skyrocketed. (Some of them even invoked good old dual-process theories in support of their arguments; more on this later.) There have also been quite a number of recent events promoting this theme (which I shall not enumerate, for this would take far too long), including, in particular, the launch of the present journal.
However, there have been many possible ways of structuring such models; that is, there exist many different ways of combining symbolic processing and neural networks (Sun [40]). Many of them, however, appear ad hoc or task specific, or otherwise problematic. Therefore, a major question that immediately comes to mind in regard to such hybrid models is: how should such models be structured and implemented in a principled way?
Some early history (or what you might call pre-history)
Some early work on neural-symbolic models is worth mentioning here, due to its close relevance to what we have discussed thus far. Even during the very early days of such models (e.g., in the 1990s), there was some important work being done. One may choose to view such early work as the pre-history of today’s neural-symbolic models (especially when they involve complex deep neural networks that were not yet fully developed or popularized back then), or may simply view it as the early history of this evolving field, thus conceiving of a more continuous process of development and progression. In this regard, it should be noted that in the 1990s, there were some edited volumes (books) with contributions from many authors, which meant that they were likely representative of the state of the art during that period and contained many different models and ideas, some of which may still be relevant and useful today. Therefore, we shall have a look.
One of these books was Sun and Bookman [50] (which, by the way, happened to be the first such volume on the topic of neural-symbolic models). One of the contributions of this book (and some other related ones) was dealing with the question of how such hybrid models could be structured, laying out general categories of possible architectures.
For example, some general categories discussed in Sun and Bookman [50] included: (1) designing specialized, structured, localist neural networks for symbolic processing (e.g., some forms of rule-based reasoning); (2) embedding symbolic processing within usual neural networks with distributed representations; (3) having separate neural and symbolic processing modules but connecting them; (4) using neural networks as elements (modules) in a structured, symbolic architecture (see Sun and Bookman [50] for details of these categories). On the other hand, from the perspective of a whole system, (1) a system could be a single-module system, in which (1.1) representations can be symbolic, localist, or distributed, and (1.2) mappings between symbols and representations could be direct, translational, or transformational (fully utilizing complex neural dynamics); or (2) it could be a multi-module system, which (2.1) could be either homogeneous or heterogeneous in terms of representation or structure and (2.2) could vary in terms of granularity of modules, and (2.3) the relationship between modules could vary, for example, either loosely or tightly coupled. There can also be finer distinctions and variations (Sun and Alexandre [49]). Symbolic reasoning, logic, and probabilistic reasoning could have various neural implementations or approximations. These ideas represented the state of the art in the 1990s, so, obviously, there are many newer methods and ideas by now (e.g., back then, there was no sophisticated deep learning or Big Data). However, note that these models and methods were designed to address the fundamental shortcomings of purely neural or purely symbolic models; these shortcomings were known even back then but remain a challenge today despite the progress that has been made. 
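To make one of these categories concrete, the following is a minimal illustrative sketch (of my own devising, not taken from any of the cited works) of a multi-module system in the spirit of category (3) above: a “neural” module and a symbolic rule module, loosely coupled through a simple combiner. The modules, rules, and weights here are all hypothetical stand-ins.

```python
import math

# Illustrative sketch only: separate neural and symbolic modules, loosely
# coupled via a combiner (category 3). All contents are hypothetical.

def neural_module(features):
    # Stand-in for a trained network: a fixed linear score squashed to [0, 1].
    weights = [0.8, -0.5, 0.3]  # hypothetical "learned" weights
    score = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))

def symbolic_module(facts):
    # Stand-in for rule-based reasoning: fire simple if-then rules over facts.
    rules = [({"bird", "can_fly"}, "takes_off"),
             ({"bird", "injured"}, "stays")]
    return [concl for cond, concl in rules if cond <= facts]

def combine(neural_score, symbolic_conclusions, threshold=0.5):
    # Loose coupling: a symbolic conclusion takes precedence when one fires;
    # otherwise fall back on the neural module's graded score.
    if symbolic_conclusions:
        return symbolic_conclusions[0]
    return "takes_off" if neural_score > threshold else "stays"

print(combine(neural_module([1.0, 0.0, 1.0]), symbolic_module({"bird", "injured"})))
```

The point of the sketch is only structural: the two modules use different representations (graded numeric versus crisp rule-based), and the coupling between them is a separate, adjustable design decision.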
Many of the concepts described above (such as localist versus distributed representations, single- versus multi-module systems, loosely versus tightly coupled modules, and so on) and many common structures of neural-symbolic systems identified back then (such as direct mapping of symbolic models to neural networks, embedding symbolic processing within neural networks, multiple neural and/or symbolic modules with different control and coordination methods, and so on) remain viable in describing today’s neural-symbolic models.
However, one question that one faced then, and is still facing today, is: Given these options, that is, given these many different ways of combining symbolic processing and neural networks, structurally, mechanistically, and process-wise, how can we produce better hybrid models going forward? That is, how should we best structure and implement them, choosing among all these options? In particular, how do we structure them in a principled and justifiable way, rather than in an ad hoc or task-specific way?
I argued back then and am still arguing today (e.g., in some of my earlier or more recent books, such as Sun [38,41,47]) that a better approach is structuring them in a cognitively (i.e., psychologically) motivated and justified way – that is, based on human mental architecture (which is, in turn, inferred based on experimental psychology, neuroscience, and so on; see Newell [31]; Sun [42]). In particular, within the human mental architecture, we need to take into account dual processes (e.g., as has been variously termed as implicit versus explicit, unconscious versus conscious, intuition versus reason, System 1 versus System 2, and so on, albeit sometimes with somewhat different connotations). Incidentally, dual-process (or two-system) theories have become quite popular lately (see, e.g., Kahneman [19]), although they are not new ideas either (see, e.g., Reber [34]; Sun [42]; and others). Dual-process theories can provide both philosophical and psychological justifications for (at least some of) hybrid neural-symbolic models. In fact, they constitute both theoretical and empirical grounding for hybrid neural-symbolic models.
Dual-process theories
So, at this juncture, we should examine (however briefly) the dual-process theories that have been proposed in the literature – there have been quite a few of them, and their historical roots go even deeper. It is worthwhile to look into them here for the sake of a better understanding of neural-symbolic models.
Theoretical or philosophical proposals concerning two types of mental processes have long existed, dating back to even before the inception of cognitive science and artificial intelligence. For instance, Martin Heidegger’s distinction of the pre-ontological versus the ontological is an early, very abstract version of such a duality (Heidegger [15]). William James’s [18] distinction between “empirical thinking” and “true reasoning” is even more evidently relevant. Going back even earlier, Immanuel Kant’s [20] dictum was: “Thoughts without intuition are empty, intuition without concepts is blind”. “Intuition and concepts constitute... the elements of all our knowledge, so that neither concepts without an intuition in some way corresponding to them, nor intuition without concepts, can yield knowledge” (Kant [20]).
In contemporary experimental psychology and neuroscience, there have been empirically based proposals in this regard. The distinction of implicit and explicit processes has been empirically demonstrated in the implicit memory literature (e.g., Schacter [35]). The distinction of implicit and explicit processes has also been empirically demonstrated in the implicit learning literature (e.g., Reber [34]). In social psychology, there have been dual-process models that are roughly based on the co-existence of implicit and explicit processes (see Chaiken and Trope [3] for some examples). Similar distinctions have been proposed by other researchers, based on similar or different empirical or theoretical considerations (e.g., Erickson and Kruschke [8]; Grossberg [14]; McClelland, McNaughton, and O’Reilly [28]; Milner and Goodale [30]; Sun [38,39]; etc.).
Here is my version of a dual-process theory from a long time ago. In my 1994 book, I made the following distinction:
“…
That is, the two “levels” (the two systems or modules, or two types of processes) encode somewhat similar content, but they encode their content in different ways: Symbolic versus subsymbolic representations are used, respectively. Therefore, they utilize different mechanisms. They can thus have qualitatively different “flavors”, including different degrees of accessibility (leading to different degrees of conscious awareness). One may be termed the implicit level, and the other the explicit level (or, alternatively, they may be termed System 1 and System 2, respectively, or whatever other terms one may wish to use as mentioned earlier).
Furthermore, according to this account, a highly plausible reason for having the two different levels is that these different levels (with different representations and different mechanisms) can potentially work together synergistically, complementing and supplementing each other (as argued, in detail, in Sun [38,39,41], and Sun et al. [51]; see also Booch et al. [2] and Kautz [21] for more recent views). This is probably, at least in part, why nature has chosen this design (in an evolutionary sense, of course). I referred to the above two points as the dual representation hypothesis and the synergy hypothesis, respectively (see Sun [41,46]).
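As a toy illustration of the dual representation hypothesis and the synergy hypothesis (a sketch of my own construction, not an excerpt from any actual system), the same content can be encoded twice – once as crisp, accessible symbolic rules (the explicit level) and once as graded associations (the implicit level) – with the two levels’ outcomes integrated by a weighted combination. All knowledge and numbers below are hypothetical.

```python
# Toy sketch: the same content encoded at two "levels" -- explicit symbolic
# rules (crisp, accessible) and implicit graded associations (subsymbolic in
# spirit) -- integrated by weighted combination. All values are hypothetical.

EXPLICIT_RULES = {("wet", "cold"): "go_indoors"}            # all-or-none
IMPLICIT_WEIGHTS = {("wet", "cold"): {"go_indoors": 0.7,    # graded
                                      "stay_outdoors": 0.3}}

def decide(situation, mix=0.5):
    # Explicit level: a matching rule contributes full activation to its action.
    explicit = {EXPLICIT_RULES[situation]: 1.0} if situation in EXPLICIT_RULES else {}
    # Implicit level: graded activations over candidate actions.
    implicit = IMPLICIT_WEIGHTS.get(situation, {})
    # Synergistic integration: weighted combination of the two levels.
    actions = set(explicit) | set(implicit)
    scores = {a: mix * explicit.get(a, 0.0) + (1.0 - mix) * implicit.get(a, 0.0)
              for a in actions}
    return max(scores, key=scores.get)

print(decide(("wet", "cold")))
```

Note that either level alone could reach a decision; the integration step is what allows the two encodings of (roughly) the same content to complement each other, which is the intuition behind the synergy hypothesis.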
A number of other dual-process theories exist, although some of them can be problematic at times. For instance, a comparatively more recent view (compared with the earlier views mentioned above) was proposed by Kahneman (e.g., [19]). His main ideas were as follows: there are two styles of processing, intuition and reasoning. Intuition (or System 1) is based on associative processes; it is fast and automatic, involves strong emotional bonds, is based on formed habits, and is difficult to change or manipulate. Reasoning (or System 2) is slower, more volatile, and subject to conscious judgments and attitudes.
At roughly the same time, Evans [9] espoused essentially the same view. According to Evans, System 1 is “rapid, parallel and automatic in nature: only their final product is posted in consciousness”. He also noted the “domain-specific nature” of System 1 learning. System 2, on the other hand, is “slow and sequential in nature and makes use of the central working memory system”; it “permits abstract hypothetical thinking that cannot be achieved by System 1”. Further, there is the “default-intervention” relationship between the two systems: System 1 forms the default response unless there is active intervention from System 2.
However, it is worth noting that some of the claims above, unfortunately, may be somewhat simplistic. For one thing, intuition can be slow sometimes (see, e.g., Helie and Sun [16]). For another, intuition can be subject to conscious control and manipulation; that is, it may not be entirely “automatic” (e.g., Curran and Keele [5]; Stadler [37]). Furthermore, intuition can be subject to conscious “judgment” (e.g., Libet [25]). Moreover, explicit thinking may be engaged right from the outset and can be constantly present, not necessarily just an occasional intervention, and there can be complex interactions between the two systems (Helie and Sun [16]). The reader is referred to Sun [44,46] for detailed discussions of these points. For various alternative dual-process views, see also Evans and Frankish [10] and Macchi et al. [26].
To come up with a more nuanced and more precise characterization of the two systems, it is important that we ask some pertinent and deeper questions, taking into account architectural and mechanistic nuances of the mind. For instance, for either type of process, there can be the following important questions (that are mechanistic and/or process-oriented):
How deep is the processing (in terms of precision, certainty, and so on)?
How broad is the processing (e.g., how much information is involved)?
How degraded (incomplete, inconsistent, or uncertain) is the information available?
How typical or atypical is the situation to be addressed?
Is the process involved procedural or declarative?
How many processing cycles are needed considering the factors above?
There are many other similar or related questions, including those related to further divisions of modules within each level and across levels, such as procedural versus declarative modules (see, e.g., Anderson and Lebiere [1]) and their relations to the implicit-explicit distinction. (Note that the relationship between System 1 versus 2 and procedural versus declarative processes is a topic that is worth an article-length or even a book-length treatment by itself. It is certainly not simply the case that System 2 involves only declarative processes with complex knowledge representation, while System 1 involves only procedural processes with simple learning, as some researchers in the field seem to believe. See Sun [43] for details.)
These questions need to be addressed, to ensure the cognitive-psychological realism of theories or models. To put it differently, the complexity of the matter should be adequately understood and properly taken into account, especially in any mechanistic models or theories that purport to value cognitive-psychological realism (Sun [46]). Hybrid neural-symbolic models belong (or, at least, should belong) to this category of models.
Computational cognitive architectures
To better understand the relevant issues and to better answer the questions raised earlier, one would certainly like to have an overall theoretical framework.
Computational cognitive architectures provide an overall framework that facilitates more detailed explorations of the mind (e.g., through providing conceptual tools and constraints; Laird et al. [22]; Taatgen and Anderson [52]). This general framework can then lead to specific theories about components or functionalities of the mind. We accomplish the above through specifying computational details of cognitive (psychological) mechanisms and processes; in other words, we embody our theories of the mind and its various functionalities in computer programs. In so doing, we generate runnable computer programs, which enables the simulation of empirical human data; simulation results can then be compared to human data, in a qualitative or quantitative way, to validate the theories or to develop better ones. Through this iterative and detailed process of comparing human data and simulations, we can achieve better understanding of the inner workings of the human mind. This is the essence of computational psychology (Sun [42]). With cognitive architectures, we focus first on a general theory (a general framework), then move on to more specific theories (Newell [31]).
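The simulate-and-compare loop just described can be sketched schematically as follows. The “human” reaction times and the toy model below are entirely hypothetical placeholders, meant only to show the shape of the methodology, not any actual empirical result.

```python
# Schematic sketch of the validation loop: run a model to generate simulated
# data, then compare it quantitatively to human data. All numbers below are
# hypothetical placeholders, not real empirical data.

HUMAN_RT_MS = {"easy": 450.0, "medium": 620.0, "hard": 900.0}

def simulate_rt(condition, base=400.0, per_cycle=125.0):
    # Toy model: reaction time grows with the number of processing cycles used.
    cycles = {"easy": 1, "medium": 2, "hard": 4}[condition]
    return base + per_cycle * cycles

def rmse(model, data):
    # Root-mean-square error between simulated and human values per condition.
    errors = [(model(cond) - rt) ** 2 for cond, rt in data.items()]
    return (sum(errors) / len(errors)) ** 0.5

fit = rmse(simulate_rt, HUMAN_RT_MS)
print(f"RMSE = {fit:.1f} ms")  # smaller means a better quantitative fit
```

In actual practice, of course, the model is a full cognitive architecture rather than a two-parameter toy, and the comparison may be qualitative (patterns, orderings) as well as quantitative; but the iterative structure – simulate, compare, revise – is the same.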
As an example, the Clarion cognitive architecture is a dual-process, dual-representation cognitive architecture, incorporating both implicit and explicit processes. It uses dual representations, that is, both symbolic and neural representations, for computationally capturing and differentiating explicit and implicit processes (see Sun [41,47] regarding Clarion; see also Sun [38] for ideas leading up to it). Clarion has been aimed to be an integrative architecture, capturing a wide range of cognitive-psychological functionalities.
One can view modeling and simulation based on cognitive architectures as a way of theoretical interpretation (albeit in a detailed and precise, namely mechanistic, fashion). In this way, Clarion has indeed helped to clarify many issues related to the dual processes mentioned earlier. For instance, some have claimed that implicit processes (“intuition”) are always faster than explicit processes (“reasoning”). In terms of procedural processes, this is generally true, as shown through theoretical interpretation via simulation (e.g., Sun et al. [51]). In terms of declarative processes, however, this is generally not true (e.g., Helie and Sun [16]). See Sun ([44,46]) for analysis of these and other issues related to dual-process theories and beyond.
The established methodology of developing computational cognitive architectures on the basis of empirical data and findings from psychology, neuroscience, and other disciplines, constitutes, at the same time, a principled (and thus arguably preferred) way of developing neural-symbolic systems in general, regardless of whether these systems are for theoretical purposes or for practical applications (cf. Pew and Mavor [32]; Sun [42]). In relation to practical applications, the advantages of this approach include, for example, that a system resulting from this methodology will likely be more similar to humans in some fundamental way, that it is possible that such a system is more easily understood by humans (and vice versa), and that it may be easier for such a system to communicate with humans, not just through language, but also through other explicit or implicit means of communication (Sun [48]).
Some further remarks
Without going further into the details of how dual processes can be (and indeed have been) further developed conceptually and mechanistically within Clarion and in turn help the development of neural-symbolic models (owing to space limitations, for one thing, but also to the nature of a position paper such as this), I would like to simply highlight a few key points:
Dual-process theories have significant implications for developing computational cognitive architectures and cognitively-psychologically realistic neural-symbolic models (Sun [38,40,41]).
If cognitive-psychological realism is what one wants to achieve in developing theories or models, dual-process theories must be taken into consideration, not just as an option, not merely for expediency, but as a rather fundamental requirement (Sun [46]). (Of course, it is obvious that cognitive-psychological realism may not be for everyone, especially those who are only concerned with AI applications in the short run. But, even in those cases, there have been some calls for dual-process theories to be applied, although mostly for the sake of practical expediency.)
Dual-process theories can serve as the inspiration (and both theoretical and empirical justifications) for computational cognitive architectures and neural-symbolic models in general (especially cognitively-psychologically motivated ones).
Recent developments in deep learning (but see, e.g., Schmidhuber [36] for its long history) have been exciting, especially the successes enjoyed by convolutional neural networks, transformers, large language models based on transformers, computer vision based on deep neural networks, and other similar fronts. Many ideas that have been developing over the past three to four decades came together in a metaphoric big bang. Of course, such developments have also introduced many new issues and many new possibilities in relation to neural-symbolic models (Garcez et al. [13]; Garcez and Jiménez-Ruiz [12]; Hitzler et al. [17]), including, for example, probabilistic approaches towards integrating neural and symbolic processes, reasoning on the basis of large language models, and so on. However, there are reasons to believe that the fundamental points made earlier (such as those listed above) still stand (although adjustments and adaptations might naturally be needed along the way). It should be pointed out that the various recent calls (in some presidential speeches, no less) for attention to the distinction between System 1 and System 2 in developing artificial intelligence systems involving deep learning, although timely, are nothing new either – systems and models with the same or similar distinctions have been developed at one point or another, or (in some cases) have been in continuous development since the 1990s. Such ideas have stood the test of time.
Going back to the points and categories mentioned earlier in our brief review of the early history of neural-symbolic models, dual-process theories point to multi-module, multi-representational, heterogeneous neural-symbolic systems as the basis of the human mental architecture (as discussed in the early years in Sun and Bookman [50]; Sun and Alexandre [49]; Wermter and Sun [53]). Further empirical and/or theoretical support pointing in that direction has been mounting over the past several decades since then.
It is worth repeating that, despite the fact that dual-process theories are quite popular right now (as clearly indicated by the recent surge of interest in System 1 and System 2), some important issues involved in dual-process theories (e.g., relative speed, intention, control, and interaction) are more complex than often assumed. In this regard, it is essential to gain a more fine-grained understanding, and further work is needed to disentangle these issues (Sun [46]). These issues, however, are crucial in further developing cognitive architectures (either enhancing existing ones or starting anew, but going beyond just a standard model; cf. Laird et al. [22]), and in turn, cognitive architectures can help in disentangling these and other theoretically important issues, by providing an overarching framework as well as by enabling computational simulation. Together, they may lead to better neural-symbolic models in the future; in other words, they may guide the development of future neural-symbolic systems that are principled and cognitively-psychologically realistic. Nevertheless, there is, of course, still a long way to go, and new breakthroughs, incremental developments, enhancements, and refinements are very much needed (for some of the more recent developments, see, e.g., Chaudhuri et al. [4]; Dong [6]; Dong et al. [7]; Lamb et al. [23]; Mekik [29]; amongst many others).
Acknowledgement
The present article was written while the author was supported (in part) by ARI grant W911NF-17-1-0236 and IARPA HIATUS contract 2022-22072200002. The views and conclusions contained herein are those of the author and should not be interpreted as necessarily representing the official policies or positions, either expressed or implied, of those agencies. Thanks are due to the three anonymous reviewers for their useful suggestions.
