Sage Journals: Discover world-class research

Abstract

External representations (ERs) are objects or performances in the world whose proper function is to communicate about other things in the world. Why and how did we make them, and what do they give us? We outline a simple framework for thinking about ERs grounded in modern machine learning. We propose a minimal set of neural mechanisms needed for open-ended ER production. Our constructivist enactive view contrasts with nativist views which propose specialist neural modules requiring symbolic internal representations. We propose a plausible set of (biological and cultural) evolutionary steps to full Gricean symbolic communication via a set of increasingly complex enactive algorithms for ER production. A pragmatic space of games is defined, which includes not only fully cooperative language games but also science, art, and evolved signal manipulation games. This space is defined by the complexity of learning needed by sender and receiver. We propose that one important step towards open-ended ER use was selection for bush reading, which like mind-reading is an inferential process requiring complex contextual and syntactic understanding of cues about events displaced in space and time. Bush reading pre-adapted receivers to be receptive, competent, and perspicacious interpreters of later intentionally produced signals about hidden topics such as felt mental states. This paved the way for minimally Gricean communication, which subsequently could be bootstrapped into explicit theories of mind, folk psychology narratives, and symbolic language in general. Recent findings in cognitive archaeology are integrated within the framework, and new experiments in machine learning suggested.

Keywords

Language, science, art, cognitive archaeology, machine learning, large language models, representation

1. Introduction

External representations (ERs) are objects or performances created in the physical media of the environment that are about other entities in the environment. The power and scope of these bizarre acts of socially agreed pretending are staggering. They have been promulgated culturally for generations in all extant civilizations. Our aim here is to understand their function and origin, from their simplest manifestations to all their current forms.

Modifying the definition of Harvey (2008), and taking a broad biosemiotic perspective, an external representation (ER) exists when an agent, S (sender), executes a sensorimotor policy to produce a sensible (detectable) event or object (ER) whose proper function (Millikan, 2017) is to communicate about a topic (T) to an agent R (receiver), who may well be the sender themselves. A sensorimotor policy (or policy for short) specifies the action that the agent enacts/executes in a given sensory state in order to achieve their goal (Kaelbling et al., 1996). Examples of such human ERs and the paraphernalia for producing them include diagrams, maps, graphs, paintings, tallies, numbers, geometry, algebra, spoken words, writing, counters, abacuses, clocks and calendars, calculators, and computer languages. These are all fully Gricean ERs, that is, the receiver and sender both operate by inferring mental states, i.e. the communicative intentions (goals) of the other (Moore, 2017). At the other extreme, ERs also include signals with Carnapian pragmatics, which do not require intentionality. This includes most of animal signalling, such as mimicry (Bar-On and Moore, 2017). This article is about (1) how full Gricean ERs could originate, and (2) what they give us. We propose a plausible trajectory from Carnapian to fully Gricean ERs.

In contrast to nativist claims that propositional thought needs special neural symbol processing machinery (Pinker, 1994), we take, along with others before us, a more constructivist position (Dennett, 2008; Heyes, 2018; Moore, 2021). Contrary to Harvey (2008), we do not believe it is theoretically impossible that special neural symbol processing machinery could have come to exist; after all, the triplet code came to be about amino acids, genes about their phenotypes, and neural firing patterns about actions. This shows that there is a general tendency for informational control systems to abstract and make entangled their topics (Smith and Szathmary, 1997; Millikan, 2017). However, along with Heyes (2018), we believe the evidence indicates that ERs arose primarily through a process of cultural evolution, grounded in a minimal core set of neural functionalities. These are shown in Figure 1. The aim of this article is to show that this minimal neural functionality is sufficient to explain all instances of human ER production.

Figure 1.

Minimal neural requirements for open-ended ER production, capable of unbounded discourse, free retrieval, and infinite generativity (van Mazijk, 2022b). It consists of a suitable RL-modulated, multi-modal model trained for next-step prediction (e.g. a vision-language model (VLM) or other multimodal transformer), a multidimensional value system which includes intrinsic motivation (reward production) for establishing cooperative joint attention, and a prefrontal working memory for imagination (needed for running policies offline). With this core innate minimal neural machinery, embodied and enactive metaplasticity can be ignited, which is then sufficient for open-ended ER use (Roberts, 2022). The whole system can be thought of as the Peircean interpreter in his triadic framework for understanding how a sign and an object can have meaning (Iliopoulos, 2016a).

2. Overview of external representations

We claim that propositional thought is a sensorimotor policy for the creation and manipulation of ERs. Early humans’ mastery of ERs allowed the representation of hidden states, such as intentions of self and others, which led to enhanced causal inference abilities. ERs allowed us to develop intentional stances, model otherwise invisible phenomena, and execute arbitrarily complex hierarchical and recursive thinking in the world. The evolutionary pressures that favoured the production and use of ERs included, notably, the need to model invisible events that were not immediately present but displaced in space and time, for example, the minds of humans (Dunbar, 1998; Barona, 2021) and other animals (Szilágyi et al., 2023); see Figures 2 and 3. ERs were useful for both individual reasoning and interpersonal coordination. Many functions of ERs are as useful for the individual ‘communicating’ with themselves, that is, thinking (as has been explored in work on cognitive gadgets (Heyes, 2018) and epistemic tools (Tang, 2020)), as they are for communication (Pinker, 1994; Bickerton, 1990; Tomasello, 2008).

Figure 2.

The process by which a policy for constructing ERs in the world can allow two events greatly displaced in time A and B to be associated where this was not possible before.

Figure 3.

By using an external counter object and an ‘increment’ policy, it is possible to learn a simple rule for generalizable prediction, not learnable by a standard recurrent neural network (RNN) or a long short-term memory network (LSTM), as shown by Evans (2022).
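The principle in this caption can be illustrated with a minimal sketch. It is not the experiment reported by Evans (2022): the `ExternalCounter` class, the `memoryless_policy` function, and the toy event stream are all hypothetical names introduced here for illustration. The point is only that an agent with no internal memory can still track an unbounded count if its policy leaves marks in the world and later reads them back.

```python
from dataclasses import dataclass


@dataclass
class ExternalCounter:
    """A hypothetical counter object that lives in the world, not in the agent."""
    marks: int = 0

    def increment(self) -> None:
        self.marks += 1


def memoryless_policy(observation: str, counter: ExternalCounter) -> None:
    """React to the current observation only; the agent keeps no internal state."""
    if observation == "event":
        counter.increment()  # the 'increment' policy: leave one more mark in the world


def count_events(stream: list[str]) -> int:
    """Run the policy over a stream and read the answer off the external object."""
    counter = ExternalCounter()
    for observation in stream:
        memoryless_policy(observation, counter)
    return counter.marks


short_stream = ["event", "noise", "event"]
long_stream = ["event", "noise"] * 5000  # much longer than the short example

print(count_events(short_stream))
print(count_events(long_stream))
```

Because the count is stored in the external object rather than in a fixed-size internal state, the same two-line policy applies unchanged to streams of any length, which is the kind of generalization the caption refers to.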

Why are humans unique in their motivation and efficiency in creating open-ended, novel, effective ERs for thinking with? Our runaway competence with ERs is the distinctive ability separating humans from other primates. It is something to be explained. We claim that an essential evolutionary stepping stone to open-ended ER creation was the selection pressure we underwent for bush reading skills. In these skills, natural signs such as tracks were understood to be about other (often hidden) topics T (e.g. a distant animal – an unintentional ‘sender’). The signs did not need to resemble their topics, that is, they were indices, and their topics were inferred by abduction (Gell et al., 1998). Bush reading has been described as the origin of science (Liebenberg, 1990). Often (but not always) invisible objects and relations, such as animal behaviours and motivations, were inferred from visible natural signs (evidence) (Evans, 2022). Moreover, the teaching of bush reading, that is, its cultural transmission, involves artificially constructing tracks (e.g. using a deer foot), observing the tracks being made in different contexts, and observing their deterioration over time. Thus, bush reading is not a passive observational act but involves active making from the start (Brown, 1983; Brown Jr, 1986).

The latest evidence suggests that the evolutionary transition to upright posture and gait began between 6 and 4 MYA (Harcourt-Smith, 2010). There is evidence for ancestral species inhabiting marshes in the Miocene (Niemitz, 2010), with hominids having to adapt to seasonal mosaic environments including wading in wetlands (Domínguez-Rodrigo, 2014). Such rapidly variable environments imposed strong selection pressure for diverse hunting and foraging tasks, requiring improved abductive reasoning skills (Potts, 1998b,a; Ambrose, 1998; Charnov et al., 1976).

With the ability to make inferences from complex natural signs about invisible topics, it was then a small step to create self-generated signs about topics, for example, names or signals which persisted when the topic was not visible, and thus helped improve the scope of our neural world models (Gell et al., 1998). By observing how natural signs were about invisible animal behaviours and motivations, we could bio-mimic them. In other words, we could do mind-craft: create artificial indexes about our own introspected mental states (Humphrey, 1978) and psychology, and about the hypothesized causes (intervening variables; Whiten, 2013) of behaviour (see Figure 4).

Figure 4.

From bush reading to mind-reading. Dark clouds are to rain what the word ‘sad’ is to tears. Words are the naturalization of meaning. This is contrary to the position of Bar-On and Moore (2017), who claim that a recipient capable of understanding natural meaning (i.e. without an intentional sender) is fundamentally different from one capable of understanding non-natural meaning. However, our view is also distinct from Fitch (2010) as we propose that a special causal inference focus on currently invisible antecedents and consequences was needed, and this may have required pre-Gricean naming of antecedents and consequences when they were visible as in Figure 2 such that associations could be made between events displaced in time.

Once such ERs were produced, their topics were made sensible (visible/audible), and they gave new synthetic data for standard associative neural world models to operate upon. Once receivers had a motivation to infer currently invisible topics from sensible messages, senders could exploit this to create synthetic messages (ERs) to modulate the recipient’s inferences about other things. Our origin account for ERs is a version of the signal manipulation theory of the origin of animal communication proposed by Krebs and Davies (2009), by which a receiver’s preparedness to understand a signal makes producing an effective signal easier to evolve. We take pains to avoid the Gricean paradox as outlined by Moore, in that our origin story for language does not require an explicit theory of mind, that is, reasoning about hidden mental states, which we claim depends on ERs, mainly language (Moore, 2021).

We define a space of games for ER creation, illustrated in Figure 5, with prototypical instances being: the science game (naming and explaining phenomena), the language game (developing arbitrary conventional symbols), and the art game (exploiting a recipient’s perceptual priors to create ERs that enable them to rapidly generalize in zero shot, i.e., upon first encounter). Of particular interest is the art game in which inventing an ER requires no learning by the receiver, that is, allows them to undertake zero-shot inference (Larochelle et al., 2008). The art game can be contrasted with a language game (Wittgenstein, 1953), in which both receiver and sender learn, and from which purely arbitrary messages can arise. Art games do not produce arbitrary messages but produce icons or indexes (Chandler, 2017), in which the artist (sender) creates an ER which exapts and co-opts the perceptual and abductive priors of the receiver (Gell et al., 1998) to achieve a desired response (Noë, 2023). In this way, the pragmatics of art falls between a Carnapian and a fully Gricean pragmatics, as the receiver does not necessarily need to make an interpretation of the ER on the basis of the artist’s intention.

Figure 5.

A space of ER creation games organized by the complexity and dependencies of learning required during ontogeny in the sender and receiver. In science games, there is no learning in the sender (there is no sender); only the receiver must construct a model of the observed world. In art games, there is no learning by the recipient; the sender executes a search in order to manipulate the receiver without any need for their cooperation, that is, with icons and indexes. In language games, both sender and receiver must learn on the basis of shared reward to achieve their communicative goals. From this process, arbitrary symbolic conventions can arise. On the bottom-left at the origin, we have pure Carnapian communication. Here evolution has done the heavy lifting to produce effective signals and responses in a hard-wired manner. No learning happens in either sender or receiver. In the special case of found art, the ER arises by chance alone.
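As a rough illustration of how this space is organized, the corners described in the caption can be written down as a small lookup. This is a deliberate simplification: the figure orders games by the complexity of learning, whereas the sketch below collapses each axis to a yes/no flag, and the names `Game` and `classify` are hypothetical. Found art, in which the ER arises by chance, is not modelled.

```python
from typing import NamedTuple


class Game(NamedTuple):
    """Whether learning during ontogeny is required on each side (simplified to booleans)."""
    sender_learns: bool
    receiver_learns: bool


def classify(game: Game) -> str:
    """Map a point in the simplified space to the prototypical game named in the caption."""
    if game.sender_learns and game.receiver_learns:
        return "language game"      # both learn on the basis of shared reward
    if game.sender_learns:
        return "art game"           # sender searches; receiver needs no learning
    if game.receiver_learns:
        return "science game"       # no sender learning; receiver models the world
    return "Carnapian communication"  # hard-wired signals and responses


for sender_learns in (False, True):
    for receiver_learns in (False, True):
        game = Game(sender_learns, receiver_learns)
        print(game, "->", classify(game))
```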

3. The constructivist nativist tension

3.1. Nativism

Nativist arguments for the origin of language and thought propose that, in some form or other, we have innate, unlearned, genetically evolved specialist internal representations in the brain for symbol processing that are about language. These are said to augment the connectionist mechanisms of Hebbian learning and reinforcement learning present in non-human primates.

Nativist explanations come in various flavours, for example, the language of thought hypothesis describes internal symbolic representations as mentalese, with its own grammar and semantics (Fodor, 1975; Fodor and Pylyshyn, 1988). The physical symbol systems hypothesis states that tangible symbols reside in the neural circuitry of the brain, and are necessary for human level intelligence (Newell and Simon, 1976). Chomsky proposes an innate neural mechanism implementing a universal grammar, shared by all humans (Chomsky, 2014), and Pinker proposes a more gradualist genetic theory of language evolution, in short that we have a language instinct (Pinker, 1994). Cosmides and Tooby view the mind as composed of multiple, genetically specialized cognitive modules, a computational system primarily internal and brain-based, constructing internal representations of the external world, with symbols seen as functions of these internal cognitive processes (Cosmides et al., 1992).

More recently, Tenenbaum claims that the brain contains probabilistic structured, compositional programs for internally representing knowledge, which allow causal reasoning that is not possible with connectionist systems alone (Tenenbaum et al., 2006, 2011). Gary Marcus outlines the limitations of neural networks to generalize out of distribution, specifically their inability to form genuine hidden causal models of phenomena, and seeks solutions in symbolic neural architectures (Marcus, 2003).

An attempt to integrate sub-symbolic connectionist neural mechanisms with the physical symbol systems referred to by Fodor reached its apogee in the Neural Turing Machine (Graves et al., 2014). This was an implementation of a Turing Machine with a tape and read and write heads, which could be trained by backpropagation of error and gradient descent. Biologically plausible variants of backpropagation exist (Lillicrap et al., 2020). Such neuro-symbolic architectures add inductive biases to standard connectionist architectures (Spies et al., 2022; Shanahan and Mitchell, 2022). These architectures attempt to integrate the physical symbol system hypothesis (Newell and Simon, 1976) with connectionist models (d’Avila Garcez and Lamb, 2020; Fernando, 2011).

However, there is now considerable evidence against nativist explanations for language and thought, or at least a weakening of existing evidence. There is a wealth of stimulus, for example, social shaping by adults of infants’ and children’s utterances, that goes counter to Chomsky’s poverty of stimulus argument, which states that children don’t have enough data to learn language by associative mechanisms.

The evidence for critical periods, that is, specific times in development where language learning is possible, is weak (Lenneberg, 1967). Even if children can learn phonemes that interfere with subsequent learning of novel phonemes as adults, this can be taken as evidence for domain-general cerebral plasticity (Cheour et al., 1998; Kuhl, 2004).¹ Evidence from severely abused and neglected children not exposed to language, in fact, suggests that language can be learned later in life. For instance, Genie was able to learn simple grammar, such as the prepositions ‘under’, ‘next to’, ‘beside’, and ‘over’, and developed a considerable vocabulary (Fromkin et al., 1974). She was able to produce three- and four-word sentences at the time of writing and continued to improve, despite her horrific treatment. In any case, Heyes has argued that the existence of a critical period does not in itself provide evidence for or against a constructivist hypothesis because if the critical period is due to domain-general windows of cerebral plasticity, both nativist and constructivist explanations can explain a lack of language acquisition (Heyes, 2018).

There is no convincing evidence for brain regions that are uniquely specialized for language (Anderson, 2008; Poldrack, 2006), with increasing emphasis on a distributed language connectome broader than Broca’s and Wernicke’s areas as traditionally formulated in the classic model (Tremblay and Dick, 2016). Language learning ability is instead strongly correlated with general sequence learning abilities (Christiansen and MacDonald, 2009; Misyak and Christiansen, 2012).

Also, the effectiveness of explicit theory of mind is correlated with children’s exposure to particular ways of talking about others, that is, using sentential complement syntax and the frequency with which parents use mental state verbs (de Villiers and de Villiers, 2000). This makes it more likely that it is a culturally inherited set of linguistic tools that allows children to think about propositional attitudes rather than an innate explicit theory of mind module in the brain (Moore, 2021; Hutto, 2012), although an implicit ToM does seem to exist in all primates (Butterfill and Apperly, 2013).

3.2. Constructivism

In the opposing camp, constructivists broadly emphasize that language and thought are constructed to varying degrees during developmental interactions between domain-general neural learning mechanisms and the environment, thus avoiding the heavy burden that nativism puts on genetic mechanisms (Tomasello, 1995). The term constructivism stems from Piaget, Bruner, and Vygotsky’s notions that childhood learning results in concepts emerging through experience, for example, the concepts of noun, verb, and object persistence (So, 1964; Bruner, 1974; Vygotsky et al., 1994). Connectionism later sought to operationalize these explanations (Elman, 1996).

Constructivist explanations come in various flavours which differ in the proposed mechanisms and substrates for acquisition of novel cognitive functionalities such as language. Piaget’s so-called radical or cognitive constructivism emphasizes individual self-constructed learning mechanisms sparked by the child’s brain in a bi-directional interaction with the environment involving assimilation and accommodation, rather than socially learned policies (Von Glasersfeld, 2013), whereas Vygotsky’s (and Bruner’s) social constructivism emphasizes the role of more pragmatic social interactions with others and the environment (Vygotsky et al., 1994; Bruner, 1974).

Quartz and Sejnowski are on the Piagetian side when they describe themselves as neural constructivists, proposing that structural (hardware) changes in neural networks caused primarily by synaptic, axonal, and dendritic growth could create non-stationary neural substrates for novel internal representations (Quartz and Sejnowski, 1997). Modern machine learning, until the very recent and unexpected success of large language models, fell on Piaget’s neural constructivist side also, seeking neural mechanisms for internal representation formation. The Beta-VAE is an example of this, a neural network with learning rules designed to extract disentangled visual concepts about images such as lighting direction, skin colour, or age (Higgins et al., 2016). In it, each ‘concept’ is represented internally as the neural activation of a single neuron in a hidden layer.

In contrast to Piaget’s cognitivist constructivism and its neural internal representationalist extensions described above, our approach champions enactive constructivism, which is more in keeping with Vygotsky (Baerveldt and Verheggen, 1999). Instead of claiming, as the Beta-VAE does for instance, that concepts are primarily internal representations formed in the brain, we propose that concepts are constituted primarily by having sensorimotor policies. These policies allow us to modify things like lighting direction by moving lamps or changing the shapes of objects through construction, independently of each other. We can change the shapes of things independently of the directions by which they are illuminated, suggesting that these concepts of shape and illumination are distinct. Our grasp of a concept is demonstrated by enacting policies in the world itself. The demonstration of localized activation in the brain associated with an enacted concept is not logically necessary for demonstrating that we have such a concept (Ward et al., 2017).

A concept is best understood as a sensorimotor policy for its enactment, rather than as an internal representation in the brain. The enactive constructivist account says the brain is not a computational device that processes internal representations before externalizing them through behaviour (Iliopoulos, 2016b). In keeping with this position, Christiansen and Chater have outlined a cultural theory of language evolution that is broadly consistent with an enactive constructivist view, in which linguistic constructions are the units of cultural evolution (Christiansen and Chater, 2016). Similarly, Overmann has analysed how writing systems culturally evolve, with tendencies to abstraction and automatization. One of the main features of material construction (ERs) is their facilitation of sustained, communal reuse, which creates new stable niches that then allow mutually agreed salient features, and eventually neural changes by the Baldwin effect (Overmann, 2021).

The most convincing evidence for our enactive constructivist position is the remarkable success of large language models (LLMs) such as GPT-4 (OpenAI, 2023) and Gemini (Team et al., 2023). The capacity of LLMs for language use does not depend on specialist neural symbol processing machinery (Fernando et al., 2023; Wei et al., 2022). It depends on being exposed to the right kinds of language data. They provide a proof in principle that associative learning is sufficient for complex language use. Counter to Chomsky, nothing more than efficient attention-guided Hebbian associative learning, supervised by prediction of the next token in the text sequence, seems to be necessary for learning grammar (Vaswani et al., 2017; Radford et al., 2021). All the ‘cleverness’ displayed by these models is in the conditional structure of language itself.

Both humans and LLMs exhibit similar ‘semantic content effects’, that is, failures in achieving systematic consistency in abstract reasoning tasks. They both reason better about more common and plausible semantic settings grounded in realistic situations (Dasgupta et al., 2022). These new results raise the possibility that failures of humans in reasoning tasks, unless ‘materially anchored’ by real-world examples, may be explained by LLM-like training biases rather than requiring more complex theories of conceptual blends (D’Andrade, 1989; Hutchins, 2005). LLM neural activation predicts nearly all the variance in neural responses to sentences in humans (Schrimpf et al., 2021), and surprisal (a probabilistic measure of the amount of new information conveyed by a word) in LLMs predicts reading time in humans (Goodkind and Bicknell, 2018), all suggesting similar mechanisms of neural processing in humans and LLMs. Theory of mind tasks are harder for LLMs in a single step (Trott et al., 2023). However, just as mentalization in humans may require multiple steps of reasoning (System 2 processes; Kahneman, 2011), for example, explicitly prompting oneself to ask why someone might have acted as they did, ToM in LLMs may also improve by learning better multi-step mentalization strategies (Kosinski, 2023; Moghaddam and Honey, 2023; Sahoo et al., 2024).
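For readers unfamiliar with surprisal, the quantity is conventionally defined as the negative log probability of a word given its context, so rarer continuations carry more bits. The sketch below computes it for a made-up next-word distribution; the probabilities and the example context are hypothetical and are not taken from any model or from the studies cited above.

```python
import math


def surprisal_bits(probability: float) -> float:
    """Surprisal in bits: -log2 p(word | context)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


# Hypothetical next-word distribution after the context "The cat sat on the ..."
next_word_probs = {"mat": 0.5, "sofa": 0.25, "moon": 0.125, "<other>": 0.125}

for word, p in next_word_probs.items():
    print(f"{word}: p={p}, surprisal={surprisal_bits(p)} bits")
```

Under this definition a highly predictable word such as "mat" carries 1 bit here, while the less expected "moon" carries 3 bits; it is this per-word quantity that is compared with reading times.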

Furthermore, the same transformer architecture that can model language in an LLM is also able to model the ‘grammar’ of images and the ‘grammar’ of robotic actions (Brohan et al., 2023), suggesting that our brain may implement something akin to a giant multi-modal transformer, predicting the next state or action in a sequence of rich multi-modal contexts. We appear to possess a domain-general generic world model capable of next-step prediction (Clark, 2013). Prior to transformers, it was believed that specific algorithms for internal concept formation, such as the Beta-VAE, were needed (Higgins et al., 2016), but transformers have made these machine learning algorithms largely obsolete, suggesting instead that symbols and concepts arise enactively due to a tight sensorimotor coupling with the world in a continuous sensorimotor loop.

3.3. Minimal neural requirements

Figure 1 shows a minimal set of neural mechanisms needed for our enactive constructivist argument. The figure shows (1) a multi-modal transformer capable of associative learning within a maximum context window, which implements a sensorimotor policy (Vaswani et al., 2017). This is capable of operating in two modes: engaged and neutral. When engaged, it senses and acts in the world. When in neutral, it runs the same process but without explicit motor execution of acts, that is, offline imagination (Hamrick, 2019). (2) An intrinsic motivation for joint attention through gaze following (Mundy, 2018), which is lacking in non-human primates (Tomasello, 2008; Savage-Rumbaugh and Lewin, 1994). (3) A reinforcement learning system capable of sensorimotor policy improvement (Niv, 2009), and (4) a multi-objective homeostatic reward system which implements a handful of intrinsic motivations for innate goals (Keramati and Gutkin, 2014). We aim to explain a plausible historical trajectory to open-ended ER use by means of only the algorithmic functionality above, combined with a set of selective pressures or ‘games’ that this machinery was selected for playing and which were culturally evolved.

The critical principle to appreciate from Figure 1 is that so-called ‘neural representations’ or ‘neurological substrates’ are best seen as neural world models that output predictions or commands at time t+1, conditioned on the sensory state of the world at time t. This tightly coupled intentional arc generates words, images, and actions step by step, rather than ballistically (i.e. in one go, without a sensorimotor feedback loop) outputting a fully formed externalization of something that was internally represented in a fully formed manner at first. What is implemented by the neural substrate is a sensorimotor policy which learns a sensorimotor sequence from sensory state s(t) to motor output m(t+1).
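The contrast between step-by-step and ballistic production can be made concrete with a toy sketch. Everything in it is an assumption introduced for illustration: a one-dimensional world, a fixed `TARGET`, and a single unexpected disturbance at `t == 3`. It is not a model of any neural substrate; it only shows why a policy that maps s(t) to m(t+1) at every step behaves differently from a plan that is fully formed in advance and then executed without feedback.

```python
TARGET = 10  # hypothetical goal position in a one-dimensional world


def policy(sensed_position: int) -> int:
    """Sensorimotor policy: map sensory state s(t) to motor command m(t+1)."""
    if sensed_position < TARGET:
        return 1
    if sensed_position > TARGET:
        return -1
    return 0


def world_step(position: int, motor_command: int, t: int) -> int:
    """Hypothetical world dynamics with one unexpected disturbance at t == 3."""
    disturbance = -4 if t == 3 else 0
    return position + motor_command + disturbance


def run_closed_loop(steps: int) -> int:
    """Sense, act, sense again: each command is conditioned on the current state."""
    position = 0
    for t in range(steps):
        position = world_step(position, policy(position), t)
    return position


def run_ballistic(steps: int) -> int:
    """Form the whole plan from the initial state, then execute it without sensing."""
    plan = [1] * TARGET + [0] * (steps - TARGET)
    position = 0
    for t in range(steps):
        position = world_step(position, plan[t], t)
    return position


print("closed loop final position:", run_closed_loop(20))
print("ballistic final position:", run_ballistic(20))
```

In this toy world the closed-loop agent absorbs the disturbance and still reaches the target, whereas the ballistic plan ends short of it, because nothing in the pre-formed plan can react to what actually happened.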

3.4. Neuroscientific evidence

Evidence from neuroscience can be interpreted in a manner consistent with Figure 1. We focus on the most promising candidates for the neurological substrate of a VLM. These are the associative areas in the brain that are well-positioned to integrate sensory and motor information to learn sensorimotor sequences.

Whilst there is disagreement about the extent to which human brains are allometrically scaled-up versions of non-human primate brains (Herculano-Houzel, 2012), recent evidence shows that a greater cortical surface is allocated to association cortex (Preuss, 2017), with changes in the connectivity of the executive prefrontal cortex (Rilling, 2014) and changes in language areas (Rilling, 2014). Indeed, studies of skulls show enlarged parietal lobes in humans compared to Neanderthals (Pereira-Pedro et al., 2020). The parietal cortex is substantially expanded in the Homo genus (Bruner et al., 2023) compared to hominids 100,000–300,000 years ago (Bruner and Pearson, 2013).

Compared to monkeys, the intraparietal sulcus (IPS) shows more representational complexity for complex 2D and 3D perceptual object concepts (Orban et al., 2006), ranging from simple to abstract (Iriki and Taoka, 2012). Parietal cortex lesions result in tool and constructional apraxias (Bruner et al., 2023), consistent with failures of goal-directed (causal/intentional) complex sensorimotor sequence learning, which involves observation and execution of tool actions (Orban and Caruana, 2014), critical for imitation, and therefore the cultural evolution of tool use (Stout and Hecht, 2017) and possibly more innate abilities such as numerosity (Coolidge and Overmann, 2012). Failures of understanding when and how to execute a sensorimotor sequence, such as finger counting (Andres et al., 2008), may be failures of goal-directed perceptual concept using policies.

When chimpanzees were tasked with copying the configuration of three rectangular blocks arranged as ‘line’, ‘cross-stack’, and ‘arch’, they could not do it. We propose that this failure arises because chimpanzees do not share with the task setter the mutually agreed perceptual concepts just listed (e.g. in the form of words), which would allow them to set appropriate measurable perceptual goals for success (Potì et al., 2009). This is not to deny that non-human primates can undertake goal-directed behaviour, but to say that their goals are largely innate or self-generated. Their ability to acquire new socially learned goals is severely limited compared to language users. In fact, the evidence for copying is (to our knowledge) limited to learning novel policies to execute old goals already possessed by the recipient, not the communication of explicit new goals themselves (Hobaiter and Byrne, 2010; Horner and Whiten, 2005; Van De Waal et al., 2014; Whiten et al., 2009).

As a note of caution against taking neurological properties as evidence for the nativist position, specific cortical enlargements may have both genetic and enactive explanations due to structural plasticity (Schmidt et al., 2021). It has been shown that brain regions enlarge after playing video games (Colom et al., 2012) or navigating as a London taxi driver (Maguire et al., 2000).

Along with others, we believe the right way to think about the brain-behaviour relation is that external representation and tool use bring forth enlargement in neural representations of sensorimotor policies (Malafouris, 2013; Kirsh, 1995; Piazza and Izard, 2009). Even evidence for increased gene expression associated with synaptic transmission and plasticity in human brains, compared to non-human primates, may have a developmental secondary basis (Verendeev and Sherwood, 2017; Preuss et al., 2004). This would appear to have domain-general effects rather than implementing specific modules, for example, for language (Sherwood et al., 2008).

4. Functions of external representations

Donald (1991) has listed several properties of ERs, which he calls exograms, in contrast to the engram (Semon, 1921): unlimited physical media, unconstrained format, permanence, unlimited capacity, unlimited perceptual access, spatial structure, and clear access. The extended mind hypothesis, introduced by Andy Clark and David Chalmers, argues that cognitive processes can sometimes extend beyond the confines of the individual’s brain and include parts of the external world. An example of this is cognitive offloading, where external tools reduce cognitive load, enhance problem-solving, store information, guide behaviour, and externalize thoughts (Clark and Chalmers, 1998).

However, Clark’s extended mind hypothesis remains committed to internal representations. Extensions of cognition in the world are secondary scaffolds that merely support internal thought processes. Theirs is a weakly embodied framework (Aston, 2019). The view presented here is more radical and aligns with the lines proposed by Malafouris, who outlines the ‘representational fallacy [which] pertains to treating material culture as the epiphenomenal product of a representation-processing mechanism located inside the brain’ (Iliopoulos, 2016a, p. 247; Malafouris, 2013). Instead, he proposes material culture plays a constitutive role in the generation of cognition. Colin Renfrew refers to ‘substantialization’ as the means by which concepts are brought forth enactively, which is related to Malafouris’ material signification (Renfrew and Zubrow, 1994).

Ingold goes as far as to be wary of talking about even external representations, taking an enactivist position, saying, ‘Cookings, story-tellings, and whistlings are not representations, they are not traits, indeed they are not objects of any kind; they are rather enactions in the world’, material scaffolds of a sensorimotor policy (Ingold, 2020).

4.1. Extending primate neural world models

Pertaining to the generic world model shown in Figure 1, there is considerable evidence that sensorimotor policies are implemented in the brain and that these can be optimized by associative and reinforcement learning (Miranda et al., 2020; Ha and Schmidhuber, 2018; Russek et al., 2017; Daw et al., 2005; Akam et al., 2015; Wikenheiser and Schoenbaum, 2016). Executive prefrontal systems allow planning, that is, running counterfactual experiments, trying out a variety of different policies in the imagination, and improving them before execution in the world itself. Daniel Dennett calls creatures with this kind of ability ‘Popperian’ (Dennett, 2008; Ha and Schmidhuber, 2018).

However, primate neural model-building machinery has evolved to model events of significance at normal primate behavioural timescales, with data that is directly sensed, for example, in the visible, audible, tactile, and proprioceptive fields (Niv, 2009). ERs extend the neural world model in Figure 1 in two ways: (1) extending the associative time window and (2) making the invisible visible. We consider each in turn.

4.2. Extending the associative time window

There is no evidence that apes can learn new relations of events displaced over long time scales (Clark, 2013). The trace interval (the longest duration for which an association between stimulus and reward can be learned) is only two minutes in chimpanzees (Rosati et al., 2007). Whilst some experiments argue that apes can save tools for future use for up to 14 hours (Mulcahy and Call, 2006), a detailed analysis reveals that there is no evidence for such ‘mental time travel’ in non-human primates in this or any other experiment. The results can be explained by associations being formed at much shorter timescales (Suddendorf and Corballis, 2007).

Evidence that apes can remember associations made over short time scales, many years later, for example, when recognizing human foster parents, is not evidence against the above claim (Lewis et al., 2023). Caching in squirrels and scrub-jays is also not evidence of the ability to learn new relations separated over long time scales. It demonstrates only the ability to learn new relations separated over short time scales, for instance, the location of buried nuts, and an ability to recall them over long time scales (Clayton and Dickinson, 1998).

Whilst the associative windows of neural world modelling algorithms could potentially have slowly been lengthened by genetic evolution in early hominids, there is a faster way: namely, creating ERs. There is evidence that language use can extend neural world models in this way. For example, children’s ability to anticipate future needs and act accordingly develops alongside their narrative abilities (Atance and O’Neill, 2001). Non-human primates do not routinely invent and put things into the world (like words) with the intention of extending their planning and predictive abilities in this way.

The ability to create ERs was a tremendous ‘software upgrade’ to existing neural world modelling ‘hardware’, allowing its application to broader domains. ERs can make visible (or audible, or touchable) messages that stand for topics potentially displaced in space and time. These messages could be modelled at convenient behavioural timescales, and stand in for temporally displaced topics they represent. We perform such ‘imagination improving or aiding’ actions in the world to increase the domain of applicability of our ‘primate’ imagination algorithms.

If we pretend that something we make in the world, for example, a clay object or a word, stands for another object such as the sun, we can model the sun’s slow dynamics at a faster timescale by simply moving the clay object, or by using the word.

Consider this minimal example. Figure 2 shows two events in the world, A and B, displaced by an interval that is too long to be associated given the working temporal context window of neural world models. If a sender constructs, using an ER-making policy, entities in the world, a and b (either a mark or a sound repeated at intervals to prevent forgetting) which represent A and B respectively, and if a persists until b, then the same neural world model can associate a and b. If B can be rendered from b, then B can now be anticipated from A, where this was not possible before.
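The minimal example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the window length `WINDOW`, the event times, and the function names `learnable_pairs` and `can_anticipate` are hypothetical and are not taken from Figure 2. Persistence of a is modelled as the marker being re-made at short intervals.

```python
WINDOW = 2  # hypothetical associative window, in arbitrary time steps


def learnable_pairs(timeline, window=WINDOW):
    """Return (earlier, later) event pairs that fall within the window.

    `timeline` is a list of (time, event) tuples sorted by time.
    """
    pairs = set()
    for i, (t1, e1) in enumerate(timeline):
        for t2, e2 in timeline[i + 1:]:
            if 0 < t2 - t1 <= window:
                pairs.add((e1, e2))
    return pairs


def can_anticipate(pairs, start, target):
    """True if `target` is reachable from `start` by chaining learned pairs."""
    frontier, seen = [start], {start}
    while frontier:
        current = frontier.pop()
        if current == target:
            return True
        for earlier, later in pairs:
            if earlier == current and later not in seen:
                seen.add(later)
                frontier.append(later)
    return False


# Without ERs: A and B are too far apart to be associated directly.
bare = [(0, "A"), (10, "B")]

# With ERs: marker a is made at A and refreshed until b, which precedes B.
with_ers = [(0, "A"), (1, "a"), (3, "a"), (5, "a"), (7, "a"), (9, "b"), (10, "B")]

print(can_anticipate(learnable_pairs(bare), "A", "B"))      # False
print(can_anticipate(learnable_pairs(with_ers), "A", "B"))  # True
```

In the second timeline no direct A-to-B association is ever learned; B becomes predictable from A only through the chain of short associations carried by the markers.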

Once these new ERs have been created in the world and policies developed for their manipulation, these policies can be run offline, that is, imagined (Hills and Butterfill, 2015). In other words, a and b no longer need to be produced in the world but can be simulated by the neural world model and scaffold the association between A and B (Peterson and Rideout, 1998; Nelson, 1993).

4.3. Making the invisible visible

Neural world models cannot model invisible things unless those things can be rendered sensible (Penn et al., 2008). Consider an abstract situation in which ‘naming’ a hidden variable and using it to predict visible states is preferable to only modelling visible variables. When trying to predict the continuation of the sequence 0k0kk0kkk0kkkk0kkkkk0, without knowing a counting sensorimotor policy, it would be necessary to learn a growing number of transition rules such as k → 0kk and kk → 0kkk over visible variables (see Figure 3). One might argue that it is simply possible to notice that each block adds an instance of k. However, this very noticing utilizes the concept of ‘add’, which depends on a concept of number and incrementing that number.

It has been demonstrated that a neural network, even a recurrent neural network, is not capable of generalizing to values of k outside the range that it was trained on (Evans, 2022). It does not invent the concept of a number or the operator ‘add’. In fact, this is closely related to the core criticism Gary Marcus makes of connectionism (Marcus, 2003). Marcus then aims to solve the problem by proposing new neural mechanisms. We, however, propose that the solution lies in external representation invention and use.

If one were to invent an ER-based policy for ‘counting’ by creating a physical instantiation of a variable called ‘number of k’ and learning an ‘increment by one’ policy conditioned on the state of the variable ‘number of k’ and exploiting a set of ordering relations of external number representations, then greater generalization would be possible. This is because the same procedure applies to strings of k of any size.

Using the ‘number of k’ variable and the ‘increment by 1’ operator gives (for large ‘number of k’) a shorter and more general theory for explaining the data than learning each transition rule from ‘number of k’ to ‘number of k + 1’ in isolation. In machine learning, it is known that shorter theories will be selected in model comparison by the automatic Bayes Occam’s razor (MacKay, 1992), as they generalize better.
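The contrast between memorized transition rules and a counting policy can be illustrated with a short Python sketch. It is a toy under stated assumptions, not the experiment reported by Evans (2022): the rule learner is a plain lookup table standing in for a model restricted to visible variables, and the function names are hypothetical.

```python
def blocks(sequence):
    """Split a string such as '0k0kk0' into its runs of k: ['k', 'kk']."""
    return [run for run in sequence.split("0") if run]


def learn_transition_rules(sequence):
    """Memorize one rule per observed pair of consecutive blocks."""
    runs = blocks(sequence)
    return {prev: "0" + nxt for prev, nxt in zip(runs, runs[1:])}


def predict_with_rules(rules, block):
    """Look up the continuation; None if this block was never seen."""
    return rules.get(block)


def predict_with_counting(block):
    """ER-based policy: record 'number of k', increment by one, render."""
    number_of_k = len(block)  # stands in for an external tally
    number_of_k += 1          # the 'increment by one' operator
    return "0" + "k" * number_of_k


training = "0k0kk0kkk0kkkk0kkkkk0"
rules = learn_transition_rules(training)

print(len(rules))                          # 4 separate rules
print(predict_with_rules(rules, "kkkkk"))  # None: outside the trained range
print(predict_with_counting("kkkkk"))      # 0kkkkkk
```

The lookup table needs one new rule for every new block length, whereas the counting policy is a single procedure that applies to strings of k of any size.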

Consider a more natural situation of predicting human behaviours related to ‘anger’. We support a two-system account of a theory of mind in which there are both implicit (Carnapian/behavioural) and explicit (Gricean/ER mediated) mechanisms (Butterfill and Apperly, 2013), which will be outlined below. The causes and the consequences (objective effects) of what we label as anger are often not hidden, for example, body postures, behavioural changes, and facial expressions. These are not unique to humans, sharing brain structures with other primates (Panksepp, 2004; LeDoux, 2000). Indeed, some overt displays of anger may be both old and advantageous from an evolutionary perspective.

However, these behaviourally observable causes and consequences of anger are not anger itself. Neither do feelings of anger available only through introspection constitute anger. Anger is a socially constructed ER for describing an emotion that manifests as an amorphous constellation of feelings and acts. Without being able to classify behaviours and introspections as anger, one can feel and exhibit these behaviours and introspections without knowing one is angry.

To see this is the case, it is sufficient to understand that several emotional states may cause the same body postures, behavioural changes, and facial expressions. Consider increased heart rate and facial flushing which can be caused by anger and by running fast in sport. None of these observable variables has a one-to-one correspondence with anger itself. Furthermore, anger may also exist without any observable consequences, or with the consequence being delayed until much later, for example, in cases of delayed revenge.

Many consequences of anger are not old and advantageous from an evolutionary perspective, for example, blowing the horn on one’s car during road rage or switching off the boiler while someone is having a hot shower. Feelings associated with the words ‘anger’ or ‘thirst’ have been described as intervening variables by Whiten (Whiten, 2014). He intends them to be ontologically real essential mental states. However, we propose that whilst such feelings of anger certainly exist, it is possible to have such feelings without explicitly knowing one is angry.

When one has an explicit theory of mind as opposed to merely an implicit theory of mind, one is capable of utilizing introspection and behavioural observation to classify one’s emotion as anger (Butterfill and Apperly, 2013). Without the capacity to invent and use ERs, one is not able to do this. There is no evidence that non-human primates without external representations have such a belief representation system (Horschler et al., 2020).

As Humphrey has described, by introspection, an individual has access to many feelings due to hormones, etc., which by observing others we do not (Humphrey, 1978), and these can provide extra constraints for inventing and enacting an emotion word such as ‘anger’. Fenici and Garofoli have investigated how linguistic practices were invented by ancient humans to allow understanding of others’ actions in terms of mental reasons (Fenici and Garofoli, 2017). They conjecture that the integration of embodied actions within pantomimic capacities scaffolded the cultural construction of propositional structures and narrative practices, eventually resulting in the culturally diverse practices of folk psychology we observe today (Hutto, 2012).

For example, children’s understanding of verbs describing mental processes is initially context-specific and rather naive, only later becoming more general across contexts. It is only in humans that these felt states have been enacted by gesture or vocalization and refined in a community to more precisely refer to initially vague feelings common to the vast majority of speakers.

It is this critical process of enactment and community refinement of what was previously nothing more than a private feeling in an individual and a constellation of behaviours that constitutes the construction of an emotion word such as ‘anger’.

Whilst neither the feelings nor effects of anger are culturally constructed, the ER for anger is culturally constructed, and having it gives us certain extra abilities. Our ability to label anger permits a greater awareness of it in ourselves, which is the basis of mentalization-based therapy (MBT) for personality disorder, helping the patient to explicitly verbalize their emotions, allowing them to reason about their emotions and those of others more effectively (Allen and Fonagy, 2006).

Typically, we learn to label our emotions in primary childhood attachment relationships (Fonagy and Campbell, 2016). Just because the ER for anger is a culturally constructed concept does not mean it is entirely arbitrary and not tied to essential (native) attractors in animal behaviour dynamics. Coming up with the term ‘anger’ is to cut nature at its joints. This is why the term has caught on and been so useful.

Without the term existing, we do not deny it is possible for an individual to discover the structure present in anger’s causes and consequences by observing a large number of instances of causes and consequences of anger. But here is the problem with that implicit approach (Butterfill and Apperly, 2013). Suppose there are x anger causes and y anger consequences, you would need to learn x * y ‘rules’ without positing the anger hidden variable. However, by entertaining an anger variable which can take states 0 or 1 (angry or not angry), you only need x + y ‘rules’ for predicting the possible consequences of anger from its causes.
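The x * y versus x + y argument can be made concrete with a small Python sketch. The particular causes and consequences listed are hypothetical placeholders chosen for illustration, and `predicted_consequences` is an assumed helper name.

```python
from itertools import product

# Hypothetical examples of visible causes and consequences.
causes = ["insult", "theft", "blocked goal"]
consequences = ["shouting", "frowning", "withdrawal", "delayed revenge"]

# Implicit approach: one rule for every cause-consequence pair.
implicit_rules = set(product(causes, consequences))

# Explicit approach: each cause points to the hidden variable 'anger',
# and 'anger' points to each consequence.
explicit_rules = {(c, "anger") for c in causes} | {("anger", e) for e in consequences}


def predicted_consequences(cause, rules):
    """Follow rules one or two steps from a cause and keep visible outcomes."""
    direct = {later for earlier, later in rules if earlier == cause}
    via_hidden = {later for mid in direct for earlier, later in rules if earlier == mid}
    return (direct | via_hidden) & set(consequences)


print(len(implicit_rules))  # 12, that is x * y
print(len(explicit_rules))  # 7, that is x + y
```

Both rule sets yield the same predictions for every cause, but the saving from the hidden variable grows as x and y grow.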

In short, making up intervening hidden variables and using them in neural world models simplifies neural world models and increases their power (Evans, 2022; Whiten, 2013, 2014). The use of x * y rules about visible behaviours (instead of x + y rules) is how the implicit minimal theory of mind referred to by Butterfill and Apperly (2013) operates; it does not require ERs about invisible states such as ‘anger’ to be invented and used for prediction.

There is no evidence that chimpanzees understand false beliefs in cases where more than a minimal or implicit ‘theory of mind’ is needed, that is, where what is required is more than associating visible or audible events in the sensory field (Call and Tomasello, 2008; Butterfill and Apperly, 2013). De Villiers and de Villiers show that human children typically understand false beliefs explicitly only when they can use ERs like ‘John thinks that the toy is in the box’ (de Villiers and de Villiers, 2000), although there is controversy on this topic (Westra, 2017).

The evidence points to a distinct implicit minimal theory of mind process capable of entertaining belief-like states shared with infants and non-human primates. This process is surprisingly capable and depends on a subtle evaluation of perceived cues. It is augmented by a much more sophisticated explicit theory of mind system that comes with ER use (Moore, 2021). It is only when mental states are talked about or otherwise represented that they can be explicitly modelled (Hutto, 2012). Once emotions and beliefs could be named, behaviour could be more efficiently modelled and predicted (Chater, 2018).

David Kirsh, in his article ‘Thinking with External Representations’, shares our view that ERs ‘allow us to think the previously unthinkable’, enabling new operations, changing the domain and range of cognition, and priming a constellation of associations by making things visible or audible. This allows the formation of different and more extensive associations with the real thing being represented, externalizes structures allowing communication with the self, and allows refinement by a community of users. This process produces shared references and learning by their construction (Kirsh, 2010).

However, Kirsh and others still believe that these external representations are not primary but augment symbolic thought processes that originate internally (Tversky, 2014). Other examples of making the invisible visible involve externally representing entities such as atoms, gravity, germs, electromagnetism, and genes.

4.4. Transfer of mastery from convenient to inconvenient domains

We view mastery as consisting of an effective (error-correcting) sensorimotor policy to achieve a goal. In many cases of material construction, such as during stone knapping, the maker’s sensitivity to the perceived and task-relevant features of the stone influences their actions. Actions are not merely ballistic executions of a pre-conceived internal specification (Tennie et al., 2017; Ingold, 2013). Instead, the properties of the stone must be discovered, with each stroke revealing a new property of the medium (Malafouris, 2013).

Contrast this to 3D printing, where the physical printer is designed to exteriorize a perfect internal representation of the object present in the computer. With knapping, the final form of the object arises through a tight coupling of the sensorimotor contingencies of the knapper with the stone’s own properties (Malafouris, 2013), achieving in a skilled knapper ‘maximum grip’ and a ‘tight intentional arc’ (Merleau-Ponty, 2013).

This sensorimotor policy is the kind of thing that the transformer in Figure 1 implements. When Rodney Brooks says the world is its own best model, he means that by learning perceptually contingent motor actions, mastery can be achieved without needing ‘internal representations’ (Brooks, 1991).

How can mastery in one convenient domain of competence be exapted (re-used) for mastery in a more inconvenient and complex domain? Whilst Hutchins and Fauconnier speak of internal mental representations (concepts/meaning) being projected onto external material structures, with the association of conceptual structure with material structure resulting in a ‘conceptual blend’ (Hutchins, 2005; Fauconnier, 1997), our enactive perspective eschews internal representation talk.

Whilst Hutchins goes a step further than Fauconnier in proposing the material anchor, he still makes use of mental conceptual structure, which we do not need in our formulation. We deny that associations can be made between anything that is not directly sensible. We do not need to speak of conceptual structures or conceptual blends to understand the examples Hutchins gives. A queue is, according to Hutchins, a material structure blended with a conceptual structure (the notion of sequential order).

A simpler way to understand a queue is that it is a set of material elements which can be operated on by a sensorimotor policy to achieve certain goals. For example, FIFO (first in first out) or LIFO (last in first out) queues consist of being able to implement the appropriate policy for establishing sequential order and executing popping and pushing actions to the queue. The concept of sequential order is itself dependent on lower-level sensorimotor policies (operations/procedures) for determining the ordering of two elements. In our terms, the ‘material anchor’ is the physical (data) structure that provides the substrate for the sensorimotor policy implemented by the brain.
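As a minimal sketch of this reading of a queue, the following Python fragment treats the line of people as a material (data) structure and FIFO or LIFO as two policies operating on it. The function name `serve` and the example names are hypothetical.

```python
from collections import deque


def serve(arrivals, policy):
    """Push each arrival onto the line, then pop until the line is empty."""
    line = deque()
    for person in arrivals:
        line.append(person)  # pushing: join at the back of the line
    served = []
    while line:
        if policy == "FIFO":
            served.append(line.popleft())  # pop from the front
        elif policy == "LIFO":
            served.append(line.pop())      # pop from the back
        else:
            raise ValueError(f"unknown policy: {policy}")
    return served


print(serve(["Ann", "Bo", "Cy"], "FIFO"))  # ['Ann', 'Bo', 'Cy']
print(serve(["Ann", "Bo", "Cy"], "LIFO"))  # ['Cy', 'Bo', 'Ann']
```

The same physical arrangement supports either ordering; what differs is only the policy applied to it.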

If the properties of the unfamiliar elements have relations that are shared with the familiar elements, then transfer from the mastered to the unmastered may be even easier (Evans, 2022). For example, clay can be easily divided into objects, as can the things it models, such as bushels of wheat. This allows easy-to-implement sensorimotor policies on the clay to be applicable to more complex policies on the wheat. Dividing bushels of wheat may take longer than the associative time window of the VLM in Figure 1, whereas dividing clay, which represents the bushels of wheat, is much faster (Schmandt-Besserat, 1992; Iliopoulos, 2016b).

ERs allow analogy and relational reasoning by exposing systematic relational correspondences, for example, that the sun is to the solar system what the nucleus is to the atom, allowing physical and cognitive competence in one domain to be transferred to another (Gentner, 2003).

A slightly more complex example is the method of loci used to form a memory palace (Hutchins, 2005). This works because if a policy already exists for traversing a set of familiar elements in sequential order, and if unfamiliar elements can be associated with each of the familiar elements, then that new set of unfamiliar elements can be recalled without needing to form new associations between the unfamiliar elements directly. If making N new associations of unfamiliar elements with N familiar elements is easier than forming N-1 associations between unfamiliar elements, then it follows that the use of a memory palace will aid in learning a new order of unfamiliar elements.
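The logic of the memory palace can be sketched as follows. This is an illustrative toy, assuming a hypothetical familiar route and hypothetical item names; it shows only that N locus-item associations plus an existing traversal policy suffice to recover the order of the items.

```python
# A familiar route whose traversal order is already mastered (hypothetical).
route = ["front door", "hallway", "kitchen", "stairs"]


def memorize(items, route):
    """Attach each unfamiliar item to one familiar locus, in route order."""
    if len(items) > len(route):
        raise ValueError("not enough loci for the items")
    return dict(zip(route, items))  # N locus-item associations


def recall(palace, route):
    """Walk the familiar route and read off whatever is stored at each locus."""
    return [palace[locus] for locus in route if locus in palace]


items = ["quartz", "ochre", "flint"]
palace = memorize(items, route)
print(recall(palace, route))  # ['quartz', 'ochre', 'flint']
```

No association between one unfamiliar item and the next is ever stored; the order is carried entirely by the familiar route.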

One step further from the queue and the memory palace, Overmann has described how material policies for externally representing simple number concepts like ordinal sequence can use body-counting, of which finger counting is only one form, with later culturally constructed forms being, for instance, the number line (Overmann, 2018). Overmann has analysed in detail the cultural trajectories that increasingly sophisticated material forms and associated policies to operate on these material forms have taken (Overmann, 2018).

Applied mathematics contains countless examples of how various kinds of notation (external representations) culturally evolved to permit mappings from mastered domains to less familiar domains, helping us to think (Overmann, 2023).

In problem-solving domains, drawings can help us solve insight learning tasks such as the 9-dot problem better (Lewis-Williams, 2002; Öllinger et al., 2014; Spiridonov et al., 2019). Creating the highly ritualized ERs of mass and force and the mathematics that operate on equations permits the prediction of the motion of many objects (Wigner, 1990). The visuospatial reasoning engendered by Venn diagrams, the periodic table, and Feynman diagrams helps make sense of other aspects of experience that are difficult or impossible to understand without external props that can be conveniently manipulated (Tversky, 2005).

In language, metaphor is the means by which this transfer is achieved. Competence with policies for balancing on a seesaw can be transferred to understanding ‘balance’ in a financial or mental health context. See also Lakoff and Johnson (2008) for a thorough and detailed analysis of the grounding of conceptual terms in concrete experience. For example, the container schema, which has the structure of interior, boundary, and exterior, can be transferred to more abstract domains (Malafouris, 2013). Semantic extensions of perception verbs to conceptual verbs, like from ‘see’ to ‘know’ and from ‘hear’ to ‘understand’, have been observed cross-culturally (Evans and Wilkins, 2000).

4.5. Other functions of external representations

External representations permit new kinds of categorical judgement that would not have been possible without them. Humans can categorize on the basis of cross-dimensional relations, that is, same/different on only some properties of multiple objects, whereas non-human primates cannot ignore more predispositionally salient properties in order to focus on less salient ones for the purposes of classification (Christie et al., 2007; Gentner and Colhoun, 2010; Thompson and Oden, 2000; Overmann, 2021). This is consistent with the fact that humans are capable of inventing a salient ER which is about a non-salient property, and then using that ER as the new salient sensible entity to base the categorization on.

Evidence for this is that apes can be trained to identify relations-between-relations only if they are first trained to use a symbol system by which propositional representations can be encoded and manipulated (Thompson and Oden, 2000).

In a revealing pair of recent articles, it is clear that both Gopnik and Povinelli share a nativist belief that higher order concepts such as ‘same’ and ‘different’ are implemented by culture-independent neural mechanisms rather than being constructed through socio-cultural agreement (Walker and Gopnik, 2017; Glorioso et al., 2021). This is revealed when Povinelli criticizes Walker and Gopnik’s (2017) experimental procedure for detecting whether 18-month-old children have a concept of same and different. The basis of their criticism is that the task is solvable by the detection of perceptual variability between or among stimuli.

It is our position that ultimately all cases of ‘same’/‘different’ can only be determined by detection of perceptual variability between or among stimuli. Where ERs such as ‘same’ and ‘different’ can be communicated and policies agreed upon to implement their meaning, it is possible for agents to reach agreement about the procedures which should be used for determining whether two objects are ‘same’ or ‘different’ in a particular context. For example, if I ask you whether two objects are both mine or one is mine and one is yours, thus, in the class ‘same owner’ or ‘different owner’, then I assume you can execute a policy to solve this problem which involves attaching an ER ‘mine’ or ‘yours’ to each object, and then applying perceptual entropy to the labels ‘mine-mine’ and ‘mine-yours’ (mine-yours having higher perceptual entropy).
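The ownership example can be sketched in Python. The text does not define ‘perceptual entropy’ formally, so this sketch assumes, purely for illustration, that it can be stood in for by the Shannon entropy of the attached labels; the function names are hypothetical.

```python
from collections import Counter
from math import log2


def label_entropy(labels):
    """Shannon entropy (in bits) of a collection of attached labels."""
    counts = Counter(labels)
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in counts.values())


def same_owner(labels):
    """Agreed policy: objects have the 'same owner' when label entropy is zero."""
    return label_entropy(labels) == 0


print(label_entropy(["mine", "mine"]))   # 0.0
print(label_entropy(["mine", "yours"]))  # 1.0
print(same_owner(["mine", "yours"]))     # False
```

On this reading the ‘same’/‘different’ judgement is made over the perceptible labels rather than over the objects, whose visible properties may give no clue to ownership.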

As neither animals nor 18-month-old infants reach such agreements on the meanings of same/different in a given context, it is unreasonable to expect experiments to determine that they have this concept. Such an experiment can only show what perceptual predispositions an animal has, prior to a concept of same/different being agreed upon, if such an agreement is indeed possible.

Language is the prime example of a set of ERs. It permits propositions to be made, such as ‘The sky is blue’, which, along with a policy for testing the truth or falsity of the proposition, creates a basis for the accumulation of factual knowledge. It permits intentional stances to be taken, that is, to represent the goals, desires, thoughts, and beliefs of individuals. This is used for ‘reason talk’ in folk psychology in its myriad forms around the world. Such talk is used to explain, predict, and coordinate ourselves (Dennett, 2008; Heyes, 2018; Hutto, 2012).

Folk psychology concepts are not literal descriptions of content-bearing internal states, but convenient fictions (Moore, 2021). If two people behave differently given exactly the same stimulus, then an ER about a hidden (mental) state can explain the difference (Moore, 2021).

Using ERs, it is possible to explicitly instruct a recipient to use skill X for task Y, rather than it being necessary for a specialized neural transfer mechanism to discover this (Bengio et al., 2013; Pan and Yang, 2010; Weiss et al., 2016). Clear examples of language-based transfer come from prompt engineering of large language models. By choosing the right text prompt, an LLM can be made to carry out novel reasoning tasks in a zero-shot manner (Jiang et al., 2019; Wei et al., 2022).

Human ER use gives us the capacity for transfer learning by explicit instruction, whereas other primates are limited to more cumbersome forms of imitation with random errors (Tomasello et al., 1993). This ability to see, describe, and instruct others in a cumulative fashion (Dean et al., 2014) about the relations between one task and another may have allowed technology to blossom by creating a kind of global workspace in the world itself, rather than primarily in the brain as suggested by more nativist theories, which we reject (Mithen, 1998).

ERs allow remarkable intersubjective plasticity, that is, for ‘diverse collectives of individuals and groups to adopt and transition between numerous social identities [roles] and behaviours with rapidity and flexibility’ (Aston, 2019, p. 65). Barrett et al. emphasize the creativity of constructed narrative intelligence in creating complex social spaces (Barrett et al., 2007), rather than the opposite causality proposed in the social brain hypothesis, which suggests that we genetically evolved big brains because of selection for larger group sizes (Dunbar, 1998).

Linguistic and other material scaffolds, such as apps and clothing, allow human socio-technical systems to emerge. External representations can be seen as affecting social and cognitive niche construction. Ironically, this has been made clear by an article which tries to prevent us from inappropriately anthropomorphizing LLMs. It argues that large language models can be thought of as engaging in role-play, being easily prompted to take on various roles (Shanahan et al., 2023). However, the very same properties of language that make this possible for LLMs make it possible for us to be prompted into occupying a diversity of roles (specify a diversity of games for us to play). What distinguishes us from LLMs is that we have a multidimensional reward system and modify our LLM by reinforcement learning during engagement in environments with other agents.

5. From bush reading to mind reading

We now address the question of what prompted us to start making external representations in the first place. One of the main problems to solve is how ostensive-inferential (Gricean) communication can arise without paradoxes (Moore, 2017). A necessary but not sufficient component of Gricean communication is that the recipient can entertain things like the communicative intents (goals) of the sender. A goal is not directly sensible. For the recipient to entertain it, we claim that a sender would have needed to name these invisible mental states, that is, turn them into a signal which the recipient understood as meaning an intent (goal) of the sender. How can this have happened?

Most of us with urban lifestyles, especially in the West, are unaware that we are almost completely blind to the complex goings-on in the natural world. When we go for a walk in the woods or a meadow, we miss practically all the meaning in the things that are happening around us (Brown Jr, 1986). Bush reading provides an intermediate situation where names need not be created for hidden states because, in bush reading, contextualized cues form a myriad of natural signs or indexes, from which their causes can be inferred based on a very broad and complex context (Gell et al., 1998). Bush reading requires learning to understand (and also actively produce for teaching purposes) complex denotation systems (Willats, 2006).

We argue that selection for bush reading pre-adapted humans to put effort into developing ER technologies (such as the policy for making and agreeing on the names of things) for inferring the hidden causes and consequences of visible contextualized cues/signals.

The step to be made from bush reading to mind-reading was for a sender to invent a name for a felt mental state, for example, to enact an intermediate variable (Whiten, 2013). If this strange creative act by the sender is done, then the same associative machinery used in bush reading can be used by the recipient to associate an ER, such as the word ‘sad’, with its underlying feeling, its causes, and its consequences, see Figure 4. In that figure, dark clouds and the word ‘sad’ both refer to hidden causes and consequences of the cue/signal. For example, dark clouds might mean it will rain, and the utterance ‘sad’ might mean the person will need comforting, both depending on context.

Communicative intentions, such as wanting the consequences of one’s sadness attended to, can best be anticipated and understood by speaking them, that is, by naturalizing them. Modifying the example from Bar-On and Moore (2017), the word ‘sadness’ is to the currently invisible causes and consequences of sadness what a dark cloud is to the possibility of rain.

There is evidence that Homo sapiens, and perhaps Homo erectus, were strongly selected for causal inference: detecting things in the world that were natural signs (indexes) (Gell et al., 1998) of other events that were (often but not always) displaced in space and time (Sobel et al., 2004), to a far greater extent than other primates (Deacon, 1997). There is no evidence that chimpanzees follow trails or deliberately create them. Trail use is not acknowledged in the literature (Green et al., 2020), and Michael Tomasello (personal communication) suggests the same: ‘Very few primates eat meat at all, so there is no tracking. Chimpanzees do eat small game, but it is basically always when they come upon them and see them visually - or sometimes if they hear them, they go to them and chase them. But I know of nothing suggesting that they track game based on the direction of footprints or the like’.

We can find no evidence that non-human animals track footprints to find invisible prey without using smell (Boesch and Boesch-Achermann, 2000; Janson and Byrne, 2007). The use of smell is quite different from visual tracking because smell becomes more intense as the tracked animal gets closer, whereas the footprint and other bush craft indexes do not need to resemble the animal itself in the direct way smell does.

After the origin of bipedalism around 6 MYA (Demenocal, 1995), hominids were exposed to seasonal mosaic environments (Domínguez-Rodrigo, 2014). Variability selection theory proposes that hominids needed to adapt to rapid climatic oscillation from 1.7 MYA to 1.0 MYA, with shrinking forests and woodlands and expanding grasslands, which placed pressure on the hominin niche (Potts, 1998b; Wynn et al., 2021). Increasingly variable, severe, and risky glacial/interglacial cycles over the past 800,000 years and more dramatic short-term climatic events, such as the volcanic winter arising from the Mount Toba supervolcano explosion, produced strong selection pressure resulting in severe human population bottlenecks (Ambrose, 1998).

Under marginal arid conditions with less vegetation, there would have been strong selection pressure for tracking: reading animal tracks, scat, broken branches, or disturbed vegetation to infer the recent presence, type, size, age, health, and possible future movement of game (or predators) (Liebenberg, 1990). Tracking would also have been easier in open country. Plant foraging, that is, using clues to where certain plants grow, for example, insect presence or bird calls (e.g. near water sources), would have become more valuable, as would have been indicators of nearby fresh water sources. To follow sparser food sources, navigation would have been needed over longer distances, using the position of the sun, stars, moss growth, and tree bending as indicators of cardinal direction or location.

There are conditions in which making inferences from signs about hidden topics is very helpful. For example, by seeing lots of nettles, clover, foxgloves, thistles, or poppies, one can infer that some kind of human habitation or disturbance was likely to have been at this site. There is no formal logical reason for there to be human habitation given nettles (e.g. the nettles could be generated for other reasons), and there is no arbitrary symbolic linguistic relation between human habitation and nettles either (unless one is agreed on previously), and there is no resemblance between nettles and human habitation. Instead, nettles are demonstrative of human habitation when found in a certain context. In Grice’s terms, the nettle has ‘natural meaning’ (Grice, 2020) or is an index (Gell et al., 1998).

Bush reading is replete with skills for inferring hidden events from their visible effects or antecedents. Examples include inferring hidden eggs or moving game from footprints; a future storm from cloud formations (cirrus followed by cirrostratus indicating a warm front and rain) and wind direction (using the cross-winds method); and direction from the signs of the prevailing wind, for instance the ‘wedge effect’ on growth in a line of trees, the shape of trees, the asymmetric distribution of tree roots, the moss growing on bark, and the alignment of lettuce leaves north-south when exposed to lots of sun (Gooley, 2010).

‘The Inuit know that when purple saxifrage is in flower, the reindeer are in calf’. Precise timing of recent disturbances less than fifteen minutes old can be read from the reddening and then blackening of brittlegill (Gooley, 2010). Many rhymes have been invented to represent these facts, for example, ‘When the glow-worm lights her lamp, the air is always damp’, or ‘sea-gull, sea-gull sit on the sand, it’s never good weather when you’re on the land’ (Gooley, 2012). Finding animals requires understanding their tracks, scat, slough, remains, refuse, dens, and odour (Canterbury, 2015). From the tracks, many things can be inferred, such as behaviour, the age of the animal, when the track was made, and the direction of the animal (Liebenberg et al., 2010; Brown, 1983).

We became adept at seeing references to things (that did not necessarily resemble those things to which they referred) that were not immediately there, that is, the events which represented their hidden causes. The most thorough analysis of tracking and its relation to cognition is by Kim Shaw-Williams, who has proposed the social trackways theory. According to this theory, tracks of humans (and other animals) record narratives of what their authors were up to in the past, and these were constructed with communicative intent. Tracks are durable, and unlike mental states, sometimes you can see the tracks being made. Through association alone, one can observe the complex ‘grammatical’ relations between tracks and the track-maker’s behaviour and intention. Tracks provide a highly rich and ready-made representational narrative (Shaw-Williams, 2017).

In their article ‘Learning to induce causal structure’, Ke et al. (2022) show that if an LLM, as shown in Figure 1, is trained to associate paired ERs of a causal graph with data from a structure in the world that is well described by that causal graph, then it can generalize to infer novel causal structures in the world from that data. Therefore, if the children in Gopnik’s causal inference experiments (Kushnir et al., 2010; Gopnik et al., 2004; Seiver et al., 2013) can be shown to have been trained with causal model language describing the generative system responsible for an experience, associated with the observations (and interventions) from that experience, then an architecture no more complex than Figure 1 is sufficient to explain their causal inference ability. No other special neural causal inference machinery is needed.

Once we became good at using general indexes that did not resemble the things they represented, for understanding those things, hidden and not-hidden, it was a small step to being able to make our own references to topics, hidden and not-hidden, to manipulate ‘recipients’ who had this index understanding ability, in what we call ‘art games’. Gell has described this process as the basis of art (Gell et al., 1998). Iliopoulos writes, ‘in deliberately manipulating material signs, individuals would have exercised their significative beliefs about them, allowing others to observe and assimilate their habitualized dispositions towards things taken to share certain qualities or attributes’ (Iliopoulos, 2016a) p257.

But how could the step be made from passive observation of messages to the making of messages? Intentional senders cannot have come first unless they thought they had something to send to. This problem is related to the ‘paradox’ in the origin of animal communication (Krebs and Davies, 2009), which has two possible origins. The first is that a signal arises from the receiver’s exploitation of a pre-existing action in a sender, which then becomes ritualized. This has a material analogy in early artistic processes whereby found objects with an initial resemblance to a prototype were modified by a ‘receiver’ to look more like the thing the receiver thought they should look like (e.g. the Makapansgat pebble (Bednarik, 1998)).

The second way a signal can arise is if the sender exploits an existing perceptual property of the receiver. Consider how an orchid may mimic a female bee to such an extent that the male bee will choose it in preference to a real female bee (Scott-Phillips, 2015). It is this second process, in which the sender manipulates the bush-reading-driven causal inference skills of the receiver, that we suggest is the route to ERs about events displaced in space and time. Teaching tracking is also an active process involving making tracks yourself, such as stalking each other (Brown Jr, 1986). Teaching that X is about T for a natural sign is no different from teaching that X is about T for an artificial sign. For example, teaching that a raised rim of soil indicates where a turtle has buried itself is analogous to teaching the meaning of a word.

6. A space of external representation making games

We describe a space of ER creation games, where in the extreme corners lie science games, language games, and art games (see Figure 5).

6.1. The science game

In the science game, visible topics are named, and hidden states are named to explain visible topics (Evans, 2022). Policies are learned for inferring the hidden meanings of indices (policies for observation and experiment). This is done during the practice of bush reading (Liebenberg, 1990), where an explanatory model is made to predict and understand events displaced in space and time on the basis of visible evidence. No learning takes place in the sender as there is no intentional sender at all. The sign is natural. All the learning takes place in the receiver, who comes to infer the meaning of those natural signs and make sense of the causes of the observed variables.

There may be a rich syntactical structure in the visible variables which implies a complex semantics about the physical and biological processes that constructed them. In the face of unpredictable environments, highly structured, rigid, repetitive, and redundant rituals are likely to develop (Hobson et al., 2018). An interesting unconscious consequence of such rituals might have been to ‘keep all else equal’; an important principle in scientific experimentation, which may have allowed more effective observations of causally relevant variables (Demmrich, 2023).

6.2. The language game

In language games, both the sender and receiver learn (Számadó and Szathmáry, 2006; Steels, 1999; Wang et al., 2018). By language game, we mean any game that involves agreement about conventional forms of communication by both sender and receiver. This may include the ‘language’ of visual art or mathematics as well as natural language. Our classification is based on distinct functional dynamics of learning processes rather than the entities that arise from such processes. The classic example of a language game is a Lewis signalling game in which two agents must agree on a communication protocol to solve a cooperative task (Rita et al., 2022; Kirby and Hurford, 2002; Wittgenstein, 1953). This can result in the co-creation of messages that are entirely arbitrary, that is, purely symbolic, not just indexical, through shared attentional procedures such as pointing (Steels, 1999; Tomasello, 2008).
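The Lewis signalling game mentioned above can be illustrated with a minimal sketch in which both sender and receiver learn. The learning rule used here (Roth-Erev style urn reinforcement), the function names `play_lewis_game` and `success_rate`, and all parameter values are assumptions chosen for illustration; they are not taken from the cited studies.

```python
import random


def play_lewis_game(n_states=2, n_rounds=5000, seed=0):
    """Train a sender and a receiver with urn-style reinforcement (assumed rule)."""
    rng = random.Random(seed)
    # sender[state][signal] and receiver[signal][action] are urn weights.
    sender = [[1.0] * n_states for _ in range(n_states)]
    receiver = [[1.0] * n_states for _ in range(n_states)]
    for _ in range(n_rounds):
        state = rng.randrange(n_states)  # topic seen only by the sender
        signal = rng.choices(range(n_states), weights=sender[state])[0]
        action = rng.choices(range(n_states), weights=receiver[signal])[0]
        if action == state:
            # Cooperative success: both parties reinforce the choice they made.
            sender[state][signal] += 1.0
            receiver[signal][action] += 1.0
    return sender, receiver


def success_rate(sender, receiver, n_trials=2000, seed=1):
    """Estimate how often the receiver's action matches the sender's state."""
    rng = random.Random(seed)
    n = len(sender)
    hits = 0
    for _ in range(n_trials):
        state = rng.randrange(n)
        signal = rng.choices(range(n), weights=sender[state])[0]
        action = rng.choices(range(n), weights=receiver[signal])[0]
        hits += action == state
    return hits / n_trials


if __name__ == "__main__":
    trained_sender, trained_receiver = play_lewis_game()
    print(success_rate(trained_sender, trained_receiver))
```

With two states, this kind of reinforcement typically settles on a code in which each state has its own signal. Which signal ends up paired with which state is arbitrary, which is the sense in which the resulting messages are conventional rather than indexical.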

Increasing abstraction is a typical finding over many iterations of transmission and has been demonstrated experimentally in human communication tasks of this kind. Various perceptual priors determine whether complete arbitrariness of sign evolves (Hawkins et al., 2021; Fan et al., 2020; Fernando et al., 2020). Overmann has analysed increasing abstraction in the mathematical domain, where such ‘language’ game dynamics also operate (Overmann, 2023). There are many variants of such games, only some of which are Gricean.

In this corner are also non-natural language convention creation systems and processes, each with different success criteria and methods for valid and invalid communication about topics, such as refutability (Popper, 2005), clarity, freedom from contradiction, and unambiguousness of reference (Cassirer et al., 1955). How to use traffic lights, lighthouses, smoke signals, and ostrich eggshell beads used by !Kung San hunter-gatherers of the Kalahari can be taught (Wiessner, 1982), or transmitted by a process of cultural inheritance (Heyes, 2018). All these existing conventional systems are likely to have adaptations for their transmission (Csibra and Gergely, 2009; Heyes, 2018), such as exploiting mechanisms of selective social learning, imitation, mind-reading, or natural language.

There can also be resistance to transmission. For example, in the realm of visual art, a ‘primitive’ artist refuses to accept such conventions (Berger, 2009), while an outsider artist does not know them (Ferrier, 1998).

6.3. The art game

We define an art game as any communication game in which the recipient is not trained and does not need to learn to be appropriately acted on by the artwork. In an art game, the sender learns a new way to modify the recipient’s experience by producing some message (ER) that permits zero-shot generalization by the recipient to new tasks, that is, the recipient should just ‘get it’ without having to try.

In Figure 5, on the left, some bird footprints are shown. If I know that you already have a competence for following these to find eggs, then I can play an art game and learn to produce abstractions of these footprints (arrows), which you immediately recognize as meaning that you should follow them, without requiring novel conventional agreement. Eventually, the arrow acquires meaning independently of the original context in which it was invented. The diagram shows an arrow being used to signal to aliens the route of the Pioneer spacecraft out of the solar system from Earth. Whether aliens will understand this external representation is a moot point.

This is a universal and functional definition of art as a discovery process by the sender. It is an algorithmic formal description of a communication process not requiring conventional agreement with a recipient. It defines a natural kind of art that could happen anywhere in the universe (Hacking, 1991). Our definition does not need to cohere with particular Western biases that pervade institutional, representational, and expressive theories of art (Bloch, 2014; Carroll and Carroll, 1999; Zeki et al., 1999; Arnheim, 1954; Gombrich, 1961; Hague, 1990; Collingwood, 2016). For us, an art game is an extreme case of a communication game. Many instances of art as defined by current theories of art may not be produced as purely the result of art games as we define them here. They may involve conventional agreements with the recipient, that is, contain elements of language games. Our definition is a definition of an art process ‘from the outside’ rather than ‘from the inside’, as Bloch puts it (Bloch, 2016).

Why define this strange kind of game? Because an art game has a power that language games do not. Evolution plays art games (as we define them) regularly; the result being mimicry by the creation of Carnapian ERs. An example is how butterfly eyespots may deter predators or deflect attacks (Kodandaramaiah, 2011). This does not require agreement from the predator. In fact, even if a butterfly could speak to a bird, there is nothing it could say to stop itself from being eaten. Language games are of no value in this setting.

Artists exploit their knowledge of the recipient’s perceptual priors to construct novel external representations that achieve the functional goal they desire, without agreement from the recipient. For example, performances (rituals) or images may be invented that exploit the recipient’s innate preferences (Dissanayake, 1992, 1998). However, in the case of animal mimicry, it is not the animals themselves who are the agents that produced the ER; it is the evolutionary process that has played the art game. This is an art made with pre-Gricean pragmatics.

In the domain of sound, an example of an art game would be the creation of an onomatopoeia (Davis, 2022), literally the ‘giving of a name’ as the Greeks speculated was the origin of language (Rastall, 2021). The onomatopoeia is an example of an art game rather than a language game because the recipient doesn’t have to do any work but can infer the meaning of the sound through direct resemblance alone. The artist, in Alva Noë’s terms, has reorganized the receiver’s experience of a topic to help communicate about it in a new way (Noë, 2015, 2023).

Non-human primates cannot be trained to draw depictions (Savage-Rumbaugh and Lewin, 1994; Tomasello, 2008; Oxford, 1962; Seghers, 2014), whereas human children are easily motivated to exploit perceptual priors to discover effective external representations that influence a recipient. They invent ‘graphic equivalents’ for the objects that they want to draw (Kellogg and O’Dell, 1967; Arnheim, 1954; Winner, 2006). Parts of a picture (picture primitives) are used to refer to parts of the world by denotation systems (Willats, 2006), and relations between picture primitives are used to refer to relations between parts of the world by drawing systems. Both systems depend on what the child is trying to achieve (Tomasello, 2009).

Children do not typically aim to produce a viewer-centred photorealistic image whose final aim is mimicry from a single viewpoint. Instead, they often seek a description of things independently of any particular point of view (Marr, 2010). Similarly, the topic in many oil paintings is the object experienced over extended time and space (Elkins, 1997).

In the denotation system, points, lines, and regions are used to represent objects in the world. Initially, children use regions to represent volumes, but later learn that lines can be used to represent the edges of shapes much better. This involves rejecting previously used rules. Over time, the mismatch between picture production and picture perception spurs children to find new solutions, for example, to communicate spatial order, enclosure, join relations, spatial direction, and relative size. We will see later that similar depictive goals and trajectories of discovery are apparent in prehistoric art (Froese, 2019).

7. An evolutionary trajectory to fully Gricean external representations

We argued in the previous sections that the neural machinery sketched in Figure 1 is sufficient to explain all human ER invention and use. Here, we consider a genetic evolutionary trajectory to that machinery, as well as a cultural evolution-based trajectory through the pragmatic space of ER-making games defined in the previous section, which is capable of bootstrapping full Gricean communication. Our approach takes from ‘intentional act analysis’ the requirements for an effective lineage explanation, that is, small changes, no sky hooks, new meaning patterns should become available, the change should be adaptive, and the change should fit with the empirical evidence (van Mazijk, 2022a).

To achieve their communicative goal, senders can use increasingly sophisticated search algorithms for finding a good stimulus (ER) which, when presented, will achieve the desired behaviour in the receiver (perhaps but not necessarily by using knowledge of the receiver’s mental states). There are pre-Gricean, minimally Gricean, and fully Gricean algorithms for these kinds of interaction (Moore, 2021; Butterfill and Apperly, 2013; Bar-On, 2021). The ER perspective and the minimal neural machinery in Figure 1 impose logical constraints that determine how such trajectories of algorithms can evolve, for example, language must come prior to explicit (but not implicit) ToM (Moore, 2021). Figure 6 shows different kinds of ER-making processes that should be carefully distinguished, each one depending on increasingly complex abilities, with only the last one (top) being capable of developing a full propositional attitude, that is, of producing ERs that represent invisible causes such as mental states, allowing fully Gricean communication (Butterfill and Apperly, 2013).

Figure 6.

A plausible evolutionary trajectory of communicative algorithms from bottom to top.

7.1. Game type 1

Let us call the simplest algorithm in Figure 6 (bottom) evolved signalling. S (the evolved sender) produces an external representation X which results in R (the evolved recipient) producing action A. This, to a first approximation, describes most Carnapian animal signalling, such as vocalizations of non-primates, bee dances (von Frisch, 1967), cicadas’ mating songs, firefly flashes, and octopus colour changes (Bar-On, 2021). This corresponds to the bottom-left corner in Figure 5, where there is no exploratory learning in sender or receiver in this type of communication game. Evolution has ‘designed’ the signal X. Millikan would say that the signal’s proper function was communication, evolved by mutual adjustment between signallers and receivers over evolutionary time (Bar-On, 2021). Certainly, no mind-reading is required. The signals need not be treated by their users as communicative to accomplish their purpose. No thinking about others’ states of mind is needed. Cultural as well as genetic evolution may be the substrate for this kind of signalling, for example, in bowerbird decorative styles (Madden, 2008).
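The corners of the space of games can be summarised by which party has to learn. The sketch below is only a coarse reading of the text: it treats learning as a yes/no property, whereas the space itself is defined by the complexity of learning, and the function name `classify_game` is hypothetical.

```python
def classify_game(sender_learns: bool, receiver_learns: bool) -> str:
    """Map who learns onto the corner labels used in the text (binary simplification)."""
    if sender_learns and receiver_learns:
        return "language game"  # both parties converge on a convention
    if sender_learns:
        return "art game"  # sender searches; recipient just 'gets it'
    if receiver_learns:
        return "science game"  # natural sign; no intentional sender
    return "evolved signalling"  # evolution has 'designed' the signal


if __name__ == "__main__":
    for sender_learns in (False, True):
        for receiver_learns in (False, True):
            print(sender_learns, receiver_learns,
                  classify_game(sender_learns, receiver_learns))
```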

7.2. Game type 2

A more complex process in the middle of Figure 6 is reinforcement learning optimized signalling. S (a Signaller capable of learning by RL) can search over different actions X(i) to induce R to produce a desired action A(i), which maximizes reward for S. Animals may try out a variety of actions to manipulate other animals in this way, if they are intrinsically rewarded for doing so. The aim is the manipulation of the receiver for the sake of the sender (van Mazijk, 2022b). If S is motivated to direct the attention of R as measured by joint gaze direction, then several attention-directing signals may be explored and tested for efficacy by S. Gestures (but less so vocalizations) are under voluntary control in non-human primates (Tomasello, 2008).
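The search by S over candidate actions X(i) can be sketched as a simple bandit problem. This is a minimal illustration under stated assumptions: the epsilon-greedy rule, the three candidate signals, the response probabilities of the fixed receiver, and the names `receiver_response` and `learn_signal` are all hypothetical and stand in for whatever RL process a real sender might use.

```python
import random


def receiver_response(signal, rng):
    """Hypothetical non-learning receiver with fixed response probabilities."""
    attend_probability = {"grunt": 0.1, "wave": 0.3, "point": 0.8}
    return "attend" if rng.random() < attend_probability[signal] else "ignore"


def learn_signal(signals, desired_action="attend", n_trials=2000,
                 epsilon=0.1, seed=0):
    """Epsilon-greedy search by the sender for the most effective signal."""
    rng = random.Random(seed)
    value = {s: 0.0 for s in signals}  # running estimate of reward per signal
    count = {s: 0 for s in signals}
    for _ in range(n_trials):
        if rng.random() < epsilon:
            x = rng.choice(signals)  # explore a random signal
        else:
            x = max(signals, key=lambda s: value[s])  # exploit the best so far
        # S is rewarded only when R produces the action S wants.
        reward = 1.0 if receiver_response(x, rng) == desired_action else 0.0
        count[x] += 1
        value[x] += (reward - value[x]) / count[x]  # incremental mean
    return value


if __name__ == "__main__":
    estimates = learn_signal(["grunt", "wave", "point"])
    print(max(estimates, key=estimates.get))
```

Only the sender updates anything here. The receiver's dispositions are fixed, which matches the description of this game as manipulation of the receiver for the sake of the sender.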

Eventually, a demonstrative such as a gesture of pointing, or an arbitrary sound meaning something like ‘this’ standing in for a gesture, may be invented. This would have no specific referential content but have a strong correlation with internal mental states of the sender, such as them wanting the recipient to attend to the topic at hand (Bar-On, 2021).² In this way, the recipient may come to know that S intends them to make a response A(i), but without requiring an explicit theory of mind, merely by tracking perceptual states that are correlated with intentions (Moore, 2017).

At this transitional stage, external representations would have been ‘practice-embedded’ without free availability, tied to specific contexts and shared practices, being brittle references, non-compositional, non-systematic, and lacking generality (van Mazijk, 2022b).

7.3. Game type 3

This game leads to a more complex and efficient process for communication at the top of Figure 6, which Moore has called minimally Gricean communication (Moore, 2017). Let S have a goal that R produces A(i), which will maximize S’s reward. S knows R produces A(i) in response to an externally observable event T. Through joint attention (possibly developed in game type 2 above), S is capable of making R attend to T because R knows that S intended them to attend to T when, for example, the sound ‘this’ combined with pointing was uttered, while S simultaneously (within the neural world model temporal context window) produces the external representation X. R is then capable of inferring that S intended X to be associated with T. Therefore, A(i), which was once executed in response to only T, is now executed in response to X. X does not need to resemble T, but R is practiced with such arbitrary meanings from their background in bush reading.
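The transfer of A(i) from T to X can be sketched with a simple co-occurrence count. This is an illustrative assumption only: the counting rule stands in for the associative learning of the world model in Figure 1, and the class name `Receiver`, its methods, and the example signal and topics are hypothetical.

```python
from collections import defaultdict


class Receiver:
    """R already responds to topics T; it learns which T a new signal X is about."""

    def __init__(self, topic_to_action):
        # Existing competence: action A(i) executed in response to topic T.
        self.topic_to_action = dict(topic_to_action)
        # association[X][T] counts joint-attention episodes pairing X with T.
        self.association = defaultdict(lambda: defaultdict(int))

    def observe(self, signal, attended_topic):
        """One episode in which S produces X while R attends to T."""
        self.association[signal][attended_topic] += 1

    def respond(self, signal):
        """Respond to X alone with the action for its most associated topic."""
        topics = self.association.get(signal)
        if not topics:
            return None  # X has never been paired with any topic
        best_topic = max(topics, key=topics.get)
        return self.topic_to_action.get(best_topic)


if __name__ == "__main__":
    r = Receiver({"water": "fetch water", "snake": "retreat"})
    for _ in range(3):
        r.observe("blip", "water")
    r.observe("blip", "snake")
    print(r.respond("blip"))
```

Nothing in the sketch requires X to resemble T, and R never represents the sender's mental state explicitly; the pairing is carried entirely by what R was attending to when X was produced.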

Also, conventions (biases) develop during ontogeny by associative learning about the kinds of properties of T (e.g. shape) that X is typically about, as shown by investigations of the order of word learning in children (Crane, 2000). S can modify X or the joint attention process, for example, pointing, to improve the effectiveness with which R should interpret X to mean T. Only implicit theory of mind has been used. Interaction theory, described by Froese and Gallagher, operates at this dynamical systems level of embodied sociality, showing that pre-Gricean processing can do a lot of work in achieving joint attention and coordination (Froese and Gallagher, 2012).

7.4. Game type 4

Building upon the minimal Gricean communication framework above, in game type 4, new Xs can be invented that stand for internal mental states such as emotions or beliefs. For example, S can pair X with conditions in which S feels thirsty, and where T corresponds to one of many possible means by which S’s thirst could be allayed by R, and A(i) are actions on R’s part that would help to allay S’s thirst. This is the enactment of an intermediate variable in Whiten’s terms, which comes to express a desire/goal of S (Whiten, 2013, 2014), also called an avowal (Bar-On, 2004). The structural organization of the game is no different from the minimal Gricean setting. The only difference is that ‘thirst’ is a more general intermediate variable that refers to an internal state in S, rather than a specific external topic T.

Other Xs could come to mean the self, for example, a handprint, which would make R execute actions associated with R having seen S, even when S was not currently present.³ Once communicative intent can be understood, increasingly sophisticated games can be invented, a game being defined as a goal and a set of constraints for play. By this stage, methods exist for teaching the meanings of arbitrary Xs, for example, transmitting arbitrary rituals, a ubiquitous finding in all cultures which allows achieving desirable outcomes through obscure means (Demmrich, 2023).

An explicit theory of mind has now been developed because, through the enactment of intermediate variables, Xs have been created that refer to internal states in the sender, and full Gricean communication is possible. S and R can model, using their world model that includes these Xs, their own intentions and those of others. This is factic knowledge, not merely practical knowledge (Millikan, 2017). R can explicitly model what S wants when uttering X. This is possible because the ERs for ‘this’, ‘self’, ‘thirsty’, etc., have been invented and can be understood. The utterance X is produced with speaker meaning and interpreted as such (Bar-On, 2021). The speaker intends to communicate their state of mind to the recipient, and the recipient seeks to infer that state of mind. This is possible because both can represent states of mind using external representations.

Scott-Phillips has a different point of view (Scott-Phillips, 2015), proposing that language developed as an adaptation in a species already genetically adapted for ostensive-inferential communication, thus already capable of mind-reading and the intentional stance. We claim that explicit ostensive-inferential communication arose only with linguistic capacities (Bar-On, 2021), but that simple forms of implicit and embodied approximations to inferring communicative intent arose culturally prior to language and were needed for bootstrapping language.

8. External representations and pretend play

Pretend play is fully Gricean, that is, when X comes to stand for T, it is known by sender and receiver that X is not T but is intended to be about T, and that S and R are pretending together that X stands for T in some mutually constructed, bracketed-off systematic map between domains (Bennett, 1996). One wonders how Homo heidelbergensis thought about the Berekhat Ram figurine (Morriss-Kay, 2010). Did the figurine have Gricean ‘aboutness’ as well as Carnapian ‘thingness’? There is no evidence that spears were thrown at bison painted on cave walls. While it is possible that an entity ER with a resemblance to T is known to not be T but to be about T, this can only be strongly confirmed when the ER does not resemble T.

We argue that other animals do not engage in fully Gricean pretend play because it requires that the recipient knows that the sender intends that X be about T, and that X is not merely about T by resemblance alone, that is, pretending is a language game rather than an art game.

Cats play with many objects that scaffold sensorimotor policies similar to those scaffolded by mice, for example, rubber bands, mittens, wadded-up post-it notes, and feet under blankets. But would it be fair to say that a cat pretends that the ball of wool is a mouse? Lorenz argues that it is because the ball resembles a mouse that it releases a fixed action pattern in the cat (Schleidt, 1974). In line with this, the ball of wool does not release all the mousy action patterns, for example, the cat does not eat it, it is hoped. The cat does not understand the principle of representation but is at the whim of the creator of the ‘material scaffold’.

Cats cannot pretend that any arbitrary thing is a mouse; there has to be a resemblance to a mouse for it to treat it as a mouse, whereas we can pretend anything is a mouse. Cats can’t even construct objects that resemble mice in order to play with them. As R.W. Mitchell writes, ‘Noticing similarities is widespread among animals; creating resemblances is not’ (Mitchell, 2002).

Similarly, when a cat play fights, it is not pretending to fight by doing something else like jumping or climbing; it is having a not very serious real fight. To claim that when a lion cub growls in a fight it is pretending that the growl has aggressive intent is to claim that, based on the knowledge that aggressive intent is the hidden cause of growls in fights, and by reasoning that if they growl, the recipient will infer that they have aggressive intent, they decide to emit a real growl for this reason. If they were capable of such reasoning, then they could be generally more Machiavellian in other respects, demonstrating a higher level of manipulative reasoning, for example, pretending to be vulnerable or pretending to be affectionate or in distress to gain an advantage or control over a situation before attacking another lion.

While thanatosis or ‘playing possum’ is seen in various species, such as ground-nesting birds ‘pretending’ to have a broken wing to lure predators away from their nests, or possums playing dead when threatened, these behaviours are instincts; it is evolution that has done the work of producing the pretence, not the animal. As far as we know, a lion cub does not even randomly search for stimuli that might deceive another lion cub in a play fight, even without an understanding of the opponent’s hidden mental states.

Neither do chimpanzees have a general capacity for inventing props to pretend with; that is, they don’t make performances or objects in the world that are intended to be about other things in the world. Chimpanzees playing with dolls is not evidence of ER use (Kahlenberg and Wrangham, 2010), but rather that a fixed action pattern is released to stimuli, for example, dolls which resemble an infant stimulate instinctive caregiving activities (Lorenz, 1981). The behaviour patterns of caring for young and the affective responses, which an animal experiences when confronted with a child, are released by a number of cues in the doll, for instance a large head in proportion to the body.

A recent article, ‘Pretense in Chimpanzees’ (Matsuzawa, 2020), is unconvincing as evidence of ER creation and use. Its main claims are that playing with a log as if it were a child constitutes pretending, and that imitating a human putting a banana to their ear could be interpreted as pretending a banana is a phone. This is imitation, not pretending, as there is no evidence that the chimpanzee knows the banana is not a real phone but stands for one. Real pretending would require the spontaneous decision to use something that does not resemble a phone as a phone, without having seen someone else do the same.

Even cases of deception in apes offer ambiguous evidence of pretense because they can be explained on the basis of action-reaction regularities, that is, scripts (Mitchell, 1999). However, we do observe cases of what seems to be RL optimized deception but not external representation in chimpanzees, where rocks were hidden behind straw to throw at humans in a zoo (Osvath and Karvonen, 2012). The straw (or rock) does not stand for anything else.

This is not to say that both chimps and humans can’t create culturally evolved rituals that are iconic or indexical, which can work at a pre-Gricean level of communication, as the bowerbird can. For example, the penis-holding ritual in Walbiri Australian Aborigines is about trust (Meggitt, 1966), as is the ‘scrotum grasp’ of male baboons (Van Leeuwen et al., 2012; Dal Pesco and Fischer, 2018), or the more globally prevalent handshake, but these are all not entirely arbitrary symbols as considerable damage could be inflicted if trust did not indeed exist during the ritual.

Also, to claim that non-human primates do not engage in pretend play is not to say that animals do not have imagination (Weber et al., 2017). Imagination is distinct from pretending in that pretending involves creating new messages (ER) in the world that stand for tokens (T), in order to increase the scope of the imagination by providing new things to imagine with. As Merlin Donald has described, apes do not have the capacity for symbolic invention, even though they can be trained to use signs that are given to them (Donald, 1991).

Contrast this with object-substitution play, which occurs by 18 months of age, where children pretend that one thing is another thing, spontaneously without imitation and without resemblance (Bijvoet-van den Berg and Hoicka, 2019). This allows motor habits learned in one domain to be utilized for other goals. In this sense, pretending is constituted by co-opting skills in one domain to achieve goals in another by making a mapping between domains. To represent or pretend is to use one skill, decouple it from its existing goal, and attach it to another goal.

Children by 2 years of age engage in pretend/imitative play in which the abstracted play world is bracketed off from the real world. ‘How is it possible for a child to think about a banana as if it were a telephone, a lump of plastic as if it were alive, or an empty dish as if it contained soap?’ (Leslie, 1987) p. 412. The ability to pretend depends on the capacity to transform present objects (Fein, 1975) in order to represent absent objects and situations. By this age, children learn that a drawing of a face is about a face (i.e. the drawing is a whole sensuous substratum in itself), and not a face; it is a ‘pretend’ face, also bracketed off by the borders of the article or the frame. They can imagine (model) fictional counterfactual scenarios not present in reality and not confuse the two. This may have an adaptive advantage in challenging environments containing problems and conflict, planning responses to them based on one’s emotional responses, conveying true information in an engaging way, or encouraging seeing things from other people’s perspectives (Dutton, 2009).

9. The first external representations

What evidence is there of the first external representations and the algorithms used for making them? The notion of a single punctuated creative cultural explosion with the origin of modern human behaviour in the Upper Paleolithic 50,000–40,000 years ago in Europe due to some neural advance has been strongly refuted (Pfeiffer, 1982; Henshilwood and Marean, 2003). The notion of behavioural modernity has been questioned (Roberts, 2016). Instead, we find evidence of increasing ER use from 300,000 to 100,000 years ago in the Middle Stone Age in Africa (Mendoza Straffon, 2014; Malafouris, 2013; McBrearty and Brooks, 2000). Even as early as 2–3.3 million years ago, there is stone modification and culturally inherited metaplastic neural changes explaining the gradual and often fractured progression of scaffolded cultural norms (Roberts, 2022; McPherron et al., 2010).

Artifacts can be seen as divided into several categories: pleasing found objects, found objects enhanced, symmetric hand-axes, ochre use for body painting and other applications, personal ornaments, incised objects with geometric patterns, carved figures, and finally representational painting (David, 2017; Mithen, 1998). What do these artifacts imply about a trajectory of enacted ERs? We ask what ER creation games may have been responsible for their production.

9.1. Found and enhanced objects

The simplest category of object is found art, or minimally modified objects. Of the first category, the Makapansgat cobble is the earliest known, which seems to have been moved by an Australopithecine 2 million years ago because it resembled a face (Oakley, 1981). Also in the category of found art, the Tan-Tan figurine dates from 500,000 to 300,000 years ago. It may contain some enhancements by human flaking, although this finding is highly contentious (Bednarik, 2003). The Berekhat Ram figurine (300,000–200,000 years ago), also a found object, has been enhanced and perhaps painted (d’Errico and Nowell, 2000). In Norfolk 500,000 years ago, an Acheulean hand-axe was made with a centrally embedded shell (Flanders and Key, 2023). All these objects probably belong in the lower parts of Figure 5, although they may have reliably come to stand for the feelings associated with their observation; how they were used beyond that is unknown. Where enhancement has taken place, this is evidence for an art game in which there was a perceptual goal in mind and an associated goal to make a policy to achieve the perceptual goal. Where there are multiple such objects, there is evidence that this perceptual goal and policy were themselves heritable.

9.2. Rule-based structured objects

This category of object (especially if found in multiple copies) provides strong evidence that there must have existed certain heritable sensorimotor policies for construction and use. Prior to 400,000 years ago, these policies would have been highly stereotyped and intuitive. For example, those for making hand-axes or spears may have relied largely on ‘instinct’ or on an inflexible, reactive construction policy (not yet explicitly represented) in which the full set of micro-steps (sub-goals and explicit rules) for their manufacture may not have been talked about (named) and so were not accessible for conscious planning and modification (Ingold, 2013; Garofoli, 2015). While there is considerable skill in a particular task, this lack of diversity could imply that the individual steps or sub-goals in making the hand-axes at this early stage may not have been externally represented. Perhaps machine learning is at this stage, where reinforcement learning can optimize a sensorimotor policy for solving a single problem such as Go (Silver et al., 2017), but where methods for describing how the solution was reached, or modifying the game of Go, do not exist.

Analysis of the Levallois technique for knapping stone cores in the Middle Paleolithic suggests that a deliberate, engineered strategy oriented toward specific goals would have been needed, although it is not known how such goals and skills would have been transmitted (Eren and Lycett, 2012). Evidence exists that a goal during the Levallois technique may have been ‘minimization of “waste” while aiming to maximize cutting edge length of flakes obtained from cores of a given size’ (Lycett and Eren, 2013) p. 2384. This practice-embedded scaffolding of activities by ERs would slowly have culturally evolved into more flexible, general, and abstract ER construction and use methods (van Mazijk, 2022b). For example, there is evidence that compound adhesives were used even 70,000 years ago to attach shaped stone to hafts. This implies that ERs for construction recipes existed capable of representing the complex hierarchical sub-goals and knowledge of steps required for the gathering of materials and for their subsequent assembly (Wadley, 2010).

Note, ERs can be both material (persistent, e.g. written words, clay objects, drawings, representative sculptures) and performative (transient, spoken word, dances about events, gestures with referents). Material objects are ERs if they are about things or events. However, in the case of linguistic ERs, spoken words are also in a sense ‘persistent’, present and ready to hand, manipulable, and exchangeable, as the article ‘Words as Objects’ discusses in a fantasy culture (the Wao) where words are objects (Elias and Gallagher, 2014). An extant example of ERs for construction recipes is available by observing how bow and arrow use is embedded in the performative group practices of hunter-gatherer communities. A rich set of performances and rituals are inherited which permits transmission and manifestation of bow and arrow making and using skills (Walls, 2019).

Pigment use for body decoration by Homo heidelbergensis may go back 400,000 years (Wynn and Coolidge, 2012). In the Blombos Cave at 170,000–100,000 years ago, we see powdered ochre for decoration (Iliopoulos, 2016a). The use of pigments in body adornment implies that external representations can become associated with private values in the multidimensional reward system shown in Figure 1, which may later have been chained to represent more complex and general concepts such as wealth and status (Iliopoulos, 2016a). The arbitrary non-adaptive standardization of form is evidence for aesthetic preferences, for example, when ochre is preferred in its red (not yellow or black) forms (Prum, 2012; Kissel and Fuentes, 2017). However, without further evidence, we cannot say whether regalia worn as body ornamentation had Carnapian or fully Gricean functionality.

Consider the making of shell beads in the Middle Stone Age in Blombos Cave, dated at 75,000 years ago (Henshilwood et al., 2004; Iliopoulos, 2016a). Firstly, their existence implies an art game, that is, that to a first approximation, some properties of the finished product were evaluated, and that a policy was optimized to highlight those desired properties. The property may have been very distal, for example, social approval, or proximal and related to the object’s form directly. In the latter case, we can ask what material properties of the shell bead necklace were aesthetically valued. Time and effort were needed to construct them because the closest estuary to where they are found is 20 km away (Iliopoulos, 2016a). Their existence also implies a complex behavioural repertoire of searching for shells, perforating them with careful and controlled actions, and stringing them together. This highly complex sequential process is unlikely to have been developed without a format for externally representing the sub-goals required to carry out the construction. External prompts must have existed to represent the intentions (goals) needed for each sub-task. By observing the finished product, a competent observer would have been aware of the time investment required and the history of activity needed for their construction (van Mazijk, 2022b). It seems likely that a rich ER-based ritualized scaffold of norms would have been needed to encode and transmit the construction recipes. As the beads were found at four stratigraphic levels, Vanhaeren has argued that they were produced over a long period of time and were not a one-off production by an individual (Vanhaeren et al., 2013). An alternative explanation for bead similarity over long periods is the inherent constraints due to the handling of materials of that form.

The motivation to adorn one’s body may have several foundations, the simplest being an association between how well one is treated by others and wearing the adornment (we know they were worn because they have been eroded by human sweat) (Iliopoulos, 2016a; van Mazijk, 2022b). This, in turn, presupposes that observers were aware of the value of the adornment, for example, arising from the rarity or difficulty of manufacture referred to above. They are not needed for food, so it is a costly display (Zahavi, 1975). The adornment becomes an index for the history of its manufacture, which is itself associated with a valuable individual.

A more complex motivation is one where the wearer is explicitly capable of externally representing the viewer’s mental states. The communicative function of adornments of these different types belongs in different parts of Figure 5. Chimpanzees don’t wear jewellery, possibly because jewellery would not, for them, be about the events of its manufacture displaced in space and time that are represented by the jewellery, and therefore, would not indicate the value and power of the wearer or their likely actions in the future. However, see a report of ‘grass-in-ear’ behaviour being copied spontaneously in chimpanzees, perhaps for purely Carnapian effects on innate preferences (Van Leeuwen et al., 2014) of the type responsible for sexual selection (Miller, 2001).

What process could have driven the invention of various geometric forms, such as zig-zags, criss-crosses, nested curves, and parallel lines that are found commonly in cave art (e.g. Morriss-Kay, 2010)? There is evidence in the earliest mark making with lines (threads and traces; Ingold, 2016) that policies were optimized to try to achieve ‘regular shapes’ and ‘parallel lines’, which appear to have been perceived to the extent that their partial achievement could be used as reward feedback to optimize more robust construction policies for their production. In other words, the lines were made with perceptual intent. In Figure 5, this involves first a movement up (naming of a property) followed by a movement to the right (construction of a form with that perceived property), perhaps in a recurrent loop (or rather an upwards staircase) of ‘creative thinging’: creating, exploring, and discovering affordances during enactive signification, in which material signs bring forth meaningful experiences rather than externalizing pre-conceived internal designs (Malafouris, 2021; Van Dijk and Rietveld, 2017). This is, at its core, a kind of mathematical process of investigation. All marks are about the movement that produced them. They signify the action or movements of their construction (Malafouris, 2021).
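The idea that partial achievement of a perceived property can act as reward feedback for a mark-making policy can be illustrated with a toy sketch. Everything in it is an illustrative assumption rather than a model taken from the archaeological record: the ‘policy’ is reduced to a list of stroke angles, the ‘perceptual reward’ is a hypothetical measure of how parallel the strokes are, and the optimizer is plain stochastic hill climbing.

```python
import random
import statistics


def parallelism_reward(angles):
    """Hypothetical perceptual reward: higher (closer to 0) when strokes are nearer to parallel."""
    return -statistics.pvariance(angles)


def improve_policy(angles, steps=2000, noise=5.0, seed=0):
    """Stochastic hill climbing: perturb one stroke angle, keep the change if reward does not drop."""
    rng = random.Random(seed)
    angles = list(angles)
    best = parallelism_reward(angles)
    for _ in range(steps):
        i = rng.randrange(len(angles))
        candidate = list(angles)
        candidate[i] += rng.gauss(0.0, noise)
        reward = parallelism_reward(candidate)
        if reward >= best:
            angles, best = candidate, reward
    return angles, best


# Five strokes at scattered angles (degrees); the values are arbitrary.
initial = [10.0, 35.0, 50.0, 80.0, 20.0]
final, final_reward = improve_policy(initial)
```

The maker in this sketch never needs an explicit concept of ‘parallel’; it only needs to perceive when a mark is more regular than the last attempt, which is the sense of perceptual intent used in the text.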

Consider the engraved ochre piece from 73,000 years ago in the Blombos cave, showing complex latticework patterns. First, the ochre surface is flattened by grinding and scraping, and then incised with lines using a lithic point, see Figure 7. As Malafouris has eloquently described, we can tell that at least one game was played. The maker is known to have first made the marks from top-left to bottom-right, then from top-right to bottom-left, and then the parallel horizontal lines. The maker is concerned and attentive to the marks they make. They may have had ERs for features such as ‘parallel’ and ‘triangle’, although the level of abstractness of such ERs is not clear. For example, we do not see triangles made in all kinds of different ways, such as with pointillism or internal shading, which would be better evidence of a full ER for ‘triangle’ as we know it today.

Figure 7.

A cross-hatched pattern drawn on ochre in 73,000-year-old Middle Stone Age levels at Blombos Cave, South Africa. It is one of the earliest abstract drawings that has been found. It demonstrates an enactive understanding of parallel and equidistant lines, and equal angles, as otherwise the probability of such lines appearing in this configuration would be very low. The lines have no obvious function or purpose, but one goal must have been unsupervised control and mastery of the perceptual forms arising due to motor policies, that is, creative thinging (Malafouris, 2021). This iterative process of feature creation and manipulation bootstraps games which move from the bottom-left to the top-right of Figure 5. ‘Science’ labels new features that are discovered in the drawing process, and ‘art’ gains mastery in producing these new features with increasingly robust policies.

A more likely possibility is that the sensorimotor policies for constructing ‘parallel’ things or ‘triangles’ were domain-specific and brittle. Assuming that such a piece could have been constructed more than once and not just by chance, to make such a piece would have required awareness and perhaps social agreement using ERs for describing the properties to look for when making it. For example, ensuring lines were parallel and triangles were equilateral, although without multiple copies, we cannot be sure what geometric features were the goals of the construction process (Fan, 2015; Gureckis and Markant, 2012; Larkin and Simon, 1987). Possibly one of the greatest insights for the maker, in making objects like this, was that intentions (goals) could be externally specified and manifested permanently. The ‘footprints’ of these marks revealed the geometric intentions of their makers.

9.3. Figurative art

With figurative art comes a new kind of ‘aboutness’. At the Carnapian level, the art is not ‘about’ at all but only ‘thing’; pure resemblance. The utility of making veridical representations prior to Gricean communication appears limited; why make an object that resembles its topic for a pre-Gricean receiver other than to deceive them, or as pornography? We know what to do with (how to see) pictures now, but when they were first invented, what to do with them (how to see them), what they meant, and why they were made, would not have been clear. Indeed, the psychological power of the image to the uninitiated must have been immense (Malafouris, 2007). Just as children feel, rub, and strike pictures when they are first seen, the first viewers of such images would have had difficulty knowing how to interpret their ‘aboutness’. If simultaneously, denotation systems for drawing were also being invented, the magnitude of the creative endeavour becomes more evident.

Perhaps this is why complex policies for making figurative art come later, starting with perhaps bush reading inspired hand stencils from 40,000 years ago (Froese, 2019), moving to the famous cave paintings of animal forms (37,000 to 10,000 years ago), chimeric animal carvings such as humans with lion heads (40,000 years ago), and the Venus of Willendorf (28,000 to 24,000 years ago) and other Upper Paleolithic female figurines (Fenici and Garofoli, 2017). The denotation systems in cave art seem to have been based on outlines in strict profile and canonical perspective with an understanding of occluding edges. The game being played, therefore, seems to have been to provide maximal information about the animal category. The artist was discovering a set of Gestalt principles about our perceptual systems (proximity, good continuation, closure, and similarity) to manipulate our experience, to expand and explore our own minds and perceptual systems, and to represent ourselves (Malafouris, 2007).

10. Creative external representation creation and inheritance

The minimal architecture in Figure 1 is currently being intensively studied in machine learning (Goertzel, 2023; Fernando et al., 2023; Chang et al., 2023). LLMs used in a single-step manner, after undergoing only basic pre-training, generate a kind of sophisticated but thoughtless ‘stream-of-consciousness’ resembling the language use of semantic aphasics. They are large language models, not large thought models, as it says on the tin. To produce more useful and sensible outputs, further fine-tuning and feedback are required (Ouyang et al., 2022). More recent articles have extended the basic large language models (LLMs) to attempt to more directly model sequential thought processes, for example, tree of thoughts (Yao et al., 2023) and selection-inference (Creswell et al., 2022).

Thought is implicitly defined here as a Markov Decision Process in which, conditioned on the current ER state, the LLM prompts itself with a text-action (prompt) to produce the next text-state. We think of thought as an enactive multi-step sensorimotor process or game played by sensing and outputting ERs. We propose that the power of LLMs results from their ability to apply the knowledge contained in the enormous training set of ERs, which includes natural language and computer programs, for example, and use these to think with. We think about things with things. Sometimes we think about the things we make with the things we make, that is, creative thinging (Malafouris, 2013). In other words, we augment the original data with new data and then try to gain mastery of that new data as well as the original data. This is the core nature of mathematics, where we construct triangles and aim to understand the regularities and rules underlying their construction, for example, Pythagoras’s theorem. Before creative thinging, only things approximating triangles were passively observed.
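The decision-process framing above can be made concrete with a minimal sketch. It assumes a stand-in function `toy_llm` in place of a real model and a hand-written `policy` in place of a learned one; both names, the prompts, and the example text are hypothetical. The state is the current text, the action is a self-generated prompt, and the transition is the model's output given both.

```python
from typing import Callable, List, Tuple

State = str   # the current external representation (text-state)
Action = str  # a self-generated prompt (text-action)


def toy_llm(state: State, action: Action) -> State:
    """Stand-in for an LLM call (an assumption): applies a named text operation to the state."""
    if action == "split into steps":
        return " | ".join(part.strip() for part in state.split(" then "))
    if action == "count steps":
        return f"{state} => {state.count('|') + 1} steps"
    return state  # unknown prompts leave the text unchanged


def policy(state: State) -> Action:
    """Hypothetical self-prompting policy: chooses the next prompt from the current text-state only."""
    if " then " in state:
        return "split into steps"
    if "=>" not in state:
        return "count steps"
    return "stop"


def think(state: State, llm: Callable[[State, Action], State],
          max_steps: int = 5) -> List[Tuple[Action, State]]:
    """Roll out the process: text-state -> prompt (action) -> next text-state."""
    trajectory = []
    for _ in range(max_steps):
        action = policy(state)
        if action == "stop":
            break
        state = llm(state, action)
        trajectory.append((action, state))
    return trajectory


trace = think("gather shells then perforate then string", toy_llm)
```

Because the policy reads only the current text, every intermediate result lives in the external representation rather than in hidden internal state, which is the enactive reading of thought described above.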

LLMs do not (in their basic form) execute multiple recursive steps of deliberative self-prompting (self-ERing) as we do. They don’t use reinforcement learning to learn how to do new kinds of self-prompting to solve a problem. Without this, it appears that errors of reasoning are rife because they lack the ability to use ERs flexibly to fit a particular problem at hand. An example is The Reversal Curse: LLMs trained on ‘A is B’ fail to learn ‘B is A’ (Radford et al., 2021). No matter how many examples of statements of logical equivalence are provided to an LLM during training of its weights, when given examples of new ‘A is B’ statements, it fails to infer that ‘B is A’. Interestingly though, if it is given ‘in-context’ examples (i.e. prompted by text ERs not during training time but at inference time), it does succeed. But it is the human who provides these prompts currently. This is evidence that certain kinds of basic logical reasoning not learnable by association can be solved with iterative self-prompting, that is, by using ERs in the world itself. But even so, humans can still think about equivalence without acting in the world, and so this suggests that another slower and distinct neural modelling process is involved (also shared with primates but perhaps enhanced), which simulates external ‘in-context’ processes, a natural candidate being prefrontal cortex mediated working memory (Heyes, 2018), which is therefore included in Figure 1.

Another important difference between human language use and LLMs is that LLMs require vastly more data to train than human children do when learning language. Is this because the training set used by children is of much higher quality, or is there a neuronal difference still to be uncovered? Probably both. Another difference is that LLMs still don’t spontaneously create new useful ERs – in the sense that they don’t, for example, invent adaptive new words or write new books to explain things (in contrast to the problem of ‘hallucinating’, i.e. LLMs making up plausible but fake ‘facts’, showing no novel insight (Zhang et al., 2023)). When one has to invent new words or combinations of words to explain things, this requires a hard combinatorial search, and we have not yet combined LLMs and RL with sufficiently rich reward feedback to address this creative challenge, although see Zelikman et al. (2024). Another challenge for machine learning is to understand how the effective enaction of an intermediate variable, as proposed by Whiten (2013, 2014), is possible.

ERs scaffold new sensorimotor contingencies. In other words, thinking doesn’t happen first in the head; it happens in new kinds of engagement with the world (Wynn and Coolidge, 2012). As we agree with a strong form of the sensorimotor contingency account of consciousness, it follows that ERs permit a new kind of consciousness (O’Regan and Noë, 2001). The sensorimotor contingency theory of consciousness proposes that to experience red is constituted by the sensory consequences of actions made towards red objects; redness being a property of this certain kind of tight sensorimotor coupling (O’Regan and Noë, 2001). With the origin of ERs, new sensorimotor contingencies (affordances (Gibson, 2014)) were created. One became able to experience the sensory (ER) consequences of motor (ER) acts (Withagen and van Wermeskerken, 2010). This is not internal thought externalized; it is a behavioural policy on ERs in the world, the ER being experienced in the same way red is experienced. Just as the sensorimotor similarity of two visible objects can be compared by primates (Vonk and MacDonald, 2002), ERs allow experiencing that one ER ‘is like’ another ER.

We do not think it is a coincidence that a core property of consciousness described by Nagel – the capacity to represent/re-describe experience A as experience B – is also the core function of an ER. Nagel claimed conscious thought requires qualia, which is constituted by the fact that there is a ‘what it’s like(ness)’ to experiencing something, such as the redness of red (Nagel, 1980). Using ERs, we can create new ‘what it’s like(ness)es’. Conscious thoughts are always about something; they possess content and reference (Brentano, 1995), and the things that they are about are often reportable, that is, can be described (Block, 2007). Jackendoff has examined the limitations and utility of linguistic ERs in explaining conscious thought (Jackendoff, 1996). Carruthers examines the role of linguistic ERs in reasoning and memory, looking at the visuospatial sketchpad and the phonological loop (inner speech). However, he argues these are due to specialized neural modules (Carruthers, 2002).

Binding and the phenomenal unity of consciousness (Revonsuo, 1999) can be explained by an enactive account in which the world itself serves as a unifying reference point (Varela et al., 1991). The content of conscious thought is selective and only refers to a subset of experienced states (Dehaene and Changeux, 2011). This is consistent with the fact that only a limited set of ERs can be entertained at any one moment using the standard attention machinery common to all primates. Reflective or evaluative self-awareness associated with a narrative-self requires the capacity to represent properties about oneself using ERs (Gallagher, 2000).

We actively construct reality through culturally and historically specific narratives that operate according to certain conventions (Bruner, 1991). For example, the rock art of the San of Southern Africa and North American drawings produced in vision quests (Lewis-Williams, 2002) give visual form to mythological explanations, communicating deeply personal and significant experiences and serving religious purposes and totemic functions (Clottes et al., 2016). In the Western tradition, Bloom has argued that Shakespeare invented radically new forms of self-narrative (Bloom, 2008).

11. Conclusion

Whilst we are capable of manually prompt engineering LLMs to permit them to carry out novel tasks in a zero-shot manner (Jiang et al., 2019; Wei et al., 2022), there is a certain ‘black art’ to prompt engineering (which is the use of existing ERs to prompt LLMs to achieve new goals). Recently, this process has been automated in several articles such as Promptbreeder (Fernando et al., 2023; Liu et al., 2023). However, these algorithms assume task-specific human-designed reward functions (extrinsic motivation), lacking the intrinsic motivations of joint attention and the multidimensional value system shown in Figure 1.

Currently, LLMs use a next-step-prediction loss for training, but during training, they do not generate new ERs, that is, new synthetic sentences that could minimize this loss on existing data. Instead, they depend only on existing data. They don’t make stuff up in order to reorganize their habits or invent new texts to help understand existing texts better (Noë, 2023). And certainly, they don’t do creative thinging, that is, playing around to produce new data, which they also try to gain mastery of, such as inventing new mathematical structures and trying to explain them. Why else did we make squares and circles and triangles?

Humans, on the other hand, are motivated to invent new data that allows better compression of existing data and to compress interesting new data. Let us limit ourselves to explaining existing data first. What would this look like as an algorithm? During the training of an LLM R on the existing dataset, another LLM S is trained to generate synthetic data to augment the dataset on which R is being trained, to minimize the next-step-prediction loss of R on the original dataset. If this is done, the ‘understanding’ of the original data isn’t just in the weights of R’s neural network; it is contained in the ERs that S injects into the world to help R understand better. This is a fully enactive approach to cognition using external representations to aid understanding. New knowledge is created in the world and can accumulate over time and be accessed by multiple LLM agents.
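A deliberately small sketch may help to fix the roles of R and S. It makes several simplifying assumptions that are not part of the proposal itself: R is a smoothed bigram model rather than an LLM, S is reduced to a greedy filter over hand-written candidate sentences rather than a trained generator, and R's loss on the original data is measured on a held-out sentence from that data. All function names and example sentences are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """'Train' reader R: count bigrams over whitespace tokens."""
    pairs, contexts, vocab = Counter(), Counter(), set()
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        vocab.update(toks)
        for a, b in zip(toks, toks[1:]):
            pairs[(a, b)] += 1
            contexts[a] += 1
    return pairs, contexts, vocab


def next_token_loss(model, sentences, vocab_size):
    """Mean negative log-likelihood per predicted token, with add-one smoothing."""
    pairs, contexts, _ = model
    total, n = 0.0, 0
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        for a, b in zip(toks, toks[1:]):
            p = (pairs[(a, b)] + 1) / (contexts[a] + vocab_size)
            total -= math.log(p)
            n += 1
    return total / n


def select_synthetic(train, held_out, candidates, vocab_size):
    """Stand-in for writer S: greedily keep candidates that lower R's loss on original text."""
    kept = []
    best = next_token_loss(train_bigram(train), held_out, vocab_size)
    for c in candidates:
        loss = next_token_loss(train_bigram(train + kept + [c]), held_out, vocab_size)
        if loss < best:
            kept, best = kept + [c], loss
    return kept, best


train = ["the bead is red", "the shell is white"]      # original data seen by R
held_out = ["the bead is white"]                       # original data used to score R
candidates = ["a bead is white", "stone stone stone"]  # texts S might write
vocab_size = len(train_bigram(train + held_out + candidates)[2])

baseline = next_token_loss(train_bigram(train), held_out, vocab_size)
kept, augmented = select_synthetic(train, held_out, candidates, vocab_size)
```

In this toy run the sentence that recombines familiar words in a useful way is kept, while the uninformative one is rejected because it makes R's predictions on the original text slightly worse. The retained sentences play the role of ERs that persist outside R's parameters.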

Whilst this may seem similar to the notion of the evolution of correct in-context (few-shot) examples, which are later used to provide a context for further LLM calls on similar questions (Brown et al., 2020; Fernando et al., 2023), there are several important differences. Existing methods for the discovery of in-context examples do not accumulate such examples effectively over multiple tasks, the methods for writing these notes are not themselves learned and improved, and readers do not learn to use a library of these notes in an adaptive way. The exception is a recent article called Voyager (Wang et al., 2023), which demonstrates an accumulating self-prompting process in Minecraft where programs are generated and stored to help the avatar explore the Minecraft world better.

If a cultural co-evolutionary system consisting of readers motivated (selected) to read notes conditioned on the task at hand, and writers motivated (selected) to write notes that are useful to readers, can be produced with LLMs, then the accumulation of ERs and new policies for using them is possible, as these LLMs play a range of communication games of the kinds described in Figure 5.
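The reader and writer roles in such a system can be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation with LLMs: writers are replaced by a fixed list of proposed notes, readers by a lookup that follows the best-scoring note for a task, and selection by a simple score threshold. The `Note` class, the task kinds, and the primitives are all hypothetical.

```python
from dataclasses import dataclass

# Skills the reader already has; what it lacks is knowledge of when to use them.
PRIMITIVES = {"reverse": lambda s: s[::-1], "upper": str.upper}


@dataclass
class Note:
    task_kind: str  # the kind of task the note claims to help with
    recipe: str     # name of the primitive the note recommends
    score: int = 0  # accumulated usefulness to readers


def reader(task_kind, payload, library):
    """Reader: pick the best-scoring note for this task kind and follow its recipe."""
    relevant = [n for n in library if n.task_kind == task_kind]
    if not relevant:
        return None, None
    note = max(relevant, key=lambda n: n.score)
    return PRIMITIVES[note.recipe](payload), note


def run_generation(tasks, library, proposals):
    """One round: add the writers' notes, try each relevant note on each task, keep useful ones."""
    library = library + proposals
    for kind, payload, expected in tasks:
        for note in [n for n in library if n.task_kind == kind]:
            answer = PRIMITIVES[note.recipe](payload)
            note.score += 1 if answer == expected else -1
    return [n for n in library if n.score > 0]  # selection acting on the notes


tasks = [("reverse", "abc", "cba"), ("upper", "abc", "ABC")]
proposals = [
    Note("reverse", "reverse"), Note("reverse", "upper"),
    Note("upper", "upper"), Note("upper", "reverse"),
]
library = run_generation(tasks, [], proposals)
answer, used = reader("reverse", "stone", library)
```

Repeating `run_generation` with the surviving library as input lets useful notes accumulate across rounds and tasks, which is the cumulative property the text asks for.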

We have argued that, using the minimal neural machinery shown in Figure 1, it is possible to explain the origin and cultural evolution of open-ended external representation use. Having shown this, we have proposed how existing LLM algorithms could be developed to permit open-ended external representation creation, which we equate with being a prerequisite for true general intelligence and certain types of conscious thought.

Footnotes

Acknowledgements

Many thanks to Victoria and Lizzie Mitchell, Richard Moore, Eors Szathmary, Inman Harvey, Richard Evans, Olivier Tieleman, Karol Gregor, Piotr Mirowski, Georg Ostrovski, Kitty Stacpoole, Sam Adjei, James Elkins, Charles Blundell, Reed Roberts, Ezequiel Di Paolo, Merlin Donald, Martina Yin, Murray Shanahan, and Leee Overmann for their invaluable kind help and encouragement in improving the manuscript.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Google DeepMind.

ORCID iD

Chrisantha Fernando

About the Authors

Dr Chrisantha Fernando is a Research Scientist at Google DeepMind. He originally studied Medicine at Wadham College, Oxford, then completed an MSc in Evolutionary and Adaptive Systems at Sussex, followed by a PhD on the Origin of Life. After some time at the Collegium Budapest working with Prof Eors Szathmary, he lectured at Queen Mary University of London. He has worked on evolution and learning at DeepMind for the last 10 years, and is now interested in the links between machine learning, art and literature.

Dr Simon Osindero is a Director of Research and Principal Research Scientist at Google DeepMind, where he leads the Agent Frontiers team and conducts research into fundamental problems in, and applications of, artificial intelligence and machine learning.

Dr Dylan Banarse is a Senior Software Engineer at Google DeepMind working at the intersection of AI, artificial-life and computational creativity. After his PhD on biologically-inspired neural networks he moved into industry to lead development on multiple BAFTA-nominated projects, from the Creatures series of artificial-life simulations to four series of the mixed-reality BBC TV show BAMZOOKi. To escape the digital domain he experiments with analogue electronic generative music alongside analogue video synthesis.

References

Akam, Costa, & Dayan (2015). Simple plans or sophisticated habits? State, transition and learning interactions in the two-step task. PLOS Computational Biology, 11(12), e1004648. https://doi.org/10.1371/journal.pcbi.1004648

Allen, J. G., & Fonagy (2006). The handbook of mentalization-based treatment. Wiley Online Library.

Ambrose, S. H. (1998). Late Pleistocene human population bottlenecks, volcanic winter, and differentiation of modern humans. Journal of Human Evolution, 34(6), 623–651. https://doi.org/10.1006/jhev.1998.0219

Anderson, M. L. (2008). Circuit sharing and the implementation of intelligent systems. Connection Science, 20(4), 239–251. https://doi.org/10.1080/09540090802413202

Andres, Di Luca, & Pesenti (2008). Finger counting: The missing tool? Behavioral and Brain Sciences, 31(6), 642–643. https://doi.org/10.1017/s0140525x08005578

Butterfill, S. A., & Apperly, I. A. (2013). How to construct a minimal theory of mind. Mind & Language, 28(5), 606–637. https://doi.org/10.1111/mila.12036

Arnheim (1954). Art and visual perception. Faber paper covered editions. University of California. https://books.google.co.uk/books?id=Cn4yEyF-Gr4C

Aston (2019). Metaplasticity and the boundaries of social cognition: Exploring scalar transformations in social interaction and intersubjectivity. Phenomenology and the Cognitive Sciences, 18(1), 65–89. https://doi.org/10.1007/s11097-018-9601-z

Bar-On (2004). Speaking my mind: Expression and self-knowledge. Oxford University Press.

Bar-On (2021). How to do things with nonwords: Pragmatics, biosemantics, and origins of language in animal communication. Biology & Philosophy, 36(6), 50. https://doi.org/10.1007/s10539-021-09824-z

Bar-On & Moore (2017). Pragmatic interpretation and signaler-receiver asymmetries in animal communication.

Barona, A. M. (2021). The archaeology of the social brain revisited: Rethinking mind and material culture from a material engagement perspective. Adaptive Behavior, 29(2), 137–152. https://doi.org/10.1177/1059712320941945

Barrett, Henzi, & Rendall (2007). Social brains, simple minds: Does social complexity really require cognitive complexity? Philosophical Transactions of the Royal Society B: Biological Sciences, 362(1480), 561–575. https://doi.org/10.1098/rstb.2006.1995

Bednarik (2003). A figurine from the African Acheulian. Current Anthropology, 44(3), 405–413. https://doi.org/10.1086/374900

Bednarik, R. G. (1998). The archaeology of rock-art. Cambridge University Press.

Bengio, Courville, & Vincent (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828. https://doi.org/10.1109/TPAMI.2013.50

Bennett, A. T. (1996). Do animals have cognitive maps? The Journal of Experimental Biology, 199(Pt 1), 219–224. https://doi.org/10.1242/jeb.199.1.219

Berger (2009). About looking. Bloomsbury. https://books.google.co.uk/books?id=xiAbCnC4KtgC

Bickerton (1990). Language and species. University of Chicago Press.

Bijvoet-van den Berg & Hoicka (2019). Preschoolers understand and generate pretend actions using object substitution. Journal of Experimental Child Psychology, 177, 313–334. https://doi.org/10.1016/j.jecp.2018.08.008

Bloch (2014). Different types of creativity on the two sides of shutters. Pragmatics & Cognition, 22(1), 109–123. https://doi.org/10.1075/pc.22.1.06blo

Bloch (2016). Imagination from the outside and from the inside. Current Anthropology, 57(S13), S80–S87. https://doi.org/10.1086/685496

Block (2007). Consciousness, accessibility, and the mesh between psychology and neuroscience. Behavioral and Brain Sciences, 30(5-6), 481–548. https://doi.org/10.1017/S0140525X07002786

Bloom (2008). Shakespeare: The invention of the human. Fourth Estate. https://books.google.co.uk/books?id=efpDulDho6EC

Boesch & Boesch-Achermann (2000). The chimpanzees of the Taï forest: Behavioural ecology and evolution. Oxford University Press.

Brentano (1995). Psychology from an empirical standpoint (2nd ed.). Routledge.

Brohan, Brown, Carbajal, Chebotar, Chen, Choromanski, Ding, Driess, Dubey, Finn, et al. (2023). RT-2: Vision-language-action models transfer web knowledge to robotic control. arXiv preprint arXiv:2307.15818.

Brooks, R. A. (1991). Intelligence without representation. Artificial Intelligence, 47(1-3), 139–159. https://doi.org/10.1016/0004-3702(91)90053-m

McBrearty, S., & Brooks, A. S. (2000). The revolution that wasn’t: A new interpretation of the origin of modern human behavior. Journal of Human Evolution, 39(5), 453–563. https://doi.org/10.1006/jhev.2000.0435

Brown (1983). Nature observation and tracking. Penguin Group (USA) Incorporated. https://books.google.co.uk/books?id=4nIXAAAACAAJ

Brown, Jr. (1986). The tracker: The true story of Tom Brown Jr. Berkley.

Brown, T. B., Mann, Ryder, Subbiah, Kaplan, Dhariwal, Neelakantan, Shyam, Sastry, Askell, Agarwal, Herbert-Voss, Krueger, Henighan, Child, Ramesh, Ziegler, D. M., Wu, Winter, ... Amodei (2020). Language models are few-shot learners. https://arxiv.org/abs/2005.14165

Bruner, Battaglia-Mayer, & Caminiti (2023). The parietal lobe evolution and the emergence of material culture in the human genus. Brain Structure and Function, 228(1), 145–167. https://doi.org/10.1007/s00429-022-02487-w

Bruner (1974). Toward a theory of instruction. Harvard University Press.

Bruner (1991). The narrative construction of reality. Critical Inquiry, 18(1), 1–21. https://doi.org/10.1086/448619

Hills, T. T., & Butterfill, S. (2015). From foraging to autonoetic consciousness: The primal self as a consequence of embodied prospective foraging. Current Zoology, 61(2), 368–381. https://doi.org/10.1093/czoolo/61.2.368

Janson, C. H., & Byrne, R. (2007). What wild primates know about resources: Opening up the black box. Animal Cognition, 10(3), 357–367. https://doi.org/10.1007/s10071-007-0080-9

Hobaiter, C., & Byrne, R. W. (2010). Able-bodied wild chimpanzees imitate a motor procedure used by a disabled individual to overcome handicap. PLoS One, 5(8), e11959. https://doi.org/10.1371/journal.pone.0011959

Mulcahy, N. J., & Call, J. (2006). Apes save tools for future use. Science, 312(5776), 1038–1040. https://doi.org/10.1126/science.1125456

Canterbury (2015). Advanced bushcraft: An expert field guide to the art of wilderness survival. Bushcraft Series. Adams Media. https://books.google.co.uk/books?id=p8qpDgAAQBAJ

Carroll (1999). Philosophy of art: A contemporary introduction. In Routledge contemporary introductions to philosophy. Routledge. https://books.google.co.uk/books?id=ck51_sMuIJUC

Carruthers (2002). The cognitive functions of language. Behavioral and Brain Sciences, 25(6), 657–725. https://doi.org/10.1017/s0140525x02000122

Orban, G. A., & Caruana, F. (2014). The neural basis of human tool use. Frontiers in Psychology, 5, 310. https://doi.org/10.3389/fpsyg.2014.00310

Cassirer, Manheim, & Hendel (1955). The philosophy of symbolic forms: Volume 1: Language. Yale University Press. https://books.google.co.uk/books?id=T2FNL3NwXc8C

Clark, A., & Chalmers, D. J. (1998). The extended mind. Analysis, 58(1), 7–19. https://doi.org/10.1111/1467-8284.00096

Chandler (2017). Semiotics: The basics. Routledge.

Chang, J. D., Brantley, Ramamurthy, Misra, & Sun (2023). Learning to generate better than your LLM. arXiv preprint arXiv:2306.11816.

Dehaene, S., & Changeux, J.-P. (2011). Experimental and theoretical approaches to conscious processing. Neuron, 70(2), 200–227. https://doi.org/10.1016/j.neuron.2011.03.018

Charnov, E. L., Orians, G. H., & Hyatt (1976). Ecological implications of resource depression. The American Naturalist, 110(972), 247–259. https://doi.org/10.1086/283062

Chater (2018). The mind is flat: The illusion of mental depth and the improvised mind. Penguin Books Limited. https://books.google.co.uk/books?id=DNY3DwAAQBAJ

Cheour, Ceponiene, Lehtokoski, Luuk, Allik, Alho, & Näätänen (1998). Development of language-specific phoneme representations in the infant brain. Nature Neuroscience, 1(5), 351–353. https://doi.org/10.1038/1561

Chomsky (2014). Aspects of the theory of syntax (No. 11). MIT Press.

Misyak, J. B., & Christiansen, M. H. (2012). Statistical learning and language: An individual differences study. Language Learning, 62(1), 302–331. https://doi.org/10.1111/j.1467-9922.2010.00626.x

Christiansen, M. H., & Chater (2016). Creating language: Integrating evolution, acquisition, and processing. MIT Press.

Christie, Gentner, Vosniadou, & Kayser (2007). Relational similarity in identity relation: The role of language. In Proceedings of the second European cognitive science conference (pp. 601–666). London, England: Taylor & Francis.

Clark (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204. https://doi.org/10.1017/S0140525X12000477

Clottes & Martin (2016). What is paleolithic art? Cave paintings and the dawn of human creativity. University of Chicago Press. https://books.google.co.uk/books?id=eoebCwAAQBAJ

Gentner, D., & Colhoun, J. (2010). Analogical processes in human thinking and learning. In Towards a theory of thinking: Building blocks for a conceptual framework (pp. 35–48). https://doi.org/10.1007/978-3-642-03129-8_3

Collingwood (2016). The principles of art. Ravenio Books. https://books.google.co.uk/books?id=Ju26CwAAQBAJ

Colom, Quiroga, M. Á., Solana, A. B., Burgaleta, Román, F. J., Privado, Escorial, Martínez, Álvarez-Linera, Alfayate, García, Lepage, Hernández-Tamames, J. A., & Karama (2012). Structural changes after videogame practice related to a brain network associated with intelligence. Intelligence, 40(5), 479–489. https://doi.org/10.1016/j.intell.2012.05.004

Suddendorf, T., & Corballis, M. C. (2007). The evolution of foresight: What is mental time travel, and is it unique to humans? Behavioral and Brain Sciences, 30(3), 299–351. https://doi.org/10.1017/S0140525X07001975

Cosmides, Tooby, & Barkow, J. H. (1992). The adapted mind: Evolutionary psychology and the generation of culture. Oxford University Press.

Crane, A. A. (2000). Learning how to learn words. In Becoming a word learner: A debate on lexical acquisition (51).

Creswell, Shanahan, & Higgins (2022). Selection-inference: Exploiting large language models for interpretable logical reasoning. arXiv preprint arXiv:2205.09712.

Dal Pesco & Fischer (2018). Greetings in male Guinea baboons and the function of rituals in complex social groups. Journal of Human Evolution, 125, 87–98. https://doi.org/10.1016/j.jhevol.2018.10.007

D’Andrade, R. G. (1989). Culturally based reasoning. In Cognition and social worlds (pp. 132–143).

Dasgupta, Lampinen, A. K., Chan, S. C., Creswell, Kumaran, McClelland, J. L., & Hill (2022). Language models show human-like content effects on reasoning. arXiv preprint arXiv:2207.07051.

David (2017). Cave art. Thames & Hudson. https://books.google.co.uk/books?id=gGwfDgAAQBAJ

d’Avila Garcez & Lamb, L. C. (2020). Neurosymbolic AI: The 3rd wave. arXiv e-prints, arXiv–2012.

Davis (2022). Art in the after-culture: Capitalist crisis and cultural strategy. Haymarket Books. https://books.google.co.uk/books?id=R-JaEAAAQBAJ

Daw, N. D., Niv, & Dayan (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience, 8(12), 1704–1711. https://doi.org/10.1038/nn1560

Deacon, T. W. (1997). The symbolic species: The co-evolution of language and the brain. WW Norton & Company.

Dean, L. G., Vale, G. L., Laland, K. N., Flynn, & Kendal, R. L. (2014). Human cumulative culture: A comparative perspective. Biological Reviews, 89(2), 284–301. https://doi.org/10.1111/brv.12053

deMenocal, P. B. (1995). Plio-Pleistocene African climate. Science, 270(5233), 53–59. https://doi.org/10.1126/science.270.5233.53

Demmrich (2023). Ritual: How seemingly senseless acts make life worth living, written by Dimitris Xygalatas. Numen, 70(5-6), 630–632. https://doi.org/10.1163/15685276-20231709

Dennett, D. C. (2008). Kinds of minds: Toward an understanding of consciousness. Basic Books.

d’Errico & Nowell (2000). A new look at the Berekhat Ram figurine: Implications for the origins of symbolism. Cambridge Archaeological Journal, 10(1), 123–167.

de Villiers, J. G., & de Villiers, P. A. (2000). Linguistic determinism and the understanding of false beliefs. In Children’s reasoning and the mind (pp. 191–228). Psychology Press.

Tremblay, P., & Dick, A. S. (2016). Broca and Wernicke are dead, or moving past the classic model of language neurobiology. Brain and Language, 162, 60–71. https://doi.org/10.1016/j.bandl.2016.08.004

Clayton, N. S., & Dickinson, A. (1998). Episodic-like memory during cache recovery by scrub jays. Nature, 395(6699), 272–274. https://doi.org/10.1038/26216

Dissanayake (1992). Homo aestheticus: Where art comes from and why. Free Press. https://books.google.co.uk/books?id=JVTXAAAAMAAJ

Dissanayake (1998). Komar and Melamid discover Pleistocene taste. Philosophy and Literature, 22(2), 486–496. https://doi.org/10.1353/phl.1998.0039

Domínguez-Rodrigo (2014). Is the “savanna hypothesis” a dead concept for explaining the emergence of the earliest hominins? Current Anthropology, 55(1), 59–81. https://doi.org/10.1086/674530

Donald (1991). Origins of the modern mind: Three stages in the evolution of culture and cognition. Harvard University Press. https://books.google.co.uk/books?id=4Sk4vWkrUAgC

Dunbar, R. I. (1998). The social brain hypothesis. Evolutionary Anthropology: Issues, News, and Reviews, 6(5), 178–190. https://doi.org/10.1002/(sici)1520-6505(1998)6:5<178::aid-evan5>3.3.co;2-p

Dutton (2009). The art instinct: Beauty, pleasure, and human evolution. Oxford University Press.

Elkins (1997). The object stares back: On the nature of seeing. A Harvest book. Harcourt Brace. https://books.google.co.uk/books?id=xhKtH4gGTyYC

Elman, J. L. (1996). Rethinking innateness: A connectionist perspective on development (Vol. 10). MIT Press.

Lycett, S. J., & Eren, M. I. (2013). Levallois economics: An examination of ‘waste’ production in experimentally produced Levallois reduction sequences. Journal of Archaeological Science, 40(5), 2384–2392. https://doi.org/10.1016/j.jas.2013.01.016

Evans (2022). The apperception engine. In Kim & Schönecker (Eds.), Kant and artificial intelligence (pp. 39–104). De Gruyter. https://doi.org/10.1515/9783110706611-002

Fan, J. E. (2015). Drawing to learn: How producing graphical representations enhances scientific thinking. Translational Issues in Psychological Science, 1(2), 170–181. https://doi.org/10.1037/tps0000037

Fan, J. E., Hawkins, R. D., Wu, & Goodman, N. D. (2020). Pragmatic inference and visual abstraction enable contextual flexibility during visual communication. Computational Brain & Behavior, 3(1), 86–101. https://doi.org/10.1007/s42113-019-00058-7

Fauconnier (1997). Mappings in thought and language. Cambridge University Press.

Fein, G. G. (1975). A transformational analysis of pretending. Developmental Psychology, 11(3), 291–296. https://doi.org/10.1037/h0076568

Fernando (2011). Symbol manipulation and rule learning in spiking neuronal networks. Journal of Theoretical Biology, 275(1), 29–41. https://doi.org/10.1016/j.jtbi.2011.01.009

Fernando, Banarse, Michalewski, Osindero, & Rocktäschel (2023). Promptbreeder: Self-referential self-improvement via prompt evolution. arXiv preprint arXiv:2309.16797.

Fernando, Zenkova, Nikolov, & Osindero (2020). From language games to drawing games. CoRR, abs/2010.02820. https://arxiv.org/abs/2010.02820

Ferrier (1998). Outsider art. Terrail. https://books.google.co.uk/books?id=BgHqAAAAMAAJ

Fitch (2010). The evolution of language. Approaches to the evolution of language. Cambridge University Press. https://doi.org/10.1017/cbo9780511817779; https://books.google.co.uk/books?id=RProTk_Ag7gC

Fodor, J. A. (1975). The language of thought. Harvard University Press.

Fonagy & Campbell (2016). Attachment theory and mentalization. In The Routledge handbook of psychoanalysis in the social sciences and humanities (pp. 115–131). Routledge.

Froese (2019). Making sense of the chronology of paleolithic cave painting from the perspective of material engagement theory. Phenomenology and the Cognitive Sciences, 18(1), 91–112. https://doi.org/10.1007/s11097-017-9537-8

Fromkin, Krashen, Curtiss, & Rigler (1974). The development of language in Genie: A case of language acquisition beyond the “critical period”. Brain and Language, 1(1), 81–107. https://doi.org/10.1016/0093-934x(74)90027-3

Kissel, M., & Fuentes, A. (2017). Semiosis in the Pleistocene. Cambridge Archaeological Journal, 27(3), 397–412. https://doi.org/10.1017/s0959774317000014

Gallagher, J. Z. S., & Gallagher (2014). Word as object: A view of language at hand. Journal of Cognition and Culture, 14(5), 373–384. https://doi.org/10.1163/15685373-12342132

Gallagher (2000). Philosophical conceptions of the self: Implications for cognitive science. Trends in Cognitive Sciences, 4(1), 14–21. https://doi.org/10.1016/s1364-6613(99)01417-5

Froese, T., & Gallagher, S. (2012). Getting interaction theory (IT) together: Integrating developmental, phenomenological, enactive, and dynamical approaches to social interaction. Interaction Studies, 13(3), 436–468. https://doi.org/10.1075/is.13.3.06fro

Garofoli (2015). A radical embodied approach to lower palaeolithic spear-making. The Journal of Mind and Behavior, 1, 1–25.

Fenici, M., & Garofoli, D. (2017). The biocultural emergence of mindreading: Integrating cognitive archaeology and human development. Journal of Cultural Cognitive Science, 1(2), 89–117. https://doi.org/10.1007/s41809-017-0008-0

Gell & Thomas (1998). Art and agency: An anthropological theory. Clarendon Press. https://books.google.co.uk/books?id=9YU3AwAAQBAJ

Gentner (2003). Why we’re so smart. In Gentner & Goldin-Meadow (Eds.), Language in mind: Advances in the study of language and thought (pp. 195–236). MIT Press. https://doi.org/10.7551/mitpress/4117.003.0015

Csibra, G., & Gergely, G. (2009). Natural pedagogy. Trends in Cognitive Sciences, 13(4), 148–153. https://doi.org/10.1016/j.tics.2009.01.005

Gibson, J. J. (2014). The ecological approach to visual perception (classic edition). Psychology Press.

Glorioso, Kuznar, Pavlic, & Povinelli (2021). Still no solution to non-verbal measures of analogical reasoning: Reply to Walker and Gopnik (2017). Cognition, 214, 104288. https://doi.org/10.1016/j.cognition.2020.104288

Goertzel (2023). Generative AI vs. AGI: The cognitive strengths and weaknesses of modern LLMs. arXiv preprint arXiv:2309.10371.

Gombrich, E. H. (1961). Art and illusion. Pantheon Books.

Goodkind & Bicknell (2018). Predictive power of word surprisal for reading times is a linear function of language model quality. In Proceedings of the 8th workshop on cognitive modeling and computational linguistics (CMCL 2018) (pp. 10–18).

Gooley (2010). The natural navigator. Ebury Publishing. https://books.google.co.uk/books?id=28hJOKsUGRgC

Gooley (2012). The natural explorer: Understanding your landscape. Hodder & Stoughton. https://books.google.co.uk/books?id=llchVaSVNAAC

Gopnik, Glymour, Sobel, D. M., Schulz, L. E., Kushnir, & Danks (2004). A theory of causal learning in children: Causal maps and Bayes nets. Psychological Review, 111(1), 3–32. https://doi.org/10.1037/0033-295X.111.1.3

Walker, C. M., & Gopnik, A. (2017). Discriminating relational and perceptual judgments: Evidence from human toddlers. Cognition, 166, 23–27. https://doi.org/10.1016/j.cognition.2017.05.013

Graves, Wayne, & Danihelka (2014). Neural Turing machines. https://arxiv.org/abs/1410.5401

Green, S. J., Boruff, B. J., Bonnell, T. R., & Grueter, C. C. (2020). Chimpanzees use least-cost routes to out-of-sight goals. Current Biology, 30(22), 4528. https://doi.org/10.1016/j.cub.2020.08.076

Grice (2020). Meaning/Bedeutung (Englisch/Deutsch): Great Papers Philosophie. Reclam Verlag.

Keramati, M., & Gutkin, B. (2014). Homeostatic reinforcement learning for integrating reward collection and physiological stability. eLife, 3, e04811. https://doi.org/10.7554/eLife.04811

Ha & Schmidhuber (2018). World models. https://doi.org/10.5281/ZENODO.1207631

Hacking (1991). A tradition of natural kinds. Philosophical Studies: An International Journal for Philosophy in the Analytic Tradition, 61(1/2), 109–126. https://doi.org/10.1007/bf00385836

Hague (1990). Charlotte Brontë and intuitive consciousness. Texas Studies in Literature and Language, 32(4), 584–601.

Hamrick, J. B. (2019). Analogues of mental simulation and imagination in deep learning. Current Opinion in Behavioral Sciences, 29, 8–16. https://doi.org/10.1016/j.cobeha.2018.12.011

Harcourt-Smith, W. H. (2010). The first hominins and the origins of bipedalism. Evolution: Education and Outreach, 3(3), 333–340. https://doi.org/10.1007/s12052-010-0257-6

Harvey (2008). Misrepresentations. 6. https://sussex.figshare.com/articles/presentation/Misrepresentations/23341685

Hawkins, R. D., Sano, Goodman, N. D., & Fan, J. E. (2021). Visual resemblance and communicative context constrain the emergence of graphical conventions. CoRR, abs/2109.13861. https://arxiv.org/abs/2109.13861

Stout, D., & Hecht, E. E. (2017). Evolutionary neuroscience of cumulative culture. Proceedings of the National Academy of Sciences, 114(30), 7861–7868. https://doi.org/10.1073/pnas.1620738114

Henshilwood, d’Errico, Vanhaeren, van Niekerk, & Jacobs (2004). Middle Stone Age shell beads from South Africa. Science, 304(5669), 404. https://doi.org/10.1126/science.1095905

Herculano-Houzel (2012). The remarkable, yet not extraordinary, human brain as a scaled-up primate brain and its associated cost. Proceedings of the National Academy of Sciences, 109(Suppl 1), 10661–10668. https://doi.org/10.1073/pnas.1201895109

Heyes (2018). Cognitive gadgets: The cultural evolution of thinking. Harvard University Press.

Higgins, Matthey, Pal, Burgess, Glorot, Botvinick, Mohamed, & Lerchner (2016). beta-VAE: Learning basic visual concepts with a constrained variational framework.

Hobson, N. M., Schroeder, Risen, J. L., Xygalatas, & Inzlicht (2018). The psychology of rituals: An integrative review and process-based framework. Personality and Social Psychology Review, 22(3), 260–284. https://doi.org/10.1177/1088868317734944

Horschler, D. J., MacLean, E. L., & Santos, L. R. (2020). Do non-human primates really represent others’ beliefs? Trends in Cognitive Sciences, 24(8), 594–605. https://doi.org/10.1016/j.tics.2020.05.009

Humphrey, N. K. (1978). Nature’s psychologists. New Scientist, 1109, 900–904.

Kirby, S., & Hurford, J. R. (2002). The emergence of linguistic structure: An overview of the iterated learning model. Simulating the evolution of language, 1, 121–147. https://doi.org/10.1007/978-1-4471-0663-0_6

Hutchins (2005). Material anchors for conceptual blends. Journal of Pragmatics, 37(10), 1555–1577. https://doi.org/10.1016/j.pragma.2004.06.008

Hutto, D. D. (2012). Folk psychological narratives: The sociocultural basis of understanding reasons. MIT Press.

Iliopoulos (2016a). The evolution of material signification: Tracing the origins of symbolic body ornamentation through a pragmatic and enactive theory of cognitive semiotics. Signs and Society, 4(2), 244–277. https://doi.org/10.1086/688619

Iliopoulos (2016b). The material dimensions of signification: Rethinking the nature and emergence of semiosis in the debate on human origins. Quaternary International, 405, 111–124. https://doi.org/10.1016/j.quaint.2015.08.033

Ingold (2013). Making: Anthropology, archaeology, art and architecture. Routledge. https://books.google.co.uk/books?id=SUmrPwAACAAJ

Ingold (2016). Lines: A brief history. Routledge.

Ingold (2020). From the transmission of representations to the education of attention. In The debated mind (pp. 113–153). Routledge. https://doi.org/10.4324/9781003086963-7

Piazza, M., & Izard, V. (2009). How humans count: Numerosity and the parietal cortex. The Neuroscientist, 15(3), 261–273. https://doi.org/10.1177/1073858409333073

Jackendoff (1996). How language helps us think. Pragmatics & Cognition, 4(1), 1–34. https://doi.org/10.1075/pc.4.1.03jac

Jiang, Gu, S. S., Murphy, K. P., & Finn (2019). Language as an abstraction for hierarchical deep reinforcement learning. Advances in Neural Information Processing Systems, 32.

Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237–285. https://doi.org/10.1613/jair.301

Kahneman (2011). Thinking, fast and slow. Macmillan.

Osvath, M., & Karvonen, E. (2012). Spontaneous innovation for future deception in a male chimpanzee. PLoS One, 7(5), e36782. https://doi.org/10.1371/journal.pone.0036782

Ke, N. R., Chiappa, Wang, Goyal, Bornschein, Rey, Weber, Botvinick, Mozer, & Rezende, D. J. (2022). Learning to induce causal structure. arXiv preprint arXiv:2204.04875.

Kellogg & O’Dell (1967). The psychology of children’s art. In A Psychology Today book. CRM. https://books.google.co.uk/books?id=63tPAAAAMAAJ

Flanders, E., & Key, A. (2023). The West Tofts handaxe: A remarkably average, structurally flawed, utilitarian biface. Journal of Archaeological Science, 160, 105888. https://doi.org/10.1016/j.jas.2023.105888

Kirsh (1995). The intelligent use of space. Artificial Intelligence, 73(1-2), 31–68. https://doi.org/10.1016/0004-3702(94)00017-u

Kirsh (2010). Thinking with external representations. AI & Society, 25(4), 441–454. https://doi.org/10.1007/s00146-010-0272-8

Kodandaramaiah (2011). The evolutionary significance of butterfly eyespots. Behavioral Ecology, 22(6), 1264–1271. https://doi.org/10.1093/beheco/arr123

Kosinski (2023). Evaluating large language models in theory of mind tasks. arXiv e-prints, arXiv–2302.

Krebs, J. R., & Davies, N. B. (2009). Behavioural ecology: An evolutionary approach. John Wiley & Sons.

Kuhl, P. K. (2004). Early language acquisition: Cracking the speech code. Nature Reviews Neuroscience, 5(11), 831–843. https://doi.org/10.1038/nrn1533

Kushnir, Gopnik, Lucas, & Schulz (2010). Inferring hidden causal structure. Cognitive Science, 34(1), 148–160. https://doi.org/10.1111/j.1551-6709.2009.01072.x

Lakoff & Johnson (2008). Metaphors we live by. University of Chicago Press. https://books.google.co.uk/books?id=r6nOYYtxzUoC

Larochelle, Erhan, & Bengio (2008). Zero-data learning of new tasks. Proceedings of the National Conference on Artificial Intelligence, 1, 3.

LeDoux, J. E. (2000). Emotion circuits in the brain. Annual Review of Neuroscience, 23(1), 155–184. https://doi.org/10.1146/annurev.neuro.23.1.155

Lenneberg, E. H. (1967). Biological foundations of language. Wiley.

Leslie, A. M. (1987). Pretense and representation: The origins of “theory of mind”. Psychological Review, 94(4), 412–426. https://doi.org/10.1037//0033-295x.94.4.412

Lewis, L. S., Wessling, E. G., Kano, Stevens, J. M., Call, & Krupenye (2023). Bonobos and chimpanzees remember familiar conspecifics for decades. Proceedings of the National Academy of Sciences, 120(52), e2304903120. https://doi.org/10.1073/pnas.2304903120

Lewis-Williams (2002). The mind in the cave: Consciousness and the origins of art. Thames & Hudson. https://books.google.co.uk/books?id=HJZNLQIEpdgC

Liebenberg (1990). The art of tracking: The origin of science.

Liebenberg, Louw, & Elbroch (2010). Practical tracking: A guide to following footprints and finding animals. Stackpole Books. https://books.google.co.uk/books?id=m2ySjeG4HXsC

Lillicrap, T. P., Santoro, Marris, Akerman, C. J., & Hinton (2020). Backpropagation and the brain. Nature Reviews Neuroscience, 21(6), 335–346. https://doi.org/10.1038/s41583-020-0277-3

Liu, Chen, Qu, Tang, & Ong, Y.-S. (2023). Large language models as evolutionary optimizers. arXiv preprint arXiv:2310.19046.

Lorenz (1981). The foundations of ethology. Springer-Verlag.

Eren, M. I., & Lycett, S. J. (2012). Why Levallois? A morphometric comparison of experimental ‘preferential’ Levallois flakes versus debitage flakes. PLoS One, 7(1), e29273. https://doi.org/10.1371/journal.pone.0029273

Vonk, J., & MacDonald, S. E. (2002). Natural concepts in a juvenile gorilla (Gorilla gorilla gorilla) at three levels of abstraction. Journal of the Experimental Analysis of Behavior, 78(3), 315–332. https://doi.org/10.1901/jeab.2002.78-315

Christiansen, M. H., & MacDonald, M. C. (2009). A usage-based approach to recursion in sentence processing. Language Learning, 59(s1), 126–161. https://doi.org/10.1111/j.1467-9922.2009.00538.x

MacKay, D. J. (1992). Bayesian interpolation. Neural Computation, 4(3), 415–447. https://doi.org/10.1162/neco.1992.4.3.415

Madden, J. R. (2008). Do bowerbirds exhibit cultures? Animal Cognition, 11, 1–12. https://doi.org/10.1007/s10071-007-0092-5

Maguire, E. A., Gadian, D. G., Johnsrude, I. S., Good, C. D., Ashburner, Frackowiak, R. S., & Frith, C. D. (2000). Navigation-related structural change in the hippocampi of taxi drivers. Proceedings of the National Academy of Sciences, 97(8), 4398–4403. https://doi.org/10.1073/pnas.070039597

183.

Malafouris

(2007). Before and beyond representation: Towards an enactive conception of the palaeolithic image. In Renfrew

Morley

(Eds.), Image and imagination: A global history of figurative representation. The McDonald Institute, pp. 287–300.

184.

Malafouris

(2013). How things shape the mind. MIT press.

185.

Malafouris

(2021). Mark making and human becoming. Journal of Archaeological Method and Theory, 28(1), 95–119. https://doi.org/10.1007/s10816-020-09504-4

186.

Marcus

G. F.

(2003). The algebraic mind: Integrating connectionism and cognitive science. MIT Press.

187.

Henshilwood

C. S.

Marean

C. W.

(2003). The origin of modern human behavior: Critique of the models and their test implications. Current Anthropology, 44(5), 627–651. https://doi.org/10.1086/377665

188.

Gureckis

T. M.

Markant

D. B.

(2012). Self-directed learning: A cognitive and computational perspective. Perspectives on Psychological Science, 7(5), 464–481. https://doi.org/10.1177/1745691612454304

189.

Marr

(2010). Vision: A computational investigation into the human representation and processing of visual information. MIT Press.

190.

Matsuzawa

(2020). Pretense in chimpanzees.

191.

McPherron

S. P.

Alemseged

Marean

C. W.

Wynn

J. G.

Reed

Geraads

Bobe

Béarat

H. A.

(2010). Evidence for stone-tool-assisted consumption of animal tissues before 3.39 million years ago at Dikika, Ethiopia. Nature, 466(7308), 857–860. https://doi.org/10.1038/nature09248

192.

Meggitt

M. J.

(1966). Gadjari among the Walbiri Aborigines of central Australia. Oceania, 36(3), 173–213. https://doi.org/10.1002/j.1834-4461.1966.tb00286.x

193.

Mendoza Straffon

(2014). Art in the making: The evolutionary origins of visual art as a communication signal. PhD thesis. Leiden University.

194.

Merleau-Ponty

(2013). Phenomenology of perception. Routledge.

195.

Miller

(2001). The mating mind: How sexual choice shaped the evolution of human nature. Anchor Books. https://books.google.co.uk/books?id=pJfoKtG1n2wC

196.

Millikan

R. G.

(2017). Beyond concepts: Unicepts, language, and natural information. Oxford University Press.

197.

Miranda

Malalasekera

W. N.

Behrens

T. E.

Dayan

Kennerley

S. W.

(2020). Combined model-free and model-sensitive reinforcement learning in non-human primates. PLoS Computational Biology, 16(6), e1007944. https://doi.org/10.1371/journal.pcbi.1007944

198.

Mitchell

(2002). Pretending and imagination in animals and children. Cambridge University Press. https://books.google.co.uk/books?id=-K3t41WMsMcC

199.

Mitchell

R. W.

(1999). Deception and concealment as strategic script violation in great apes and humans. In The mentalities of gorillas and orangutans (pp. 295–315). Cambridge University Press. https://doi.org/10.1017/cbo9780511542305.016

200.

Mithen

(1998). The prehistory of the mind: A search for the origins of art, religion and science. A Phoenix paperback. Phoenix. https://books.google.co.uk/books?id=bVNTQgAACAAJ

201.

Moghaddam

S. R.

Honey

C. J.

(2023). Boosting theory-of-mind performance in large language models via prompting. arXiv preprint arXiv:2304.11490.

202.

Moore

(2017). Gricean communication and cognitive development. The Philosophical Quarterly, 67(267), 303–326.

203.

Moore

(2021). The cultural evolution of mind-modelling. Synthese, 199(1-2), 1751–1776. https://doi.org/10.1007/s11229-020-02853-3

204.

Morriss-Kay

G. M.

(2010). The evolution of human artistic creativity. Journal of Anatomy, 216(2), 158–176. https://doi.org/10.1111/j.1469-7580.2009.01160.x

205.

Mundy

(2018). A review of joint attention and social-cognitive brain systems in typical development and autism spectrum disorder. European Journal of Neuroscience, 47(6), 497–514. https://doi.org/10.1111/ejn.13720

206.

Nagel

(1980). What is it like to be a bat? In The Language and thought series (pp. 159–168). Harvard University Press.

207.

Nelson

(1993). The psychological and social origins of autobiographical memory. Psychological Science, 4(1), 7–14. https://doi.org/10.1111/j.1467-9280.1993.tb00548.x

208.

Niemitz

(2010). The evolution of the upright posture and gait—a review and a new synthesis. Naturwissenschaften, 97(3), 241–263. https://doi.org/10.1007/s00114-009-0637-3

209.

Niv

(2009). Reinforcement learning in the brain. Journal of Mathematical Psychology, 53(3), 139–154. https://doi.org/10.1016/j.jmp.2008.12.005

210.

Noë

(2015). Strange tools: Art and human nature. Hill and Wang.

211.

Noë

(2023). The entanglement: How art and philosophy make us what we are. Princeton University Press. https://books.google.co.uk/books?id=fJCkEAAAQBAJ

212.

Oakley

K. P.

(1981). Emergence of higher thought 3.0–0.2 Ma BP. Philosophical Transactions of the Royal Society of London B Biological Sciences, 292(1057), 205–211.

213.

Thompson

R. K.

Oden

D. L.

(2000). Categorical perception and conceptual judgments by nonhuman primates: The paleological monkey and the analogical ape. Cognitive Science, 24(3), 363–396. https://doi.org/10.1207/s15516709cog2403_2

214.

Öllinger

Jones

Knoblich

(2014). The dynamics of search, impasse, and representational change provide a coherent explanation of difficulty in the nine-dot problem. Psychological Research, 78(2), 266–275. https://doi.org/10.1007/s00426-013-0494-8

215.

Atance

C. M.

O’Neill

D. K.

(2001). Episodic future thinking. Trends in Cognitive Sciences, 5(12), 533–539. https://doi.org/10.1016/s1364-6613(00)01804-0

216.

OpenAI. (2023). GPT-4 technical report.

217.

Orban

G. A.

Claeys

Nelissen

Smans

Sunaert

Todd

J. T.

Wardak

Durand

J.-B.

Vanduffel

(2006). Mapping the parietal cortex of human and non-human primates. Neuropsychologia, 44(13), 2647–2667. https://doi.org/10.1016/j.neuropsychologia.2005.11.001

218.

O’Regan

J. K.

Noë

(2001). A sensorimotor account of vision and visual consciousness. Behavioral and Brain Sciences, 24(5), 939–973.

219.

Ouyang

Jiang

Almeida

Wainwright

C. L.

Mishkin

Zhang

Agarwal

Slama

Ray

Schulman

Hilton

Kelton

Miller

Simens

Askell

Welinder

Christiano

Leike

Lowe

(2022). Training language models to follow instructions with human feedback.

220.

Coolidge

F. L.

Overmann

K. A.

(2012). Numerosity, abstraction, and the emergence of symbolic thinking. Current Anthropology, 53(2), 204–225. https://doi.org/10.1086/664818

221.

Overmann

K. A.

(2018). Constructing a concept of number. Journal of Numerical Cognition, 4(2), 464–493. https://doi.org/10.5964/jnc.v4i2.161

222.

Overmann

K. A.

(2021). The material difference in human cognition. Adaptive Behavior, 29(2), 123–135. https://doi.org/10.1177/1059712320930738

223.

Overmann

K. A.

(2023). The materiality of numbers: Emergence and elaboration from prehistory to present. Cambridge University Press.

224.

Oxford

M. J.

(1962). The biology of art: A study of the picture-making behaviour of the great apes and its relationship to human art. By Desmond Morris. Animal Behaviour, 10, 1–2. London: Methuen.

225.

Panksepp

(2004). Affective neuroscience: The foundations of human and animal emotions. Oxford University Press.

226.

Pearson

E. O.

Pearson

(2013). Neurocranial evolution in modern humans: The case of Jebel Irhoud 1. Anthropological Science, 121(1), 31–41. https://doi.org/10.1537/ase.120927

227.

Penn

D. C.

Holyoak

K. J.

Povinelli

D. J.

(2008). Darwin’s mistake: Explaining the discontinuity between human and nonhuman minds. Behavioral and Brain Sciences, 31(2), 109–178. https://doi.org/10.1017/S0140525X08003543

228.

Pereira-Pedro

A. S.

Bruner

Gunz

Neubauer

(2020). A morphometric comparison of the parietal lobe in modern humans and Neanderthals. Journal of Human Evolution, 142, 102770. https://doi.org/10.1016/j.jhevol.2020.102770

229.

Pfeiffer

J. E.

(1982). The creative explosion: An inquiry into the origins of art and religion.

230.

Pinker

(1994). The language instinct. Harper Perennial Modern Classics.

231.

Poldrack

R. A.

(2006). Can cognitive processes be inferred from neuroimaging data? Trends in Cognitive Sciences, 10(2), 59–63. https://doi.org/10.1016/j.tics.2005.12.004

232.

Popper

(2005). The logic of scientific discovery. Routledge Classics. Taylor & Francis. https://books.google.co.uk/books?id=LWSBAgAAQBAJ

233.

Potì

Hayashi

Matsuzawa

(2009). Spatial construction skills of chimpanzees (Pan troglodytes) and young human children (Homo sapiens sapiens). Developmental Science, 12(4), 536–548. https://doi.org/10.1111/j.1467-7687.2008.00797.x

234.

Potts

(1998a). Environmental hypotheses of hominin evolution. American Journal of Physical Anthropology: The Official Publication of the American Association of Physical Anthropologists. Suppl 27(S27), 93–136. https://doi.org/10.1002/(sici)1096-8644(1998)107:27+<93::aid-ajpa5>3.0.co;2-x

235.

Potts

(1998b). Variability selection in hominid evolution. Evolutionary Anthropology: Issues, News, and Reviews, 7(3), 81–96. https://doi.org/10.1002/(sici)1520-6505(1998)7:3<81::aid-evan3>3.0.co;2-a

236.

Preuss

T. M.

(2017). The human brain: Evolution and distinctive features. In On human nature (pp. 125–149). Elsevier. https://doi.org/10.1016/b978-0-12-420190-3.00008-9

237.

Preuss

T. M.

Cáceres

Oldham

M. C.

Geschwind

D. H.

(2004). Human brain evolution: Insights from microarrays. Nature Reviews Genetics, 5(11), 850–860. https://doi.org/10.1038/nrg1469

238.

Prum

R. O.

(2012). Aesthetic evolution by mate choice: Darwin’s really dangerous idea. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1600), 2253–2265. https://doi.org/10.1098/rstb.2011.0285

239.

Fodor

J. A.

Pylyshyn

Z. W.

(1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1-2), 3–71. https://doi.org/10.1016/0010-0277(88)90031-5

240.

Radford

Kim

J. W.

Hallacy

Ramesh

Goh

Agarwal

Sastry

Askell

Mishkin

Clark

et al. (2021). Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (pp. 8748–8763). PMLR.

241.

Rastall

(2021). Synaesthesia, onomatopoeia and the origin of language. Linguistica Online, 24.

242.

Renfrew

Zubrow

(1994). Towards a cognitive archaeology.

243.

Revonsuo

(1999). Binding and the phenomenal unity of consciousness. Consciousness and Cognition, 8(2), 173–185. https://doi.org/10.1006/ccog.1999.0384

244.

Rideout

C. R.

Rideout

(1998). Memory for medical emergencies experienced by 1- and 2-year-olds. Developmental Psychology, 34(5), 1059–1072. https://doi.org/10.1037//0012-1649.34.5.1059

245.

Rilling

J. K.

(2014). Comparative primate neuroimaging: Insights into human brain evolution. Trends in Cognitive Sciences, 18(1), 46–55. https://doi.org/10.1016/j.tics.2013.09.013

246.

Rita

Tallec

Michel

Grill

J.-B.

Pietquin

Dupoux

Strub

(2022). Emergent communication: Generalization and overfitting in Lewis games. Advances in Neural Information Processing Systems, 35, 1389–1404.

247.

Roberts

(2016). ‘We have never been behaviourally modern’: The implications of material engagement theory and metaplasticity for understanding the Late Pleistocene record of human behaviour. Quaternary International, 405, 8–20. https://doi.org/10.1016/j.quaint.2015.03.011

248.

Roberts

(2022). Metaplasticity, not “modernity”: Uniting the search for human minds, materials, and environments.

249.

Rosati

A. G.

Stevens

J. R.

Hare

Hauser

M. D.

(2007). The evolutionary origins of human patience: Temporal preferences in chimpanzees, bonobos, and human adults. Current Biology, 17(19), 1663–1668. https://doi.org/10.1016/j.cub.2007.08.033

250.

Russek

E. M.

Momennejad

Botvinick

M. M.

Gershman

S. J.

Daw

N. D.

(2017). Predictive representations can link model-based reinforcement learning to model-free mechanisms. PLoS Computational Biology, 13(9), e1005768. https://doi.org/10.1371/journal.pcbi.1005768

251.

Sahoo

Singh

A. K.

Saha

Jain

Mondal

Chadha

(2024). A systematic survey of prompt engineering in large language models: Techniques and applications. arXiv preprint arXiv:2402.07927.

252.

Savage-Rumbaugh

E. S.

Lewin

(1994). Kanzi: The ape at the brink of the human mind. Wiley.

253.

Schleidt

W. M.

(1974). How “fixed” is the fixed action pattern? Zeitschrift für Tierpsychologie, 36(1-5), 184–211. https://doi.org/10.1111/j.1439-0310.1974.tb02131.x

254.

Schmandt-Besserat

(1992). Before writing: From counting to cuneiform. University of Texas Press.

255.

Schmidt

Gull

Herrmann

K.-H.

Boehme

Irintchev

Urbach

Reichenbach

J. R.

Klingner

C. M.

Gaser

Witte

O. W.

(2021). Experience-dependent structural plasticity in the adult brain: How the learning brain grows. NeuroImage, 225, 117502. https://doi.org/10.1016/j.neuroimage.2020.117502

256.

Schoenbaum

A. M. G.

Schoenbaum

(2016). Over the river, through the woods: Cognitive maps in the hippocampus and orbitofrontal cortex. Nature Reviews Neuroscience, 17(8), 513–523. https://doi.org/10.1038/nrn.2016.56

257.

Schrimpf

Blank

I. A.

Tuckute

Kauf

Hosseini

E. A.

Kanwisher

Tenenbaum

J. B.

Fedorenko

(2021). The neural architecture of language: Integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences, 118(45), e2105646118. https://doi.org/10.1073/pnas.2105646118

258.

Scott-Phillips

(2015). Speaking our minds: Why human communication is different, and how language evolved to make it special. Macmillan Education UK. https://books.google.co.uk/books?id=d09GEAAAQBAJ

259.

Seghers

(2014). Cross-species comparison in the evolutionary study of art: A cognitive approach to the ape art debate. Review of General Psychology, 18(4), 263–272. https://doi.org/10.1037/gpr0000015

260.

Seiver

Gopnik

Goodman

N. D.

(2013). Did she jump because she was the big sister or because the trampoline was safe? Causal inference and the development of social attribution. Child Development, 84(2), 443–454. https://doi.org/10.1111/j.1467-8624.2012.01865.x

261.

Quartz

S. R.

Sejnowski

T. J.

(1997). The neural basis of cognitive development: A constructivist manifesto. Behavioral and Brain Sciences, 20(4), 537–596. https://doi.org/10.1017/s0140525x97001581

262.

Semon

(1921). The mneme. George Allen & Unwin Ltd.

263.

Shanahan

McDonell

Reynolds

(2023). Role play with large language models. Nature, 623(7987), 493–498. https://doi.org/10.1038/s41586-023-06647-8

264.

Shanahan

Mitchell

(2022). Abstraction for deep reinforcement learning. arXiv preprint arXiv:2202.05839. https://doi.org/10.24963/ijcai.2022/780

265.

Shaw-Williams

(2017). The social trackways theory of the evolution of language. Biological Theory, 12(4), 195–210. https://doi.org/10.1007/s13752-017-0278-2

266.

Sherwood

A. C. C.

Sherwood

C. C.

(2017). Human brain evolution. Current Opinion in Behavioral Sciences, 16, 41–45. https://doi.org/10.1016/j.cobeha.2017.02.003

267.

Sherwood

C. C.

Subiaul

Zawidzki

T. W.

(2008). A natural history of the human mind: Tracing evolutionary changes in brain and cognition. Journal of Anatomy, 212(4), 426–454. https://doi.org/10.1111/j.1469-7580.2008.00868.x

268.

Silver

Hubert

Schrittwieser

Antonoglou

Lai

Guez

Lanctot

Sifre

Kumaran

Graepel

(2017). Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815.

269.

Newell

A.

Simon

H. A.

(1976). Computer science as empirical inquiry: Symbols and search. Communications of the ACM, 19(3), 113–126. https://doi.org/10.1145/360018.360022

270.

Larkin

J. H.

Simon

H. A.

(1987). Why a diagram is (sometimes) worth ten thousand words. Cognitive Science, 11(1), 65–100. https://doi.org/10.1016/s0364-0213(87)80026-5

271.

Smith

J. M.

Szathmary

(1997). The major transitions in evolution. OUP.

272.

(1964). Cognitive development in children: Piaget development and learning. Journal of Research in Science Teaching, 2, 176–186.

273.

Sobel

D. M.

Tenenbaum

J. B.

Gopnik

(2004). Children’s causal inferences from indirect evidence: Backwards blocking and Bayesian reasoning in preschoolers. Cognitive Science, 28(3), 303–333. https://doi.org/10.1207/s15516709cog2803_1

274.

Spies

A. F.

Russo

Shanahan

(2022). Sparse relational reasoning with object-centric representations. arXiv preprint arXiv:2207.07512.

275.

Spiridonov

Loginov

Ivanchei

Kurgansky

A. V.

(2019). The role of motor activity in insight problem solving (the case of the nine-dot problem). Frontiers in Psychology, 10(2), 2. https://doi.org/10.3389/fpsyg.2019.00002

276.

Steels

(1999). The talking heads experiment.

277.

Szathmary

S. E.

Szathmáry

(2006). Selective scenarios for the emergence of natural language. Trends in Ecology & Evolution, 21(10), 555–561. https://doi.org/10.1016/j.tree.2006.06.021

278.

Szilágyi

Kovács

V. P.

Czárán

Szathmáry

(2023). Evolutionary ecology of language origins through confrontational scavenging. Philosophical Transactions of the Royal Society B, 378(1872), 20210411. https://doi.org/10.1098/rstb.2021.0411

279.

Tang

K.-S.

(2020). The use of epistemic tools to facilitate epistemic cognition & metacognition in developing scientific explanation. Cognition and Instruction, 38(4), 474–502. https://doi.org/10.1080/07370008.2020.1745803

280.

Taoka

A. M.

Taoka

(2012). Triadic (ecological, neural, cognitive) niche construction: A scenario of human brain evolution extrapolating tool use and language from the control of reaching actions. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1585), 10–23. https://doi.org/10.1098/rstb.2011.0190

281.

Team

Anil

Borgeaud

Alayrac

J.-B.

Soricut

Schalkwyk

Dai

A. M.

Hauth

(2023). Gemini: A family of highly capable multimodal models. arXiv preprint arXiv:2312.11805.

282.

Tenenbaum

J. B.

Griffiths

T. L.

Kemp

(2006). Theory-based Bayesian models of inductive learning and reasoning. Trends in Cognitive Sciences, 10(7), 309–318. https://doi.org/10.1016/j.tics.2006.05.009

283.

Tenenbaum

J. B.

Kemp

Griffiths

T. L.

Goodman

N. D.

(2011). How to grow a mind: Statistics, structure, and abstraction. Science, 331(6022), 1279–1285. https://doi.org/10.1126/science.1192788

284.

Tennie

Premo

L. S.

Braun

D. R.

McPherron

S. P.

(2017). Resetting the null hypothesis: Early stone tools and cultural transmission. Current Anthropology, 58(5), 652–672. https://doi.org/10.1086/693846

285.

Call

J.

Tomasello

M.

(2008). Does the chimpanzee have a theory of mind? 30 years later. Trends in Cognitive Sciences, 12(5), 187–192. https://doi.org/10.1016/j.tics.2008.02.010

286.

Tomasello

(1995). Language is not an instinct.

287.

Tomasello

(2008). Origins of human communication. MIT Press.

288.

Tomasello

(2009). Universal grammar is dead. Behavioral and Brain Sciences, 32(5), 470–471. https://doi.org/10.1017/s0140525x09990744

289.

Tomasello

Kruger

A. C.

Ratner

H. H.

(1993). Cultural learning. Behavioral and Brain Sciences, 16(3), 495–511. https://doi.org/10.1017/s0140525x0003123x

290.

Trott

Jones

Chang

Michaelov

Bergen

(2023). Do large language models know what humans know? Cognitive Science, 47(7), e13309. https://doi.org/10.1111/cogs.13309

291.

Tversky

(2005). Visuospatial reasoning. In Holyoak

K. J.

Morrison

R. G.

(Eds.), The Cambridge handbook of thinking and reasoning (pp. 209–240). Cambridge University Press.

292.

Tversky

(2014). Visualizing thought. In Handbook of human centric visualization (pp. 3–40). Springer. https://doi.org/10.1007/978-1-4614-7485-2_1

293.

van de Waal

Bshary

Whiten

(2014). Wild vervet monkey infants acquire the food-processing variants of their mothers. Animal Behaviour, 90, 41–45. https://doi.org/10.1016/j.anbehav.2014.01.015

294.

van Dijk

Rietveld

(2017). Foregrounding sociomaterial practice in our understanding of affordances: The skilled intentionality framework. Frontiers in Psychology, 7, 1969. https://doi.org/10.3389/fpsyg.2016.01969

295.

Vanhaeren

d'Errico

van Niekerk

K. L.

Henshilwood

C. S.

Erasmus

R. M.

(2013). Thinking strings: Additional evidence for personal ornament use in the Middle Stone Age at Blombos Cave, South Africa. Journal of Human Evolution, 64(6), 500–517. https://doi.org/10.1016/j.jhevol.2013.02.001

296.

van Leeuwen

E. J.

Cronin

K. A.

Haun

D. B.

(2014). A group-specific arbitrary tradition in chimpanzees (Pan troglodytes). Animal Cognition, 17(6), 1421–1425. https://doi.org/10.1007/s10071-014-0766-8

297.

van Leeuwen

E. J.

Cronin

K. A.

Haun

D. B.

Mundry

Bodamer

M. D.

(2012). Neighbouring chimpanzee communities show different preferences in social grooming behaviour. Proceedings of the Royal Society B: Biological Sciences, 279(1746), 4362–4367. https://doi.org/10.1098/rspb.2012.1543

298.

van Mazijk

(2022a). How to dig up minds: The intentional analysis program in cognitive archaeology. European Journal of Philosophy, 32, 130–144. https://doi.org/10.1111/ejop.12831

299.

van Mazijk

(2022b). Symbolism in the middle palaeolithic: A phenomenological account of practice-embedded symbolic behavior.

300.

van Wermeskerken

R. M.

van Wermeskerken

(2010). The role of affordances in the evolutionary process reconsidered: A niche construction perspective. Theory & Psychology, 20(4), 489–510. https://doi.org/10.1177/0959354310361405

301.

Varela

F. J.

Thompson

Rosch

(1991). The embodied mind: Cognitive science and human experience. MIT Press.

302.

Vaswani

Shazeer

Parmar

Uszkoreit

Jones

Gomez

A. N.

Kaiser

Ł.

Polosukhin

(2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 121.

303.

Verheggen

C. T.

Verheggen

(1999). Enactivism and the experiential reality of culture: Rethinking the epistemological basis of cultural psychology. Culture & Psychology, 5(2), 183–206. https://doi.org/10.1177/1354067x9952006

304.

von Frisch

(1967). The dance language and orientation of bees. Belknap Press of Harvard University Press.

305.

Von Glasersfeld

(2013). Radical constructivism. Routledge.

306.

Vygotsky

L. S.

van der Veer

R. E.

Valsiner

J. E.

Prout

T. T.

(1994). The Vygotsky reader. Basil Blackwell.

307.

Wadley

(2010). Compound-adhesive manufacture as a behavioral proxy for complex cognition in the Middle Stone Age. Current Anthropology, 51(S1), S111–S119. https://doi.org/10.1086/649836

308.

Walls

(2019). The bow and arrow and early human sociality: An enactive perspective on communities and technical practice in the Middle Stone Age. Philosophy & Technology, 32(2), 265–281. https://doi.org/10.1007/s13347-017-0300-4

309.

Wang

Xie

Jiang

Mandlekar

Xiao

Zhu

Fan

Anandkumar

(2023). Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291.

310.

Wang

J. X.

Hughes

Fernando

Czarnecki

W. M.

Duéñez-Guzmán

E. A.

Leibo

J. Z.

(2018). Evolving intrinsic motivations for altruistic behavior. arXiv preprint arXiv:1811.05931.

311.

Ward

Silverman

Villalobos

(2017). Introduction: The varieties of enactivism. Topoi, 36(3), 365–375. https://doi.org/10.1007/s11245-017-9484-6

312.

Weber

Racanière

Reichert

D. P.

Buesing

Guez

Rezende

D. J.

Badia

A. P.

Vinyals

Heess

Pascanu

Battaglia

Hassabis

Silver

Wierstra

(2017). Imagination-augmented agents for deep reinforcement learning. https://arxiv.org/abs/1707.06203

313.

Wei

Wang

Schuurmans

Bosma

Xia

Chi

Q. V.

Zhou

(2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35, 24824–24837.

314.

Weiss

Khoshgoftaar

T. M.

Wang

(2016). A survey of transfer learning. Journal of Big Data, 3(1), 1–40. https://doi.org/10.1186/s40537-016-0043-6

315.

Westra

(2017). Pragmatic development and the false belief task. Review of Philosophy and Psychology, 8(2), 235–257. https://doi.org/10.1007/s13164-016-0320-5

316.

Whiten

(2013). Humans are not alone in computing how others see the world. Animal Behaviour, 86(2), 213–221. https://doi.org/10.1016/j.anbehav.2013.04.021

317.

Whiten

(2014). Grades of mindreading. In Children’s early understanding of mind (pp. 47–70). Psychology Press.

318.

Whiten

McGuigan

Marshall-Pescini

Hopper

L. M.

(2009). Emulation, imitation, over-imitation and the scope of culture for child and chimpanzee. Philosophical Transactions of the Royal Society B: Biological Sciences, 364(1528), 2417–2428. https://doi.org/10.1098/rstb.2009.0069

319.

Horner

V.

Whiten

A.

(2005). Causal knowledge and imitation/emulation switching in chimpanzees (Pan troglodytes) and children (Homo sapiens). Animal Cognition, 8(3), 164–181. https://doi.org/10.1007/s10071-004-0239-6

320.

Wiessner

(1982). Risk, reciprocity and social influences on !Kung San economics. In Politics and history in band societies (pp. 61–84).

321.

Wigner

E. P.

(1990). The unreasonable effectiveness of mathematics in the natural sciences. In Mathematics and science (pp. 291–306). World Scientific. https://doi.org/10.1142/9789814503488_0018

322.

Wilkins

N. D.

Wilkins

(2000). In the mind’s ear: The semantic extensions of perception verbs in Australian languages. Language, 76(3), 546–592. https://doi.org/10.2307/417135

323.

Willats

(2006). Making sense of children’s drawings. Psychology Press.

324.

Winner

(2006). Development in the arts: Drawing and music. In Kuhn

Siegler

R. S.

(Eds.), Handbook of child psychology: Vol. 2, Cognition, perception, and language (6th ed., pp. 859–904). Wiley.

325.

Wittgenstein

(1953). Philosophical investigations. Blackwell.

326.

Kahlenberg

S. M.

Wrangham

R. W.

(2010). Sex differences in chimpanzees’ use of sticks as play objects resemble those of children. Current Biology, 20(24), R1067–R1068. https://doi.org/10.1016/j.cub.2010.11.024

327.

Wynn

Coolidge

F. L.

(2012). How to think like a Neandertal. OUP.

328.

Wynn

Overmann

K. A.

Malafouris

(2021). 4E cognition in the Lower Palaeolithic.

329.

Yang

(2024). Rethinking tokenization: Crafting better tokenizers for large language models. arXiv preprint arXiv:2403.00417.

330.

Pan

S. J.

Yang

Q.

(2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359. https://doi.org/10.1109/tkde.2009.191

331.

Yao

Zhao

Shafran

Griffiths

T. L.

Cao

Narasimhan

(2023). Tree of thoughts: Deliberate problem solving with large language models. arXiv preprint arXiv:2305.10601.

332.

Zahavi

(1975). Mate selection—a selection for a handicap. Journal of Theoretical Biology, 53(1), 205–214. https://doi.org/10.1016/0022-5193(75)90111-3

333.

Zeki

University of Oxford. (1999). Inner vision: An exploration of art and the brain. Oxford University Press. https://books.google.co.uk/books?id=4e1sQgAACAAJ

334.

Zelikman

Harik

Shao

Jayasiri

Haber

Goodman

N. D.

(2024). Quiet-STaR: Language models can teach themselves to think before speaking. arXiv preprint arXiv:2403.09629.

335.

Zhang

Cui

Cai

Liu

Huang

Zhao

Zhang

Chen

(2023). Siren’s song in the AI ocean: A survey on hallucination in large language models. arXiv preprint arXiv:2309.01219.
