
Abstract

Resolving the dichotomy between the human-like yet constrained reasoning processes of cognitive architectures (CAs) and the broad but often noisy inference behavior of large language models (LLMs) remains a challenging yet exciting pursuit, aimed at enabling reliable machine reasoning capabilities in LLMs. Previous approaches that employ off-the-shelf LLMs in manufacturing decision-making face challenges in complex reasoning tasks, often exhibiting human-level yet unhuman-like behaviors due to insufficient grounding. The present article starts to address this gap by asking whether LLMs can replicate cognition from CAs to make human-like decisions. We introduce cognitive LLMs, which are hybrid decision-making architectures composed of a CA and an LLM connected through a knowledge transfer mechanism, LLM-ACTR. Cognitive LLMs extract and embed knowledge of the CA’s internal decision-making process as latent neural representations, inject this information into trainable LLM adapter layers, and fine-tune the LLMs for downstream prediction tasks. We find that, after knowledge transfer through LLM-ACTR, the cognitive LLMs offer better representations of human decision-making behaviors on a novel design for manufacturing problem, compared to an LLM-only model that employs chain-of-thought. Taken together, the results open up new research directions for equipping LLMs with the necessary knowledge to computationally model and replicate the internal mechanisms of human cognitive decision-making. We release the code and data samples at https://github.com/SiyuWu528/LLM-ACTR.

Keywords

neurosymbolic integration, cognitive architectures, foundation models, human-centered AI, decision augmentation

1. Introduction

Large language models (LLMs) have gained considerable popularity for a wide range of prediction and decision-making tasks, spanning applications such as robotics and control, neural question-answering, scene understanding, code generation, and mathematical reasoning. LLMs are trained on massive datasets, can be used both as discriminative scoring functions and as generative models, and their capacity allows them to accumulate and retain vast amounts of knowledge (Andreas, 2022; Brown et al., 2020; Dong et al., 2022; Francis et al., 2022; Hu et al., 2023; Tatiya et al., 2023). Typical LLM use resembles a system-1 reasoning process (Hagendorff et al., 2023; Sloman, 1996), offering quick, intuitive responses for everyday tasks. Advancements in multi-agent LLM frameworks and emergent capabilities such as in-context learning (Vaswani et al., 2017; Coda-Forno et al., 2024; Dong et al., 2022) have pushed LLMs toward a system-2 reasoning process (Tversky & Kahneman, 1974), for example, “chain-of-thought” (CoT) reasoning (Bhattamishra et al., 2023), enabling more deliberate cognition for complex decisions (Brown et al., 2020; Webb et al., 2022). However, issues such as discrepancies from human-like reasoning (Liu et al., 2024), insufficient grounding (Yao et al., 2023), and hallucination (Chakraborty et al., 2025) persist. Specifically, when off-the-shelf LLMs are used to augment decision-making in manufacturing, where the value stream map (VSM) (Rahani & Al-Ashraf, 2012) with intertwined variables is vital for smart scheduling (Rossit et al., 2019), plant managers often struggle with LLMs’ unhuman-like and noisy predictions (Makatura et al., 2024) (also see Appendix: LLM Conversation Examples).

This article is part of a larger project aimed at augmenting LLMs with human cognition to improve manufacturing efficiency, structured in two phases. Phase one focuses on modeling human cognition using symbolic knowledge representation through cognitive architectures (CAs). CAs are codable computational frameworks designed to capture the invariant mechanisms of human cognition. These mechanisms include functions related to attention, control, learning, memory, adaptivity, perception, and action (Laird et al., 2017; Taatgen & Anderson, 2010). Through CAs we can construct cognitive decision-making models that can store, retrieve, and process knowledge (e.g., Kang, 2001; Marewski & Mehlhorn, 2011). Specifically, phase one employs a CA to build models representing decisions and their processes, with the primary goals of boosting productivity and ensuring consistent quality. These models leverage data derived from the VSM and from decision-makers at Bosch plants. We developed a cognitive model, VSM-ACTR (Wu et al., 2025, 2024), which functions as a standalone tutor for decision-makers in manufacturing, guiding them through decision-making processes by reflecting learners’ learning progression.

In phase two, which is the focus of this article, we ask whether LLMs can replicate cognition from CAs to make human-like decisions. We propose Cognitive LLMs as a solution: hybrid decision-making architectures composed of a CA and an LLM connected through a developing knowledge transfer framework, LLM-ACTR. Cognitive LLMs extract and embed knowledge of the cognitive model’s internal decision-making process as latent neural representations, inject this information into trainable LLM adapter layers, and fine-tune the LLMs for downstream prediction tasks.

Building Cognitive LLMs (Figure 1) proceeds in four steps. (i) We define decision-making problems, for example from manufacturing management documentation, considering domain knowledge including the VSM and human factors such as feedback from plant managers. (ii) We then use a representative CA, ACT-R (Laird et al., 2017; Ritter et al., 2023), widely used for understanding human cognition (Anderson, 2009) and modeling human behaviors (Anderson et al., 2019; Tehranchi et al., 2023), to build a cognitive model. The model simulates human-like decision-making to address the defined problem. Techniques including ontology-based formalization and psychometrics are employed to model the symbolic components of the task, that is, declarative and procedural knowledge, and to set the subsymbolic parameters, for example, learning rate and similarity matching. (iii) The cognitive model is then run at scale stochastically to collect cognitive decision-making reasoning stamps. The collected data are processed into vector embeddings using techniques such as tokenization and dimensionality reduction. (iv) Lastly, Cognitive LLMs learn the embedded vectors of cognitive decision-making through the developing knowledge transfer framework LLM-ACTR. The framework leverages the strengths of both LLMs and CAs by using the natural language processing and generative capabilities of LLMs, complemented by the human-like learning and reasoning offered by CAs.

Figure 1.

Cognitive LLMs architecture, where CAs instruct LLMs for cognitive decision-making using the LLM-ACTR knowledge transfer framework.

We present a case study of Cognitive LLMs in manufacturing decision-making. The task is associated with a key aspect of design for manufacturing (DFM): enhancing product development and optimizing production system performance by improving time efficiency and reducing headcount costs (Ulrich et al., 1993).

The present article poses three research questions:

RQ1. What are the properties of a neural network representation of the decision-making process in CAs? Answering this question sets the ground for developing a context-aware domain knowledge base for augmenting decision-making in LLMs.

RQ2. What level of complexity in behavior representation can LLMs capture? Previous research used LLMs’ conceptual embeddings to predict human-reinforced decisions (Binz & Schulz, 2023), indicating that embeddings from LLMs could be trained to predict human-like behaviors. By incorporating more training sets using CAs, the study addresses the limitation of high data collection costs with human subjects and aims to broaden the investigation into the extent to which innate LLMs can learn human cognition.

RQ3. Can we inform the LLMs with knowledge about the reasoning process of the CAs? Answering this question offers insights into knowledge transfer from domain-specific bases to LLMs, and opens up new research directions for equipping LLMs with the necessary knowledge to computationally model and replicate the internal mechanisms of human cognitive decision-making.

The remaining sections are arranged as follows: related work; an explanation of Cognitive LLMs, which comprise two components, namely the CA with its constructed cognitive model and the LLM-ACTR framework, which facilitates knowledge transfer using a developed domain knowledge base; and the experiments conducted to address the research questions, followed by the results, discussion, and implications.

2. Related Work

This section starts by reviewing the integration of cognitive psychology principles into LLMs, along with decision intelligence in manufacturing and cognitive decision-making. It then highlights the domain limitations of these approaches. It concludes by discussing the current integration of CAs and LLMs, and points out how our approach differs from others.

2.1. Relating Cognitive Psychology to Human-Like Artificial Intelligence

Human-like artificial intelligence (HLAI) has been a goal since the emergence of machines (McCarthy, 2007). In recent years, the development of transformer-based LLMs has revolutionized HLAI, demonstrating impressive human-level capabilities. However, LLMs sometimes fail to display human-like behavioral traits. Analyzing where LLMs currently fall short in replicating human cognition and behavior highlights the problem of human-level capabilities that are nonetheless unhuman-like (Dorobantu, 2021), including behavioral discrepancies between LLM inference and human reasoning (Binz & Schulz, 2023; Liu et al., 2024), insufficient grounding (Yao et al., 2023), and hallucination (Chakraborty et al., 2025).

These challenges have catalyzed an integration of cognitive psychology with LLMs, toward human-like, trustworthy LLMs. Recent studies have used cognitive psychology experiments to investigate and comprehend behaviors in these models, focusing more on behavioral insights than on conventional performance metrics (Binz & Schulz, 2023; Coda-Forno et al., 2024). In addition, LLMs’ neural representations have been applied in behavioral psychological science research, which involves, but is not limited to, prompt engineering, feature extraction, and fine-tuning:

Feature Extraction. The process begins with passing text that mirrors a psychological experiment through an open-source LLM to capture contextualized embeddings from the final layer (Hussain et al., 2024). These embeddings can be employed in various psychological applications, such as predicting similarities between personality constructs (Abdurahman et al., 2024), choices in reinforcement learning (Binz & Schulz, 2024), or perceptions related to risk or health (Wulff & Mata, 2023). For tasks that require sequence prediction, decoder models are preferred due to their larger size and more extensive training data (Hussain et al., 2024).

Zero-Shot and Few-Shot Learning. Zero-shot learning enables the creation of categorical or numerical predictions, such as evaluating sentiments on social media (dos Santos et al., 2024), without requiring training specific to the task. Few-shot learning extends this concept by adding minimal supervision, such as a small number of example pairs, to improve the accuracy of the model.

Fine-Tuning. Fine-tuning smaller LLMs for human-like behaviors can achieve performance that matches or exceeds that of larger models under zero- or few-shot learning conditions (Hussain et al., 2024). This involves adjusting model weights to improve task-specific outcomes. For example, one study fine-tuned BERT in zero-shot learning to predict reinforcement learning behaviors of human subjects (Hussain et al., 2024). However, the generalization of this approach is impeded by the high cost of collecting large cognitive psychological datasets involving human subjects.

2.2. Common Model of Cognition, Cognitive Architectures, and Cognitive Models

To work toward integrating human-like behavioral traits into LLMs, we use a suite of tools rooted in the common model of cognition (CMC) to bring a wider range of tasks into the training dataset. The CMC embodies a unified theory of cognition (Laird et al., 2017; Newell, 1990), a theoretical framework that presents a model of human cognition codified as a computational architecture. The CMC is a brain-inspired framework validated by large-scale neuroscience data. It identifies core components and processes fundamental to human cognition, including memory, perception, motor actions, and decision-making. The model assumes a cyclical process where these components interact to produce human behavior. The CMC includes a feature-based declarative long-term memory, a buffer-based working memory, a system for pattern-directed action invocation stored in procedural memory, and specialized systems for perception and action (Stocco et al., 2021).

The CMC integrates essential features from various CAs (Anderson, 2009; Kotseruba & Tsotsos, 2020, 2025; Laird, 2012), which propose a set of fixed mechanisms to model human behavior, functioning akin to agents and aiming for a unified representation of the mind. By using task-specific knowledge, these architectures not only simulate but also explain behavior through direct examination and real-time reasoning tracing.

Two representative CAs related to the CMC are ACT-R and Soar (Laird, 2021). Other CAs could also be chosen from a recent extensive review (Kotseruba & Tsotsos, 2020, 2025), as long as a trace is available. ACT-R is a theory for simulating and understanding human cognition (Anderson et al., 2019; Ritter et al., 2023), through which we can construct models that can store, retrieve, and process knowledge, as well as explain and predict performance (Bothell, 2017).

The two most commonly used representations in ACT-R are declarative knowledge and procedural knowledge. Declarative knowledge consists of chunks of memory (e.g., the production line comprises five sections), while procedural knowledge performs basic operations, moves data among buffers, and identifies the next instructions to be executed (e.g., lower defect rate will lead to higher efficiency rate).
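As a purely illustrative sketch (plain Python rather than ACT-R's own syntax), the two knowledge types can be pictured as a slot-value record and a condition-action rule. The `Chunk` and `Production` classes, the `fire_first_match` helper, and all slot and rule names are hypothetical and are not part of ACT-R or of VSM-ACTR.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Chunk:
    """A declarative memory element: a type plus slot-value pairs."""
    chunk_type: str
    slots: Dict[str, object] = field(default_factory=dict)


@dataclass
class Production:
    """A procedural rule: fires when its condition matches the buffer."""
    name: str
    condition: Callable[[Dict], bool]
    action: Callable[[Dict], Dict]


def fire_first_match(productions, buffer):
    """Return (rule name, new buffer) for the first matching rule.

    If no rule matches, return (None, buffer) unchanged.
    """
    for production in productions:
        if production.condition(buffer):
            # Work on a copy so the caller's buffer is not mutated.
            return production.name, production.action(dict(buffer))
    return None, buffer


# Declarative example from the text: the production line comprises five sections.
line = Chunk("production-line", {"sections": 5})


def _note_sections(buffer):
    # Hypothetical action: move the section count into the goal buffer.
    buffer["sections"] = line.slots["sections"]
    buffer["state"] = "sections-known"
    return buffer


note_sections = Production(
    name="note-sections",
    condition=lambda buffer: buffer.get("state") == "start",
    action=_note_sections,
)

fired, goal = fire_first_match([note_sections], {"state": "start"})
print(fired, goal)
```

The point of the sketch is only the division of labor: chunks hold facts, while productions test buffer contents and move data between buffers.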

Soar, on the other hand, is a general CA that provides a computational infrastructure resembling the cognitive capabilities exhibited by a human (Laird, 2012). It implements knowledge-intensive reasoning that enables execution of rules based on the context, and the capability to integrate learning into the intelligent agent using chunking or reinforcement learning. Soar’s general computing concept is based on objectives, problem spaces, states, and operators. Soar encompasses multiple memory constructs (e.g., semantic and episodic) and learning mechanisms (e.g., reinforcement and chunking).

One primary difference between these two architectures is that ACT-R was designed to model human behavior and has a track record of predicting human performance and timing to the millisecond level. In contrast, Soar places less emphasis on replicating human behavior and more on developing general agents with cognitive capabilities (Laird, 2021).

2.3. Decision Intelligence in Manufacturing

Industry 4.0 aims to create “intelligent factories,” where advanced manufacturing technologies facilitate smart decision-making through real-time communication and cooperation among humans, machines, and sensors (Hozdić, 2015). One example of this is smart scheduling, which employs advanced models and algorithms using sensor data (Rossit et al., 2019).

Decision intelligence (Leyer & Schneider, 2021) is a crucial component of smart scheduling and comprises three stages.

Decision Support. Machines provide basic tools to aid human decision-making, such as alerts, analytics, and data exploration. Here, the decisions are made entirely by humans.

Decision Augmentation. Machines take on a more proactive role in the decision-making process. They analyze data and generate recommendations and predictions for decision-makers to review and validate. Humans can base their decisions on these suggestions, or they can collaborate with the machine to refine the recommendations.

Decision Automation. Machines handle both the decision-making and execution steps autonomously. Humans maintain a high-level overview, monitoring risks and unusual activities, and regularly review outcomes to enhance the system.

A value stream map (VSM) is a critical tool in manufacturing decision intelligence, functioning as a flowchart that visualizes and controls the production line (Manos, 2006). A VSM tracks metrics such as inputs, outputs, processes, overall equipment effectiveness (OEE), and cycle times (CT). However, plant managers encounter significant challenges when transitioning VSM in production management from decision support to decision augmentation. These challenges stem from the difficulty of applying VSM concepts to complex, real-world scenarios characterized by numerous intertwined variables (Makatura et al., 2024).

2.4. Cognitive Decision Making

Representative CAs, such as Soar and ACT-R, have been used to build models that automate decision-making tasks (e.g., Kang, 2001; Marewski & Mehlhorn, 2011). Among them, the ACT-R CA is applied to build models across psychology and computer science that are closely aligned with human behaviors. It has a track record of accurately predicting human performance and timing across a variety of tasks (see Plitt & Russwinkel, 2024), which meets our needs for developing synthetic agents that can provide human-like cognitive reasoning in learning and training environments.

The ACT-R modeling approaches include: (a) strategy or rule-based, where different problem-solving strategies are implemented through various production rules and successful strategies are rewarded (Best & Lebiere, 2003; Wu et al., 2023); (b) exemplar or instance-based, which relies on past experiences stored in declarative memory to solve problems (Gonzalez et al., 2003); and (c) hybrid approaches that combine strategies and exemplars (Prezenski et al., 2017).

A few features distinguish the use of ACT-R in creating decision-making models that involve learning.

Modular design that mirrors the symbolic aspects of human cognition. ACT-R’s modules emulate human cognitive functions: perceptual modules update the system’s view of the environment, a goal module tracks progress toward objectives, a declarative module uses past experiences for contextual understanding, and a central buffer system enables communication between modules. Additionally, the central production system recognizes patterns to initiate coordinated actions.

Subsymbolic processes for decision-making. ACT-R can retrieve relevant memories and activate appropriate rules, ensuring both efficient and adaptive performance in decision-making tasks. It does so at a simulated pace that mirrors human performance at the millisecond level.

However, ACT-R models do not offer LLM-like dialogic interaction with other ACT-R models, which limits their usability for decision-making. Intuitively, a solution could take the best of both CAs and LLMs, with ACT-R models serving as synthetic agents that instruct LLMs. They do this by providing knowledge of cognitive decision-making, including aspects such as learning, during the LLMs’ training. The trained LLMs can then be generalized to unseen problems.

2.5. Integration of Cognitive Architectures and LLMs

Efforts have been made toward leveraging the strengths of both CAs and LLMs to create a more robust unified theory of computational cognitive models. Some approaches include using the implicit world knowledge of LLMs to replace traditional declarative knowledge mechanisms (Wray et al., 2024), employing Chain-of-Thought reasoning to enhance the symbolic mechanisms for procedural knowledge (Kirk et al., 2024), and leveraging language models as external knowledge sources for cognitive systems, while exploring ways to improve the effectiveness of knowledge extraction (Kirk et al., 2023).

Moreover, Sumers et al. (2023) examine how principles from CAs can guide the design of LLM-based agent frameworks, demonstrating a comprehensive integration effort that spans from knowledge representation to interaction with the environment. Additionally, Sun (2024) proposes a direction for creating computational CAs using dual-process models and hybrid neurosymbolic methods. Using the Clarion CA (Sun, 2006) as an example, Sun illustrates the theoretical opportunities for incorporating LLMs into Clarion’s modules of perception, memory, motor control, and communication, leveraging LLMs’ natural language processing and generalization abilities. The present study builds upon previous research; however, we adopt a different perspective by leveraging CAs to ground the decisions of LLMs in a data-driven manner. We aim to examine the properties of a neural network representation of the decision-making process in CAs and investigate whether knowledge from CAs can be preserved in an embedding space and infused into LLMs through the transfer of learning.

3. Problem Definition: Design for Manufacturing

This article presents a case study of training a cognitively inspired LLM for decision-making in the DFM domain. We define the terminology that constitutes our decision-making problem. The DFM problem setting is a prototypical manufacturing production-line workflow, from supplier to customer, for which there exists a VSM (Figure 2). The VSM allows for tracking the efficiency at different sectors of the process and abstracts the overall problem for mathematical modeling and optimization. Decision candidates come from sectors such as body production, pre-assembly, and assembly. Early sectors pose potential efficiency problems in the workflow and may warrant optimization (triangles), while later stages are governed by first-in-first-out (FIFO) processes. The metrics at each stage include cycle time (CT), overall equipment effectiveness (OEE), and/or mean absolute error (MAE).

Figure 2.

A Value Stream Map of our manufacturing task process. Adapted from a Bosch manufacturing production handbook.

Focused on maintaining stable output for manufacturing plants, we consider plant managers’ feedback alongside the VSM structure to define a decision-making problem that aims to reduce total production time while minimizing the increase in total defect rate (see Figure 1(1) define decision-making problems). When facing unseen DFM problems, which are constrained to fixed decision candidates but have unknown decision metrics, Cognitive LLMs take a natural language question prompt (see Figure 1(a) for prompt template) and output a binary decision (0 or 1) on which of two sectors, pre-assembly or assembly, requires a time reduction.

4. Cognitive LLMs: Hybrid Architectures for Human-Aligned Decision Making

Cognitive LLMs are composed of a CA and an LLM connected through a developing knowledge transfer mechanism, LLM-ACTR. Thus, we start by introducing the selected cognitive architecture, ACT-R, and then detail LLM-ACTR.

4.1. VSM-ACTR, A Human-Like Decision Making Cognitive Model

The ACT-R CA was chosen to develop the cognitive model for our task because it has a track record of accurately predicting human performance and timing across a variety of tasks, which meets our need to develop synthetic agents with individual differences in learning and training, for example, Marewski and Mehlhorn (2011) and Plitt and Russwinkel (2024). We created the VSM-ACTR cognitive model, which is a rule-based ACT-R cognitive decision-making model for the DFM problem that implements multiple problem-solving strategies through a combination of production rules.

VSM-ACTR incorporates meta-cognitive processes that reflect on and evaluate the progress of chosen strategies, with an emphasis on headcount (manufacturing) cost evaluation, through a reward structure that enables a process akin to reinforcement learning. This system enables the model to dynamically assess the impact of decisions on headcount costs, computing a reward or penalty for each decision cycle. These rewards or penalties then dynamically adjust the utility of the productions associated with each decision-making cycle. This helps the model exhibit a human-like learning progression that is inherited from its knowledge and ACT-R’s mechanisms. Below we briefly introduce the model; the model details can be found in Wu et al. (2024) and Wu et al. (2025).

4.1.1. Declarative Memory

VSM-ACTR integrates the prototypical decision process with insights into how cognitive models represent different levels of expertise (e.g., Martin et al., 2004; Paik et al., 2015), categorizing users into three levels of expertise: novices, intermediates, and experts. Novices engage in decision-making using deliberative chunks. Intermediates can manage key metrics such as CT and OEE but struggle with the systematic analysis of intertwined variables. Experts, on the other hand, make judgments systematically. The cognitive model employs three types of knowledge chunks: decisions, decision merits, and goals. The “decision chunk” encodes eight slots including reduction time (goal), decision-making state (novice, intermediate, expert), and related variables. The “decision merits chunk” holds information on sector weights, defect increases by sector, and comparative defect rate increases. The “goal chunk” captures the initial production conditions and the ultimate goal of achieving the optimal decision.

4.1.2. Production Rule Sets

Three sets of production rules represent the decision-making behaviors of novice, intermediate, and expert decision-makers. These sets comprise a total of 18 rules, each driven by goal-focused objectives across 20 states, covering actions such as choosing strategies, actions, working memory management, decisions, and evaluations.

We use the expert production rule set as an example (Figure 3). Once the decision-choice center decides to activate a set of expert decision productions, the process begins by perceiving the problem and retrieving related decision-making metrics from chunks. The imaginal buffer then acts as a working memory platform, holding and manipulating relevant information during the decision-making process. It allows the model to construct new mental representations or modify existing ones based on incoming data or problem-solving needs. This involves using the imaginal buffer to assess the relationships between the decision target and decision metrics, particularly considering the impact of each sector’s weight on the defect rate change, and determining the final defect rate increase for each sector. These results are stored in the imaginal buffer and later retrieved for comparison. This enables the model to select the sector with the lowest defect increase. After one decision-making cycle, the model evaluates the headcount cost, rewarding or penalizing the entire process based on the evaluation results and the decision strategy used, before looping back to the next decision-making round.
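The comparison step of the expert strategy can be sketched in a few lines of Python. This is a minimal illustration, not the VSM-ACTR implementation: the function name `expert_decision`, the input values, the multiplicative combination of weight and defect increase, and the 0/1 label encoding are all assumptions made for the example.

```python
def expert_decision(sector_weight, raw_defect_increase):
    """Pick the sector whose weighted defect-rate increase is lowest.

    Both arguments map sector name -> number. Multiplying the weight by the
    raw increase is an assumed combination rule, not the model's exact formula.
    """
    weighted = {
        sector: sector_weight[sector] * raw_defect_increase[sector]
        for sector in sector_weight
    }
    # The expert strategy selects the sector with the lowest defect increase.
    chosen = min(weighted, key=weighted.get)
    return chosen, weighted


# Hypothetical inputs for the two decision candidates.
weights = {"pre-assembly": 0.6, "assembly": 0.4}
increase = {"pre-assembly": 0.02, "assembly": 0.05}

chosen, weighted = expert_decision(weights, increase)

# Assumed label encoding for the binary output: 0 = pre-assembly, 1 = assembly.
label = {"pre-assembly": 0, "assembly": 1}[chosen]
print(chosen, label)
```

In the cognitive model the intermediate values live in the imaginal buffer; here the `weighted` dictionary plays that role.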

Figure 3.

Production rules control structure for expert decision making and their use of the ACT-R Goal and Imaginal buffers. Adapted from Wu et al. (2025).

4.1.3. Level of Expertise Mechanism

The model can learn while performing tasks through a mechanism leading to varying levels of expertise, as shown in Figure 4. The model mimics human decision-making behavior by differentiating knowledge representations. Declarative Memories: These memories store knowledge that aligns with human intuition and expertise gained from the VSM. For example, the green triangles in the figure represent a portion of the intuition used by novice decision-makers, while the red circles contain VSM domain knowledge used by intermediate decision-makers. Production Rules: These rules capture the rational decision-making processes observed in human subjects. The green lines illustrate how the imaginal buffer retrieves relevant portions of the novice declarative memory and feeds them to the novice production rule set. Intermediate and expert decision-making levels follow the same principle. Red and blue shapes represent their respective declarative memory chunks, and the corresponding colored arrows show the flow of information through their production rule sets. Finally, the goal buffer uses the “goal focus” command to manipulate the different phases of the task.

Figure 4.

Level of expertise mechanism in VSM-ACTR. Excerpted from Wu et al. (2025).

Figure 5.

(a) Obtaining decision representations from VSM-ACTR. (b) LLM feature extraction for behavior prediction.

The model also simulates the learning progress through the Decision-Choice Control, which manages errors, learning, and memory via utility learning and reinforced rewards. Novice decision-making productions start with a utility base and include a noise parameter. Each round of decisions receives rewards or penalties, and the utility of associated production rules updates with the adjustment of memory retention, which depends on the time passed since the rule last fired.

4.1.4. Foster Metacognition to Support Learning

To let the model assess the effectiveness of its decisions while learning, akin to human metacognition (self-assessing and self-correcting in response to self-assessment; Nelson & Narens, 1994), we developed a dynamic reward function that rewards actions after self-evaluating the chosen strategy.

VSM-ACTR uses the temporal difference (TD) algorithm from reinforcement learning (Sutton & Barto, 2018), as expressed in Eq. (1). Each production rule in the ACT-R model has a utility, that is, a value or strength associated with it, which is updated using the TD algorithm:

\begin{aligned} U_{i}(n) = U_{i}(n-1) + \alpha \left[ R_{i}(n) - U_{i}(n-1) \right] \end{aligned}

(1)

where $U_{i}(n)$ represents the value or utility of some item $i$ (i.e., a production) after its $n$-th occurrence, and $R_{i}(n)$ represents the reward received on the $n$-th occurrence. The parameter $\alpha$ ($0 < \alpha < 1$) controls the learning rate. If multiple productions compete with expected utility values $U_{j}$, the probability of selecting production $i$ is given by Eq. (2):

\begin{aligned} Probability(i) = \frac{e^{U_{i} / \sqrt{2 s}}}{\sum_{j} e^{U_{j} / \sqrt{2 s}}}, \end{aligned}

(2)

where the summation over $j$ is over all the productions that currently have their conditions satisfied, and $s$ is a noise parameter.
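The two equations can be sketched in standard-library Python as follows. This is an illustrative reading, not the ACT-R implementation: the function names, the utility values, and the value of $s$ are hypothetical, and the softmax temperature is passed in explicitly so that whichever reading of the denominator in Eq. (2) applies can be supplied by the caller.

```python
import math
import random


def update_utility(u_prev, reward, alpha):
    """Eq. (1): U_i(n) = U_i(n-1) + alpha * (R_i(n) - U_i(n-1))."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return u_prev + alpha * (reward - u_prev)


def selection_probabilities(utilities, temperature):
    """Eq. (2): softmax over the utilities of the matching productions."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtracting the maximum leaves the ratios unchanged and avoids overflow.
    top = max(utilities.values())
    exps = {name: math.exp((u - top) / temperature)
            for name, u in utilities.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def choose_production(utilities, temperature, rng=random):
    """Sample one production according to the Eq. (2) probabilities."""
    names = list(utilities)
    probs = selection_probabilities(utilities, temperature)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


# Hypothetical utilities for three competing strategy productions.
utilities = {"novice": 1.0, "intermediate": 1.0, "expert": 2.0}

s = 0.5                          # hypothetical noise parameter
temperature = math.sqrt(2 * s)   # denominator as printed in Eq. (2)

probs = selection_probabilities(utilities, temperature)
picked = choose_production(utilities, temperature, random.Random(0))

# After a hypothetical reward of 4.0, the expert utility moves toward it.
utilities["expert"] = update_utility(utilities["expert"], reward=4.0, alpha=0.2)
```

Higher-utility productions are selected more often but not always, which is how the noise parameter lets the model keep exploring alternative strategies.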

The utilities of productions are learned as the model runs, based on the rewards or penalties received. We designed the reward function as $R(S, f(x))$, which calculates the reward at the end of each decision-making round. This function takes two parameters: $S$, representing the strategy used, and $f(x)$, which results from headcount cost analysis, leading to either a weighted reward or a penalty. For example, in one decision round, a penalty of $-2$ is computed due to the use of a novice strategy coupled with inefficient headcount cost analysis. Factoring in the memory retention effect after a 0.05-second step, the calculation using the TD algorithm modifies the impact of the decision on the utility of the next production as:

\begin{aligned} U_{i}(n+1) = U_{i}(n) + \alpha \left[ -2 - 0.05 - U_{i}(n) \right]. \end{aligned}

This will then sequentially update the utility of the chain of productions for the chosen strategy. We find that when the model encounters certain types of problems both novice and expert strategies result in similar efficiencies in cost assessment. In these cases, the model is prone to staying with the novice strategy and exhibits a more gradual learning curve, similar to the tendency for people facing bounded rationality in decision-making (Fu & Gray, 2004; Hastie & Dawes, 2010), where they are likely to select the less effortful option when faced with multiple choices that produce very similar outcomes.
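A short numerical sketch of the penalty example above may help. It assumes a learning rate of 0.2, starting utilities of 0, hypothetical production names, and that each earlier production in the chain fired one further 0.05-second step before the penalty arrived; none of these values are taken from the VSM-ACTR model itself.

```python
def penalised_update(u, reward, elapsed, alpha):
    """U(n+1) = U(n) + alpha * (reward - elapsed - U(n))."""
    return u + alpha * (reward - elapsed - u)


ALPHA = 0.2      # assumed learning rate
STEP = 0.05      # seconds between consecutive production firings (assumed)
PENALTY = -2.0   # penalty from the example in the text

# Hypothetical chain of productions for the chosen strategy, in firing order.
fired_in_order = ["perceive", "retrieve-metrics", "decide"]
utilities = {name: 0.0 for name in fired_in_order}

# The last production fired one step before the penalty, the one before it
# two steps earlier, and so on, so earlier productions are discounted more.
for steps_back, name in enumerate(reversed(fired_in_order), start=1):
    elapsed = steps_back * STEP
    utilities[name] = penalised_update(utilities[name], PENALTY, elapsed, ALPHA)

for name in fired_in_order:
    print(f"{name}: {utilities[name]:.3f}")
```

Under these assumptions the production closest to the penalty moves to about -0.41, matching the single-step formula with $U_{i}(n) = 0$, and earlier productions end slightly lower.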

4.1.5. VSM-ACTR Model Evaluation

To answer the question of whether VSM-ACTR decisions demonstrate learning progression and capture individual differences, this study first uses descriptive statistics and linear regression to show the average progression of decision types across trials. It then uses a mixed linear model to assess the effects of trials on decision types across ACT-R model personas, with repeated measures of trials and random effects to account for individual differences. Finally, it uses ordered logistic regression to analyze the relationship between the number of trials and an ordinal dependent variable of learning progress from novice to expert.

We ran the VSM-ACTR model 2,012 times to understand its behavior (Ritter et al., 2011). Each time, we asked it to run 15 to 16 trials until the model achieved stable expert behavior. We collected data with decision types encoded as 0, 1, and 2 for novice, intermediate, and expert strategies.

Figure 6 shows a significant positive impact of trial exposure on decision-making progression, evidenced by a linear coefficient of 0.086 ( $P < 0.05$ ). Furthermore, the standard deviation starts relatively low but quickly increases, peaking around the third trial. This could reflect diverging approaches to decision-making as VSM-ACTR personas experiment with different strategies. The standard deviation gradually decreases thereafter, stabilizing between 0.5 and 0.75, which points to a convergence in decision-making strategies among personas.

Figure 6.

Trend of decision types over trials. The blue line is the average decision type and the red line is the variance; decision type 0 is novice, 1 is intermediate, and 2 is expert.

Figure 7.

Reduced embedding map of the full traces from one VSM-ACTR trial.

A mixed linear model regression confirms the effect of trials on decision-making and further reveals a variance of 0.007 in the random group effects, suggesting that the trials themselves predominantly explain the variability in decision type, although individual differences exist. Threshold analysis using ordered logistic regression reveals significant transition thresholds. The transition from novice to intermediate has a significant threshold of 0.88 ( $P < 0.05$ ), indicating a challenging progression to higher decision-making skills. In contrast, the transition from intermediate to expert shows a significantly lower threshold of 0.1 ( $P = 0.021$ ), suggesting it is easier to progress from intermediate to expert than from novice to intermediate. These findings validate that the repeated reinforcement decisions from VSM-ACTR demonstrate human-like learning progression and capture individual differences.

4.2. The Knowledge Transfer Framework: LLM-ACTR

With the validated model in hand, we then explain the LLM-ACTR framework, beginning with its cognitive knowledge input, followed by its knowledge transfer mechanism.

4.2.1. Cognitive Decision-Making Knowledge

This study curated VSM-ACTR decision-making knowledge through VSM-ACTR’s traces, which capture the reasoning steps in real time using a concurrent protocol. These traces log the cognitive operations executed by the modules at each decision point. The traces exhibit metacognition, which involves awareness and understanding of one’s own decision-making processes. This is represented through model traces that demonstrate the use of the imaginal buffer for accessing working memory, procedural memory matching and firing, and the self-assessment of strategy effectiveness. Traces also exhibit executive function (Gilbert & Burgess, 2008), which involves the evolution of decision-making results across trials and shows how decisions adapt through learning and experience.

As shown in Table 1, the model begins by establishing the goal (line 1) and then proceeds with a novice strategy (line 3, BRUTE/Novice). For the production rules associated with each strategy, the utility of each production rule is updated based on the received reward and the time since the last selection. For instance, the reward computation based on cost analysis (line 6) for the BRUTE choice results in a reward of $- 2$ (line 10). Consequently, the utility of the NAIVE-CHOICE rule, impacted by a penalty of $- 2.25$ for the time passed since the last selection, decreases from 3 to 1.96 (lines 14–16). As the utility of naive strategies declines, the probability of triggering the Intermediate Strategy (lines 26–27) and the EXPERT Strategy (lines 87–89) increases.

4.2.2. Learning An Embedding Space of Decision Traces

The next step is to convert the traces into vectors that LLMs can process. To retain executive function processes, we log decision results and strategy traces, which are then numerically encoded. For instance, ‘‘0’’ represents a decision for reduced time in the preassembly section, and ‘‘1’’ a decision for reduced time in assembly. Encoded data are subsequently fed into the neural network as single vectors.

To retain both executive function and metacognition processes, this study employs a semantic extraction and dimensionality reduction approach. This approach aims to transform a vast number of cognitive reasoning stamps into a vector format that balances information retention with computational efficiency. Traces for each task are processed through a sentence transformer to obtain semantic embeddings for each timestamp. A sum of ranked explanatory effects (SREE) analysis is then applied to determine the number ( $N$ ) of principal components that account for at least 70% of the variance. These embeddings are then reduced to $N$ dimensions using principal component analysis (PCA) (Abdi & Williams, 2010) (see Figure 5a). The learned embeddings can then be concatenated into a one-dimensional vector that serves as a content vector. This content vector could then be used to elicit meaningful cognitive decision-making behavior perturbations in LLMs. For example, the preliminary experiment explores the transfer of both metacognitive and executive function processes into LLMs by adding the cognitive content vector to the forward pass of LLM next token prediction to elicit meaningful behavioral perturbations.
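The reduction step can be illustrated with a short NumPy sketch. It is not the study's pipeline: random numbers stand in for the sentence-transformer embeddings, the SREE analysis is approximated by a plain cumulative explained-variance cut-off at 70%, and the function name `reduce_trace_embeddings` and the array sizes are hypothetical.

```python
import numpy as np


def reduce_trace_embeddings(embeddings, variance_threshold=0.70):
    """Project embeddings onto the fewest principal components whose
    cumulative explained variance reaches `variance_threshold`.

    embeddings: array of shape (n_timestamps, embedding_dim).
    Returns (reduced, n_components); reduced has shape
    (n_timestamps, n_components).
    """
    centered = embeddings - embeddings.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal axes, and the
    # squared singular values are proportional to the explained variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(explained)
    # First index at which the cumulative share reaches the threshold.
    n_components = int(np.searchsorted(cumulative, variance_threshold) + 1)
    n_components = min(n_components, len(cumulative))
    reduced = centered @ vt[:n_components].T
    return reduced, n_components


# Random stand-in for sentence-transformer output: 50 timestamps, 384 dimensions.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(50, 384))
reduced, n = reduce_trace_embeddings(fake_embeddings)
content_vector = reduced.reshape(-1)  # concatenate rows into one 1-D vector
```

Because the number of timestamps differs between traces, the concatenated vectors produced this way have different lengths from one trace to the next.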

4.2.3. Transfer of Learning

LLM-ACTR (see Figure 1(4) LLM-ACTR framework) begins by (a) passing consistent template prompts that reflect the decision-making task into an open-source LLM, mapping the task for the cognitive model; (b) using the LLM as the base model to access the last hidden layer and obtain masked embeddings; (c) constructing a classification layer with softmax activation on top of the base model; (d) using targets containing the salient decision representations of the cognitive model and features from the masked embeddings of the base LLM, and fine-tuning the LLM for classification using the LoRA method.

Fine-tuning, which involves optimizing model weights for a specific task, has been widely applied in the transfer of learning (Guo et al., 2019). Because the aim is to transfer human-like decisions with learning, the targets are the encoded vectors that represent the executive function processes of each VSM-ACTR persona. The transfer of learning is reformulated as a classification fine-tuning task, in which the final layer of contextualized embeddings (which capture the in-context meaning of tokens by recombining them with other tokens’ embeddings) is used as features. These contextualized embeddings provide the richest semantic information while balancing minimal information loss against reduced computational costs for fine-tuning. Additionally, Low-Rank Adaptation (LoRA) was employed for its computational efficiency (Hu et al., 2022). The current LLM-ACTR framework can also be extended to transfer other cognitive processes such as metacognition, as demonstrated in the following preliminary experiments section.
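To make the efficiency argument concrete, the low-rank update at the core of LoRA (Hu et al., 2022) can be sketched in NumPy. This is a toy illustration of the general technique, not the fine-tuning code used in this study: the class name `LoRALinear`, the layer size, the rank, and the scaling value are all hypothetical.

```python
import numpy as np


class LoRALinear:
    """Linear layer with a frozen weight W0 and a trainable low-rank update B @ A."""

    def __init__(self, weight, rank, scaling_alpha, rng):
        out_dim, in_dim = weight.shape
        self.weight = weight  # frozen pretrained weight W0, shape (out_dim, in_dim)
        self.a = rng.normal(scale=0.01, size=(rank, in_dim))  # trainable A
        self.b = np.zeros((out_dim, rank))  # trainable B, zero-initialized
        self.scale = scaling_alpha / rank

    def forward(self, x):
        # x has shape (batch, in_dim); output is x W0^T + scale * x A^T B^T.
        return x @ self.weight.T + self.scale * (x @ self.a.T) @ self.b.T

    def trainable_parameters(self):
        return self.a.size + self.b.size


rng = np.random.default_rng(0)
w0 = rng.normal(size=(64, 64))  # hypothetical frozen layer
layer = LoRALinear(w0, rank=4, scaling_alpha=8.0, rng=rng)
x = rng.normal(size=(5, 64))
out = layer.forward(x)
```

With these illustrative sizes only the two small factors are trained (512 values) instead of the full 64 by 64 weight (4,096 values), and because `b` starts at zero the adapted layer initially reproduces the frozen layer exactly.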

5. Experiments

5.1. Use Semantic Mapping to Evaluate Cognitive Decision Making Traces Vector

To answer RQ1 regarding the properties of a neural network representation of the decision-making process in CAs, we conducted a semantic mapping analysis of the first two principal components of the learned embeddings of each trace. The goal is to explore how the neural network has the potential to learn guided perception, memory, goal-setting, and actions, which are key components of cognitive decision-making, in an embedding space. We then used a MANOVA to examine how the learned embeddings correspond to the semantics of ACT-R’s components, including procedural memory, imaginal memory, goal knowledge, utility updating, and decision-making actions.

5.2. Feature Extraction for Behavior Prediction

To answer RQ2 (what level of complexity in behavior representation can LLMs effectively capture?), this study adopted a similar method of LLM feature extraction for behavior prediction (Hussain et al., 2024). We created datasets consisting of LLMs’ last contextual embeddings as features and the corresponding different levels of VSM-ACTR decisions as targets. We obtained embeddings by passing prompts that included all the information that VSM-ACTR had access to on a given trial and then extracting the hidden activations of the final layer, as shown in Figure 5(b).

The first dataset used targets as VSM-ACTR decisions, where ‘‘0’’ indicates preassembly and ‘‘1’’ indicates assembly. The second dataset’s prompt template added an explanation of the strategy adopted by VSM-ACTR (see Appendix: LLM System Prompt Templates) and used compound targets comprising both the decisions and the strategies reflecting the learning trajectory (novice, intermediate, and expert). The targets were encoded as follows: 0, 1, and 2 for preassembly choices using novice, intermediate, and expert strategies, respectively, and 3, 4, and 5 for assembly choices following the same pattern. With these two datasets, we fitted a regularized logistic regression model using 10-fold cross-validation for the first dataset and multinomial regression using 10-fold cross-validation with L2 regularization for the second. Model performance was assessed by measuring the goodness of fit through negative log-likelihood (NLL) and the predictive accuracy of hold-out data.
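The compound target encoding and the two evaluation metrics described above can be sketched as follows. The function and constant names are hypothetical, and the regression fits themselves are not reproduced; the sketch only shows how a (decision, strategy) pair maps to the labels 0 to 5 and how NLL and accuracy are computed from predicted class probabilities.

```python
import math

STRATEGIES = ("novice", "intermediate", "expert")
DECISIONS = ("preassembly", "assembly")


def encode_compound_target(decision, strategy):
    """Map (decision, strategy) to 0-5: 0-2 for preassembly, 3-5 for assembly."""
    return DECISIONS.index(decision) * len(STRATEGIES) + STRATEGIES.index(strategy)


def mean_nll(probabilities, targets):
    """Mean negative log-likelihood of the true class under the predictions."""
    return -sum(math.log(p[t]) for p, t in zip(probabilities, targets)) / len(targets)


def accuracy(probabilities, targets):
    """Fraction of rows whose most probable class equals the target."""
    hits = sum(
        max(range(len(p)), key=p.__getitem__) == t
        for p, t in zip(probabilities, targets)
    )
    return hits / len(targets)


label = encode_compound_target("assembly", "intermediate")
```

A lower NLL means the predicted probabilities place more mass on the decisions VSM-ACTR actually made, so it penalizes confident wrong predictions more heavily than accuracy does.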

5.3. Knowledge Transfer

To answer RQ3, whether LLMs can be informed with knowledge about the reasoning processes of CAs, we use a case study to examine whether Cognitive LLMs offer better representations of human decision-making behaviors on a novel DFM problem, compared to an LLM-only model that employs chain-of-thought reasoning strategies.

5.3.1. Base Model and Data

The case study uses the LlaMa-2 13B (Touvron et al., 2023) model as the base model because it demonstrated effectiveness and efficiency in NLP tasks (Huang et al., 2024). As a state-of-the-art LLM, LlaMa has been trained on trillions of tokens from publicly available datasets. Unlike other transformer-based models such as the GPT family, which can only be accessed at the user’s end, LlaMa’s architecture, including its pre-trained weights, is fully accessible. Furthermore, evidence that its internal representations can be trained to become more aligned with human neural activity has been presented (Binz & Schulz, 2024).

To determine the target size that can effectively perform the fine-tuning task while balancing efficacy and resource limitations, we referred to Kumar et al. (2024), who showed evidence that LlaMa-2 13B would maintain competitive performance in resource-limited text classification with datasets of nearly 1,000 rows per class. Based on this, we created a dataset that contains the 2,012 decision-making trials, obtained by running the developed VSM-ACTR model across 32 problem sets; each ACT-R persona was run for 15 to 16 trials until stable expert behavior was achieved.

5.3.2. Experiment Metrics

The fine-tuning process employs cross-entropy as the loss function and uses Adam optimization. Training involves a train-test split of 0.2 and a batch size of 5 for both training and validation phases. The learning rate was set to $1 \times 10^{-5}$ , with training spanning 10 epochs. To regularize the model and prevent overfitting, a weight decay of 0.01 and a dropout rate of 0.5 were applied, and gradient accumulation was set to 2. Finally, gradient clipping was employed to maintain a maximum gradient norm of 1.0 for gradient explosion control.

5.3.3. Baseline Models

To assess the model’s ability to make human-like decisions, we first split the data into train and validation sets to reserve a set of unseen problems. We then compared the predictive negative log-likelihood (NLL), a measure of goodness-of-fit, of Cognitive LLMs in predicting VSM-ACTR’s decisions on the unseen problems, against a pre-trained LlaMa and a random guess model.

A random choice model serves as the basic form of control condition to distinguish the effects of treatment from chance (Gaab et al., 2019). This approach allows assessing the extent to which decisions are influenced by knowledge versus being purely stochastic. On the other hand, using LlaMa without fine-tuning as a baseline provides a reference point to measure the impact of knowledge transfer on the model’s performance.
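As a reference point for the chance-level baseline, consider a binary target and a model that guesses uniformly. It assigns probability $1 / 2$ to the observed class on each of the $N$ hold-out trials, so its mean NLL does not depend on the data. The symbols $N$ and $y_{k}$ are introduced here only for illustration, and the calculation assumes natural logarithms averaged per trial:

\begin{aligned} \mathrm{NLL}_{\text{chance}} = - \frac{1}{N} \sum_{k = 1}^{N} \ln p (y_{k}) = - \ln \frac{1}{2} = \ln 2 \approx 0.6931 . \end{aligned}

Under these assumptions, a model whose mean NLL falls below this value predicts VSM-ACTR’s decisions better than uniform guessing, and a model above it does worse.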

6. Results

6.1. Finding Useful Cognitive Decision Making Embeddings

The approach of distilling executive function processes captures the evolution of decision-making results across trials and illustrates how decisions adapt through learning and experience, all represented as a sequential single vector. This approach is easy to use for downstream tasks but retains only partial knowledge of cognitive decision-making.

In addition, Figure 7 displays the reduced embeddings of both metacognitive and executive function processes, corresponding to the semantic mapping of ACT-R’s components. A MANOVA was conducted to assess the overall effect of the independent variables (the label categories, or ACT-R components) on the combined dependent variables (the components of the reduced embeddings). This analysis reveals a significant relationship with the semantic mapping of ACT-R’s components. For instance, the Wilks’ lambda value (0.0004) suggests that the ACT-R component categories explain nearly all the variance in the dependent variables, indicative of a strong group effect. All four statistical tests applied (Wilks’ lambda, Pillai’s trace, Hotelling-Lawley trace, and Roy’s greatest root) demonstrate strong significance, with p-values below 0.05. This shows that the semantics of symbolic and subsymbolic representations of cognitive models can be learned using a neural network, and that the principal components retained successfully capture the essential variance related to these cognitive processes, providing a way to preserve cognitive decision-making knowledge in a compact embedding space.
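For readers less familiar with the statistic, Wilks’ lambda is the ratio of the determinant of the within-group scatter matrix to that of the total scatter matrix, so values near 0 mean that group membership accounts for almost all of the multivariate variance. A minimal NumPy sketch on synthetic data follows; the function name `wilks_lambda` and the two toy groups are hypothetical and unrelated to the study’s embeddings.

```python
import numpy as np


def wilks_lambda(features, labels):
    """Wilks' lambda = det(within-group scatter) / det(total scatter)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    centered_total = features - features.mean(axis=0)
    total = centered_total.T @ centered_total
    within = np.zeros_like(total)
    for group in np.unique(labels):
        rows = features[labels == group]
        centered = rows - rows.mean(axis=0)  # deviations from the group mean
        within += centered.T @ centered
    return np.linalg.det(within) / np.linalg.det(total)


# Two well-separated synthetic groups in a 2-D reduced embedding space.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.1, size=(30, 2))
group_b = rng.normal(loc=5.0, scale=0.1, size=(30, 2))
features = np.vstack([group_a, group_b])
labels = np.array([0] * 30 + [1] * 30)
lam = wilks_lambda(features, labels)
```

When the labels carry no information about the features, the within-group and total scatter are nearly equal and the statistic approaches 1.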

6.2. Assessing Behavior Complexity Captured by the Innate LLM

Table 2 shows that LLM-ACTR captures a single facet of decision-making, achieving an average accuracy of 0.64 across 10 validation folds in the holdout task. When decision-making targets involve multiple facets—encompassing both choices and strategies that shape the learning trajectory—the accuracy decreases to 0.42. In addition, the NLL reveals greater predictive uncertainty for multifaceted decision-making processes, as evidenced by a significantly higher NLL of 1.18 compared to 0.65 in single-facet scenarios. The results show that prompt embeddings generated through feature extraction capture the overall structure of learning. However, they struggle to capture complex decision-making rationales.

Table 2.
Evaluation for Single and Multi Facets Targets.

Target Type	NLL	Accuracy
Single Facet Target	0.63	0.64
Multi Facets Target	1.18	0.42

6.3. Learning Cognitive Decision-Making Through LLM-ACTR

We first report training and validation losses, across 10 epochs, to reveal the fine-tuned model’s learning and generalization behavior. Initially, the training loss begins at approximately 0.73, with a slight fluctuation observed in subsequent epochs, peaking around epoch 2 and showing a notable dip at epoch 7. In contrast, the validation loss starts at around 0.64 and remains remarkably stable throughout the epochs. This consistency in validation loss, coupled with a generally downward trend in training loss after its initial variations, suggests that the model is learning effectively.

Table 3 compares the Cognitive LLMs with the baseline models on goodness of fit, using negative log-likelihood (NLL), and on accuracy for hold-out data. The Cognitive LLMs demonstrate significantly better performance across all metrics compared to the LlaMa-only model, highlighting their effectiveness in decision-making tasks involving reinforcement learning. Additionally, the LlaMa-only model performs worse than the chance-level model. We believe this underscores the necessity of fine-tuning pre-trained language models like LlaMa to adapt them to human-like decision-making patterns.

Table 3.
Comparison of VSM-ACTR With Baselines.

Model	NLL	Accuracy
Chance-level	0.6931	0.4826
LlaMa	1.1330	0.3564
LLM-ACTR (ours)	0.6534	0.6576

7. Preliminary Experimental Results on Extending LLM-ACTR

Following results for RQ1 that the semantics of symbolic and subsymbolic representations of cognitive models can be learned using a neural network, we conducted a preliminary experiment to extend LLM-ACTR to transfer holistic cognitive processes.

After retaining 240 randomly chosen full cognitive reasoning traces from the VSM-ACTR model, we processed both executive function and metacognition processes using a semantic extraction and dimension reduction approach (see Figure 5a). The resulting embeddings were concatenated into 240 one-dimensional tensors. We then addressed the issue of ragged tensors, which arise from individual differences, by padding, and calculated the standardized mean values of these tensors to serve as a content vector.
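One way to read the padding and averaging step is sketched below in NumPy. The details are assumptions rather than the study's procedure: the sketch pads with zeros on the right, averages element-wise across traces, and then z-scores the averaged vector. The function name `build_content_vector` and the toy inputs are hypothetical.

```python
import numpy as np


def build_content_vector(trace_vectors):
    """Zero-pad ragged 1-D vectors to a common length, average them
    element-wise, and z-score the averaged vector."""
    max_len = max(len(v) for v in trace_vectors)
    padded = np.zeros((len(trace_vectors), max_len))
    for row, vec in enumerate(trace_vectors):
        padded[row, : len(vec)] = vec  # trailing positions stay zero
    mean_vector = padded.mean(axis=0)
    std = mean_vector.std()
    if std == 0:
        return mean_vector - mean_vector.mean()
    return (mean_vector - mean_vector.mean()) / std


# Three toy traces of different lengths.
toy_traces = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
content_vector = build_content_vector(toy_traces)
```

In this reading, positions that only the longest traces reach are averaged together with zeros from the shorter ones, which pulls those entries toward zero.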

The preliminary experiment extends LLM-ACTR by bringing the content vector into training. The content vector is injected into one of the hidden layers during a forward pass to introduce differentiated activations. With the modified LLM as the base model, the framework accesses the last contextualized embedding and obtains the masked embedding. A classification layer with softmax activation is constructed on top to form the decision-making layer. Using targets of ACT-R model decisions, the Cognitive LLM is fine-tuned for the classification task in decision-making using LoRA (see Figure 8). We switched to the smaller LlaMa 7B for the experiment to balance the computational cost of backpropagation when modifying the model’s hidden layers against the overall efficacy of the base model.
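The injection idea can be illustrated with a toy two-layer network in NumPy. This is a deliberately small stand-in for an LLM, with hypothetical names, sizes, and random weights; it only shows a vector being added to hidden activations before a softmax decision layer, not how the layer was chosen or trained in the experiment.

```python
import numpy as np


def forward_with_injection(x, w1, w2, content_vector=None):
    """Toy forward pass that optionally adds a content vector to the hidden layer."""
    hidden = np.tanh(x @ w1)
    if content_vector is not None:
        hidden = hidden + content_vector  # broadcast over the batch dimension
    logits = hidden @ w2
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)  # softmax class probabilities


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))      # batch of 4 hypothetical prompt features
w1 = rng.normal(size=(8, 16))    # hidden layer weights
w2 = rng.normal(size=(16, 2))    # binary decision layer weights
cv = rng.normal(size=16)         # stand-in for the cognitive content vector

plain = forward_with_injection(x, w1, w2)
steered = forward_with_injection(x, w1, w2, content_vector=cv)
```

Adding the same vector to every example shifts the hidden activations in a fixed direction, which changes the downstream class probabilities without altering the input prompt.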

Figure 8.

Infusing holistic VSM-ACTR traces as content vectors through fine-tuning.

Figure 9.

Comparison of NLL across 10 epochs for fine-tuning only and fine-tuning with cognitive content vectors.

The LlaMa model with the modified hidden layer is fine-tuned with 2,012 data points for the binary classification task. The content vectors are set to be trainable. To assess the model’s ability to make human-like decisions, we first split the data into train and validation sets to reserve a set of unseen problems. We then compared the predictive NLL of Cognitive LLM in predicting VSM-ACTR’s decisions on the unseen problems, against LlaMa fine-tuned without content vectors.

The results (Figure 9) show that adding the vector representation of VSM-ACTR’s holistic traces during fine-tuning resulted in a slightly decreased mean and reduced variance of NLL across 10 epochs, demonstrating better model fitting and stability compared to fine-tuning only. This indicates that allowing the model to integrate and learn from the cognitive vector during training potentially leads to more nuanced and human-like decision-making capabilities, as captured by the cognitive features encoded in the vector. However, the influence of the cognitive content vector is limited and warrants further investigation, partly because the stochastic simulation of VSM-ACTR produces decision-making vectors of various lengths. This study addresses ragged tensors by padding, but this approach potentially dilutes or changes the semantics of each vector. To improve the impact of the cognitive vector, additional techniques such as vector optimization will be needed.

8. Discussion and Conclusion

8.1. Main Insights/Takeaways

This article starts to show how to enable LLMs to replicate cognitive decision-making in CAs via a data-driven approach. We introduce Cognitive LLMs, a novel neurosymbolic architecture designed to enhance human-like decision-making by integrating the CAs’ cognitive processes with LLMs. The work makes several contributions. (i) It examines latent representations of CAs through neural networks. The findings show that distilling the executive function process preserves high-level symbolic knowledge but only partially captures decision-making that involves learning. A holistic semantic preservation approach, covering both executive function and metacognitive processes, retains symbolic and subsymbolic semantics in a low-dimensional space. However, challenges with ragged tensors derived from individual differences in downstream tasks require further optimization. (ii) We then collected domain knowledge as the executive function process and used the knowledge as labeled targets in a feature extraction for behavior prediction task to investigate the LLMs’ innate capabilities in capturing the complexity of behavioral representations. The results show that prompt embeddings generated through feature extraction capture the overall structure of learning. However, they struggle to capture complex decision-making rationales.

Furthermore, (iii) this study presents a developing framework, LLM-ACTR, for knowledge transfer from cognitive models to LLMs, rooted in the mechanism of LLMs’ next-token prediction and the knowledge representation of cognitive models. This includes methods such as using the cognitive models’ decisions for fine-tuning (Guo et al., 2019), and integrating a cognitive decision-making vector into a hidden layer to elicit meaningful behavior perturbations (Panickssery et al., 2023). (iv) It advances previous efforts on human-like LLM alignment that used data from large-scale cognitive psychology experiments involving human subjects (Binz & Schulz, 2023; Coda-Forno et al., 2024). It reduces the cost of data collection by using synthetic data from cognitive models. The synthetic data present real-time cognitive reasoning with tasks, including metacognition, which is hard to quantify in human subjects (Fleming & Lau, 2014). (v) The case study of Cognitive LLMs in manufacturing decision-making demonstrates that Cognitive LLMs achieve a better fit to human-like decisions on unseen problems than a pre-trained model in the DFM task. Thus, it is possible to transfer decision-making knowledge from CAs to LLMs.

This development opens up new research directions for equipping LLMs with the necessary knowledge to computationally model and replicate the internal mechanisms of human cognitive decision-making (Oltramari, 2023; Oltramari et al., 2021). It also complements ongoing work showing that LLMs could possibly be transformed into cognitive models through knowledge transfer (Binz & Schulz, 2024; Coda-Forno et al., 2024). For example, Binz et al. (2024) show that through fine-tuning, LLMs’ internal representations can become more aligned with human behaviors.

8.2. Limitations and Future Work

One limitation also stems from the novelty of this study. How closely can we claim that cognitive model personas replicate human behaviors? Currently, our focus is on tuning the model to align with general patterns of learning and error-making; however, VSM-ACTR still requires more granular human data for cognitive fine-tuning. The closer the VSM-ACTR model aligns with human behavior, the more accurately it can represent human decision-making processes and explain human behavior.

However, the more meaningful questions arise from considering the landscape of enabling machine cognitive reasoning. We must ask ourselves what we can learn about cognitive decision-making when we infuse knowledge from CAs into LLMs. For now, our insights are limited to the observation that knowledge from cognitive models can be preserved in an embedding space and could be learned by LLMs, and that embeddings from LLMs can be trained to predict human-like decisions. While this is interesting in its own right, it certainly is not the end of the story. Looking beyond the current work, transitioning from transferring cognitive models’ human-like decisions to LLMs, to guided perception, memory, goal-setting, and actions, will provide the opportunity to apply a wide range of explainability techniques to LLMs’ cognitive decision-making.

One application of this further work is to address a common limitation in machine learning innovations, namely cross-domain generalization (Akrout et al., 2023; Yoon et al., 2024). Cognitive LLMs can currently only generalize to unseen problems within an applicable domain, constrained by fixed decision candidates and unknown decision metric values. In applying Cognitive LLMs to evolving manufacturing problems that may incorporate an increasing number of decision candidates and associated metrics, it becomes critical to solve out-of-domain problems (Wang et al., 2022). This will require LLM-ACTR to advance in transferring guided perception, memory, and goal-setting to LLMs. As Zhu and Simmons (2024) found, training the LLM with the rules of guided perception in cognitive models can help generalize robotics problem-solving to out-of-distribution tasks.

Credit Author Statement

Siyu Wu: Conceptualization, Methodology, Software, Experiments, Writing - Original Draft, Writing - Review & Editing. Alessandro Oltramari: Conceptualization, Funding Acquisition, Methodology, Software, Writing - Review & Editing. Jonathan Francis: Methodology, Experiments, Writing - Review & Editing. C. Lee Giles: Conceptualization, Writing - Review & Editing. Frank E. Ritter: Writing - Review & Editing.

Footnotes

Acknowledgments

The authors thank anonymous reviewers for constructive feedback.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Appendix

References

Abdi

Williams

L. J.

(2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2(4), 433–459.

Abdurahman

Zou

Ungar

Bhatia

(2024). A deep learning approach to personality assessment: Generalizing across items and expanding the reach of survey based research. Journal of Personality and Social Psychology, 126(2), 321. https://doi.org/10.1037/pspp0000480

Akrout

Feriani

Bellili

Mezghani

Hossain

(2023). Domain generalization in machine learning models for wireless communications: Concepts, state-of-the-art, and open issues. IEEE Communications Surveys & Tutorials, 25, 3014–3037.

Anderson

J. R.

(2009). How can the human mind occur in the physical universe? Oxford University Press.

Anderson

J. R.

Betts

Bothell

Hope

Lebiere

(2019). Learning rapid and precise skills. Psychological Review, 126, 727–760.

Andreas

(2022). Language models as agent models. In Findings of the association for computational linguistics: EMNLP 2022 (pp. 5769–5779).

Best

B. J.

Lebiere

(2003). Teamwork, communication, and planning in ACT-R agents engaging in urban combat in virtual environments. In Proceedings of the 2003 IJCAI workshop on cognitive modeling of agents and multi-agent interactions (pp. 64–72).

Bhattamishra

Patel

Blunsom

Kanade

(2023). Understanding in-context learning in transformers and LLMs by learning to learn discrete functions, arXiv preprint arXiv:2310.03016.

Binz

Akata

Bethge

Brändle

Callaway

Coda-Forno

Dayan

Demircan

Eckstein

M. K.

Éltetö

Griffiths

T. L.

Haridi

Jagadish

A. K.

Ji-An

Kipnis

Kumar

Ludwig

Mathony

Mattar

Schulz

(2024). Centaur: A foundation model of human cognition, arXiv:2410.20268.

10.

Binz

Schulz

(2023). Using cognitive psychology to understand GPT-3. Proceedings of the National Academy of Sciences, 120(6), e2218523120.

11.

Binz

Schulz

(2023). Turning large language models into cognitive models. arXiv preprint arXiv:2306.03917.

12.

Bothell

(2017). ACT-R 7 reference manual. http://act-r.psy.cmu.edu/wordpress/wp-content/themes/ACT-R/actr7/reference-manual.pdf

13.

Brown

T. B.

Mann

Ryder

Subbiah

Kaplan

Dhariwal

Neelakantan

Shyam

Sastry

Askell

Agarwal

(2020). Language models are few-shot learners. In Advances in neural information processing systems (NeurIPS) (Vol. 33, pp. 1877–1901).

14.

Chakraborty

Ornik

Driggs-Campbell

(2025). Hallucination detection in foundation models for decision-making: A flexible definition and review of the state of the art. ACM Computing Surveys, 57(7), 1–35.

15.

Coda-Forno

Binz

Akata

Botvinick

Wang

Schulz

(2023). Meta-in-context learning in large language models. Advances in Neural Information Processing Systems, 36, 65189–65201.

16.

Coda-Forno

Binz

Wang

J. X.

Schulz

(2024). Cogbench: a large language model walks into a psychology lab. arXiv preprint arXiv:2402.18225.

17.

Dong

Dai

Zheng

Chang

Sun

Sui

(2022). A survey on in-context learning, arXiv preprint arXiv:2301.00234.

18.

Dorobantu

(2021). Human-level, but non-humanlike: Artificial intelligence and a multi-level relational interpretation of the imago dei. Philosophy, Theology and the Sciences, 8(1), 81–107.

19.

dos Santos

V. G.

Santos

G. L.

Lynn

Benatallah

(2024). Identifying citizen-related issues from social media using LLM-based data augmentation. In International conference on advanced information systems engineering (pp. 531–546). Springer Nature.

20.

Fleming

S. M.

Lau

H. C.

(2014). How to measure metacognition. Frontiers in Human Neuroscience, 8, 443.

21.

Francis

Kitamura

Labelle

Navarro

(2022). Core challenges in embodied vision-language planning. Journal of Artificial Intelligence Research, 74, 459–515.

22.

W.-T.

Gray

W. D.

(2004). Resolving the paradox of the active user: Stable suboptimal performance in interactive tasks. Cognitive Science, 28, 901–935.

23.

Gaab

Kossowsky

Ehlert

Locher

(2019). Effects and components of placebos with a psychological treatment rationale—Three randomized-controlled studies. Scientific Reports, 9(1), 1421.

24.

Gilbert

S. J.

Burgess

P. W.

(2008). Executive function. Current Biology, 18(3), R110–R114.

25.

Gonzalez

Lerch

J. F.

Lebiere

(2003). Instance-based learning in dynamic decision making. Cognitive Science, 27, 591–635. https://doi.org/10.1207/s15516709cog2704_2

26.

Guo

Shi

Kumar

Grauman

Rosing

Feris

(2019). Spottune: Transfer learning through adaptive fine-tuning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4805–4814).

27.

Hagendorff

Fabi

Kosinski

(2023). Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT. Nature Computational Science, 3(10), 833–838.

28.

Hastie

Dawes

R. M.

(2010). Rational choice in an uncertain world: The psychology of judgment and decision making. Sage.

29.

Hozdić

(2015). Smart factory for industry 4.0: A review. International Journal of Modern Manufacturing Technologies, 7(1), 28–35.

30.

E. J.

Shen

Wallis

Allen-Zhu

Wang

Chen

(2022). LoRA: Low-rank adaptation of large language models. In International conference on learning representations. https://openreview.net/forum?id=nZeVKeeFYf9

31. Hu, Xie, Jain, Francis, Patrikar, Keetha, Kim, Xie, Zhang, Zhao (2023). Toward general-purpose robots via foundation models: A survey and meta-analysis, arXiv preprint arXiv:2312.08782.

32. Huang, Wang (2024). Performance analysis of Llama 2 among other LLMs. In 2024 IEEE conference on artificial intelligence (CAI) (pp. 555–559).

33. Hussain, Binz, Mata, Wulff, D. U. (2024). A tutorial on open-source large language models for behavioral science. Behavior Research Methods, 56(8), 8214–8237.

34. Kang (2001). Team-Soar: A computational model for multilevel decision making. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 31(6), 708–714. https://doi.org/10.1109/3468.983407

35. Kirk, J. R., Wray, R. E., Laird, J. E. (2023). Exploiting language models as a source of knowledge for cognitive agents. In Proceedings of the AAAI symposium series (Vol. 2, pp. 286–294).

36. Kirk, J. R., Wray, R. E., Lindes, Laird, J. E. (2024). Improving knowledge extraction from LLMs for task learning through agent analysis. In Proceedings of the AAAI conference on artificial intelligence (Vol. 38, pp. 18390–18398).

37. Kotseruba, Tsotsos, J. K. (2020). A review of 40 years of cognitive architecture research: Focus on perception, attention, learning and applications. Artificial Intelligence Review, 53, 17–94.

38. Kotseruba, Tsotsos, J. K. (2025). The computational evolution of cognitive architectures. Oxford University Press.

39. Kumar, Sharma, Bedi (2024). Towards optimal NLP solutions: Analyzing GPT and LLaMA-2 models across model scale, dataset size, and task diversity. Engineering, Technology & Applied Science Research, 14(3), 14219–14224.

40. Laird, J. E. (2012). The Soar cognitive architecture. MIT Press.

41. Laird, J. E. (2021). An analysis and comparison of ACT-R and Soar. In Proceedings of the ninth annual conference on advances in cognitive systems. Paper 6.

42. Laird, J. E., Lebiere, Rosenbloom, P. S. (2017). A standard model of the mind: Toward a common computational framework across artificial intelligence, cognitive science, neuroscience, and robotics. AI Magazine, 38(4), 13–26.

43. Leyer, Schneider (2021). Decision augmentation and automation with artificial intelligence: Threat or opportunity for managers? Business Horizons, 64(5), 711–724.

44. Liu, Geng, Peterson, J. C., Sucholutsky, Griffiths, T. L. (2024). Large language models assume people are more rational than we really are, arXiv cs.CL. https://doi.org/10.48550/arXiv.2406.17055

45. Makatura, Foshey, Wang, Hähnlein, Deng, Tjandrasuwita, Spielberg, Owens, C. E., Chen, P. Y., Zhao, Zhu, Norton, W. J., Jacob, Schulz, Matusik (2024). How can large language models help humans in design and manufacturing? Part 2: Synthesizing an end-to-end LLM-enabled design and manufacturing workflow. Harvard Data Science Review, (Special Issue 5).

46. Manos (2006). Value stream mapping-an introduction. Quality Progress, 39(6), 64.

47. Marewski, J. N., Mehlhorn (2011). Using the ACT-R architecture to specify 39 quantitative process models of decision making. Judgment and Decision Making, 6(6), 439–519.

48. Martin, M. K., Gonzalez, Lebiere (2004). Learning to make decisions in dynamic environments: ACT-R plays the Beer Game. In Proceedings of the sixth international conference on cognitive modeling (pp. 178–183). Carnegie Mellon University/University of Pittsburgh.

49. McCarthy (2007). From here to human-level AI. Artificial Intelligence, 171(18), 1174–1182.

50. Nelson, T. O., Narens (1994). Why investigate metacognition? In Metacognition: Knowing about knowing (pp. 1–25). The MIT Press.

51. Newell (1990). Unified theories of cognition. Harvard University Press.

52. Oltramari (2023). Enabling high-level machine reasoning with cognitive neuro-symbolic systems. In Proceedings of the AAAI symposium series (Vol. 2, pp. 360–368).

53. Oltramari, Francis, Ilievski, Mirzaee (2021). Generalizable neuro-symbolic systems for commonsense question answering. In Neuro-symbolic artificial intelligence: The state of the art (pp. 294–310). IOS Press.

54. Paik, Kim, J. W., Ritter, F. E., Reitter (2015). Predicting user performance and learning in human-computer interaction with the Herbal compiler. ACM Transactions on Computer-Human Interaction, 22(5), 25.

55. Panickssery, Gabrieli, Schulz, Tong, Hubinger, Turner, A. M. (2023). Steering LLaMA 2 via contrastive activation addition, arXiv preprint arXiv:2312.06681.

56. Plitt, Russwinkel (2024). Modeling of individual naturalistic decision making in a cognitive architecture. In Proceedings of the 10th workshop on formal and cognitive reasoning (FCR-2024) at the 47th German conference on artificial intelligence (KI-2024).

57. Prezenski, Brechmann, Wolff, Russwinkel (2017). A cognitive modeling approach to strategy formation in dynamic decision making. Frontiers in Psychology, 8, 1335.

58. Rahani, A. R., Al-Ashraf (2012). Production flow analysis through value stream mapping: A lean manufacturing process case study. Procedia Engineering, 41, 1727–1734.

59. Ritter, F. E., Schoelles, M. J., Quigley, K. S., Klein, L. C. (2011). Determining the number of model runs: Treating cognitive models as theories by not sampling their behavior. In Human-in-the-loop simulations: Methods and practice (pp. 97–116). Springer-Verlag.

60. Ritter, F. E., Tehranchi, Oury, J. D. (2023). ACT-R: A cognitive architecture for modeling cognition. Wiley Interdisciplinary Reviews: Cognitive Science, 10, 833–838.

61. Rossit, D. A., Tohmé, Frutos (2019). Industry 4.0: Smart scheduling. International Journal of Production Research, 57(12), 3802–3813.

62. Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin, 119(1), 3–22.

63. Stocco, Sibert, Steine-Hanson, Koh, Laird, J. E., Lebiere, C. J., Rosenbloom (2021). Analysis of the human connectome data supports the notion of a “common model of cognition” for human and human-like intelligence across domains. NeuroImage, 235, 118035.

64. Sumers, T. R., Yao, Narasimhan, Griffiths, T. L. (2023). Cognitive architectures for language agents, arXiv preprint arXiv:2309.02427.

65. Sun (2006). The CLARION cognitive architecture: Extending cognitive modeling to social simulation. In Cognition and multi-agent interaction (pp. 79–99).

66. Sun (2024). Can a cognitive architecture fundamentally enhance LLMs? Or vice versa? arXiv preprint arXiv:2401.10444.

67. Sutton, R. S., Barto, A. G. (2018). Reinforcement learning: An introduction (2nd ed.). The MIT Press.

68. Taatgen, Anderson, J. R. (2010). The past, present, and future of cognitive architectures. Topics in Cognitive Science, 2(4), 693–704. https://doi.org/10.1111/j.1756-8765.2010.01107.x

69. Tatiya, Francis, Wu, H.-H., Bisk, Sinapov (2023). MOSAIC: Learning unified multi-sensory object property representations for robot perception, arXiv:2309.08508.

70. Tehranchi, Bagherzadeh, Ritter, F. E. (2023). A user model to directly compare two unmodified interfaces: A study of including errors and error corrections in a cognitive user model. Artificial Intelligence for Engineering Design, Analysis and Manufacturing, 37, e27.

71. Touvron, Lavril, Izacard, Martinet, Lachaux, M.-A., Lacroix, Rozière, Goyal, Hambro, Azhar, Rodriguez, Joulin, Grave, Lample (2023). LLaMA: Open and efficient foundation language models. https://api.semanticscholar.org/CorpusID:257219404

72. Tversky, Kahneman (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1131.

73. Ulrich, Sartorius, Pearson, Jakiela (1993). Including the value of time in design-for-manufacturing decision making. Management Science, 39(4), 429–447.

74. Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, A. N., Kaiser, Polosukhin (2017). Attention is all you need. In Proceedings of the 31st international conference on neural information processing systems, NIPS’17 (pp. 6000–6010).

75. Wang, Lan, Liu, Ouyang, Qin, Yu, P. S. (2022). Generalizing to unseen domains: A survey on domain generalization. IEEE Transactions on Knowledge and Data Engineering, 35(8), 8052–8072.

76. Webb, Holyoak, K. J. (2022). Emergent analogical reasoning in large language models, arXiv preprint arXiv:2212.09196.

77. Wray, R. E., Kirk, J. R., Laird, J. E. (2024). Eliciting problem specifications via large language models, arXiv preprint arXiv:2405.12147.

78. Wu, Ferreira, Ritter, F. E., Walter (2023). Comparing LLMs for prompt-enhanced ACT-R and Soar model development: A case study in cognitive simulation. In Proceedings of the 38th annual association for the advancement of artificial intelligence (AAAI) conference on artificial intelligence, fall symposium series on integrating cognitive architecture and generative models. Arlington. https://doi.org/10.1609/aaaiss.v2i1.27710

79. Wu, Oltramari, Ritter (2025). VSM-ACTR 2: A human-like decision making model with metacognition for manufacturing solutions. Computational and Mathematical Organization Theory. https://doi.org/10.1007/s10588-025-09405-5

80. Wu, Oltramari, Ritter, F. E. (2024). VSM-ACT-R: Toward using cognitive architecture for manufacturing solutions. In Proceedings of the 17th international conference on social computing, behavioral-cultural modeling & prediction and behavior representation in modeling and simulation (SBP-BRIM) (pp. 69–79).

81. Wulff, D. U., Mata (2023). Automated jingle–jangle detection: Using embeddings to tackle taxonomic incommensurability. https://doi.org/10.31234/osf.io/9h7aw

82. Yao, Zhao, Shafran, Narasimhan, Cao (2023). ReAct: Synergizing reasoning and acting in language models. In International conference on learning representations (ICLR).

83. Yoon, J. S., Shin, Mazurowski, M. A., Suk, H. I. (2024). Domain generalization for medical image analysis: A survey. Proceedings of the IEEE, 112(10), 1583–1609.

84. Zhu, Simmons (2024). Bootstrapping cognitive agents with a large language model. In Proceedings of the 2024 AAAI conference on artificial intelligence (Vol. 38, pp. 655–663).

Cognitive LLMs: Toward Human-Like Artificial Intelligence by Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-Making

Table 2. Evaluation for Single and Multi Facets Targets.

| Target Type | NLL | Accuracy |
| --- | --- | --- |
| Single Facet Target | 0.63 | 0.64 |
| Multi Facets Target | 1.18 | 0.42 |

Table 3. Comparison of VSM-ACTR With Baselines.

| Model | NLL | Accuracy |
| --- | --- | --- |
| Chance-level | 0.6931 | 0.4826 |
| LlaMa | 1.1330 | 0.3564 |
| LLM-ACTR (ours) | 0.6534 | 0.6576 |