Abstract
This preliminary study presents HPM-NL, a natural-language-based cognitive modeling tool developed to estimate task completion time (TCT) for upper-limb prosthesis tasks, specifically the clothespin relocation test (CRT). HPM-NL integrates logic from established human performance modeling frameworks (GOMS, CPM-GOMS, ACT-R, QN-MHP, and SOAR) and returns citation-anchored predictions based on user-input task descriptions. A Wilcoxon signed-rank test revealed no statistically significant difference between HPM-NL and Cogulator estimates for a single CRT cycle, suggesting comparable TCT outputs for this specific task. However, HPM-NL’s current scope is limited to single-task modeling in a structured experimental setting, with no assessment of its predictions across diverse tasks, real users, or broader cognitive measures such as workload. Further limitations include reliance on a proprietary large language model, potential citation errors, lack of empirical validation against human-subject performance data, and uncertainty about generalizability. Despite these constraints, HPM-NL provides an early-stage tool for exploring task modeling in prosthesis research.
Keywords
Introduction
The prevalence of limb loss and limb differences has risen significantly, with over 5.6 million individuals in North America living with amputations (Caruso & Harrington, 2024) and more than 185,000 new cases reported annually. Given the increasing reliance on advanced prosthetic technologies, accurately modeling user interactions is crucial for improving design and usability. Human performance modeling (HPM) plays a key role in predicting human performance, providing a computational alternative to costly human-subject studies (Wickens et al., 2021; Zahabi & Park, 2023). Five major HPM frameworks have been used and validated across a significant number of studies: Goals, Operators, Methods, and Selection Rules (GOMS) (Card, 2018); Cognitive, Perceptual, Motor-GOMS (CPM-GOMS) (John & Kieras, 1996); Adaptive Control of Thought-Rational (ACT-R) (Anderson & Lebiere, 2014); Queueing Network—Model Human Processor (QN-MHP) (Liu et al., 2006); and State, Operator, And Result (SOAR) (Laird, 2019; Park & Zahabi, 2024; Zahabi & Park, 2023).
Some of these models, while valuable and well-studied, are still difficult to learn and not easily applicable to real-world or clinical settings by end users. To address these limitations, we introduce a generative pre-trained transformer (GPT)-based HPM methodology and tool (HPM-Natural Language; HPM-NL) to predict task completion time (TCT) and workload for upper-limb prosthesis tasks. We pursued a scientifically grounded HPM approach that remains accessible to clinicians and other researchers while minimizing hallucinations. Because chatbots are well known to generate hallucinations, instead of pursuing generalization we provide the explicit modeling logic of HPM-NL and narrow the scope of validation to TCT on a single task (the clothespin relocation test; CRT), comparing HPM-NL-generated results against Cogulator-generated results (Estes, 2021).
Method
To develop HPM-NL, we implemented a multi-step process designed to systematically gather empirical evidence from the prosthesis domain and integrate it with established HPM principles (e.g., Fitts' law).
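To make the kind of principle referenced here concrete, the following is a minimal sketch of Fitts' law in its Shannon formulation; the a and b coefficients and the target dimensions are illustrative assumptions, not parameters used by HPM-NL.

```python
import math

# Fitts' law (Shannon formulation): MT = a + b * log2(D/W + 1).
# The a and b coefficients below are illustrative placeholders only.
def fitts_movement_time(distance_mm, width_mm, a=0.1, b=0.15):
    """Predicted movement time (s) to acquire a target of width W at distance D."""
    index_of_difficulty = math.log2(distance_mm / width_mm + 1)
    return a + b * index_of_difficulty

# Example: reaching ~200 mm to a clothespin-sized (~20 mm) target.
print(fitts_movement_time(200, 20))  # ~0.62 s with these illustrative coefficients
```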
Once these inputs are gathered, HPM-NL simulates the task using each of the five architectures according to their distinct computational rules. GOMS provides a baseline estimate of task time by applying Keystroke-Level Modeling, assuming expert behavior in routine, predictable tasks. CPM-GOMS extends this by identifying paths of overlapping cognitive and motor activity, allowing for parallel processing and more refined time predictions. ACT-R simulates behavior through its modular architecture, processing perceptual, manual, goal setting, and memory-related activities in cycles. It calculates task time by summing these cycles, and it evaluates cognitive workload by incorporating factors such as base mental effort, how frequently production rules fire, how interruptible the task is, and any skill-related penalties. Physical workload in ACT-R is modeled through the degree of motor involvement and biomechanical strain.
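As a rough illustration of the KLM-style summation that GOMS applies, the sketch below adds assumed operator durations for a simplified reach-grasp-move-release segment; the grasp (G) and release (R) durations and the overall operator set are illustrative assumptions rather than HPM-NL's internal parameters.

```python
# GOMS/KLM-style estimate: task time is the sum of primitive operator durations.
# M and P reflect commonly cited KLM values; G and R are assumed for illustration.
KLM_OPERATORS = {
    "M": 1.35,   # mental preparation (s)
    "P": 1.10,   # point/reach to a target (s)
    "G": 0.40,   # grasp/actuate (s, assumed value)
    "R": 0.50,   # release (s, assumed value)
}

def klm_task_time(operator_sequence):
    """Return the predicted task completion time (s) for a sequence of operators."""
    return sum(KLM_OPERATORS[op] for op in operator_sequence)

# Example: one simplified reach-grasp-move-release segment of a clothespin transfer.
print(klm_task_time(["M", "P", "G", "P", "R"]))  # -> 4.45 s (illustrative only)
```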
QN-MHP takes a different approach by modeling the user as a system of queuing processors. It measures time based on how long tasks wait in cognitive, perceptual, and motor queues, especially under multitasking conditions. This architecture captures dynamic aspects of human performance, and workload is inferred from the length of queues and competition for limited processing resources. The SOAR method models decision making as a series of cognitive cycles used to select and apply operators toward a goal. It tracks how often impasses arise, how complex operator choices are, and how demanding physical actions are to estimate both time and workload. Notably, only ACT-R, QN-MHP, and SOAR provide estimates for cognitive and physical workload. GOMS and CPM-GOMS focus strictly on the timing of expert performance and do not account for workload directly.
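The queuing idea behind QN-MHP can be illustrated with a minimal sketch in which sub-tasks pass through perceptual, cognitive, and motor servers and accrue waiting time whenever a server is busy; the service times and the single-server, first-in-first-out simplification are illustrative assumptions, not the published QN-MHP parameters.

```python
# Sub-tasks flow through perceptual, cognitive, and motor servers; time accrues
# both from service and from waiting when a server is busy. Service times are
# illustrative placeholders only.
def queue_network_time(jobs, service_times=(0.10, 0.07, 0.25)):
    """jobs: list of sub-task arrival times (s); returns each job's completion time (s)."""
    free_at = [0.0] * len(service_times)      # when each server next becomes idle
    completions = []
    for arrival in jobs:
        t = arrival
        for stage, service in enumerate(service_times):
            t = max(t, free_at[stage])        # wait if the server is busy
            t += service                      # then receive service
            free_at[stage] = t
        completions.append(t)
    return completions

# Two overlapping sub-tasks: the second waits behind the first at busy stages.
print(queue_network_time([0.0, 0.05]))
```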
For every task, HPM-NL simulates performance across a virtual population of 30 to 50 participants, using either Monte Carlo methods (Speagle, 2019) or Bayesian sampling (Qian et al., 2003; Rasmussen & Ghahramani, 2003) to account for variability. The results include step-by-step breakdowns, comparisons across models, visual graphs, and tabular summaries. Every output is backed by citations in APA 7th edition style, and the system offers a clear explanation of how each estimate was derived, ensuring complete transparency and traceability. By aligning its outputs with peer-reviewed literature and rigorous modeling logic, HPM-NL provides a scientifically grounded framework for understanding how humans perform tasks under a variety of conditions.
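A minimal sketch of the virtual-population idea follows: per-participant step times are drawn from assumed distributions and aggregated into a TCT distribution. The step means, standard deviations, and the normal-distribution assumption are illustrative, not HPM-NL's actual sampling scheme.

```python
import random

# Monte Carlo sketch: draw per-participant step times from assumed distributions
# and summarize the resulting spread of predicted TCTs.
def simulate_population(step_means, step_sds, n_participants=40, seed=0):
    rng = random.Random(seed)
    tcts = []
    for _ in range(n_participants):
        tct = sum(max(0.0, rng.gauss(m, s)) for m, s in zip(step_means, step_sds))
        tcts.append(tct)
    mean_tct = sum(tcts) / len(tcts)
    sd_tct = (sum((t - mean_tct) ** 2 for t in tcts) / (len(tcts) - 1)) ** 0.5
    return mean_tct, sd_tct

# Three illustrative task steps (e.g., reach, grasp, relocate) with their own variability.
print(simulate_population([1.2, 0.6, 1.8], [0.2, 0.1, 0.3]))
```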
Modeling Logic in HPM-NL (Preliminary Version).
A human-subject experiment was conducted in early 2025 to validate the performance of HPM-NL (IRB number: REB25-0361). Validation of HPM-NL was conducted through a between-subjects experimental design comparing two sets of performance outcomes: (1) HPM-NL's generated TCT predictions (Link) and (2) Cogulator-based TCT predictions (Estes, 2021) (Link) (used only for CPM-GOMS). Twenty-five graduate students (17 male, 8 female; age M = 25.71, SD = 2.45) enrolled in a human factors engineering and user-centered design course (BMEN 631) participated in the validation study, modeling one cycle of the clothespin relocation test (CRT) performed by users of electromyography (EMG)-based upper-limb prosthetic devices (Park et al., 2023; Park & Zahabi, 2020). All students received training on HPM, Cogulator, and HPM-NL throughout the course to ensure familiarity and consistency in their model-building approaches. We hypothesized that TCT from HPM-NL (CPM-GOMS) would exhibit no significant difference from Cogulator-generated TCT. We did not include a plain GPT model (4o, o1, or other variants) as an alternative, as it generated too many fabricated outcomes.
Result
The modeling logic implemented in HPM-NL for the five HPMs is summarized in Table 1. The five models and papers in Table 1 were chosen primarily based on a recent systematic literature review by Park and Zahabi (2024). HPM-NL is free and can be accessed through the "Explore GPTs" menu (Link).
A Wilcoxon signed-rank test (Wilcoxon, 1992) was conducted using R to evaluate differences between HPM-NL and Cogulator across four paired observations. Results indicated no statistically significant difference between the two conditions.
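For readers who wish to reproduce this type of comparison, the sketch below runs a Wilcoxon signed-rank test on four paired values; the original analysis was conducted in R, and the numbers here are placeholders rather than the study's data.

```python
from scipy.stats import wilcoxon

# Paired comparison of HPM-NL versus Cogulator TCT estimates.
# The four paired values below are hypothetical placeholders.
hpm_nl_tct    = [7.90, 8.40, 8.10, 7.60]   # hypothetical HPM-NL (CPM-GOMS) estimates, s
cogulator_tct = [8.05, 8.20, 8.35, 7.48]   # hypothetical Cogulator estimates, s

stat, p_value = wilcoxon(hpm_nl_tct, cogulator_tct)
print(f"W = {stat}, p = {p_value:.3f}")  # p > .05 would indicate no significant difference
```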
Discussion
First, the authors emphasize once again that the current version of HPM-NL is not generalizable but is intended only for preliminary use on a particular task (the CRT). The small validation study indicated that the HPM-NL workflow can replicate benchmark timing estimates while drastically reducing modeling effort; users gain Cogulator-grade accuracy in seconds and without specialized software training. This equivalence supports the use of HPM-NL as a rapid first-pass estimator during early design or iterative usability studies.
Beyond this initial empirical result, HPM-NL contributes a new “
In the near future, HPM-NL could offer several tangible benefits for the HPM community. By embedding the formal logic of five well-established frameworks behind a natural-language interface, it reduces the steep learning curve that often limits the practical uptake of model-based evaluation. Clinicians and engineers can describe a task in everyday language and immediately receive transparent, fully referenced estimates of TCT and workload. Because every numeric output is tied to a peer-reviewed source and generated under a strict hallucination-minimization policy, the tool retains scientific traceability while delivering the rapid turnaround that LLM pipelines make possible. Its free availability through the "Explore GPTs" catalogue further lowers barriers to entry and provides an easily deployable teaching and prototyping platform. Additionally, HPM-NL supports on-demand visualization of model components, giving users immediate feedback regarding potential bottlenecks or conflicts within the model. The broad reliance on established human factors literature undergirds these features with both theoretical consistency and empirical credibility. Lastly, speed and accessibility are further merits of HPM-NL. Previously, modelers reported spending substantially more time aligning model assumptions and searching for relevant parameters, whereas HPM-NL required only brief textual prompts and provided ready-to-use output in under a minute per task. Coupled with its free availability and the absence of proprietary software requirements, this efficiency renders HPM-NL a practical and cost-effective solution for clinicians, engineers, and researchers.
While HPM-NL is delivered through an LLM interface, it does not rely on distributional text regularities alone. Every numeric prediction is produced after the LLM retrieves (Step 1) empirically reported constants from 30 years of prosthesis and HPM literature and (Step 2) instantiates the explicit causal algorithms of five classical frameworks. Consequently, the path from prompt to output can be traced from the user's natural-language task description, through model-specific equations—e.g., ACT-R module cycle times (Anderson & Lebiere, 2014) or QN-MHP queue–service dynamics (Liu et al., 2006)—to the final TCT or workload estimate. Because the intermediate values and their sources are exposed in the output pane, clinicians and engineers can audit or override any assumption, providing a level of transparency comparable to hand-built CPM-GOMS spreadsheets yet orders-of-magnitude faster.
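A minimal sketch of such a citation-anchored, auditable record is shown below; the field names, parameter values, and overall schema are illustrative and do not reflect HPM-NL's actual output format.

```python
from dataclasses import dataclass

# Illustrative record tying each timing contribution to its framework,
# parameter, and literature source so that every value can be audited.
@dataclass
class EstimateTrace:
    framework: str    # e.g., "ACT-R"
    parameter: str    # the constant or equation used
    value_s: float    # contribution to the TCT estimate, in seconds
    source: str       # citation the value is anchored to

trace = [
    EstimateTrace("ACT-R", "production cycle time", 0.05, "Anderson & Lebiere (2014)"),
    EstimateTrace("QN-MHP", "motor-server service time", 0.07, "Liu et al. (2006)"),
]
for step in trace:
    print(f"{step.framework}: {step.parameter} = {step.value_s} s ({step.source})")
print(f"traceable contribution so far: {sum(s.value_s for s in trace):.2f} s")
```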
We nevertheless acknowledge that the generative layer can introduce bias when the input scenario strays beyond the CRT task. To guard against this, our roadmap adopts a hybrid validation strategy that alternates between analytic prediction and empirical calibration. Future work will (i) benchmark HPM-NL against observed TCT and NASA-TLX scores collected from able-bodied and prosthesis users across multi-step, multitasking protocols, (ii) quantify agreement using root-mean-square error and Bland-Altman analysis, and (iii) expose model provenance by listing the specific operators and parameters that most strongly influenced each prediction. By combining token-level traceability with established explanatory constructs—goal stacks in SOAR or processor-queue lengths in QN-MHP—we aim to deliver interpretations that are meaningful to both cognitive scientists and design practitioners while avoiding the “black-box” critiques often levelled at LLMs (Park et al., 2020; Park & Zahabi, 2024).
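As a preview of the planned agreement analyses, the sketch below computes root-mean-square error and Bland-Altman bias with 95% limits of agreement for paired predicted and observed TCTs; the values are placeholders, not collected data.

```python
# Agreement metrics between predicted and observed TCTs: RMSE and
# Bland-Altman bias with 95% limits of agreement.
def rmse(predicted, observed):
    return (sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)) ** 0.5

def bland_altman_limits(predicted, observed):
    diffs = [p - o for p, o in zip(predicted, observed)]
    bias = sum(diffs) / len(diffs)
    sd = (sum((d - bias) ** 2 for d in diffs) / (len(diffs) - 1)) ** 0.5
    return bias, bias - 1.96 * sd, bias + 1.96 * sd   # bias, lower limit, upper limit

predicted = [7.9, 8.4, 8.1, 7.6]   # hypothetical model predictions, s
observed  = [8.1, 8.0, 8.4, 7.7]   # hypothetical observed TCTs, s
print(rmse(predicted, observed), bland_altman_limits(predicted, observed))
```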
Practically, we advise treating HPM-NL as a rapid, first-pass hypothesis generator situated within a progressive-fidelity workflow (Wickens et al., 2021). Early in design, its citation-anchored outputs can highlight potential performance bottlenecks within seconds; subsequent high-fidelity simulations or instrumented user tests should then confirm or refine these estimates. In this layered role, the tool complements rather than replaces traditional HPM practice, balancing the accessibility benefits of LLM technology with the empirical rigour demanded in safety-critical healthcare applications.
In Addition to the Issues Above, Several Limitations Must Be Acknowledged
First, we should further validate that the computational logic of each framework (other than GOMS) in Table 1 is well established and functions as described in the original references. The original and recent modelers of each of the five HPMs should be involved and provide comments on these implementations. Second, we did not test the variability of HPM-NL's outputs across different query styles; a future study should formalize query styles to stabilize the results. In addition, the study's validation was limited to a single, highly structured task (the clothespin relocation test) and focused solely on TCT. Other important performance dimensions such as cognitive workload, multitasking, or fatigue effects were not assessed, and HPM-NL's utility in modeling complex, real-world tasks remains to be demonstrated. The participant pool, which did not include actual prosthesis users, constrains the generalizability of the findings, particularly for assistive technology design contexts where user variability is critical.
Another limitation arises from the technical reliance on a proprietary LLM, specifically a fixed version of ChatGPT (o1 pro). This dependency introduces a risk of version drift, where future changes in the model's architecture or behavior could affect HPM-NL's outputs and compromise reproducibility. Additionally, although HPM-NL enforces strict citation traceability, its automated literature-mining and parameter extraction processes may still propagate errors or inherit biases from the source materials, potentially affecting the reliability of its estimates.
The study also lacks a baseline comparison with general-purpose GPT prompting. Without this comparison, it is difficult to isolate the added value of HPM-NL’s curated modeling framework over ad-hoc use of LLMs for performance prediction. Furthermore, since the study participants had prior exposure to HPM-NL through classroom training, expectation effects may have influenced how they used the tool or interpreted its outputs, possibly biasing convergence with Cogulator.
Lastly, while hallucination mitigation strategies were applied, the risk of subtle inaccuracies remains inherent to any LLM-based system. Continued validation against empirical human-subject data, alongside community peer review, will be necessary to ensure the robustness and long-term credibility of HPM-NL in applied human performance modeling.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Funding for this research was provided by 2025 Schulich Momentum Fund (University of Calgary). The views and opinions expressed are those of the authors.
