Abstract

Intelligent tutoring systems (ITSs) are designed to imitate human tutors by closing the loop between learners and tutoring agents. It is well established that the cognitive factors of self-confidence and workload impact learners’ self-awareness of achievements and self-efficacy, which in turn enhance learning outcomes. However, little work has been done to operationalize these concepts in ITSs for psychomotor learning. In this work, the authors consider learners’ skill progression while repeatedly landing a quadrotor in a simulator. The landing simulator is equipped with automation assistance that can be turned on or off; when on, the automation assistance augments the learner’s input to mimic an expert’s landing trajectory. The authors design an algorithm to calibrate learners’ self-confidence to their performance and compare learners who receive assistance according to this algorithm against learners who do not receive any assistance. Statistical analyses revealed that participants who received assistance according to the calibration algorithm demonstrated more self-efficacy and less fatigue than those who did not.

Keywords

automation, cognition, human factors, learning, training and education, simulation and gaming

Introduction

Modern-day intelligent tutoring systems (ITSs) are designed to imitate human tutors and provide individualized feedback to learners (Woolf, 2008). This is usually accomplished by closing the loop between human models and tutoring agents. For example, in conventional learning contexts such as mathematics or computer programming, ITS agents are designed to respond to learners based on their self-confidence. The cognitive factors of self-confidence and workload impact the learner’s self-awareness of their achievements, or self-efficacy (Bandura et al., 1999). It is well documented that increasing self-efficacy in students is associated with higher performance and motivation to continue learning (Bandura et al., 1999; Jungert & Rosander, 2010; Shuggi et al., 2019). However, developing ITSs may not be as straightforward for psychomotor learning because assessing an answer’s “correctness” in conventional learning contexts is not analogous to assessing how “correct” performance is in a psychomotor task (Neagu et al., 2021). Existing psychomotor learning ITSs face challenges including creating a knowledge space of the task, personalizing the agent to the learner’s characteristics, and maintaining learner motivation (Neagu et al., 2022).

In Yuh, Ortiz, Sommer-Kohrt, et al. (2024), we began tackling some of these challenges using a quadrotor landing simulator module (shown in Figure 1) that can be used to train a human to manually land a quadrotor. To improve the knowledge space of the task, the authors identified four learning stages associated with landing the quadrotor based on psychomotor learning theory (Dreyfus, 2004), the quadrotor state information, and eye gaze trajectories. From this, a rule-based classifier was developed using quadrotor and participant eye gaze trajectories. The learning stage classifier achieved an accuracy of 80.69%. Importantly, this classifier enables online assessment of users’ learning progression over multiple trials of quadrotor landings.

Figure 1.

Quadrotor simulator module and controller.

Intelligence was introduced to the simulator module by creating an assistive mode in which the human’s input is augmented by automation designed to follow a human expert’s landing trajectory (Yuh et al., 2022; Yuh, Rabb, Thorpe, et al., 2024). However, this raises the question of when the learner should practice landing the quadrotor manually versus with automation assistance. Motivated by the role of human self-confidence in learning (Akbari & Sahibzada, 2020) as well as the impacts of self-efficacy on the learner’s motivation to learn (Dreyfus, 2004; McQuiggan et al., 2008), we designed an algorithm that makes this mode selection by calibrating the learner’s self-confidence to their performance (Yuh, Rabb, Thorpe, et al., 2024). We benchmarked this algorithm against a strategy with mode selections based solely on the learner’s performance and showed statistically significant improvements in learning outcomes.

Despite these promising findings, the proposed psychomotor ITS designed to calibrate learners’ self-confidence via automation assistance has not been compared to the true baseline of human learning without any assistance. Therefore, our objective is to evaluate whether using automation that assists learners based on an algorithm designed to calibrate self-confidence to performance leads to improved learning outcomes in comparison to learners receiving no assistance. In this paper, we first define four learning stages in the context of the quadrotor landing simulator. Next, we describe the apparatus, procedure, and data collection in our experiment design. We follow this with a presentation and discussion of the statistical findings. Finally, we conclude with a summary of our contributions and the implications our findings have for future ITS design.

Learning Stages

In Yuh, Ortiz, Sommer-Kohrt, et al. (2024), the five-stage psychomotor learning theory model by Dreyfus (2004) is applied to the quadrotor landing simulator module using quadrotor state information and eye gaze trajectories. For the scope of this paper, eye gaze is not used because eye-tracking data was not collected. Additionally, eye-tracking may not be easily accessible or interpretable for all psychomotor tasks. The authors assume that most participants are unable to reach the expert (fifth) learning stage within 25 trials. Features of the quadrotor dynamics and landing types characterizing each learning stage are summarized in Table 1.

Table 1.

Learning Stage Descriptions in Terms of Quadrotor Dynamics and Feasible Landing Types adapted from Yuh, Ortiz, Sommer-Kohrt, et al. (2024).

Learning stage	Description	Quadrotor dynamics	Feasible landing type
LS1: Novice	Decomposes task; understands goal	Learner struggles to stabilize quadrotor.	Unsuccessful, Unsafe
LS2: Advanced Beginner	Develops strategies; identifies sub-tasks; recognizes potential points of failure	Learner can stabilize quadrotor and land, but quadrotor trajectory is not efficient or consistent.	Unsuccessful, Unsafe, Safe
LS3: Competent	Refines strategy; prioritizes sub-tasks; improves consistency; exhibits high attention load	Learner can control quadrotor. Quadrotor trajectory is less variable.	Unsafe, Safe
LS4: Proficient	Fixed strategy; improves with practice; exhibits decreased attention load	Learner can maintain control and land quadrotor consistently. Quadrotor trajectory is efficient.	Safe

In learning stage 1 (LS1), the quadrotor trajectory is unstable and results in unsuccessful or unsafe landings. In learning stage 2 (LS2), the learner demonstrates more control over the quadrotor. Although the quadrotor position trajectory is not the most efficient, the learner can keep the quadrotor stable for most of the trial before crashing. In learning stage 3 (LS3), the learner maintains control over the quadrotor, resulting in a trajectory that is increasingly similar to an expert trajectory. Finally, in learning stage 4 (LS4), the learner can land the quadrotor consistently and efficiently. Example trajectories for each learning stage are provided in Figure 2.

Figure 2.

Examples of quadrotor trajectories categorized in learning stages 1 to 4. The landing pad is represented by a gray box.

Experimental Design

Apparatus

The quadrotor landing simulator was developed in Python 3.6.8 using Pygame 2.0.1. The simulator was originally developed by Byeon et al. (2021) and was adapted for this study. A Thrustmaster T. Hotas 4 joystick and throttle was used to control the quadrotor. The experimental platform is depicted in Figure 1.

Procedure

Prior to the 25 trials, participants are provided with instructions and a description of the experimental setup. Then, participants complete two 60-s tutorials to familiarize themselves with the simulator environment. The participants practice using the throttle by moving the quadrotor up and down and flying the quadrotor using both the throttle and joystick controls in the first and second tutorial, respectively.

Participants then complete 25 trials of landing the quadrotor, as depicted in Figure 3. After each trial, participants are shown their numerical performance score out of 1,000, the time taken to complete the trial, and the landing type (unsuccessful, unsafe, or safe). The performance score is designed specifically for the context of the quadrotor landing using gamification (Scheider et al., 2015). The participants must learn to land the quadrotor on the landing pad while also satisfying landing speed and roll angle constraints (landing with a speed <5 m/s and a final roll angle between −10° and 10°). Landing types are categorized as unsuccessful if they do not meet any of these criteria, unsafe if they meet only the landing pad criterion, and safe if the landing speed and roll angle constraints are also met. The reader is referred to Yuh, Ortiz, Sommer-Kohrt, et al. (2024) for additional details about the landing types and the design of the numerical scoring function.
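The landing-type rules described above can be sketched in Python, the language of the simulator. This is a minimal illustration under stated assumptions, not the study's implementation: the function name `classify_landing`, its arguments, and the choice to treat the roll bounds as inclusive are hypothetical; only the 5 m/s and ±10° limits come from the text.

```python
MAX_LANDING_SPEED = 5.0  # m/s, from the text: landing speed must be below this
MAX_ROLL_DEG = 10.0      # degrees, from the text: final roll within +/- this


def classify_landing(on_pad: bool, speed: float, roll_deg: float) -> str:
    """Return 'unsuccessful', 'unsafe', or 'safe' for one landing.

    Hypothetical helper: the argument names and the inclusive roll bound
    are assumptions made for illustration.
    """
    if not on_pad:
        # Missing the landing pad is unsuccessful regardless of speed or roll.
        return "unsuccessful"
    meets_constraints = speed < MAX_LANDING_SPEED and abs(roll_deg) <= MAX_ROLL_DEG
    # On the pad: safe only if both landing constraints are also met.
    return "safe" if meets_constraints else "unsafe"
```

For instance, a landing on the pad at 6 m/s would be labeled unsafe under this sketch, because it satisfies only the landing pad criterion.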

Figure 3.

Flowchart of experiment. For the algorithm group, mode selection is based on the algorithm. For the manual group, all trials are completed in manual mode.

Participants self-report their self-confidence and mental demand on a scale of 0 to 100. Self-confidence is defined as “The confidence in oneself and one’s powers and abilities” to land the quadrotor. When prompting for mental demand self-reports, the following description of mental demand from the NASA Task Load Index (TLX) survey by Hart and Staveland (1988) is used: “How much mental and perceptual activity was required (e.g., thinking, deciding, calculating, remembering, looking, searching, etc.)? Was the task easy or demanding, simple or complex, exacting or forgiving?” The full NASA TLX survey was not used, in order to reduce the time participants spent on the survey portion of the experiment and thereby avoid unqualified answers (Fowler & Cosenza, 2009). Mental demand is used to infer mental workload.

Participants are randomly assigned to two groups. The first group, known as the algorithm group, may receive automation assistance (which augments the participant’s input to the quadrotor controller) in trials 3 to 20 according to the mode selection algorithm designed to calibrate learners’ self-confidence to their performance shown in Figure 4. The participant is aware of when the automation assistance is on or off. Trials 21 to 25 are completed manually such that initial and final assessments of learning stage, performance, self-confidence, and mental demand in manual mode can be made for each participant. The second group, known as the manual group, completes all 25 trials manually so that the mode selection algorithm can be compared to the case in which learners are never provided with automation assistance. In addition to recording performance metrics and self-reported data, all trials are classified into learning stages 1 to 4 using the quadrotor position, velocity, and roll attitude states as well as the thrust force and roll attitude acceleration inputs.

Figure 4.

Mode selection for the algorithm group. Score and self-confidence (SC) thresholds are determined using 33% and 66% quantiles from prior launches of the experiment.
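The thresholding step named in the Figure 4 caption can be illustrated with a short Python sketch. It is an assumption-laden example, not the authors' algorithm: the helpers `quantile`, `thresholds`, `level`, and `calibration_state` are hypothetical, the linear-interpolation quantile is one of several common definitions, and the mapping from these labels to a manual or assisted mode is defined in Figure 4 and is not reproduced here.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty sequence (0 <= q <= 1)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must be non-empty")
    pos = q * (len(xs) - 1)
    lo = int(pos)                      # index just below the target position
    hi = min(lo + 1, len(xs) - 1)      # index just above, clamped to the end
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def thresholds(prior_values):
    """33% and 66% quantiles of data from earlier runs of the experiment."""
    return quantile(prior_values, 0.33), quantile(prior_values, 0.66)


def level(value, low_thr, high_thr):
    """Bucket a score or self-confidence report as low, medium, or high."""
    if value < low_thr:
        return "low"
    if value < high_thr:
        return "medium"
    return "high"


_ORDER = {"low": 0, "medium": 1, "high": 2}


def calibration_state(score_level, sc_level):
    """Compare self-confidence (SC) level with performance level.

    Assumed labeling for illustration only; the paper's decision rule
    is the one shown in Figure 4.
    """
    if _ORDER[sc_level] > _ORDER[score_level]:
        return "overconfident"
    if _ORDER[sc_level] < _ORDER[score_level]:
        return "underconfident"
    return "calibrated"
```

A learner whose score falls in the low band while reporting high self-confidence would be flagged as overconfident under this sketch.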

Data Collection

Quadrotor states and inputs are sampled at 30 Hz. After every trial, the performance score and landing type are assessed, and the participant’s self-confidence and mental workload are self-reported.

Participants

In total, 62 participants completed the study. Our analysis focuses on learners transitioning from novice (LS1) to proficient (LS4). Therefore, eight participants were removed because they were too advanced in their skill: they either achieved LS3 and a safe landing manually within the first two trials in the quadrotor game or achieved only safe landings. Among the remaining participants, 30 were removed because they never reached LS3 or LS4; these participants never transitioned past the advanced beginner learning stage. This resulted in 24 participants split evenly between the algorithm (6 male, 6 female, mean age = 24.9 years) and manual (5 male, 7 female, mean age = 21.7 years) groups, with participant ages ranging between 18 and 47 years. Each participant was compensated $20/hr. The Institutional Review Board at Purdue University approved the study.
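The two exclusion criteria can be expressed as a small filter. The sketch below is illustrative only: `exclusion_reason` and its inputs (per-trial learning stages as integers 1 to 4 and per-trial landing types as strings) are hypothetical, and it assumes that the LS3 classification and the safe landing must occur in the same early trial, which the text does not state explicitly.

```python
def exclusion_reason(stages, landings):
    """Return why a participant would be excluded, or None if retained.

    stages:   learning stage per trial (1-4), in trial order
    landings: landing type per trial ('unsuccessful', 'unsafe', 'safe')
    Both inputs and the same-trial reading of the first criterion are
    assumptions made for illustration.
    """
    first_two = zip(stages[:2], landings[:2])
    early_expert = any(s >= 3 and l == "safe" for s, l in first_two)
    only_safe = all(l == "safe" for l in landings)
    if early_expert or only_safe:
        return "too advanced"
    if max(stages) < 3:
        # Never progressed past the advanced beginner stage (LS2).
        return "never reached LS3"
    return None


# Participant counts reported in the text: 62 recruited, 8 and 30 removed.
assert 62 - 8 - 30 == 24
```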

Results and Discussion

To analyze how learning outcomes differ across the two groups of participants, we utilize independent t-tests to compare self-reported data, performance metrics, and achieved learning stages. This is followed by a regression analysis on self-reported self-confidence in both groups.

Independent t-Tests

In Table 2, differences in means of scores, self-confidence, number of LS1-LS4 classifications, and mean number of landing types in trials 21 to 25 are not significant. However, the mean mental workload in trials 21 to 25 is significantly lower for the algorithm group. In other words, participants receiving assistance based on the algorithm achieved similar performance and self-confidence to that of the manual group with less mental workload. The mean of total LS2 classifications and mean of total LS3 and LS4 classifications are significantly lower and higher, respectively, for the algorithm group, likely due to receiving assistance that augments the user’s input, resulting in quadrotor trajectories more like that of an expert.

Table 2.

Independent t-Test Results Between the Algorithm and Manual Groups Including t-Value, Degrees of Freedom (DOF), p-Value, Significance (Sign.), Means, and Standard Deviations.

					Algorithm group		Manual group
Dependent variable	t-value	DOF	p-value	Sign.	Mean	Standard deviation	Mean	Standard deviation
Self-confidence in trials 21 to 25	0.134	118	.894		80.8	20.7	80.2	26.6
Mental Workload in trials 21 to 25	−2.05	118	.0424	*	37.0	19.1	44.8	22.7
Performance scores in trials 21 to 25	0.0802	118	.936		867	183	864	174
Unsuccessful landings in trials 21 to 25	0.196	22	.847		0.833	1.11	0.750	0.965
Unsafe landings in trials 21 to 25	1.25	22	.223		0.667	0.779	0.333	0.492
Safe landings in trials 21 to 25	−0.758	22	.457		3.50	1.51	3.92	1.16
LS1 classifications in trials 21 to 25	−0.158	22	.876		0.917	1.38	1.00	1.21
LS2 classifications in trials 21 to 25	−1.23	22	.233		1.42	1.93	2.33	1.72
LS3 and LS4 classifications in trials 21 to 25	1.31	22	.205		2.67	1.78	1.67	1.97
Total LS1 classifications	−0.668	22	.511		6.75	5.46	8.08	4.23
Total LS2 classifications	−2.09	22	.0489	*	7.58	5.84	12.3	5.10
Total LS3 and LS4 classifications	2.87	22	.00852	**	10.7	6.02	4.67	3.94

Note: Significant variables are in bold.

*p < .05, **p < .01, ***p < .001.
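As a sanity check on Table 2, the t-values can be approximately recovered from the reported means and standard deviations. The sketch below assumes the standard pooled-variance (Student) independent t-test, which the source does not name explicitly, and the helper `pooled_t` is hypothetical. The reported degrees of freedom are consistent with 60 observations per group for per-trial measures (12 participants × 5 trials, giving 120 − 2 = 118) and 12 per group for per-participant counts (24 − 2 = 22). Because the table values are rounded, the recomputed statistics match only to about two significant figures.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's independent t statistic and degrees of freedom from summary stats."""
    dof = n1 + n2 - 2
    # Pooled variance weights each group's variance by its own degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / dof
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_err, dof


# Mental workload in trials 21 to 25 (algorithm vs. manual), values from Table 2.
t_workload, dof_workload = pooled_t(37.0, 19.1, 60, 44.8, 22.7, 60)

# Total LS3 and LS4 classifications (algorithm vs. manual), values from Table 2.
t_ls34, dof_ls34 = pooled_t(10.7, 6.02, 12, 4.67, 3.94, 12)
```

Both recomputed statistics land close to the tabulated −2.05 and 2.87, with the small gaps attributable to rounding of the reported means and standard deviations.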

Multi-Variate Linear Regression Analysis

Multi-variate linear regression analyses were completed using self-confidence as the dependent variable and the available feedback information (trial number, landing type, score, time taken per trial, and mental workload) as independent variables. A Bonferroni test was used to identify outliers in the data (Weisberg, 2005). Tables 3 and 4 show that performance feedback information has a significant impact on self-confidence for both groups. However, the ordinary R-squared values for the algorithm and manual groups were 0.624 and 0.278, respectively. This is important because it shows that the feedback information, represented as independent variables, explains more of the variation in self-confidence for the algorithm group than for the manual group. In other words, the algorithm group is more likely to consider the given feedback when self-assessing performance. In fact, with the addition of automation assistance as a regressor, the ordinary R-squared value for the algorithm group increases to 0.726 in Table 5. Using an F-test to compare the nested model in Table 3 to the full model in Table 5, we confirm that the addition of automation assistance as an independent variable improves the regression model significantly (F(1,293) = 103, p < 2.20 × 10⁻¹⁶). This means that the algorithm group’s self-confidence is impacted significantly by the intelligent automation assistance in addition to the given performance feedback. Self-confidence is known to represent the learner’s self-awareness of their achievements (Bandura et al., 1999), so participants in the algorithm group exhibit better self-awareness of performance and, in turn, better self-efficacy behavior in the quadrotor landing simulator module.

Table 3.

Coefficients, p-Values, and Significance (sign.) for Self-Confidence Regression Models for the Algorithm Group.

Independent variable	Coefficients	p-Value	Sign.
Intercept	34.1	1.29 × 10⁻³	**
Trial	1.25	3.25 × 10⁻¹⁵	***
Unsafe landing	−15.3	4.45 × 10⁻⁴	***
Unsuccessful landings	−8.53	.0949
Performance scores	0.0355	4.69 × 10⁻⁴	***
Time per trial	0.302	3.86 × 10⁻⁶	***
Mental workload	−0.383	3.87 × 10⁻⁹	***
Multiple R²	.624
Adjusted R²	.616

*p < .05, **p < .01, ***p < .001.

Table 4.

Coefficients, p-Values, and Significance (Sign.) for Self-Confidence Regression Models for the Manual Group.

Independent variable	Coefficients	p-Value	Sign.
Intercept	48.1	2.26 × 10⁻³	**
Trial	1.31	5.22 × 10⁻⁷	***
Unsafe landing	−16.7	1.99 × 10⁻³	**
Unsuccessful landings	−3.38	.676
Performance scores	0.0202	.185
Time per trial	−0.110	.301
Mental workload	−0.205	8.51 × 10⁻³	**
Multiple R²	.265
Adjusted R²	.250

*p < .05, **p < .01, ***p < .001.

Table 5.

Coefficients, p-Values, and Significance (Sign.) for Self-Confidence Regression Model with Assistance Independent Variable for the Algorithm Group.

Independent variable	Coefficients	p-Value	Sign.
Intercept	28.3	1.29 × 10⁻³	**
Trial	0.661	3.25 × 10⁻¹⁵	***
Unsafe landing	−12.2	4.45 × 10⁻⁴	***
Unsuccessful landings	−1.70	.0949
Performance scores	0.0560	4.69 × 10⁻⁴	***
Time per trial	0.168	3.86 × 10⁻⁶	**
Mental workload	−0.0270	3.87 × 10⁻⁹	***
Assistance on	−21.5	<2.00 × 10⁻¹⁶	***
Multiple R²	.722
Adjusted R²	.715

*p < .05, **p < .01, ***p < .001.
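The nested-model F-test reported in the text can be reproduced from R-squared values alone. The sketch below uses the textbook formula F = ((R²_full − R²_reduced) / q) / ((1 − R²_full) / df_resid), where q is the number of added regressors. The helper `nested_f` is hypothetical; the inputs are the multiple R² values listed in Tables 3 and 5 and the residual degrees of freedom from the reported F(1,293).

```python
def nested_f(r2_reduced, r2_full, num_added, df_resid_full):
    """F statistic for comparing a reduced model nested within a full model."""
    if not 0.0 <= r2_reduced <= r2_full < 1.0:
        raise ValueError("expected 0 <= r2_reduced <= r2_full < 1")
    explained_gain = (r2_full - r2_reduced) / num_added   # extra variance explained per added term
    unexplained = (1.0 - r2_full) / df_resid_full         # residual variance per degree of freedom
    return explained_gain / unexplained


# Table 3 (without assistance) vs. Table 5 (with the assistance regressor).
f_stat = nested_f(r2_reduced=0.624, r2_full=0.722, num_added=1, df_resid_full=293)
```

With these tabulated inputs the statistic comes out near 103, in line with the reported F(1,293) = 103.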

Limitations

We recognize that the main limitation of this work is the sample size of each group. The majority of the recruited participants who were removed from the dataset for not reaching LS3 could not achieve a safe landing within 25 trials. Given the difficulty of the quadrotor landing simulator, it is likely that these participants needed more trials to become proficient at the task. Future experiments should therefore be extended beyond 25 trials, which would help increase the sample size within both the algorithm and manual groups and improve the statistical power (Cohen, 1977).

Conclusion

In this work, our objective is to evaluate whether using automation that assists learners based on an algorithm designed to calibrate self-confidence to performance leads to improved learning outcomes in comparison to learners receiving no assistance. We evaluate the learning outcomes of the two groups using performance metrics, learning stage progression, and self-reports of self-confidence and mental workload.

Through statistical analyses, we found that participants who received assistance based on their self-reported self-confidence and performance in the quadrotor simulator demonstrated more self-efficacy and less fatigue than those who did not have access to assistance. This aligns with literature on human learning, in which it is recognized that cognitive factors such as self-confidence and workload play an integral role in a student’s self-efficacy and learning strategies, in turn, impacting their learning performance (Hayat et al., 2020). Furthermore, according to seminal self-efficacy literature, if human learners are more self-aware and realistic about their achievements, they are likely to experience future success (Bandura et al., 1999). Our algorithm for determining when to provide automation assistance in ITSs is a response to the challenge of personalizing assistance to learner characteristics and maintaining learner motivation (Neagu et al., 2020) and can be easily implemented in other psychomotor learning contexts where desirable performance and tutoring objectives are difficult to characterize quantitatively.

Footnotes

Acknowledgements

We thank Sooyung Byeon (Purdue University) for the initial development of the game platform and Jacob Hunter (Purdue University) for his help with data collection.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This material is based upon work supported by the National Science Foundation under Award No. 1836952. Any opinions, findings, and conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

ORCID iD

Madeleine S. Yuh

References

Akbari & Sahibzada (2020). Students’ self-confidence and its impacts on their learning process. International Journal of Social Science Research, 5(1), 1–15.

Bandura, Freeman, W. H., & Lightsey (1999). Self-efficacy: The exercise of control. Journal of Cognitive Psychotherapy, 13, 158–166.

Byeon, Sun, & Hwang (2021). Skill-level-based hybrid shared control for human-automation systems. 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Melbourne, Australia (pp. 1507–1512).

Cohen (1977). Chapter 1 – The concepts of power analysis. In Cohen (Ed.), Statistical power analysis for the behavioral sciences (pp. 1–17). Academic Press.

Dreyfus, S. E. (2004). The five-stage model of adult skill acquisition. Bulletin of Science, Technology & Society, 24, 177–181.

Fowler & Cosenza, F. C. (2009). The Sage handbook of applied social research methods. Sage Publications, Inc.

Hart, S. G., & Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In Hancock, P. A., & Meshkati (Eds.), Advances in psychology (pp. 139–183). North-Holland.

Hayat, A. A., Shateri, Amini, & Shokrpour (2020). Relationships between academic self-efficacy, learning-related emotions, and metacognitive learning strategies with academic performance in medical students: A structural equation model. BMC Medical Education, 20, 76.

Jungert & Rosander (2010). Self-efficacy and strategies to influence the study environment. Teaching in Higher Education, 15, 647–659.

McQuiggan, S. W., Mott, B. W., & Lester, J. C. (2008). Modeling self-efficacy in intelligent tutoring systems: An inductive approach. User Modeling and User-Adapted Interaction, 18, 81–123.

Neagu, L.-M., Rigaud, Guarnieri, Dascalu, & Travadel (2022). Selfit v2 – Challenges encountered in building a psychomotor intelligent tutoring system. In Crossley & Popescu (Eds.), Intelligent tutoring systems. ITS 2022. Lecture notes in computer science (Vol. 13284). Springer.

Neagu, L.-M., Rigaud, Guarnieri, Travadel, & Dascalu (2021). Selfit – An intelligent tutoring system for psychomotor development. In Cristea, A. I., & Troussas (Eds.), Intelligent tutoring systems (pp. 291–295). Springer International Publishing.

Neagu, L.-M., Rigaud, Travadel, Dascalu, & Rughinis, R.-V. (2020). Intelligent tutoring systems for psychomotor training – A systematic literature review. In Kumar & Troussas (Eds.), Intelligent tutoring systems (pp. 335–341). Springer International Publishing.

Scheider, Martin, Kiefer, Sailer, & Weiser (2015, April). Score design for meaningful gamification. Presented at the CHI’15 - Gamifying Research: Strategies, Opportunities, Challenges and Ethics, Seoul, Korea.

Shuggi, I. M., Ayoub, M. J., Moreno, Shaw, E. P., Shewokis, P. A., & Gentili, R. J. (2019). Motor performance, mental workload and self-efficacy dynamics during learning of reaching movements throughout multiple practice sessions. Neuroscience, 423, 232–248.

Weisberg (2005). Applied linear regression. John Wiley & Sons.

Woolf, B. P. (2008). Building intelligent interactive tutors: Student-centered strategies for revolutionizing e-Learning. Morgan Kaufmann Publishers Inc.

Yuh, M. S., Byeon, Hwang, & Jain (2022). A heuristic strategy for cognitive state-based feedback control to accelerate human learning. IFAC-PapersOnLine, 55, 107–112.

Yuh, M. S., Rabb, Thorpe, & Jain (2024). Using reward shaping to train cognitive-based control policies for intelligent tutoring systems. Presented at the 2024 American Control Conference (ACC).

Yuh, M. S., Ortiz, K. R., Sommer-Kohrt, K. S., Oishi, & Jain (2024). Classification of human learning stages via kernel distribution embeddings. IEEE Open Journal of Control Systems, 3, 102–117.

Online Self-Confidence Calibration for Improving Learning Outcomes Via Intelligent Tutoring Systems
