Abstract
Effective usage of AI-powered cybersecurity tools may be reduced by users’ tendencies to minimize effort, especially when they are fatigued. The current study identified multiple factors impacting user interaction with DeepPasswd, a tool utilizing deep learning to enhance password strength. Data were obtained from a demanding but monotonous simulated work task requiring regular password updating. Younger users were less likely than older ones to use the tool. Individual differences in anxiety, trust and fatigue were associated with frequency of password tweaks while using the tool, taken as an index of engagement. Data on work task performance suggested users vary in their willingness to apply effort. Mitigation of tool neglect and security fatigue requires greater understanding of how users trade off effort against enhanced security.
Introduction
System users notoriously create weak passwords, often choosing convenience over password strength (Shen et al., 2016). Current countermeasures such as providing feedback on password strength or enforcing requirements are of limited efficacy (St Clair et al., 2006). Advancements in Artificial Intelligence (AI) can contribute to cybersecurity but utilization of AI tools may be impaired by human factors issues such as security fatigue and lack of trust. The current study investigated how task and individual-difference factors may impact usage of a novel AI-powered password-setting tool.
We investigated DeepPasswd (Pasquini et al., 2020), which is the first interpretable probabilistic password strength meter. It relies on deep learning to estimate the strength of every single character of the password. It assigns a security score to each atomic component of the password and allows the user to improve password strength by tweaking a few “weak” characters. Users can salvage most of their password selection (improving memorability) while the meter ensures the final password is sufficiently strong.
Usage of password and other security tools is limited by effort-minimization, a fundamental bias in human decision-making that primes users to find an easy “satisficing” solution that is “good enough” for the problem at hand (Kool et al., 2010). Effort-minimization is accentuated by “security fatigue” (Nobles, 2022). When users feel burdened by security messages and demands for compliance, they reach a cognitive saturation point at which they become desensitized to security recommendations. Security fatigue may be heightened by high workload and poor device usability. Thus, the challenge for AI-powered systems such as DeepPasswd is to motivate the user to accept the additional mental workload of using the web interface for password generation in return for the security benefits. The user’s management of the effort-security tradeoff is likely to depend on multiple factors, as follows.
Demographic Factors
The stereotype that older users are less confident in technology use and less competent in maintaining security is simplistic. Branley-Bell et al. (2022) found that older users were relatively neglectful of the physical security of devices. However, older age correlated with several behaviors promoting security, including secure password generation. Personal experience also plays a role. Cheng et al. (2020) found that cybercrime victimization was associated with mistrust of people online, which might motivate use of cybersecurity tools.
Trust in AI
Cyber defense cannot be fully automated, so trust optimization is critical. Under-trust is the primary vulnerability in AI-supported password selection, given users’ preferences for simple, low-effort password generation, but over-trust could lead to neglect of other security threats. Trust depends on both design and user attributes, including anxiety about computers and AI and beliefs that automated systems should be perfect, among others (Matthews et al., 2024).
Workload, Stress, and Fatigue
Stress and fatigue impair the allocation of attention and regulation of effort in multitasking environments (Matthews et al., 2019). Support from AI and automation can be double-edged in adverse environments. Benefits such as workload mitigation and enhanced decision-making can be offset by concerns about ceding control to the machine and the cognitive load of evaluating machine performance. Security fatigue may impair password management especially when the user is overloaded as well as fatigued.
The current research aimed to investigate multiple factors that might influence use of the DeepPasswd tool (Pasquini et al., 2020) for generating a strong but memorable password, within a simulated work environment. We utilized an online, text-based task that required participants to make rule-based decisions on mortgage applications. Certain “sensitive” applications required password updating. At each update, participants could choose to access DeepPasswd or set their own password.
We manipulated two factors that we expected to impact security fatigue and motivations to engage with DeepPasswd. First, we manipulated the cognitive demands of the primary, decision-making task by varying the rate at which emails arrived for processing by the participant. We hypothesized that, at a higher email rate, security fatigue would be more likely to develop, and participants would use DeepPasswd less frequently and interact with the tool less on each occasion of use. These effects were expected to be stronger in the second half of the 1-hr task interval.
Second, we attempted to manipulate user motivation to engage with the tool by comparing neutral and positive AI credibility inductions. Increasing the credibility of AI enhances its influence on the human (Allan et al., 2021). The neutral induction described the principles of DeepPasswd in abstract terms, whereas the positive induction also emphasized the unique capabilities of AI-driven tools for protecting users against threats from hackers. We anticipated that the positive induction would increase engagement with DeepPasswd, especially under conditions of higher cognitive demand.
We also assessed various personal qualities, including age and experience of being a victim of computer-based crime. We expected older participants and crime victims would be more likely to use the tool (Branley-Bell et al., 2022). We measured dispositional attitudes toward computers and AI associated with anxiety and trust. We expected that both high internet anxiety (Joiner, 2007) and high expectations of automated systems (Merritt et al., 2015) would prompt greater usage of DeepPasswd. Following task performance, we secured subjective measures of trust, stress, fatigue, and workload, to further assess the participant’s mental state during decision-making.
Method
Participants
Three hundred seventy-six adults located in the United States were recruited using Amazon Mechanical Turk (AMT). Participants met screening criteria based on their AMT Human Intelligence Task (HIT) performance history. Baseline criteria were: (a) an approval rating ≥95% for previously completed HITs; and (b) an approval of ≥1,000 HITs prior to participating in this study. One hundred twenty-four responses were removed from the final sample due to inactivity or indiscriminate response patterns during task completion. The final sample comprised 252 participants (174 males, 77 females, 1 unreported) aged 22 to 67 (
Materials and Measures
Mortgage Broker Decision-Making Task
The participant acted as a mortgage broker making binary approval/denial decisions on incoming mortgage loan applications. Applications were delivered to participants via a simulated email inbox. There were two types of email. Basic emails did not need a password to read. Sensitive emails were password protected, and required participants to update that password every time the email was accessed. Participants selected the top email in the inbox and read numeric information relevant to the application, such as income verification, credit score, and property value appraisal. They had to approve or deny applications based on this information according to prescribed decision rules. For example, the application should be approved if the applicant’s income from multiple sources exceeded a criterion limit, unless the applicant’s credit score was lower than a value provided in the email. Participants selected an “approve” or “deny” button at the bottom of the email. The software logged “hits” (correct acceptances) and “false alarms” (incorrect acceptances).
Dependent measures were calculated for each half-hour interval of the task. These included the number of emails processed (rate of work) and two indices of decision-making performance derived from signal detection theory.
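As an illustration of how signal detection indices can be derived from the logged decisions, the following minimal sketch computes a sensitivity index and a response criterion from counts of hits, misses, false alarms, and correct rejections. It assumes the conventional equal-variance indices d′ and c and a log-linear correction for extreme rates; the function name `sdt_indices` and the example counts are hypothetical, and the formulas actually used in the study may differ.

```python
from statistics import NormalDist


def sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion_c) for one participant and one interval."""
    # Log-linear correction (an assumption, not taken from the study):
    # adding 0.5 to each cell keeps rates away from exactly 0 or 1,
    # which would otherwise produce infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(fa_rate)

    d_prime = z_hit - z_fa               # sensitivity: separation of the two rates
    criterion_c = -0.5 * (z_hit + z_fa)  # negative values = bias toward approving
    return d_prime, criterion_c


# Hypothetical counts for a single half-hour interval.
d_prime, c = sdt_indices(hits=18, misses=2, false_alarms=9, correct_rejections=11)
print(f"d' = {d_prime:.2f}, c = {c:.2f}")
```

Under these definitions, a participant who approves most applications regardless of merit would show a low d′ together with a negative c.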
DeepPasswd AI Tool
To assist password-setting, participants could access DeepPasswd. Participants could “tweak” individual characters repeatedly until they were satisfied and the tool indicated that the password was strong. Dependent measures were the proportion of occasions on which the participant used DeepPasswd when available, and the mean frequency of password tweaks (Figure 1).

Figure 1. Training slide to introduce the DeepPasswd tool; “ILuvdogs7!” can be changed to “!Luvd0g$7!”.
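The two password-tool measures can be illustrated with a short sketch that summarizes one participant's log of sensitive emails. The record layout (`interval`, `used_tool`, `tweaks`) and the function `tool_measures` are hypothetical, and averaging tweaks over only those occasions on which the tool was used is an assumption rather than a documented scoring rule.

```python
from statistics import mean

# Hypothetical log for one participant: one record per sensitive email.
events = [
    {"interval": 1, "used_tool": True, "tweaks": 4},
    {"interval": 1, "used_tool": True, "tweaks": 2},
    {"interval": 1, "used_tool": False, "tweaks": 0},
    {"interval": 2, "used_tool": True, "tweaks": 1},
    {"interval": 2, "used_tool": False, "tweaks": 0},
    {"interval": 2, "used_tool": False, "tweaks": 0},
]


def tool_measures(events, interval):
    """Return (tool use ratio, mean tweaks per use) for one task interval."""
    rows = [e for e in events if e["interval"] == interval]
    used = [e for e in rows if e["used_tool"]]
    # Ratio of password updates on which the tool was opened.
    use_ratio = len(used) / len(rows) if rows else None
    # Mean tweaks is undefined when the tool was never used in the interval.
    mean_tweaks = mean(e["tweaks"] for e in used) if used else None
    return use_ratio, mean_tweaks


for interval in (1, 2):
    print(interval, tool_measures(events, interval))
```

A participant who never opens the tool in an interval has a defined use ratio of zero but no tweak score, which is why analyses of tweaks are necessarily restricted to tool users.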
Procedure
After consenting, participants provided demographic information, completed pre-task subjective measures, and viewed a narrated training video based on a sequence of PowerPoint slides. The video highlighted the importance of strong password setting and provided a tutorial on the decision-making task and the DeepPasswd tool. Participants were instructed to (a) accurately accept and reject applicants based on the defined criteria, and (b) set strong and memorable passwords to protect sensitive applicant information.
Participants were assigned at random to one of four experimental conditions based on a 2 × 2 between-groups design. The task manipulations were email rate (high; low) and AI induction (positive; neutral). Email rate was manipulated to vary time pressure and task demands. Participants received basic emails at intervals of 15 s (high email rate) or 60 s (low email rate). Both groups received sensitive emails at intervals of 300 s. Participants receiving the positive AI induction viewed additional materials in the training video (see Figure 2). These included a positive promotion for AI technology, as well as information on the advantages of DeepPasswd. The neutral condition provided task instructions but no additional promotion of the tool or AI technology. Task duration was 60 min, followed by completion of post-task questionnaires.

Figure 2. Example slide from the positive AI induction.
Results
Effects of Experimental Factors
Password Usage
Effects of the experimental factors on password tool use and password tweaks were analyzed using mixed-model 2 × 2 × 2 (Interval × Email rate × Induction) ANOVAs, with repeated-measures on the interval factor. The only significant effect on the tool use ratio was the main effect of interval,
The analysis of password tweaks was based on the 143 participants who utilized the password tool at least once in each interval. A natural log transform was applied to reduce positive skew. ANOVA showed a significant main effect of induction,
Decision-Making Performance
Three 2 × 2 × 2 (Interval × Email rate × Induction) ANOVAs were run to determine effects on number of emails completed, perceptual sensitivity (
The
The
Individual Differences
Four-step hierarchical regressions were run to identify individual difference predictors of each behavioral measure. Step 1 added the two experimental factors, email rate and induction, effect-coded with values of 1 and −1, and the centered product term for their interaction. At Step 2, the three dispositional scales were included, followed by the four state scales at Step 3.
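The logic of entering predictor blocks in sequence can be sketched as follows. This is a minimal illustration on synthetic data, not the study's analysis: only the three blocks described above are entered, the sample size and effect sizes are arbitrary, and the helper names `r_squared` and `hierarchical_steps` are hypothetical. Each step reports the cumulative R², the increment in R², and the F statistic for that increment.

```python
import numpy as np


def r_squared(X, y):
    """R^2 from an ordinary least-squares fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def hierarchical_steps(blocks, y):
    """Enter blocks cumulatively; return (name, R^2, delta R^2, F change)."""
    n, entered, prev_r2, out = len(y), [], 0.0, []
    for name, block in blocks:
        entered.append(block)
        X = np.column_stack(entered)
        r2 = r_squared(X, y)
        df_num = block.shape[1]      # predictors added at this step
        df_den = n - X.shape[1] - 1  # residual df of the larger model
        f_change = ((r2 - prev_r2) / df_num) / ((1.0 - r2) / df_den)
        out.append((name, r2, r2 - prev_r2, f_change))
        prev_r2 = r2
    return out


rng = np.random.default_rng(0)
n = 200  # arbitrary synthetic sample size

# Step 1: effect-coded factors (+1 / -1) and their centered product term.
email_rate = rng.choice([-1.0, 1.0], size=n)
induction = rng.choice([-1.0, 1.0], size=n)
product = email_rate * induction
experimental = np.column_stack([email_rate, induction, product - product.mean()])

dispositional = rng.normal(size=(n, 3))  # three synthetic trait scales
state = rng.normal(size=(n, 4))          # four synthetic state scales

# Synthetic outcome with arbitrary effects, for demonstration only.
y = -0.3 * induction - 0.4 * dispositional[:, 0] + 0.3 * state[:, 0] + rng.normal(size=n)

results = hierarchical_steps(
    [("Experimental", experimental), ("Dispositional", dispositional), ("State", state)], y
)
for name, r2, delta, f_change in results:
    print(f"{name:14s} R2 = {r2:.3f}  delta R2 = {delta:.3f}  F change = {f_change:.2f}")
```

Because the models are nested, R² cannot decrease from one step to the next; the question at each step is whether the increment is larger than expected by chance.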
For password tool use, none of the steps were significant. For password tweaks, each step added significantly to the variance explained. The effect for the induction reflects the tendency for tweaks to be less frequent with a positive induction. In addition, participants low in internet anxiety but high in state distress and engagement tweaked the password more frequently, as shown in Table 1.
Table 1. Predictors of Password Tweak Frequency.
Exp. = Experimental.
Table 2 provides summary statistics for the decision-making performance regressions. The strongest predictor of emails completed was the email rate. Dispositional anxiety was associated with a higher completion rate, but state trust and distress were related to a lower rate. The regression analysis for
Table 2. Predictors of Decision-Making Performance.
Exp. = Experimental; Disp. = Dispositional; Anx. = Anxiety; Engage. = Task Engagement; Expect. = Expectations.
Password Tool Use: Contrast Sets Analysis
We further investigated password tool use by analysis of Contrast Sets (Bay & Pazzani, 2001), that is, conjunctions of attributes and values that differ meaningfully in their distribution across groups. For instance, if we take the attribute Age and discretize it into bins, we can test the hypothesis that different age groups differ in usage of the password tool by applying a chi-squared test. The analysis is also applied to conjunctions of multiple attributes. We tested multiple attributes and their conjunctions in this way. Here, we focus on two for which the analysis showed significant effects: age and victimhood status. Further analyses are reported by Matthews et al. (2023).
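The single-attribute test described above can be sketched as a Pearson chi-squared test on a table of binned attribute by tool-use category. The counts below are hypothetical and are not the study's data, and the function names are illustrative. The closed-form p-value applies only when the degrees of freedom are even, as in a 3 × 3 table; a full contrast-set search would additionally enumerate conjunctions of attributes and control for multiple testing, which this sketch omits.

```python
from math import exp, factorial


def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column attributes were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_even_dof(stat, dof):
    """Upper-tail p-value; this closed form holds only for even dof."""
    if dof % 2 != 0:
        raise ValueError("closed form requires even degrees of freedom")
    half = stat / 2.0
    return exp(-half) * sum(half ** i / factorial(i) for i in range(dof // 2))


# Hypothetical counts, NOT the study's data.
# Rows: age bins (youngest, middle, oldest); columns: never, sometimes, always.
counts = [
    [30, 10, 20],
    [18, 22, 20],
    [14, 28, 18],
]
stat, dof = chi_squared(counts)
print(f"chi2({dof}) = {stat:.2f}, p = {p_value_even_dof(stat, dof):.4f}")
```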
For both age and victimhood, we allocated participants to one of three bins, as shown in Table 3. Victim status was assigned according to responses to questions on experiencing online fraud and phishing attacks. Similarly, participants were categorized according to whether they used the password tool always, on some occasions, or never. The table shows the percentage of participants in each category as a function of the age and victimhood bins. The two older groups were more likely to use the tool at least sometimes, compared with the youngest group. Half of younger adults entirely neglected the tool. Users also varied in whether they adopted a consistent “all-or-none” strategy, that is, always or never using the tool, versus using it sometimes, implying a more flexible strategy. The youngest age group tended to be all-or-none, whereas the oldest age group showed the highest proportion of using the tool sometimes. Similarly, those reporting definite experience of being victims were over-represented in the “sometimes-used” category, suggesting being a victim of cybercrime may encourage flexibility in password tool use.
Table 3. Tool Use Frequency, by Contrast Set Bins.
Discussion
In this study, we created an environment that simulated routine, monotonous office work. Data suggested that we were successful in producing a fatigue state. Use of DeepPasswd declined significantly over the two 30-min halves of the work period, consistent with reports of diminishing resources for security activities when demands are high (Olt & Mesbah, 2019). In addition, decision-making performance showed a rather lax, inaccurate response style characterized by low perceptual sensitivity and a low response criterion.
We hypothesized that the manipulations of email rate and type of induction would influence password tool use, but these hypotheses were not supported. The impact of manipulations may not have countered the monotony of the task and participants’ tendencies to minimize effort. The positive induction produced lower frequencies of password tweaks in three out of four experimental conditions. While the induction did not appear to produce greater engagement with the tool, it may have facilitated more efficient use of it.
Several individual difference factors were associated with password tool usage. These included both stable, dispositional factors and measures of the person’s state of mind while performing the task. Individual difference factors were related to frequency of password tweaks (but not to tool use). Internet anxiety was negatively associated with tweak frequency, consistent with its status as an avoidance emotion that is negatively related to computer use in various contexts (Maricutoiu, 2014). Apprehensive users were willing to use the password tool, but they minimized their interactions with it.
Both task engagement and distress, as measured by the DSSQ, were associated with a higher tweak frequency. Task engagement is linked to motivation and task-directed effort (Matthews, 2021). Conversely, low task engagement is a fatigue state which appeared to be associated with reduced motivation to interact with the tool by trying different passwords. The association between distress and more frequent password tweaks might also reflect motivation, if distress is a driver of coping efforts.
There were various significant relationships between the individual difference factors and performance on the decision-making task. Dispositional internet anxiety predicted a higher rate of email processing, poorer discrimination of mortgage applications, and a lower response criterion. Anxiety was associated with a fast but careless style of response, which may reflect a strategy to avoid deep engagement with the computer-based task. The association between anxiety and reduced password tweaks may also express this avoidant strategy. High expectations (PAS) were also associated with both lower sensitivity and a lower response criterion.
Higher state trust in DeepPasswd correlated with higher sensitivity and a higher response criterion, implying greater involvement with the task. Thus, although both PAS high expectations and the Jian scale are trust measures, they appear to signal differing strategies toward performing computer-mediated tasks. There may be a conceptual distinction between “passive trust” associated with broad optimism about task outcomes and “active trust” related to beliefs that active, effortful engagement with the system is necessary to accomplish desired outcomes. Of the DSSQ variables, worry impaired accuracy of discrimination, and task engagement was associated with a higher response criterion (in this context, greater task involvement), broadly consistent with previous stress state research (Matthews, 2021).
Consistent with Branley-Bell et al.’s (2022) findings, older individuals were more likely to use the tool. Indeed, despite their presumed tech savviness, around 50% of the youngest age group never used it at all. However, the oldest group in the present study were typically middle-aged rather than “old”; mean age was 47.4. The contrast sets analysis also suggested more subtle effects on the extent to which the participant adopted an all-or-nothing strategy for tool use, as opposed to using the tool only some of the time, a strategy characteristic of those previously victimized.
The present findings highlight the practical challenges of maintaining user engagement with tools that enhance security. Despite the novelty of DeepPasswd to participants, almost 30% never used the tool at all, and the frequency of tool use declined during the 1-hr task. In a real work situation, further declines in tool use over weeks or months might be anticipated. The role of individual and group differences in tool use suggests possible intervention strategies. Personal factors, including higher immediate task engagement, distress, and trust in the password tool during work activities, were also linked to indicators of active strategy use. Conversely, high levels of internet anxiety appear to be related to an avoidant strategy of rapid but careless work with fewer password tweaks. Motivating primary work activities may have the beneficial side effect of also encouraging security tool use. Results also support the value of training efforts to overcome complacency and neglect of security tools, especially for younger users.
Study limitations include the use of a simulation method. Users are likely to have greater motivations toward maintaining security in real life, especially in skilled occupations. More systematic sampling in relation to demographic factors and levels of computer and cybersecurity expertise would also be desirable in future studies. The simulation may have been overly effective in producing security fatigue in the online context, in that we had to discard data from a substantial number of participants who showed minimal engagement in the decision-making task. In-person supervision of performance may be preferable for fatigue studies. Finally, the study focused on user rather than usability issues. An enhanced interface design might elevate user motivation and tool use.
In conclusion, AI-driven tools such as DeepPasswd have great potential for improving cybersecurity, but their widespread adoption may be limited by users’ tendencies to minimize effort, especially when security fatigue develops. The overarching factor driving tool usage may be the extent to which the user’s task strategy promotes proactive, effortful engagement as opposed to coasting through task activities with minimal effort. Multiple task and personal factors influence the tradeoff between countering threats and conserving effort.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was sponsored by the Virginia Commonwealth Cyber Initiative program on Impact of Human Behavior on Resilient Cyber Security Systems.
