Like Shooting Phish in a Barrel: Cue Utilization and Cognitive Reflection Aid Performance in Controlled, but Not Naturalistic Phishing Tasks

Abstract

The study tested the role of cue utilization and cognitive reflection tendencies in email users’ phishing decision capabilities in both controlled and naturalistic settings. Ninety-four university students completed measures of phishing cue utilization and cognitive reflection, a phishing decision task, and a naturalistic simulated phishing campaign in which simulated phishing emails were sent to their own university email inboxes. In the phishing decision task, participants with lower cognitive reflection tendencies were more likely than those with higher cognitive reflection to misclassify genuine emails as phishing. Further, participants with higher cognitive reflection and lower cue utilization took the longest to diagnose emails, whereas participants low in both cue utilization and cognitive reflection had the shortest response latencies. These findings suggest that greater cognitive reflection can offset lower levels of cue utilization. In the naturalistic simulation, neither cue utilization nor cognitive reflection predicted participants’ propensity to interact with a suspicious email. This result highlights a potential gap between phishing investigations conducted in controlled and naturalistic settings and underlines the need for future studies that employ naturalistic methodologies to better understand and address phishing threats in real-world environments.

Keywords

cue utilization; cognitive reflection; phishing; cyber security; EXPERTise 2.0

Introduction

Online criminal activity is a serious and escalating threat to individuals, organizations, and governments, both in Australia and worldwide. Social engineering attacks, which involve the psychological manipulation of individuals to alter their behavior and prompt actions that compromise security, are particularly concerning. Notably, 31% of these attacks are conducted via phishing emails (Verizon, 2024), fraudulent messages designed to deceive recipients into revealing sensitive information such as financial details or passwords. As the last line of defense, email users must determine whether an email is legitimate or fraudulent, making human error a significant vulnerability in cybersecurity systems (Herzberg, 2009).

A popular framework for understanding decision-making in naturalistic contexts is the heuristics and biases approach (Tversky & Kahneman, 1974). When working with limited information, or under conditions of uncertainty or urgency, people frequently employ heuristics (i.e., mental “rules of thumb”) that leverage previous experience to solve problems and conserve cognitive resources (Dumm et al., 2020; Williams et al., 2018). However, while heuristics are typically useful, overreliance on them can result in judgment errors and faulty decision-making (i.e., biases; Gigerenzer & Gaissmaier, 2011; Kahneman & Egan, 2011). Phishing email senders use social engineering techniques within the body of an email (e.g., creating a sense of urgency or offering a reward) to capitalize on these judgment errors; as such, research has investigated whether a more rational or reflective approach to reasoning would reduce the potential for human error (Moody et al., 2017; Workman, 2008).

Cognitive Reflection

The dual-processing system (DPS; Evans, 2008) outlines two distinct cognitive pathways for decision-making: System 1 (S1) is characterized by intuitive thinking, an unconscious, instinctive, and rapid process, whereas System 2 (S2) is a more deliberate, slow, and intentional thinking pathway. The DPS proposes that heuristics (S1) generate automatic behavioral responses, often described as unskilled intuition, unless analytic reasoning (S2) intervenes and inhibits the behavior. Similarly, the Elaboration Likelihood Model (ELM; Petty & Cacioppo, 1986) is a dual-process model of persuasion. It suggests that individuals using the central information processing pathway activate two sub-processes: attention, described as mental focus, and elaboration, which involves forming connections between the present experience and past knowledge (Harrison et al., 2016).

Although the assumption that the two pathways are independent has been questioned (Baron et al., 2015; Klein, 2011; Trémolière & Bonnefon, 2014), a central tenet of these models is the proposed benefit of the S2 pathway, whereby individuals engage deeply with stimuli or contextual features such as those found in phishing emails (Stanovich & West, 2000). Applied to email, the assumption is that on receipt of a message, S2 processing may prompt the recipient to continue searching for information and to examine the email carefully for cues of authenticity, thus decreasing their chances of cybercrime victimization (Harrison et al., 2016; Luo et al., 2013; Vishwanath et al., 2016; Vishwanath et al., 2011).

A body of research has implicated cognitive reflection as a significant activator of the systematic thinking pathway (Isler et al., 2020; Kahneman & Klein, 2009; Pennycook & Rand, 2019). Cognitive reflection describes a style of information processing and is measured by behavioral outcomes along a spectrum from rapid, impulsive decision-making to slower, rational, stepwise approaches (Frederick, 2005). Frederick’s (2005) Cognitive Reflection Test (CRT) operationalizes cognitive reflection as a measure of impulsivity, using a three-item mathematical questionnaire. Each item is designed to evoke an intuitive but incorrect answer, which requires no cognitive strain or reflection (i.e., S2 processing) and instead relies on the heuristics and biases pathway (i.e., S1 processing).
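To illustrate how such an item works, the best-known item from Frederick’s (2005) original CRT (not one of the CRT-2 items used in this study) reads: “A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?” The answer that springs to mind (S1) is 10 cents. Working through the stated constraint (S2), with b as the price of the ball in dollars, gives:

```latex
\begin{aligned}
b + (b + 1.00) &= 1.10 \\
2b &= 0.10 \\
b &= 0.05
\end{aligned}
```

The intuitive answer fails a simple check: if the ball cost $0.10, the bat would cost $1.10 and the pair would cost $1.20.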

The CRT has previously been used to examine decision-making about phishing emails. In a study by Jones et al. (2019), 224 university students and staff completed an email judgment task requiring them to discriminate between legitimate and phishing emails, along with a battery of cognitive tasks measuring individual differences, including cognitive reflection. Individuals with a greater capacity for cognitive reflection were more successful in discriminating phishing emails from authentic communication (Jones et al., 2019). Thus, measuring cognitive reflection tendencies may be useful in predicting phishing victimization.

The CRT has faced criticism regarding its validity and reliability, with some studies suggesting that the widespread use of the test, particularly the original version (Frederick, 2005), has led to over-exposure to, and familiarity with, the test items (Baron et al., 2015; Chandler et al., 2014). Other reports suggest that variables such as mathematical ability and general intelligence may confound measurement of participants’ typical cognitive impulsivity (Toplak et al., 2011). To address these issues, Thomson and Oppenheimer (2016) designed the CRT-2, whose questions reduce reliance on respondents’ mathematical ability to generate correct answers, thereby addressing one of the key critiques of the original CRT.

Cheng and Janssen (2019) aimed to validate the CRT-2 by examining its relationship with the conceptually related intertemporal choice task, which measures preference for smaller immediate rewards (i.e., impulsivity) or larger later rewards (i.e., a reflective, long-term orientation). Results from 139 college students showed that more correct responses on the CRT-2 (i.e., higher cognitive reflection) were significantly associated with fewer impulsive choices across two hypothetical gain and payment conditions. The study also reported a positive correlation between impulsive choices and intuitive errors (compared with non-intuitive errors) on the CRT-2. These findings support the measure’s construct validity in assessing information processing preferences and their implications for reflective decision-making (Cheng & Janssen, 2019).

In line with these findings, Isler et al. (2020) compared methods of activating reflective thinking, using the CRT-2 to measure participants’ performance under common behavioral manipulations used in online research. A total of 1,748 adult participants were assigned across five experimental conditions (e.g., time delay, memory recall, and decision justification). Reflective thinking (i.e., higher scores on the CRT-2) was successfully activated when participants were asked to justify their decisions or were given training to increase reflection and awareness of cognitive biases. This suggests that the CRT-2 may be useful in distinguishing reflective thinking and critical analysis from other aspects of information processing, such as attention and recall (Harrison et al., 2016; Petty & Cacioppo, 1986; Thomson & Oppenheimer, 2016).

Skilled Intuition

It is important to note that not all decision-making fits neatly into either pathway of the dual-process models (Stanovich & West, 2000; Trémolière & Bonnefon, 2014). While unskilled intuitive processes (e.g., heuristics and biases) are theorized to operate via S1 channels, skilled intuition is also likely to leverage S1 architecture. Skilled intuition describes experts’ ability to react rapidly to an unfolding situation, often with better performance outcomes than their less experienced counterparts (French & Nevett, 1993; Klein et al., 1986).

Klein and Klinger (1991) pioneered an investigation into expert decision-making by interviewing fire commanders and reported that, under critical and complex conditions, these experts typically generate only one plausible course of action rather than systematically appraising several, as had been hypothesized. The Recognition-Primed Decision (RPD; Klein & Klinger, 1991) model describes a process underlying skilled intuition: a blended S1 and S2 information processing pathway in which domain experts capitalize on pattern recognition to create mental simulations of how events might unfold and then evaluate the merits of the first viable course of action. The repertoire of memories available to skilled decision-makers is presumed to reduce uncertainty and accelerate appropriate responses (Baron et al., 2015; Baylor, 2001; French & Nevett, 1993; Klein, 1993, 2011; Klein & Klinger, 1991; Weick et al., 2005).

Cue Utilization

The foundation of skilled intuition is thought to lie in cues, which are associations between features and events that exist in the environment or in memory (Wiggins et al., 2018). With repeated exposure to cues, the connection to an object or event is reinforced for later recall (Wiggins, 2015). The grouping of multiple cues is presumed to create a mental model (i.e., pattern) of the expected cue sequence, which supports a comprehensive understanding, or sensemaking, of the entire process, event, or object (Gacasan & Wiggins, 2017; Weick et al., 2005). Cue utilization is therefore the capacity to recognize and respond to the demands of a task or process by mentally organizing specific cues (Gacasan & Wiggins, 2017). Studies have linked cue utilization with performance across several domains, including aircraft piloting (Renshaw & Wiggins, 2017; Wiggins et al., 2018), disaster recovery project management (Gacasan & Wiggins, 2017), simulated driving (Yuris et al., 2019), collision avoidance at sea (Chauvin & Lardjane, 2008), and emergency triage by nurses (Reay & Rankin, 2013).

Schriver et al. (2008) used a flight simulator to examine differences between expert and less-skilled pilots’ decision-making. Expert pilots made more accurate and timely decisions, used fewer cues overall, and attended more to relevant diagnostic cues than novice pilots. This outcome is consistent with the RPD model, which describes domain experts’ use of cue pattern recognition as an important facet of, or precursor to, skilled intuition and cue utilization (Gacasan & Wiggins, 2017; Stanovich & West, 2000). However, contrary to expectations, in the failure simulation (i.e., flight problems), both expert and novice pilots diagnosed problems more accurately when cues were less correlated with each other (i.e., randomized) than when they were more correlated (i.e., typical or expected).

The authors suggested that, in the absence of an existing mental model, both novice and expert pilots applied similar attentional and analytic resources (i.e., the S2 thinking pathway) to make effective decisions (Schriver et al., 2008). Activation of S2 decision-making pathways (i.e., cognitive reflection) may therefore be the first step in skill development, via the acquisition of new diagnostic cues (i.e., experience), which over time develops into pattern recognition and, subsequently, cue utilization operating in the S1 thinking pathway (Kahneman & Klein, 2009; Klein, 2008; Klein & Klinger, 1991). As such, there may be an important interaction between cognitive reflection and cue utilization in explaining how individuals make accurate decisions (Endsley, 1995; Jones et al., 2019; Pennycook & Rand, 2019; Schriver et al., 2008).

Cue utilization may also play an important role in cyber security contexts, where it may reduce cognitive load and where a failure to recognize cues and take appropriate action can result in a significant security breach (Wiggins, 2015; Wiggins et al., 2014). Wiggins (2021) suggests that individuals can be classified as having greater or lesser cue utilization based on domain-specific performance indicators, specifically their capacity for the Recognition, Association, Prioritization, Identification, and Discrimination (RAPID) of cues.

Bayl-Smith et al. (2020) used the phishing edition of EXPERTise 2.0 (Brouwers et al., 2016; Wiggins, 2016; Wiggins et al., 2015), a web-based tool, to assess participants’ cue utilization across the five RAPID facets. Participants completed five scenario-based tasks in which they had to recognize or classify email authenticity, rate the strength of association between phishing concepts, prioritize or rank feature importance, identify diagnostic features, and discriminate between domain-related and unrelated features (Wiggins, 2016; Wiggins et al., 2018). Participants were also required to identify suspicious features in emails and classify the emails as legitimate or untrustworthy. A k-means cluster analysis successfully grouped participants into those demonstrating relatively higher or lower cue utilization. Participants with higher cue utilization were more accurate in identifying phishing features than those with lower cue utilization (Bayl-Smith et al., 2020). Additionally, average email deliberation time (i.e., a proxy for cognitive reflection) had a positive impact on participants’ ability to recognize key email features that indicated suspicion. This supports the notion that participants demonstrating higher cue utilization (i.e., S1 thinking) or higher cognitive reflection (i.e., S2 thinking) were better at identifying features that do not fit the expected patterns of a situation.

In a similar study of the relationship between email users’ cue utilization and phishing detection, Nasser et al. (2020) asked 50 participants to complete a dual-task exercise in which they performed a phishing detection task concurrently with a rail control task of increasing complexity (i.e., cognitive load). Participants’ relative cue utilization was again distinguished using the cyber security edition of EXPERTise 2.0. Users with relatively higher cue utilization were more accurate in discriminating email authenticity. However, contrary to expectations, high cue utilizers did not demonstrate an advantage over low cue utilizers under conditions of increasing cognitive load, a result that may be attributable to practice effects (Duff et al., 2007) and design limitations in reporting cue sources (Nasser et al., 2020). Overall, these findings, taken together with those of Bayl-Smith et al. (2020), suggest a clear advantage for email users with higher cue utilization in detecting malicious emails.

Ackerley et al. (2022) recently extended the work of Nasser et al. (2020), examining for the first time the potential interplay between cue utilization and cognitive reflection in email users’ ability to efficiently differentiate between phishing and genuine emails. Participants completed the original CRT, a laboratory-based phishing diagnostic task, and the EXPERTise 2.0 battery. The results revealed an interaction between users’ cue utilization and cognitive reflection, whereby participants relatively low in both performed significantly worse in diagnosing phishing emails than other participants. The authors concluded that a high level of cognitive reflection was able to compensate for a lower level of cue utilization, and vice versa.

While this was a novel contribution, Ackerley et al. (2022) note the artificial nature of their phishing diagnostic performance measure, which may have yielded several experimental artefacts. For example, asking participants to detect phishing emails within a sample of emails may have elicited expectation effects not present in real-world decision-making. Further, such explicit instructions may have primed participants to engage in greater cognitive reflection, giving a distorted view of their natural tendencies outside the study. The authors underline the need to test the generalizability of their results beyond controlled phishing diagnosis tasks using naturalistic techniques, for instance by sporadically sending simulated phishing emails to participants’ real inboxes.

Study Aims

The aim of this study is to test: 1) the role of cue utilization and cognitive reflection tendencies in email users’ phishing email diagnostic capabilities and 2) whether any differences exist when comparing capabilities in controlled versus naturalistic settings.

The notion of skilled intuition highlights that not all individuals need to access S2 thinking pathways to make good decisions (Kahneman & Klein, 2009; Klein, 1993, 2011; Klein & Klinger, 1991; Weick et al., 2005). Previous research suggests that individuals with relatively high cue utilization perform better than those lower in cue utilization on complex tasks such as discriminating legitimate from phishing emails during a cyber-attack (Ackerley et al., 2022; Bayl-Smith et al., 2020; Nasser et al., 2020). Consistent with these findings, we hypothesize that (H₁) the cyber security edition of the EXPERTise 2.0 battery will enable the identification of two “clusters” of participants, one demonstrating relatively higher cue utilization than the other (i.e., lower cue utilization). We further hypothesize that (H₂) participants with higher levels of cue utilization will demonstrate a) greater accuracy on the phishing decision task (i.e., higher rates of true positives and lower rates of false positives) and b) shorter response latencies in making their judgments, compared with those with lower levels of cue utilization.

It is postulated that S2 thinking may come more easily to individuals with a greater tendency toward cognitive reflection, and that this may be key to the careful analysis of email legitimacy (Evans, 2008; Isler et al., 2020). Indeed, evidence has revealed advantages in phishing detection among those with greater cognitive reflection (Ackerley et al., 2022). Therefore, we hypothesize that (H₃) participants with a greater tendency to engage in cognitive reflection will demonstrate a) greater accuracy on the phishing decision task (i.e., higher rates of true positives and lower rates of false positives) and b) longer response latencies in making their judgments, compared with participants less inclined to engage in cognitive reflection.

As noted, this study also aims to add to the limited phishing email research conducted under naturalistic conditions. This is achieved by sending participants simulated phishing emails, differing in the number of phishing cues they contain, to their university email addresses. We hypothesize that (H₄) cue utilization groupings will predict participants’ engagement with a naturalistic phishing simulation, whereby participants with lower cue utilization will demonstrate greater engagement with phishing emails (i.e., opening an email or clicking on an embedded link) than those with higher cue utilization. Likewise, we hypothesize that (H₅) cognitive reflection groupings will predict participants’ engagement with the naturalistic phishing simulation, whereby participants with lower cognitive reflection will demonstrate greater engagement with phishing emails (i.e., opening an email or clicking on an embedded link) than those with higher cognitive reflection.

We presume that users with both higher cue utilization and higher cognitive reflection tendencies possess cognitive resources that increase their sensitivity to phishing cues, relative to the respective lower groupings. As such, we ask (RQ₁): do any differences in engagement with the naturalistic phishing simulation (H₄ and H₅) relate to the number of phishing cues embedded within the phishing emails (i.e., are emails with fewer phishing cues less likely to be engaged with than those with a greater number of phishing cues)? Additionally, we ask (RQ₂): are any differences based on the number of phishing cues contingent on cue utilization or cognitive reflection groupings?

Method

Participants

A convenience sample of 94 participants (50 female, 42 male, and two non-binary) was recruited from first- and second-year undergraduate psychology students at Macquarie University, Australia. Female participants were aged 18 to 42 years (M = 21.03, SD = 6.14), male participants 17 to 43 years (M = 20.38, SD = 5.55), and non-binary participants 19 to 31 years (M = 25.00, SD = 8.49). In addition to completing the online research activities, participants consented to being sent three simulated phishing emails within a six-week period after completion. These emails posed no risk to participants’ computers or devices.

Materials

CRT-2

The Cognitive Reflection Test-2 (CRT-2; Thomson & Oppenheimer, 2016) measures participants’ cognitive reflection tendencies. It is a revised version of the original three-item Cognitive Reflection Test (CRT; Frederick, 2005), comprising four short-answer items that measure participants’ tendency toward impulsivity. Its theoretical framework is underpinned by the “System 1 and System 2” dual-process models of reasoning (Stanovich & West, 2000). Participants respond to four “trick” items, on each of which they either give an intuitive but incorrect answer or engage in systematic, reflective thinking to arrive at the correct answer.

Participants could attain a maximum score of four, with each correct answer earning one point, as per the original scoring method (Frederick, 2005). No points were awarded for intuitive errors (i.e., the intuitive but incorrect answer) or non-intuitive errors (e.g., “I don’t know”). The CRT-2 has previously shown modest internal reliability (Cronbach’s α > .50; Thomson & Oppenheimer, 2016); although this is less than ideal, Cheng and Janssen (2019) suggest that the small number of items may constrain the reliability estimate. A strong correlation has been found between the CRT-2 and the original CRT (r > .50, p < .001; Thomson & Oppenheimer, 2016).

Phishing Decision Task

The phishing decision task was administered via Qualtrics (Qualtrics, 2021) and has previously been used to measure participants’ ability to differentiate trustworthy from suspicious emails (Ackerley et al., 2022; Bayl-Smith et al., 2020; Nasser et al., 2020). Each participant was presented with 40 email images, half genuine and half phishing. The emails were sampled from a compilation of real phishing attempts that the research team had received over a 6-month period. Phishing cues (e.g., an unknown email address) were retained, but the content was deidentified. Emails were displayed in a randomized order, each for a maximum of 20 seconds, after which the email disappeared and participants indicated whether it was trustworthy or suspicious. The time each email was viewed was also recorded for each participant.
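Hypotheses H₂ and H₃ define accuracy on this task as higher true-positive and lower false-positive rates. The Python sketch below shows one way those per-participant indices, plus mean viewing time, could be derived from trial-level data. It assumes that a “positive” means judging an email suspicious; the `Trial` record and `score_decision_task` function are hypothetical and are not part of Qualtrics or the study materials.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    """One email image from the phishing decision task (hypothetical record layout)."""
    is_phishing: bool        # ground truth: True for a phishing email
    judged_suspicious: bool  # response: True = "suspicious", False = "trustworthy"
    latency_s: float         # viewing time before the response, in seconds (at most 20)


def score_decision_task(trials):
    """Summarize one participant's trials as hit and false-positive rates plus mean latency."""
    phishing = [t for t in trials if t.is_phishing]
    genuine = [t for t in trials if not t.is_phishing]
    return {
        # True positive: a phishing email correctly judged suspicious.
        "hit_rate": sum(t.judged_suspicious for t in phishing) / len(phishing),
        # False positive: a genuine email wrongly judged suspicious.
        "false_positive_rate": sum(t.judged_suspicious for t in genuine) / len(genuine),
        "mean_latency_s": mean(t.latency_s for t in trials),
    }


if __name__ == "__main__":
    demo = [
        Trial(True, True, 4.0),
        Trial(True, True, 6.0),
        Trial(False, False, 8.0),
        Trial(False, True, 2.0),
    ]
    print(score_decision_task(demo))
```

Keeping hits and false positives separate, rather than reporting a single percent-correct figure, lets a liberal responder who flags everything as suspicious be distinguished from a genuinely discriminating one.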

EXPERTise 2.0

The cyber security edition of the Expert Intensive Skills Evaluation Program Version 2.0 (EXPERTise 2.0; Wiggins et al., 2015) is a situational judgment test platform comprising five tasks, each testing a different component of behavior indicative of cue utilization: the Feature Recognition Task (FRT), Feature Association Task (FAT), Feature Prioritization Task (FPT), Feature Identification Task (FIT), and Feature Discrimination Task (FDT). The statistical properties of EXPERTise 2.0 have been assessed using domain-specific stimuli, and predictive validity and test-retest reliability have been established for audiology clinician training (Watkinson et al., 2018), power system controllers (Loveday et al., 2013), and pilot weather decision-making (Wiggins et al., 2014).

The stimuli for this study were designed to give participants as ecologically valid, or realistic, an experience of receiving phishing emails as possible (Bayl-Smith et al., 2020; Nasser et al., 2020). The diagnostic elements were selected in collaboration with a subject-matter expert to ensure content validity. The five scenario-based EXPERTise 2.0 tasks incorporate cyber security cues represented through text, auditory, and visual elements (Sturman et al., 2024).

Feature Identification Task (FIT)

The FIT assesses participants’ ability to quickly discern whether an email contains suspicious visual features (phishing) or is a legitimate communication (non-phishing). Fifteen email images are presented individually on screen for 20 seconds each, and participants use their cursor either to click on any element in the email that arouses suspicion or to click on a green box labeled “trustworthy.” Participants with relatively higher cue utilization are expected to demonstrate shorter response latencies in identifying features and formulating a diagnosis (Loveday et al., 2013).
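How EXPERTise 2.0 scores a FIT click is not described here. As a minimal sketch under stated assumptions, a trial could be counted correct when a genuine email is labeled trustworthy, or when a click on a phishing email lands inside a feature region marked as suspicious; accuracy and mean response latency are then summarized per participant. The `FitTrial` record, the bounding-box regions, and all function names are hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # hypothetical (x0, y0, x1, y1) region in image pixels


@dataclass
class FitTrial:
    is_phishing: bool
    click: Optional[Tuple[float, float]]  # click position, or None if "trustworthy" was chosen
    latency_s: float                      # time to respond, at most 20 s
    suspicious_regions: List[Box] = field(default_factory=list)  # marked suspicious features


def inside(point, box):
    """True if the (x, y) point falls within the rectangular box, edges included."""
    x, y = point
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1


def fit_trial_correct(trial):
    """Correct = genuine email labeled trustworthy, or phishing email clicked on a marked feature."""
    if trial.click is None:
        return not trial.is_phishing
    return trial.is_phishing and any(inside(trial.click, box) for box in trial.suspicious_regions)


def fit_summary(trials):
    """Return (proportion correct, mean response latency in seconds) across trials."""
    return (
        sum(fit_trial_correct(t) for t in trials) / len(trials),
        mean(t.latency_s for t in trials),
    )
```

Under this rule, clicking anywhere on a genuine email counts as an error, because the participant has flagged something in a legitimate message as suspicious.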

Feature Recognition Task (FRT)

The FRT measures the accuracy with which participants classify key diagnostic features of emails as phishing or genuine (Wiggins et al., 2018). Fifteen email images are presented individually on screen for a short duration (1 second), after which participants classify each email as trustworthy, untrustworthy, or impossible to tell. Despite the limited exposure, skilled decision-makers are expected to access their repertoire of cues in memory, enabling rapid recognition of the diagnostic features and accurate responses (Loveday et al., 2013; Wiggins et al., 2018). The FRT generates a count of correct judgments, with greater accuracy indicating higher levels of skilled cue utilization (Bayl-Smith et al., 2020; Morrison et al., 2018).
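Because the FRT offers a third option, a scoring rule has to decide how “impossible to tell” is treated. The sketch below assumes it earns no credit, like any other wrong answer; that choice and the `frt_score` name are illustrative assumptions rather than the platform’s documented behavior.

```python
TRUSTWORTHY = "trustworthy"
UNTRUSTWORTHY = "untrustworthy"
CANNOT_TELL = "impossible to tell"


def frt_score(judgements):
    """Count correct FRT judgements.

    judgements: iterable of (is_phishing, response) pairs, one per email image.
    Assumption: "impossible to tell" earns no credit, like any other wrong answer.
    """
    correct = 0
    for is_phishing, response in judgements:
        expected = UNTRUSTWORTHY if is_phishing else TRUSTWORTHY
        if response == expected:
            correct += 1
    return correct
```

For example, a participant who answers untrustworthy, trustworthy, impossible to tell, and untrustworthy to a phishing, genuine, phishing, and genuine email, respectively, would score 2.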

Feature Association Task (FAT)

The FAT measures participants’ ability to judge the strength of association between specific diagnostic features of phishing emails. Two conceptual phrases are presented on screen for a limited time, and participants rate their relatedness on a 7-point Likert-type scale (1 = Extremely Unrelated to 7 = Extremely Related). The FAT has two parts: first, the phrases are presented side by side in pairs and participants rate their association; subsequently, the phrases are presented sequentially, one after the other, to examine improvements in decision-making. Research suggests that associated concepts are more rapidly distinguished due to pre-existing neural connections within memory (Morrison et al., 2013). Therefore, individuals with relatively higher levels of cue utilization are expected to show greater variance in their relatedness ratings across concept pairs.

Feature Discrimination Task (FDT)

The FDT tests participants’ capacity to discriminate the relative importance of features of a suspected phishing email. Participants are given one detailed scenario regarding a potential phishing email, along with a picture of the email. They are then given several options (e.g., pay as requested or ignore the email) and asked to select their decision. Following this, features of the scenario (e.g., time sent and hyperlink) are presented alongside a 10-point Likert-type scale (1 = Not Important at All to 10 = Extremely Important), on which participants rate each feature’s perceived importance for their decision. Research suggests that participants who can discriminate relevant from less relevant email features on the rating scales (i.e., who show greater variance between responses) tend to demonstrate relatively higher levels of cue utilization (Loveday et al., 2013).
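Both the FAT and the FDT treat the spread of a participant’s ratings as the index of discrimination. A minimal sketch, assuming that spread is summarized as a sample variance (whether EXPERTise 2.0 uses the sample or population variance, or another dispersion statistic, is an assumption here), follows; the ratings are invented for illustration.

```python
from statistics import variance


def rating_spread(ratings):
    """Sample variance of one participant's ratings.

    Works for FAT relatedness ratings (1-7) or FDT importance ratings (1-10);
    a larger value means the participant separated items more sharply.
    """
    if len(ratings) < 2:
        raise ValueError("at least two ratings are needed")
    return variance(ratings)


# Hypothetical FDT importance ratings for the same six email features.
discriminating = [10, 9, 2, 1, 8, 3]   # relevant features high, irrelevant ones low
undifferentiated = [6, 7, 6, 7, 6, 7]  # every feature rated moderately important
```

Here the discriminating profile yields a variance of 15.5 and the undifferentiated profile 0.3, even though both participants used the same scale; the measure rewards separating features, not rating them highly.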

Feature Prioritization Task (FPT)

The FPT measures participants’ ability to prioritize email features in an information-search task. Participants are given an introductory sentence describing a scenario and must click, one at a time, on individual drop-down menus describing different email features (e.g., company logo and knowledge of sender). Participants are given 60 seconds to decide their course of action in the first scenario and 120 seconds in the second. Relatively higher levels of cue utilization are associated with opening the drop-down menus in order of importance rather than sequentially down the webpage (Crane et al., 2018).
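The scoring rule for FPT access sequences is not detailed here. One plausible way to quantify “in order of importance rather than sequentially down the webpage” is Spearman’s rank correlation between a participant’s access order and (a) a reference importance ranking and (b) the on-page order. The feature lists and the reference ranking below are hypothetical, and this is not presented as the EXPERTise 2.0 algorithm.

```python
def spearman_rho(order_a, order_b):
    """Spearman's rank correlation between two orderings of the same items (no ties).

    Each argument lists item names from first to last; list position is the rank.
    """
    if sorted(order_a) != sorted(order_b) or len(set(order_a)) != len(order_a):
        raise ValueError("orderings must contain the same distinct items")
    n = len(order_a)
    if n < 2:
        raise ValueError("need at least two items")
    rank_b = {item: i for i, item in enumerate(order_b)}
    d_squared = sum((i - rank_b[item]) ** 2 for i, item in enumerate(order_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical feature menus, listed in the order they appear on the page.
page_order = ["subject line", "knowledge of sender", "company logo", "hyperlink"]
# Hypothetical reference ranking of the same features, most important first.
reference_importance = ["knowledge of sender", "hyperlink", "subject line", "company logo"]
# One participant's access sequence.
accessed = ["knowledge of sender", "hyperlink", "subject line", "company logo"]
```

For this participant, the access sequence correlates perfectly with the reference ranking (rho = 1.0) and not at all with the page layout (rho = 0.0), the pattern associated with higher cue utilization.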

Naturalistic Phishing Simulation

After completing the first part of the research, participants were sent three simulated phishing emails (one per week) to their student email addresses. One email was blocked by spam filters and was excluded from the analysis. A small number of emails was used to reflect the scarcity of phishing emails that successfully bypass spam filters and reach student inboxes; this approach aimed to create a more authentic scenario, aligned with the actual frequency with which students encounter phishing attempts. The emails differed in sophistication (i.e., the number of phishing cues) and included persuasive elements (e.g., an urgent message and university logos) designed to mislead the recipient. There were three possible behavioral outcomes: disregarding the email, opening it without clicking the embedded URL, or clicking the embedded URL. Participants who clicked the URL were directed to a webpage with educational content on identifying phishing emails, as well as a description of the study.
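The three behavioral outcomes form an ordered scale of engagement. The sketch below codes hypothetical tracking events (`"open"`, `"click"`) for one participant and one email into that scale and tallies outcomes across participants; the event names and functions are assumptions, since the simulation platform’s logging format is not described.

```python
from collections import Counter
from enum import IntEnum


class Engagement(IntEnum):
    """Ordinal coding of the three behavioral outcomes (higher = more engagement)."""
    DISREGARDED = 0  # no open or click recorded
    OPENED = 1       # opened, but the embedded URL was not clicked
    CLICKED = 2      # embedded URL clicked


def code_engagement(events):
    """Map the tracking events logged for one participant and one email to an outcome.

    A click is coded as CLICKED even if no "open" event was logged, because
    open tracking can miss opens that a click shows must have happened.
    """
    if "click" in events:
        return Engagement.CLICKED
    if "open" in events:
        return Engagement.OPENED
    return Engagement.DISREGARDED


def tally(event_logs):
    """Count outcomes across participants for a single email."""
    return Counter(code_engagement(events) for events in event_logs)
```

Coding the outcome as a three-level category, rather than collapsing it to clicked versus not clicked, preserves the distinction between opening and clicking that a multinomial model can then use.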

Procedure

Participants accessed the online study via the advertised link on SONA, the online research participation system used at Macquarie University. The link led to a page on Qualtrics, an online research survey platform (Qualtrics, 2021), where participants read a Participant Information and Consent Form (PICF) stating that the study had been approved by the ethics board at Macquarie University. Participants were advised that the study concerned phishing email detection and cue identification. They were informed that they would receive course credit upon completing several tasks: an online survey, a series of email image evaluations, and a cue utilization task.

Participants answered demographic questions and the CRT-2 items (Thomson & Oppenheimer, 2016), followed by the phishing decision task on the Qualtrics platform (Qualtrics, 2021). They were then instructed to continue to the next task by clicking the arrow at the bottom of the page, which redirected them to the EXPERTise 2.0 platform. Upon completion, participants were redirected to the debriefing statement. Over the following weeks, three simulated phishing emails were sent from Macquarie University’s IT department to participants’ university email addresses.

Design and Statistical Analysis

The study employed a quasi-experimental, 2 × 2 between-subjects design to examine the effects of cognitive reflection (IV; high vs. low) and cue utilization (IV; high vs. low) on phishing email decision accuracy and response latency (DVs) in the phishing decision task, as well as on email engagement (DV) with the simulated phishing emails. A k-means cluster analysis was used to identify the cue utilization groups, and cognitive reflection scores were classified as either higher or lower (the calculation method is described in “Data Reduction and Preliminary Analysis”). Analysis of variance (ANOVA) and multinomial regression were used to test the hypotheses.

Results

Data Reduction and Preliminary Analysis

Cases were visually inspected for missing data, and those with substantially incomplete records were removed. A total of 94 participants were retained for the analyses, and the significance level was set at α = .05.

Participants’ responses on the CRT-2 were scored as correct (1) or incorrect (0) and summed to yield a total score out of 4. Participants were then divided into two groups: those who scored 0 to 2 were categorized as “lower” in cognitive reflection (n = 38), and those who scored 3 or 4 as “higher” (n = 56).
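The scoring and grouping rule can be written out as a short Python sketch. The answer key and responses below are hypothetical placeholders (the CRT-2 items themselves are not reproduced here); only the rule, one point per correct answer with totals of 0 to 2 classed as lower and 3 or 4 as higher, comes from the text.

```python
from typing import Sequence

# Hypothetical answer key: placeholders standing in for the four CRT-2 items.
ANSWER_KEY = ["a", "b", "c", "d"]


def score_crt2(responses: Sequence[str], answer_key: Sequence[str] = ANSWER_KEY) -> int:
    """Score 1 for each correct answer and 0 otherwise, giving a total out of 4."""
    if len(responses) != len(answer_key):
        raise ValueError("expected one response per CRT-2 item")
    return sum(1 for given, correct in zip(responses, answer_key) if given == correct)


def crt_group(total: int) -> str:
    """Totals of 0-2 are 'lower' cognitive reflection; totals of 3 or 4 are 'higher'."""
    if not 0 <= total <= 4:
        raise ValueError("CRT-2 totals range from 0 to 4")
    return "lower" if total <= 2 else "higher"


print(crt_group(score_crt2(["a", "b", "x", "d"])))  # three correct answers -> higher
print(crt_group(score_crt2(["a", "x", "x", "x"])))  # one correct answer -> lower
```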

As in previous research, cue utilization was established based on participants’ performance across the five EXPERTise 2.0 tasks. A k-means cluster analysis was performed on standardized scores from each task, forcing a two-cluster model (i.e., higher and lower cue utilization), consistent with previous approaches (Bayl-Smith et al., 2020; Brouwers et al., 2016; Sturman et al., 2021). The FIT, FRT, FAT, and FDT yielded statistically significant mean differences between the two groups. (Only one FAT, the sequential version, was included in the cluster analysis, as the two FAT tasks were strongly correlated, r = .943, p < .001.) Scores on the FPT did not show the expected direction of performance across the two groups and were therefore excluded from further analysis.
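A forced two-cluster k-means solution of this kind can be sketched in plain Python. This is a minimal illustration, not the authors’ procedure: the text does not state which software or seeding method was used, so this sketch assumes standard Lloyd iterations seeded with the two most distant participants, and its assignments may differ from other implementations. The task list, the sign convention (lower FIT latency treated as better, matching the negative FIT centroid for the higher cluster in Table 1), and the six participants’ scores are all hypothetical.

```python
import math
import statistics

TASKS = ["FIT_latency", "FAT", "FDT", "FRT"]  # hypothetical column order
DIRECTION = [-1, 1, 1, 1]  # assumed: lower FIT latency is better, higher is better elsewhere


def zscore_rows(columns):
    """Standardize each task column (sample SD) and return one row per participant."""
    z_cols = []
    for col in columns:
        mean, sd = statistics.fmean(col), statistics.stdev(col)
        z_cols.append([(x - mean) / sd for x in col])
    return [list(row) for row in zip(*z_cols)]


def kmeans_two(points, max_iter=100):
    """Lloyd's k-means with k = 2, seeded with the two most distant points."""
    n = len(points)
    i, j = max(((a, b) for a in range(n) for b in range(a + 1, n)),
               key=lambda ab: math.dist(points[ab[0]], points[ab[1]]))
    centroids = [points[i][:], points[j][:]]
    labels = None
    for _ in range(max_iter):
        new_labels = [min((0, 1), key=lambda k: math.dist(p, centroids[k])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the solution has converged
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [statistics.fmean(dim) for dim in zip(*members)]
    return labels, centroids


def higher_cluster(centroids):
    """Index of the cluster whose direction-adjusted centroid sum is larger."""
    scores = [sum(d * c for d, c in zip(DIRECTION, cen)) for cen in centroids]
    return scores.index(max(scores))


# Hypothetical raw scores for six participants, one list per task in TASKS order.
fit = [8.0, 7.5, 8.2, 4.1, 3.9, 4.5]
fat = [0.20, 0.30, 0.25, 0.90, 0.80, 0.85]
fdt = [0.10, 0.20, 0.15, 0.70, 0.75, 0.80]
frt = [0.40, 0.45, 0.50, 0.85, 0.90, 0.80]

labels, cents = kmeans_two(zscore_rows([fit, fat, fdt, frt]))
high = higher_cluster(cents)
print(["higher" if lab == high else "lower" for lab in labels])
```

Because k-means only returns arbitrary cluster indices, the sketch labels the “higher” cluster afterwards from the sign pattern of its centroid, which mirrors how Table 1 is read.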

Participants with higher cue utilization displayed a negative standardized mean for the FIT (i.e., faster feature identification) and positive standardized means for the FRT, FAT, and FDT. This pattern is consistent with past research and supported the prediction (H₁) that the phishing edition of the EXPERTise 2.0 battery would enable the identification of two participant clusters distinguishing higher from lower cue utilization in the phishing context (Brouwers et al., 2016; Loveday et al., 2013; Loveday et al., 2014). Overall, 57 participants formed a group whose behavior was consistent with a relatively “high” degree of cue utilization, and 37 participants formed a “low” cue utilization group. Table 1 presents the standardized cluster centroids for the four retained EXPERTise 2.0 tasks.

Table 1.

Standardized Means From EXPERTise Tasks: Centroid Values for the Four Retained EXPERTise 2.0 Task Clusters.

EXPERTise 2.0 task	Low cluster (n = 37)	High cluster (n = 57)
Feature identification task (response latency)	.094	−.233
Feature association task (variance/time)	−.437	.072
Feature detection task (variance)	−.688	.371
Feature recognition task (accuracy)	−.720	.486

Note. The F test differences between clusters were statistically significant (p < .05).

Participants’ responses on the phishing decision task were separated into “true positives”, the correct detection of each of the 20 phishing emails (1 = correct, 0 = incorrect), and “false positives”, the incorrect identification of phishing among the 20 genuine emails (1 = false detection, 0 = no detection). Response latency (i.e., the time taken to diagnose an email as trustworthy or suspicious) was recorded in seconds (s) and averaged across emails to yield a mean latency score. Participants’ raw scores for cue utilization, total CRT-2, and the phishing task (i.e., true positives, false positives, and response latency) were transformed into standardized z-scores; no participants had extreme scores, with all absolute z-scores below 3.29 (Osborne & Overbay, 2004; Tabachnick & Fidell, 2007). Participants’ engagement with each of the two simulated phishing emails was classified into one of three behavioral outcomes, each assigned a numerical value: disregarding the email (0), opening the email without clicking the embedded URL (1), and clicking the embedded URL (2).
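The outlier screen and the engagement coding can be sketched as follows. This assumes z-scores are computed with the sample standard deviation (the text does not say which), and the counts and outcomes are hypothetical. Note that with sample SD the largest attainable |z| in a sample of n cases is (n − 1)/√n, so a 3.29 cutoff can only be reached with at least 13 cases; the example therefore uses 20.

```python
import statistics

Z_CUTOFF = 3.29  # absolute z-score threshold cited in the text

# Numerical codes for the three behavioral outcomes in the phishing simulation.
ENGAGEMENT_CODES = {"disregarded": 0, "opened": 1, "clicked": 2}


def zscores(values):
    """Standardize values using the sample mean and sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def extreme_cases(values, cutoff=Z_CUTOFF):
    """Return the indices of cases whose absolute z-score reaches the cutoff."""
    return [i for i, z in enumerate(zscores(values)) if abs(z) >= cutoff]


# Hypothetical false positive counts: nineteen typical cases and one implausible value.
false_positives = [5] * 19 + [40]
print(extreme_cases(false_positives))  # -> [19]

# Hypothetical outcomes for one email, converted to the 0/1/2 coding.
outcomes = ["opened", "clicked", "disregarded", "opened"]
print([ENGAGEMENT_CODES[o] for o in outcomes])  # -> [1, 2, 0, 1]
```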

Main Analysis

The phishing decision task scores (i.e., true and false positives) and response latency scores (N = 94) were examined using a series of 2 × 2 between-groups factorial analyses of variance (ANOVAs). All scores were checked for violations of normality and homogeneity of variance across groups. Effect sizes (partial η²) were interpreted following Cohen (1988).
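Each partial η² in the analyses that follow can be recovered from its F ratio and degrees of freedom, because partial η² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity and the benchmarks commonly attributed to Cohen (1988), namely .01 small, .06 medium, and .14 large; the function names are illustrative.

```python
def partial_eta_squared(f_ratio, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_ratio * df_effect) / (f_ratio * df_effect + df_error)


def benchmark(eta_p2):
    """Commonly cited benchmarks: .01 small, .06 medium, .14 large."""
    if eta_p2 >= 0.14:
        return "large"
    if eta_p2 >= 0.06:
        return "medium"
    if eta_p2 >= 0.01:
        return "small"
    return "below small"


# F ratios reported in this section, each with df = (1, 90).
reported_f = {
    "cognitive reflection, false positives": 4.88,
    "cognitive reflection, response latency": 9.68,
    "interaction, response latency": 5.00,
}
for effect, f in reported_f.items():
    eta = partial_eta_squared(f, 1, 90)
    print(f"{effect}: partial eta^2 = {eta:.3f} ({benchmark(eta)})")
```

With df = (1, 90), these reproduce the reported values of .051 (small), .097 (medium), and .053 (small).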

Cue Utilization, Cognitive Reflection, and Phishing Decision Task Accuracy

For true positive scores, results revealed no significant main effect of either cue utilization, F (1,90) = 3.55, p = .063, partial η² = .038, obs. power = 0.461, or cognitive reflection, F (1,90) = .180, p = .673, partial η² = .002, obs. power = 0.07. Nor was there a main effect of cue utilization on false positive scores, F (1,90) = 1.08, p = .301, partial η² = .012, obs. power = .178. Contrary to expectations (H₂), participants with higher levels of cue utilization were therefore no more accurate in detecting phishing (i.e., true positives or false positives) than those with lower levels of cue utilization on the phishing decision task.

Results revealed a statistically significant main effect of cognitive reflection on false positive scores, F (1,90) = 4.88, p = .030, partial η² = .051 (small effect), obs. power = .589. Participants with a lower tendency for cognitive reflection (n = 38) incorrectly diagnosed genuine emails as phishing more often (M = 6.50, SE = .43) than participants with higher cognitive reflection (n = 56; M = 5.32, SE = .32). This partially supported the expectation that participants with a greater tendency for cognitive reflection would demonstrate greater accuracy on the phishing decision task (H₃a; see Figure 1).

Figure 1.

Mean true positive and false positive scores across cognitive reflection conditions. Note. Error bars represent standard errors (±1 SE).

The findings revealed no interaction between cue utilization and cognitive reflection for either true positive scores, F (1,90) = .225, p = .637, partial η² = .002, obs. power = 0.08, or false positive scores, F (1,90) = .827, p = .365, partial η² = .009, obs. power = 0.15. Therefore, mean phishing decision accuracy in the two cue utilization clusters (i.e., higher or lower) was not contingent on participants’ tendency to engage in cognitive reflection, and vice versa (see Figures 2 and 3).

Figure 2.

Interaction of mean true positive scores across conditions. Note. Error bars represent standard errors (±1 SE).

Figure 3.

Interaction of mean false positive scores across conditions. Note. Error bars represent standard errors (±1 SE).

Cue Utilization, Cognitive Reflection, and Phishing Decision Task Response Latency

A between-subjects ANOVA was conducted to test the effect of cue utilization and cognitive reflection groupings on participants’ response latencies in the phishing decision task. There was no main effect of cue utilization, F (1,90) = .725, p = .397, partial η² = .008, obs. power = 0.13. However, results revealed a significant main effect of cognitive reflection, F (1,90) = 9.68, p = .002, partial η² = .097 (moderate effect), obs. power = 0.87, with participants higher in cognitive reflection taking more time to respond (H₃b). There was also a significant interaction between cognitive reflection and cue utilization, F (1,90) = 5.00, p = .028, partial η² = .053 (small effect), obs. power = .60. The interaction is shown in Figure 4.

Figure 4.

Interaction of mean response latency across conditions. Note. Error bars represent standard errors (±1 SE).

Four simple effects tests were conducted to further analyze the interaction, using a Bonferroni-adjusted alpha of .0125 to maintain the familywise error rate at .05 (Field, 2013). The simple effect of cue utilization was not statistically significant for participants in the high cognitive reflection group, F (1,90) = 1.11, p = .294, partial η² = .012, nor was the simple effect of cognitive reflection for participants in the high cue utilization group, F (1,90) = .461, p = .499, partial η² = .005. For participants in the low cognitive reflection group, the simple effect of cue utilization was significant at the conventional .05 level but not at the Bonferroni-adjusted level, F (1,90) = 4.18, p = .044, partial η² = .044 (small effect), with longer average response latencies for those with higher cue utilization (M = 10.17, SE = .77) than for those with lower cue utilization (M = 7.88, SE = .81). For participants in the low cue utilization group, the simple effect of cognitive reflection was statistically significant, F (1,90) = 12.24, p = .001, partial η² = .120 (moderate effect), with longer average response latencies for those with higher cognitive reflection (M = 11.84, SE = .79) than for those with lower cognitive reflection (M = 7.88, SE = .81).
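The Bonferroni adjustment amounts to dividing the familywise alpha by the number of tests and comparing each p-value with the result. The sketch below does this for the four reported p-values; the pairing of each p-value with a factor and group follows the order in which the text reports the simple effects, and the effect labels are descriptive only.

```python
FAMILYWISE_ALPHA = 0.05
N_SIMPLE_EFFECTS = 4
adjusted_alpha = FAMILYWISE_ALPHA / N_SIMPLE_EFFECTS  # Bonferroni: .05 / 4 = .0125

# p-values of the four simple effects, as reported in the text.
simple_effects = {
    "cue utilization within high cognitive reflection": 0.294,
    "cognitive reflection within high cue utilization": 0.499,
    "cue utilization within low cognitive reflection": 0.044,
    "cognitive reflection within low cue utilization": 0.001,
}

for effect, p in simple_effects.items():
    verdict = "significant" if p < adjusted_alpha else "not significant"
    print(f"{effect}: p = {p:.3f}, {verdict} at alpha = {adjusted_alpha:.4f}")
```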

Overall, participants with higher levels of cognitive reflection, cue utilization, or both responded at similar speeds. However, higher cue utilization appeared, unexpectedly, to slow responses among participants in the lower cognitive reflection group, while, as predicted, higher cognitive reflection appeared to increase response latency among participants with lower cue utilization during the phishing decision task. Participants lower in both cognitive reflection and cue utilization demonstrated the shortest response latencies.

Cue Utilization, Cognitive Reflection, and Phishing Simulation Engagement

A multinomial regression was performed to test whether cue utilization and/or cognitive reflection could predict participants’ engagement with a simulated phishing email in a naturalistic setting, as a counterpart to the controlled phishing decision task. Email engagement was measured across three actions (i.e., not opening the email, opening the email, or clicking the embedded link). Descriptive statistics are provided in Table 2.

Table 2.

Descriptive Statistics for Multinomial Regression Across Naturalistic Phishing Simulations.

Email	Engagement	n	Percentage (%)
Fewer phishing cues	Not opened	15	17.2
	Opened email	56	64.4
	Link clicked	16	18.4
Greater number of phishing cues	Not opened	28	32.2
	Opened email	47	54.0
	Link clicked	12	13.8

Results revealed that, for both phishing emails (i.e., those with relatively greater or fewer phishing cues), the relationship between email engagement and cue utilization was not statistically significant, χ² (2) = 2.45, p = .293 and χ² (2) = 1.26, p = .533. That is, cue utilization did not predict participants’ engagement with the simulated phishing emails (H₄). Likewise, there was no statistically significant relationship between email engagement and cognitive reflection for either phishing email, χ² (2) = .76, p = .685 and χ² (2) = 3.38, p = .185 (H₅).
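Assuming each predictor entered the multinomial model as a single two-level grouping variable, its likelihood-ratio test compares models that differ by (3 − 1) × 1 = 2 parameters, hence the 2 degrees of freedom. For 2 df the chi-square upper tail has the closed form exp(−χ²/2), so the reported p-values can be checked without a statistics library. The sketch below does so; the email labels follow the order in which the statistics are reported, and the small differences from the reported p-values most likely reflect rounding of the χ² values.

```python
import math


def chi2_upper_tail_df2(statistic):
    """P(X >= statistic) for a chi-square variable with 2 df, which equals exp(-x/2)."""
    return math.exp(-statistic / 2.0)


# Likelihood-ratio statistics and p-values, in the order given in the text.
reported = [
    ("cue utilization, first reported email", 2.45, 0.293),
    ("cue utilization, second reported email", 1.26, 0.533),
    ("cognitive reflection, first reported email", 0.76, 0.685),
    ("cognitive reflection, second reported email", 3.38, 0.185),
]

for label, chi2, p_reported in reported:
    p = chi2_upper_tail_df2(chi2)
    print(f"{label}: chi2(2) = {chi2:.2f}, p = {p:.3f} (reported {p_reported:.3f})")
```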

A chi-square test of independence was used to examine whether participants’ email engagement differed as a function of the number of phishing cues embedded within the emails. There was no significant relationship between the type of email interaction and the type of email viewed (i.e., greater or fewer cues), χ² (2, N = 174) = 5.29, p = .071, Cramér’s V = .17; that is, the proportions of the different types of email interaction did not differ depending on the number of cues present in the email. As such, engagement with the naturalistic phishing simulation was not contingent on the number of phishing cues, nor on the interaction of cue utilization and cognitive reflection groupings (RQ₁ and RQ₂).
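The counts in Table 2 are sufficient to recompute this test. The sketch below computes the Pearson chi-square statistic, its p-value (using the closed form exp(−χ²/2), which holds only because this 2 × 3 table has 2 df), and Cramér’s V. Like the reported N = 174, it treats each participant-by-email response as a separate observation.

```python
import math

# Counts from Table 2. Rows: email with fewer cues, email with more cues.
# Columns: not opened, opened, link clicked.
observed = [
    [15, 56, 16],
    [28, 47, 12],
]


def chi_square_independence(table):
    """Pearson chi-square statistic, degrees of freedom, and N for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (count - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, n


stat, df, n = chi_square_independence(observed)
if df != 2:
    raise ValueError("the closed-form p-value below only holds for df = 2")
p_value = math.exp(-stat / 2.0)  # upper tail of a chi-square with 2 df
k = min(len(observed), len(observed[0]))
cramers_v = math.sqrt(stat / (n * (k - 1)))
print(f"chi2({df}, N = {n}) = {stat:.2f}, p = {p_value:.3f}, Cramer's V = {cramers_v:.2f}")
# -> chi2(2, N = 174) = 5.29, p = 0.071, Cramer's V = 0.17
```

The recomputed values match those reported in the text.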

Correlations Between Phishing Decision Task and Simulated Phishing Emails

To further investigate the non-significant results, Pearson’s correlations were computed between participants’ performance on the phishing decision task (i.e., true and false positive scores and response latency scores) and their engagement with the simulated phishing emails (i.e., opening an email or clicking an embedded link). None of the correlations between the phishing decision task components and participants’ engagement in the simulated phishing campaign was statistically significant.
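A Pearson correlation between a task score and the 0/1/2 engagement code, together with the t statistic (df = n − 2) used to test it, can be sketched as below. The data are hypothetical, and converting t to a p-value would additionally require a t distribution function (for example from SciPy), which this stdlib-only sketch omits.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (n >= 2)")
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value (df = n - 2) for testing whether r differs from zero; undefined at |r| = 1."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical data: false positive counts and 0/1/2 engagement codes for one email.
false_positives = [6, 4, 7, 5, 8, 3, 6, 5]
engagement = [1, 0, 2, 1, 1, 0, 2, 1]
r = pearson_r(false_positives, engagement)
print(f"r = {r:.2f}, t({len(engagement) - 2}) = {t_statistic(r, len(engagement)):.2f}")
```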

Discussion

The study aimed to test 1) the role of cue utilization and cognitive reflection tendencies in email users’ phishing email diagnostic capabilities and 2) whether any differences exist when comparing these capabilities in controlled versus naturalistic settings. This was examined using a phishing decision task and a naturalistic phishing simulation. In the former, an online phishing decision task presented 40 emails, and participants decided whether each email was phishing or genuine; their accuracy and response latency were recorded. In the latter, email users’ diagnostic capabilities were measured in a naturalistic setting using two simulated phishing emails that differed in their number of phishing cues (greater or fewer). These emails were sent to participants’ university email addresses to assess their level of engagement with suspicious emails. Table 3 lists the study’s hypotheses and research questions and is followed by a discussion of the key findings.

Table 3.

The Study’s Hypotheses and Research Questions.

Hypothesis/Question number	Statement/Question
H₁	The cyber security edition of the EXPERTise 2.0 battery will enable identification of two “clusters” of participants, with one group demonstrating relatively higher cue utilization than the other group (i.e., lower cue utilization)
H₂	Participants with higher levels of cue utilization will demonstrate a) greater accuracy on the phishing decision task (i.e., higher rates of true positives and lower rates of false positives) and b) shorter response latencies in making their judgments, compared to those with lower levels of cue utilization
H₃	Participants who demonstrate a greater tendency to engage in cognitive reflection will demonstrate a) greater accuracy on the phishing decision task (i.e., higher rates of true positives and lower rates of false positives) and b) longer response latencies in making their judgments, compared to those participants who are less inclined to engage in cognitive reflection
H₄	Cue utilization groupings will be predictive of participants’ engagement with a naturalistic phishing simulation, whereby participants with lower cue utilization will demonstrate greater engagement with phishing emails (i.e., opening an email or clicking on an embedded link) compared to those with higher cue utilization
H₅	Cognitive reflection groupings will be predictive of participants’ engagement with a naturalistic phishing simulation, whereby participants with lower cognitive reflection will demonstrate greater engagement with phishing emails (i.e., opening an email or clicking on an embedded link) compared to those with higher cognitive reflection
RQ₁	Do any differences in engagement with a naturalistic phishing simulation (H₄ and H₅) relate to the number of phishing cues embedded within the phishing emails (i.e., are emails with fewer phishing cues less likely to be engaged with than those with a greater number of phishing cues)?
RQ₂	Are any differences based on phishing cue numbers contingent on cue utilization or cognitive reflection groupings?

Key findings

Cue Utilization, Cognitive Reflection, and Phishing Decision Task

Hypothesis 1. The cyber security edition of the EXPERTise 2.0 battery enabled identification of two participant clusters, with one group demonstrating relatively higher cue utilization than the other group (i.e., lower cue utilization). This result supports H₁ and suggests that the test battery can detect genuine differences in recognizing, identifying, associating, and discriminating cues in a phishing context.

Practical implications for the EXPERTise 2.0 battery may be gleaned from its ability to distinguish differences in the broader population and therefore its potential use as a diagnostic tool for identifying employee training needs, as well as a tool for measuring skill acquisition following relevant interventions (Morrison et al., 2018). This result is consistent with recent literature demonstrating that the software can differentiate cue utilization abilities in several operational contexts, including phishing detection (Ackerley et al., 2022), radiology (Carrigan et al., 2021), aviation (Renshaw & Wiggins, 2017; Wiggins et al., 2018), and electricity distribution (Wiggins et al., 2020).

Hypothesis 2. Contrary to expectations, participants with higher levels of cue utilization did not demonstrate greater accuracy on the phishing decision task (H₂a), nor shorter response latencies in making their judgments (H₂b), when compared to those with lower levels of cue utilization. While these results may indicate that the relationship between cue utilization and task performance proposed here, and observed in other domains, does not translate to a cyber security context, several factors should be considered. A post-hoc statistical power analysis revealed that the final sample size, although sufficient to detect a large effect (d = .80) in the current design, was insufficiently powered to detect a small (d = .20) or even medium (d = .50) effect. A larger sample may have revealed modest effects, such as the difference in accuracy between cue utilization groups (p = .063).
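The pattern described above can be illustrated with a rough calculation. This sketch is not the authors’ post-hoc analysis, whose software and settings are not reported; it uses a normal approximation to a two-sided, two-group comparison at α = .05 between the two cue utilization clusters (n = 37 and n = 57), ignoring the 2 × 2 structure of the design.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def approx_power_two_groups(d, n1, n2, z_crit=1.959964):
    """Normal-approximation power of a two-sided, two-group test at alpha = .05."""
    delta = d * math.sqrt(n1 * n2 / (n1 + n2))  # expected standardized test statistic
    return normal_cdf(delta - z_crit) + normal_cdf(-delta - z_crit)


# Cluster sizes from the cue utilization analysis: 37 lower, 57 higher.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d:.1f}: approximate power = {approx_power_two_groups(d, 37, 57):.2f}")
```

Under these assumptions, power is roughly .16, .66, and .97 for small, medium, and large effects, in line with the conclusion that only a large effect was reliably detectable at the conventional .80 threshold.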

The results may also have been limited by experimental design choices. Cue utilization was clustered into higher and lower groupings, and this division may have been too coarse to detect significant differences. Future studies may examine more granular differences in cue utilization expertise (e.g., novice, beginner, and intermediate). Potential ceiling effects are also noted, given the high average accuracy in the phishing decision task. Participants were explicitly instructed to identify emails as either genuine or phishing, which could have heightened their attention to phishing cues and introduced experimental artefacts such as expectation effects.

Lastly, the non-significant result may also be attributed to the poverty of cues within a phishing email compared with other operational domains (e.g., firefighting, driving, and flight simulation). The limited visual markers may hinder participants’ skill acquisition and the development of cue expertise, which requires sufficient detail to form a mental model representative of the complex task (French & Nevett, 1993; Harré et al., 2012; Klein & Klinger, 1991). This is especially evident in spear phishing emails, which are designed to appear authentic and personalized, so that phishing cues may be less apparent. Therefore, cue utilization may be less valuable within the cyber security domain than in other domains (Benenson et al., 2017; Lin et al., 2019).

Hypothesis 3. H₃ received mixed support. Participants with higher cognitive reflection demonstrated lower rates of false positives, but not higher rates of true positives, on the phishing decision task (H₃a), and took more time to provide a response (H₃b), compared with those less inclined to engage in cognitive reflection. This is consistent with previous findings suggesting that a vital factor in accurate phishing decisions (here, lower rates of false positives) may be an individual’s inclination to examine email features deeply when considering an email’s authenticity (Harrison et al., 2016; Luo et al., 2013; Vishwanath et al., 2011, 2016). This result is further supported by the finding that participants with higher cognitive reflection and lower cue utilization took the most time to respond, whereas participants low in both cue utilization and cognitive reflection demonstrated the shortest response latencies, implying limited consideration of the email stimuli during the phishing decision task.

Speculatively, participants less inclined towards reflective processing may have assumed that phishing emails were presented at a higher rate than was actually the case. Truth-Default Theory (TDT) states that people, on average, tend to trust others; because the phishing decision task alerted participants to the presence of phishing emails, this may have created demand characteristics that disproportionately shifted the default decision-making of participants who generally engage less in systematic interrogation, leading them to label more genuine emails as phishing (Levine, 2014).

Organizations may benefit from creating procedures that help staff carve out dedicated time for focused email processing, and from investing in education and training programs to mitigate cyber security risks, especially for individuals who are less inclined to engage in cognitive reflection. Importantly, training and education should be delivered frequently, as some studies note that behavioral changes are short-lived because employees continue to rely on heuristics and reinforce co-processing habits in demanding environments (Canova et al., 2014; Vishwanath, 2015).

Cue Utilization, Cognitive Reflection and Naturalistic Phishing Simulation

Hypotheses 4 and 5. Participants with lower cue utilization or lower cognitive reflection tendencies did not demonstrate greater engagement with a naturalistic phishing simulation (i.e., opening an email or clicking on an embedded link) when compared to those in the respective higher groupings, failing to support both H₄ and H₅. While these results may imply that the theoretical relationships between cue utilization, cognitive reflection tendencies, and phishing decisions do not extend to naturalistic settings, other factors may have contributed to these findings. The phishing simulation measured false negative responses only (i.e., failures to detect the phishing email), and these did not differ significantly across any experimental condition. In contrast, one novel aspect of this study was the separation of true and false positive response rates in the phishing decision task, and the latter did yield a significant finding (i.e., lower cognitive reflection was associated with higher rates of false positives).

The authors also acknowledge the idiosyncratic nature of how people manage their personal inboxes, and the difficulty of measuring the confounds and complexities associated with the real world. Previous studies examining similar relationships note that many confounds influence how people respond to unsolicited communications, including contextual expectations (Harré et al., 2012; Vishwanath, 2015), authority and urgency cues (Williams et al., 2018), age and sex (Lin et al., 2019; Sheng et al., 2010), and users’ propensity for curiosity, risk, and general Internet usage (Moody et al., 2017). Further, motivational variables may play a role (e.g., organizational commitment and job satisfaction), and these would vary across organizations (Cooke et al., 2004).

Research Questions 1 and 2. These factors may also explain the finding that engagement did not differ between the two simulated phishing emails despite their different numbers of embedded phishing cues (RQ₁ and RQ₂). It is important to note that experimental artifacts cannot typically be replicated in naturalistic settings, and future studies should place more emphasis on naturalistic experimentation. This is underlined by the lack of correlation between the phishing decision task components (i.e., true and false positive scores, and response latency scores) and participants’ engagement in the phishing simulation. It is recommended that organizations invest in technological infrastructure and security software (e.g., email warning banners and filtering systems) to mitigate the risk of cyber security attacks, especially in time-constrained or high-demand environments (Herzberg, 2009; Moody et al., 2017; Moore, 2021).

Limitations

One limitation of this study was the size of the final sample and its homogeneous demographics (i.e., first-year university students). A larger sample may have yielded significant relationships between cue utilization, cognitive reflection, and phishing decision task performance that were not detected in the current study. Alternatively, future studies may wish to cluster cue utilization into more specific levels of expertise (e.g., novice, beginner, and intermediate) to reveal these relationships in greater detail.

Another limitation is that most participants performed well in the phishing decision task, and the resulting ceiling effects likely contributed to the non-significant results. This may have been a result of the 50:50 ratio of legitimate to phishing emails, which disproportionately increased the likelihood of detection. Previous studies suggest that sufficient task exposure may progressively improve participant performance (Nasser et al., 2020), and participants have also been shown to anticipate a 50:50 ratio in experimental tasks and to adjust their responses accordingly, whereas real-life phishing emails occur far less frequently (ACSC, 2021; Canfield et al., 2016).

A further potential limitation is that the phishing emails were sent within six weeks of completing the phishing cue and performance assessments. It is possible that these measures sensitized participants to phishing cues, thereby enhancing their detection capabilities during the naturalistic simulation. Additionally, participants had been informed that they would be sent phishing emails, which may have made them more vigilant and conscious of the impending emails. These factors could have influenced the results, potentially leading to an overestimation of participants’ ability to detect phishing attempts. Future research should consider extending the time interval between measures and using limited disclosure to reduce the likelihood of priming effects and to better simulate real-world conditions.

A final consideration is the approach to measuring email engagement in the naturalistic condition. Participants who did not open the email generally had fewer phishing cues to assess, relying only on “pre-opening” cues such as sender information or subject line. In contrast, participants who opened the email but did not click on the malicious link had access to additional cues within the email body. Although these individuals were better performers than those who clicked on the link, they were still vulnerable to the risk of malware installation from simply opening the email. This approach provides a staged view of user performance. However, future research could benefit from further investigating the differentiation between cues visible before and after opening the email. Such an approach would offer a more comprehensive evaluation of participants’ phishing detection abilities across various stages of email interaction.

Conclusion and future directions

To the authors’ current knowledge, this study is the first to investigate cue utilization (using the EXPERTise 2.0 phishing decision battery) and decision-making using a naturalistic phishing simulation. The study’s novelty is further extended by the division of phishing decision accuracy into true positive and false positive response rates and by its examination of a potential interaction between cue utilization and cognitive reflection. Results revealed that participants with relatively lower cognitive reflection were more likely to misdiagnose genuine emails as phishing emails, implicating cognitive reflection tendencies as an important information processing mechanism in phishing-related decision-making. Furthermore, responses were slowest for participants with both greater cognitive reflection tendencies and lower levels of cue utilization, compared to other groups. This suggests that cognitive reflection tendencies may influence deliberation time and provide a greater opportunity for deeper email engagement among users with relatively lower cue utilization skills.

Although cognitive reflection is considered a stable personality trait, future research should investigate interventions that might encourage greater focus and attention, such as mindfulness, which has shown positive outcomes in the workplace (Althammer et al., 2021; Creswell, 2017; Dobie et al., 2016). Aspects of system design and software (e.g., warning banners) could also be leveraged to discourage concurrent task or information processing and to encourage systematic consideration of emails, supporting workers in highly complex and demanding environments.

Further, future studies should investigate cyber security risks in naturalistic contexts by using simulated phishing trials across various demographics and settings. The limited significant results from the current study’s phishing simulation highlight the knowledge gap between experimental settings and naturalistic confounds, and reflect both the scarcity of scientific understanding and the increasing vulnerability of email users. Additionally, usability testing using the naturalistic decision paradigm across a wide demographic could map commonalities in email usage and thereby advance knowledge of the confounds and complexities in phishing email detection. Overall, the findings of this study have implications for training and educational approaches that encourage email users to engage in systematic and effortful decision-making, and for software design interventions that serve as both a technological barrier and a warning system for vulnerable workers.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Ben W. Morrison

Mark W. Wiggins

Dr. Ben Morrison is a Senior Lecturer and Organisational Psychologist at Macquarie University, Australia. Ben’s research investigates how new and emerging technologies can aid human-centered processes like decision making, and how human capabilities, limitations, and preferences can inform the design of effective systems. Ben is particularly interested in the nexus between human expertise and advanced technologies such as artificial intelligence.

Ms. Emmilly Graf is a psychologist and organizational research analyst. Emmilly has over 10 years of experience in the Tertiary Higher Education sector and in conducting Human-Centred research focusing on Psychosocial Risks & Hazards, Work Design, Leadership Practices, and Systems Alignment. She draws upon evidence-based frameworks and data insights to support teams, leaders and organizations to improve employee wellbeing and performance.

Dr. Piers Bayl-Smith has investigated the role of social engineering and psychological profiling in the context of phishing and cybersecurity within organizations. He is also interested in research that involves work, aging and discrimination.

Mark Wiggins is Professor of Organisational Psychology at Macquarie University. Mark’s research and teaching interests lie in the assessment and development of expert performance, particularly in the context of cognitive skills such as diagnosis, sensemaking, and situation assessment. Together with his students, he developed the EXPERT Intensive Skills Evaluation (EXPERTise 2.0) software which has been used to assess the diagnostic skills of practitioners across a range of contexts, including electricity power transmission and distribution, medical practice, allied health, motor vehicle driving, and aviation piloting.

References

Ackerley, Morrison, Ingrey, Wiggins, Bayl-Smith, & Morrison (2022). Errors, irregularities, and misdirection: Cue utilisation and cognitive reflection in the diagnosis of phishing emails. Australasian Journal of Information Systems, 26. https://doi.org/10.3127/ajis.v26i0.3615

Althammer, S. E., Reis, van der Beek, Beck, & Michel (2021). A mindfulness intervention promoting work–life balance: How segmentation preference affects changes in detachment, well-being, and work–life balance. Journal of Occupational and Organizational Psychology, 94(2), 282–308. https://doi.org/10.1111/joop.12346

Baron, Scott, Fincher, K. S., & Emlen Metz (2015). Why does the Cognitive Reflection Test (sometimes) predict utilitarian moral judgment (and other things)? Journal of Applied Research in Memory and Cognition, 4(3), 265–284. https://doi.org/10.1016/j.jarmac.2014.09.003

Baylor, A. L. (2001). A U-shaped model for the development of intuition by level of expertise. New Ideas in Psychology, 19(3), 237–244. https://doi.org/10.1016/S0732-118X(01)00005-8

Bayl-Smith, Sturman, & Wiggins (2020). Cue utilization, phishing feature and phishing email detection. In Bernhard (Ed.), Financial cryptography and data security. FC 2020. Lecture notes in computer science. Springer. https://doi.org/10.1007/978-3-030-54455-3_5

Benenson, Gassmann, & Landwirth (2017). Unpacking spear phishing susceptibility. In Brenner (Ed.), Financial cryptography and data security. FC 2017. Lecture notes in computer science. Springer. https://doi.org/10.1007/978-3-319-70278-0_39

Brouwers, Wiggins, Helton, O'Hare, & Griffin (2016). Cue utilization and cognitive load in novel task performance. Frontiers in Psychology, 7, Article 435. https://doi.org/10.3389/fpsyg.2016.00435

Canfield, C. I., Fischhoff, & Davis (2016). Quantifying phishing susceptibility for detection and behavior decisions. Human Factors: The Journal of the Human Factors and Ergonomics Society, 58(8), 1158–1172. https://doi.org/10.1177/0018720816665025

Canova, Volkamer, Bergmann, & Borza (2014). NoPhish: An anti-phishing education app. In Mauw & Jensen, C. D. (Eds.), Security and trust management. Lecture notes in computer science. Springer. https://doi.org/10.1007/978-3-319-11851-2_14

Carrigan, A. J., Charlton, Foucar, Wiggins, M. W., Georgiou, Palmeri, T. J., & Curby, K. M. (2021). The role of cue-based strategies in skilled diagnosis among pathologists. Human Factors, 64(7), 1154–1167. https://doi.org/10.1177/0018720821990160

Chandler, Mueller, & Paolacci (2014). Nonnaivete among Amazon Mechanical Turk workers: Consequences and solutions for behavioral researchers. Behavior Research Methods, 46(1), 112–130. https://doi.org/10.3758/s13428-013-0365-7

Chauvin & Lardjane (2008). Decision making and strategies in an interaction situation: Collision avoidance at sea. Transportation Research Part F: Traffic Psychology and Behaviour, 11(4), 259–269. https://doi.org/10.1016/j.trf.2008.01.001

Cheng & Janssen (2019). The relationship between an alternative form of cognitive reflection test and intertemporal choice. Studia Psychologica, 61(2), 86–98. https://doi.org/10.21909/sp.2019.02.774

Cohen, M. A. (1988). Some new evidence on the seriousness of crime. Criminology, 26(2), 343–353. https://doi.org/10.1111/j.1745-9125.1988.tb00845.x

Cooke, N. J., Salas, Kiekel, P. A., & Bell (2004). Advances in measuring team cognition. In Salas & Fiore, S. M. (Eds.), Team cognition: Understanding the factors that drive process and performance (pp. 83–106). American Psychological Association.

Crane, M. F., Brouwers, Wiggins, Loveday, Forrest, Tan, S. G. M., & Cyna, A. M. (2018). Experience isn’t everything: How emotion affects the relationship between experience and cue utilization. Human Factors, 60(5), 685–698. https://doi.org/10.1177/0018720818765800

Creswell, J. D. (2017). Mindfulness interventions. Annual Review of Psychology, 68(1), 491–516. https://doi.org/10.1146/annurev-psych-042716-051139

Dobie, Tucker, Ferrari, & Rogers, J. M. (2016). Preliminary evaluation of a brief mindfulness-based stress reduction intervention for mental health professionals. Sage Publications.

Duff, Beglinger, L. J., Schultz, S. K., Moser, D. J., McCaffrey, R. J., Haase, R. F., Westervelt, H. J. K., Langbehn, D. R., Paulsen, J. S., & Huntington's Study Group (2007). Practice effects in the prediction of long-term cognitive outcome in three patient samples: A novel prognostic index. Archives of Clinical Neuropsychology, 22(1), 15–24. https://doi.org/10.1016/j.acn.2006.08.013

Dumm, Eckles, D. L., Nyce, & Volkman-Wise (2020). The representative heuristic and catastrophe-related risk behaviors. Journal of Risk and Uncertainty, 60(2), 157–185. https://doi.org/10.1007/s11166-020-09324-7

Endsley, M. R. (1995). Toward a theory of situation awareness in dynamic systems. Human Factors, 37(1), 32–64. https://doi.org/10.1518/001872095779049543

Evans, J. S. B. (2008). Dual-processing accounts of reasoning, judgment, and social cognition. Annual Review of Psychology, 59(1), 255–278. https://doi.org/10.1146/annurev.psych.59.103006.093629

Field (2013). Discovering statistics using IBM SPSS statistics. Sage Publications Ltd.

Frederick (2005). Cognitive reflection and decision making. The Journal of Economic Perspectives, 19(4), 25–42. https://doi.org/10.1257/089533005775196732

French, K. E., & Nevett, M. E. (1993). The development of expertise in youth sport. In Starkes, J. L., & Allard (Eds.), Advances in psychology (pp. 255–270). North-Holland. https://doi.org/10.1016/S0166-4115(08)61475-2

Gacasan, E. M. P., & Wiggins, M. W. (2017). Sensemaking through cue utilisation in disaster recovery project management. International Journal of Project Management, 35(5), 818–826. https://doi.org/10.1016/j.ijproman.2016.09.009

Gigerenzer & Gaissmaier (2011). Heuristic decision making. Annual Review of Psychology, 62(1), 451–482. https://doi.org/10.1146/annurev-psych-120709-145346

Harré, Bossomaier, & Snyder (2012). The perceptual cues that reshape expert reasoning. Scientific Reports, 2, Article 502. https://doi.org/10.1038/srep00502

Harrison, Svetieva, & Vishwanath (2016). Individual processing of phishing emails: How attention and elaboration protect against phishing. Online Information Review, 40(2), 265–281. https://doi.org/10.1108/oir-04-2015-0106

Herzberg (2009). Why Johnny can't surf (safely)? Attacks and defenses for web users. Computers & Security, 28(1–2), 63–71. https://doi.org/10.1016/j.cose.2008.09.007

Isler, Yilmaz, & Dogruyol (2020). Activating reflective thinking with decision justification and debiasing training. Judgment and Decision Making, 15(6), 926–938. https://doi.org/10.1017/s1930297500008147

Jones, H. S., Towse, J. N., Race, & Harrison (2019). Email fraud: The search for psychological predictors of susceptibility. PLoS One, 14(1), Article e0209684. https://doi.org/10.1371/journal.pone.0209684

Kahneman & Egan (2011). Thinking, fast and slow. Farrar, Straus and Giroux.

Kahneman & Klein (2009). Conditions for intuitive expertise: A failure to disagree. American Psychologist, 64(6), 515–526. https://doi.org/10.1037/a0016755

Klein (2011). The dual track theory of moral decision-making: A critique of the neuroimaging evidence. Neuroethics, 4(2), 143–162. https://doi.org/10.1007/s12152-010-9077-1

Klein & Klinger (1991). Naturalistic decision making. Human Factors, 50(3), 456–460.

Klein, G. A. (1993). A recognition-primed decision (RPD) model of rapid decision making. In Klein, G. A., Orasanu, Calderwood, & Zsambok, C. E. (Eds.), Decision making in action: Models and methods (pp. 138–147). Ablex.

Klein, G. A. (2008). Naturalistic decision making. Human Factors, 50(3), 456–460. https://doi.org/10.1518/001872008X288385

Klein, G. A., Calderwood, & Clinton-Cirocco (1986). Rapid decision making on the fire ground. Proceedings of the Human Factors Society 30th Annual Meeting, 30(6), 576–580. https://doi.org/10.1177/154193128603000616

Levine, T. R. (2014). Truth-default theory (TDT). Journal of Language and Social Psychology, 33(4), 378–392. https://doi.org/10.1177/0261927x14535916

Lin, Capecci, Ellis, Rocha, Dommaraju, Oliveira, & Ebner (2019). Susceptibility to spear-phishing emails: Effects of internet user demographics and email content. ACM Transactions on Computer-Human Interaction, 26(5), 22–28. https://doi.org/10.1145/3336141

Loveday, Wiggins, Festa, Schell, & Twigg (2013). Pattern recognition as an indicator of diagnostic expertise. In Latorre, C. P., & Sanchez, F. A. (Eds.), Pattern recognition – applications and methods (pp. 1–11). Springer.

Loveday, Wiggins, M. W., & Searle, B. J. (2014). Cue utilization and broad indicators of workplace expertise. Journal of Cognitive Engineering and Decision Making, 8(1), 98–113. https://doi.org/10.1177/1555343413497019

Luo, Zhang, Burd, & Seazzu (2013). Investigating phishing victimization with the heuristic-systematic model: A theoretical framework and an exploration. Computers & Security, 38, 28–38. https://doi.org/10.1016/j.cose.2012.12.003

Moody, G. D., Galletta, D. F., & Dunn, B. K. (2017). Which phish get caught? An exploratory study of individuals’ susceptibility to phishing. European Journal of Information Systems, 26(6), 564–584. https://doi.org/10.1057/s41303-017-0058-x

Moore (2021). Gartner forecasts worldwide security and risk management spending to exceed $150 billion in 2021. Gartner. https://www.gartner.com/en/newsroom/press-releases/2021-07-20-gartner-announces-gartner-security-and-risk-management-summit-2021

Morrison, B. W., Wiggins, M. W., Bond, N. W., & Tyler, M. D. (2013). Measuring relative cue strength as a means of validating an inventory of expert offender profiling cues. Journal of Cognitive Engineering and Decision Making, 7(2), 211–226. https://doi.org/10.1177/1555343412459192

Morrison, B. W., Wiggins, M. W., & Morrison, N. M. V. (2018). Utility of expert cue exposure as a mechanism to improve decision-making performance among novice criminal investigators. Journal of Cognitive Engineering and Decision Making, 12(2), 99–111. https://doi.org/10.1177/1555343417746570

Nasser, Morrison, B. W., Bayl-Smith, Taib, Gayed, & Wiggins, M. W. (2020). The role of cue utilization and cognitive load in the recognition of phishing emails. Frontiers in Big Data, 3, Article 546860. https://doi.org/10.3389/fdata.2020.546860

Osborne & Overbay (2004). The power of outliers (and why researchers should always check for them). Practical Assessment, Research and Evaluation, 9(6), 1–8. https://pareonline.net/getvn.asp?v=9&n=6

Pennycook & Rand, D. G. (2019). Lazy, not biased: Susceptibility to partisan fake news is better explained by lack of reasoning than by motivated reasoning. Cognition, 188, 39–50. https://doi.org/10.1016/j.cognition.2018.06.011

Petty, R. E., & Cacioppo, J. T. (1986). The elaboration likelihood model of persuasion. In Berkowitz (Ed.), Advances in experimental social psychology (pp. 123–205). Academic Press.

Qualtrics (2021). [Web-based software]. Available from https://www.qualtrics.com/au/

Reay & Rankin (2013). The application of theory to triage decision-making. International Emergency Nursing, 21(2), 97–102. https://doi.org/10.1016/j.ienj.2012.03.010

Renshaw, P. F., & Wiggins, M. W. (2017). The predictive utility of cue utilization and spatial aptitude in small Visual Line-Of-Sight rotary-wing Remotely Piloted Aircraft operations. International Journal of Industrial Ergonomics, 61, 47–61. https://doi.org/10.1016/j.ergon.2017.05.014

Schriver, A. T., Morrow, D. G., Wickens, C. D., & Talleur, D. A. (2008). Expertise differences in attentional strategies related to pilot decision making. Human Factors: The Journal of the Human Factors and Ergonomics Society, 50(6), 864–878. https://doi.org/10.1518/001872008x374974

Sheng, Holbrook, Kumaraguru, Cranor, L. F., & Downs (2010). Who falls for phish? A demographic analysis of phishing susceptibility and effectiveness of interventions. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Atlanta, GA, USA, 10 April 2010. https://doi.org/10.1145/1753326.1753383

Stanovich, K. E., & West, R. F. (2000). Individual differences in reasoning: Implications for the rationality debate? Behavioral and Brain Sciences, 23(5), 645–726. https://doi.org/10.1017/S0140525X00003435

Sturman, Bell, E. A., Auton, J. C., Breakey, G. R., & Wiggins, M. W. (2024). The roles of phishing knowledge, cue utilization, and decision styles in phishing email detection. Applied Ergonomics, 119, Article 104309. https://doi.org/10.1016/j.apergo.2024.104309

Sturman & Wiggins (2021). Drivers’ cue utilization predicts cognitive resource consumption during a simulated driving scenario. Human Factors, 63(3), 402–414. https://doi.org/10.1177/0018720819886765

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics. Pearson.

Thomson & Oppenheimer (2016). Investigating an alternate form of the cognitive reflection test. Judgment and Decision Making, 11(1), 99–113. https://doi.org/10.1037/t49856-000

Toplak, M. E., West, R. F., & Stanovich, K. E. (2011). The cognitive reflection test as a predictor of performance on heuristics-and-biases tasks. Memory & Cognition, 39(7), 1275–1289. https://doi.org/10.3758/s13421-011-0104-1

Trémolière & Bonnefon, J. F. (2014). Efficient kill–save ratios ease up the cognitive demands on counterintuitive moral utilitarianism. Personality and Social Psychology Bulletin, 40(7), 923–930. https://doi.org/10.1177/0146167214530436

Tversky & Kahneman (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124–1130. https://doi.org/10.1126/science.185.4157.1124

Verizon (2024). Data breach investigations report (17th ed.). Verizon. https://www.verizon.com/business/en-au/resources/reports/dbir/

Vishwanath (2015). Examining the distinct antecedents of e-mail habits and its influence on the outcomes of a phishing attack. Journal of Computer-Mediated Communication, 20(5), 570–584. https://doi.org/10.1111/jcc4.12126

Vishwanath, Harrison, & Ng, Y. J. (2016). Suspicion, cognition, and automaticity model of phishing susceptibility. Communication Research, 45(8), 1146–1166. https://doi.org/10.1177/0093650215627483

Vishwanath, Herath, Chen, Wang, & Rao, H. R. (2011). Why do people get phished? Testing individual differences in phishing vulnerability within an integrated information processing model. Decision Support Systems, 51(3), 576–586. https://doi.org/10.1016/j.dss.2011.03.002

Watkinson, Bristow, Auton, McMahon, C. M., & Wiggins, M. W. (2018). Postgraduate training in audiology improves clinicians' audiology-related cue utilisation. International Journal of Audiology, 57(9), 681–687. https://doi.org/10.1080/14992027.2018.1476782

Weick, K. E., Sutcliffe, K. M., & Obstfeld (2005). Organizing and the process of sensemaking. Organization Science, 16(4), 409–421. https://doi.org/10.1287/orsc.1050.0133

Wiggins, M. W. (2015). Cues in diagnostic reasoning. In Wiggins, M. W., & Loveday (Eds.), Diagnostic expertise in organizational environments (pp. 1–10). Ashgate Publishing. https://hdl.handle.net/1959.14/362110

Wiggins, M. W. (2016). Expertise and cognitive skills development for ab-initio pilots. In Telfer, R. A., & Moore, P. J. (Eds.), Aviation training: Learners, instruction and organization (pp. 54–66). Routledge.

Wiggins, M. W. (2021). A behaviour-based approach to the assessment of cue utilisation: Implications for situation assessment and performance. Theoretical Issues in Ergonomics Science, 22(1), 46–62. https://doi.org/10.1080/1463922X.2020.1758828

Wiggins, M. W., Auton, & Sturman (2020). Evaluating situation assessment in distributed network electricity control. Proceedings of the Human Factors and Ergonomics Society, 64(1), 263–267. https://doi.org/10.1177/1071181320641062

Wiggins, M. W., Azar, Hawken, Loveday, & Newman (2014). Cue-utilisation typologies and pilots’ pre-flight and in-flight weather decision-making. Safety Science, 65, 118–124. https://doi.org/10.1016/j.ssci.2014.01.006

Wiggins, M. W., Crane, & Loveday (2018). Cue utilization, perceptions, and experience in the interpretation of weather radar returns. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 62(1), 721–725. https://doi.org/10.1177/1541931218621164

Wiggins, M. W., Loveday, & Auton (2015). EXPERT intensive skills evaluation (EXPERTise 2.0) test. Macquarie University.

Williams, E. J., Hinds, & Joinson, A. N. (2018). Exploring susceptibility to phishing in the workplace. International Journal of Human-Computer Studies, 120, 1–13. https://doi.org/10.1016/j.ijhcs.2018.06.004

Workman (2008). Wisecrackers: A theory-grounded investigation of phishing and pretext social engineering threats to information security. Journal of the American Society for Information Science and Technology, 59(4), 662–674. https://doi.org/10.1002/asi.20779

Yuris, Wiggins, Auton, Gaicon, & Sturman (2019). Higher cue utilization in driving supports improved driving performance and more effective visual search behaviors. Journal of Safety Research, 71, 59–66. https://doi.org/10.1016/j.jsr.2019.09.008