Abstract
This study aimed to explore the psychometric properties of the Fear of Failure in Learning Scale among postgraduate students using the Rasch model analysis. Fear of failure significantly impacts postgraduates’ academic performance, yet few techniques are applied for testing the reliability and validity of the Fear of Failure in Learning Scale (Choi, 2021). A total of 414 postgraduate students from the University of Malaya participated in the study. The results showed that both the items and persons had high reliability, and the separation degree was satisfactory. All items were within the acceptable range in the item fit test. Although most dimensions met the unidimensionality criteria, the self-handicapping dimension might be multi-dimensional due to the small number of items. Some items in the scale showed local dependency. The rating scale functioned well, but category “5” was selected less frequently. The item hierarchy indicated a moderate match between item and person abilities, but there were gaps in item location and overlapping items. The differential item functioning (DIF) analysis revealed no significant gender differences. Overall, the Fear of Failure in Learning Scale has excellent reliability and validity, but some items may need modification or removal. This study fills the research gap of comprehensively examining this scale using the Rasch model and provides in-depth analysis for future research, though it has limitations in sample selection and item modification.
Plain Language Summary
What did we study? Fear of failure at university can harm postgraduate students’ grades and confidence, but there is no easy way to measure this fear accurately. We tested a recent questionnaire called the Fear of Failure in Learning Scale, which is designed for students in higher education, to see whether it works well for postgraduate students. How did we do it? We asked 414 postgraduate students from the University of Malaya to fill out the questionnaire. Using a statistical method (the Rasch model), we checked whether the questions were clear, fair, and measured what they are supposed to measure. What did we find? The questionnaire is reliable. Students answered consistently, and the questions were well designed overall. Most questions worked as intended, but a few had issues: some questions about self-handicapping (like avoiding effort) might measure more than one thing; a few questions were too similar or overlapped in difficulty; and the highest rating (e.g. “strongly agree”) was rarely used. The questionnaire did not show gender bias: it worked equally well for male and female students. Why does this matter? This questionnaire can help supervisors and advisors identify students struggling with fear of failure. However, fixing the problematic questions (for example, removing duplicates or rewording self-handicapping items) could make it even better. Our study provides a roadmap for improving the tool, though testing it with more diverse student groups is still needed.
Introduction
Fear of failure was originally defined as an individual’s effort to evade feelings of shame or humiliation that arise from failing to meet a goal (Hunter et al., 2021). It is both a dimension of achievement motivation and an emotional trait oriented toward avoiding unfavorable experiences in achievement settings (Elliot & Thrash, 2004; McClelland et al., 1953). Numerous studies have investigated how fear of failure relates to students’ engagement in academic activities (Fentaw et al., 2022; Nakhla & Allan, 2025; Picton et al., 2022). Their findings showed that fear of failure among students was tied to academic procrastination, that is, the delaying of tasks (Cho & Lee, 2022). Some research has also indicated that fear of failure is connected with whether students complete their learning journey (Choi, 2021).
A large body of research has concluded that fear of failure is prevalent among postgraduate students (Byrom et al., 2022; Kumari & Malik, 2021; Nair & Sutar, 2023; Wang et al., 2024). The underlying reason is that postgraduate study requires greater self-innovation, time management, and self-discipline (Ferreira, 2021; Razavi & Soltani, 2023; Zhang & Yi, 2023). These demands present significant challenges and pressures. When faced with numerous difficulties, postgraduate students may experience fear of failure if they find it hard to adjust and manage themselves independently. Fear of failure can, in turn, adversely affect their academic performance and may even lead to stagnation in their research, resulting in delayed graduation (Mokbul, 2023). According to data released by the Ministry of Education Malaysia, approximately 100,000 postgraduates enroll each year, while the number of graduates remains around 20,000 per year. In other words, the number of postgraduate students who graduate each year is only about one-fifth of the number who enroll. Therefore, it is essential to probe into the psychological factors contributing to fear of failure among postgraduate students. This understanding can help them recognize their fear of failure and adjust their mindset, allowing them to better adapt to and enjoy the academic research process.
However, few instruments are specifically designed to measure fear of failure in the master’s and doctoral student population. The most commonly used questionnaire for assessing fear of failure is the Performance Failure Appraisal Inventory, which many researchers have used with both undergraduate and postgraduate students (Henry et al., 2021; Henschel & Iffland, 2021). Nevertheless, this questionnaire targets the general population rather than a specific student population. Addressing this gap, Choi developed a questionnaire in 2021 specifically aimed at measuring the fear of failure that higher education learners experience during the learning process. The Fear of Failure in Learning Scale is a multi-dimensional questionnaire comprising feeling of shame, performance avoidance, learned helplessness, and self-handicapping (Choi, 2021). Achievement Goal Theory served as the foundational framework for the instrument, and its development was also informed by an extensive review of the literature on the learning experiences of higher education students (Choi, 2021). Scholars have since translated the scale into Chinese and used exploratory and confirmatory factor analysis (EFA and CFA) to validate it with Chinese students. The results indicated that it is a reliable and well-validated tool for measuring fear of failure among higher education students (Y. Chen et al., 2024).
Besides CFA and EFA, Rasch model analysis is another well-established validation technique. The Rasch model can improve precision, enhance objectivity, and increase measurement independence (Astuti et al., 2024). Nevertheless, few researchers have so far used Rasch model analysis to conduct a comprehensive assessment of the psychometric properties of the Fear of Failure in Learning Scale (Choi, 2021). Therefore, the purpose of this study is to fill this research gap by employing Rasch model analysis to assess the reliability and validity of this instrument within the postgraduate student population.
Literature Review
Fear of Failure in Learning
Fear of failure (FF) involves an intricate interplay among emotion, personality, and cognition. Firstly, fear of failure refers to emotions such as unease, nervousness, or anxiety that arise when a person considers the possibility of failing in the future (Martin & Marsh, 2003). Secondly, it constitutes an element of personality. For instance, an elevated level of neuroticism is a trait that persistently adds to fear of failure across all types of contexts (Noguera et al., 2013). Additionally, fear of failure is a cognitive assessment made in specific situations, in which a particular situation is perceived as a risk to success (Duru et al., 2024). These aspects are interrelated and jointly define fear of failure (Henry et al., 2019).
Research has revealed that the fear of failure exerts an influence on students’ behaviors, their level of participation, as well as their academic achievements throughout the learning process (Tao et al., 2022). This holds particularly true for students in higher education settings. In contrast to K-12 education, higher education functions and operates in accordance with students’ personal choices, requirements, and aspirations regarding learning.
The distinctions in the background, structure, and functions related to learning pose certain challenges, pressures, and strains on higher education students. Higher education demands that students be more proactive and exercise greater self-discipline (Krskova et al., 2021). Moreover, they are expected to be involved in the academic assignments requiring individuals to create, initiate, and evaluate on their own (Marginson, 2024). Put another way, students pursuing higher education are required to have enhanced capabilities in terms of self-regulation, autonomous learning, and assuming responsibility for their own learning.
The elevated demands for these abilities, combined with the higher level of difficulty in learning tasks, result in higher education students shouldering more stress (Javaid et al., 2024). Therefore, the level of their fear of failure becomes quite pronounced. Hence, it is crucial to conduct investigations into the fear of failure among the student population in higher education.
Fear of Failure in Learning Scale
Most existing instruments aim to measure individuals’ feelings of shame or anxiety in daily life, and their target population is rather broad. There are hardly any questionnaires specifically designed for students, especially for measuring fear of failure among higher education students. The questionnaire developed by Choi (2021) has filled this gap. It is named the “Fear of Failure in Learning Scale” and focuses on measuring the factors related to fear of failure in learning scenarios. The scale contains four dimensions: feeling of shame (nine items), performance avoidance (seven items), learned helplessness (six items), and self-handicapping (four items). The model of the Fear of Failure in Learning Scale (Choi, 2021) is shown in Figure 1.

Dimensional construct of the instrument.
Existing Studies on Fear of Failure in Learning Scale
Since Choi developed the Fear of Failure in Learning Scale for the field of higher education in 2021, this measurement tool has been adopted by some researchers in the domains of psychological and biological measurement. For instance, researchers from Manchester Metropolitan University used this scale to help identify a series of psychological characteristics related to fear of failure among undergraduate students (Hopper, 2019). Additionally, other psychological researchers employed the scale to explore the relationships and influencing mechanisms between fear of failure and social-emotional competence (Choi, 2025), learning ability (Yune et al., 2024), as well as perfectionism in the undergraduate population (Surahman & Adhim, 2022). Furthermore, the scale has also been extended to the field of biology to investigate the biological benefits associated with fear of failure (Margulieux et al., 2025).
Nevertheless, in the current body of research on this scale, evidence related to the testing of its reliability and validity remains relatively limited. In previous studies, the scale has been translated into a Chinese version and applied to a sample of 366 undergraduate participants. Exploratory factor analysis (EFA) as well as confirmatory factor analysis (CFA) were utilized to test the scale’s performance, and the results indicated that its convergent validity and reliability reached satisfactory levels (Y. Chen et al., 2024). It should be noted, however, that relying solely on EFA and CFA to validate the psychometric properties of a measuring tool is insufficient in terms of scientific rigor.
Although exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) are widely used to ascertain the dimensionality and construct validity of measurement instruments (Tavakol & Wetzel, 2020), both analytical methods have notable limitations. Firstly, they rely on sample characteristics and the test itself. Specifically, the ability levels of participants are closely associated with items, and the difficulty levels of items are also highly correlated with the participant sample (Bond & Fox, 2015). This limitation restricts the generalizability of results when testing psychometric properties within specific populations. In addition, factor analysis methods only focus on the overall model fit and fail to provide item-level reliability and validity results, making it difficult to meet the needs of refined testing (Christensen et al., 2024). In contrast, the Rasch Model has distinct advantages. Firstly, its item calibration is independent of the sample, and the estimation of participants’ ability levels is not dependent on specific items (Bond & Fox, 2015). This feature enables the results of psychometric property testing to transcend the constraints of specific populations and items, thereby achieving greater generalizability.
Rasch Model Analysis
The Rasch model has been extensively utilized in research within the domains of education and psychology (Khine, 2020). It stems from item response theory and assesses how well responses fit the scale items by estimating two quantities: the difficulty of the test questions and the proficiency of the test takers. Whether during the creation of academic test questionnaires or when assessing the psychometric properties of existing ones, employing the Rasch model offers multiple advantages. These include greater precision, heightened objectivity, and enhanced independence in measurement. Put differently, the Rasch model enables more accurate and unbiased measurement, facilitating a more precise connection between the measuring tool and the latent characteristics of individuals (Astuti et al., 2024).
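For reference, one common polytomous form of the Rasch model for Likert-type items is the Andrich rating scale model. Whether this exact parameterization applies to the present analysis is an assumption, since the text does not name the variant used. The symbols are introduced here only for illustration: B_n is the measure of person n, D_i is the difficulty of item i, F_k is the threshold between categories k-1 and k, and P_{nik} is the probability that person n responds in category k of item i.

```latex
\ln\!\left(\frac{P_{nik}}{P_{ni(k-1)}}\right) = B_n - D_i - F_k
```

Under this form, the log-odds of choosing a category over the one below it depend only on the difference between the person measure and the item difficulty, shifted by a threshold shared across items.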
Utilizing Rasch model analysis constitutes a dependable means of evaluating the rationality and reliability of a measuring instrument in relation to relevant constructs (Aryadoust et al., 2021). This model analysis is a comprehensive and meticulous procedure as it undertakes a thorough assessment of survey participants (Uzunova-Dimitrova & Zhelezov, 2024). Initially, stringent requirements are set for participants. Subsequently, individuals demonstrating a suitable fit are categorized into several groups based on the proficiency level determined via Rasch probability to align with the model. After that, the rationality and consistency of the measuring instrument are verified. This helps to minimize the influence of outliers and misfitting individuals on the reliability and validity tests of the scale and boosts the validity of the scale employed in research endeavors (Ridwan et al., 2023). Additionally, through Rasch model analysis, each item within the scale can be sorted into separate discrete categories according to the item’s level of difficulty (Liu, 2020). It can also reveal redundant items and determine whether there is continuity in the difficulty levels among different items. Furthermore, the Rasch model can conduct a detailed evaluation of the rating scale (Liu, 2020).
Previous research has indicated that the Rasch model has been widely utilized in educational psychology (Golino et al., 2021; Hayat, 2022; Nielsen et al., 2021). For example, Nielsen et al. (2021) adopted the Rasch model to reduce the items of the tool designed for measuring critical thinking and developed a concise questionnaire suitable for the group of Danish students. Another research conducted among undergraduate psychology majors in Indonesia adopted Rasch model analysis to test the item location of the tool, the proficiency level of participants and so on. Detailed conclusions were drawn, and targeted adjustments were made to two items (Hayat, 2022). In short, prior studies have shown that applying Rasch model analysis is a reliable and fruitful approach that facilitates the generation of items possessing both validity and reliability for specific measuring tools, without having to be overly concerned about the impact of responses on the reliability and validity tests of questionnaires (Aryadoust et al., 2021). Moreover, the Rasch model also furnishes empirical support for the link between item scales and respondents.
Given the substantial amount of previous empirical evidence regarding the measurement of the psychometric properties of measuring instruments through Rasch model analysis, this study opts for the Rasch model with the intention of conducting an in-depth analysis of the psychometric properties of the Fear of Failure in Learning Scale (Choi, 2021). This encompasses aspects such as participants, items, the degree of match between item location and individual ability, rating scale, and differential item functioning (DIF) based on gender. This analysis assists researchers in tracking participants’ responses to specific questions and ascertaining the suitability or acceptability of each item within the scale for specific populations. It can also fill the gap where no comprehensive analysis using the Rasch model has been carried out in this area hitherto and offer an alternative specific method for constructing a comprehensive approach to evaluate the rationality and consistency of a measuring scale (Aryadoust et al., 2021).
Methods
Instrument
This research used the Fear of Failure in Learning Scale developed by Choi (2021). It is a multi-dimensional scale containing four sub-scales: feeling of shame, performance avoidance, learned helplessness, and self-handicapping. Feeling of shame includes nine items, performance avoidance consists of seven items, learned helplessness has six items, and the final dimension, self-handicapping, comprises four items. In total, the scale contains 26 items. None of the items is reverse scored.
The scale employs a five-point Likert format with the response options strongly disagree, disagree, neither agree/disagree, agree, and strongly agree, rated from 1 to 5 respectively (Choi, 2021). The higher the score, the greater the likelihood that a student experiences fear of failure (Choi, 2021). Detailed information on the Fear of Failure in Learning Scale is shown in Table 1.
Detailed Information on the Fear of Failure in Learning Scale.
Ethical Considerations
This research received ethical approval from the University of Malaya’s Research Ethics Committee (reference number: UM.TNC2/UMREC_3218). The researcher also obtained written permission from the author of the Fear of Failure in Learning Scale. The data collection methods adhered to the relevant requirements and policies. At the beginning of the online questionnaire, every participant was required to sign an online consent form. This form explicitly stated the research objectives, the voluntary nature of participation and the right to withdraw at any time, the inclusion and exclusion criteria, the complete confidentiality of responses, and other provisions related to participant rights. Participants could start answering the questions only after choosing to agree. The consent forms are stored in a protected database.
Participants
This research is a quantitative study. The sample consisted of postgraduate students at the University of Malaya. According to the official UM facts released in 2024, the total number of postgraduate students enrolled at this university is 17,310. Only active students were recruited. Master’s and doctoral students who had withdrawn from the semester or had graduated did not meet the inclusion criteria.
The sample size was determined using Krejcie and Morgan’s table (Krejcie & Morgan, 1970), which indicated that a minimum of 375 postgraduate participants should be recruited to represent the entire population. This research employed a random sampling method by distributing the online questionnaire via university email. The questionnaire was administered using Wenjuanxing, a survey collection platform widely adopted by researchers, and the collection period lasted 2 months. In total, 493 participants completed the questionnaire. Among these, 79 respondents who were outliers were removed from the data. Therefore, the final data set for the Rasch model analysis comprised 414 participants. Table 2 shows the demographic profile of the respondents.
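The minimum sample size read from Krejcie and Morgan’s table can be cross-checked with the formula on which that table is based. The sketch below is illustrative only: the function name is hypothetical, and the default parameters (chi-square of 3.841 for one degree of freedom at 95% confidence, a population proportion of 0.5, and a 5% margin of error) are the values conventionally paired with the table rather than values stated in this study.

```python
def krejcie_morgan_sample_size(population, chi_sq=3.841, p=0.5, margin=0.05):
    """Required sample size from the Krejcie and Morgan (1970) formula.

    The defaults are the conventional values behind the published table;
    they are assumptions here, not settings reported in the study.
    """
    numerator = chi_sq * population * p * (1 - p)
    denominator = margin ** 2 * (population - 1) + chi_sq * p * (1 - p)
    return numerator / denominator


# Population figure taken from the text: 17,310 enrolled postgraduates.
required = krejcie_morgan_sample_size(17310)
print(f"Required sample size: {required:.1f}")
```

Because the published table lists only selected population sizes, the formula value may differ from the tabulated figure by about one unit. Either way, the final sample of 414 exceeds the minimum.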
The Demographic Profiles of Participants.
Procedure of Data Analysis
Winsteps 3.66.0 was used to analyze the data in this research. To satisfy the unidimensionality requirement of the Rasch model, the scale was separated into its four dimensions, and the four sub-scales were entered into Winsteps one at a time as Excel files.
The Rasch analysis followed an iterative process to ensure the validity of the measurement scale. An initial analysis of the raw data revealed significant model misfit, primarily driven by a subset of respondents whose response patterns were erratic and incompatible with the model’s expectations. To uphold the fundamental principle of constructing a measure that is sufficiently unidimensional and reliable for substantive interpretation, data purification was undertaken.
Person measures were analyzed first to remove outliers and misfitting respondents. The remaining data, which were compatible with the Rasch model, were then analyzed to examine whether there were misfitting items or unexpected item polarity. Next, the reliability and separation of persons and items were analyzed. Subsequently, the unidimensionality of each sub-scale was assessed, followed by item targeting, item bias by gender, and rating scale functionality.
Results
Unidimensionality checks on the initial data revealed that the model for each subdimension was only marginally acceptable. To address this, we adopted a conservative fit criterion (Linacre, 2012) and removed 79 participants with Outfit MNSQ values >2.0, who were deemed to introduce substantial measurement noise. Although some scholars have suggested that up to 5% of persons with Outfit MNSQ >2 can be tolerated (Bond & Fox, 2015), retaining these respondents was found to distort the measurement distribution and compromise the reliability of several items. The final analysis of the purified sample of 414 respondents demonstrated a robust measurement model with substantially improved person and item reliability.
The resulting model demonstrated excellent psychometric properties. The person reliability was 0.95, and the item reliability was 0.99. A principal component analysis of the residuals supported the essential unidimensionality of the scale, with the eigenvalues of the first contrast well below the recommended threshold of 2.0 (Aryadoust et al., 2021). The infit and outfit MNSQ statistics for all items and the vast majority of persons fell within the acceptable range of 0.5 to 1.5, indicating good model-data fit (Aryadoust et al., 2021).
Person and Item Reliability
As shown in Table 3, the reliability value for persons (N = 414) was 0.95, indicating that the persons were highly reliable. The separation value for persons was 4.18, which allows for the division into four different ability groups. These results meet the requirements for Rasch model analysis in terms of person ability diversity.
Person and Item Reliability.
The reliability value for the items (N = 26) was 0.99, indicating that the item measures were highly reliable. According to the conventions of Rasch model analysis, the item separation value should be at least 2, with values above 4 considered ideal, indicating that the measurement tool has strong discriminative ability (Malec et al., 2007). Based on the results in Table 3, the item separation value was 8.77, which means that the items discriminate well and are capable of measuring effectively across multiple ability levels.
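The reported separation and reliability values can be checked against each other, because in Rasch measurement the two indices are linked by the standard relation R = G² / (1 + G²), where G is the separation index. The sketch below applies that relation to the figures reported above; the function names are hypothetical and are used only for illustration.

```python
def reliability_from_separation(separation):
    """Reliability implied by a separation index G: R = G^2 / (1 + G^2)."""
    return separation ** 2 / (1 + separation ** 2)


def separation_from_reliability(reliability):
    """Inverse relation: G = sqrt(R / (1 - R))."""
    return (reliability / (1 - reliability)) ** 0.5


# Separation values reported in the text (Table 3).
person_reliability = reliability_from_separation(4.18)
item_reliability = reliability_from_separation(8.77)
print(f"{person_reliability:.2f} {item_reliability:.2f}")  # prints "0.95 0.99"
```

Both computed values agree with the reported person reliability of 0.95 and item reliability of 0.99.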
Item Fit
For item fit, previous scholars have suggested checking the infit and outfit mean square values (acceptable range: 0.5–1.5), the outfit Z-standard value (acceptable range: −2.0 to 2.0), and the point measure correlation (acceptable range: 0.4–0.85; Bond & Fox, 2015; Boone et al., 2014). However, the Z-standard value is sensitive to large sample sizes, with which it typically rises above 3.0 or falls below −2.0. For this reason, researchers recommend relying on the infit and outfit mean squares rather than the Z-standard value when the number of participants is large (Soeharto & Csapo, 2021). Table 4 displays the item fit results for the Fear of Failure in Learning Scale (Choi, 2021).
The Results of Fear of Failure in Learning Scale Item Fit.
The data presented in Table 4 indicate that the “pt. mean corr.” values for all items are close to the “pt. mean exp.” values. Specifically, Item 2FS (pt. mean corr. = 0.80; pt. mean exp. = 0.84), Item 3PA (pt. mean corr. = 0.73; pt. mean exp. = 0.78), Item 16PA (pt. mean corr. = 0.73; pt. mean exp. = 0.78), and Item 22LH (pt. mean corr. = 0.71; pt. mean exp. = 0.77) have actual values slightly lower than the expected values, but they still fall within an acceptable range. Besides, every item demonstrated a positive point mean correlation, which means they all support the measurement of the hypothesized construct. As a result, all the items were retained to contribute to the measurement construct.
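The mean square fit statistics used in this section can be sketched as follows. Outfit is the unweighted mean of the squared standardized residuals, while infit weights each squared residual by its model variance. The function names and the four-person example are hypothetical. The numbers are not data from this study, and in practice the expected scores and variances come from the fitted Rasch model in software such as Winsteps.

```python
def mean_square_fit(observed, expected, variance):
    """Return (infit, outfit) mean squares for one item across persons.

    observed: observed ratings
    expected: model-expected ratings for the same persons
    variance: model variance of each observation
    """
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: average of squared standardized residuals (outlier-sensitive).
    outfit = sum(r / w for r, w in zip(sq_resid, variance)) / len(observed)
    # Infit: information-weighted version of the same quantity.
    infit = sum(sq_resid) / sum(variance)
    return infit, outfit


def within_fit_range(value, low=0.5, high=1.5):
    """Check a mean square against the 0.5 to 1.5 range cited in the text."""
    return low <= value <= high


# Hypothetical values for four persons on one item (not study data).
observed = [1, 4, 3, 2]
expected = [2.0, 3.1, 3.6, 2.5]
variance = [0.9, 0.8, 0.7, 0.6]

infit, outfit = mean_square_fit(observed, expected, variance)
print(f"infit={infit:.2f}, outfit={outfit:.2f}")
print(within_fit_range(infit), within_fit_range(outfit))
```

Values near 1.0 indicate that responses vary about as much as the model predicts. Values well above 1.5 suggest noise, and values below 0.5 suggest responses that are more predictable than the model expects.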
Unidimensionality
Rasch model analysis requires that the measurement tool satisfy the unidimensionality test. Three conditions must be met. First, the variance explained by the first component should be greater than 50% (Astuti et al., 2024). Second, the eigenvalue of the unexplained variance in the first contrast should remain within the range of 1.4 to 2.1. Third, the percentage of unexplained variance in the first contrast should be below 10% (Yao et al., 2025). Table 5 displays detailed information on the unidimensionality of each dimension of the Fear of Failure in Learning Scale (Choi, 2021).
Unidimensionality Data of Each Dimension of the Fear of Failure in Learning Scale.
For the overall scale, which is a four-dimensional instrument, the principal component analysis of the residuals revealed that the Rasch dimension explained 54.1% of the variance in the data, with an eigenvalue of 5.3. This eigenvalue was higher than 2, indicating that the tool was multi-dimensional. When the four dimensions were analyzed separately using the Rasch model, the “feeling of shame” dimension explained 70.6% of the variance, the “performance avoidance” dimension explained 60.7%, the “learned helplessness” dimension explained 59.5%, and the “self-handicapping” dimension explained 66.9%. These data show that the percentage of variance explained for every dimension was higher than the 50% threshold recommended by scholars.
Furthermore, the unexplained variance eigenvalues for the first contrast of the four dimensions were 1.5, 1.5, 1.6, and 1.6, respectively, all of which are below the recommended value of 2. Therefore, each individual dimension of the instrument met the unidimensionality criteria of the Rasch model (Aryadoust et al., 2021). However, it is important to note that the unexplained variance percentages for the “learned helplessness” and “self-handicapping” dimensions exceeded 10%, with values of 10.7% and 13%, respectively. Although the unexplained variance percentage of learned helplessness was slightly higher than the recommended level, it was still acceptable. As for the self-handicapping dimension, one likely reason for this result is the small number of items in this subscale.
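The three unidimensionality criteria can be applied to the reported figures with a short check. This is a minimal sketch with a hypothetical function name. The thresholds are exposed as parameters because the cited sources differ slightly on the eigenvalue cut-off (a range of 1.4 to 2.1 versus a ceiling of 2). Only values stated in the text are entered, and the first-contrast percentages for the first two dimensions are left as None because the text does not report them.

```python
def check_unidimensionality(explained_pct, first_contrast_eigen,
                            first_contrast_pct=None,
                            min_explained=50.0, max_eigen=2.0, max_pct=10.0):
    """Apply the three criteria; a criterion with no data returns None."""
    return {
        "explained_variance_ok": explained_pct > min_explained,
        "eigenvalue_ok": first_contrast_eigen < max_eigen,
        "first_contrast_pct_ok": (
            None if first_contrast_pct is None
            else first_contrast_pct < max_pct
        ),
    }


# (explained variance %, first-contrast eigenvalue, first-contrast %)
# taken from the text; None marks values not reported in the text.
reported = {
    "feeling of shame": (70.6, 1.5, None),
    "performance avoidance": (60.7, 1.5, None),
    "learned helplessness": (59.5, 1.6, 10.7),
    "self-handicapping": (66.9, 1.6, 13.0),
}

results = {name: check_unidimensionality(*values)
           for name, values in reported.items()}
for name, outcome in results.items():
    print(name, outcome)
```

All four dimensions pass the explained variance and eigenvalue checks, while learned helplessness and self-handicapping are flagged on the first-contrast percentage, matching the pattern described above.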
Local Dependency
Local independence is another key assumption of the Rasch model: once the latent trait has been accounted for, items in a test should not be correlated with one another. According to Yen’s Q3 statistic (Yen, 1984), the residuals of any two items should be uncorrelated. To detect local dependence using Q3, a standard approach is to apply a fixed threshold of 0.2 to the absolute value of the residual correlation (W. H. Chen & Thissen, 1999). However, some researchers have proposed a critical value of 0.5 as an alternative (Davidson et al., 2004; Ten Klooster et al., 2008). There is also a consensus that the residual correlation should not exceed 0.7. Table 6 presents the largest standardized residual correlations.
Large Standardized Residual Correlation.
All the residual correlation values of items in the Fear of Failure in Learning Scale (Choi, 2021) did not reach 0.5. However, as shown in Table 6, eight pairs of items exhibited potential standardized residual correlations due to the high similarity in the content of the items.
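The Q3 check described in this section amounts to correlating the residuals of every item pair and flagging pairs whose absolute correlation exceeds a chosen threshold. The sketch below illustrates the idea with hypothetical item names and made-up residuals for five persons. It is not the study’s data, and Winsteps reports these correlations directly.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def flag_dependent_pairs(residuals, threshold=0.2):
    """Return (item_a, item_b, q3) for pairs with |Q3| above the threshold.

    residuals: dict mapping item name to a list of residuals,
    one per person, in the same person order for every item.
    """
    flagged = []
    for a, b in combinations(sorted(residuals), 2):
        q3 = pearson(residuals[a], residuals[b])
        if abs(q3) > threshold:
            flagged.append((a, b, q3))
    return flagged


# Hypothetical residuals (not study data).
residuals = {
    "item_A": [0.5, -1.0, 0.2, 1.1, -0.8],
    "item_B": [0.6, -0.9, 0.1, 1.0, -0.7],
    "item_C": [-0.3, 0.4, 1.2, -0.9, 0.1],
}

for a, b, q3 in flag_dependent_pairs(residuals, threshold=0.7):
    print(f"{a} and {b}: Q3 = {q3:.2f}")
```

Changing the threshold argument to 0.2 or 0.5 reproduces the stricter and more lenient cut-offs mentioned in the text.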
Category Scale Functioning
Category scale functioning is evaluated to ascertain the degree to which respondents made use of all the available rating options (Chong et al., 2022). Firstly, Smith et al. (2003) advised ensuring that each response category has a frequency of no fewer than 10 responses. The underlying reason is that response categories with a low number of responses fail to supply adequate information, thereby precluding stable estimation. Additionally, categories with only a few respondents tend to be either unnecessary or redundant. In such cases, these categories may be merged into adjacent ones to optimize the categorization system.
Secondly, the observed average measure should increase monotonically across categories. Thirdly, the outfit mean square value should be below 2.0 to avoid misinformation and noise in the category analysis (Linacre, 2004). Finally, for a five-point scale, Linacre recommends that the distance between adjacent thresholds, which defines a distinguishable range on the variable continuum, should be at least 1.0 logit and no more than 5.0 logits. This prevents significant gaps in the variable. Table 7 displays the summary statistics of category functioning.
Category Functioning Statistics.
Table 7 shows that the frequency with which each category was selected is greater than 10. Category 5 was used relatively rarely: only 17 responses fell in this category, which is slightly higher than the minimum requirement.
Moreover, the average measures increase monotonically. The maximum outfit value is 1.69, which is below the recommended upper limit of 2.0 and is therefore acceptable.
Furthermore, by observing the values of the structure calibration, it is found that the interval differences between each adjacent category fall within the range of 1 to 5, all of which are within a reasonable distance.
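The four category functioning criteria discussed above can be combined into a single check. In the sketch below, the function name is hypothetical and most of the example numbers are invented for illustration. Only the Category 5 count of 17 and the maximum outfit of 1.69 come from the text, and the remaining counts, average measures, outfit values, and thresholds are placeholders rather than the values in Table 7.

```python
def check_categories(counts, avg_measures, outfits, thresholds,
                     min_count=10, max_outfit=2.0, min_gap=1.0, max_gap=5.0):
    """Apply the four rating scale criteria to category-level statistics."""
    gaps = [b - a for a, b in zip(thresholds, thresholds[1:])]
    return {
        # Every category has at least the minimum number of responses.
        "counts_ok": all(c >= min_count for c in counts),
        # Average measures rise strictly from one category to the next.
        "monotonic_ok": all(a < b for a, b in
                            zip(avg_measures, avg_measures[1:])),
        # No category shows excessive noise.
        "outfit_ok": all(o < max_outfit for o in outfits),
        # Adjacent thresholds are neither too close nor too far apart.
        "gaps_ok": all(min_gap <= g <= max_gap for g in gaps),
    }


# Placeholder statistics for categories 1 to 5 (hypothetical, except
# the last count of 17 and the outfit of 1.69 reported in the text).
counts = [900, 1400, 1100, 700, 17]
avg_measures = [-2.1, -0.9, 0.1, 1.0, 2.2]
outfits = [1.10, 0.95, 0.90, 1.05, 1.69]
thresholds = [-3.0, -1.2, 0.8, 3.4]  # four thresholds for five categories

report = check_categories(counts, avg_measures, outfits, thresholds)
print(report)
```

A category that fails the count check is a candidate for merging with an adjacent category, as noted earlier in this section.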
As depicted in Figure 2, the category probability curves manifested that the category thresholds progressed in accordance with the category level. Moreover, each category exhibited a distinctive peak, thereby implying that the participants could reliably discriminate among the response categories. As a result, all categories demonstrate good adaptability.

Category probabilities: modes–structure measures at intersections.
Item Hierarchy
To evaluate the strengths and weaknesses of the “Fear of Failure in Learning Scale” in more depth, the Wright Map was used to assess how effectively the scale items are distributed in relation to participants’ levels of the latent variable, namely fear of failure. As depicted in Figure 3, the Wright Map presents participants’ abilities in the left column and the items in the right column.
In the Wright Map from Figure 3, the item mean has been set to 0.05 logit. This setting guarantees that each group of respondents approximately has an equal chance, close to a 50:50 ratio, of responding to the item in a way that aligns with their respective abilities.

Wright map.
Simultaneously, the person mean is configured at 0.00 logit. Given that the item and person means are nearly the same, it can be concluded that the items are well targeted for this sample, meaning that they match the abilities of the respondents.
Nevertheless, the distribution of person ability measures was somewhat broader than the distribution of item location measures. This implies that the items did not cover the full ability range of the respondents, and additional items are required to distinguish more effectively among students with different levels of fear of failure across a wider range.
As for individual items, item 7 is situated at the highest location on the Wright Map, signifying that it is the most difficult to endorse. Consequently, only respondents with the highest level of fear of failure are statistically likely to agree with this statement. Item 2, in contrast, occupies the lowest position among the item measures, indicating that only the lowest level of fear of failure is required for a respondent to have a high probability of endorsing it. It should be noted that there was a relatively wide spacing between item 7 and the item at the second highest location, item 26. This indicates the need to fill the gap between these two items by adding items whose location parameters fall on the latent trait continuum between them.
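The targeting argument above rests on the basic Rasch relationship between a person measure and an item location. As a minimal sketch, the dichotomous form of the model is shown below; the scale analyzed here is polytomous, so this is a simplification for illustration only, and the function name `rasch_probability` is hypothetical.

```python
import math


def rasch_probability(theta: float, delta: float) -> float:
    """Probability of endorsing an item under the dichotomous Rasch model.

    theta: person measure in logits
    delta: item location (difficulty to endorse) in logits
    """
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


# Person and item at the same location: an even chance of endorsement.
print(rasch_probability(0.0, 0.0))

# Person mean (0.00) against item mean (0.05) as reported in the text:
# the probability stays close to one half.
print(round(rasch_probability(0.00, 0.05), 3))
```

When the person measure lies far above or below every item location, the probability approaches 1 or 0 for all items, which is why a person distribution wider than the item distribution leaves some respondents poorly measured.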
Moreover, there were five groups of overlapping items (i.e. items 11, 12, 14, 18, 22, and 23). Because the items in each group were located at the same level on the map, they were testing the same ability, implying that some of them might be redundant and could be removed. However, in this 26-item version of the questionnaire, every item demonstrated a good fit on the item fit measures. Therefore, all the items were retained in this study for further analysis.
DIF Analysis
Differential Item Functioning (DIF) can be used to examine any item with a bias in two independent groups (Penfield & Camilli, 2006). To identify the item bias based on gender, three criteria should be checked. Firstly, the absolute value of the DIF contrast needs to be >0.43. Secondly, the absolute value of t should be >2. Finally, the Welch probability value should be <.05 (Penfield & Camilli, 2006).
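The three criteria can be expressed as a simple decision rule. The sketch below assumes that an item is flagged as biased only when all three criteria are met, which is how the later discussion appears to treat them; the function name `flag_dif` and the example values are hypothetical and are not taken from Table 8.

```python
def flag_dif(contrast: float, t_value: float, p_value: float,
             min_contrast: float = 0.43, min_t: float = 2.0,
             alpha: float = 0.05) -> dict:
    """Evaluate the three DIF criteria for one item."""
    checks = {
        "contrast": abs(contrast) > min_contrast,  # |DIF contrast| > 0.43
        "t": abs(t_value) > min_t,                 # |t| > 2
        "p": p_value < alpha,                      # Welch probability < .05
    }
    # Assumption: the item counts as biased only if all three criteria hold.
    checks["biased"] = all(checks.values())
    return checks


# Hypothetical item: statistically detectable difference, but small contrast.
example = flag_dif(contrast=0.30, t_value=2.4, p_value=0.02)
print(example)
```

In this hypothetical case the t and probability criteria are met but the contrast criterion is not, so the item is not flagged.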
Upon inspection, the absolute values of all DIF contrasts were below 0.43. For most items, the t-values also fell within the range from −2 to 2, with p > .05 for the comparison between males and females. For some items, however, the absolute t-value exceeded 2 with p < .05, although the difficulty contrast has been described as the most crucial number in a DIF analysis (Metu, 2020). Table 8 presents the statistics of the items that might be biased by gender.
Bias Items on Gender.
Discussion
This study aimed to examine the psychometric properties of the Fear of Failure in Learning Scale (Choi, 2021) in higher education, utilizing Rasch model analysis. In the preliminary stage before the formal analysis, the large initial sample (493) contained some misfitting responses and outliers. Participants were therefore first screened using the Rasch model so that the retained data would better fit the model. Eventually, 79 people were excluded, and 414 valid participants were retained for further analysis.
To measure the reliability and validity of the Fear of Failure in Learning Scale with the Rasch model, this study used the Winsteps software to analyze the data in seven aspects: item and person reliability, item fit, unidimensionality, local dependency, category scale functioning, item hierarchy, and DIF analysis. The results demonstrated that both items and persons were highly reliable, and the degree of separation was also satisfactory. In the item fit analysis, all items were within the acceptable range and no item was removed. This result is in line with previous research (Y. Chen et al., 2024; Surahman & Adhim, 2022).
As for unidimensionality, the results for the self-handicapping dimension indicated the potential presence of multidimensionality, which might be related to its small number of items (four items). Subsequently, in the local dependency test, the results showed that most of the questions in the self-handicapping dimension overlapped. These two results corroborate each other. The self-handicapping dimension still requires further exploration by other researchers using different analytical methods in different countries or populations.
In addition to the examination of the items, this study also analyzed the functioning of the rating scale using the Rasch model. The results showed that although more than 10 people selected category “5,” which should be regarded as acceptable, this category was selected far less often than the others. The reason may be that the sample consisted of highly educated individuals (master’s and doctoral students), who may have a deeper and more meticulous understanding of the scale and thus be more cautious when making selections. Highly educated groups may regard “strongly agree” as an extremely high evaluation criterion and be reluctant to give the highest rating easily. Further qualitative research, such as interviews or open-ended questionnaires, is therefore needed to explore how respondents interpret the “strongly agree” category.
Regarding item hierarchy, the results demonstrated a moderate match between item and person abilities. However, a small number of participants were not covered by the items. Meanwhile, five groups of items overlapped, implying that some questions redundantly targeted the same level of endorsement and may require modification or reduction. Moreover, there was a wide spacing between the highest located item and the second highest one, suggesting that items with corresponding location parameters need to be inserted to fill the gap. Therefore, future investigations could revise items 11, 12, 14, 18, 22, and 23 so that they target different item location levels.
As for the DIF analysis, the questionnaire did not reveal significant gender differences. Even though the t value and Welch prob. value of items 17 and 26 indicated a slight gender difference, the DIF contrast value was small, which means that this difference is insignificant in practical applications.
Conclusion
This study makes a practical contribution through the utilization of Rasch model analysis to determine the psychometric characteristics of the scale for Fear of Failure in Learning within the higher education population. Overall, the Fear of Failure in Learning Scale (Choi, 2021) had excellent reliability and validity and showed no gender difference. Meanwhile, the rating scale functions well. There was a gap in the difficulty levels between the two most difficult items. Moreover, in one of the dimensions of this questionnaire, namely self-handicapping, due to the small number of items, some items exhibit overlapping. Therefore, some items may require modification or removal.
The questionnaire selected in this study is a relatively new one, which is specifically developed for the higher education population. In previous studies, some researchers have explored the reliability and validity of the questionnaire by using CFA and EFA methods. However, there is still a research gap in conducting a comprehensive examination of the questionnaire using the Rasch model. This study fills this gap and conducts an in-depth analysis of the questionnaire in as many aspects as possible.
This study also has some limitations. The master’s and doctoral students in the sample all come from the same research-oriented university. As a result, the findings of this research are applicable only to research-based institutions and to postgraduates who are required to complete a dissertation or thesis. With regard to sampling, further studies need to be multi-institutional and cross-cultural, for example by including private universities and universities with other cultural backgrounds. Moreover, a limitation of the present study is that the Rasch model, which assumes unidimensionality, was the sole analytical approach used, whereas the questionnaire was multidimensional. This highlights the need for further exploratory or confirmatory factor analysis among the population of postgraduate students. In addition, the modification or removal of items needs to be explored in detail by other studies, and a short form of the scale may even be needed.
In conclusion, this study provides researchers with an in-depth analysis of the psychometric properties of the Fear of Failure in Learning Scale (Choi, 2021) among the higher education population and offers support for future research.
Footnotes
Acknowledgements
I would like to thank Prof. Dr. Harris Shah Abd Hamid for initially guiding me in learning the Rasch model during my master’s program and Dr. Bambang Sumintono for providing in-depth insights on the model at that time. Their earlier support laid a foundation for the analytical work in this study.
Ethical Considerations
This study obtained approval from the Ethical Clearance Committee of the University of Malaya (reference number: UM.TNC2/UMREC_3218). In addition, permission to use the questionnaire was obtained.
Consent to Participate
The following content was clearly stated in advance on the title page of the online questionnaire: Research title, research purpose, research procedure, included and excluded participants in this research, benefit to participants, risk to participants, confidentiality, and contact number for complaints.
Consent for Publication
Written informed consent for publication of student academic data and anonymized information was obtained from all participants. Identifying details have been kept confidential by the researcher.
Author Contributions
Wang Li (first author): conceptualization, methodology, data collection, formal analysis, investigation, writing—original draft, visualization. Azmawaty Mohamad Nor (corresponding author): supervision, writing—review and editing, project administration. Amira Najiha Yahya (second author): supervision, validation, writing—review and editing. All authors have read and approved the final manuscript.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The research data supporting this study is available.
