Abstract

In this study, we explored psychometric network analysis (PNA) as an alternative method for identifying item wording effects in self-report instruments. We examined the functioning of negatively worded items in the network structures of two math-related scales from the 2019 Trends in International Mathematics and Science Study (TIMSS): Students Like Learning in Mathematics (SLLM) and Students Confident in Mathematics (SCM). We also explored how the negatively worded items functioned in network structures across demographic subgroups. Data were drawn from eight countries that represented diverse levels of math performance and cultural attitudes toward school (n = 75,972). We found that negatively worded items were distinct from the positively worded items in the SLLM and SCM item networks, and that this effect was consistent across all age- and country-level subgroups. Based on these findings, we recommend PNA as a data-driven approach for detecting wording effects.

Many self-report measures, such as psychological scales and surveys, contain negatively worded (NW) items through negations (e.g., no, not, none, and never) and negative connotations (e.g., bad, boring, unpleasant, and fearful). These items are phrased to reverse the polarity of a response scale, such that high endorsement represents low levels of the psychological trait. NW items are typically intended as “speed bumps” to encourage respondents to answer more carefully (e.g., Podsakoff et al. 2003). They may also help eliminate positive response bias (e.g., Hinz et al. 2007) and provide better domain coverage of the target trait (Weijters and Baumgartner 2012).

Despite the benefits of including NW items in self-report measures, such items may also lead to unintended measurement issues. Respondents tend to endorse less extreme response categories in NW items (e.g., Weems et al. 2006; Yang et al. 2012), suggesting that they interpret negative statements differently (Weems et al. 2006). Additionally, individuals at the midlevel of the target latent trait may tend to disagree with both positively worded and NW items (Kam and Meyer 2023; Kam et al. 2021). As such, NW items can result in a misalignment of the response categories and decrease internal consistency (e.g., Roszkowski and Soven 2010; Zeng et al. 2020). NW items can also impact the factor structure of a scale, either by forming a different construct than was theorized or otherwise distorting it (DiStefano and Motl 2006; Kam and Meyer 2015; Weijters and Baumgartner 2012). These impacts of NW items on the psychometric quality of self-report measures are referred to as item wording effects. Notably, the severity of item wording effects depends on respondent characteristics such as age (Bulut and Bulut 2022; Michaelides 2019; Yang et al. 2012), race (Lindwall et al. 2012; Michaelides 2019; Yang et al. 2012), reading abilities (Bolt et al. 2020; Bulut 2021; Gnambs and Schroeders 2020; Michaelides 2019; Weems et al. 2006), and education level (Bolt et al. 2020).

For self-report instruments with an empirically validated factor structure, item wording effects can be detected using factor analytic methods (e.g., confirmatory factor analysis [CFA] and exploratory item factor analysis [IFA; Bock et al. 1988]) and other methods based on item response theory (IRT). In CFA models, NW items are modeled as a separate factor distinct from the theoretical factor structure. The rationale is that if item wording effects are present, models with a separate factor for NW items will outperform the theoretical model assuming no wording effects (e.g., Gu et al. 2015; Lindwall et al. 2012; Michaelides 2019; Roszkowski and Soven 2010). Similarly, exploratory IFA can be used for identifying the optimal number of latent traits and the item–trait relationship in relation to item wording; however, exploratory IFA, as a data-driven approach, does not require specifying an item–trait structure beforehand. In the IRT-based methods, respondents' latent trait levels and item statistics are examined to evaluate item wording effects (e.g., Sliter and Zickar 2014; Wang et al. 2015). For example, Wang et al. (2015) used the bifactor IRT model to examine item wording effects and found that they produced an overestimation of test reliability and biased estimates of the latent trait. Bolt et al. (2020) adapted IRT mixture modeling for item wording effects and found similar results.

Despite their ease of use, the factor analytic and IRT-based approaches have several limitations in detecting item wording effects. First, the factor analytic and IRT-based approaches focus exclusively on the relationship between the latent trait and the items rather than examining relationships among the items. Therefore, these approaches fail to offer insights into how the items interact and mutually influence each other depending on their wording. Second, both approaches make strong statistical assumptions. For example, the factor analytic approaches assume a linear relationship between the latent trait and the item response options. The IRT-based approach also has specific assumptions regarding the model used (e.g., all items must have equal discrimination in the Rasch model) (Sliter and Zickar 2014). Third, both CFA and IRT require a causal structure (i.e., item–trait relationship) to be specified based on prior knowledge (Marsman et al. 2018). However, if the item–trait relationship is not entirely known or misspecified, then model parameters and additional parameters associated with item wording effects may be erroneously estimated (DiStefano and Motl 2006; Jin and Wang 2014; Kam and Meyer 2015; Wang et al. 2015). Although exploratory IFA can avoid the problems caused by erroneous item–trait specification, it also has some drawbacks affecting its utility in detecting item wording effects, such as high redundancy due to unnecessary model parameters and lower generalizability (Huang et al. 2017).

Psychometric Network Analysis

Psychometric network analysis (PNA; Borsboom and Cramer 2013) is an alternative method that can be applied to model latent traits (e.g., psychological traits). PNA has gained significant use in psychology, including the subfields of clinical psychology (e.g., Borsboom and Cramer 2013; Christensen et al. 2019; Fried and Nesse 2015), personality research (e.g., Briganti and Linkowski 2020), and intelligence research (e.g., Bulut et al. 2021; Golino and Demetriou 2017; Kan et al. 2020). Researchers have also used PNA for scale development, scale validation, dimensionality analysis, and dimension reduction (e.g., Bansal et al. 2020; Briganti and Linkowski 2020; Christensen et al. 2020; Golino and Demetriou 2017; Golino and Epskamp 2017; McGrew et al. 2023).

The PNA approach involves three steps (Epskamp et al. 2018:196): (1) estimating a statistical model based on data, resulting in a weighted network between observed variables; (2) analyzing the weighted network structure using indices (e.g., node strength); and (3) assessing the accuracy of the network parameters and measures. The outcomes of PNA are presented visually in a network graph where nodes (i.e., circles) represent observed variables, whereas edges (i.e., lines connecting the nodes) represent the statistical relationships (e.g., partial correlations) between nodes. In PNA, there are several indices for nodes and edges indicating the importance of nodes or edges to the given network. Edge weights are the partial correlation between two nodes. Thicker edges show stronger correlations, and the length of edges shows how quickly a node affects another node.
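As a minimal sketch of the node strength index mentioned in step 2, the Python snippet below sums the absolute edge weights attached to each node. The three nodes and their edge weights are invented for illustration and are not estimates from any data set; the function name `node_strength` is likewise hypothetical.

```python
# Illustrative only: these edge weights (partial correlations) are invented,
# not estimates from TIMSS or any other data set.
edges = {
    ("A", "B"): 0.40,
    ("A", "C"): 0.10,
    ("B", "C"): -0.25,
}


def node_strength(edge_weights):
    """Return, for each node, the sum of absolute weights of its edges."""
    strength = {}
    for (u, v), weight in edge_weights.items():
        # An undirected edge contributes to the strength of both endpoints.
        strength[u] = strength.get(u, 0.0) + abs(weight)
        strength[v] = strength.get(v, 0.0) + abs(weight)
    return strength


strengths = node_strength(edges)
for node in sorted(strengths):
    print(node, round(strengths[node], 2))
```

Absolute values are used so that a strong negative edge counts as much toward a node's strength as a strong positive one.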

In contrast to the CFA, IFA, and IRT methods, PNA offers several advantages in identifying item wording effects. First, PNA focuses exclusively on the correlations among the items and, thus, it may yield more accurate descriptions of expected and unexpected relationships among the items (Christensen et al. 2020). Second, PNA captures complex relationships, such as indirect connections among the items, thereby offering further insights into inconsistent interactions (e.g., item wording effects) that are not easily attainable through the factor analytic or IRT approaches (Borsboom et al. 2021). Third, PNA does not require any specification of item-trait relationships (i.e., factorial structure). Thus, researchers would not have to make assumptions about whether positively worded and NW items measure the same latent trait. Fourth, as a model-free approach, PNA enables the discovery of unexpected relationships and patterns in the data, making it particularly suitable for detecting anomalies, such as item wording effects. Lastly, PNA offers powerful visualizations that show the connections between items, making it easier to interpret findings effectively (Borsboom et al. 2021) compared to the factor analytic and IRT-based methods.

A noteworthy assumption of PNA is that networks are homogeneous across subgroups of a target population, such as demographic groups based on gender and race (Jones et al. 2020). As mentioned previously, the impact of item wording effects may vary depending on respondents' demographic characteristics (e.g., race, age, and gender). An extension of PNA, network model trees (Jones et al. 2020), provides a potential solution to examine the viability of this homogeneity assumption. Demographic variables can be treated as covariates in network trees and added to PNA to investigate the presence and intensity of item wording effects by subgroups. In summary, PNA offers greater flexibility in evaluating how the items are interconnected in the presence of item wording effects and whether the inter-item relationships depend on subgroup membership.

Current Study

In this study, we used PNA to examine item wording effects in two self-report measures related to students' psychological beliefs towards mathematics (i.e., Students Like Learning in Mathematics [SLLM] and Students Confident in Mathematics [SCM]). To our knowledge, this study represents one of the initial endeavors to leverage PNA for detecting item wording effects in self-reported measures. Additionally, it explores these effects across diverse groups utilizing network tree models. We investigated whether NW items in these scales constituted a different community (i.e., dimension) relative to the positively worded items. The study aimed to answer the following research questions:

1. Do the SLLM and SCM scales exhibit item wording effects due to NW items?

2. Can PNA detect item wording effects in SLLM and SCM?

3. Does the functioning of NW items in the network structures of SLLM and SCM vary based on country and grade level?

Method

Sample

Data were drawn from fourth and eighth graders (n = 75,972) who participated in the 2019 Trends in International Mathematics and Science Study (TIMSS). TIMSS is an international, large-scale assessment that evaluates students’ learning outcomes in mathematics and science every four years in the fourth and eighth grades. In addition to assessing student performance, TIMSS collects data on different aspects of students’ home and school lives, such as demographic information and contextual information (e.g., home learning environment, school climate, and attitudes toward learning mathematics and science).

The sample of this study included eight countries: Japan, Korea, Hong Kong, Chinese Taipei, Kazakhstan, Turkey, Morocco, and Saudi Arabia. These countries were selected to obtain a heterogeneous sample in terms of cultures, attitudes, and math achievement (Mullis et al. 2020). These countries were either at the bottom or top regarding their attitude scale scores in both grades, as reported in the TIMSS 2019 report (Mullis et al. 2020). For achievement, Japan, Korea, Hong Kong, and Chinese Taipei were the top performers; Turkey and Kazakhstan were in the middle range; and Morocco and Saudi Arabia were in the lower range of average mathematics scores. Sample sizes by country were as follows: 8,642 (50% female, 49% fourth grade) for Japan, 7,754 (49% female, 50% fourth grade) for Korea, 6,233 (46% female, 48% fourth grade) for Hong Kong, 8,680 (49% female, 43% fourth grade) for Chinese Taipei, 9,244 (49% female, 52% fourth grade) for Kazakhstan, 8,105 (51% female, 50% fifth grade) for Turkey, 16,181 (49% female, 48% fourth grade) for Morocco, and 11,133 (51% female, 49% fourth grade) for Saudi Arabia.

Instruments

The SLLM and SCM scales each consist of nine items rated on a four-point Likert scale (1 = Agree a lot, 2 = Agree, 3 = Disagree, 4 = Disagree a lot). These scales were selected for this study because they were administered at both the fourth- and eighth-grade levels, with only minor modifications in two items. Also, both scales contained NW items: two items on the SLLM and five on the SCM are phrased negatively (see Appendix A). Both scales demonstrated sufficient internal consistency in fourth- and eighth-grade samples (Yin and Fishbein 2020).

Data Analysis

Student responses to positively worded items were reverse coded so that higher scores would indicate greater confidence or liking for learning mathematics (i.e., 1 = Disagree a lot, 2 = Disagree, 3 = Agree, 4 = Agree a lot). To examine item wording effects, we applied exploratory graph analysis (EGA; Golino and Epskamp 2017) to each scale separately to identify the number of dimensions in the networks of SLLM and SCM. EGA is a data-driven network analysis approach that can reveal communities (i.e., dimensions) in networks without pre-specifying a factorial structure (Christensen and Golino 2021). In this study, we anticipated that if substantial item wording effects were present in SLLM and SCM, then positively worded and NW items in each scale would form distinct communities in their networks.
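The reverse-coding step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the item codes come from Appendix A (eighth-grade SLLM form), while the function names and the example response record are hypothetical.

```python
# Positively worded SLLM items, taken from Appendix A (eighth-grade codes).
POSITIVE_ITEMS = {
    "BSBM16A", "BSBM16D", "BSBM16E", "BSBM16F",
    "BSBM16G", "BSBM16H", "BSBM16I",
}


def reverse_code(response, n_categories=4):
    """Map 1..n_categories onto n_categories..1 (1 <-> 4 and 2 <-> 3 here)."""
    if not 1 <= response <= n_categories:
        raise ValueError(f"response {response} is outside 1..{n_categories}")
    return n_categories + 1 - response


def recode_record(record):
    """Reverse code positively worded items only; leave NW items unchanged."""
    return {
        item: reverse_code(value) if item in POSITIVE_ITEMS else value
        for item, value in record.items()
    }


# Hypothetical raw responses on the original scale (1 = Agree a lot).
raw = {"BSBM16A": 1, "BSBM16B": 4, "BSBM16C": 3, "BSBM16E": 2}
recoded = recode_record(raw)
print(recoded)
```

After recoding, a student who agrees a lot with "I enjoy learning mathematics" and disagrees a lot with "I wish I did not have to study mathematics" scores 4 on both items, so all items point in the same direction.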

In EGA, we used the polychoric correlations among the items to estimate a Gaussian Graphical Model (GGM) where edges represented partial correlations between the items in each scale, after controlling for all other relations among the items. To estimate the GGM, we used the Graphical Least Absolute Shrinkage and Selection Operator (GLASSO) algorithm in the EGAnet package (Christensen et al. 2020; Golino and Christensen 2022) in R (R Core Team 2022). To identify distinctive communities in the networks, we used the walktrap community detection algorithm (Pons and Latapy 2006), which performs iterative and random walks over the network, provides a similarity measure based on these random walks, and then identifies distinctive communities.
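To make the notion of an edge as a partial correlation concrete, the sketch below derives partial correlations from a correlation matrix by inverting it and standardizing the off-diagonal elements of the inverse (the precision matrix). It is a simplified illustration only: it omits the GLASSO regularization and the polychoric estimation described above, the 3 × 3 correlation matrix is invented, and the helper names are hypothetical rather than part of EGAnet.

```python
def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity matrix on the right.
    aug = [
        [float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(corr):
    """Partial correlation of each pair, controlling for all other variables."""
    k = invert(corr)  # precision matrix
    n = len(k)
    return [
        [1.0 if i == j else -k[i][j] / (k[i][i] * k[j][j]) ** 0.5 for j in range(n)]
        for i in range(n)
    ]


# Invented correlation matrix for three items (illustration only).
corr = [
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.4],
    [0.3, 0.4, 1.0],
]
pcor = partial_correlations(corr)
for row in pcor:
    print([round(x, 3) for x in row])
```

In a regularized estimator such as GLASSO, small elements of the precision matrix are shrunk to exactly zero, which removes the corresponding edges from the network; the unregularized version above keeps every edge.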

One of the network measures in EGA (node strength) is roughly equivalent to a factor loading in CFA models (Christensen and Golino 2021), and EGA produces the same goodness-of-fit indices as CFA, allowing for direct comparisons with CFA models (Golino and Demetriou 2017; Golino and Epskamp 2017). Previous studies indicated that EGA is a more effective approach for identifying the number of dimensions than factor analytic techniques such as parallel analysis (Golino and Demetriou 2017; Golino and Epskamp 2017). To confirm the findings of EGA, the factor structures of SLLM and SCM were also examined using a one-factor CFA model (i.e., a single factor based on positively worded items and NW items), a two-factor CFA model (i.e., two separate factors for positively worded items and NW items), and a bi-factor model (i.e., a general factor for all items and two additional factors for positively worded and NW items) using the lavaan package (Rosseel 2012) in R (R Core Team 2022). The factor models were estimated using the weighted least squares mean and variance adjusted (WLSMV) estimator. Goodness-of-fit criteria, including root mean square error of approximation (RMSEA), Tucker-Lewis index (TLI), and comparative fit index (CFI), were used to confirm sufficient model-data fit for each model (RMSEA ≤ 0.06, TLI ≥ 0.95, and CFI ≥ 0.95; Hu and Bentler 1999).

To identify potential subgroups in the networks of SLLM and SCM, we used network model trees based on the model-based recursive partitioning approach (MOB; Zeileis et al. 2008) available in the networktree package (Jones et al. 2020). The MOB algorithm uses covariates to split the network into subnetworks where the network model parameters are maximally heterogeneous (Jones et al. 2020). The package, specifically the comparetree function, offers an adjacency matrix illustrating the variances for each edge, facilitating the assessment of large discrepancies between the networks. With the network model trees, we examined whether the networks of SLLM and SCM would differ based on age groups and country for the students participating in TIMSS 2019. Age was selected as a covariate because previous research showed that age and reading comprehension (partly a function of age) could be associated with item wording effects (Bolt et al. 2020; Bulut 2021; Bulut and Bulut 2022; Michaelides 2019). Country was selected to explore the impact of different languages on the resulting networks, as each country included in the sample of this study speaks a different language.¹

Results

The Dimensionality of SLLM and SCM

The EGA results revealed two distinct communities (one for positively worded items and another for NW items) for both the SLLM and SCM scales. As shown in Figure 1, the items with the same type of wording share stronger connections (i.e., thicker edges) with each other. It should be noted that responses to positively worded items were reverse coded before performing psychometric network analysis. Thus, all the edges in the networks shared the same color. In the network plot of SLLM, the strongest of all connections appears to be the one between the NW items. In the network plot of SCM, the distinction between the communities for the positively worded items and NW items seems more evident based on how the items are clustered together.

Figure 1.

Network plots of (a) SLLM and (b) SCM.

Table 1 presents the results of the CFA and EGA models for the SLLM and SCM scales. The CFA results indicated that the network models derived from EGA fit the data better than the one-factor model in both SLLM and SCM. The model-fit differences among the one-factor model, the two-factor model, and the network model seemed to be negligible in SLLM. This finding was not necessarily surprising because the number of NW items in SLLM (two of the nine items) was very small relative to the number of positively worded items. The bi-factor model did not converge properly for SLLM as it produced negative factor loadings and variance estimates. Unlike SLLM, the SCM scale showed that the two-factor CFA model, the bi-factor model, and the network model outperformed the one-factor CFA model based on all fit indices, supporting the presence of item wording effects in SCM. Table 1 also shows that the two-factor CFA and the network models obtained from EGA fit the data equally well, providing nearly identical fit statistics. In contrast, the bi-factor model indicated a relatively worse fit. This finding suggests that the network model derived from EGA can be used to reveal psychometric issues leading to the distortion of factor structures (e.g., item wording effects) in self-report measures without requiring any prior hypothesis on the factorial structure (e.g., separate factors based on item wording or a general factor with additional factors based on item wording).

Table 1.

Model Fit Indices for the CFA and EGA Models for the SLLM and SCM Scales.

Scale	Model	χ²	df	RMSEA	CFI	TLI
SLLM	One-factor	4867.33	27	0.05	0.99	0.99
	Two-factor	468.08	26	0.02	1.00	0.99
	Bi-factor	—	—	—	—	—
	EGA	1890.19	26	0.03	1.00	1.00
SCM	One-factor	30136.12	27	0.13	0.92	0.91
	Two-factor	3226.37	26	0.04	0.99	0.99
	Bi-factor	12061.18	21	0.09	0.97	0.95
	EGA	8354.72	26	0.05	1.00	0.99

Note: RMSEA: Root mean squared error of approximation; CFI: Comparative Fit Index; TLI: Tucker-Lewis Index. The bi-factor model results for the SLLM scale are not available since the model failed to converge and produced negative variance estimates.
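As a reading aid for Table 1, the following Python sketch applies the Hu and Bentler (1999) cutoffs cited in the Data Analysis section (RMSEA ≤ 0.06, CFI ≥ 0.95, TLI ≥ 0.95) to the reported fit indices. The values are copied from Table 1 as printed, so they are rounded to two decimals and borderline cases could differ with unrounded statistics; the names `TABLE_1` and `meets_cutoffs` are hypothetical.

```python
# (scale, model) -> (RMSEA, CFI, TLI), copied from Table 1 as reported.
# The SLLM bi-factor model is omitted because it failed to converge.
TABLE_1 = {
    ("SLLM", "One-factor"): (0.05, 0.99, 0.99),
    ("SLLM", "Two-factor"): (0.02, 1.00, 0.99),
    ("SLLM", "EGA"): (0.03, 1.00, 1.00),
    ("SCM", "One-factor"): (0.13, 0.92, 0.91),
    ("SCM", "Two-factor"): (0.04, 0.99, 0.99),
    ("SCM", "Bi-factor"): (0.09, 0.97, 0.95),
    ("SCM", "EGA"): (0.05, 1.00, 0.99),
}


def meets_cutoffs(rmsea, cfi, tli):
    """Check the three fit indices against the Hu and Bentler (1999) cutoffs."""
    return rmsea <= 0.06 and cfi >= 0.95 and tli >= 0.95


results = {key: meets_cutoffs(*values) for key, values in TABLE_1.items()}
for (scale, model), ok in results.items():
    print(f"{scale:5s} {model:11s} {'meets' if ok else 'fails'} all three cutoffs")
```

With the rounded values, every converged SLLM model meets the cutoffs, whereas for SCM the one-factor and bi-factor models fall short on RMSEA, which matches the pattern described in the text.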

Network Model Trees of SLLM and SCM

The network model trees of SLLM and SCM are extensive, with numerous splits. These splits are determined by the variances observed for each edge and thus reflect significant discrepancies between the subnetworks. Figure 2 shows the network tree of the SCM scale based on country and grade level. The primary split occurred between high-performing countries (except for Kazakhstan) and lower-performing countries, followed by additional splits based on the grade level. These findings indicate potential heterogeneity in the network of SCM due to differences at the country and grade levels. Despite these significant splits based on country and grade level, the NW items of SCM (red nodes) remained disconnected from the positively worded items (green nodes) in the network plots across all terminal nodes. However, there are some minor differences among the terminal nodes. For example, the item communities based on positively worded items and NW items were connected with negative edges (see the red lines) for some countries and grade levels, such as both fourth and eighth graders in Hong Kong and Saudi Arabia. Similar trends were also observed between the items of the SLLM scale (see Appendix B). Overall, the network tree analyses suggest that the NW items continue to produce a distinctive community in the network structures of SLLM and SCM, regardless of the two covariates, country and grade level.

Figure 2.

The network model tree of SCM.

Discussion

This study used PNA to investigate the effects of item wording on the SLLM and SCM scales. By dividing the network model trees by country and age, we explored item wording effects among different student groups. The PNA approach allowed us to examine the relationships between positively worded and NW items and visualize item clusters based on their wording. Our findings provided evidence of item wording effects in both scales, with NW items forming a distinct community within the network structures. These items were also found to have stronger relationships with each other than with positively worded items, suggesting that students may have different tendencies when endorsing them. These results were consistent with previous studies that employed different methods for detecting item wording effects (e.g., Bolt et al. 2020; DiStefano and Motl 2006; Gnambs and Schroeders 2020; Kam and Meyer 2023; Lindwall et al. 2012; Michaelides 2019; Wang et al. 2015) and demonstrate the usefulness of PNA in revealing such relationships. Network model trees helped us further examine the connectivity and importance of NW items by partitioning the network structures based on country and grade level. The partitioned network structures showed the unique interactions between negatively and positively worded items across different countries and grade levels. Of these two covariates, country had the stronger influence, as it determined the initial splits. The findings of our study reinforce previous research regarding the impact of age and culture on item wording effects (e.g., Michaelides 2019; Weems et al. 2006).

Furthermore, our findings highlight the significant impact of cultural factors on the diversity observed within the network models, with cultural influences often overshadowing those of grade level or age. Considering the profound influence of culture on language, numerous studies also indicated that respondents’ cultural backgrounds could significantly shape their perception of NW items in different languages (Michaelides 2019; Schmitt and Allik 2005). Thus, cultural influences on the perception of NW effects may outweigh the influence of age on such perceptions. In another study conducted across five European countries, Lindwall et al. (2012) reported similar findings, with item wording effects models remaining consistent across different age groups but varying significantly across countries. This body of evidence underscores the necessity of considering cultural factors when interpreting perceptions and responses within network models.

Overall, compared to the methods utilized in previous studies (e.g., multi-group factor analysis), network model trees seem to detect significant group differences more easily, signaling the presence of wording effects for a particular demographic group (i.e., identifying meaningful subgroups in the data) (Jones et al. 2020; Zeileis et al. 2008).

Implications for Practice

Our findings suggest that it is possible to harness PNA to detect item wording effects. PNA offers unique, item-level insights into the functioning of NW items that cannot be examined through conventional models without specifying a causal relationship between the items and the target constructs being studied. Examining the network structure of a scale can help researchers and practitioners understand how NW items relate to other items in the scale. We also want to note that item wording effects may emerge due to careless and insufficient effort (C/IE) responses (Arias et al. 2020).² Therefore, researchers and practitioners should consider applying network methods and other methods to detect C/IE responses to better understand the functioning of NW items in their work.

Our findings also highlight the utility of network model trees in identifying wording effects across demographic subgroups. This presents an opportunity for researchers and practitioners to use network model trees to gain further insight into which covariates can reduce or exacerbate item wording effects in their own settings. A notable advantage of this method is that the impact of multiple demographic variables can be analyzed together within the same network model. That is, network model trees consider multiple covariates in the analysis and retain only the significant ones, instead of running the analysis separately for each covariate. This approach can also consider complex interactions among the covariates (e.g., age × gender) while examining item wording effects. Additionally, unlike multi-group CFA models, network model trees provide a tree-like structure that identifies which variables are more important in explaining the heterogeneity of network models. Thus, the network tree-based results offer a comprehensive view encompassing all covariates and facilitate a more nuanced understanding of item-level dynamics.

Limitations and Future Research

There are several limitations to our study. First, PNA was applied to two relatively short scales using a large sample. Consequently, the generalizability of the network approach to longer scales or smaller sample sizes remains uncertain. Additionally, both scales employed a four-point Likert scale in our study. As the number of response options increases, respondents may interact differently with positively worded and NW items. Thus, subsequent research is needed to elucidate the impact of varying response options on the detection of item wording effects in self-report measures.

Furthermore, our study presents initial findings on the comparison between PNA and factor analytic approaches in detecting item wording effects. To deepen our understanding, future studies should undertake a comprehensive examination encompassing multiple methodologies such as PNA, factor analysis, and IRT-based methods. A simulation study incorporating diverse conditions, such as the number of NW items, the proportion of these items to positively worded ones, sample size, and the total number of items in a self-report measure, would contribute substantially to refining our insights into the intricacies of detecting and interpreting item wording effects.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Okan Bulut

Notes

Appendix A

Table A1.

Items of SLLM and Their Codes.

BSBM16A	I enjoy learning mathematics
BSBM16B	I wish I did not have to study mathematics
BSBM16C	Mathematics is boring
BSBM16D	I learn many interesting things in mathematics
BSBM16E	I like mathematics
BSBM16F	I like any schoolwork that involves numbers
BSBM16G	I like to solve mathematics problems
BSBM16H	I look forward to mathematics lessons
BSBM16I	Mathematics is one of my favorite subjects

Note: For the fourth graders, item codes are ASBM02A to ASBM02I in the same order.

Table A2.

Items of SCM and Their Codes.

BSBM19A	I usually do well in mathematics
BSBM19B^a	Mathematics is more difficult for me than for many of my classmates
BSBM19C^b	Mathematics is not one of my strengths
BSBM19D	I learn things quickly in mathematics
BSBM19E	Mathematics makes me nervous
BSBM19F	I am good at working out difficult mathematics problems
BSBM19G	My teacher tells me I am good at mathematics
BSBM19H	Mathematics is harder for me than any other subject
BSBM19I	Mathematics makes me confused

Note: For fourth graders, item codes are ASBM02A to ASBM02I in the same order.

^aThe adjective is “harder”.

^bThe item is “I am just not good at mathematics” in the SCM scale for the fourth graders.

SOURCE: IEA’s Trends in International Mathematics and Science Study - TIMSS 2019 Downloaded from https://timss2019.org/download Mullis, I. V. S., Martin, M. O., Foy, P., Kelly, D. L., & Fishbein, B. (2020). TIMSS 2019 International Results in Mathematics and Science. Retrieved from Boston College, TIMSS & PIRLS International Study Center website: https://timssandpirls.bc.edu/timss2019/international-results/

Appendix B

Figure B1.

The network model tree of SLLM.

References

1. Arias V. B., Garrido L. E., Jenaro, Martinez-Molina, and Arias. 2020. A little garbage in, lots of garbage out: Assessing the impact of careless responding in personality survey data. Behavior Research Methods 52:2489–505.

2. Bansal P. S., Goh P. K., Lee C. A., and Martel M. M. 2020. Conceptualizing callous-unemotional traits in preschool through confirmatory factor and network analysis. Journal of Abnormal Child Psychology 48:539–50.

3. Bock R. D., Gibbons, and Muraki. 1988. Full-information item factor analysis. Applied Psychological Measurement 12:261–80.

4. Bolt, Wang Y. C., Meyer R. H., and Pier. 2020. An IRT mixture model for rating scale confusion associated with negatively worded items in measures of social–emotional learning. Applied Measurement in Education 33:331–48.

5. Borsboom and Cramer A. O. J. 2013. Network analysis: An integrative approach to the structure of psychopathology. Annual Review of Clinical Psychology 9:91–121.

6. Borsboom, Deserno M. K., Rhemtulla, Epskamp, Fried E. I., McNally R. J., Robinaugh D. J., Perugini, Dalege, Costantini, Isvoranu A.-M., Wysocki A. C., van Borkulo C. D., van Bork, and Waldorp L. J. 2021. Network analysis of multivariate data in psychological science. Nature Reviews Methods Primers 1:1–18.

7. Bulut H. C. 2021. Item wording effects in psychological measures: Do early literacy skills matter? Journal of Measurement and Evaluation in Education and Psychology 12:239–53.

8. Bulut H. C. and Bulut. 2022. Item wording effects in self-report measures and reading achievement: Does removing careless respondents help? Studies in Educational Evaluation 72:101–26.

9. Bulut, Cormier D. C., Aquilina, and Bulut H. C. 2021. Age and sex invariance of the Woodcock-Johnson IV tests of cognitive abilities: Evidence from psychometric network modeling. Journal of Intelligence 9:1–16.

10. Briganti and Linkowski. 2020. Exploring network structure and central items of the Narcissistic Personality Inventory. International Journal of Methods in Psychiatric Research 29:1–7.

11. Christensen A. P. and Golino. 2021. Estimating the stability of psychological dimensions via bootstrap exploratory graph analysis: A Monte Carlo simulation and tutorial. Psych 3:479–500.

12. Christensen A. P., Golino, and Silvia P. J. 2020. A psychometric network perspective on the validity and validation of personality trait questionnaires. European Journal of Personality 34:1095–108.

13. Christensen A. P., Gross G. M., Golino H. F., Silvia P. J., and Kwapil T. R. 2019. Exploratory graph analysis of the multidimensional Schizotypy Scale. Schizophrenia Research 206:43–51.

14. DiStefano and Motl R. W. 2006. Further investigating method effects associated with negatively worded items on self-report surveys. Structural Equation Modeling 13:440–64.

15. Epskamp, Borsboom, and Fried E. I. 2018. Estimating psychological networks and their accuracy: A tutorial paper. Behavior Research Methods 50:195–212.

16. Fried I. E. and Nesse R. M. 2015. Depression is not a consistent syndrome: An investigation of unique symptom patterns in the STAR*D study. Journal of Affective Disorders 172:96–102.

17. Gnambs and Schroeders. 2020. Cognitive abilities explain wording effects in the Rosenberg Self-esteem Scale. Assessment 27:404–18.

18. Golino and Christensen A. P. 2022. EGAnet: Exploratory graph analysis – a framework for estimating the number of dimensions in multivariate data using network psychometrics. https://CRAN.R-project.org/package=EGAnet (accessed October 12, 2022).

19. Golino F. H. and Demetriou. 2017. Estimating the dimensionality of intelligence like data using exploratory graph analysis. Intelligence 62:54–70.

20. Golino F. H. and Epskamp. 2017. Exploratory graph analysis: A new approach for estimating the number of dimensions in psychological research. PloS One 12:1–26.

21. Gu, Wen, and Fan. 2015. The impact of wording effect on reliability and validity of the core self-evaluation scale (CSES): A bi-factor perspective. Personality and Individual Differences 83:142–47.

22. Hinz A., D. M., Schwarz, and Herzberg P. Y. 2007. The acquiescence effect in responding to a questionnaire. GMS Psycho-Social Medicine 4:1–9.

23.

L.-t.

Bentler

P. M.

. 1999. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal 6:1–55.

24.

Huang

P. H.

Chen

Weng

L. J.

. 2017. A penalized likelihood method for structural equation modeling. Psychometrika 82:329–54.

25.

Jin

K. Y.

Wang

W. C.

. 2014. Item response theory models for performance decline during testing. Journal of Educational Measurement 51:178–200.

26.

Jones

P. J.

Mair

Simon

Zeileis

. 2020. Network trees: A method for recursively partitioning covariance structures. Psychometrika 85:926–45.

27.

Kam

C. C. S.

Meyer

J. P.

. 2015. How careless responding and acquiescence response bias can influence construct dimensionality: The case of job satisfaction. Organizational Research Methods 18:512–41.

28.

Kam

C. C. S.

Meyer

J. P.

. 2023. Testing the nonlinearity assumption underlying the use of reverse-keyed items: A logical response perspective. Assessment 30:1569–89.

29.

Kam

C. C. S.

Meyer

J. P.

Sun

. 2021. Why do people agree with both regular and reversed items? A logical response perspective. Assessment 28:1110–24.

30.

Kan

J. K.

de Jonge

van der Maas

H. L. J.

Levine

S. Z.

Epskamp

. 2020. How to compare psychometric factor and network models. Journal of Intelligence 8:35.

31.

Lindwall

Barkoukis

Grano

Lucidi

Raudsepp

Liukkonen

Thøgersen-Ntoumani

. 2012. Method effects: The problem with negatively versus positively keyed items. Journal of Personality Assessment 94:196–204.

32.

Marsman

Barkoukis

Grano

Lucidi

Raudsepp

Liukkonen

Thøgersen-Ntoumani

. 2018. An introduction to network psychometrics: Relating Ising network models to item response theory models. Multivariate Behavioral Research 53:15–35.

33.

McGrew

S. K.

Schneider

J. W.

Decker

S. L.

Bulut

. 2023. A psychometric network analysis of CHC intelligence measures: Implications for research, theory and interpretation of broad CHC scores “beyond g.” Journal of Intelligence 11:19.

34.

Michaelides

P. M.

2019. Negative keying effects in the factor structure of TIMSS 2011 motivation scales and associations with reading achievement. Applied Measurement in Education 32:365–78.

35.

Mullis

V. S. I.

Martin

M. O.

Foy

Kelly

D. L.

Fishbein

. 2020. TIMSS 2019 international results in mathematics and science. Boston College, TIMSS & PIRLS International Study Center. https://timssandpirls.bc.edu/timss2019/ (accessed December 29, 2021).

36.

Podsakoff

M. P.

MacKenzie

S. B.

Lee

J.-Y.

Podsakoff

N. P.

. 2003. Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology 88:879–903.

37.

Pons

Latapy

. 2006. Computing communities in large networks using random walks. Journal of Graph Algorithms and Applications 10:191–218.

38.

R Core Team . 2022. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.

39.

Rosseel

2012. Lavaan: An R package for structural equation modeling. Journal of Statistical Software 48:1–36.

40.

Roszkowski

J. M.

Soven

. 2010. Shifting gears: Consequences of including two negatively worded items in the middle of a positively worded questionnaire. Assessment & Evaluation in Higher Education 35:113–30.

41.

Schmitt

P. D.

Allik

. 2005. Simultaneous administration of the Rosenberg Self-Esteem Scale in 53 nations: Exploring the universal and culture-specific features of global self-esteem. Journal of Personality and Social Psychology 89:623–42.

42.

Sliter

A. K.

Zickar

M. J.

. 2014. An IRT examination of the psychometric functioning of negatively worded personality items. Educational and Psychological Measurement 74:214–26.

43.

Wang

W.-C.

Chen

H.-F.

Jin

K.-Y.

. 2015. Item response theory models for wording effects in mixed-format scales. Educational and Psychological Measurement 75:157–78.

44.

Weems

H. G.

Onwuegbuzie

A. J.

Collins

K. M. T.

. 2006. The role of reading comprehension in responses to positively and negatively worded items on rating scales. Evaluation & Research in Education 19:3–20.

45.

Weijters

Baumgartner

. 2012. Misresponse to reversed and negated items in surveys: A review. Journal of Marketing Research 49:737–47.

46.

Yang

Chen

Y.-H.

W.-J.

Turner

J. E.

. 2012. Cross-cultural evaluation of item wording effects on an attitudinal scale. Journal of Psychoeducational Assessment 30:509–19.

47.

Yin

Fishbein

. 2020. Creating and interpreting the TIMSS 2019 context questionnaire scales. In Methods and procedures: TIMSS 2019 technical report, eds. Martin

M. O.

von Davier

Ina Mullis

V. S.

, 16.1–16.331. Boston College: TIMSS & PIRLS International Study Center. https://timssandpirls.bc.edu/timss2019/ (accessed February 10, 2024).

48.

Zeileis

Hothorn

Hornik

. 2008. Model-based recursive partitioning. Journal of Computational and Graphical Statistics 17:492–514.

49.

Zeng

Wen

Zhang

. 2020. How does the valence of wording affect features of a scale? The method effects in the undergraduate learning burnout scale. Frontiers in Psychology 11:585179.

A Psychometric Network Analysis Approach for Detecting Item Wording Effects in Self-report Measures across Subgroups
