
Abstract

Dominance and unfolding response processes describe two ways in which individuals may respond to rating scale items. The dominance process assumes a monotonic relationship between a latent trait and the probability of endorsement and is typically modeled using a linear factor model within structural equation modeling (SEM). In contrast, the unfolding process assumes single-peaked response functions, with endorsement most likely when item and person locations are close on the latent continuum. Fitting unfolding models usually requires specialized software, which limits their integration with SEM. In this article, we proposed the ordered categorical response unfolding model (OCRUM), which can be estimated in Mplus. We illustrated its use with two empirical datasets and found that item and person locations were comparable to those obtained from the generalized graded unfolding model (GGUM). We also conducted Monte Carlo simulations to examine parameter recovery under varying sample sizes, test lengths, and response formats. Finally, we demonstrated that OCRUM can serve as the measurement component of a general structural equation model, enabling dominance and unfolding response processes to be represented within a single SEM framework.

Keywords

unfolding model, ideal point, structural equation modeling, nonlinear models

Organizational research commonly uses rating scales. Respondents are typically presented with statements describing behaviors, feelings, or attitudes and are instructed to rate the extent to which the items accurately describe a target (e.g., self, supervisors, subordinates, team, or organization). A notable rating scale is the NEO Personality Inventory-Revised (NEO-PI-R) (Costa & McCrae, 1992), a well-known measure of the five domains of personality. It requires individuals to evaluate each statement on a rating scale comprising five response options: “strongly disagree,” “disagree,” “neutral,” “agree,” and “strongly agree.”

The development of rating scale methods can be traced back to Thurstone's (1928) and Likert's (1932) pioneering work on attitude measurement (Krosnick et al., 2005). Coombs (1964) noted that these pioneers made two divergent assumptions on how individuals would respond to the rating scale items. Thurstone (1928) assumed that a person would endorse items that accurately described their attitude level and would not endorse those too dissimilar. This response process, called the unfolding or ideal-point response process by Coombs (1964), implies a single-peaked response function (Andrich, 1996; Roberts et al., 1999; Stark et al., 2006; Thurstone & Chave, 1929). To illustrate, if individuals are presented with an item describing moderate conscientiousness (e.g., I write notes to myself only if I have too many things to do at once; an example taken from Chernyshenko et al., 2007), those with extremely high and low degrees of conscientiousness will not endorse it; a person with high conscientiousness will almost always write notes, regardless of the number of tasks at hand, while a person with low conscientiousness will almost never write notes. In contrast, an individual with moderate conscientiousness is more likely to endorse the item as it most accurately describes their behavior.

Likert (1932) proposed the use of total scores as a scale score estimate and item–total correlations as an indicator of item quality. This implied a different underlying response process, which Coombs (1964) referred to as the dominance response process. This process is most easily understood in the context of cognitive ability items, where individuals with higher ability levels are expected to endorse (i.e., answer correctly) more items on the scale compared with those with lower ability levels. Therefore, the dominance process assumes that the probability of endorsement increases as a person's trait level increases, leading to a monotonically increasing item response function rather than a single-peaked one. Most psychometric models used in current organizational research, such as classical test theory, linear factor analysis, and logistic item response theory (IRT), are consistent with the dominance response process. Notably, almost all published structural equation modeling (SEM) analyses use the linear factor model as the primary measurement model for relating latent factors to observed responses.

However, in the past two decades, numerous researchers have argued that models consistent with the ideal-point response process should be used when rating scales are employed to measure constructs in noncognitive domains, such as attitudes (Andrich, 1996; Roberts et al., 1999), personality (Chernyshenko et al., 2007; Stark et al., 2006), and vocational interests (Tay et al., 2009). Tay et al. (2009; Tay & Ng, 2018) contended that it is important to consider whether a scale is designed to measure maximal or typical behavior when selecting an appropriate measurement model. Assessments of maximal behavior, such as mental or physical ability, aim to measure a person's maximum capacity. Respondents in such assessments are typically motivated to answer items correctly, and items below a person's location (i.e., relatively easy items) on the latent trait continuum have a higher probability of being answered correctly; conversely, items above the person's location (i.e., relatively difficult items) have a lower probability of being answered correctly. Hence, a dominance response process is most likely to be observed in assessments where an objective or socially correct answer is present.

Conversely, in assessments of typical behavior, such as personality or attitudes, individuals are typically asked to rate the extent to which a statement describes themselves. This is likely to elicit introspection, in which people compare the item with themselves to determine whether its description matches their own perception. Therefore, people are more likely to endorse items that are closer to their location on the latent trait continuum (Drasgow et al., 2010; Stark et al., 2006; Tay et al., 2009). This response process is consistent with the tenets of the unfolding response process.

Misspecification of the underlying response process is undesirable for several reasons. First, the use of incorrect measurement models can lead to model–data misfit and introduce bias into parameter estimates (Reise, 2010). Second, as demonstrated by Chernyshenko et al. (2007), many moderately worded items are unduly removed from item pools because they are deemed to be of low quality by many dominance model statistics (such as weak corrected item–total correlations, small factor loadings, and low IRT discrimination parameters). However, under unfolding (or ideal point) models, these items can be highly discriminating and usefully retained for scale construction to increase reliability (Chernyshenko et al., 2007; Stark et al., 2006). Finally, using an incorrect model for scoring may change examinees’ rank order, especially at extreme trait levels, which can affect the quality of selection decisions, as shown by Carter et al. (2014).¹

Unfolding (Ideal-Point) Models

Over the past few decades, several unfolding models have been developed for rating scale response data. The two most commonly used unfolding models are the graded unfolding model (GUM; Roberts & Laughlin, 1996) and the generalized GUM (GGUM; Roberts et al., 2000). Recent studies have proposed multidimensional extensions of the GGUM (e.g., confirmatory multidimensional GGUM [CMGGUM]; Wang & Wu, 2015). However, only limited applications have appeared in organizational and psychological practice. One reason is that most of the current unfolding models require specialized or standalone software, such as GGUM2004 for the GGUM (Roberts et al., 2000, 2006) and WinBUGS for CMGGUM (Wang & Wu, 2015), which many users may not find easy to use (e.g., Foster et al., 2017).

Importantly, psychologists and organizational scientists are often interested in investigating relationships between latent constructs after the latent variables have been measured. This is commonly achieved via SEM. However, the measurement part of most structural equation models is typically specified as a linear factor analysis model, which is more appropriate for representing dominance response processes. Although many unfolding models have been proposed for measurement, empirical applications of SEM that assess relationships between latent traits and unfolding/nonmonotonic response mechanisms are rare. Assessing such relationships within a unified SEM framework enables organizational researchers to test theories linking latent constructs when responses follow dominance and unfolding processes (e.g., Cortina et al., 2020; Williams, 2003; Zyphur et al., 2023). Adding our model for unfolding data to the currently available linear models for dominance data now makes it possible to represent both response processes within a single SEM framework that can be readily implemented in Mplus. While trait estimates from specialized software for fitting unfolding models can be saved and later used in regression, as either predictors or outcomes, this two-step approach has been found to produce biased path coefficient estimates (Lu et al., 2005). One notable attempt to address this limitation was made by Usami (2011), who incorporated the GGUM into the SEM framework so that both GGUM parameters and structural coefficients could be estimated simultaneously to avoid parameter bias. His approach, however, was computationally intensive, relied on Markov chain Monte Carlo (MCMC) methods, and could accommodate the latent trait only as an outcome (endogenous) variable, not as an antecedent (exogenous) variable.

To address these challenges, this study proposes modeling the single-peaked (nonmonotonic) relationship between endorsement (observed responses) and the latent trait using a metric unfolding model (Brady, 1989; Davison, 1977; van Schuur & Kiers, 1994). Such a model can be shown to be mathematically equivalent to a quadratic factor analysis model (Maraun & Rossi, 2001), which can be fitted via the latent moderated structural (LMS) approach implemented in Mplus (Muthén & Muthén, 1998–2017). Further details are provided in the next section, where we discuss how to handle dichotomous and polytomous data using the proposed approach. For ease of reference, we refer to our proposed model as the ordered categorical response unfolding model (OCRUM). We report three studies to illustrate the formulation, empirical application, and implementation of the OCRUM via the SEM software Mplus. In Study 1, we compare item and person parameter estimates obtained from the OCRUM with those obtained from the GGUM using two published empirical datasets. In Study 2, we conduct a detailed investigation of OCRUM parameter recovery and model–data fit under various design factors via Monte Carlo simulations. In Study 3, we consider a full structural equation model with both structural and measurement components that involve a mix of unfolding responses (modeled by the OCRUM) and dominance responses (modeled by a linear factor model). In particular, we evaluate the accuracy of estimating structural coefficients in such scenarios via Monte Carlo simulations.

Ordered Categorical Response Data

In social science research, it is common to use rating scales (e.g., Likert-type formats with 4 to 6 points) as questionnaire response formats and to treat the resulting responses as ordinal. Therefore, it is often appropriate to model such data with an ordered categorical measurement model, especially when the multivariate normality assumption for observed responses is not justifiable. The approach for modeling ordered categorical data in SEM is similar to that for continuous data, except that item thresholds (τ) are estimated instead of item intercepts (intercepts are fixed to 0) and error variances are not estimated for identification reasons (Bovaird & Koziol, 2012; Muthén & Asparouhov, 2002).

Following Muthén's (1983, 1984) proposal of the latent response variable formulation, the observed categorical responses $Y_{j}$ were hypothesized to be related to the latent (unobserved) continuous distribution $Y_{j}^{*}$ as
$Y_{j} = \begin{cases} 0, & \text{if } Y_{j}^{*} \leq τ_{j \cdot 1} \\ c, & \text{if } τ_{j \cdot c} < Y_{j}^{*} \leq τ_{j \cdot c + 1} \\ m, & \text{if } Y_{j}^{*} > τ_{j \cdot m} \end{cases}$
(1)
where $Y_{j}^{*}$ is the underlying latent response for item j; $Y_{j}$ is the observed categorical response for item j; c is the observed value for the category, where c = 0, 1, …, m, with 0 representing the strongest level of disagreement and m representing the strongest level of agreement; and $τ_{j \cdot c}$ is the category c threshold for item j, with m thresholds estimated per item.

Figure 1 illustrates the relationship between $Y_{j}^{*}$ , $Y_{j}$ , and $τ_{j \cdot c}$ for an item with six response options and five thresholds. In essence, an observed categorical response $Y_{j}$ is determined by both the latent response $Y_{j}^{*}$ and threshold $τ_{j \cdot c}$ . When a latent response $Y_{j}^{*}$ is between $τ_{2}$ and $τ_{3}$ , the observed response $Y_{j}$ is “D” (disagree). When $Y_{j}^{*}$ is larger than (i.e., to the right of) $τ_{5}$ , the observed response $Y_{j}$ is “SA” (strongly agree).

Figure 1.
Relationship between latent response, observed response, and item thresholds.
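The threshold rule in equation (1) can be sketched in a few lines of Python. This is only an illustration: the threshold values and latent responses below are hypothetical, and `categorize` is a helper name introduced here, not part of any SEM software.

```python
import numpy as np

def categorize(y_star, thresholds):
    """Map latent responses to ordered categories 0..m using m increasing thresholds."""
    taus = np.asarray(thresholds, dtype=float)
    # side="left" counts the thresholds strictly below y*, so a latent response
    # exactly equal to a threshold falls in the lower category, as in equation (1).
    return np.searchsorted(taus, np.asarray(y_star, dtype=float), side="left")

# Hypothetical thresholds for one item with six response options (five thresholds)
taus = [-2.0, -1.0, 0.0, 1.0, 2.0]
latent = [-2.5, -2.0, -0.5, 0.0, 1.5, 3.0]
observed = categorize(latent, taus)
print(observed)
```

A latent response below the first threshold maps to category 0 and one above the last threshold maps to category m, so the observed score simply counts how many thresholds the latent response has passed.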

In Muthén's (1983, 1984) original latent response variable formulation, the latent response was hypothesized to follow a linear factor analysis model, for example, a one-factor model, $Y_{i j}^{*} = a_{j} + t_{j} θ_{i} + ε_{i j}$ , where $Y_{i j}^{*}$ is the latent response of subject i on item j and the latent response is related to the observed categorical responses $Y_{j}$ as stated in equation (1). The parameters $a_{j}$ and $t_{j}$ are commonly known as the intercept and factor loading of the j-th item, respectively. Depending on whether the error term $ε_{i j}$ is assumed to follow a normal or logistic distribution, a probit or logit link can be used to map the conditional probability of the observed response Y to the latent trait $θ$ . This specification corresponds to a dominance response process: as $θ$ increases, the probability of endorsement increases monotonically (see Panel A in Figure 2).

Figure 2.
Illustration of the theoretical item response function of dominance and unfolding responses.

Formulation of the OCRUM

This study aims to represent the unfolding response process through a quadratic factor analysis model. The OCRUM is formulated based on two tenets of the unfolding response process on a unidimensional latent psychological continuum. First, items and individuals (latent traits) can be positioned on the same latent continuum that reflects their sentiment or attitude. For example, a neutral item or an individual with a neutral attitude regarding the psychological construct would be in the middle, negative items on the left, and positive items on the right side of the latent continuum.

Second, when individuals are asked to express their agreement with an item that reflects their attitude or another psychological construct, they are more likely to agree with an item located close to them on the continuum. Suppose $θ_{i}$ and $δ_{j}$ reflect the locations of the individual and the item on the same continuum, respectively. The probability of the individual agreeing with the item increases as the distance between $θ_{i}$ and $δ_{j}$ approaches 0. Conversely, when the item is located far below or above the individual, the likelihood of endorsement is lower (Drasgow et al., 2010; Stark et al., 2006; Tay et al., 2009).

Inspired by the metric unidimensional unfolding model (Maraun & Rossi, 2001; van Schuur & Kiers, 1994), the OCRUM defines the latent response as
$Y_{i j}^{*} = a_{j} + t_{j} (θ_{i} - δ_{j})^{2} + ε_{i j}$
(2)
where $Y_{i j}^{*}$ is the latent response of subject i on item j; $a_{j}$ is the conditional mean of $Y_{i j}^{*}$ when $θ_{i}$ = $δ_{j}$ ; $t_{j}$ is the curvature parameter; $θ_{i}$ is the person location (latent trait) for subject i; $δ_{j}$ is the item location for item j; and $ε_{i j}$ is the measurement error of subject i on item j.

In a linear factor model with positive factor loadings, larger values of $θ_{i}$ are associated with larger values of $Y_{i j}^{*}$ , which correspond to a higher probability of agreement (see Panel A in Figure 2). In contrast, for the OCRUM, the smaller the distance between $θ_{i}$ and $δ_{j}$ (as defined in equation 2), the larger the value of $Y_{i j}^{*}$ (with $t_{j} < 0$ ), which reflects a higher probability of agreement (see Panel B in Figure 2). A link function (e.g., probit) can be used to map the conditional probability of the observed response Y to the latent response $Y^{*}$, as stated in equation (1).
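To make the link between equations (1) and (2) concrete, the following minimal sketch computes category probabilities for one polytomous item under a probit link. It assumes a standard normal error term with unit variance; the function name and all parameter values are hypothetical and chosen only for illustration.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ocrum_category_probs(theta, a, t, delta, thresholds):
    """P(Y = c | theta) for c = 0..m, assuming unit-variance normal errors."""
    eta = a + t * (theta - delta) ** 2  # conditional mean of the latent response
    # Cumulative probabilities P(Y* <= tau_c | theta), padded with 0 and 1
    cum = [0.0] + [normal_cdf(tau - eta) for tau in thresholds] + [1.0]
    return [cum[c + 1] - cum[c] for c in range(len(thresholds) + 1)]

# Hypothetical item: a = 1, t = -1, delta = 0.5, five thresholds (six categories)
taus = [-2.0, -1.0, 0.0, 1.0, 2.0]
near = ocrum_category_probs(0.5, 1.0, -1.0, 0.5, taus)   # person located at the item
far = ocrum_category_probs(2.5, 1.0, -1.0, 0.5, taus)    # person two units away
print(near[-1], far[-1])
```

With a negative curvature, the probability of the highest agreement category is larger for the person located at the item than for the person two units away, which is the single-peaked pattern described above.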

By expanding equation (2), the OCRUM can be re-expressed as a quadratic factor analysis model (Maraun & Rossi, 2001):
$Y_{i j}^{*} = μ_{j} + β_{j} θ_{i} + t_{j} θ_{i}^{2} + ε_{i j}$
(3)
where $μ_{j} = a_{j} + t_{j} δ_{j}^{2}$ ; $β_{j} = - 2 δ_{j} t_{j}$ ; $t_{j}$ is the curvature parameter; $Y_{i j}^{*}$ is the latent response of subject i on item j; $a_{j}$ is the conditional mean of $Y_{i j}^{*}$ when $θ_{i}$ = $δ_{j}$ ; $θ_{i}$ is the person location (latent trait) for subject i; $δ_{j}$ is the item location for item j; and $ε_{i j}$ is the measurement error of subject i on item j.
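For readers who want the intermediate algebra, the expansion from equation (2) to equation (3) can be written out step by step. The last two lines, which solve for the item location and peak height from the quadratic-model coefficients, are our own rearrangement and assume $t_{j} \neq 0$.

```latex
\begin{aligned}
Y^{*}_{ij} &= a_j + t_j(\theta_i - \delta_j)^2 + \varepsilon_{ij} \\
           &= a_j + t_j\left(\theta_i^2 - 2\delta_j\theta_i + \delta_j^2\right) + \varepsilon_{ij} \\
           &= \underbrace{\left(a_j + t_j\delta_j^2\right)}_{\mu_j}
              + \underbrace{\left(-2\delta_j t_j\right)}_{\beta_j}\theta_i
              + t_j\theta_i^2 + \varepsilon_{ij}, \\[4pt]
\delta_j &= -\frac{\beta_j}{2t_j}, \\
a_j &= \mu_j - t_j\delta_j^2 = \mu_j - \frac{\beta_j^2}{4t_j}.
\end{aligned}
```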

Essentially, the unfolding response mechanism described in equation (2) is characterized by the squared Euclidean distance between item and person locations, and the single-peakedness property in the responses $Y_{i j}^{*}$ is represented by this quadratic function. We further illustrate how the item response function (the probability of endorsing a scored response option) changes as the values of the two key model parameters, item location and curvature, vary. The top panel of Figure 3 illustrates the item response function for dichotomous (e.g., yes–no) items with different item location values ( $δ_{1} = - 1.5, δ_{2} = 0.0, δ_{3} = 1.5$ ) but the same curvature parameter. The location of the peak of the function shifts to the negative or positive direction of the latent continuum according to the $δ_{j}$ values. Thus, the probability of endorsing each item is highest when the person location equals the item location. As the distance between person and item locations increases in either direction, the probability of a positive endorsement decreases. Conversely, the items in the bottom panel differ in terms of the curvature parameter values $(t_{1} = - 0.5, t_{2} = - 1, t_{3} = - 2)$ while sharing the same item location. The larger the absolute magnitude of the curvature, the “sharper” the peak of the unimodal item response function. Hence, the curvature parameter is analogous to the discrimination parameters found in many dominance and ideal-point IRT models. Our proposed model can be fitted using existing SEM software, such as Mplus, as elaborated in the next section.

Figure 3.
Illustration of the item response functions of the ordered categorical response unfolding model for dichotomous items. Top panel: Changing the item location parameters ( $δ_{j}$ ) while holding the curvature parameter constant shifts the item response curve along the latent $θ$ continuum, such that the peak of the curve coincides with the item location value. Bottom panel: Changing the curvature parameters ( $t_{j}$ ) while holding the item location parameter constant alters the sharpness of the curve. Larger absolute values of the curvature parameter produce a sharper peak in the probability-of-endorsement function.
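The two patterns described for Figure 3 can be checked numerically with a short sketch. It reuses the item locations and curvature values quoted in the text, but the peak height (`a = 1`), the threshold (`tau = 0`), the common curvature in the first comparison, and the common location in the second are assumed values, since the text does not report them. A probit link with unit error variance is also assumed.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def endorse_prob(theta, delta, t, a=1.0, tau=0.0):
    """P(Y = 1 | theta) = P(Y* > tau) for a dichotomous item under equation (2)."""
    return normal_cdf(a + t * (theta - delta) ** 2 - tau)

grid = [i / 10 for i in range(-30, 31)]  # theta from -3.0 to 3.0 in steps of 0.1

# Top panel: item locations differ, common curvature (t = -1 is an assumed value)
peaks = [max(grid, key=lambda th: endorse_prob(th, d, t=-1.0)) for d in (-1.5, 0.0, 1.5)]

# Bottom panel: curvatures differ, common location (delta = 0 is an assumed value);
# evaluate each curve one unit away from the item location
one_unit_away = [endorse_prob(1.0, 0.0, t) for t in (-0.5, -1.0, -2.0)]

print(peaks)
print([round(p, 3) for p in one_unit_away])
```

The first list shows that each curve peaks at its own item location, and the second shows that the endorsement probability one unit away from the item falls faster as the absolute curvature grows.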

LMS Method

The OCRUM (see equation 3) involves a quadratic term $θ_{i}^{2}$ , which can be regarded as a latent interaction of the person location $θ_{i}$ with itself. The product indicator approach (e.g., Kenny & Judd, 1984; Marsh et al., 2004) is commonly used to model interactions between latent variables. However, it requires the computation of cross products between item responses, which is not tenable for the OCRUM because ordered categorical response data are not metric and therefore cannot be subjected to arithmetic operations. Instead, we propose utilizing the LMS approach (Klein & Moosbrugger, 2000), which can model latent interactions without explicitly creating product indicators.

The LMS is a distribution analytic method that analyzes raw data using “an iterative ML estimation procedure tailored for the type of non-normality induced by interaction effects” (Klein & Moosbrugger, 2000, p. 473). This procedure produces maximum likelihood estimates and provides statistical inferences on the parameter estimates. Currently, the LMS method is available in Mplus and the nlsem R package (Umbach et al., 2017).
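The reason no product indicators are needed can be illustrated with a simplified numerical-integration sketch. This is not the LMS algorithm of Klein and Moosbrugger (2000), nor the Mplus implementation; it only shows that, once the latent trait is integrated out over quadrature nodes, the quadratic term in equation (3) is just the squared node value. The sketch assumes dichotomous items, a standard normal latent trait, a probit link with unit error variance, and thresholds absorbed into `mu`; all parameter values are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def normal_cdf(x):
    """Standard normal CDF applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(x) / math.sqrt(2.0)))

def marginal_loglik(responses, mu, beta, t, n_nodes=21):
    """Marginal log-likelihood of 0/1 responses under equation (3), theta ~ N(0, 1)."""
    y = np.asarray(responses, dtype=float)              # persons x items
    mu, beta, t = (np.asarray(v, dtype=float) for v in (mu, beta, t))
    nodes, weights = hermegauss(n_nodes)                # nodes for weight exp(-x^2 / 2)
    weights = weights / weights.sum()                   # rescale so the weights sum to 1
    # Latent-response mean at each node (rows) for each item (columns);
    # theta^2 is simply the squared node, so no cross-products of items are formed.
    eta = mu + np.outer(nodes, beta) + np.outer(nodes ** 2, t)
    p = np.clip(normal_cdf(eta), 1e-12, 1.0 - 1e-12)    # P(Y = 1 | node)
    log_cond = y @ np.log(p).T + (1.0 - y) @ np.log(1.0 - p).T  # persons x nodes
    return float(np.log(np.exp(log_cond) @ weights).sum())

# Hypothetical data (3 persons, 3 items) and hypothetical item parameters
y = [[1, 0, 1], [0, 0, 1], [1, 1, 0]]
ll = marginal_loglik(y, mu=[0.5, 0.2, 0.8], beta=[-1.0, 0.0, 1.0], t=[-1.0, -1.0, -1.0])
print(ll)
```

A full estimator would maximize this quantity over the item parameters; the sketch only evaluates it at fixed values.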

Study 1

Study 1 aimed to illustrate the application of the proposed OCRUM using two empirical datasets and to compare the performance of the OCRUM with that of the GGUM, specifically with respect to the estimated (relative) locations of items $(δ_{j})$ and persons ( $θ_{i}$ ).

GGUM

The GGUM was chosen as a comparison model because it is commonly used for modeling unfolding response data in the current literature (e.g., Foster et al., 2017; Polak et al., 2009; Tay, 2011; Tay et al., 2009; Weekers & Meijer, 2008). The GGUM (Roberts et al., 2000) was developed within the IRT probabilistic modeling framework based on Muraki's (1992) generalized partial credit model. The GGUM is defined as
$P (Z_{i} = z \mid θ_{j}) = \frac{\exp \left\{ α_{i} \left[ z (θ_{j} - δ_{i}) - \sum_{k = 0}^{z} τ_{i k} \right] \right\} + \exp \left\{ α_{i} \left[ (M - z) (θ_{j} - δ_{i}) - \sum_{k = 0}^{z} τ_{i k} \right] \right\}}{\sum_{w = 0}^{C} \left( \exp \left\{ α_{i} \left[ w (θ_{j} - δ_{i}) - \sum_{k = 0}^{w} τ_{i k} \right] \right\} + \exp \left\{ α_{i} \left[ (M - w) (θ_{j} - δ_{i}) - \sum_{k = 0}^{w} τ_{i k} \right] \right\} \right)}$
(4)
where $θ_{j}$ is the location of individual j on the latent continuum underlying the responses; $δ_{i}$ is the location of item $i$ on the same latent continuum (comparable to the item location parameter in the OCRUM in equations 2 and 3); $α_{i}$ is the discrimination parameter for item i (comparable to the “curvature” parameter in the OCRUM in equations 2 and 3); and $τ_{i k}$ is the location of the k-th subjective response category threshold for item i on the latent continuum (comparable to the threshold parameters in equation 1). Note that, following the GGUM notation, i indexes items and j indexes persons in equation (4), which is the reverse of equations (1) to (3). In addition, z = 0, …, C indexes the observable response categories, where C is the number of response categories minus 1, and M = 2C + 1. The GGUM expresses the unfolding response process described earlier: as the distance between person and item locations decreases, the probability of a positive endorsement of the item increases.
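Equation (4) can be evaluated directly, as in the minimal sketch below. It is not the GGUM2004 implementation. The function name and parameter values are hypothetical, and the sketch assumes the usual identification constraint that the threshold for k = 0 is zero, so only the thresholds for k = 1, …, C are passed in.

```python
import math

def ggum_probs(theta, alpha, delta, taus):
    """Category probabilities P(Z = z | theta), z = 0..C, following equation (4).

    taus holds tau_1..tau_C; tau_0 = 0 is assumed here.
    """
    C = len(taus)
    M = 2 * C + 1
    d = theta - delta
    tau_full = [0.0] + list(taus)
    numerators = []
    cum_tau = 0.0
    for z in range(C + 1):
        cum_tau += tau_full[z]  # running sum of tau_k for k = 0..z
        numerators.append(
            math.exp(alpha * (z * d - cum_tau))
            + math.exp(alpha * ((M - z) * d - cum_tau))
        )
    total = sum(numerators)
    return [n / total for n in numerators]

# Hypothetical four-category item: alpha = 1.2, delta = 0.3, three thresholds
item_taus = [-1.5, -1.0, -0.5]
at_item = ggum_probs(0.3, 1.2, 0.3, item_taus)
above = ggum_probs(0.3 + 2.0, 1.2, 0.3, item_taus)
below = ggum_probs(0.3 - 2.0, 1.2, 0.3, item_taus)
print(at_item[-1], above[-1], below[-1])
```

The probabilities depend on the person location only through its distance from the item, so persons two units above and two units below the item receive the same category probabilities, and the highest agreement category is more likely for the person located at the item.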

We emphasize that the OCRUM is proposed as an alternative model for unfolding responses (where item responses are guided by the distance between item and person locations on the latent continuum), rather than a replacement for the GGUM. Both the OCRUM and the GGUM share similar model parameters for representing the item response curve: the location of the peak is governed by the item location parameter in both models, and the “sharpness” of the peak is governed by the discrimination parameter in the GGUM and by the curvature parameter in the OCRUM. Although the mathematical expressions of the quadratic function (OCRUM) and the hyperbolic cosine function (GGUM) differ, they both produce a similar unimodal response curve (and the hyperbolic cosine function can be approximated by a polynomial/quadratic function using the Taylor series expansion). This is analogous to choosing between logistic and probit link functions in IRT models for dominance responses (other sigmoid functions, such as the arctangent or hyperbolic tangent, can also be considered). For a given dataset, researchers can fit both the OCRUM and the GGUM and compare which model provides a better empirical approximation to the data. Roberts and Laughlin (1996), who first introduced the GUM, acknowledged that alternative variations of unfolding models could be adapted or developed.
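One way to see the hyperbolic cosine connection is to rewrite each bracketed pair of exponentials in equation (4). The rearrangement below is our own illustration, with $d = θ_{j} - δ_{i}$ and $S_{z}$ introduced here as shorthand for the cumulative threshold sum; the final line is the standard Taylor series of the hyperbolic cosine.

```latex
\begin{aligned}
&\exp\{\alpha_i[z\,d - S_z]\} + \exp\{\alpha_i[(M - z)\,d - S_z]\} \\
&\qquad = 2\exp\left\{\alpha_i\left[\tfrac{M}{2}\,d - S_z\right]\right\}
          \cosh\left(\alpha_i\left(\tfrac{M}{2} - z\right)d\right),
\qquad S_z = \sum_{k=0}^{z}\tau_{ik}, \\[4pt]
&\cosh(x) = 1 + \frac{x^2}{2} + \frac{x^4}{24} + \cdots
\end{aligned}
```

Truncating the series after the second term leaves a function that is quadratic in the person-item distance, which is the form used in equation (2).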

The OCRUM can be linked to the SEM framework as a measurement model and can be readily fitted in Mplus. This creates new opportunities for researchers to analyze data that include a mix of unfolding and dominance responses, as shown in Study 3. Such flexibility is not currently available when using the GGUM.

Method

Empirical Datasets

We reanalyzed two datasets using both the OCRUM and the GGUM: (1) Thurstone's (1932) 24-item Attitude Toward Capital Punishment Scale, as collected by Roberts (1995) and (2) the Order facet of the Conscientiousness Personality Scale reported by Chernyshenko et al. (2007). These datasets were selected because they measure noncognitive constructs and have previously been shown to follow an unfolding response mechanism.

Dataset 1

Roberts (1995) collected polytomous response data on Thurstone's (1932) 24-item Attitude Toward Capital Punishment Scale from 245 undergraduates at the University of South Carolina. Items were presented in random order, and participants responded using a 6-point Likert-type scale (1 = strongly disagree, 2 = disagree, 3 = slightly disagree, 4 = slightly agree, 5 = agree, 6 = strongly agree). This dataset was used in Roberts and Laughlin's (1996) study to introduce the GUM and demonstrate its empirical application. Accordingly, we selected it as a benchmark for comparison with the OCRUM in Study 1. The dataset was obtained from the GGUM software website. Consistent with Roberts and Laughlin (1996), we used 12 of the 24 items in our analysis.²

Dataset 2

The Order Scale, developed by Chernyshenko et al. (2007), measures the order facet of the conscientiousness personality domain. Items were generated under the assumption of an unfolding response process. Response data for the 20-item scale were collected from 539 undergraduate students at an American university using a 4-point Likert-type scale (1 = strongly disagree, 2 = disagree, 3 = agree, 4 = strongly agree). Because we analyzed only complete cases, approximately 6% of the sample was excluded, resulting in a final sample size of 505. To be consistent with Chernyshenko et al.'s (2007) analytic approach, we dichotomized the polytomous responses prior to analysis due to infrequent endorsement of the two extreme options. Specifically, strongly disagree and disagree were collapsed and coded as 0, and agree and strongly agree were collapsed and coded as 1.
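The data preparation described above (complete-case analysis followed by dichotomization) can be sketched as follows. The small response matrix is made up for illustration, the function name is hypothetical, and missing responses are assumed to be coded as NaN.

```python
import numpy as np

def dichotomize_complete_cases(responses):
    """Drop rows with any missing value, then recode 1-2 as 0 and 3-4 as 1."""
    data = np.asarray(responses, dtype=float)
    complete = data[~np.isnan(data).any(axis=1)]  # keep respondents with no missing items
    return (complete >= 3).astype(int)

# Made-up 4-point responses for three respondents and four items
raw = [[1, 2, 3, 4],
       [4, np.nan, 2, 1],   # excluded: one missing response
       [2, 3, 3, 1]]
binary = dichotomize_complete_cases(raw)
print(binary)
```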

Procedure

The datasets were analyzed using the proposed OCRUM in Mplus. An example of the Mplus input file for fitting the OCRUM is presented in Appendix A. Full-information maximum likelihood (FIML) with robust standard errors (MLR) was used to estimate model parameters. A probit link was chosen to maintain consistency across all studies reported in this article.³ User-specified starting values of −1 were provided for the $t_{j}$ parameters to facilitate estimation. User-specified starting values were also provided for the $δ_{j}$ parameters to scale items on the latent continuum (i.e., positive item location values represent the positive polarity of the latent construct).⁴ Person locations (factor scores) were estimated via the expected a posteriori (EAP) method.

The GGUM parameters were estimated using the GGUM2004 software (Roberts et al., 2000, 2006). Item parameters were estimated with marginal maximum likelihood (MML; also referred to as FIML in the SEM literature; Forero & Maydeu-Olivares, 2009), and person locations were estimated via the EAP procedure (Roberts et al., 2006). We used the GGUM2004 default estimation settings (Roberts, 2004).

Analysis

Because the GGUM and the OCRUM rely on different mathematical formulations to represent the unimodality/single-peakedness of the unfolding process, we did not expect the magnitudes of the item and person location estimates to be identical across models. However, we expected the pattern and ordering of item and person locations to be comparable, such that both models would reach similar conclusions about items located at the extremes versus the middle of the latent continuum. Accordingly, we computed Spearman's rank correlation ( $ρ$ ) and Pearson correlation (r) between item (and person) locations estimated by the OCRUM (Mplus) and the GGUM (GGUM2004). Spearman's rank correlation was used as an indicator of similarity in the ordering of estimated locations (Polak, 2011). We also plotted person locations estimated from the OCRUM and GGUM2004 to visually examine their relationship.

Results

Item Location $(δ_{j})$

Spearman's rank correlations between item locations estimated by the OCRUM and the GGUM were .97 for the 12-item Attitude Toward Capital Punishment Scale and .95 for the 20-item Order Scale. The corresponding Pearson correlations were also high (.91 and .89 for the Attitude Toward Capital Punishment Scale and the Order Scale, respectively). Table 1 presents the estimated item locations ${\hat{δ}}_{j}$ and their standard errors.

Table 1.
Item Location Estimates of Attitude Toward CP and ORS Items.

Attitude Toward CP (Thurstone, 1932)

| Item | Statement | OCRUM | GGUM |
| --- | --- | --- | --- |
| CP10 | We must have capital punishment for some crimes. | −8.554 (1.850) | −1.445 (0.149) |
| CP24 | Capital punishment should be used more often than it is. | −4.892 (3.488) | −1.316 (0.115) |
| CP17 | Capital punishment is just and necessary. | −2.976 (1.258) | −2.981 (3.790) |
| CP20 | Capital punishment gives the criminal what he deserves. | −2.052 (1.844) | −1.001 (0.103) |
| CP18 | I do not believe in capital punishment but it is not practically advisable to abolish it. | 1.180 (0.332) | 1.021 (0.212) |
| CP15 | Life imprisonment is more effective than capital punishment. | 4.914 (3.940) | 2.130 (0.317) |
| CP09 | I don’t believe in capital punishment but I’m not sure it isn’t necessary. | 2.018 (0.868) | 1.278 (0.156) |
| CP12 | I do not believe in capital punishment under any circumstances. | 2.209 (0.678) | 1.933 (0.079) |
| CP16 | Execution of criminals is a disgrace to civilized society. | 3.563 (1.036) | 2.896 (0.721) |
| CP14 | We can’t call ourselves civilized as long as we have capital punishment. | 9.516 (1.821) | 3.390 (5.700) |
| CP13 | Capital punishment is not necessary in modern civilization. | 9.628 (1.428) | 3.739 (8.475) |
| CP02 | Capital punishment is absolutely never justified. | 10.860 (1.565) | 3.823 (9.654) |

ORS (Chernyshenko et al., 2007)

| Item | Statement | OCRUM | GGUM |
| --- | --- | --- | --- |
| ORD38 | Organization is a key component of most things I do. | −6.757 (0.629) | −2.203 (12.442) |
| ORD50 | Every item in my room and on my desk has its own designated place. | −6.541 (0.839) | −2.571 (8.799) |
| ORD34 | I have a daily routine and stick to it. | −6.211 (1.098) | −2.502 (1.601) |
| ORD46 | I keep detailed notes of important meetings and lectures. | −4.138 (2.369) | −3.481 (0.734) |
| ORD37 | I prefer to do things in a logical order. | −3.342 (1.808) | −3.460 (23.910) |
| ORD44 | I become annoyed when things around me are disorganized. | −2.984 (1.822) | −2.693 (19.608) |
| ORD35 | I need a neat environment in order to work well. | −1.700 (0.415) | −1.535 (0.335) |
| ORD20 | I do pretty standard maintenance for my property and possessions. | −0.207 (0.251) | −0.232 (0.212) |
| ORD23 | My room neatness is about average. | −0.192 (0.097) | −0.192 (0.079) |
| ORD42 | I write notes to myself only if I have too many things to do at once. | 0.239 (0.213) | 0.247 (0.242) |
| ORD29 | Although I try to keep everything in its place, it does not always work for me. | 0.632 (0.109) | 0.670 (0.089) |
| ORD26 | My ability to plan is at about average. | 1.188 (0.387) | 1.190 (0.289) |
| ORD30 | Although I have a daily organizer, I have hard time keeping it up to date. | 2.239 (1.425) | 1.493 (0.299) |
| ORD14 | I do not like work spaces that are too clean and tidy. | 3.386 (2.544) | 5.171 (36.637) |
| ORD10 | I frequently forget to put things back in their proper place. | 1.452 (0.193) | 1.293 (0.084) |
| ORD06 | Most of the time my room is in complete disarray. | 2.628 (1.175) | 1.895 (0.184) |
| ORD17 | Being neat is not exactly my strength. | 1.887 (0.423) | 1.623 (0.176) |
| ORD24 | Half of the time I do not put things in their proper place. | 2.278 (0.641) | 1.531 (0.126) |
| ORD15 | For me, being organized is unimportant. | 5.752 (2.323) | 2.305 (0.399) |
| ORD02 | Usually, my notes are so jumbled, even I have a hard time reading them. | 7.566 (0.946) | 3.430 (5.342) |

Note. CP = capital punishment; GGUM = generalized graded unfolding model; OCRUM = ordered categorical response unfolding model; ORS = Order Scale. Columns labeled OCRUM and GGUM refer to the item location estimates ${\hat{δ}}_{j}$ based on the respective model. Values in parentheses are the standard errors associated with the parameter estimates. Parameters for the two scales were estimated separately.

Person Location $(θ_{i})$

Similar to the item location analysis, we computed Spearman's rank correlation ( $ρ$ ) and Pearson correlation (r) between person locations ( ${\hat{θ}}_{i}$ ) estimated by the two models. For the 12-item Attitude Toward Capital Punishment Scale, correlations between person locations estimated by the GGUM and the OCRUM were high ( $ρ$ = .99, r = .95). For the 20-item Order Scale, the correlations were also very high ( $ρ$ = 1.00, r = .99). Figure 4 illustrates the distributions of the person locations and their scatterplots for both scales.

Figure 4.
Correlation and distribution of estimated person locations (Study 1). (a) Thurstone's (1932) Attitude Toward Capital Punishment Scale. (b) Chernyshenko et al.'s (2007) Order Scale.

There were some cases with noticeable differences in person locations between the GGUM and the OCRUM. Upon data inspection, we observed aberrant response patterns (e.g., endorsing both items on the positive and negative ends of the latent continuum) that did not conform to expectations under the unfolding response mechanism. Although the GGUM and the OCRUM yielded different ${\hat{θ}}_{i}$ values for these cases, the sign of the person location remained consistent across models.

Summary of Study 1

Spearman's rank and Pearson correlations were used to quantify the similarity between item location estimates ( ${\hat{δ}}_{j}$ ), with higher coefficients indicating greater similarity. The high correlations observed for both scales suggest that the OCRUM performs comparably to the GGUM in estimating relative item locations. Furthermore, correlations between person locations ( ${\hat{θ}}_{i}$ ) obtained from the OCRUM and the GGUM were also high, indicating that both models produce similar patterns in person and item location estimates. Overall, Study 1 shows that the OCRUM recovers the ordering of item and person locations in a manner comparable to the widely used GGUM for modeling unfolding responses. An advantage of the OCRUM over the GGUM is that it can be readily implemented within an existing SEM framework and software, enabling new opportunities to model relationships among latent variables that follow unfolding or mixed dominance–unfolding response mechanisms. We elaborate on this point in Study 3.

Study 2

The second study aimed to investigate the effects of sample size (N), test length (J), item location range (D), and response options (R) on the recovery of key model parameters—namely, the estimated item locations, estimated person locations, and model–data fit of the OCRUM—via Monte Carlo simulation. The simulation results provide insights into the potential pitfalls of the proposed model and help generate practical guidelines for applied researchers (see the Discussion section).

Method

Simulation Design

This study followed a 3 (sample size: N = 150, 300, 600) × 3 (test length: J = 3, 5, 10) × 2 (item location range: D = ±1.5, ±2.5) × 2 (response option: R = dichotomous; polytomous) design. Replications were performed for all 36 conditions until 200 successful replications were obtained for each condition. The dependent variables of interest were (a) bias in the $δ_{j}$ parameter estimates, (b) the correlation between the simulated (true) $θ$ and estimated $\hat{θ}$ , and (c) the accuracy of various fit statistics in identifying the correct model.

Item location ( $δ_{j}$ ) was the main parameter of interest as it was the only OCRUM parameter (see equations 2 and 3) with a straightforward interpretation and implication for measurement (i.e., the location of items on the latent continuum). The parameter, $β_{j}$ , which is commonly known as the item factor loading, was not investigated because its value is determined by the $δ_{j}$ and $t_{j}$ parameters (see equations 2 and 3). Therefore, we focused our investigation on the recovery of $δ_{j}$ .

We chose three sample sizes to reflect a small (N = 150), moderate (N = 300), and larger (N = 600) sample, as often observed in psychological studies. Larger sample sizes generally lead to more precise parameter estimates (i.e., smaller standard errors). We sought to identify the sample size requirements needed to obtain reliable parameter estimates and to offer recommendations for researchers who plan to use the OCRUM in their studies.

We selected three levels of test length (3, 5, and 10 items) to reflect typical short, medium, and relatively long scales, respectively, for measuring a single factor or dimension in psychology. Examples include the three-item Revised UCLA Loneliness Scale (Hughes et al., 2004), the five-item Satisfaction With Life Scale (Diener et al., 1985), and the 10-item Generic Job Satisfaction Scale (Macdonald & MacIntyre, 1997). Conceptually, a longer test should provide greater precision in estimating a person's location on the latent continuum, as each item contributes an additional observation (data point). We therefore examined the effect of test length on the accuracy of person location estimation to provide practical recommendations on minimal test length for OCRUM users.

Following previously reported research designs (e.g., Andrich, 1988; Roberts, 1995), we incorporated two item location ranges: ±1.5 and ±2.5. These ranges reflect differences in the coverage of items that measure a particular attitude or valence on the latent continuum, assuming a normally distributed person location distribution on the same continuum. We categorized items as intermediate or extreme based on their $δ_{j}$ values relative to the distribution of $θ_{i}$ . Specifically, items with $| δ_{j} |$ ≤ 1.5 were defined as intermediate, and items with $| δ_{j} |$ > 1.5 were defined as extreme, assuming $θ$ follows a standard normal distribution. Items with negative location values represent an opposite attitude or valence compared with items with positive location values. The distance between adjacent items in the simulation was set to be equal, regardless of the number of items or item location range. We expected extreme items to yield less precise parameter estimates than intermediate items because, under the assumption of normally distributed person locations, fewer individuals fall in the regions where extreme items are located.

We examined both dichotomous and polytomous responses, as both formats are common in psychological measures (e.g., Minnesota Multiphasic Personality Inventory [MMPI]-2 with a true–false format; Butcher et al., 1989, 2001; NEO-PI-R with a 5-point rating format; Costa & McCrae, 1992). Polytomous items tend to provide more psychometric information about both items (item parameters) and persons (person locations) than dichotomous items. However, in the IRT literature, polytomous item parameter estimates often require larger sample sizes for accurate estimation (Chernyshenko et al., 2007). We therefore sought to examine sample size requirements for different response formats to obtain reliable parameter estimates and to offer recommendations for OCRUM users.

Data Generation

The procedure used to generate the observed responses for the simulation was adapted from Muthén and Kaplan (1985). The observed response data were simulated in three steps:
Continuous responses $Y_{j}^{*}$ were generated from the proposed OCRUM model in equation (2), where $t_{j}$ was fixed at −1, $a_{j}$ was fixed at 0, and $δ_{j}$ corresponded to the values in Table 2 for each condition.

Person locations $θ_{i}$ and errors $ε_{i j}$ were simulated from a standard normal distribution (µ = 0; σ = 1).

The continuous responses $Y_{j}^{*}$ were transformed into observed dichotomous or polytomous ordered categorical responses $Y_{j}$ via threshold values ( $τ_{c}$ ), as described below.

Table 2.
Item Locations ( $δ_{j}$ ) Used in Simulation.

Item Location Range (D) Test Length (J) Extreme Intermediate Extreme

±1.5 3 nil −1.50; 0.00; 1.50 nil

5 nil −1.50; −0.75; 0.00; 0.75; 1.50 nil

10 nil −1.50; −1.17; −0.83; −0.50; −0.17; 0.17; 0.50; 0.83; 1.17; 1.50 nil

±2.5 3 −2.50 0.00 2.50

5 −2.50 −1.25; 0.00; 1.25 2.50

10 −2.50; −1.94 −1.39; −0.83; −0.28; 0.28; 0.83; 1.39 1.94; 2.50

Note. “nil” indicates that no item is located at the extreme end of the continuum. For example, comparing the first and fourth rows, there is no extreme item in the first row, whereas there are two extreme items (one on each side of the continuum) in the fourth row.
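The equal spacing rule and the intermediate/extreme cutoff described above can be illustrated with a short Python sketch. It is not the code used in the study; the function names `item_locations` and `classify` are illustrative, and values are rounded to two decimals to match the presentation in Table 2.

```python
def item_locations(half_range, n_items):
    """Return n_items equally spaced locations from -half_range to +half_range."""
    step = 2 * half_range / (n_items - 1)
    return [round(-half_range + k * step, 2) for k in range(n_items)]


def classify(delta, cutoff=1.5):
    """Label an item 'intermediate' if |delta| <= cutoff, otherwise 'extreme'."""
    return "intermediate" if abs(delta) <= cutoff else "extreme"


# Loop over the six combinations of item location range (D) and test length (J).
for half_range in (1.5, 2.5):
    for n_items in (3, 5, 10):
        locations = item_locations(half_range, n_items)
        extreme = [d for d in locations if classify(d) == "extreme"]
        print(half_range, n_items, locations, "extreme:", extreme)
```

Under the ±1.5 range no item exceeds the cutoff, whereas under the ±2.5 range the outermost items on each side are classified as extreme.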

The $τ_{c}$ values were obtained as follows: a single $Y^{*}$ variable was simulated from equation (2) with $a = 0$ , $t = -1$ , and $δ = 0$ , while $θ$ and $ε$ were simulated from a standard normal distribution ( $μ = 0$ , $σ = 1$ ) with a large sample size ( $N = 1{,}000{,}000$ ). The values at the 5th, 23rd, 50th, 77th, and 95th percentiles of this $Y^{*}$ distribution were used as thresholds to generate polytomous data, representing data from a 6-point rating scale. The 50th-percentile value was used as the threshold to generate dichotomous data.
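The percentile-based thresholding step can be sketched in Python as follows. This is an illustration only: the reference sample below is a standard normal stand-in (an assumption made for the example), whereas the study draws $Y^{*}$ from the OCRUM in equation (2). The helper names `percentile_thresholds` and `categorize` are hypothetical, and a smaller reference sample is used to keep the example fast.

```python
import bisect
import random


def percentile_thresholds(sample, percents):
    """Return the sample values at the requested percentiles (simple rank rule)."""
    ordered = sorted(sample)
    n = len(ordered)
    return [ordered[min(n - 1, int(p / 100 * n))] for p in percents]


def categorize(y_star, thresholds):
    """Map a continuous value to an ordered category 1..len(thresholds) + 1."""
    return bisect.bisect_right(thresholds, y_star) + 1


rng = random.Random(2024)
# ASSUMPTION: standard normal stand-in for the reference Y* distribution;
# the study simulates Y* from the OCRUM (equation 2) with N = 1,000,000.
reference = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

poly_cuts = percentile_thresholds(reference, [5, 23, 50, 77, 95])  # five cuts, six categories
dich_cut = percentile_thresholds(reference, [50])                  # one cut, two categories

responses = [categorize(y, poly_cuts) for y in reference]
shares = [responses.count(c) / len(responses) for c in range(1, 7)]
print([round(s, 2) for s in shares])
```

Because the cut points are percentiles of the reference distribution, the expected category proportions are roughly 5%, 18%, 27%, 27%, 18%, and 5% regardless of the shape of that distribution.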

Analysis

Bias in Parameter Estimation

To evaluate the feasibility of fitting the proposed model using the LMS approach, we computed parameter bias to assess the accuracy of the parameter estimates. Higher bias values indicate greater deviation from the true (population) parameter values.

Mean bias in the parameter estimates was calculated as
$\mathrm{Bias}({\hat{γ}}_{p}) = \frac{\sum_{q = 1}^{r} ({\hat{γ}}_{p q} - γ_{p})}{r}$
where ${\hat{γ}}_{p q}$ is the q-th sample (i.e., replication) estimate of the p-th true (population) parameter value $γ_{p}$ and r is the number of replications in a simulation (Bandalos & Gagné, 2012; Bandalos & Leite, 2013).
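As a concrete illustration of this formula, the short Python sketch below averages the signed deviations of replication estimates from a true value. The function name `mean_bias` and the four estimates are hypothetical and chosen only to show the arithmetic.

```python
def mean_bias(estimates, true_value):
    """Average signed deviation of replication estimates from the true value."""
    return sum(est - true_value for est in estimates) / len(estimates)


# Hypothetical estimates of one item location (true value 1.50) from four replications.
delta_hats = [1.62, 1.41, 1.58, 1.55]
print(round(mean_bias(delta_hats, 1.50), 3))
```

Because the deviations are signed, overestimates and underestimates can offset each other; a mean bias near zero therefore indicates the absence of systematic error rather than the absence of sampling variability.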

Recovery of Person Location ( $θ$ )

Similar to Study 1, Spearman's rank correlation and Pearson correlation were computed to reflect the similarity in the ranking and pattern of the true and estimated $θ$ .
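The distinction between the two coefficients can be shown with a minimal, dependency-free Python sketch. The data are hypothetical; in practice, a statistics library would normally be used. Spearman's coefficient is computed here as the Pearson correlation of average ranks.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


theta_true = [-2.0, -1.0, 0.0, 1.0, 2.0]
theta_hat = [-8.0, -1.0, 0.0, 1.0, 8.0]  # hypothetical: same ordering, stretched tails
print(round(spearman(theta_true, theta_hat), 3), round(pearson(theta_true, theta_hat), 3))
```

In this example the ordering of persons is preserved exactly, so the rank correlation is perfect, while the stretched tails lower the Pearson coefficient. The same mechanism can produce rank correlations that exceed Pearson correlations when two models agree on ordering but differ in metric.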

Model–Data Fit

The FIML method for categorical data was chosen for estimation, which involved numerical integration. Chi-square tests and other conventional global goodness-of-fit statistics (e.g., root mean square error of approximation and standardized root mean square residual) were not available under this estimator (Muthén, 2010). Nevertheless, Mplus provides model comparison indices, including the Akaike information criterion (AIC), Bayesian information criterion (BIC), and sample-size-adjusted BIC (sBIC). Given the same set of observed variables, these indices can be used to select a better-fitting model among a set of competing models (regardless of whether they are nested), after accounting for model complexity; models with smaller values are preferred (West et al., 2012).
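For readers unfamiliar with these indices, the sketch below computes them from a model's log-likelihood using their standard definitions: AIC = −2LL + 2k, BIC = −2LL + k ln(N), and sBIC = −2LL + k ln[(N + 2)/24]. The function name and the numeric inputs are hypothetical and do not come from the study.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return (AIC, BIC, sBIC) for a fitted model; smaller values are preferred."""
    deviance = -2.0 * log_likelihood
    aic = deviance + 2 * n_params
    bic = deviance + n_params * math.log(n_obs)
    sbic = deviance + n_params * math.log((n_obs + 2) / 24)
    return aic, bic, sbic


# Hypothetical fit: log-likelihood of -1000 with 20 free parameters and N = 300.
aic, bic, sbic = information_criteria(-1000.0, 20, 300)
print(round(aic, 2), round(bic, 2), round(sbic, 2))
```

All three indices share the same deviance term and differ only in the penalty per parameter. The BIC penalty grows with ln(N) and is the harshest of the three at the sample sizes considered here, which is relevant when the competing models differ in the number of free parameters.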

A two-factor exploratory factor analysis (EFA) model⁵ was chosen as the competing model to assess the relative fit of the OCRUM to the simulated data (recall that the data-generating model was the OCRUM). Previous studies have reported that linear factor analysis of unfoldable unidimensional items (e.g., bipolar constructs such as happy vs. sad and liberalism vs. conservatism) can spuriously produce a two-dimensional solution—often referred to as the “extra factor” phenomenon (Maraun & Rossi, 2001; van Schuur & Kiers, 1994). Therefore, we used a two-factor model as the competing model to evaluate the sensitivity and specificity of the model fit statistics of interest.

For each condition, we computed the proportion of simulated datasets in which each information criterion (AIC, BIC, sBIC) identified the OCRUM as the better-fitting model compared with the two-factor EFA. This proportion is referred to as the hit rate. The three-item test length condition was not included in this comparison because it is not feasible to fit a two-factor EFA model to such data (the model is underidentified).
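The hit rate computation amounts to a paired comparison within each simulated dataset. A minimal Python sketch, using hypothetical BIC values for five replications, is:

```python
def hit_rate(ocrum_values, efa_values):
    """Proportion of paired datasets in which the OCRUM has the smaller criterion value."""
    hits = sum(1 for o, e in zip(ocrum_values, efa_values) if o < e)
    return hits / len(ocrum_values)


# Hypothetical BIC values from five replications of one condition.
ocrum_bic = [2110.4, 2098.7, 2121.0, 2105.3, 2099.9]
efa_bic = [2125.1, 2101.2, 2118.6, 2130.8, 2111.4]
print(hit_rate(ocrum_bic, efa_bic))
```

In this illustration the OCRUM is preferred in four of the five datasets, so the hit rate is 0.8.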

Procedure

This simulation study investigated the effects of sample size, test length, item location range, and response options on parameter estimates and model–data fit. Data were generated using the procedure described above. The OCRUM (equation 3) was fitted to the simulated data until 200 successful replications were obtained for each condition. Subsequently, a two-factor EFA model (with geomin rotation) was fitted to the same datasets. Both the OCRUM and two-factor EFA models were estimated using the MLR estimator with a probit link in Mplus.

Results

Successful Replications

Table 3 presents, for each condition, the total number of replications attempted (first value in each cell), the number of datasets that converged with estimation errors (second value; at least one parameter fixed and not estimated), and the number of datasets that failed to converge (third value; the OCRUM did not fit). As shown in Table 3, for dichotomous response options (R = 2), Mplus had more difficulty fitting the OCRUM when the sample size was small (N = 150) and the test length was short (J = 3), especially when the item location range was large (±2.5). This difficulty is reflected in the larger number of replications required to obtain 200 successful datasets in these conditions. In contrast, most datasets were fitted without error in the polytomous response option conditions (R = 6) when the sample size was 300 or more, regardless of test length. Although item location range still affected convergence patterns, its impact diminished as sample size increased.

Table 3.
Simulation Replications (True Model = OCRUM, Fitted Model = OCRUM).

J: 3 J: 5 J: 10

D: ±1.5 D: ±2.5 D: ±1.5 D: ±2.5 D: ±1.5 D: ±2.5

R: Dichotomous

N 150 377/177/0 420/217/3 232/31/1 321/121/0 231/31/0 390/190/0

300 349/149/0 366/166/0 210/10/0 266/66/0 205/5/0 275/75/0

600 373/173/0 312/112/0 202/2/0 218/18/0 202/2/0 276/76/0

R: Polytomous

N 150 201/1/0 271/71/0 200/0/0 246/46/0 201/0/1 288/88/0

300 200/0/0 252/52/0 200/0/0 222/22/0 200/0/0 220/20/0

600 200/0/0 228/28/0 200/0/0 208/8/0 200/0/0 204/4/0

Note. D = item location range; J = test length; N = sample size; R = response options. The number of successful replications was 200 for all conditions. The first value in each cell represents the total number of replications attempted for that condition. The second value represents the number of datasets that converged but with estimation errors (e.g., inadmissible parameter estimates such as negative error variances). The third value represents the number of nonconverged datasets (i.e., cases in which a unique set of parameter estimates was not found).
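The three values in each cell of Table 3 are linked by simple arithmetic: successful replications equal attempts minus datasets with estimation errors minus nonconverged datasets. The Python sketch below (with the illustrative helper `summarize_cell`) applies this to two cells from the table and derives the proportion of attempts that succeeded.

```python
def summarize_cell(cell):
    """Split an 'attempted/with errors/nonconverged' cell and derive the success rate."""
    attempted, with_errors, nonconverged = (int(part) for part in cell.split("/"))
    successful = attempted - with_errors - nonconverged
    return successful, successful / attempted


# Two cells from Table 3:
# dichotomous, N = 150, J = 3, D = ±2.5; and polytomous, N = 600, J = 3, D = ±1.5.
for cell in ("420/217/3", "200/0/0"):
    successful, rate = summarize_cell(cell)
    print(cell, successful, round(rate, 3))
```

Expressed this way, fewer than half of the attempted replications succeeded in the most difficult dichotomous condition, whereas every attempt succeeded in the polytomous condition shown.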

Bias in Parameter Estimation

Figure 5 illustrates several consistent patterns. As expected, within each response option condition, the more extreme an item was (i.e., items with locations outside the range of ±1.5), the larger the bias in its item location estimate. This parameter bias for extreme items was more pronounced in the dichotomous (R = 2) conditions than in the polytomous (R = 6) conditions. However, these biases decreased as sample size increased, and the bias in the item location estimates tended toward 0 when the sample size increased from N = 150 to N = 600.⁶

Figure 5.
Item location parameter bias (Study 2). (a) Test length (J) = 3, (b) test length (J) = 5, and (c) test length (J) = 10.

Recovery of Person Location ( $θ$ )

Table 4 presents the mean and standard deviation of Spearman's rank correlations between estimated person locations $\hat{θ}$ and true (simulated) person locations θ. Across Monte Carlo conditions, mean Spearman's rank correlations exceeded .90 (Table 4), and mean Pearson correlations exceeded .85 (Table 5). The standard deviations of both correlation coefficients within each condition were below .05.

Table 4.
Mean and Standard Deviation of Spearman Rank Correlation Between $\hat{θ}$ and Simulated $θ$ .

J: 3 J: 5 J: 10

D: ±1.5 D: ±2.5 D: ±1.5 D: ±2.5 D: ±1.5 D: ±2.5

R: Dichotomous

N 150 .902 (0.016) .912 (0.014) .926 (0.022) .939 (0.009) .959 (0.016) .968 (0.005)

300 .903 (0.011) .913 (0.008) .932 (0.012) .941 (0.007) .960 (0.012) .969 (0.004)

600 .904 (0.007) .915 (0.007) .933 (0.008) .941 (0.004) .964 (0.006) .970 (0.003)

R: Polytomous

N 150 .962 (0.009) .977 (0.003) .972 (0.006) .985 (0.003) .983 (0.006) .991 (0.002)

300 .965 (0.006) .978 (0.002) .974 (0.004) .986 (0.001) .984 (0.003) .992 (0.001)

600 .967 (0.003) .978 (0.002) .975 (0.003) .986 (0.001) .986 (0.002) .992 (0.001)

Note. D = item location range; J = test length; N = sample size; R = response options. Values represent means; standard deviations are reported in parentheses.

Table 5.
Mean and Standard Deviation of Pearson Correlation Between $\hat{θ}$ and Simulated $θ$ .

J: 3 J: 5 J: 10

D: ±1.5 D: ±2.5 D: ±1.5 D: ±2.5 D: ±1.5 D: ±2.5

R: Dichotomous

N 150 .865 (0.024) .875 (0.016) .895 (0.046) .902 (0.016) .941 (0.043) .951 (0.008)

300 .867 (0.017) .876 (0.012) .906 (0.026) .905 (0.010) .939 (0.034) .952 (0.006)

600 .867 (0.012) .878 (0.007) .906 (0.017) .906 (0.006) .949 (0.017) .952 (0.004)

R: Polytomous

N 150 .961 (0.021) .967 (0.005) .975 (0.016) .980 (0.003) .984 (0.020) .990 (0.002)

300 .965 (0.007) .969 (0.003) .977 (0.003) .981 (0.002) .987 (0.008) .991 (0.001)

600 .966 (0.004) .969 (0.002) .978 (0.003) .981 (0.001) .987 (0.003) .991 (0.001)

Note. D = item location range; J = test length; N = sample size; R = response options. Values represent means; standard deviations are reported in parentheses.

Between-subjects analyses of variance (ANOVAs) were conducted to examine the effects of sample size (N), test length (J), item location range (D), response options (R), and their interactions on Spearman's rank correlations (Table 6) and Pearson correlations (Table 7) between estimated and simulated person locations. Each of the 36 conditions had 200 replications, and the number of replications served as the effective “sample size” for the ANOVAs. This large effective sample size implies that even trivial differences between conditions could attain statistical significance (p < .05) due to reduced standard errors. To avoid overinterpreting trivial effects, we report effect sizes for the ANOVAs rather than focusing on p values.

Table 6.
Effects of Various Simulation Design Factors on Spearman Rank Correlation Between $\hat{θ}$ and Simulated $θ$ .

Effect Degrees of Freedom Type III Sum of Squares $η_{p}^{2}$

N 2 0.011 0.022

J 2 1.619 0.777

D 1 0.183 0.283

R 1 3.427 0.881

NJ 4 0.001 0.001

ND 2 0.001 0.003

JD 2 0.005 0.011

NR 2 0.000 0.000

JR 2 0.472 0.504

DR 1 0.001 0.002

NJD 4 0.001 0.001

NJR 4 0.001 0.001

NDR 2 0.000 0.000

JDR 2 0.001 0.001

NJDR 4 0.001 0.002

Residual 7,164 0.465

Note. D = item location range; J = test length; N = sample size; R = response options. Interaction effects are denoted by concatenating the factor labels; for example, the interaction between N and J is denoted as NJ.

Table 7.
Effects of Various Simulation Design Factors on Pearson Correlation Between $\hat{θ}$ and Simulated $θ$ .

Effect Degrees of Freedom Type III Sum of Squares $η_{p}^{2}$

N 2 0.014 0.007

J 2 2.885 0.598

D 1 0.054 0.027

R 1 8.957 0.822

NJ 4 0.003 0.002

ND 2 0.003 0.002

JD 2 0.005 0.002

NR 2 0.001 0.000

JR 2 0.891 0.315

DR 1 0.003 0.001

NJD 4 0.003 0.001

NJR 4 0.004 0.002

NDR 2 0.000 0.000

JDR 2 0.005 0.003

NJDR 4 0.003 0.002

Residual 7,164

Note. D = item location range; J = test length; N = sample size; R = response options. Interaction effects are denoted by concatenating the factor labels; for example, the interaction between N and J is denoted as NJ.

Effect sizes were defined as the magnitude of an effect (Kelley & Preacher, 2012). Partial eta squared ( $η_{p}^{2}$ ) was used as the effect size index and represents the proportion of variance associated with an effect (e.g., a main effect) relative to that effect plus its associated error variance. Following Cohen (1988), $η_{p}^{2}$ = .02 (2% of variance explained) was interpreted as a small effect, with $η_{p}^{2}$ = .13 (13%) interpreted as a medium effect and $η_{p}^{2}$ = .26 (26%) interpreted as a large effect.
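The definition above corresponds to dividing an effect's sum of squares by the sum of that effect's and the residual sums of squares. The Python sketch below applies it to the sums of squares printed in Table 6 and attaches the Cohen (1988) labels; the function names and the "below small" label are illustrative choices.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Effect sum of squares relative to effect plus error sum of squares."""
    return ss_effect / (ss_effect + ss_error)


def cohen_label(eta_sq):
    """Label an effect using the .02 / .13 / .26 benchmarks cited in the text."""
    if eta_sq >= 0.26:
        return "large"
    if eta_sq >= 0.13:
        return "medium"
    if eta_sq >= 0.02:
        return "small"
    return "below small"


# Type III sums of squares from Table 6 (Spearman's rank correlation).
SS_RESIDUAL = 0.465
table6_ss = {"N": 0.011, "J": 1.619, "D": 0.183, "R": 3.427, "JR": 0.472}

for effect, ss in table6_ss.items():
    eta_sq = partial_eta_squared(ss, SS_RESIDUAL)
    print(effect, round(eta_sq, 3), cohen_label(eta_sq))
```

Values computed this way may differ from the tabled coefficients in the third decimal place because the printed sums of squares are themselves rounded.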

A large interaction effect was observed between test length and response options (JR). The $η_{p}^{2}$ values were .504 and .315 for the Spearman's rank and Pearson correlation analyses, respectively, indicating that the effect of test length on the correlations differed across response formats. Tests of the simple effects of test length (J) at each level of response options (R) were subsequently conducted using the relevant subsets of data. ANOVAs revealed that the effect of test length in the polytomous response condition was large for both Spearman's rank correlation, F(2, 3,597) = 1,834.09, $η_{p}^{2}$ = .505, and Pearson correlation, F(2, 3,597) = 1,818.77, $η_{p}^{2}$ = .503, consistent with the descriptive statistics. The effect of test length was even larger in the dichotomous response condition for Spearman's rank correlation, F(2, 3,597) = 6,896.30, $η_{p}^{2}$ = .793, and Pearson correlation, F(2, 3,597) = 3,582.60, $η_{p}^{2}$ = .666. Inspection of the descriptive statistics in Tables 4 and 5 shows that longer tests generally resulted in higher correlations between true and estimated person locations.
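The simple-effect sizes reported above can be checked against their F statistics using the standard identity that links partial eta squared to F and its degrees of freedom. For the Spearman's rank correlation analysis in the polytomous condition:

$η_{p}^{2} = \frac{F \cdot df_{1}}{F \cdot df_{1} + df_{2}} = \frac{1{,}834.09 \times 2}{1{,}834.09 \times 2 + 3{,}597} = \frac{3{,}668.18}{7{,}265.18} \approx .505$

The same calculation for the dichotomous condition gives $13{,}792.60 / 17{,}389.60 \approx .793$, in agreement with the reported value.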

Model Comparison

Because the simulated data were generated according to equation (2) (the OCRUM), fitting these data with the OCRUM should yield lower AIC, BIC, and sBIC values than fitting a two-factor EFA model (which is misspecified, given the data-generating process). The hit rate for each statistic was defined as the proportion of simulated datasets in a condition for which that statistic (AIC, BIC, or sBIC) identified the OCRUM as the better-fitting model. Table 8 summarizes these results. A value of 1 indicates that the statistic identified the OCRUM as the better-fitting model in all simulated datasets under that condition, whereas a value of 0 indicates that it never did so.

Table 8.
Model Comparison Hit Rate Using AIC, BIC, and sBIC (True Model = OCRUM).

J: 5 J: 10

D: ±1.5 D: ±2.5 D: ±1.5 D: ±2.5

R: Dichotomous

N 150 0.92/0.81/0.92 0.91/0.86/0.93 1.00/1.00/1.00 1.00/1.00/1.00

300 0.99/0.97/0.99 1.00/0.98/1.00 1.00/1.00/1.00 1.00/1.00/1.00

600 1.00/1.00/1.00 1.00/1.00/1.00 1.00/1.00/1.00 1.00/1.00/1.00

R: Polytomous

N 150 1.00/1.00/1.00 1.00/1.00/1.00 1.00/1.00/1.00 1.00/1.00/1.00

300 1.00/1.00/1.00 1.00/1.00/1.00 1.00/1.00/1.00 1.00/1.00/1.00

600 1.00/1.00/1.00 1.00/1.00/1.00 1.00/1.00/1.00 1.00/1.00/1.00

Note. AIC = Akaike information criterion; BIC = Bayesian information criterion; D = item location range; J = test length; N = sample size; R = response options; OCRUM = ordered categorical response unfolding model; sBIC = sample-size-adjusted Bayesian information criterion. Values represent the proportion of successful identifications of the true model (OCRUM) as the better-fitting model compared with a two-factor exploratory factor analysis model (hit rate). The three-item test length condition was not included because it is not possible to extract a two-factor solution from three indicators. Values in each cell are presented in the order of information criteria (AIC, BIC, sBIC) from which they are derived.

The AIC and sBIC generally indicated that the OCRUM was the correct model across most combinations of sample size (N), test length (J), item location range (D), and response options (R). In contrast, the BIC appeared less reliable under some conditions—for example, when responses were dichotomous, sample size was small (N = 150), test length was short (J = 5), and item location range was narrow (D = ±1.5); in this case, BIC identified the OCRUM as the better-fitting model in only 81% of the datasets.

Summary of Study 2

Study 2 examined the ability of the OCRUM approach to recover model parameters across a range of conditions that varied sample size (N), test length (J), item location range (D), and response options (R). Taken together, the findings suggest that the OCRUM may not perform well when analyzing dichotomous response data with a small sample size (N = 150) paired with a short test (J = 3) that includes extreme items. In general, the OCRUM performed better with polytomous response data, and item location bias decreased as sample size and test length increased. For dichotomous response data, the GGUM has similarly been found to produce relatively large parameter and standard error bias for extreme items (Roberts, 2004).

Spearman's and Pearson correlations between estimated and simulated θ also indicated better performance of the OCRUM with polytomous data. Both correlations were consistently above .96 in all conditions when responses were polytomous, whereas correlations were slightly lower for dichotomous data when the test was shorter (J = 3).

A second goal of Study 2 was to investigate the use of AIC, BIC, and sBIC in model comparison procedures. Based on our analyses, AIC and sBIC performed well across different levels of sample size, test length, response format, and item location range. In contrast, BIC showed reduced accuracy only under the smallest-sample, shorter-test condition for dichotomous data (N = 150, J = 5; hit rates = 0.81–0.86) but identified the OCRUM in essentially all other conditions.

Study 3

Past analyses of unfolding response data have typically focused on scoring participants on the latent construct of interest. However, many psychological and organizational studies also aim to test associations or directional (“causal”) relationships among latent constructs. Usami (2011), for example, attempted to estimate structural parameters and unfolding measurement model parameters simultaneously using a Bayesian approach. Although effective, this approach was computationally intensive and difficult to generalize to more complex models. In contrast, our proposed OCRUM approach can be readily implemented in existing SEM software (e.g., Mplus). This enables new modeling opportunities involving unfolding responses that have remained largely unexplored in previous work. In Study 3, we examined the feasibility of incorporating the OCRUM as part of the measurement model in a structural equation model and evaluated the accuracy of estimating structural parameters when a mix of unfolding and dominance responses is present, using Monte Carlo simulations.

Method

Simulation Design

We considered three levels of sample size (N = 150, 300, 600), following the rationale described in Study 2. In addition to sample size, we manipulated two effect sizes for the standardized structural coefficients, namely, $β$ = 0.10 (small effect) and $β$ = 0.30 (moderate effect; Cohen, 1988). Furthermore, we examined whether associating the unfolding response mechanism with an exogenous latent variable (antecedent), an endogenous mediator, or an endogenous outcome affected the estimation of structural coefficients. For convenience, we referred to this third design factor as model type and assessed three different models. A total of 18 conditions were included in this simulation: 3 (sample size) × 2 (effect size) × 3 (model type). We performed 200 replications for each condition.

The dependent variables of interest were (a) parameter bias in the structural path coefficients ( $β$ ) specified in the Data Generation section below (see Study 2 for computation details) and (b) the statistical power of the z tests for these structural path coefficients. Statistical power was computed as the percentage of replications in which the null hypothesis for a given structural path ( $H_{0} : β = 0$ ) was correctly rejected at $α = .05$.
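Empirical power of this kind can be computed by counting rejections across replications. The Python sketch below assumes a two-sided z test, so that the critical value at the .05 level is 1.96; this is an assumption of the example, as are the function name and the four hypothetical replications.

```python
def empirical_power(estimates, standard_errors, critical_z=1.96):
    """Percentage of replications in which |estimate / SE| exceeds the critical value."""
    rejections = sum(
        1 for b, se in zip(estimates, standard_errors) if abs(b / se) > critical_z
    )
    return 100.0 * rejections / len(estimates)


# Hypothetical path estimates and standard errors from four replications.
beta_hats = [0.31, 0.12, 0.28, 0.05]
ses = [0.10, 0.10, 0.10, 0.10]
print(empirical_power(beta_hats, ses))
```

Here two of the four z statistics (3.1 and 2.8) exceed 1.96, giving an empirical power of 50%.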

Data Generation

A full structural equation model comprises both measurement and structural parts. In this study, the structural part included one latent trait ( $θ_{X}$ ) serving as an antecedent, one latent trait ( $θ_{M}$ ) serving as a mediator, one latent trait ( $θ_{Y}$ ) serving as an outcome variable, and one continuous observed variable representing a covariate/control variable ( $X_{o b s}$ ). The structural relationships between these variables were specified as follows (see Figure 6):
$\begin{aligned} θ_{M} = & β_{1} θ_{X} + d_{M} \\ θ_{Y} = & β_{2} θ_{M} + β_{3} θ_{X} + β_{4} X_{o b s} + d_{Y}, \end{aligned}$
where $d_{M}$ and $d_{Y}$ are structural errors. This model configuration, or minor variations of it, is common in psychological and organizational research.
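The two structural equations can be turned into a small data generator. The Python sketch below is an illustration under explicit assumptions that the text does not state: $θ_{X}$ and $X_{obs}$ are drawn as independent standard normal variables, all four coefficients may be set to the same value, and the residual variances are chosen so that $θ_{M}$ and $θ_{Y}$ have unit variance (which makes the coefficients standardized). The function names are hypothetical.

```python
import math
import random


def residual_sds(b1, b2, b3, b4):
    """Residual SDs that give theta_M and theta_Y unit variance.

    ASSUMPTION: theta_X and X_obs are independent standard normal variables.
    """
    var_d_m = 1.0 - b1 ** 2
    # Var(theta_Y) = b2^2 + b3^2 + b4^2 + 2*b1*b2*b3 + Var(d_Y), because Cov(theta_M, theta_X) = b1.
    var_d_y = 1.0 - (b2 ** 2 + b3 ** 2 + b4 ** 2 + 2.0 * b1 * b2 * b3)
    return math.sqrt(var_d_m), math.sqrt(var_d_y)


def simulate_structural(n, b1, b2, b3, b4, seed=1):
    """Draw n rows of (theta_X, theta_M, theta_Y, X_obs) from the structural model."""
    rng = random.Random(seed)
    sd_m, sd_y = residual_sds(b1, b2, b3, b4)
    rows = []
    for _ in range(n):
        theta_x = rng.gauss(0.0, 1.0)
        x_obs = rng.gauss(0.0, 1.0)
        theta_m = b1 * theta_x + rng.gauss(0.0, sd_m)
        theta_y = b2 * theta_m + b3 * theta_x + b4 * x_obs + rng.gauss(0.0, sd_y)
        rows.append((theta_x, theta_m, theta_y, x_obs))
    return rows


rows = simulate_structural(150, 0.30, 0.30, 0.30, 0.30)
print(len(rows), [round(v, 3) for v in rows[0]])
```

The latent scores produced this way would then be passed to the measurement models to generate item responses; that step is not shown here.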

Figure 6.
A graphical representation of the structural equation model used in Study 3. Note. The measurement models are omitted from this figure for brevity; only the structural part of the model is shown. Ovals represent latent variables, and rectangles represent observed variables. $X_{obs}$ represents an observed covariate (e.g., years of work experience) that is controlled in the model. Three different models (Models 3a–3c) with varying specifications of the latent variables, as listed in Table 9, were simulated. In Model 3a, $θ_{X}$ was specified to follow an unfolding model (ordered categorical response unfolding model [OCRUM]), and both $θ_{M}$ and $θ_{Y}$ were specified to follow a linear factor model. In Model 3b, $θ_{M}$ was specified to follow OCRUM, and both $θ_{X}$ and $θ_{Y}$ were specified to follow a linear factor model. In Model 3c, $θ_{Y}$ was specified to follow OCRUM, and both $θ_{X}$ and $θ_{M}$ were specified to follow a linear factor model.

Table 9.
Response Mechanism Measurement Model of the Structural Equation Model in Figure 6 (Study 3).

Model $θ_{X}$ $θ_{M}$ $θ_{Y}$

Model 3a Unfolding Dominance Dominance

Model 3b Dominance Unfolding Dominance

Model 3c Dominance Dominance Unfolding

Note. The ordered categorical response unfolding model was specified to model the unfolding response mechanism. A linear factor analysis model was specified to model the dominance response mechanism.

Table 10.
Mean Absolute Bias for Structural Coefficients of the Models in Study 3.

N Effect Size $β_{1}$ $β_{2}$ $β_{3}$ $β_{4}$

Model 3a 150 Small 0.020 0.017 0.022 0.011

Moderate 0.016 0.018 0.019 0.012

300 Small 0.014 0.012 0.014 0.008

Moderate 0.013 0.012 0.012 0.008

600 Small 0.009 0.009 0.010 0.005

Moderate 0.009 0.009 0.010 0.007

Model 3b 150 Small 0.022 0.020 0.018 0.012

Moderate 0.019 0.019 0.017 0.012

300 Small 0.014 0.013 0.012 0.008

Moderate 0.013 0.012 0.011 0.008

600 Small 0.010 0.010 0.007 0.005

Moderate 0.009 0.009 0.009 0.005

Model 3c 150 Small 0.017 0.018 0.021 0.014

Moderate 0.017 0.019 0.021 0.015

300 Small 0.013 0.015 0.016 0.011

Moderate 0.012 0.013 0.014 0.009

600 Small 0.009 0.010 0.009 0.007

Moderate 0.008 0.009 0.009 0.007

Note. N = sample size; $β_{1}$ to $β_{4}$ correspond to the structural coefficients in Figure 6.

We considered three variations of the measurement model (see Table 9). In Model 3a, $θ_{X}$ was modeled using the unfolding response mechanism, whereas $θ_{M}$ and $θ_{Y}$ were modeled using dominance response mechanisms. In Model 3b, $θ_{M}$ followed the unfolding response mechanism, whereas $θ_{X}$ and $θ_{Y}$ followed dominance mechanisms. In Model 3c, $θ_{Y}$ was modeled using the unfolding response mechanism, whereas $θ_{X}$ and $θ_{M}$ were modeled using dominance response mechanisms. For the unfolding response mechanism, data were generated based on the OCRUM; for the dominance response mechanism, data were generated based on a linear single factor model (i.e., a conventional confirmatory factor analysis model).

Similar to Study 2, the observed unfolding response data were simulated in two steps:
Latent continuous responses ( $Y_{i j}^{*}$ ) were generated from the OCRUM (equation (2)), where $t_{j}$ was fixed at −1, $a_{j}$ was fixed at 0, and $δ_{j}$ took values [−1.50, −0.75, 0.00, 0.75, 1.50]. Person locations $θ_{i}$ and errors $ε_{i j}$ were simulated from a standard normal distribution (µ = 0; σ = 1).

The latent continuous responses were then transformed into 6-category ordered responses using the same threshold values as in Study 2. Five indicators (items) were generated following these steps.

Conversely, the observed dominance response data were simulated in two analogous steps:

1. Latent continuous responses $Y_{ik}^{*}$ were generated from a linear single factor model: $Y_{ik}^{*} = λ_{k} θ_{i} + e_{ik}$. Five indicators were generated for each latent trait. Factor loadings were fixed at $λ_{k} = 0.70$ for all 5 items. Person locations $θ_{i}$ were simulated from a standard normal distribution (µ = 0; σ = 1) and error terms $e_{ik}$ were simulated from a normal distribution (µ = 0; σ = 0.54).

2. The latent continuous responses were then transformed into 6-category ordered responses using the following threshold values: [−1.60, −0.83, 0.00, 0.83, 1.60].
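The two steps for the dominance indicators can be sketched in Python as follows. This is a minimal illustration rather than the authors' simulation code: the function name, the seed, and the use of NumPy are assumptions, and σ = 0.54 is treated as the standard deviation of the error term, as written above.

```python
import numpy as np

def simulate_dominance_items(n_persons, n_items=5, loading=0.70,
                             error_sd=0.54, seed=None):
    """Simulate 6-category ordinal indicators from a linear single-factor model."""
    rng = np.random.default_rng(seed)
    thresholds = np.array([-1.60, -0.83, 0.00, 0.83, 1.60])

    # Step 1: latent continuous responses, Y* = lambda * theta + e
    theta = rng.standard_normal(n_persons)                    # person locations, N(0, 1)
    errors = rng.normal(0.0, error_sd, (n_persons, n_items))  # e_ik, SD assumed to be 0.54
    y_star = loading * theta[:, None] + errors

    # Step 2: cut Y* at the five thresholds to obtain categories 1..6
    categories = np.digitize(y_star, thresholds) + 1
    return theta, categories

theta, items = simulate_dominance_items(n_persons=300, seed=1)
print(items.shape, items.min(), items.max())
```

Each row of `items` holds one simulated respondent's five ordinal responses, which could then be passed to SEM software as categorical indicators.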

Results

All 200 replications were successful in every condition, and no estimation or convergence errors were encountered.

Bias in Parameter Estimation

Parameter bias in the standardized structural path coefficients (β) was minimal. The maximum mean absolute bias (MAB) across the 18 conditions was 0.022, with bias decreasing as sample size increased (Table 10). The mean parameter bias of the factor loadings (λ) for the linear factor models was also negligible, ranging from −0.004 to 0.005 across model types, sample size levels, and structural path effect size conditions.

The pattern of mean parameter bias for item locations ( $δ$ ) was consistent with findings for the polytomous response option conditions in Study 2: larger sample sizes and less extreme item locations were associated with smaller estimation bias. For items with a true location of 0 (i.e., items near the middle of the latent continuum), mean parameter bias ranged from −0.009 to 0.004 in the small sample size condition (N = 150), compared with −0.002 to 0.001 in the large sample size condition (N = 600). For items with a true location of +1.5 (items near the positive pole of the latent continuum), mean parameter bias ranged from 0.042 to 0.358 when N = 150, compared with 0.006 to 0.335 when N = 600. For items with a true location of −1.5 (items near the negative pole), mean parameter bias ranged from −0.358 to −0.040 when N = 150 and from −0.330 to −0.021 when N = 600.

Statistical Power

As expected, structural paths with smaller effect sizes had lower statistical power than those with larger effect sizes at a fixed sample size, and power increased as sample size increased (Table 11). In the smaller effect size conditions, the statistical power for structural coefficients involving latent variables measured with an unfolding response mechanism (i.e., $β_{1}$ and $β_{3}$ in Model 3a; $β_{1}$ and $β_{2}$ in Model 3b; $β_{2}$ and $β_{3}$ in Model 3c) was slightly higher than for other structural parameters.

Table 11.
Statistical Power of Structural Parameters in the Mediation Model (Study 3).

Model	N	Effect Size	$β_{1}$	$β_{2}$	$β_{3}$	$β_{4}$
Model 3a	150	Small	0%	0%	0.5%	0%
Model 3a	150	Moderate	100%	100%	100%	100%
Model 3a	300	Small	9%	1%	5.5%	1.5%
Model 3a	300	Moderate	100%	100%	100%	100%
Model 3a	600	Small	71.5%	61%	74.5%	98%
Model 3a	600	Moderate	100%	100%	100%	100%
Model 3b	150	Small	0%	1%	0%	0%
Model 3b	150	Moderate	100%	100%	100%	100%
Model 3b	300	Small	8.5%	5%	3%	1.5%
Model 3b	300	Moderate	100%	100%	100%	100%
Model 3b	600	Small	74%	76%	58.5%	95%
Model 3b	600	Moderate	100%	100%	100%	100%
Model 3c	150	Small	0%	0%	0%	0%
Model 3c	150	Moderate	100%	100%	100%	100%
Model 3c	300	Small	1%	8%	8%	16.5%
Model 3c	300	Moderate	100%	100%	100%	100%
Model 3c	600	Small	56.5%	79%	75.5%	98.5%
Model 3c	600	Moderate	100%	100%	100%	100%

Note. N = sample size; $β_{1}$ to $β_{4}$ correspond to the structural coefficients in Figure 6. Statistical power was evaluated at $α = 0.05$, and the values represent the percentage of replications in which the null hypothesis for the parameter was correctly rejected.
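To make the power values in Table 11 concrete, the sketch below shows how an empirical power estimate of this kind can be computed from per-replication p-values. The function name and the toy p-values are hypothetical and are not results from the study.

```python
def empirical_power(p_values, alpha=0.05):
    """Percentage of replications whose p-value falls below alpha."""
    rejections = sum(1 for p in p_values if p < alpha)
    return 100.0 * rejections / len(p_values)

# Toy p-values for one structural coefficient across 8 hypothetical replications
toy_p = [0.001, 0.20, 0.03, 0.049, 0.06, 0.51, 0.004, 0.012]
print(empirical_power(toy_p))  # 5 of 8 rejections -> 62.5
```

With 200 replications per condition, each rejection contributes 0.5 percentage points, which is why the tabled values fall on multiples of 0.5%.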

Summary of Study 3

Study 3 examined the incorporation of the OCRUM into a structural equation model as a measurement model. All Monte Carlo replications were successfully fitted without convergence problems. Structural path parameters were recovered with minimal bias. Sample size and effect size had a joint effect on statistical power, such that smaller sample sizes paired with smaller effect sizes yielded lower power.

Furthermore, in the smaller effect size conditions, structural path coefficients associated with OCRUM-based latent variables showed higher statistical power than other structural paths, regardless of whether the OCRUM-based latent variable was exogenous or endogenous. Inspection of several randomly selected replications of Model 3a suggested that this may be due to the choice of population parameters in the simulation design, which resulted in slightly higher empirical reliability for the OCRUM latent factor ($r_{xx'} \approx .94$) compared with the CFA latent factors ($r_{yy'} \approx .81$). Future simulation studies should control for the reliability of latent factors to more directly evaluate this possibility.

We also investigated the effect of measurement model misspecification on the estimation of structural path parameters. Specifically, we deliberately misspecified the OCRUM measurement model (in Models 3a, 3b, and 3c) as a conventional linear factor model and fitted this misspecified model to the same datasets. Results suggested that measurement model misspecification led to higher parameter bias compared with conditions in which the measurement models were correctly specified (see Appendix B for details).

Overall, Study 3 demonstrates the potential and practicality of integrating the OCRUM into the SEM framework and expands modeling possibilities when latent constructs follow dominance, unfolding, or mixed response mechanisms. The negative impact of measurement model misspecification on structural path estimates underscores the importance of considering the underlying response process and choosing an appropriate measurement model in statistical analyses.

Discussion

Dominance and unfolding response processes describe two ways in which individuals may respond to rating-scale items. The dominance response process is commonly represented by a linear common factor model and is widely used as a measurement model in SEM. Measurement models proposed for the unfolding response process, however, often require specialized software that is not compatible with SEM, which has limited their use in organizational research (Foster et al., 2017).

This study proposed the OCRUM as an alternative approach for modeling unfolding response mechanisms. The OCRUM can be incorporated into the SEM framework and readily implemented in existing SEM software, and it is conceptually simple to apply and extend. We evaluated the performance of the OCRUM using two empirical datasets and two simulation studies. Analyses of the empirical datasets (Study 1) illustrated the application of the OCRUM and demonstrated that it performed comparably to the GGUM in determining the ordering of item locations (δ) and person locations (θ) on the latent continuum. The simulation in Study 2 examined the effects of sample size, test length, item-location range, and response options on parameter recovery and model–data fit. We found that item and person locations in the unidimensional OCRUM could be estimated from polytomous response data on tests with 5–10 items with minimal bias at a sample size of 300. Further increasing the sample size to 600 yielded only modest additional reductions in bias. As in prior work, items located at the extreme ends of the continuum tended to have larger bias, and parameters estimated from dichotomous data tended to exhibit more bias than those estimated from polytomous data. Study 3 demonstrated that relationships among constructs that generate unfolding and dominance responses can be modeled simultaneously within a single SEM framework using the OCRUM.

What is the Nature of the Construct That Makes a Dominance vs. Unfolding Model Appropriate?

When considering whether to apply dominance models or unfolding (ideal-point) models, it is important to attend to the nature of the assessment context. Measures designed to capture maximum performance (e.g., many cognitive ability tests) typically present items that vary in difficulty, and individuals are motivated to answer as many items correctly as they can. In such contexts, higher levels of the latent trait are associated with a monotonically increasing probability of a correct or endorsed response, which aligns well with dominance response models (Coombs, 1964; Drasgow et al., 2010). In contrast, measures of typical behavior (e.g., many personality or vocational interest inventories) ask respondents to indicate how well statements describe them. Here, responses are often best conceptualized as reflecting positions along a continuum, with individuals most likely to endorse items located near their own standing on the trait, which is consistent with unfolding response models (Coombs, 1964; Roberts et al., 2000; Tay & Ng, 2018; Thurstone, 1928).

The choice between dominance and unfolding models may also depend on the polarity of the construct. Unfolding models may be particularly suitable for bipolar constructs with two opposing ends (e.g., satisfaction vs. dissatisfaction). Previous studies have shown that applying conventional dominance-based linear factor models to such constructs can yield spurious extra factors rather than a single bipolar continuum (Davison, 1977; Tay & Drasgow, 2012; van Schuur & Kiers, 1994). Moreover, as demonstrated in additional analyses (Appendix B), misspecifying the measurement model can introduce parameter bias.

The testing context may also influence the response-generation process. For example, the pressure of high-stakes testing can shift respondents toward a performance-oriented mindset, potentially attenuating ideal-point responding and producing more dominance-like response patterns (O’Brien & LaHuis, 2011). Likewise, participants who carefully follow assessment directions may respond differently from those who engage in impression management. In such cases, it is useful to evaluate the data empirically using a model-comparison approach—for example, fitting both a dominance model (e.g., a linear factor model) and an unfolding model (e.g., a quadratic factor model) and comparing their fit, or even considering models that allow mixtures of response mechanisms. Our simulations indicate that information criteria such as AIC and sample-size adjusted BIC (sBIC) can correctly identify the data-generating model under a wide range of conditions (i.e., sample size, test length, response options, and item-location range).
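A minimal sketch of this model-comparison step is given below, assuming the log-likelihood and the number of free parameters are already available for each fitted model. The formulas are the standard definitions of AIC and the sample-size adjusted BIC, which replaces N with (N + 2)/24 in the penalty term. The log-likelihoods and parameter counts are invented for illustration and are not results from this study.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion."""
    return -2.0 * loglik + 2.0 * n_params

def sbic(loglik, n_params, n_obs):
    """Sample-size adjusted BIC: BIC with N replaced by (N + 2) / 24."""
    return -2.0 * loglik + n_params * math.log((n_obs + 2.0) / 24.0)

# Hypothetical fits of two candidate measurement models to the same data
fits = {
    "linear factor (dominance)": {"loglik": -4210.0, "n_params": 30},
    "quadratic factor (unfolding)": {"loglik": -4150.0, "n_params": 35},
}
n_obs = 300

for name, fit in fits.items():
    print(name,
          round(aic(fit["loglik"], fit["n_params"]), 1),
          round(sbic(fit["loglik"], fit["n_params"], n_obs), 1))

# The model with the lowest criterion value is preferred
best = min(fits, key=lambda m: sbic(fits[m]["loglik"], fits[m]["n_params"], n_obs))
print("lowest sBIC:", best)
```

Both criteria penalize the extra parameters of the more flexible model, so the unfolding model is preferred only when its gain in log-likelihood outweighs that penalty.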

Discussing item and scale construction under unfolding (ideal-point) assumptions is beyond the scope of this article. Interested readers are referred to Chernyshenko et al. (2007), Cao et al. (2015), and DeNunzio et al. (2024), which provide guidance on ideal-point item writing and scale development for constructs such as personality and engagement.⁷

To date, research on modeling unfolding responses has largely focused on model development and scoring. Much less attention has been given to examining relationships among constructs that follow a single-peaked response process. Yet, assessing relationships among latent constructs (e.g., mediation, moderation) plays a key role in many organizational and psychological investigations, particularly when structural models are estimated. SEM has become one of the most widely used analytical techniques in organizational research because it allows researchers to test complex theoretical models involving multiple latent constructs and their interrelationships (Williams, 2003; Zyphur et al., 2023). Mediation and moderation analyses with latent variables are central tools for understanding the mechanisms and boundary conditions through which organizational phenomena operate, and they are essential for theory testing and development in the organizational sciences (Cortina et al., 2020; Edwards & Lambert, 2007). Our study addresses this gap by introducing the OCRUM approach, which can be used to model unfolding responses and to test, estimate, and evaluate constructs that follow both unfolding and dominance response mechanisms within the same model. Because the method is implemented within an existing SEM framework—a mainstream multivariate method familiar to many researchers in the behavioral and social sciences—it opens new modeling opportunities for unfolding data (e.g., longitudinal analysis, multilevel analysis, tests of moderation/mediation, detection of unobserved heterogeneity).

Limitations and Future Directions

In addition to evidence that the OCRUM may not perform well with small samples and very short, dichotomously scored tests, another limitation concerns the estimation method used for person locations (θ), which assumes a normal distribution. This assumption may not always hold at the population level. Future studies should examine the impact of violations of the normality assumption on the estimation of θ.

Similar to commonly used confirmatory factor analysis, the OCRUM requires users to decide the number of factors a priori. Researchers can, however, adopt a model comparison approach by specifying alternative numbers of underlying factors and selecting the best fitting model using information criteria (e.g., AIC and sBIC). Extending the OCRUM to incorporate multiple latent factors underlying unfolding responses is conceptually straightforward within the SEM framework, but it raises computational challenges. Estimating the OCRUM in Mplus involves numerical integration, and the computational burden increases roughly linearly with the number of observations and exponentially with the number of dimensions of integration (i.e., the number of latent variables or latent-variable interactions that require integration; Muthén & Muthén, 1998–2017). In our applications, models with a single latent factor (Studies 1 and 2) were estimated quickly (e.g., datasets with N = 600, J = 10, and polytomous responses took less than 10 s), whereas the three-factor model in Study 3 required noticeably longer run times (e.g., N = 600 and J = 10 required several minutes to estimate on a standard desktop computer). Future research should further evaluate and optimize computational approaches for fitting OCRUM-based models, particularly in higher-dimensional applications.
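The scaling argument can be illustrated with a small calculation. Assuming a full product grid with a fixed number of quadrature points per dimension, the number of grid points grows as a power of the number of integration dimensions, and the total work is roughly that number times the sample size. The value of 15 points per dimension is an illustrative assumption, not a setting reported in this article, and the function name is hypothetical.

```python
def quadrature_evaluations(n_obs, n_dims, points_per_dim=15):
    """Rough count of likelihood evaluations, N * Q**d, for a full product grid."""
    return n_obs * points_per_dim ** n_dims

# Work grows linearly in N but exponentially in the number of dimensions
for dims in (1, 2, 3):
    print(dims, quadrature_evaluations(n_obs=600, n_dims=dims))
```

Under these assumptions, doubling the sample size doubles the count, whereas each additional integration dimension multiplies it by the number of points per dimension.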

A design feature of our simulation study was the use of evenly spaced item locations. This represents an idealized situation in which a scale comprises items that are evenly spaced along the latent continuum. We acknowledge that, in practice, items are unlikely to be perfectly evenly spaced. We did not examine conditions with unevenly spaced items in this simulation. Future investigations should explore how uneven spacing affects the recovery of item locations and person locations. Finally, future work should investigate cases in which scales and items are multidimensional, rather than unidimensional or characterized by independent clusters of item parameters. We hope that this study will motivate further analyses and applications involving unfolding response mechanisms within SEM.

Footnotes

Author Contributions

Ringo Moon-ho Ho: conceptualization and writing—original draft and editing. Jie Xin Lim: software and writing—review and editing. Olexander Chernyshenko: writing—review and editing.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Notes

Author Biographies

Ringo Moon-ho Ho is an Associate Professor in Psychology at the School of Social Sciences, Nanyang Technological University, Singapore. His research focuses on the development and application of quantitative methods, in particular, multilevel modeling, resampling methods, structural equation modeling, and time series analysis in the behavioral and social sciences.

Jie Xin Lim is a Research Fellow in the School of Social Sciences, Nanyang Technological University, Singapore. His research interests center on statistical computing and aging studies.

Olexander Chernyshenko is an Associate Professor in Organizational Behavior and Human Resource Management in the Division of Strategy, Management & Organization at the Nanyang Business School. His research focuses on personality and work attitudes assessment, P–O fit, measurement invariance, and group decision-making.

Appendix A

Appendix B

References

Andrich (1988). The application of an unfolding model of the PIRT type to the measurement of attitude. Applied Psychological Measurement, 12(1), 33–51. https://doi.org/10.1177/014662168801200105

Andrich (1996). A hyperbolic cosine latent trait model for unfolding polytomous responses: Reconciling Thurstone and Likert methodologies. British Journal of Mathematical and Statistical Psychology, 49(2), 347–365. https://doi.org/10.1111/j.2044-8317.1996.tb01093.x

Bandalos, D. L., & Gagné (2012). Simulation methods in structural equation modeling. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (pp. 92–108). The Guilford Press.

Bandalos, D. L., & Leite (2013). Use of Monte Carlo studies in structural equation modeling research. In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course (2nd ed., pp. 625–666). IAP Information Age Publishing.

Bovaird, J. A., & Koziol, N. A. (2012). Measurement models for ordered-categorical indicators. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (pp. 495–511). The Guilford Press.

Brady, H. E. (1989). Factor and ideal point analysis for interpersonally incomparable data. Psychometrika, 54(2), 181–202. https://doi.org/10.1007/BF02294514

Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A. M., & Kaemmer (1989). MMPI-2: Manual for administration and scoring. University of Minnesota Press.

Butcher, J. N., Graham, J. R., Ben-Porath, Y. S., Tellegen, A. M., Dahlstrom, W. G., & Kaemmer (2001). MMPI-2: Manual for administration and scoring (Rev. ed.). University of Minnesota Press.

Cao, Drasgow, & Cho (2015). Developing ideal intermediate personality items for the ideal point model. Organizational Research Methods, 18(2), 252–275. https://doi.org/10.1177/1094428114555993

Carter, N. T., Dalal, D. K., Boyce, A. S., O'Connell, M. S., Kung, M.-C., & Delgado, K. M. (2014). Uncovering curvilinear relationships between conscientiousness and job performance: How theoretically appropriate measurement makes an empirical difference. Journal of Applied Psychology, 99(4), 564–586. https://doi.org/10.1037/a0034688

Chernyshenko, O. S., Stark, Drasgow, & Roberts, B. W. (2007). Constructing personality scales under the assumptions of an ideal point response process: Toward increasing the flexibility of personality measures. Psychological Assessment, 19(1), 88–106. https://doi.org/10.1037/1040-3590.19.1.88

Cohen (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum.

Coombs, C. H. (1964). A theory of data. John Wiley & Sons, Inc.

Cortina, J. M., Sheng, Keener, S. K., Keeler, K. R., Grubb, L. K., Schmitt, Tonidandel, Summerville, K. M., Heggestad, E. D., & Banks, G. C. (2020). From alpha to omega and beyond! A look at the past, present, and (possible) future of psychometric soundness in the Journal of Applied Psychology. Journal of Applied Psychology, 105(12), 1351–1381. https://doi.org/10.1037/apl0000815

Costa, P. T., Jr., & McCrae, R. R. (1992). Revised NEO personality inventory (NEO PI-R) and NEO five-factor inventory (NEO-FFI) professional manual. Psychological Assessment Resources.

Davison, M. L. (1977). On a metric, unidimensional unfolding model for attitudinal and developmental data. Psychometrika, 42(4), 523–548. https://doi.org/10.1007/BF02295977

DeNunzio, M. M., Smith, R. W., & Naidoo, L. J. (2024). The development and validation of an ideal point measure of work engagement. Journal of Business and Psychology, 39(2), 345–368. https://doi.org/10.1007/s10869-023-09901-y

Diener, Emmons, R. A., Larsen, R. J., & Griffin (1985). The Satisfaction With Life Scale. Journal of Personality Assessment, 49(1), 71–75. https://doi.org/10.1207/s15327752jpa4901_13

Drasgow, Chernyshenko, O., & Stark (2010). 75 years after Likert: Thurstone was right! Industrial and Organizational Psychology, 3(4), 465–476. https://doi.org/10.1111/j.1754-9434.2010.01273.x

Edwards, J. R., & Lambert, L. S. (2007). Methods for integrating moderation and mediation: A general analytical framework using moderated path analysis. Psychological Methods, 12(1), 1–22. https://doi.org/10.1037/1082-989X.12.1.1

Forero, C. G., & Maydeu-Olivares (2009). Estimation of IRT graded response models: Limited versus full information methods. Psychological Methods, 14(3), 275–299. https://doi.org/10.1037/a0015825

Foster, G. C., Min, & Zickar, M. J. (2017). Review of item response theory practices in organizational research: Lessons learned and paths forward. Organizational Research Methods, 20(3), 465–486. https://doi.org/10.1177/1094428116689708

Hughes, M. E., Waite, L. J., Hawkley, L. C., & Cacioppo, J. T. (2004). A short scale for measuring loneliness in large surveys: Results from two population-based studies. Research on Aging, 26(6), 655–672. https://doi.org/10.1177/0164027504268574

Kelley & Preacher, K. J. (2012). On effect size. Psychological Methods, 17(2), 137–152. https://doi.org/10.1037/a0028086

Kenny, D. A., & Judd, C. M. (1984). Estimating the nonlinear and interactive effects of latent variables. Psychological Bulletin, 96(1), 201–210. https://doi.org/10.1037/0033-2909.96.1.201

Klein & Moosbrugger (2000). Maximum likelihood estimation of latent interaction effects with the LMS method. Psychometrika, 65(4), 457–474. https://doi.org/10.1007/BF02296338

Krosnick, J. A., Judd, C. M., & Wittenbrink (2005). The measurement of attitudes. In Albarracín, B. T. Johnson, & M. P. Zanna (Eds.), The handbook of attitudes (pp. 21–76). Lawrence Erlbaum Associates.

Likert (1932). A technique for the measurement of attitudes. Archives of Psychology, 22(140), 1–55.

Lu, I. R. R., Thomas, D. R., & Zumbo, B. D. (2005). Embedding IRT in structural equation models: A comparison with regression based on IRT scores. Structural Equation Modeling: A Multidisciplinary Journal, 12(2), 263–277. https://doi.org/10.1207/s15328007sem1202_5

Macdonald & MacIntyre (1997). The generic job satisfaction scale: Scale development and its correlates. Employee Assistance Quarterly, 13(2), 1–16. https://doi.org/10.1300/J022v13n02_01

Maraun, M. D., & Rossi, N. T. (2001). The extra-factor phenomenon revisited: Unidimensional unfolding as quadratic factor analysis. Applied Psychological Measurement, 25(1), 77–87. https://doi.org/10.1177/01466216010251006

32.

Marsh

H. W.

Wen

Hau

K. T.

(2004). Structural equation models of latent interactions: Evaluation of alternative estimation strategies and indicator construction. Psychological Methods, 9(3), 275–300. https://doi.org/10.1037/1082-989x.9.3.275

33.

Muraki

(1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159–176. https://doi.org/10.1177/014662169201600206

34.

Muthén

(1983). Latent variable structural equation modeling with categorical data. Journal of Econometrics, 22(1), 43–65. https://doi.org/10.1016/0304-4076(83)90093-3

35.

Muthén

(1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49(1), 115–132. https://doi.org/10.1007/BF02294210

36.

Muthén

Asparouhov

(2002). Latent variable analysis with categorical outcomes: Multiple-group and growth modeling in Mplus. http://www.statmodel.com/mplus/examples/webnote.html

37.

Muthén

Kaplan

(1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38(2), 171–189. https://doi.org/10.1111/j.2044-8317.1985.tb00832.x

38.

Muthén

L. K.

(2010). Model fit diagnostics and Mplus parameter arrays . Retrieved October 14 from http://www.statmodel.com/discussion/messages/11/233.html?1424353153

39. Muthén L. K., Muthén (1998–2017). Mplus user’s guide (8th ed.). Muthén & Muthén.

40. O’Brien, LaHuis D. M. (2011). Do applicants and incumbents respond to personality items similarly? A comparison of dominance and ideal point response models. International Journal of Selection and Assessment, 19(2), 109–118. https://doi.org/10.1111/j.1468-2389.2011.00539.x

41. Polak (2011). Item analysis of single-peaked response data: The psychometric evaluation of bipolar measurement scales. Optima.

42. Polak, Heiser W. J., de Rooij (2009). Two types of single-peaked data: Correspondence analysis as an alternative to principal component analysis. Computational Statistics & Data Analysis, 53(8), 3117–3128. https://doi.org/10.1016/j.csda.2008.09.010

43. Reise S. P. (2010). The role of measurement models in psychometric assessment. Journal of Personality Assessment, 92(6), 481–485.

44. Roberts J. S. (1995). Item response theory approaches to attitude measurement [Unpublished doctoral dissertation]. University of South Carolina.

45. Roberts J. S. (2004). GGUM2004 technical reference manual. http://www.camri.uqam.ca/

46. Roberts J. S., Donoghue J. R., Laughlin J. E. (2000). A general item response theory model for unfolding unidimensional polytomous responses. Applied Psychological Measurement, 24(1), 3–32. https://doi.org/10.1177/01466216000241001

47. Roberts J. S., Fang H.-r., Cui, Wang (2006). GGUM2004: A Windows-based program to estimate parameters in the generalized graded unfolding model. Applied Psychological Measurement, 30(1), 64–65. https://doi.org/10.1177/0146621605280141

48. Roberts J. S., Laughlin J. E. (1996). A unidimensional item response model for unfolding responses from a graded disagree–agree response scale. Applied Psychological Measurement, 20(3), 231–255. https://doi.org/10.1177/014662169602000305

49. Roberts J. S., Laughlin J. E., Wedell D. H. (1999). Validity issues in the Likert and Thurstone approaches to attitude measurement. Educational and Psychological Measurement, 59(2), 211–233. https://doi.org/10.1177/00131649921969811

50. Stark, Chernyshenko O. S., Drasgow, Williams B. A. (2006). Examining assumptions about item responding in personality assessment: Should ideal point methods be considered for scale development and scoring? Journal of Applied Psychology, 91(1), 25–39. https://doi.org/10.1037/0021-9010.91.1.25

51. Tay (2011). The psychometric principles of affect: Are they ideal? [Unpublished doctoral dissertation].

52. Tay, Drasgow (2012). Theoretical, statistical, and substantive issues in the assessment of construct dimensionality. Organizational Research Methods, 15(3), 363–384. https://doi.org/10.1177/1094428112439709

53. Tay, Drasgow, Rounds, Williams B. A. (2009). Fitting measurement models to vocational interest data: Are dominance models ideal? Journal of Applied Psychology, 94(5), 1287–1304. https://doi.org/10.1037/a0015899

54. Tay (2018). Ideal point modeling of non-cognitive constructs: Review and recommendations for research. Frontiers in Psychology, 9, 2423. https://doi.org/10.3389/fpsyg.2018.02423

55. Thurstone L. L. (1928). Attitudes can be measured. American Journal of Sociology, 33(4), 529–554. https://doi.org/10.1086/214483

56. Thurstone L. L. (1932). Motion pictures and attitudes of children. University of Chicago Press.

57. Thurstone L. L., Chave E. J. (1929). The measurement of attitude: A psychophysical method and some experiments with a scale for measuring attitude toward the church. University of Chicago Press.

58. Umbach, Naumann, Brandt, Kelava (2017). Fitting nonlinear structural equation models in R with package nlsem. Journal of Statistical Software, 77(7), 1–20. https://doi.org/10.18637/jss.v077.i07

59. Usami (2011). Generalized graded unfolding model with structural equation for subject parameters. Japanese Psychological Research, 53(3), 221–232. https://doi.org/10.1111/j.1468-5884.2011.00476.x

60. van Schuur W. H., Kiers H. A. L. (1994). Why factor analysis often is the incorrect model for analyzing bipolar concepts, and what model to use instead. Applied Psychological Measurement, 18(2), 97–110. https://doi.org/10.1177/014662169401800201

61. Wang W.-C., S.-L. (2015). Confirmatory multidimensional IRT unfolding models for graded-response items. Applied Psychological Measurement, 40(1), 56–72. https://doi.org/10.1177/0146621615602855

62. Weekers A. M., Meijer R. R. (2008). Scaling response processes on personality items using unfolding and dominance models: An illustration with a Dutch dominance and unfolding personality inventory. European Journal of Psychological Assessment, 24(1), 65–77. https://doi.org/10.1027/1015-5759.24.1.65

63. West S. G., Taylor A. B. (2012). Model fit and model selection in structural equation modeling. In Hoyle R. H. (Ed.), Handbook of structural equation modeling (pp. 209–231). The Guilford Press.

64. Williams (2003). Recent advances in causal modeling methods for organizational and management research. Journal of Management, 29(6), 903–936. https://doi.org/10.1016/S0149-2063(03)00084-9

65. Zyphur M. J., Bonner C. V., Tay (2023). Structural equation modeling in organizational research: The state of our science and some proposals for its future. Annual Review of Organizational Psychology and Organizational Behavior, 10(1), 495–517. https://doi.org/10.1146/orgpsych.2023.10.issue-1

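| R | N | J = 3, D = ±1.5 | J = 3, D = ±2.5 | J = 5, D = ±1.5 | J = 5, D = ±2.5 | J = 10, D = ±1.5 | J = 10, D = ±2.5 |
|---|---|---|---|---|---|---|---|
| Dichotomous | 150 | 377/177/0 | 420/217/3 | 232/31/1 | 321/121/0 | 231/31/0 | 390/190/0 |
| Dichotomous | 300 | 349/149/0 | 366/166/0 | 210/10/0 | 266/66/0 | 205/5/0 | 275/75/0 |
| Dichotomous | 600 | 373/173/0 | 312/112/0 | 202/2/0 | 218/18/0 | 202/2/0 | 276/76/0 |
| Polytomous | 150 | 201/1/0 | 271/71/0 | 200/0/0 | 246/46/0 | 201/0/1 | 288/88/0 |
| Polytomous | 300 | 200/0/0 | 252/52/0 | 200/0/0 | 222/22/0 | 200/0/0 | 220/20/0 |
| Polytomous | 600 | 200/0/0 | 228/28/0 | 200/0/0 | 208/8/0 | 200/0/0 | 204/4/0 |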
