Abstract
Previous research has shown that ignoring individual differences in factor loadings in conventional factor models may reduce the determinacy of factor score predictors. Therefore, the aim of the present study is to propose a heterogeneous regression factor score predictor (HRFS) with larger determinacy than the conventional regression factor score predictor (RFS) when individuals have different factor loadings. First, a method for the estimation of individual loadings is proposed. The individual loading estimates are used to compute the HRFS. Then, a binomial test for loading heterogeneity of a factor is proposed so that the HRFS is computed only when the test is significant; otherwise, the conventional RFS should be used. A simulation study reveals that the HRFS has larger determinacy than the conventional RFS in populations with substantial loading heterogeneity. An empirical example based on subsamples drawn randomly from a large sample of Big Five Markers indicates that the determinacy can be improved for the factor emotional stability when the HRFS is computed.
Introduction
The conventional factor model consists of factor scores and factor loadings. Cattell (1952) proposed applying the conventional factor model to different types of data. The most prominent type is R-factor analysis, in which the covariances of variables measured for many individuals on one measurement occasion are factor analyzed. In the following, we refer to R-factor analysis, although the methods presented here can also be adapted to other types of factor analysis. The R-factor model assumes that factor scores can differ between individuals, whereas the loadings of the measured variables on the common factors are implied to be constant for all individuals. However, there are good reasons to expect that, at least under some circumstances, individuals may have different factor loadings. Arguments for considering inter-individual loading heterogeneity have been presented from developmental psychology, behavioral genetics, and P-factor analysis. P-factor analysis, that is, the analysis of the covariances of variables measured on a large number of measurement occasions for one individual, has been considered in order to represent the idiographic aspects of data (Molenaar, 2004; Molenaar et al., 2003). In the present study, however, we consider loading heterogeneity in the context of R-factor analysis. The meaning of individual factor loadings can be illustrated with an example from the concept of intelligence. Here, factor scores represent an individual's intellectual capacity, whereas an individual's factor loading on a variable describes the extent to which the individual makes use of intelligence for a given task. Individuals with higher loadings of the task on an intelligence factor may rely more heavily on their intelligence, whereas individuals with smaller loadings may recruit other traits to solve the task and use their intelligence to a smaller extent.
The assumption that an observed variable has the same factor loading for every individual of a sample is probably an over-simplification in several areas of research. It has been shown that models describing individual differences that are based on parameters of the total sample cannot, in general, be applied directly to a single individual (Molenaar & Campbell, 2009). This over-simplification also concerns the factor model because it is typically based on the analysis of a covariance matrix of observed variables. If the covariances of observed variables differ between individuals or subsamples, this cannot be detected by factor analysis of a single covariance matrix of the total sample. In line with this, Kelderman and Molenaar (2007) showed that, under the assumption that normally distributed heterogeneous loadings are independent of each other, the population covariance matrix of a model based on heterogeneous loadings is the same as the population covariance matrix of a model based on the same loadings for all individuals. When the covariance matrices of observed variables are the same for models with and without heterogeneous loadings, the covariance matrices do not provide a basis for estimating heterogeneous factor loadings. Moreover, the factor scores representing the latent or “true” individual differences of the construct are indeterminate (Guttman, 1955; Nicewander, 2020), so multiplying individual loadings with the indeterminate individual factor scores necessarily gives indeterminate results. In other words, the number of parameters of the factor model is already larger than the number of data points. Therefore, researchers rely on factor score predictors as a proxy for individual factor scores. The validity of such factor score predictors is given by their determinacy, that is, their correlation with the corresponding factor (Grice, 2001).
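Determinacy can be computed directly from the model parameters. Write Λ for the loading matrix and Σ for the model-implied correlation matrix of an orthogonal factor model. The RFS weights are then Σ⁻¹Λ, and the determinacy of factor j is the square root of the jth diagonal element of Λ'Σ⁻¹Λ. The Python sketch below computes this parameter-based determinacy. It then checks the result against a score-based determinacy, that is, the correlation of the RFS with simulated factor scores. The function names, loadings, and sample size are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def rfs_orthogonal(loadings):
    """RFS weights and parameter-based determinacies for an orthogonal
    factor model in correlation metric (illustrative sketch).

    Sigma = Lambda Lambda' + Psi^2, with Psi^2 chosen so that diag(Sigma) = 1.
    Weights: B = Sigma^{-1} Lambda.
    Determinacy of factor j: sqrt of the j-th diagonal element of Lambda' Sigma^{-1} Lambda.
    """
    lam = np.asarray(loadings, dtype=float)
    communalities = (lam ** 2).sum(axis=1)
    sigma = lam @ lam.T + np.diag(1.0 - communalities)
    weights = np.linalg.solve(sigma, lam)            # p x q matrix Sigma^{-1} Lambda
    determinacy = np.sqrt(np.diag(lam.T @ weights))  # one value per factor
    return weights, determinacy


def simulate_orthogonal(loadings, n, rng):
    """Draw n cases from x = Lambda xi + Psi e with independent standard normal xi and e."""
    lam = np.asarray(loadings, dtype=float)
    p, q = lam.shape
    xi = rng.standard_normal((n, q))
    psi = np.sqrt(1.0 - (lam ** 2).sum(axis=1))
    x = xi @ lam.T + rng.standard_normal((n, p)) * psi
    return x, xi


if __name__ == "__main__":
    rng = np.random.default_rng(2025)
    lam = np.array([[.7], [.6], [.5], [.6], [.7], [.5]])  # illustrative one-factor loadings
    weights, rho = rfs_orthogonal(lam)
    x, xi = simulate_orthogonal(lam, 50_000, rng)
    # Score-based determinacy: correlation of the RFS with the simulated "true" factor scores
    r = np.corrcoef(x @ weights[:, 0], xi[:, 0])[0, 1]
    print(f"parameter-based: {rho[0]:.3f}  score-based: {r:.3f}")
```

With homogeneous loadings, the two coefficients should agree up to sampling error. In this one-factor example, the parameter-based value equals sqrt(s/(1 + s)), where s is the sum of λ²/ψ² over the variables.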
However, given indeterminate factor scores, increasing the number of parameters by introducing individual differences in factor loadings will further increase indeterminacy. For example, one could conceive of a different rotation of the factor loadings for each individual. Moreover, Kelderman and Molenaar (2007) found that the standard likelihood-ratio goodness-of-fit statistic has little power to detect loading heterogeneity. Therefore, considering loading heterogeneity is a challenge in the context of the exploratory factor model.
Ansari et al. (2002) point out that ignoring unobserved heterogeneity can lead to biased parameter estimates. They developed Markov chain Monte Carlo procedures to perform Bayesian inference for confirmatory factor models with mean and covariance heterogeneity. Although this promising approach works in the context of confirmatory factor analysis, it is still relevant to consider loading heterogeneity in the context of exploratory factor analysis. As mentioned before, indeterminacy of the factor scores implies that any product of individual loadings with indeterminate factor scores will also be indeterminate. This also implies that, once a factor score predictor is specified, the product of individual loadings with the individual factor score predictor is identified. Accordingly, Molenaar et al. (2003) investigated the effect of loading heterogeneity on the validity, or determinacy, of factor score predictors. They found that the determinacy coefficient is reduced considerably when heterogeneous loadings occur in the population model but are not specified in the factor model (Kelderman & Molenaar, 2007; Molenaar et al., 2003). Kelderman and Molenaar (2007) also found that loading heterogeneity affects the distribution of the observed variables. They recommend testing the observed variables for non-normality by means of the Shapiro–Wilk test so that factor score predictors can be used with more confidence. However, non-normality can have several causes, so a significant Shapiro–Wilk test is not a specific indicator of loading heterogeneity.
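The two observations attributed to Kelderman and Molenaar (2007) can be illustrated numerically. The Python sketch below rests on several assumptions:

- There is a single factor.
- Individual loadings are normal with means μ and a common standard deviation σ.
- The loadings are independent of each other and of the factor scores.
- The values of μ and σ are illustrative and are not taken from the cited studies.

Under these assumptions, the covariance of two different observed variables equals μ_iμ_j, exactly as in a homogeneous model with loadings μ. The loading variance σ² is simply absorbed by the unique variance. The product of individual loadings and factor scores, however, makes the observed variables leptokurtic. Excess kurtosis is used here as a simple non-normality index in place of the Shapiro–Wilk test.

```python
import numpy as np


def simulate_heterogeneous(mu, sigma, n, rng):
    """One-factor model with individual loadings lambda_ik = mu_i + sigma * z_ik,
    drawn independently of each other and of the factor scores xi_k (assumption).
    Uniquenesses psi_i^2 = 1 - mu_i^2 - sigma^2 keep all variances at 1."""
    mu = np.asarray(mu, dtype=float)
    p = mu.size
    xi = rng.standard_normal(n)
    lam = mu + sigma * rng.standard_normal((n, p))  # n x p individual loadings
    psi = np.sqrt(1.0 - mu ** 2 - sigma ** 2)
    return lam * xi[:, None] + rng.standard_normal((n, p)) * psi


def excess_kurtosis(x):
    """Sample excess kurtosis per column (about 0 for normal data)."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    return (z ** 4).mean(axis=0) - 3.0


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    mu = np.array([.7, .6, .5, .6, .7, .5])
    x = simulate_heterogeneous(mu, sigma=0.3, n=200_000, rng=rng)
    cov = np.cov(x, rowvar=False)
    off = ~np.eye(mu.size, dtype=bool)
    # Off-diagonal covariances match a homogeneous model with loadings mu ...
    print("max |cov - mu mu'|:", np.abs(cov[off] - np.outer(mu, mu)[off]).max())
    # ... but the observed variables are leptokurtic
    print("excess kurtosis:", excess_kurtosis(x).round(2))
```

Setting sigma to 0 in the same call gives excess kurtosis near zero. This shows that, in this setup, the non-normality stems only from the loading heterogeneity.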
Accordingly, a subsequent research question is whether a more specific test of loading heterogeneity is possible and whether factor score predictors can be adapted to cases where heterogeneous loadings can be expected. The reduction of factor score determinacy due to loading heterogeneity was more substantial for Bartlett's (1937) factor score predictor than for Thurstone's (1935) regression factor score predictor (RFS; Molenaar et al., 2003). Moreover, for homogeneous factor loadings, the determinacy of the RFS is larger than that of the Bartlett factor score predictor (Krijnen et al., 1996), so the RFS is considered in the following. Is it possible to minimize the loss of RFS determinacy that co-occurs with heterogeneous loadings? The present study provides a tentative solution to this problem. The paper comprises the following sections:

(a) some definitions are given;
(b) a method for estimating heterogeneous loadings of orthogonal factors is proposed;
(c) on this basis, a heterogeneous regression factor score predictor (HRFS) is proposed that may minimize the loss of RFS determinacy;
(d) a test for loading heterogeneity is proposed;
(e) some ideas on how to consider loading heterogeneity in oblique factor models are presented;
(f) a simulation study is performed to compare the loss of determinacy of RFS and HRFS in models with heterogeneous factor loadings; and
(g) the determinacies of RFS and HRFS are compared in an empirical dataset based on the Five Factor model.

Finally, some limitations of the present study and some prospects for the use of HRFS are discussed.
Definitions
In a population of individuals, the common factor model (Mulaik, 2010) can be defined as:
where
In the following, the correlation matrix of observed variables is considered, so that
The correlation of the RFS with the original factor, that is, the determinacy coefficient
For each observed variable i on factor j,
For
For
Estimation of Individual Factor Loadings
The proposed estimation procedure is described for a sample of n individuals and—in the first step—for the orthogonal factor model. Orthogonal factor analysis of q factors and p variables for the total sample yields:
Model misfit occurs in the sample so that the sample covariance matrix differs from the population covariances,
Similar to Cook’s (1977) ideas on the influence of a single individual on linear regression results, the determinacy of the RFS
The effect of the kth individual on the determinacies of the factor score predictors is:
Positive values of
Although
where “sgn” is the sign function. Multiplying by the sign of the loadings preserves their sign after squaring. As
In a stepwise procedure,
where “SSQ” denotes the sum of squares. Heywood cases may occur more often for
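The deletion logic of this section, in the spirit of Cook (1977), can be sketched in Python: each individual k is removed once, and the factor analysis is repeated on the remaining data. This is a minimal illustration under stated assumptions:

- There is a single factor.
- Loadings are approximated by the first principal component of the correlation matrix. The hypothetical helper `one_factor_loadings` stands in for principal axis factoring.
- The arbitrary sign of each solution is fixed by the sum of its loadings.

The determinacy-based weighting and the signed squared loadings described above are not reproduced here.

```python
import numpy as np


def one_factor_loadings(x):
    """Hypothetical helper: one-factor loadings approximated by the first
    principal component of the correlation matrix, lambda = sqrt(eigval) * eigvec.
    This stands in for principal axis or ML factoring used in practice."""
    r = np.corrcoef(x, rowvar=False)
    eigval, eigvec = np.linalg.eigh(r)          # eigenvalues in ascending order
    lam = np.sqrt(eigval[-1]) * eigvec[:, -1]
    return lam if lam.sum() >= 0 else -lam      # fix the arbitrary sign


def leave_one_out_loadings(x):
    """Row k holds the loadings estimated from the data without individual k."""
    n, p = x.shape
    keep = np.ones(n, dtype=bool)
    out = np.empty((n, p))
    for k in range(n):
        keep[k] = False
        out[k] = one_factor_loadings(x[keep])
        keep[k] = True
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    lam = np.array([.7, .6, .5, .6, .7, .5])   # illustrative population loadings
    xi = rng.standard_normal(300)
    x = np.outer(xi, lam) + rng.standard_normal((300, 6)) * np.sqrt(1 - lam ** 2)
    full = one_factor_loadings(x)
    loo = leave_one_out_loadings(x)
    influence = full - loo                     # shift of each loading caused by individual k
    print("inter-individual SD of leave-one-out loadings:", loo.std(axis=0, ddof=1).round(4))
```

The row-wise differences between the full-sample and the leave-one-out loadings express how strongly each individual shifts each loading. Because n factor analyses are required, the procedure becomes computationally costly as n grows.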
The Heterogeneity-Based RFS
Making use of the estimated individual loadings, the heterogeneity-based regression factor score predictor (HRFS) is then computed for each individual as
The estimated correlation of
where “
Estimation of Loading Heterogeneity
Loading heterogeneity can be assessed by performing, in a sample of n individuals, n factor analyses, each based on the data without one individual k. For each loading on each factor, the inter-individual standard deviation
In the simulated population,
To obtain an indicator for the loading heterogeneity of a factor, the number of variables with loading heterogeneity greater than sampling error is counted for the respective factor. The following index is one if loading heterogeneity of variable i occurs, that is, if
The number
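The counting step and the binomial test can be illustrated in Python. The sketch rests on two assumptions. First, in the absence of loading heterogeneity, each of the p variables of a factor is flagged independently with a known chance probability alpha. Second, the exact flagging index and cut-offs are those defined in the equations of this section and are not reproduced here; the null rate and the helper names are therefore hypothetical. The one-sided upper tail of the binomial distribution gives the probability of flagging at least the observed number of variables by chance. The decision rule mirrors the two-step procedure recommended in this paper: compute the HRFS if the test is significant, and otherwise keep the RFS.

```python
from math import comb


def binomial_upper_tail(m, p, alpha):
    """P(M >= m) for M ~ Binomial(p, alpha): the probability of flagging at least
    m of p variables if each is flagged with probability alpha by chance alone."""
    return sum(comb(p, k) * alpha ** k * (1 - alpha) ** (p - k) for k in range(m, p + 1))


def choose_predictor(flags, alpha=0.05, level=0.05):
    """Hypothetical two-step decision: HRFS if the number of flagged variables
    is improbably large under the chance rate alpha, otherwise RFS."""
    m, p = sum(flags), len(flags)
    pval = binomial_upper_tail(m, p, alpha)
    return ("HRFS" if pval < level else "RFS"), pval


if __name__ == "__main__":
    # One of ten variables flagged: consistent with chance, keep the RFS
    print(choose_predictor([1] + [0] * 9))
    # Three of ten variables flagged: unlikely under chance, compute the HRFS
    print(choose_predictor([1, 1, 1] + [0] * 7))
```

Independence of the flags across variables is a simplifying assumption of this sketch. Leave-one-out estimates of different loadings are generally correlated, which is one reason why cut-off values may need adjustment.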
Cut-Off Values
For p ≤ 3, it is necessary to use a cut-off value of
On Loading Heterogeneity in Oblique Factor Models
The present approach follows Cook’s (1977) idea to investigate the effect of data elimination for the kth individual. In the correlated factor model, this may not only affect the oblique factor loadings
It is, nevertheless, possible to estimate the heterogeneity of factor inter-correlations without applying rotation methods directly to individual loading matrices. The first step is to perform first-order factor analysis with oblique rotation, that is,
As Schmid–Leiman factors are orthogonalized oblique factors, the second step is to estimate the individual factor loadings
so that
According to the condition given in Equations 24 and 25, the HRFS for oblique factors are then computed by:
It is, however, possible that
Therefore, a method for the estimation of oblique individual factor loadings
Although the focus of the present study is on the HRFS for orthogonal factors, a few examples will be provided for the HRFS as computed from Equation 23 for obliquely rotated factors. It is, however, acknowledged that the issue of heterogeneous loadings in oblique factor models deserves a more complete investigation in future research.
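The orthogonalization step of the oblique extension relies on Schmid–Leiman factors, which are orthogonalized oblique factors. The Python sketch below shows the Schmid–Leiman transformation under the following assumptions:

- There is a single second-order factor with loadings γ that reproduce the factor inter-correlations exactly, so that Φ = γγ' + U² with U² = diag(1 − γ²).
- For q = 3 factors with positive inter-correlations, γ can be obtained in closed form.

The function names are illustrative. The paper's subsequent estimation of individual loadings on these factors is not reproduced.

```python
import numpy as np


def second_order_loadings_q3(phi):
    """Hypothetical helper for q = 3 first-order factors with positive correlations:
    one second-order factor reproduces Phi exactly, with
    gamma_1^2 = phi_12 * phi_13 / phi_23 (analogously for gamma_2 and gamma_3).
    Assumes all resulting gamma_j < 1 (no Heywood case)."""
    phi = np.asarray(phi, dtype=float)
    g1 = np.sqrt(phi[0, 1] * phi[0, 2] / phi[1, 2])
    g2 = np.sqrt(phi[0, 1] * phi[1, 2] / phi[0, 2])
    g3 = np.sqrt(phi[0, 2] * phi[1, 2] / phi[0, 1])
    return np.array([g1, g2, g3])


def schmid_leiman(pattern, gamma):
    """Schmid–Leiman orthogonalization with one second-order factor.

    pattern: p x q first-order (oblique) pattern matrix Lambda
    gamma:   length-q second-order loadings, Phi = gamma gamma' + U^2
    Returns the p x (1 + q) orthogonal loadings [Lambda gamma, Lambda U]."""
    lam = np.asarray(pattern, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    general = lam @ gamma                        # loadings on the general factor
    group = lam * np.sqrt(1.0 - gamma ** 2)      # Lambda U: column j scaled by u_j
    return np.column_stack([general, group])


if __name__ == "__main__":
    pattern = np.array([[.7, 0, 0], [.6, 0, 0], [.5, 0, 0],
                        [0, .7, 0], [0, .6, 0], [0, .5, 0],
                        [0, 0, .7], [0, 0, .6], [0, 0, .5]])
    phi = np.full((3, 3), .30) + np.diag([.70] * 3)  # inter-correlations of .30
    sl = schmid_leiman(pattern, second_order_loadings_q3(phi))
    # The orthogonal loadings reproduce the common part Lambda Phi Lambda'
    print(np.allclose(sl @ sl.T, pattern @ phi @ pattern.T))
```

The check holds because [Λγ, ΛU][Λγ, ΛU]' = Λ(γγ' + U²)Λ' = ΛΦΛ'.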
Simulation Study
Conditions and Specification
A simulation study was performed to investigate whether the HRFS based on orthogonal factors has a larger validity, that is, a larger determinacy coefficient, than the conventional RFS when loading heterogeneity occurs. In an empirical study, the determinacy coefficient
and
A continuous proportion of
Independent variables of the simulation study were the loading heterogeneity with
For each sample of each condition, n × nd factor analyses for the computation of
As mentioned above, the effect of factor rotation on results for q = 3 was minimized by means of orthogonal target rotation according to Schoenemann (1966) towards the target matrix of salient loadings in the total samples. Moreover,
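Orthogonal target rotation has a closed-form solution (Schoenemann, 1966). Let A be the loadings and B the target, and compute the singular value decomposition A'B = USV'. The orthogonal matrix T = UV' then minimizes the sum of squared differences between AT and B. The Python sketch below implements this; the function name and the example target are illustrative, and the code is not the authors' script.

```python
import numpy as np


def orthogonal_target_rotation(loadings, target):
    """Orthogonal Procrustes rotation: find the orthogonal T minimizing
    ||loadings @ T - target||_F. With loadings' target = U S V', T = U V'."""
    a = np.asarray(loadings, dtype=float)
    b = np.asarray(target, dtype=float)
    u, _, vt = np.linalg.svd(a.T @ b)
    t = u @ vt
    return a @ t, t


if __name__ == "__main__":
    target = np.array([[.7, 0, 0], [.6, 0, 0], [.5, 0, 0],
                       [0, .7, 0], [0, .6, 0], [0, .5, 0],
                       [0, 0, .7], [0, 0, .6], [0, 0, .5]])
    # Scramble the target by a random orthogonal matrix, then rotate it back
    q, _ = np.linalg.qr(np.random.default_rng(5).standard_normal((3, 3)))
    rotated, t = orthogonal_target_rotation(target @ q, target)
    print(np.abs(rotated - target).max())
```

Because the rotation is constrained to be orthogonal, the factors remain uncorrelated. This is what the estimation of individual loadings for orthogonal factors requires.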
Results
The effect of loading heterogeneity on determinacy estimates was investigated by means of a repeated-measures ANOVA comprising parameter-based versus factor score-based determinacy coefficients (PAR-SCO) and the RFS-based versus HRFS-based determinacy coefficients (RFS-HRFS) as within-subject factors and the conditions, q, p, µ(

Mean determinacy coefficients based on model parameters and RFS (

Mean determinacy coefficients based on model parameters and RFS (
For σ(
To investigate whether these results could be replicated when the initial total-sample solution was based on Varimax-rotated factors, we performed the analysis for q = 3 and p/q = 6, for n = 150 and n = 600. The results for Varimax-rotated solutions (see Figure 3) were similar to the results for the corresponding target-rotated solutions (see Figure 2).
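The Varimax step can be reproduced with the widely used SVD-based iteration sketched below in Python. This is raw Varimax (γ = 1) without Kaiser row normalization, whereas R's `stats::varimax` normalizes by default. The sketch is an illustration only and is not the script from Supplemental Material C.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=1000, tol=1e-10):
    """Raw varimax via the SVD-based iteration (no Kaiser normalization).
    Returns the rotated loadings and the orthogonal rotation matrix."""
    a = np.asarray(loadings, dtype=float)
    p, k = a.shape
    rot = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        lam = a @ rot
        # Gradient of the varimax criterion with respect to the rotation (up to a constant)
        grad = a.T @ (lam ** 3 - (gamma / p) * lam @ np.diag((lam ** 2).sum(axis=0)))
        u, s, vt = np.linalg.svd(grad)
        rot = u @ vt
        crit_old, crit = crit, s.sum()
        if crit_old != 0.0 and crit / crit_old < 1.0 + tol:
            break
    return a @ rot, rot


if __name__ == "__main__":
    theta = np.deg2rad(30)
    turn = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
    simple = np.array([[.7, 0], [.6, 0], [.5, 0], [0, .7], [0, .6], [0, .5]])
    rotated, rot = varimax(simple @ turn)   # start from a 30-degree rotated solution
    print(rotated.round(3))                 # simple structure up to column order and sign
```

Varimax determines the factors only up to column order and sign. For this reason, Varimax solutions must be aligned with each other before loadings from different analyses can be compared.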

Mean determinacy coefficients based on model parameters and RFS (
As parameter-based determinacies may be used as estimates of score-based determinacies in empirical settings (where score-based determinacies cannot be computed), the effect of loading heterogeneity on parameter-based and score-based determinacies was compared. To this end, the effect of σ(

The means of
Overall, the means of
A short simulation to compare the determinacy of HRFS/RFS and RFS based on oblique population factor models was performed for q = 3 factors with a factor inter-correlation of ϕ = .30,

Mean determinacy coefficients based on model parameters, RFS (
Empirical Example
The empirical example dataset was based on answers to the 50 IPIP Big Five Factor Markers (Goldberg, 1992), updated on 11/08/2018 and retrieved on 13/11/2024 from https://openpsychometrics.org/_rawdata/. The five factors are Extraversion (E), Agreeableness (A), Conscientiousness (C), Emotional Stability (ES), and Intellect/Imagination (I). Each factor was measured by 10 items with five response categories, and the direction of item scoring was altered. The dataset was collected from 2016 to 2018 through an interactive online personality test and contained 1,015,342 cases. However, the codebook recommends using only cases with a single user IP. Accordingly, only the 696,854 cases with a single user IP were used. No demographic information was available. One hundred random subsamples with n = 150, n = 600, and n = 1,000 cases were drawn with replacement from the total sample. In the first step, principal axis factor analysis with q = 5 and subsequent orthogonal target rotation towards the intended five-factor loading pattern was performed as a basis for the computation of
The results for the factors based on orthogonal target-rotation in the total sample indicated that

Mean parameter-based determinacy of RFS

Mean parameter-based determinacy of RFS
Hence, the effect of loading heterogeneity on determinacy and the possible advantage of HRFS/RFS over RFS did not occur only for target-rotated factors and were most pronounced for Oblimin-rotated factors. Note that the effect of oblique rotation on the difference of
Discussion
Factor score predictors are supposed to estimate the latent values of individuals on the factor of interest. In both research and practical applications, it is important to use scores with maximum validity. The validity of factor score predictors is represented by their determinacy coefficients, that is, their correlations with the underlying factor. Molenaar et al. (2003) found that the determinacy coefficient of factor score predictors was reduced when heterogeneous loadings that occur in the population model were not specified in the factor model. Therefore, the present study investigated whether it is possible to improve the determinacy, and thus the validity, of the RFS by specifying heterogeneous loadings between individuals. We propose a method for the estimation of individual factor loadings and for the computation of HRFS. Moreover, we propose a binomial test to ascertain whether heterogeneous loadings are present. We suggest a two-step procedure: First, the binomial test is conducted. If the binomial test for loading heterogeneity is significant, HRFS should be computed in the second step. Otherwise, RFS should be preferred.
To avoid the effects of rotational indeterminacy on heterogeneous loading estimates, the estimation of heterogeneous loadings was based on orthogonal factors. However, an extension of the estimation of heterogeneous loadings to correlated factor models is also possible. Two extensions are presented: One extension includes heterogeneous factor inter-correlations in the model, and the other starts from factor inter-correlations that are fixed across all individuals of the given sample. The latter extension was briefly investigated in the present study because it avoids the complexities of interpreting heterogeneous factor inter-correlations.
The conditional computation of HRFS/RFS was compared to RFS by means of a simulation study based on population models with one and three factors and different degrees of loading heterogeneity. Population loading heterogeneity was quantified by the inter-individual standard deviation of loadings (σ(
Score-based determinacies and parameter-based determinacies were similar when there was no population loading heterogeneity (σ(
The empirical example based on random subsamples drawn from a large online sample of Big Five Factor Markers (Goldberg, 1992) revealed that for moderate (n = 600) and large (n = 1,000) sample sizes, the parameter-based determinacy of the factor ES was larger for HRFS/RFS than for RFS. This result was found for target-rotated and Varimax-rotated factors as well as for the oblique extension of HRFS based on Oblimin-rotated factors. As the result was found only for moderate to large samples, it is unlikely to be due to sampling error. For Oblimin-rotated factors and moderate to large samples, the parameter-based determinacy was also larger for HRFS/RFS than for RFS for the factors C and I. This indicates that considering the obliqueness of the factors may enhance the positive effect of heterogeneous loadings on determinacy.
The results of the empirical example indicate that the computation of HRFS/RFS is also feasible for five-factor models. An R-script for the computation of HRFS/RFS and RFS, the result of the binomial test of loading heterogeneity, and the determinacies for a single empirical dataset with one to five Varimax-rotated, orthogonal target-rotated, or Oblimin-rotated factors are given in Supplemental Material C.
As the determinacy of HRFS/RFS was larger than the determinacy of RFS when loading heterogeneity occurred in the population, the HRFS may be included in structural equation models (SEMs) when loading heterogeneity can be assumed according to the binomial test for the measurement models. However, maximal determinacy does not necessarily result in minimal bias of structural parameters. For example, the RFS has maximal determinacy but considerable bias (Skrondal & Laake, 2001). Of course, minimal bias is also a desired property, especially when factor scores are used in SEM (Croon, 2002). If, however, maximal determinacy of the factor scores is intended and the binomial test of loading heterogeneity is significant, the HRFS may be included in SEM, although the issue of minimal bias for HRFS should be investigated in further studies. If the HRFS is used in SEM, it is recommended to compute the HRFS from loading heterogeneity in measurement models estimated before the structural part of the model, as in the structural-after-measurement approach (Rosseel & Loh, 2024). Beyond the reasons for separate estimation of measurement models given in Rosseel and Loh (2024), estimating loading heterogeneity in separate measurement models minimizes capitalization on chance. As individual factor loadings directly affect individual factor scores, estimating them together with a least-squares prediction between two factors may bias the individual loadings toward a stronger prediction.
The individual factor loadings estimated as a basis for HRFS may also be interesting from a theoretical perspective, as they may indicate the individual relevance of a factor for the response to an item. In trait activation theory, situational cues have been emphasized as a determinant of trait activation (Tett et al., 2021). From this perspective, the individual loading of an item could be an indicator of the individual trait relevance of the situational content of a questionnaire item. Interestingly, ES appeared especially sensitive to situational moderation in Tett et al. (2021), and it showed the largest increase in determinacy when HRFS/RFS instead of RFS was computed. This perspective might be explored in future studies, in which individual loadings may be predicted by external variables or used as predictors of behavioral outcomes. Of course, if individual loadings are to be used in this way, it would be necessary to estimate their reliability (e.g., by means of test-retest correlations).
As a limitation, it should be noted that the focus of the present study was on orthogonal factors. The reason for this focus was that a procedure applicable to orthogonal factors had to be found before an extension to oblique factors could be attempted. Two ways to extend the procedure to oblique factors were outlined, and an extension based on constant factor inter-correlations was tentatively evaluated. The results, especially for the empirical example, are promising, but they are preliminary, as there was no room for a larger simulation study on correlated factors. Moreover, it might be of interest to use the individual loading estimates based on the binomial test for loading heterogeneity to compute HRFS in combination with Croon's (2002) bias correction, Bartlett factor scores, or correlation-preserving factor scores (Beauducel et al., 2024).
A further limitation of the present study is that only Varimax, Oblimin, and orthogonal target rotation were investigated, which calls for a more extensive investigation of loading heterogeneity across methods of factor rotation. Finally, measurement invariance of factor scores is also an issue (Lai & Tse, 2024). Obviously, the HRFS is devoted to conditions where measurement invariance does not hold across individuals. However, loading heterogeneity at the level of the individual does not preclude identical mean loadings for different groups of individuals. Accordingly, a comparison of the measurement invariance of RFS and HRFS across groups could be an interesting aim for future research.
Conclusion
The present research revealed that the negative effect of loading heterogeneity on the determinacy (validity) of RFS can be reduced when individual loadings are accounted for by computing HRFS. We propose a two-step procedure: First, a binomial test for loading heterogeneity is performed. Second, if this test is significant, we recommend computing the HRFS. In our simulation study, the resulting conditional HRFS/RFS computation improved factor score determinacy over that of the RFS in population models based on loading heterogeneity. An empirical example based on subsamples drawn randomly from a large Big Five Marker dataset revealed that the ES factor may have substantial loading heterogeneity and that the determinacy may be improved by the conditional computation of HRFS/RFS.
Supplemental Material
sj-docx-1-epm-10.1177_00131644251347530 – Supplemental material for How to Improve the Regression Factor Score Predictor When Individuals Have Different Factor Loadings
Supplemental material, sj-docx-1-epm-10.1177_00131644251347530 for How to Improve the Regression Factor Score Predictor When Individuals Have Different Factor Loadings by André Beauducel, Norbert Hilger and Anneke C. Weide in Educational and Psychological Measurement
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was funded by the German Research Foundation (DFG), BE 2443/18-1.
Supplemental Material
Supplemental material for this article is available online.
References
