Sage Journals: Discover world-class research

Abstract

Large-scale assessments are generally designed for summative purposes to compare achievement among participating countries. However, these nondiagnostic assessments have also been adapted in the context of cognitive diagnostic assessment for diagnostic purposes. Given the large investments in these assessments, it would be cost-effective to draw finer-grained inferences about attribute mastery. Nonetheless, the correctness of attribute specifications in the Q-matrix has not been verified, despite being designed by domain experts. Furthermore, the underlying process of the TIMSS (Trends in International Mathematics and Science Study) assessment is unknown because it was not developed for diagnostic purposes. Thus, this study suggests first validating the attribute specifications in the Q-matrix and thereafter defining a specific reduced or saturated model for each item. In doing so, the two analyses were validated across 20 countries that were selected randomly from the TIMSS 2011 data. Results show that attribute specifications can differ from expert opinions and that the underlying model can vary from item to item.

Keywords

TIMSS, cognitive diagnosis, Q-matrix validation, fit measures, large-scale assessment

Introduction

A recently popular class of psychometric models, the cognitive diagnosis model (CDM), in contrast to classical test theory (CTT) and item response theory (IRT), aims mainly to investigate a specific finer-grained set of multiple skills within a domain of interest. These predefined skills are used to classify examinees based on whether they have mastered them or not. This is a critical point where CDMs differ from the other two commonly used unidimensional test theories. CTT and IRT usually locate and assess examinees on an ability continuum by a single overall test score. Instead of reporting a single overall score, CDMs provide pedagogical information by which students’ strengths and weaknesses regarding the acquisition of specific skills in a domain can be identified. Therefore, CDMs serve important purposes, such as offering a more precise tool to diagnose academic needs and creating a different perspective to design a better learning environment.

To date, several models and methodological developments have been introduced in the context of cognitively diagnostic assessments (CDAs). In terms of model developments, two types of CDMs have been proposed, classified as reduced and general models. Specifically, the deterministic inputs, noisy “and” gate (DINA; Haertel, 1989; Junker & Sijtsma, 2001) model, the deterministic input, noisy “or” gate (DINO; Templin & Henson, 2006) model, the compensatory and reduced reparameterized unified models (C-RUM and R-RUM; Hartz & Roussos, 2008), the additive CDM (A-CDM; de la Torre, 2011), and the linear logistic model (LLM; de la Torre & Douglas, 2004) are examples of reduced models. The general diagnosis model (GDM; von Davier, 2005), the log-linear diagnostic classification model (Henson, Templin, & Willse, 2009), and the generalized DINA model (G-DINA; de la Torre, 2011) are examples of general models. While developing new models with particular assumptions, researchers have encountered new theoretical and practical issues. These concerns have led to considerable methodological advances in various areas, such as model-data fit (Chen & de la Torre, 2014; Chen, de la Torre, & Zhang, 2013; Cui & Leighton, 2009; Sen & Bradshaw, 2017), inferential item-fit evaluation (Sorrel, de la Torre, Abad, & Olea, 2017), person fit evaluation (Liu, Douglas, & Henson, 2009), Q-matrix validation (Chiu, 2013; DeCarlo, 2011; de la Torre, 2008; de la Torre & Chiu, 2016; Liu, Xu, & Ying, 2012; Terzi, 2017; Terzi & de la Torre, 2018), empirical applications of CDMs (Akbay, Terzi, Kaplan, & Karaaslan, 2018; Dogan & Tatsuoka, 2008; Lee, Park, & Taylan, 2011; Tjoe & de la Torre, 2014), and computerized adaptive testing in CDM (Cheng, 2009; Hsu, Wand, & Chen, 2013; Kaplan, de la Torre, & Barrada, 2014).

Generally, international large-scale exams (e.g., TIMSS; Trends in International Mathematics and Science Study) have been analyzed with IRT models, which provide a single total score for each examinee. With recent advancements in CDAs, however, there has been a trend toward providing more elaborate results on testing practices. A number of CDMs have been developed to obtain more detailed test results (Rupp, Templin, & Henson, 2010). The shift from single score reporting practices to CDM approaches has also been applied to TIMSS data in several studies (Birenbaum, Tatsuoka, & Yamada, 2004; Choi, Lee, & Park, 2015; Dogan & Tatsuoka, 2008; Im & Park, 2010; Lee et al., 2013; Lee et al., 2011; Liu, Huggins-Manley, & Bulut, 2018; Sen & Arıcan, 2015; Toker & Green, 2012). The applied diagnostic models have varied across these studies. These applications of CDMs to TIMSS data can be considered examples of retrofitting CDMs to large-scale assessments, which have generally been developed and analyzed with IRT or CTT. For example, the rule space method has been used in several studies, including Im and Park (2010), Toker and Green (2012), Dogan and Tatsuoka (2008), and Birenbaum et al. (2004). Another example is that the TIMSS data have been analyzed using one of the commonly used reduced models, the DINA model, as highlighted by Lee et al. (2011), Lee et al. (2013), Choi et al. (2015), and Sen and Arıcan (2015). In carrying out these types of analyses, CDMs typically assume that the test was developed based on specific attributes and a Q-matrix (Tatsuoka, 1983), which relates test items to particular attributes. For instance, Lee et al. (2011) focused on the DINA model with two purposes in view: to identify item characteristics and to investigate the mastery of attributes. Two main limitations of the study by Lee et al. (2011) were that it relied solely on domain experts for attribute specifications without validating their correctness and that it assumed the DINA model to be the correct underlying model without evaluating model-data fit.

Moreover, the underlying process of the TIMSS assessment for each item is unknown because it was not developed for diagnostic purposes. Nonetheless, nondiagnostic assessments can still be adapted for diagnostic purposes (Chen & de la Torre, 2014). Because such large-scale assessments are not designed to yield diagnostic information, and designing them to do so would require intensive effort, retrofitting multidimensional CDMs to these assessments offers a way of obtaining the benefits of CDMs (Liu et al., 2018). Given this opportunity, large-scale assessments (e.g., TIMSS and PISA) have been adapted in the context of CDA. Considering the large investments in these assessments, it would be cost-effective to draw finer-grained inferences about what attributes students have or have not mastered (Chen & de la Torre, 2014). Thus, there is a need to emphasize the importance of conducting CDA analyses using such large-scale data sets.

In particular, Chen et al. (2013) proposed a systematic procedure to adapt large-scale assessments in the context of CDM using the following steps: constructing initial and final attributes and the Q-matrix; evaluating reduced CDMs; and cross-validating the selected models. Chen et al. (2013) demonstrated the procedure using 26 released items from the reading domain of the Program for International Student Assessment (PISA) administered in 2000: initial attributes were defined by domain experts, followed by statistical analyses based on absolute and relative fit indices. After redefining those initial attributes and Q-matrix specifications, the selected Q-matrix was evaluated across reduced CDMs. Finally, the results were investigated using data from different countries. However, in the study by Chen et al. (2013), attribute specifications were not validated using statistical procedures. Therefore, validating the correctness of attribute specifications in the Q-matrix and then defining a specific reduced or general model for each item, if possible, should be one of the earlier steps to be taken. Otherwise, attribute misspecifications in the Q-matrix and model-data misfit can lead to examinees being classified into inaccurate latent classes.

Purpose of the Study

Using the eighth-grade mathematics section of the TIMSS 2011 assessment (Mullis, Martin, Foy, & Arora, 2012), this study has three purposes. The first purpose is to validate attribute specifications in the Q-matrix under the G-DINA model, so that no particular reduced model needs to be assumed. Rather than being constructed anew, the Q-matrix of the test was adapted from Şen and Arıcan (2015). The validation of attribute specifications was implemented using the G-DINA model discrimination index (GDI; de la Torre & Chiu, 2016). After verifying the correctness of attribute specifications, the second purpose is to define the most appropriate model for each item. This step is important because the fit of the model to the data should be evaluated (Chen et al., 2013). To this end, the Wald test was used to compare the item-level fit of the saturated CDM with that of three reduced models (DINA, DINO, and A-CDM; de la Torre & Lee, 2013). The third purpose is to validate the results across 20 randomly selected countries.

Background

Q-Matrix

Regardless of assuming a reduced or general model, the Q-matrix is a crucial component of CDMs, in that each item is associated with the required attributes to be mastered by examinees for correctly answering the item. Let $q_{j k}$ represent the element in row j and column k of a $J \times K$ Q-matrix, where J and K are the number of items and attributes, respectively. If the kth attribute is required to answer item j correctly, $q_{j k} = 1$ , if it is not required, $q_{j k} = 0$ .
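As a minimal sketch of this definition, a Q-matrix can be held as one 0/1 row per item. The three rows below are the first Numbers-domain items of Table 3; the helper names are illustrative and not part of the original study.

```python
# Q-matrix rows for three Numbers-domain items (columns: N1, N2, N3), taken from Table 3.
ATTRIBUTES = ["N1", "N2", "N3"]
Q = {
    1: [1, 1, 0],
    2: [0, 1, 0],
    4: [1, 0, 0],
}

def required_attributes(item):
    """Return the names of the attributes k with q_jk = 1 for item j."""
    return [name for name, q in zip(ATTRIBUTES, Q[item]) if q == 1]

def attribute_frequency():
    """Count how many of the listed items require each attribute (column sums)."""
    return {name: sum(row[k] for row in Q.values())
            for k, name in enumerate(ATTRIBUTES)}

print(required_attributes(1))
print(attribute_frequency())
```

Column sums over the full Q-matrix correspond to the attribute frequencies reported in Table 2.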

The process of constructing the Q-matrix typically involves experts’ judgments, which could be considered subjective in nature. This can cause serious validity problems through inaccurate parameter estimation and attribute classification. Accordingly, several studies have addressed Q-matrix validation (Chiu, 2013; DeCarlo, 2011; de la Torre, 2008; de la Torre & Chiu, 2016; Liu et al., 2012; Terzi & de la Torre, 2018).

Saturated and Reduced CDMs

The primary purpose of CDMs is to classify examinees into latent classes based on which of the K attributes have been mastered. How this classification works depends on the underlying model, and saturated and reduced CDMs play different roles here. For instance, among the reduced CDMs, the DINA is a commonly used model that classifies examinees into two groups: those who have mastered all the required attributes and those who have not mastered at least one of them. In other words, in the DINA model, examinees have to master all the required attributes; nonmastery of even one required attribute can result in answering the item incorrectly. In the DINO model, by contrast, mastery of even one of the required attributes is enough to answer the item correctly. A third type of reduced model is additive in nature: in the A-CDM, the LLM, and the R-RUM, which differ in their link functions (de la Torre, 2011), mastering an attribute changes the probability of success independently of the other attributes. Because these additive models share the same nature and differ only in their link functions, only the A-CDM (i.e., additive) is used in this study, in addition to the DINA model (i.e., conjunctive) and the DINO model (i.e., disjunctive), which are associated with different condensation rules.

Nonetheless, each of these three reduced models has its own limitations. The G-DINA model is a generalization of the DINA model that partitions examinees into $2^{K_{j}}$ latent groups, where $K_{j}$ is the number of attributes required for item j. The G-DINA model includes the main effect of each mastered attribute and the interactions among attributes. Thus, examinees who have mastered different attributes can have different probabilities of success on each item. Moreover, the G-DINA model is a saturated model that subsumes the aforementioned reduced models. Given an attribute vector $α_{l j}$ , where $l = 1, 2, \dots, 2^{K}$ , the item response function of the G-DINA model under the identity link function for item j is given as

$$
\begin{aligned}
P(α_{l j}) ={}& δ_{j 0} + \sum_{k = 1}^{K_{j}} δ_{j k} α_{l k} + \sum_{k' = k + 1}^{K_{j}} \sum_{k = 1}^{K_{j} - 1} δ_{j k k'} α_{l k} α_{l k'} \\
& + \dots + δ_{j 12 \dots K_{j}} \prod_{k = 1}^{K_{j}} α_{l k},
\end{aligned}
\tag{1}
$$

where $δ_{j 0}$ is the intercept; $δ_{j k}$ is the main effect of $α_{k}$ ; $δ_{j k k^{'}}$ is the interaction effect of $α_{k}$ and $α_{k^{'}}$ ; and $δ_{j 12 \dots K_{j}}$ is the interaction effect of $α_{1}, \dots, α_{K_{j}}$ .
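The item response function in Equation 1 can be sketched in a few lines of Python. The dictionary layout for the δ parameters and the numeric values are assumptions made for illustration; they are not estimates from the TIMSS data.

```python
from itertools import combinations

def gdina_prob(alpha, delta):
    """P(alpha) under the identity-link G-DINA model for one item.

    alpha: tuple of 0/1 mastery indicators for the item's K_j required attributes.
    delta: dict mapping a tuple of 0-based attribute indices to its effect;
           () is the intercept, (0,) a main effect, (0, 1) a two-way interaction.
           Missing keys are treated as 0.
    """
    k_j = len(alpha)
    prob = 0.0
    for order in range(k_j + 1):
        for subset in combinations(range(k_j), order):
            # The product of alpha over the subset equals 1 only if all are mastered.
            if all(alpha[k] == 1 for k in subset):
                prob += delta.get(subset, 0.0)
    return prob

# Hypothetical parameters for a two-attribute item (illustrative values only).
delta = {(): 0.2, (0,): 0.1, (1,): 0.3, (0, 1): 0.3}
for alpha in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(alpha, round(gdina_prob(alpha, delta), 3))
```

With $K_{j} = 2$ the four latent groups each receive their own success probability, which is what makes the model saturated.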

As mentioned earlier, if all the parameters in Equation 1 are set to zero except for $δ_{j 0}$ and $δ_{j 12 \dots K_{j}}$ , the DINA model with two parameters (i.e., slip and guessing) is obtained from the G-DINA model. That is,

$$
P(α_{l j}) = δ_{j 0} + δ_{j 12 \dots K_{j}} \prod_{k = 1}^{K_{j}} α_{l k},
$$

where $g_{j} = δ_{j 0}$ and $1 - s_{j} = δ_{j 0} + δ_{j 12 \dots K_{j}}$ . When $δ_{j k} = - δ_{j k k'} = \dots = {(- 1)}^{K_{j} + 1} δ_{j 12 \dots K_{j}}$ , the DINO model with two parameters is obtained. Moreover, if all the interaction terms in the G-DINA model are dropped, the A-CDM with $K_{j} + 1$ parameters per item is obtained.
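The following sketch shows how these constraints turn the saturated model into the DINA and DINO models. The helper names and the values of g and s are hypothetical; the alternating sign pattern for the DINO model follows the constraint stated above.

```python
from itertools import combinations, product

def gdina_prob(alpha, delta):
    """Identity-link G-DINA success probability; delta is keyed by index tuples."""
    k_j = len(alpha)
    return sum(delta.get(subset, 0.0)
               for order in range(k_j + 1)
               for subset in combinations(range(k_j), order)
               if all(alpha[k] == 1 for k in subset))

def dina_delta(k_j, g, s):
    """DINA: only the intercept and the highest-order interaction are nonzero."""
    return {(): g, tuple(range(k_j)): 1.0 - s - g}

def dino_delta(k_j, g, s):
    """DINO: every effect of order m equals (-1)**(m + 1) times a common value."""
    common = 1.0 - s - g
    delta = {(): g}
    for order in range(1, k_j + 1):
        for subset in combinations(range(k_j), order):
            delta[subset] = (-1) ** (order + 1) * common
    return delta

# Hypothetical guessing and slip values for a three-attribute item.
g, s = 0.2, 0.1
for alpha in product((0, 1), repeat=3):
    p_dina = gdina_prob(alpha, dina_delta(3, g, s))
    p_dino = gdina_prob(alpha, dino_delta(3, g, s))
    print(alpha, round(p_dina, 3), round(p_dino, 3))
```

Under the DINA constraints only the group that has mastered all three attributes reaches $1 - s_{j}$; under the DINO constraints every group with at least one mastered attribute does.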

These models were compared and contrasted in a number of studies for various purposes (Chen & de la Torre, 2014; de la Torre & Lee, 2013; Liu et al., 2018; Ma, Iaconangelo, & de la Torre, 2016; Sorrel et al., 2017). Such studies demonstrated the importance of implementing model-data fit analyses. Focusing on model-data fit at the item level is crucial because using a single model for all the test items does not reflect the reality according to current empirical applications (Sorrel et al., 2017). Thus, this present study aims to carry out model-data fit analyses after verifying the correctness of attribute specifications in the Q-matrix.

Given the purpose of this study, the next section of this article first presents information about the data source and statistical procedures implemented. Second, results regarding the Q-matrix validation and model-data fit evaluation are provided in the following section, followed by summary and discussions.

Method

Data Source

TIMSS 2011 eighth-grade mathematics responses from the students of 20 countries (e.g., Australia, Bahrain, Italy, Korea, Malaysia, Romania, Turkey, and the United States) were randomly selected for this study (Table 1). The administration of Booklet 2 to students from these countries was selected for CDM analyses in this study. Booklet 2 was composed of 32 items, including 15 multiple choice and 17 constructed response items. The sample sizes ranged from 272 (England) to 743 (the United States) students who took Booklet 2.

Table 1.

Average Scale Score for the TIMSS 2011 Eighth-Grade Participants.

Country	Sample size	Overall mathematics average scale score	SE
Korea, Republic of	368	613	2.9
Singapore	422	611	3.8
Hong Kong, SAR	279	586	3.8
Israel	326	516	4.1
The United States	743	509	2.6
England	272	507	5.5
Australia	539	505	5.1
Italy	283	498	2.4
New Zealand	386	488	5.5
Ukraine	239	479	3.9
Armenia	423	467	2.7
Romania	398	458	4.0
Turkey	488	452	3.9
Lebanon	288	449	3.7
Malaysia	408	440	5.4
Bahrain	329	409	2.0
Jordan	555	406	3.7
Palestinian National Authority	562	404	3.5
Saudi Arabia	321	394	4.6
Ghana	528	331	4.3

Note. Sample sizes are reported for students who took Booklet 2.

SAR = special administrative region.

In addition to test items, CDM analyses require constructing a Q-matrix that shows the relationships between items and the attributes required to answer the items correctly. The attributes of the Q-matrix in this study were adapted from the Common Core State Standards for Mathematics (Common Core State Standards Initiative, 2010). Table 2 presents the attribute descriptions for each content domain reported by TIMSS researchers (Mullis et al., 2012); the subattributes were determined by four experts in mathematics education (Şen & Arıcan, 2015). Note that because Items M052503A and M052503B were the same in the original 32-item list, one of them (Item M052503A) was dropped from the Q-matrix (Sen & Arıcan, 2015); thus, a total of 31 items were used in this study.

Table 2.

Attributes Adopted From the Common Core State Standards Initiative (2010).

Content domain	Attribute description	Frequency
Numbers	α_N1: Possesses understanding of fraction equivalence and ordering; uses equivalent fractions as a strategy to add and subtract fractions.	5
	α_N2: Understands decimal notation for fractions, and compares decimal fractions; performs operations with decimals.	5
	α_N3: Understands ratio concepts, and uses ratio reasoning to solve problems; finds a percent of a quantity as a rate per 100.	4
Algebra	α_A1: Applies and extends previous understandings of arithmetic to algebraic expressions; solves real-life and mathematical problems using numerical and algebraic expressions and equations.	8
	α_A2: Reasons about and solves one-variable equations and inequalities; uses properties of operations to generate equivalent expressions.	4
	α_A3: Analyzes and solves linear equations and pairs of simultaneous linear equations.	1
	α_A4: Uses the four operations with whole numbers to solve problems; identifies and explains patterns in arithmetic.	3
Geometry	α_G1: Draws, constructs, and describes geometrical figures, and describes the relationships between them.	6
	α_G2: Solves real-life and mathematical problems involving angle measure, area, surface area, and volume.	5
	α_G3: Understands congruence and similarity using physical models, transparencies, or geometry software.	3
	α_G4: Recognizes perimeter, understands concepts of area, and relates area to multiplication and addition.	2
Data and Chance	α_D1: Represents and interprets data; draws informal comparative inferences about two populations.	3
	α_D2: Investigates chance processes and develops, uses, and evaluates probability models.	4

Similar to Hou’s (2013) study, because the sample sizes of the randomly selected countries were limited for G-DINA model estimation, Q-matrix validation and model-data fit evaluation were carried out separately at the attribute level and at the content domain level. At the attribute level, a separate Q-matrix was used for each content domain (Table 3); that is, each item in a content domain was coded according to whether a particular attribute of that domain is needed to answer the item correctly. At the content domain level (Table 4), each domain was specified as an attribute if that domain is required to answer the item correctly. That is, the four content domains were adopted as the attributes without disaggregating them into the 13 finer-grained attributes. The analyses were implemented in the Ox language (Doornik, 2009).

Table 3.

Q-Matrix for Each Content Domain.

Numbers				Algebra					Geometry					Data and Chance
J	α_N1	α_N2	α_N3	J	α_A1	α_A2	α_A3	α_A4	J	α_G1	α_G2	α_G3	α_G4	J	α_D1	α_D2
1	1	1	0	3	1	0	0	1	6	0	0	0	1	13	0	1
2	0	1	0	6	1	1	0	0	9	1	1	1	0	14	1	0
4	1	0	0	7	1	0	0	0	10	1	1	1	0	18	0	1
5	1	1	1	8	1	1	0	0	11	0	1	0	1	28	1	0
15	1	1	0	17	0	0	0	1	12	1	1	0	0	29	0	1
16	1	1	0	19	1	1	0	0	25	1	0	0	0	30	0	1
18	0	0	1	20	1	0	0	0	26	1	1	0	0	31	1	0
30	0	0	1	21	1	1	0	0	27	1	0	1	0
31	0	0	1	22	1	0	0	0
				23	0	0	0	1
				24	0	0	1	0

Table 4.

Aggregated Q-Matrix of the Content Domains.

J	α₁	α₂	α₃	α₄	J	α₁	α₂	α₃	α₄	J	α₁	α₂	α₃	α₄
1	1	0	0	0	11	0	0	1	0	21	0	1	0	0
2	1	0	0	0	12	0	0	1	0	22	0	1	0	0
3	0	1	0	0	13	0	0	0	1	23	0	1	0	0
4	1	0	0	0	14	0	0	0	1	24	0	1	0	0
5	1	0	0	0	15	1	0	0	0	25	0	0	1	0
6	0	1	1	0	16	1	0	0	0	26	0	0	1	0
7	0	1	0	0	17	0	1	0	0	27	0	0	1	0
8	0	1	0	0	18	1	0	0	1	28	0	0	0	1
9	0	0	1	0	19	0	1	0	0	29	0	0	0	1
10	0	0	1	0	20	0	1	0	0	30	1	0	0	1
										31	1	0	0	1

Note. Attributes α₁, α₂, α₃, and α₄ correspond to the Number, Algebra, Geometry, and Data and Chance domains, respectively; J = item.
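The relationship between Tables 3 and 4 can be sketched as follows. The rule used here, that a domain is flagged when any of its attributes is required, is an assumed reading of the text; it reproduces the Table 4 rows for the three items shown, whose attribute-level entries are copied from Table 3.

```python
# Attribute-level q-vectors for three items, copied from Table 3.
# An item is listed under a domain only if it appears in that domain's block.
ITEM_ATTRIBUTES = {
    1: {"Numbers": [1, 1, 0]},
    6: {"Algebra": [1, 1, 0, 0], "Geometry": [0, 0, 0, 1]},
    18: {"Numbers": [0, 0, 1], "Data and Chance": [0, 1]},
}
DOMAINS = ["Numbers", "Algebra", "Geometry", "Data and Chance"]

def aggregate_q_row(domain_vectors):
    """Collapse attribute-level entries to one 0/1 entry per content domain."""
    return [int(any(domain_vectors.get(domain, []))) for domain in DOMAINS]

for item, vectors in ITEM_ATTRIBUTES.items():
    print(item, aggregate_q_row(vectors))
```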

Statistical Procedures

Q-Matrix validation

This study applied the GDI (de la Torre & Chiu, 2016), denoted by $ζ_{j}^{2}$ , to empirically validate attribute specifications at the item level. This index suggests attribute specifications based on the proportion of variance accounted for (PVAF) by a q-vector relative to the maximum $ζ_{j}^{2}$ . The maximum $ζ_{j}^{2}$ is obtained when all attribute specifications are 1 (de la Torre & Chiu, 2016). Because the fully specified q-vector always attains this maximum, a predetermined cut-off value for the PVAF is needed to prevent overspecification of attributes. In this study, the cut-off value was set at 0.90 because it has been reported to provide better results (Terzi, 2017) than the 0.95 value recommended by de la Torre and Chiu (2016).

Given an attribute distribution, $ζ_{j}^{2}$ represents the weighted variance of the probabilities of correctly answering item j. The GDI of an item with a specification $q_{K' : K''}$ , assuming the first $K_{j}$ attributes are required, is computed as follows:

$$
\begin{aligned}
ζ_{j}^{2} = ζ_{K' : K''}^{2} &= \sum_{α_{K'} = 0}^{1} \dots \sum_{α_{K''} = 0}^{1} w(α_{K' : K''}) {[p(α_{K' : K''}) - \bar{p}(α_{K' : K''})]}^{2} \\
&= \sum_{α_{K'} = 0}^{1} \dots \sum_{α_{K''} = 0}^{1} w(α_{K' : K''}) p^{2}(α_{K' : K''}) - {\bar{p}}^{2}(α_{K' : K''}),
\end{aligned}
$$

where $\bar{p}(α_{K' : K''})$ is the weighted probability of success across all the possible patterns of $p(α_{K' : K''})$ , and $w(α_{K' : K''})$ is the posterior probability of examinees in class $α_{1} \dots α_{K''}$ (de la Torre & Chiu, 2016).
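A rough sketch of the GDI and PVAF computation is given below. It assumes that latent classes agreeing on the attributes flagged in a candidate q-vector are pooled by their weights, and that the simplest q-vector reaching the cut-off is chosen; both are simplifying assumptions, and the weights and probabilities are hypothetical rather than TIMSS estimates.

```python
from itertools import product

def gdi(q_vector, weights, probs):
    """Weighted variance of success probabilities for a candidate q-vector.

    weights, probs: dicts keyed by full attribute patterns (tuples of 0/1) giving
    the class weight w(alpha) and success probability p(alpha).
    """
    pooled_w, pooled_wp = {}, {}
    for alpha, w in weights.items():
        # Classes that agree on the flagged attributes are pooled together.
        key = tuple(a for a, q in zip(alpha, q_vector) if q == 1)
        pooled_w[key] = pooled_w.get(key, 0.0) + w
        pooled_wp[key] = pooled_wp.get(key, 0.0) + w * probs[alpha]
    p_bar = sum(pooled_wp.values())
    return sum(w * (pooled_wp[key] / w - p_bar) ** 2
               for key, w in pooled_w.items() if w > 0)

def pvaf(q_vector, weights, probs):
    """GDI of the candidate relative to the GDI of the all-ones q-vector."""
    full = (1,) * len(q_vector)
    return gdi(q_vector, weights, probs) / gdi(full, weights, probs)

def select_q_vector(weights, probs, cutoff=0.90):
    """Return the q-vector with the fewest attributes whose PVAF reaches the cutoff."""
    k = len(next(iter(weights)))
    candidates = [q for q in product((0, 1), repeat=k) if any(q)]
    candidates.sort(key=lambda q: (sum(q), -pvaf(q, weights, probs)))
    for q in candidates:
        if pvaf(q, weights, probs) >= cutoff:
            return q

# Hypothetical item whose success probability depends on the first attribute only.
weights = {alpha: 0.25 for alpha in product((0, 1), repeat=2)}
probs = {(0, 0): 0.2, (0, 1): 0.2, (1, 0): 0.8, (1, 1): 0.8}
print(select_q_vector(weights, probs))
```

In this toy case the second attribute adds no variance, so the single-attribute q-vector already accounts for all of it.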

Model fit evaluation

The Wald test was first introduced by de la Torre (2011) to examine whether the G-DINA model can be replaced by one of the reduced models. The test was further applied by de la Torre and Lee (2013) to identify the most appropriate CDM at the item level; the same approach was applied in this study to the TIMSS 2011 large-scale data set. As stated earlier, each reduced CDM can be obtained from the saturated model using a different restriction matrix based on the model specification. Note that only items requiring multiple attributes were analyzed because there is no need to distinguish between the reduced and saturated CDMs for one-attribute items (de la Torre & Lee, 2013).

Given the requirement $K_{j} > 1$ , this section describes how the Wald test can be implemented for model fit evaluation. The Wald test requires a restriction matrix R of dimension $(2^{K_{j}} - p) \times 2^{K_{j}}$ , where p is the number of parameters of the reduced model. As shown by de la Torre and Lee (2013), the restriction matrices for the DINA model ( $p = 2$ ), the DINO model ( $p = 2$ ), and the A-CDM ( $p = 4$ ) when $K_{j} = 3$ are given below, respectively.

$$
R_{6 \times 8}^{(1)} =
\begin{pmatrix}
1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & -1 & 0
\end{pmatrix},
$$

$$
R_{6 \times 8}^{(2)} =
\begin{pmatrix}
0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & -1
\end{pmatrix},
$$

$$
R_{4 \times 8}^{(3)} =
\begin{pmatrix}
1 & -1 & -1 & 0 & 1 & 0 & 0 & 0 \\
1 & -1 & 0 & -1 & 0 & 1 & 0 & 0 \\
1 & 0 & -1 & -1 & 0 & 0 & 1 & 0 \\
-1 & 1 & 1 & 1 & -1 & -1 & -1 & 1
\end{pmatrix}.
$$

The restriction matrices $R_{6 \times 8}^{(1)}$ , $R_{6 \times 8}^{(2)}$ , and $R_{4 \times 8}^{(3)}$ impose different constraints on the G-DINA model, such that all the constraints defining a reduced model can be tested simultaneously within a single test. The sets of constraints based on the restriction matrices are as follows:

$$
\begin{aligned}
& δ_{1} = δ_{2} = δ_{3} = δ_{12} = δ_{13} = δ_{23} = 0, \\
& δ_{1} = δ_{2} = δ_{3} = - δ_{12} = - δ_{13} = - δ_{23} = δ_{123}, \text{ and} \\
& δ_{12} = δ_{13} = δ_{23} = δ_{123} = 0,
\end{aligned}
$$

for the DINA model, the DINO model, and the A-CDM, respectively (de la Torre & Lee, 2013). How these three reduced models are obtained from the G-DINA model is explained in the “Saturated and Reduced CDMs” section. The Wald statistic is computed as follows:

$$
W = {[R \times P_{j}]}' {[R \times \operatorname{Var}(P_{j}) \times R']}^{- 1} [R \times P_{j}],
$$

where $P_{j}$ is the vector of marginal maximum likelihood estimates $\hat{P} (α_{l j}^{*}) = \frac{R_{α_{l j}^{*}}}{I_{α_{l j}^{*}}}$ obtained with the expectation-maximization (EM; Bock & Aitkin, 1981) algorithm, and $\operatorname{Var}(P_{j})$ is the inverse of the information matrix. In $\hat{P} (α_{l j}^{*})$ , $R_{α_{l j}^{*}}$ is the expected number of examinees in group $α_{l j}^{*}$ who gave a correct response to item j, and $I_{α_{l j}^{*}}$ is the expected number of examinees in the latent group $α_{l j}^{*}$ .
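The Wald statistic can be sketched with plain Python lists. The restriction matrix is the DINA matrix $R_{6 \times 8}^{(1)}$ given above; the probability vectors and the diagonal covariance matrix are hypothetical stand-ins for the estimates and the inverse information matrix, which in practice come from the fitted G-DINA model.

```python
def mat_vec(a, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def mat_mul(a, b):
    """Matrix-matrix product for list-of-rows matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def wald_statistic(R, P, V):
    """W = (R P)' (R V R')^{-1} (R P)."""
    rp = mat_vec(R, P)
    r_transposed = [list(col) for col in zip(*R)]
    rvr = mat_mul(mat_mul(R, V), r_transposed)
    return sum(x * y for x, y in zip(rp, solve(rvr, rp)))

# DINA restriction matrix for K_j = 3: rows 1-6 equate the first seven probabilities.
R_DINA = [[1 if c == r else -1 if c == r + 1 else 0 for c in range(8)]
          for r in range(6)]
# Hypothetical covariance matrix (diagonal, illustrative only).
V = [[0.01 if i == j else 0.0 for j in range(8)] for i in range(8)]

dina_like = [0.2] * 7 + [0.9]          # satisfies the DINA constraints exactly
additive_like = [0.2, 0.4, 0.4, 0.4, 0.6, 0.6, 0.6, 0.8]
print(wald_statistic(R_DINA, dina_like, V))
print(wald_statistic(R_DINA, additive_like, V))
```

A probability vector that already satisfies the constraints yields a statistic of zero, whereas departures from the constraints inflate it.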

Results

The first analysis was carried out for Q-matrix validation under the G-DINA model. After the attribute specifications given in the Q-matrix had been validated, the second analysis evaluated model-data fit at the item level. Results are reported separately for each content domain and for the aggregated content domains. The important contribution of this study is to propose a two-step evaluation for each content domain and for the aggregated content domains. The reason for following this sequence is to base the model-data fit evaluation on statistically validated attribute specifications. In this way, unintended consequences of any misspecified attribute, if present, can be eliminated from the model-data fit evaluation. Moreover, results for both purposes were validated across the 20 countries.

Q-Matrix Validation

The validation of attribute specifications is displayed in Tables 5 and 6 for the attributes at the attribute level and content domain level, respectively. Those results were obtained using the GDI based on the G-DINA model. According to the validation of results across the 20 countries that were randomly selected, each attribute was specified as 1 if attribute specification was suggested by more than 50% of the countries on average; otherwise, it was specified as 0.
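The cross-country rule described above can be sketched as a simple majority vote per Q-matrix entry. The function name and the per-country suggestions below are hypothetical.

```python
def majority_specification(suggestions, threshold=0.5):
    """Set an entry to 1 when more than `threshold` of the countries suggest it.

    suggestions: list of q-vectors (lists of 0/1), one per country, for one item.
    """
    n = len(suggestions)
    return [int(sum(column) / n > threshold) for column in zip(*suggestions)]

# Hypothetical suggestions from five countries for a two-attribute item.
by_country = [[1, 1], [1, 0], [1, 0], [1, 1], [1, 0]]
print(majority_specification(by_country))
```

Because the rule requires strictly more than 50%, an even split leaves the entry at 0.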

Table 5.

Suggested Q-Matrix for Each Content Domain.

Numbers				Algebra					Geometry					Data and Chance
J	α_N1	α_N2	α_N3	J	α_A1	α_A2	α_A3	α_A4	J	α_G1	α_G2	α_G3	α_G4	J	α_D1	α_D2
1	1	1	0	3	1	0	0	1	6	0	0	0	1	13	0	1
2	0	1	0	6	1	1	0	0	9	1	1	1	0	14	1	0
4	1	0	0	7	1	0	0	0	10	1	1	1	0	18	0	1
5	1	1	1	8	1	1	0	0	11	0	1	0	1	28	1	0
15	1	1	0	17	0	0	0	1	12	1	1	0	0	29	0	1
16	1	1	0	19	1	0*	0	0	25	1	0	0	0	30	0	1
18	0	0	1	20	1	0	0	0	26	1	1	0	0	31	1	0
30	0	0	1	21	1	1	0	0	27	1	0	1	0
31	0	0	1	22	1	0	0	0
				23	0	0	0	1
				24	0	0	1	0

Note. α_N1, α_N2, and α_N3 are the attributes of the Number domain; α_A1, α_A2, α_A3, and α_A4 are the attributes of the Algebra domain; α_G1, α_G2, α_G3, and α_G4 are the attributes of the Geometry domain; α_D1 and α_D2 are the attributes of the Data and Chance domain; attribute with * indicates a suggested attribute specification different from the original attribute; J = item.

Table 6.

Suggested Q-Matrix for the Aggregated Q-Matrix of the Content Domains.

J	α₁	α₂	α₃	α₄	J	α₁	α₂	α₃	α₄	J	α₁	α₂	α₃	α₄
1	1	0	0	0	11	0	0	1	0	21	0	1	0	0
2	1	0	0	0	12	0	0	1	0	22	0	1	0	0
3	0	1	0	0	13	0	0	0	1	23	0	1	0	0
4	1	0	0	0	14	0	0	0	1	24	0	1	0	0
5	1	0	0	0	15	1	0	0	0	25	0	0	1	0
6	0	1	1	0	16	1	0	0	0	26	0	0	1	0
7	0	1	0	0	17	0	1	0	0	27	0	0	1	1*
8	0	1	0	0	18	1	0	0	1	28	0	0	0	1
9	0	0	1	0	19	0	1	0	0	29	0	0	0	1
10	0	0	1	0	20	0	1	0	0	30	0*	0	0	1
										31	1	0	0	1

Note. Attributes α₁, α₂, α₃, and α₄ correspond to the Number, Algebra, Geometry, and Data and Chance domains; attribute with * indicates a suggested attribute specification different from the original attribute; J = item.

Given separate results for each content domain, all attribute specifications were deemed correct with one exception: the specification of α_A2 for Item 19 in the Algebra domain was suggested to be 0. (Item 19 asks which of the four options is equal to $3 p^{2} + 2 p + 2 p^{2}$ .) Based on communication with the domain expert who designed the Q-matrix, α_A2 (i.e., reasons about and solves one-variable equations and inequalities) was originally included because this item is a one-variable equation. However, α_A1 (i.e., solves real-life and mathematical problems using numerical and algebraic expressions and equations) defines a more general attribute that encompasses α_A2. Thus, eliminating α_A2 for Item 19 based on the GDI is in line with the expert’s opinion.

At the content domain level, the Q-matrix validation shown in Table 6 changed two attribute specifications. In Item 27 (i.e., which of the options shows the result of a half-turn clockwise around point 0?), α₄ was changed to 1, meaning that examinees would also have to master the Data and Chance domain (α₄) to answer the item correctly. This item asks for the new shape formed by a geometric rotation. The domain expert suggested that, although the question is still relevant to a prediction about an outcome, this geometry-related question cannot be solved with knowledge of the Data and Chance domain (α₄). The other suggestion concerned Item 30, which asks, “Over recent weeks, a shop’s average sales of bottles of soda have been 50% in the regular size, 40% in the small size, and 10% in the large size. Next week, the shopkeeper will order 1,200 bottles of soda. How many of these bottles should he order in the regular size?” The validation suggested that mastering the Number domain (i.e., α₁) is not compulsory for answering the item correctly. Therefore, the attribute specification of α₁ in Item 30 was changed to 0, so that Item 30 requires only the Data and Chance domain (α₄). According to the domain experts, the Numbers domain had originally been included because knowing the ratios of the bottles was considered important. However, because the question concerns the 50% share, the answer can be found simply by halving 1,200 bottles (i.e., 600) without setting up a proportion. Students can generally find half of a quantity without relating it to a proportion. If a different rate had been asked about, for example, 40%, it would have had to be written as 40/100. For this type of 50% question, the Data and Chance domain (α₄) is sufficient to answer the item correctly.

Model Fit Evaluation

The Wald statistic, W, is assumed to be asymptotically $χ^{2}$ -distributed with $2^{K_{j}} - p$ degrees of freedom under the null hypothesis that $R \times P_{j} = 0$ . If the null hypothesis is retained, the reduced model fits the data; otherwise, the G-DINA model is preferred. Results were obtained by setting the significance level at 0.05. As a result, findings based on the suggested reduced and saturated CDMs can be interpreted properly.
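The decision rule can be sketched as below. The chi-square tail probability is computed with the standard closed-form series for integer degrees of freedom so that no external library is needed; the function names and the example values of W are illustrative.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^i / i!.
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: normal tail plus a finite series weighted by the normal density.
    chi = math.sqrt(x)
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * density * total

def wald_decision(w, k_j, p, alpha=0.05):
    """Return (p_value, retained), with df = 2**k_j - p as stated in the text."""
    p_value = chi2_sf(w, 2 ** k_j - p)
    return p_value, p_value >= alpha

# Hypothetical Wald statistics for a DINA test on a three-attribute item (df = 6).
print(wald_decision(4.0, k_j=3, p=2))
print(wald_decision(20.0, k_j=3, p=2))
```

When `retained` is true the reduced model is kept for that item; otherwise the G-DINA model is preferred.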

Given the suggested attribute specifications, the next focus was on model-data fit evaluation. The Wald test was applied at the item level, where a reduced model fits the data if the null hypothesis is retained. Tables 7 and 8 show the model-data fit evaluation for each content domain and for the aggregated content domains, respectively. If a reduced model was retained in more than 50% of the 20 countries, that reduced model was selected; otherwise, the G-DINA model was selected. As noted earlier, only items requiring multiple attributes were included in the model-data fit evaluation.

Table 7.

Suggested Models for Each Content Domain.

Numbers				Algebra				Geometry
J	DINA	DINO	A-CDM	J	DINA	DINO	A-CDM	J	DINA	DINO	A-CDM
1	1	0	0	3	1	1	0	9	0	0	0
5	1	0	0	6	0	1	0	10	0	0	0
15	0	0	1	8	0	1	0	11	0	0	0
16	1	0	0	21	0	0	0	12	0	0	0
								26	0	0	0
								27	1	0	0

Note. α_N1, α_N2, and α_N3 are the attributes of the Number domain; α_A1, α_A2, α_A3, and α_A4 are the attributes of the Algebra domain; α_G1, α_G2, α_G3, and α_G4 are the attributes of the Geometry domain; α_D1 and α_D2 are the attributes of the Data and Chance domain; J = item; DINA = deterministic inputs, noisy “and” gate; DINO = deterministic input, noisy “or” gate; A-CDM = additive-cognitive diagnosis model.

Table 8.

Suggested Models for the Aggregated Content Domains.

J	DINA	DINO	A-CDM
6	1	1	1
18	0	0	1
27	1	1	1
31	0	0	1

Note. J = item; DINA = deterministic inputs, noisy “and” gate; DINO = deterministic input, noisy “or” gate; A-CDM = additive-cognitive diagnosis model.

For the content domain of Numbers, more than 10 of the 20 countries (i.e., more than 50%) showed that the underlying model for Items 1, 5, and 16 is the DINA model. The interpretation of Items 1 and 16 is that examinees can give a correct answer if they have mastered both α_N1 and α_N2. Similarly, the interpretation for Item 5 is that examinees can give a correct answer if they have mastered all three attributes (i.e., α_N1, α_N2, and α_N3). Nonmastery of one or more of these required attributes results in failure to answer the item correctly. Another implication of fitting the DINA model is that these attributes are also independent of each other, meaning that they have no interaction. Moreover, the A-CDM fits Item 15. The interpretation is that the probability of correctly answering the item increases with the mastery of each of α_N1 and α_N2. That is, the effects of mastering the required attributes are additive without any interaction effects, meaning that mastery of one required attribute on top of the other increases the chance of answering the item correctly.
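
The difference between these two interpretations can be made concrete with a minimal sketch of the success probabilities for an item requiring two attributes. The guessing, slip, intercept, and main-effect values below are hypothetical and are not estimates from the TIMSS data.

```python
from itertools import product


def p_dina(alpha, guess, slip):
    """DINA: success probability is 1 - slip only when every required
    attribute is mastered; otherwise it stays at the guessing level."""
    return 1 - slip if all(alpha) else guess


def p_acdm(alpha, intercept, main_effects):
    """A-CDM: each mastered attribute adds its own main effect."""
    return intercept + sum(d * a for d, a in zip(main_effects, alpha))


# Hypothetical parameters for a two-attribute item.
for alpha in product([0, 1], repeat=2):
    dina = p_dina(alpha, guess=0.2, slip=0.1)
    acdm = p_acdm(alpha, intercept=0.1, main_effects=[0.3, 0.4])
    print(alpha, dina, round(acdm, 2))
```

Under the DINA model only the pattern (1, 1) differs from the guessing level, whereas under the A-CDM every additional mastered attribute raises the probability.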

For the content domain of Algebra, the DINO model fits Items 6 and 8. The interpretation of Items 6 and 8 is that examinees can give a correct answer if they have mastered at least one of α_A1 and α_A2. However, both the DINA and DINO models fit Item 3, which makes this item difficult to interpret. Nonetheless, if the cut-off is set at 85% of the 20 countries, then the underlying model becomes the DINA model. Furthermore, as none of the reduced models fits Item 21, the G-DINA model was selected. The implication of fitting the G-DINA model is that α_A1 and α_A2 are dependent on each other, meaning that the probability of correctly answering the item increases with the mastery of each attribute and, in addition, with the interaction effect of α_A1 and α_A2.
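
A minimal sketch of the disjunctive and saturated cases for a two-attribute item is given below. All parameter values and function names are hypothetical and serve only to show how the DINO rule differs from a G-DINA model with an interaction term.

```python
def p_dino(alpha, guess, slip):
    """DINO: mastering at least one required attribute is enough."""
    return 1 - slip if any(alpha) else guess


def p_gdina2(alpha, d0, d1, d2, d12):
    """Saturated G-DINA (identity link) for two required attributes:
    intercept, two main effects, and one interaction effect."""
    a1, a2 = alpha
    return d0 + d1 * a1 + d2 * a2 + d12 * a1 * a2


patterns = [(0, 0), (1, 0), (0, 1), (1, 1)]
for alpha in patterns:
    dino = p_dino(alpha, guess=0.2, slip=0.1)
    gdina = p_gdina2(alpha, d0=0.1, d1=0.2, d2=0.3, d12=0.3)
    print(alpha, dino, round(gdina, 2))
```

Setting the interaction effect `d12` to zero reduces the saturated form to the additive case, which is the kind of restriction the Wald test evaluates.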

When the content domain of Geometry was investigated, none of the reduced models fit Items 9, 10, 11, 12, and 26. The implication of fitting the G-DINA model is that the probability of correctly answering Items 9 and 10 increases with the mastery of each of α_G1, α_G2, and α_G3, in addition to the interaction effects of α_G1 and α_G2; α_G1 and α_G3; α_G2 and α_G3; and α_G1, α_G2, and α_G3. Given the interpretations for Items 9 and 10, similar inferences can be made about fitting the G-DINA model to Items 11, 12, and 26. For Item 27, the DINA model is retained as the underlying model, meaning that examinees can give a correct answer if they have mastered both α_G1 and α_G3. In other words, nonmastery of one or both of these required attributes leads to failure to answer the item correctly.
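
For reference, the saturated model for an item requiring three attributes, such as Items 9 and 10, can be written out in the identity-link form of the G-DINA model. The δ notation below is the conventional one and is assumed here, since the parameters are not listed in this section.

```latex
\[
\begin{aligned}
P(X_{j} = 1 \mid \alpha_{G1}, \alpha_{G2}, \alpha_{G3})
  ={}& \delta_{j0}
   + \delta_{j1}\alpha_{G1} + \delta_{j2}\alpha_{G2} + \delta_{j3}\alpha_{G3} \\
  &+ \delta_{j12}\alpha_{G1}\alpha_{G2}
   + \delta_{j13}\alpha_{G1}\alpha_{G3}
   + \delta_{j23}\alpha_{G2}\alpha_{G3} \\
  &+ \delta_{j123}\alpha_{G1}\alpha_{G2}\alpha_{G3}
\end{aligned}
\]
```

The expression has $2^{3} = 8$ parameters: one intercept, three main effects, three two-way interactions, and one three-way interaction, matching the effects listed in the paragraph above.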

When the content domains were treated as attributes in the model-data fit evaluation (Table 8), the A-CDM fits Items 18 and 31. That is, the effects of mastering the required attributes (α₁ and α₄) are additive without any interaction effects, meaning that mastery of one required attribute on top of the other increases the chance of answering the item correctly. For Items 6 and 27, such an interpretation is difficult because all three reduced models fit these items. In contrast, when each content domain was analyzed separately for the model-data fit evaluation, the DINO model fits Item 6 (Algebra) and the DINA model fits Item 27 (Geometry). Hence, the models for Items 6 and 27 should be interpreted with caution.

Summary and Discussion

One aspect of validity evidence is that an assessment tool should be useful (Kane, 2013). However, the utility of some applications, such as the model-data fit evaluation or retrofitting, can be improved only under the assumption that reliable and accurate results are obtained (Liu et al., 2018). Even though model-data fit evaluation and retrofitting are not the ultimate solution, CDMs can still be applied to large-scale assessments in conjunction with an appropriate Q-matrix to obtain diagnostic information (Chen & de la Torre, 2014). Indeed, it is worthwhile using these assessments to draw finer-grained inferences about the mastery and nonmastery of specific attributes at the country level because of the large investments involved in developing the assessments (Chen & de la Torre, 2014). However, attribute specifications in the Q-matrix constructed by domain experts are usually considered subjective in nature, which can lead to misclassification of examinees as a result of inaccurate parameter estimates (de la Torre & Chiu, 2016). Thus, employing such large-scale assessments requires additional precaution.

This study used the eighth-grade mathematics section of the TIMSS 2011 assessment to analyze a nondiagnostic assessment for diagnostic purposes. Two separate analyses were carried out, one at the attribute level and one at the content domain level. The analyses were separated because a large number of attributes could cause problems for the G-DINA model parameter estimates (Hou, 2013). First, attribute specifications in the Q-matrix were validated under the G-DINA model using the GDI index, which does not assume any reduced model. After the correctness of the attribute specifications had been verified in both cases, the model-data fit evaluation was carried out. The Wald test, which evaluates the fit of a saturated CDM against the three reduced models (DINA, DINO, and A-CDM), was applied at the item level (de la Torre & Lee, 2013). Comparing the fit of these distinct models is necessary because each item may involve a different underlying latent process. Furthermore, results from these two analyses were validated across the 20 countries that were selected randomly.

Findings in this study suggest that each item should be analyzed separately for both Q-matrix validation and model-data fit evaluation, with the results validated across countries. Instead of assuming that the attribute specifications in the Q-matrix are correct and that a single model holds for all test items, researchers should start the analyses from the notion that the correctness of attribute specifications needs to be verified and that the underlying latent process is unknown. After the Q-matrix validation, the fitting model for each item should be identified. Otherwise, as the results show, assuming a single model for all items without validating the attribute specifications in the Q-matrix can cause serious validity problems as a result of inaccurate parameter estimates and attribute classifications. In general, according to the results, some attribute specifications were changed, and the items differed in whether a reduced or the general model was suggested. Because of the interactions among the attributes explained by the G-DINA model for specific items, caution should be taken while interpreting these items. Finally, for those items for which multiple fitting models were suggested, it would be safer to follow the interpretation of a more general model, the G-DINA model.

This study has some limitations. First, the TIMSS 2011 mathematics questions were not designed for diagnostic purposes. However, because of the large investment in such a large data set, this article was intended to obtain diagnostic information from this nondiagnostic assessment. Moreover, because the sample sizes, test lengths, and number of attributes are fixed for the TIMSS 2011 data, we were unable to investigate the results under various conditions, in particular for the G-DINA model parameter estimates. Therefore, inferences should be made carefully when retrofitting CDMs to responses from large-scale assessments.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Ragip Terzi

Author Biographies

Ragip Terzi is a faculty member in the Department of Educational Measurement and Evaluation in the School of Education, Harran University, Turkey. His primary research interests include item response theory and cognitive diagnosis models. His recent publications have appeared in Journal of Educational Measurement, International Journal of Assessment Tools in Education, Journal on Mathematics Education, and Journal of Measurement and Evaluation in Education and Psychology.

Sedat Sen is an assistant professor at Harran University, Sanliurfa, Turkey. His research interests focus on quantitative methods, applied statistics, and psychometrics. His recent publications have appeared in Applied Psychological Measurement, International Journal of Testing, and Journal of Measurement and Evaluation in Education and Psychology.

References

Akbay, Terzi, Kaplan, & Karaaslan, K. G. (2018). Expert-based attribute identification and validation: An application of cognitively diagnostic assessment. Journal on Mathematics Education, 9, 103-120.

Birenbaum, Tatsuoka, & Yamada (2004). Diagnostic assessment in TIMSS-R: Between-countries and within-country comparisons of eighth graders’ mathematics performance. Studies in Educational Evaluation, 30, 151-173.

Bock, R. D., & Aitkin (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459.

Chen, J. S., & de la Torre (2014). A procedure for diagnostically modeling extant large-scale assessment data: The case of the Programme for International Student Assessment in Reading. Psychology, 5, 1967-1978.

Chen, J. S., Torre, & Zhang (2013). Relative and absolute fit evaluation in cognitive diagnosis modeling. Journal of Educational Measurement, 50, 123-140.

Cheng (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74, 619-632.

Chiu, C.-Y. (2013). Statistical refinement of the Q-matrix in cognitive diagnosis. Applied Psychological Measurement, 37, 598-618.

Choi, M. K., Lee, Y.-S., & Park, Y. S. (2015). What CDM can tell about what students have learned: An analysis of TIMSS eighth grade mathematics. Eurasia Journal of Mathematics, Science & Technology Education, 11, 1563-1577.

Common Core State Standards Initiative. (2010). Common core state standards for mathematics. Washington, DC: National Governors Association Center for Best Practices and the Council of Chief State School Officers.

Cui, & Leighton, J. P. (2009). The hierarchy consistency index: Evaluating person fit for cognitive diagnostic assessment. Journal of Educational Measurement, 46, 429-449.

DeCarlo, L. T. (2011). On the analysis of fraction subtraction data: The DINA model, classification, latent class sizes, and the Q-matrix. Applied Psychological Measurement, 35, 8-26.

de la Torre (2008). An empirically based method of Q-matrix validation for the DINA model: Development and applications. Journal of Educational Measurement, 45, 343-362.

de la Torre (2011). The generalized DINA model framework. Psychometrika, 76, 179-199.

de la Torre, & Chiu, C.-Y. (2016). A general method of empirical Q-matrix validation. Psychometrika, 81, 253-273.

de la Torre, & Douglas, J. A. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69, 333-353.

de la Torre, & Lee, Y. S. (2013). Evaluating the Wald test for item-level comparison of saturated and reduced models in cognitive diagnosis. Journal of Educational Measurement, 50, 355-373.

Dogan, & Tatsuoka (2008). An international comparison using a diagnostic testing model: Turkish students’ profile of mathematical skills on TIMSS-R. Educational Studies in Mathematics, 68, 263-272.

Doornik, J. A. (2009). An object-oriented matrix programming language Ox 6 [Computer software]. London, England: Timberlake Consultants.

Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26, 301-323.

Hartz, & Roussos (2008). The fusion model for skills diagnosis: Blending theory with practicality (Report No. RR-08-71). Princeton, NJ: Educational Testing Service.

Henson, R. A., Templin, J. L., & Willse, J. T. (2009). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika, 74, 191-210.

Hou (2013). Differential item functioning assessment in cognitive diagnostic modeling: Applying the Wald test to investigate DIF in the generalized DINA model framework (Unpublished doctoral dissertation). University of Delaware, Newark.

Hsu, C. L., Wang, W. C., & Chen, S. Y. (2013). Variable-length computerized adaptive testing based on cognitive diagnosis models. Applied Psychological Measurement, 37(7), 563-582.

Park, H. J. (2010). A comparison of US and Korean students’ mathematics skills using a cognitive diagnostic testing method: Linkage to instruction. Educational Research and Evaluation, 16, 287-301.

Junker, B. W., & Sijtsma (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25, 258-272.

Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1-73.

Kaplan, Torre, & Barrada, J. R. (2014). New item selection methods for cognitive diagnosis computerized adaptive testing. Applied Psychological Measurement, 39, 167-188.

Lee, Y.-S., Johnson, Park, J. Y., Sachdeva, Zhang, & Waldman (2013, April). A multidimensional scaling (MDS) approach for investigating students’ cognitive weakness and strength on the TIMSS 2007 mathematics assessment. Paper presented at the 2013 Annual Meeting of the American Educational Research Association Conference in San Francisco, CA.

Lee, Y.-S., Park, Y. S., & Taylan (2011). A cognitive diagnostic modeling of attribute mastery in Massachusetts, Minnesota, and the U.S. national sample using the TIMSS 2007. International Journal of Testing, 11, 144-177.

Liu, & Ying (2012). Data-driven learning of Q-matrix. Applied Psychological Measurement, 36, 548-564.

Liu, Huggins-Manley, A. C., & Bulut (2018). Retrofitting diagnostic classification models to responses from IRT-based assessment forms. Educational and Psychological Measurement, 78, 357-383.

Liu, Douglas, J. A., & Henson, R. A. (2009). Testing person fit in cognitive diagnosis. Applied Psychological Measurement, 33, 579-598.

Iaconangelo, & Torre (2016). Model similarity, model selection, and attribute classification. Applied Psychological Measurement, 40, 200-217.

Mullis, I. V., Martin, M. O., Foy, & Arora (2012). TIMSS 2011 international results in mathematics. Amsterdam, The Netherlands: International Association for the Evaluation of Educational Achievement.

Rupp, A. A., Templin, J. L., & Henson, R. A. (2010). Diagnostic assessment: Theory, methods, and applications. New York, NY: Guilford Press.

Şen, & Arıcan (2015). A diagnostic comparison of Turkish and Korean students’ mathematics performances on the TIMSS 2011 assessment. Journal of Measurement and Evaluation in Education and Psychology, 6, 238-253.

Sen, & Bradshaw (2017). Comparison of relative fit indices for diagnostic model selection. Applied Psychological Measurement, 41, 422-438.

Sorrel, M. A., Torre, Abad, F. J., & Olea (2017). Two-step likelihood ratio test for item-level model comparison in cognitive diagnosis models. Methodology, 13, 39-47.

Tatsuoka, K. K. (1983). Rule-space: An approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20, 345-354.

Templin, & Henson (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11, 287-305.

Terzi (2017). New Q-matrix validation procedures (Unpublished doctoral dissertation). Rutgers, The State University of New Jersey, New Brunswick.

Terzi, & de la Torre (2018). An iterative method for empirically-based Q-matrix validation. International Journal of Assessment Tools in Education, 5, 248-262.

Tjoe, & de la Torre (2014). The identification and validation process of proportional reasoning attributes: An application of a cognitive diagnosis modeling framework. Mathematics Education Research Journal, 26, 237-255.

Toker, & Green (2012, April). An application of cognitive diagnostic assessment on TIMMS-2007 8th Grade Mathematics items. Paper presented at the annual meeting of the American Educational Research Association, Vancouver, British Columbia, Canada.

von Davier (2005). A general diagnostic model applied to language testing data (Report No. RR-05-16). Princeton, NJ: Educational Testing Service.

J	α₁	α₂	α₃	α₄	J	α₁	α₂	α₃	α₄	J	α₁	α₂	α₃	α₄
1	1	0	0	0	11	0	0	1	0	21	0	1	0	0
2	1	0	0	0	12	0	0	1	0	22	0	1	0	0
3	0	1	0	0	13	0	0	0	1	23	0	1	0	0
4	1	0	0	0	14	0	0	0	1	24	0	1	0	0
5	1	0	0	0	15	1	0	0	0	25	0	0	1	0
6	0	1	1	0	16	1	0	0	0	26	0	0	1	0
7	0	1	0	0	17	0	1	0	0	27	0	0	1	0
8	0	1	0	0	18	1	0	0	1	28	0	0	0	1
9	0	0	1	0	19	0	1	0	0	29	0	0	0	1
10	0	0	1	0	20	0	1	0	0	30	1	0	0	1
										31	1	0	0	1

J	α₁	α₂	α₃	α₄	J	α₁	α₂	α₃	α₄	J	α₁	α₂	α₃	α₄
1	1	0	0	0	11	0	0	1	0	21	0	1	0	0
2	1	0	0	0	12	0	0	1	0	22	0	1	0	0
3	0	1	0	0	13	0	0	0	1	23	0	1	0	0
4	1	0	0	0	14	0	0	0	1	24	0	1	0	0
5	1	0	0	0	15	1	0	0	0	25	0	0	1	0
6	0	1	1	0	16	1	0	0	0	26	0	0	1	0
7	0	1	0	0	17	0	1	0	0	27	0	0	1	1*
8	0	1	0	0	18	1	0	0	1	28	0	0	0	1
9	0	0	1	0	19	0	1	0	0	29	0	0	0	1
10	0	0	1	0	20	0	1	0	0	30	0*	0	0	1
										31	1	0	0	1

A Nondiagnostic Assessment for Diagnostic Purposes: Q-Matrix Validation and Item-Based Model Fit Evaluation for the TIMSS 2011 Assessment
