Abstract
The purpose of this study is to investigate the relationship between the Shannon entropy procedure and the Jensen–Shannon divergence (JSD), both of which are used as item selection criteria in cognitive diagnostic computerized adaptive testing (CD-CAT). Because the JSD itself is defined in terms of the Shannon entropy, we apply the well-known relationship between the JSD and the Shannon entropy to establish a relationship between the item selection criteria based on these two measures. To better understand the relationship between the two criteria, an alternative derivation is also provided. Theoretical derivations and empirical examples show that the Shannon entropy procedure and the JSD in CD-CAT are linearly related under cognitive diagnostic models. Consistent with our theoretical conclusions, simulation results show that the two item selection criteria behaved quite similarly in terms of attribute-level and pattern recovery rates under all conditions, and that they selected the same set of items for each examinee from an item bank with item parameters drawn from a uniform distribution.
Keywords
Summative assessments are typically used for grading and accountability purposes, whereas formative assessments are often used to support student learning (Laveault & Allal, 2016). Researchers and practitioners began to focus on formative assessments for student learning, rather than solely on summative assessments, because a large body of evidence showed that formative assessments produce significant and often substantial learning gains and improve student confidence and achievement (Black & Wiliam, 1998; Laveault & Allal, 2016). Cognitive diagnosis assessment (CDA) can be regarded as a kind of formative assessment because it is intended to promote assessment for learning, modifying instruction and learning in classrooms by providing formative diagnostic information about students’ cognitive strengths and weaknesses (Jang, 2008; Leighton & Gierl, 2007). CDA has received increasing attention in recent years (Leighton & Gierl, 2007; Rupp et al., 2010; K. K. Tatsuoka, 2009), especially since the No Child Left Behind Act of 2001 mandated the selection and use of diagnostic assessments to improve teaching and learning, and the new federal grant program known as “Race to the Top” (RTTT) ushered in a new era of K–12 assessments that emphasize both accountability and instructional improvement (Chang, 2012).
Computerized adaptive testing (CAT) has become a popular mode for many summative and formative assessments (Quellmalz & Pellegrino, 2009). As a method of administering test items, CAT tailors item difficulty to the ability level of the individual examinee (Chang & Ying, 2007). It is attractive to practitioners because it yields high measurement precision with a short test. Within the CAT framework, cognitive diagnostic computerized adaptive testing (CD-CAT) is also a popular mode of online testing for cognitive diagnosis, as it can help one make informed decisions about the next steps in instruction for each student, greatly facilitate individualized learning (Chang, 2015), and provide many benefits in support of formative assessments (Gierl & Lai, 2018). Notably, the U.S. National Education Technology Plan 2017, titled “Reimagining the Role of Technology in Education” (U.S. Department of Education, 2017), emphasizes that technology can help us redefine assessment to meet the needs of the learner in a variety of ways. In technology-based formative assessments or CAT, test items are adapted to the learner’s ability and knowledge during the testing process. Thus, CAT can provide real-time reporting of results during the instructional process, which is crucial for personalized learning (Chen & Chang, 2018).
A key ingredient in CD-CAT is the item selection index. Researchers have investigated many item selection indices. The first type of index is based on the Kullback–Leibler (KL) information, such as the KL index (Cheng, 2009; McGlohen & Chang, 2008; C. Tatsuoka & Ferguson, 2003; Xu et al., 2003), the likelihood- or posterior-weighted KL (LWKL or PWKL) index and the hybrid KL index (Cheng, 2009), the restrictive progressive or threshold PWKL index (Wang et al., 2011), the aggregate ranked information index and the aggregate standardized information index (Wang et al., 2014), the modified PWKL (MPWKL) index (Kaplan et al., 2015), the KL expected discrimination index (W. Y. Wang et al., 2015), the posterior-weighted cognitive diagnostic model (CDM) discrimination index and the posterior-weighted attribute-level CDM discrimination index (Zheng & Chang, 2016), and the information product index (Zheng et al., 2018). The second type is based on the Shannon entropy, called the Shannon entropy (SHE) procedure (Cheng, 2009; McGlohen & Chang, 2008; C. Tatsuoka, 2002; C. Tatsuoka & Ferguson, 2003; Xu et al., 2003, 2016). The third type is based on the mutual information, including the expected mutual information index (Wang, 2013) and the Jensen–Shannon divergence (JSD) index (Kang et al., 2017; Minchen & de la Torre, 2016; Yigit et al., 2018). There are other indices, such as the generalized deterministic inputs, noisy “and” gate (G-DINA; de la Torre, 2011) model discrimination index (GDI; Kaplan et al., 2015), the rate function approach (Liu et al., 2015), the halving algorithm (C. Tatsuoka & Ferguson, 2003; W. Y. Wang et al., 2015; Zheng & Wang, 2017), and so on. Yigit et al. (2018) proved that the mutual information index and the JSD index are equivalent.
Previous simulation studies have shown that the SHE and the JSD (or mutual information) perform quite similarly; the main purpose of this study is to establish the theoretical relationship between the SHE procedure and the JSD index.
CDMs
Before introducing item selection indices for CD-CAT, the general concept of CDMs and the G-DINA model, a general CDM, are described here. CDMs have been defined by Rupp and Templin (2008) as “probabilistic, confirmatory multidimensional latent variable models with a simple or complex loading structure” (p. 226). The loading structure of a CDM is represented by its Q-matrix (K. K. Tatsuoka, 1983, 2009). Each entry of a Q-matrix is either 1 or 0, in which
Let
where
For the identity link, that is
Overview of Two Item Selection Indices for CD-CAT
SHE Procedure
After an item bank has been calibrated with a CDM, one must determine how to choose items for examinees from the bank. CD-CAT employs algorithms that select items sequentially on the basis of an examinee’s responses and is designed to classify each student’s attribute pattern accurately with a short test. The SHE procedure (Cheng, 2009) and the JSD index (Minchen & de la Torre, 2016; Yigit et al., 2018) are described below.
Suppose that the prior is chosen as
where
Assuming that
From the last term,
By considering the uncertainty of item response
where the second term follows directly from
The next item to be selected for examinee
As shown above,
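A minimal numeric sketch of the SHE criterion may help make the procedure concrete. The DINA-based helper functions, names, and toy values below are our own illustration under assumed parameters, not the article’s code.

```python
import numpy as np

def dina_probs(q, patterns, slip, guess):
    """P(X = 1 | alpha) under the DINA model: 1 - s when alpha masters
    every attribute required by q, and g otherwise."""
    mastered = np.all(patterns >= q, axis=1)
    return np.where(mastered, 1 - slip, guess)

def expected_she(posterior, probs):
    """Expected Shannon entropy of the updated posterior over attribute
    patterns after observing the response to a candidate item."""
    value = 0.0
    for x in (0, 1):
        joint = posterior * (probs if x == 1 else 1 - probs)
        p_x = joint.sum()                     # marginal P(X = x)
        updated = joint / p_x                 # posterior given X = x
        h = -np.sum(updated * np.log(updated + 1e-12))
        value += p_x * h                      # weight by P(X = x)
    return value

# Toy example: two attributes, uniform prior, item measuring attribute 1
patterns = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
prior = np.full(4, 0.25)
probs = dina_probs(np.array([1, 0]), patterns, slip=0.1, guess=0.1)
# The SHE procedure administers the candidate item minimizing this value
print(expected_she(prior, probs))   # ~1.018 nats; prior entropy is log 4 ~ 1.386
```

In a real CD-CAT session this quantity would be evaluated for every remaining item in the bank and the minimizer administered next.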
JSD Index
The JSD, introduced by Lin (1991) as a new class of information measures based on the SHE, quantifies the overall difference among any finite number of distributions. Let
where
and
The next item to be administered for examinee
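The JSD of a candidate item can likewise be sketched numerically as the mutual information between the item response and the attribute pattern; the code below is our own illustration with an assumed toy DINA item, not the article’s implementation.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def jsd(posterior, probs):
    """JSD of the candidate item's response distributions, weighted by
    the current posterior over attribute patterns; equal to the mutual
    information between the response and the attribute pattern."""
    p1 = np.sum(posterior * probs)             # mixture P(X = 1)
    h_mix = entropy([1 - p1, p1])              # entropy of the mixture
    h_cond = np.sum(posterior *
                    np.array([entropy([1 - p, p]) for p in probs]))
    return h_mix - h_cond                      # JSD = I(X; alpha) >= 0

# Toy DINA item: patterns {00, 10, 01, 11}, uniform posterior, s = g = 0.1
posterior = np.full(4, 0.25)
probs = np.array([0.1, 0.9, 0.1, 0.9])         # P(X = 1 | alpha)
# The JSD index administers the candidate item maximizing this value
print(jsd(posterior, probs))
```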
Similar results have been observed by Kang et al. (2017) within the framework of dual-objective CD-CAT (Kang et al., 2017; McGlohen & Chang, 2008; Wang et al., 2014; Zheng et al., 2018). For simultaneously estimating examinees’
Relationship Between the SHE and the JSD
The purpose of this section is to show that the SHE and the JSD, as two item selection criteria in CD-CAT, are linearly related. Because the JSD itself is defined in terms of the SHE, we apply the well-known relationship between the mutual information (or JSD) and the SHE to relate the item selection criteria developed from these two measures. The mutual information and SHE satisfy the two well-known Equations 2.43 and 2.44 from Theorem 2.4.1 in Cover and Thomas (2006, p. 21); that is,
Next, we provide an alternative proof of the above statement, which may aid a better understanding of the relation. For simplicity, let the denominators (i.e., the normalizing constants) of Equations 2 and 4 be
By the definition of SHE, Equation 10 can be computed by
Recall two basic logarithmic properties: the log of a quotient equals the difference of the logs of the numerator and denominator, and the log of a product equals the sum of the logs of the factors. Equation 11 can then be written as
Notice
since
After changing the order of the summation and factoring two constant terms (i.e.,
which follows from
Based on Equations 3, 13, and 14, the
From Equation 8, the third term on the right-hand side of Equation 15 is equal to
which can be rewritten as
The meaning of Equation 17 is consistent with the fact that the JSD or mutual information is a special case of a more general quantity called relative entropy. As
Tables A1 and A2 in the appendix illustrate the computation of the SHE and JSD indices. Here, the SHE and JSD are computed for two items with different item response distributions (i.e., different item parameters), using a discrete uniform prior distribution over attribute patterns. From these two tables, the relationship between the SHE and JSD for the two items satisfies Equation 16 or Equation 17, and minimizing
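The linear relation can also be checked numerically. The sketch below is our own code (the arbitrary Dirichlet prior and random DINA-type items are assumptions): for every candidate item, the JSD and the expected SHE sum to the entropy of the current posterior, a constant over items, so minimizing the expected SHE and maximizing the JSD select the same item.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(7)
K = 3
patterns = np.array([[(c >> k) & 1 for k in range(K)]
                     for c in range(2 ** K)])
prior = rng.dirichlet(np.ones(2 ** K))       # arbitrary current posterior

for _ in range(5):                           # several random DINA items
    q = rng.binomial(1, 0.5, K)
    slip, guess = rng.uniform(0.05, 0.3, 2)
    probs = np.where(np.all(patterns >= q, axis=1), 1 - slip, guess)

    # Expected Shannon entropy of the updated posterior (SHE criterion)
    exp_she = 0.0
    for x in (0, 1):
        joint = prior * (probs if x == 1 else 1 - probs)
        exp_she += joint.sum() * entropy(joint / joint.sum())

    # JSD = mutual information between the response and the pattern
    p1 = np.sum(prior * probs)
    jsd = entropy([1 - p1, p1]) - np.sum(
        prior * np.array([entropy([1 - p, p]) for p in probs]))

    # Linear relation: JSD_j + E[SHE_j] = H(prior), constant over items j
    assert np.isclose(jsd + exp_she, entropy(prior))
```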
Simulation Study
Design
A small-scale simulation study was conducted to compare the performance of the SHE and JSD. Following a design similar to those in Cheng (2009) and Xu et al. (2016), the DINA model and five independent attributes were considered. To generate the four item banks, a Q-matrix for 300 items was first simulated. The entries of the Q-matrix were generated item by item and attribute by attribute, with each item having a 20% chance of measuring each attribute. Four item banks were considered: in the first three, the slipping and guessing parameters were fixed at 0.05, 0.1, or 0.2, respectively; in the fourth, both slipping and guessing parameters were randomly drawn from a uniform distribution on the interval [0.1, 0.3]. Test length was fixed at either 5 or 10 items. The sample size was set to 2,000 examinees. Attribute patterns for all examinees were randomly drawn from all possible attribute patterns with equal probability. Details of the simulation design are presented in Table A3 in the appendix.
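The Q-matrix generation step can be sketched as follows. The article does not state how rows measuring no attribute were handled; redrawing such rows is our assumption, since an item must measure at least one attribute.

```python
import numpy as np

def simulate_q(rng, n_items, n_attrs, p=0.2):
    """Generate a Q-matrix whose entries are 1 with probability p,
    redrawing any all-zero row (an assumption, not stated in the text)."""
    Q = np.empty((n_items, n_attrs), dtype=int)
    for i in range(n_items):
        row = rng.binomial(1, p, n_attrs)
        while row.sum() == 0:            # redraw rows measuring nothing
            row = rng.binomial(1, p, n_attrs)
        Q[i] = row
    return Q

rng = np.random.default_rng(123)
Q = simulate_q(rng, n_items=300, n_attrs=5)
```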
To examine the impact of how item responses are simulated in CD-CAT on the performance of the SHE and JSD, two types of CAT simulation were considered: full simulations and post hoc simulations (Magis et al., 2017). In a full CAT simulation, an item response for examinee
Results
The attribute-level recovery rate is defined as the proportion of each attribute that is correctly identified, and the pattern recovery rate as the proportion of examinees whose entire attribute pattern is correctly recovered. Means and standard deviations of attribute-level and pattern recovery rates for each level of item parameters are shown in Tables 1–4. For the SHE, our results are consistent with those of Xu et al. (2016). Consistent with our theoretical conclusions, the SHE and JSD behaved quite similarly: their attribute-level and pattern recovery rates were very close to each other under all conditions.
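The two recovery rates can be computed directly from the true and estimated pattern matrices; the function below is a minimal sketch under our own naming.

```python
import numpy as np

def recovery_rates(true_alpha, est_alpha):
    """Attribute-level recovery: proportion of correct classifications
    per attribute (column). Pattern recovery: proportion of examinees
    whose entire attribute pattern (row) is recovered."""
    true_alpha = np.asarray(true_alpha)
    est_alpha = np.asarray(est_alpha)
    attr = (true_alpha == est_alpha).mean(axis=0)        # per attribute
    pattern = np.all(true_alpha == est_alpha, axis=1).mean()
    return attr, pattern

# Tiny example: three examinees, two attributes
attr, pattern = recovery_rates([[1, 0], [1, 1], [0, 1]],
                               [[1, 0], [1, 0], [0, 1]])
print(attr, pattern)   # attribute rates [1.0, 2/3]; pattern rate 2/3
```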
Mean and Standard Deviation (in brackets) of Attribute and Pattern Recovery Rate for Slipping and Guessing Parameters of 0.05.
Mean and Standard Deviation (in brackets) of Attribute and Pattern Recovery Rate for Slipping and Guessing Parameters of 0.1.
Mean and Standard Deviation (in brackets) of Attribute and Pattern Recovery Rate for Slipping and Guessing Parameters of 0.2.
Mean and Standard Deviation (in brackets) of Attribute and Pattern Recovery Rate for Slipping and Guessing Parameters of
Figure 1 presents pattern recovery rates for different test lengths and simulation types under slipping and guessing parameters of

Pattern recovery rate for different test lengths and simulation types under slipping and guessing parameters of
We also checked whether the two item selection algorithms selected the same set of items for each examinee under post hoc simulations. For the first three item banks, the SHE- and JSD-based algorithms indeed selected the same set of items, though in slightly different orders. Because all items in these banks share the same item parameter values, items placed in different positions can have the same SHE or JSD value; for example, two items with the same item parameters but a single distinct attribute may have the same value of SHE or JSD. For the fourth item bank, the two algorithms also selected the same set of items.
Discussion
In this study, we have proved that the SHE procedure and the JSD are linearly related under CDMs. In other words, we showed that minimizing the SHE and maximizing the JSD can be used interchangeably because they select the same items in CD-CAT. The two measures are linearly related but not equal, meaning that the two measures have the form
This study is not without limitations. Theoretically, the SHE, KL information, and mutual information are three ways to measure uncertainty, and they are related to each other. It would be interesting to further investigate the relationships among item selection indices based on the KL information, the SHE, the JSD, and other indices under general dichotomous or polytomous CDMs. For example, the GDI and MPWKL might be related, because both perform similarly to each other and better than the PWKL in terms of correct attribute classification rates or test lengths. We believe the GDI is simply a weighted variance of the success probabilities of an item across attribute patterns under a given attribute pattern distribution, and therefore one can start by comparing the weighted KL with the weighted variance to establish a relationship. The GDI is defined as follows (Kaplan et al., 2015):
where
The following algebraic steps simplify the calculations in Equation 19:
where Equation 20 follows from the quotient rule of logarithmic properties and Equation 25 follows directly from the complement rule in probability, as expressed by the equation
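The weighted-variance reading of the GDI described above can be written directly in code. This is a sketch under our own naming (`gdi` takes an attribute-pattern distribution and the item’s success probabilities), not the authors’ implementation.

```python
import numpy as np

def gdi(weights, probs):
    """GDI (Kaplan et al., 2015) expressed as a weighted variance of an
    item's success probabilities over the attribute-pattern distribution."""
    weights = np.asarray(weights, dtype=float)
    probs = np.asarray(probs, dtype=float)
    p_bar = np.sum(weights * probs)                  # weighted mean
    return np.sum(weights * (probs - p_bar) ** 2)    # weighted variance

# An item whose success probability is constant across patterns carries
# no discriminating information; wider separation raises the GDI.
print(gdi([0.5, 0.5], [0.2, 0.2]))   # 0.0
print(gdi([0.5, 0.5], [0.1, 0.9]))   # 0.16
```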
The findings of this study may contribute to the growing literature on formative assessments. First, theoretical derivations and empirical examples have shown that both indices (SHE and JSD) are expected to select the same next item given the item response pattern on the same set of previously administered items in CD-CAT. Consistent with our theoretical conclusions, simulation results have shown that the SHE and JSD behaved quite similarly in terms of attribute-level and pattern recovery rates. This finding can help practitioners choose an effective item selection algorithm (SHE or JSD) in the development and application of CD-CAT systems in the field of educational and psychological measurement. Second, the effectiveness of the item selection algorithm in CD-CAT affects the quality of curriculum delivery and the outcomes of learning. If individual diagnostic results with high measurement precision can be provided by an effective item selection algorithm in CD-CAT, then diverse instructional materials can cater to the diverse needs or specific knowledge status of all learners (Lashley, 2019). Finally, information-based indices are now not only widely applied in CD-CAT but also useful at any test construction stage where items are selected based on their statistical characteristics (e.g., Henson & Douglas, 2005; Henson et al., 2008; Kuo et al., 2016). For example, the cognitive diagnostic index, the attribute-level discrimination index, and their modified versions, as KL information based measures, have been used for the construction of diagnostic tests. Future research on automated test assembly for cognitive diagnosis will expand the scope of application of the current finding.
Footnotes
Appendix
Details of Simulation Design.
| Factors | Details |
|---|---|
| Attribute structure | Independent structure with five attributes |
| CDM | The DINA model |
| Examinees | Sample size is 2,000 |
| Attribute patterns are generated by taking one of the 2^5 = 32 possible patterns with equal probability | |
| Item banks | Each of four item banks consists of 300 items |
| Each item has 20% chance of measuring each attribute | |
| Item parameters are set to | |
| CD-CAT | Test length is either fixed at 5 or 10 items |
| Two item selection indices are the SHE and JSD with a prior uniform distribution | |
| MLE method is used to estimate attribute patterns of examinees | |
| Simulations | Full simulations or post hoc simulations are used to generate item responses |
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially supported by the Key Project of National Education Science “Twelfth Five Year Plan” of the Ministry of Education of China (Grant No. DHA150285).
