Abstract

Through N-dimensional person space, the article gives measures of test parameters and item statistics, including difficulty/discriminating value of test, correlations between a pair of items, and item-total correlations with binary items using angular similarity between two vectors. Relationships between difficulty value and discriminating value of items and test were derived, including relationship between test reliability and test discriminating value. Reliability of a test as per theoretical definition in terms of length of score vectors of two parallel subtests and angle between such vectors was derived. The method was extended to find reliability of a battery of tests. Reliability and discriminating value of a Likert-type item and scale was found in terms of angular similarity without involving assumptions of continuous nature or linearity or normality for the observed variables, or the underlying variable being measured. The proposed methods also avoid test of unidimensionality or assumption of normality or bivariate normality associated with the polychoric correlations. Thus, the proposed methods are in fact nonparametric and considered as improvement over the existing ones. Reliability as a measure of association of two vectors and discrimination as a measure of distance between the vectors are likely to show a negative relationship.

Keywords

item statistics, reliability, difficulty and discriminating values, test battery

Introduction

It is common to consider the variables as orthogonal axes and performance of individuals as points in the Euclidean space defined by the axes. However, it is possible to treat each person as an axis and represent the variables as points or vectors in the Euclidean space. This type of presentation is called “subject space” or “person space” or “N-dimensional person space” for N-persons. Such presentation has been considered by Reyment and Joreskog (1993), Huberty (1994), Chong et al. (2002), and so on. The orthogonality of the axes is an accurate indication if the persons are assumed to be independent of one another. Chong et al. (2002) observed that vectors representing latent factors rotate in subject space rather than variable space. In regression analysis, eigenvalues can be conceptualized as subject space equivalent to variance inflation factor (VIF). The concept of subject space can also be used as a means of visualization of spatial relationships. Glass and Collins (1970) dealt with three variables in N-dimensional subject space to find possible values of $r_{X Y}$ when $r_{X Z}$ and $r_{Y Z}$ are fixed.

If N-persons take a test, then the observed test scores can be viewed as a point or a vector in N-dimensional space. Once a person has taken a test, he or she is not considered to take the test again. Thus, a person axis is not overlapped. The angle θ between the two N-dimensional vectors X and Y is given by $Cos θ = \frac{X . Y}{‖ X ‖ ‖ Y ‖}$ , where $‖ X ‖$ denotes length of the vector and is defined as $‖ X ‖ = \sqrt{\sum_{i = 1}^{N} X_{i}^{2}}$ . $‖ Y ‖$ is defined accordingly. For N-persons, the number of $Cos θ_{i j}$ for $i \neq j$ is the same as the number of possible pairs, which is $N (N - 1) / 2 .$ This gives the novel area of angular statistics, where Cosθ gives similarity between two vectors of the same dimension. In many diverse scientific fields, measurements are directions. Use of cosine similarity is common in areas like machine learning, classification, data mining, information retrieval, and pattern recognition.

This type of presentation helps to make useful inferences about the sample as well as computation of various parameters of the test, along with geometrical interpretation of such computations. It is possible to give theoretical formulation of mental testing through N-dimensional person space primarily in terms of the length of the observed score vector, angle separating the vector with the vector representing the maximum possible score in the test. For item statistics, the corresponding two vectors will be item score vector and the vector for maximum possible item score with each component equal to unity for a binary item.

For a test consisting of binary items (1 for right answer and 0 for others), the article attempts to provide geometric visualization along with computation of parameters of the test and item statistics, including test error variance and thereby the test reliability, as a ratio of true score variance and observed score variance, via a single administration of the test under classical test theory. Thus, the approach is an improvement as it enables calculation of theoretically defined reliability despite the fact that true scores of individuals taking the test are not known. Subsequent to the determination of the error variance of the test, test reliability method is extended to find reliability of a battery of tests. The approach can be extended for Likert-type test.

Methodology

The setup for test consists of binary items.

Suppose a test consisting of n-binary items has been administered among N-subjects. Scores obtained by the subjects can be presented as a point or vector X in the N-dimensional space with components $X_{1}, X_{2}, \dots, X_{N}$ , where $X_{i}$ denotes test score of the ith subject. One may consider another vector I representing maximum possible score in the test with components $I_{1}, I_{2}, \dots, I_{N}$ , where $I_{i} = I_{j} = n \forall i, j = 1, 2, \dots, N$ . Call the vector I the maximum possible vector or ideal vector. Let the angle between the vectors X and I be $θ_{X}$ . Similarly, the true score vector T with components $T_{1}, T_{2}, \dots, T_{N}$ can be conceptualized, where the angle between the vectors T and I is $θ_{T}$ .

Dot product

Dot product of two N-dimensional vectors X and Y is defined as

X . Y = ‖ X ‖ ‖ Y ‖ Cos θ

(1)

where $‖ X ‖$ denotes length of the vector X and is defined as $‖ X ‖ = \sqrt{\sum_{i = 1}^{N} X_{i}^{2}}$ . $‖ Y ‖$ is defined accordingly and $θ$ is the angle between the vectors X and Y.

Note that

$X . Y = 0 \Leftrightarrow θ = 90^{°}$ ⇔ X and Y are orthogonal (perpendicular)

X.Y = Cos θ if ||X|| = ||Y|| = 1

||X + Y|| ⩽ ||X|| + ||Y|| (Triangle inequality)

|X.Y| ⩽ ||X|| ||Y|| (Schwarz’s inequality)

${‖ X + Y ‖}^{2} + {‖ X - Y ‖}^{2} = 2 ({‖ X ‖}^{2} + {‖ Y ‖}^{2})$ (Parallelogram Law)

${‖ X + Y ‖}^{2} = {‖ X ‖}^{2} + {‖ Y ‖}^{2}$ where X and Y are orthogonal (Pythagorean Formula)

Ranks of individuals

Arranging the components of the vector X in decreasing order will give ranks of the individuals who took the test.

Mean

Let $θ_{X}$ be the angle between the score vector and the ideal vector. Then

\begin{array}{l} Cos θ_{X} = \frac{\sum X_{i} I_{i}}{‖ X ‖ ‖ I ‖} \text{ or } Cos θ_{X} = \frac{\sum X_{i}}{‖ X ‖ \sqrt{N}} \text{ or } \\ \bar{X} = \frac{‖ X ‖ Cos θ_{X}}{\sqrt{N}} \end{array}

(2)

Thus, test mean is equal to the product of length of the score vector and cosine of the angle between the score vector and the ideal vector divided by square root of sample size.
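As an illustration of equation (2), the following minimal sketch computes the test mean from the length of the score vector and its angle with the ideal vector, and compares it with the ordinary arithmetic mean. The score vector and the helper names `norm` and `cos_angle` are hypothetical and serve only as an example.

```python
import math

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(a * a for a in v))

def cos_angle(u, v):
    """Cosine of the angle between two vectors of equal dimension."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (norm(u) * norm(v))

# Hypothetical data: total scores of N = 5 persons on a test of n = 4 binary items.
n_items = 4
X = [3, 1, 4, 2, 2]
N = len(X)
I = [n_items] * N  # ideal vector: every person obtains the maximum score

cos_theta_X = cos_angle(X, I)
mean_from_angle = norm(X) * cos_theta_X / math.sqrt(N)  # equation (2)
mean_direct = sum(X) / N                                # usual arithmetic mean

print(mean_from_angle, mean_direct)
```

The two quantities should agree up to floating-point rounding, since the constant n in the ideal vector cancels between the dot product and the length of I.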

From equation (2)

{\bar{X}}^{2} = \frac{{‖ X ‖}^{2} {Cos}^{2} θ_{X}}{N}

(3)

Similarly

\bar{T} = \frac{‖ T ‖ Cos θ_{T}}{\sqrt{N}}

(4)

and

{\bar{T}}^{2} = \frac{{‖ T ‖}^{2} {Cos}^{2} θ_{T}}{N}

(5)

Since, $\bar{X} = \bar{T}$ , equating (2) and (4), we get

\begin{array}{l} ‖ X ‖ Cos θ_{X} = ‖ T ‖ Cos θ_{T} \text{ or } \\ {‖ X ‖}^{2} {Cos}^{2} θ_{X} = {‖ T ‖}^{2} {Cos}^{2} θ_{T} = {‖ T ‖}^{2} - {‖ T ‖}^{2} {Sin}^{2} θ_{T} \end{array}

(6)

Equation (6) helps to find ${‖ T ‖}^{2} {Cos}^{2} θ_{T}$ from the data, even if ${‖ T ‖}^{2}$ or ${Cos}^{2} θ_{T}$ are not known.

Equating (3) and (5), we get

\frac{{‖ T ‖}^{2}}{{‖ X ‖}^{2}} = {Cos}^{2} θ_{X} . {Sec}^{2} θ_{T}

(7)

which gives relationship between $‖ X ‖$ and $‖ T ‖$ .

Let $θ_{E}$ represent the angle between the error score vector and the ideal vector. As per classical test theory

\begin{array}{l} \bar{E} = 0, \text{ i.e., } \bar{E} = \frac{‖ E ‖ Cos θ_{E}}{\sqrt{N}} = 0 \Rightarrow \\ Cos θ_{E} = 0 \text{ since } ‖ E ‖ \neq 0 \end{array}

(8)

Variance

Test variance or observed score variance can also be obtained from N-dimensional person space by $S_{X}^{2} = \frac{{‖ x ‖}^{2}}{N}$ where $x_{i} = X_{i} - \bar{X}$ represents deviation scores.

Alternatively, $S_{X}^{2} = \frac{{‖ X ‖}^{2}}{N} - {\bar{X}}^{2} \Rightarrow S_{X}^{2} = \frac{{‖ X ‖}^{2} - {‖ X ‖}^{2} {Cos}^{2} θ_{X}}{N}$ by equation (3)

\Rightarrow S_{X}^{2} = \frac{{‖ X ‖}^{2} {Sin}^{2} θ_{X}}{N}

(9)

Thus, standard deviation (SD) of test score is the product of the length of the score vector and sine of the angle between the score vector and the ideal vector divided by the square root of the sample size.

Thus, the relationship between x and X is given by

{‖ x ‖}^{2} = {‖ X ‖}^{2} {Sin}^{2} θ_{X}

(10)
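Equations (9) and (10) can be checked numerically. The sketch below, using a hypothetical score vector, obtains ${Cos}^{2} θ_{X}$ from equation (3), derives ${Sin}^{2} θ_{X}$ from it, and compares the resulting variance with the variance computed from deviation scores.

```python
# Hypothetical data: total scores of N = 5 persons.
X = [3, 1, 4, 2, 2]
N = len(X)
mean = sum(X) / N

sq_len_X = sum(v * v for v in X)            # ||X||^2
cos2_theta = N * mean ** 2 / sq_len_X       # equation (3) rearranged
sin2_theta = 1.0 - cos2_theta

var_from_angle = sq_len_X * sin2_theta / N  # equation (9)

x = [v - mean for v in X]                   # deviation scores
sq_len_x = sum(d * d for d in x)            # ||x||^2
var_direct = sq_len_x / N

print(var_from_angle, var_direct)
```

Under these assumptions, `sq_len_x` should also equal `sq_len_X * sin2_theta`, which is the content of equation (10).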

Similarly, true score variance is obtained

\begin{array}{l} S_{T}^{2} = \frac{{‖ t ‖}^{2}}{N} = \frac{{‖ T ‖}^{2} {Sin}^{2} θ_{T}}{N} = \\ \frac{{‖ T ‖}^{2}}{N} - \frac{{‖ X ‖}^{2} {Cos}^{2} θ_{X}}{N} \text{ (using equation (6))} \end{array}

(11)

Equation (11) helps to find value of true score variance of a test from the data if the length of true score vector is known, and

{‖ t ‖}^{2} = {‖ T ‖}^{2} {Sin}^{2} θ_{T}

(12)

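Now $S_{X}^{2} = S_{T}^{2} + S_{E}^{2}$ , multiplied through by N and combined with equations (9) and (11) and ${‖ E ‖}^{2} = N S_{E}^{2}$ (since $\bar{E} = 0$ ), implies ${‖ X ‖}^{2} {Sin}^{2} θ_{X} = {‖ T ‖}^{2} - {‖ X ‖}^{2} {Cos}^{2} θ_{X} + {‖ E ‖}^{2}$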

\Rightarrow {‖ X ‖}^{2} = {‖ T ‖}^{2} + {‖ E ‖}^{2}

(13)

Correlations

One assumption of classical test theory is $r_{T E} = 0 \Rightarrow Cov(T, E) = 0$

\Rightarrow \sum T_{i} E_{i} = 0 \Rightarrow \sum T_{i} (X_{i} - T_{i}) = 0

\Rightarrow \sum T_{i} X_{i} = \sum T_{i}^{2} \Rightarrow \sum T_{i} X_{i} = {‖ T ‖}^{2} \Rightarrow Cov (X, T) = S_{T}^{2}

One may also write $\sum T_{i} E_{i} = 0 \Rightarrow \sum (X_{i} - E_{i}) E_{i} = 0$

\Rightarrow \sum X_{i} E_{i} = {‖ E ‖}^{2} \Rightarrow Cov (X, E) = S_{E}^{2}

(14)

Thus

S_{X}^{2} = Cov (X, T) + Cov (X, E)

(15)

So variance–covariance matrix of observed score (X), true score (T), and error score (E) of a test is $[\begin{matrix} S_{X}^{2} & S_{X T} & S_{X E} \\ - & S_{T}^{2} & 0 \\ - & - & S_{E}^{2} \end{matrix}] = [\begin{matrix} S_{X}^{2} & S_{T}^{2} & S_{E}^{2} \\ - & S_{T}^{2} & 0 \\ - & - & S_{E}^{2} \end{matrix}]$ and trace of the matrix is $2 S_{X}^{2}$ .

It is well known that correlation between X and T is given by $r_{X T} = Cos θ_{x t} = \frac{\sum x_{i} t_{i}}{‖ x ‖ ‖ t ‖}$ , where $x_{i}$ and $t_{i}$ denote deviation scores and $θ_{x t}$ is the angle between two deviation score vectors.

Now, $\sum x_{i} t_{i} = \sum (X_{i} - \bar{X}) (T_{i} - \bar{T}) = \sum (X_{i} T_{i} - \bar{T} X_{i} - \bar{X} T_{i} + \bar{X} \bar{T}) = \sum X_{i} T_{i} - N {\bar{T}}^{2} = {‖ T ‖}^{2} - N {\bar{T}}^{2}$ (since, $r_{T E} = 0 \Rightarrow \sum X_{i} T_{i} = {‖ T ‖}^{2}$ ) = $N S_{T}^{2}$ .

And ${‖ x ‖}^{2} = {‖ X ‖}^{2} {Sin}^{2} θ_{X}$ by equation (10) and ${‖ t ‖}^{2} = {‖ T ‖}^{2} {Sin}^{2} θ_{T}$ by equation (12).

Thus

\begin{array}{l} r_{X T} = Cos θ_{x t} = \frac{N S_{T}^{2}}{‖ X ‖ Sin θ_{X} ‖ T ‖ Sin θ_{T}} \\ = \frac{S_{T}^{2}}{S_{X} S_{T}} = \frac{S_{T}}{S_{X}} \text{ (from equations (9) and (11))} \end{array}

(16)

But $Cos θ_{X T} = \frac{\sum X_{i} T_{i}}{‖ X ‖ ‖ T ‖} = \frac{{‖ T ‖}^{2}}{‖ X ‖ ‖ T ‖} = \frac{‖ T ‖}{‖ X ‖}$ (since $r_{T E} = 0 \Rightarrow \sum X_{i} T_{i} = {‖ T ‖}^{2}$ ).

Therefore

\frac{Cos θ_{x t}}{Cos θ_{X T}} = (\frac{S_{T}}{S_{X}}) (\frac{‖ X ‖}{‖ T ‖})

(17)

Equation (17) gives the relationship between $Cos θ_{x t}$ (which is equal to $r_{X T}$ ) and $Cos θ_{X T}$ .

Note that

Value of ${Cos}^{2} θ_{X}$ can be obtained from the data using equation (3). Value of ${Sin}^{2} θ_{X}$ can also be obtained accordingly or by using equation (10).

Value of ${‖ T ‖}^{2} {Cos}^{2} θ_{T}$ which is equal to ${‖ X ‖}^{2} {Cos}^{2} θ_{X}$ by equation (6) can be obtained from data even if the value of ${‖ T ‖}^{2}$ or ${Cos}^{2} θ_{T}$ is unknown.

Equations (15) to (17) each involve an unknown quantity and thus do not help to compute $r_{t t}$ or $S_{T}^{2}$ or ${‖ E ‖}^{2}$ directly.

Reliability and parallel tests

A test consisting of n-items, when dichotomized into parallel halves (say the gth subtest and the hth subtest), results in two points $X_{g}$ and $X_{h}$ in the N-dimensional person space. As per classical definition, two tests “g” and “h” are parallel if $T_{i}^{(g)} = T_{i}^{(h)}$ and $S_{e}^{(g)} = S_{e}^{(h)}$ , where the superscript “g” refers to subtest g and the superscript “h” refers to subtest h, and $S_{e}^{(p)}$ is the SD of error scores in the pth subtest, p = g, h. It can be proved that if g and h are parallel, $\bar{X_{g}} = \bar{X_{h}}$ and $S_{X g}^{2} = S_{X h}^{2} .$

Also, $X_{g} = T_{g} + E_{g}$ and $X_{h} = T_{h} + E_{h}$ . Now $T_{i}^{(g)} = T_{i}^{(h)}$ implies $X_{g} - X_{h} = E_{g} - E_{h}$ , so that

\begin{array}{l} {‖ X_{g} ‖}^{2} + {‖ X_{h} ‖}^{2} - 2 ‖ X_{g} ‖ ‖ X_{h} ‖ Cos θ_{g h} = \\ {‖ E_{g} ‖}^{2} + {‖ E_{h} ‖}^{2} - 2 ‖ E_{g} ‖ ‖ E_{h} ‖ Cos θ_{g h}^{(E)} \end{array}

(18)

where $θ_{g h}$ is the angle between $X_{g}$ and $X_{h}$ while $θ_{g h}^{(E)}$ is the angle between $E_{g}$ and $E_{h} .$ But correlation between error scores of two parallel tests is zero. Thus

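\begin{array}{l} {‖ X_{g} ‖}^{2} + {‖ X_{h} ‖}^{2} - 2 ‖ X_{g} ‖ ‖ X_{h} ‖ Cos θ_{g h} = \\ {‖ E_{g} ‖}^{2} + {‖ E_{h} ‖}^{2} = 2 N {(S_{e}^{(g)})}^{2} = N S_{E}^{2} \end{array}

(19)

since $S_{e}^{(g)} = S_{e}^{(h)}$ by definition and, the two error score vectors being uncorrelated, the error variance of the whole test is $S_{E}^{2} = {(S_{e}^{(g)})}^{2} + {(S_{e}^{(h)})}^{2}$ . The above equation suggests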

S_{E}^{2} = \frac{1}{N} [{‖ X_{g} ‖}^{2} + {‖ X_{h} ‖}^{2} - 2 ‖ X_{g} ‖ ‖ X_{h} ‖ Cos θ_{g h}]

(20)

Hence

r_{t t} = 1 - \frac{{‖ X_{g} ‖}^{2} + {‖ X_{h} ‖}^{2} - 2 ‖ X_{g} ‖ ‖ X_{h} ‖ Cos θ_{g h}}{N S_{X}^{2}}

(21)

Equation (20) helps to find value of error variance of the test and hence true score variance as $(S_{X}^{2} - S_{E}^{2})$ , and use them directly in equation (21) to find reliability of the test as per the classical definition from a single administration in terms of length of score vectors of two parallel tests and angle between such vectors. Thus, it is possible to find true score variance from the data and to calculate a reliability that conforms to the theoretical definition even if true scores of individuals taking the test are not known.
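The computation in equations (20) and (21) can be sketched as follows. The half-test score vectors are hypothetical and are simply treated as parallel for the purpose of the example; the function name `split_half_reliability` is likewise an assumption of this sketch.

```python
import math

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(a * a for a in v))

def split_half_reliability(Xg, Xh):
    """Error variance (equation (20)) and reliability (equation (21))
    from the score vectors of two halves assumed to be parallel."""
    N = len(Xg)
    dot = sum(a * b for a, b in zip(Xg, Xh))
    cos_gh = dot / (norm(Xg) * norm(Xh))
    error_var = (norm(Xg) ** 2 + norm(Xh) ** 2
                 - 2 * norm(Xg) * norm(Xh) * cos_gh) / N
    X = [a + b for a, b in zip(Xg, Xh)]          # whole-test scores
    mean = sum(X) / N
    total_var = sum(v * v for v in X) / N - mean ** 2
    return error_var, 1.0 - error_var / total_var

# Hypothetical half-test scores of N = 6 persons.
Xg = [4, 3, 2, 4, 1, 0]
Xh = [3, 3, 2, 4, 1, 1]
error_var, r_tt = split_half_reliability(Xg, Xh)
print(error_var, r_tt)
```

By the law of cosines, the bracketed quantity in equation (20) is the squared length of $X_{g} - X_{h}$ , so the error variance can equivalently be obtained from the differences between the two half scores.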

Note that while equation (21) finds reliability $(r_{t t})$ of a test which is isomorphic to definition, the process of splitting the test in two parallel halves gives the popular method of finding split-half reliability of a test as correlation between two parallel forms of the test $(r_{g h})$ . However, value of $r_{g h}$ may be different from value of $r_{t t}$ .

The approach of finding reliability as per definition can be extended to find reliability of a battery of tests aiming to measure a finite number of variables. After administration of the battery to N-individuals, values of $S_{X}^{2}$ , $S_{E}^{2}$ , $S_{T}^{2}$ , and $r_{t t}$ for each constituent test can be computed. Method of obtaining reliability of the battery depends on these parameters and also on definition of battery score.

If battery score is taken as sum of score of K-tests (summative score), it can be proved that

r_{t t (battery)} = \frac{\sum_{i = 1}^{K} r_{t t (i)} S_{X i}^{2} + 2 \sum_{i < j} Cov (X_{i}, X_{j})}{\sum_{i = 1}^{K} S_{X i}^{2} + 2 \sum_{i < j} Cov (X_{i}, X_{j})}

(22)

where each pair of constituent tests $(i, j)$ with $i < j$ is counted once.

Let $W_{1}, W_{2}, \dots, W_{K}$ be the weights to K-constituent tests of a battery where $W_{i} \geq 0 \forall i = 1, 2, 3, \dots, K$ and $\sum W_{i} = 1$ , and let the battery score be $Y = \sum_{i = 1}^{K} W_{i} X_{i}$ .

Clearly, $var (Y) = \sum_{i = 1}^{K} W_{i}^{2} var(X_{i}) + 2 \sum_{i < j} W_{i} W_{j} Cov (X_{i}, X_{j})$ . Reliability of the battery can be computed as

r_{t t (battery)} = \frac{\sum_{i = 1}^{K} r_{t t (i)} W_{i}^{2} S_{X i}^{2} + 2 \sum_{i < j} W_{i} W_{j} Cov (X_{i}, X_{j})}{\sum_{i = 1}^{K} W_{i}^{2} S_{X i}^{2} + 2 \sum_{i < j} W_{i} W_{j} Cov (X_{i}, X_{j})}

(23)
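A minimal sketch of the battery reliability computation is given below. It assumes that the covariance term is the one appearing in the variance of the (weighted) sum, that is, each unordered pair of tests is counted once and multiplied by 2. The function name and the numerical values are hypothetical.

```python
def battery_reliability(reliabilities, variances, cov, weights=None):
    """Reliability of a battery of K tests.

    reliabilities[i] : reliability of the ith test
    variances[i]     : observed score variance of the ith test
    cov[i][j]        : covariance between tests i and j
    weights          : optional weights; None gives the summative score
    """
    K = len(variances)
    if weights is None:
        weights = [1.0] * K
    # Covariance part: each unordered pair (i < j) counted once, times 2.
    cov_term = 0.0
    for i in range(K):
        for j in range(i + 1, K):
            cov_term += 2.0 * weights[i] * weights[j] * cov[i][j]
    true_part = sum(w * w * r * s2
                    for w, r, s2 in zip(weights, reliabilities, variances))
    observed_part = sum(w * w * s2 for w, s2 in zip(weights, variances))
    return (true_part + cov_term) / (observed_part + cov_term)

# Hypothetical battery of two tests.
r = [0.8, 0.7]
s2 = [4.0, 9.0]
cov = [[4.0, 3.0], [3.0, 9.0]]
summative = battery_reliability(r, s2, cov)
weighted = battery_reliability(r, s2, cov, weights=[0.5, 0.5])
print(summative, weighted)
```

With equal weights the weighted and summative versions should coincide, because every term in the numerator and denominator is scaled by the same factor.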

Validity

Usual procedure of obtaining validity of a test is to find correlation of the test score (X) and criterion score (Y). It is known that $r_{X Y} = Cos θ_{x y}$ , where $θ_{x y}$ is the angle between the vectors of deviation scores. Linear regression of Y on X can also be represented geometrically: the vector resulting from the projection of y on x, $\hat{y} = p (y on x)$ , is the vector of predicted values of Y.

Item statistics for test consisting of binary items

Difficulty value of a test

If X coincides with I , then the test is extremely easy since each subject has got maximum possible score. Thus, difficulty value of a test should consider twofold criteria, namely, $θ_{X}$ and ratio of $‖ I ‖$ and $‖ X ‖$ .

Accordingly, difficulty value of a test D may be defined as

D = \frac{‖ X ‖ Cos θ_{X}}{‖ I ‖} = \frac{\bar{X}}{n}

(24)

Note that equation (24) defines difficulty value of a test as a ratio whose numerator is the product of the length of the observed score vector and the cosine of the angle between the observed score vector and the ideal vector, and whose denominator is the length of the ideal vector. This keeps harmony with the usual notion of difficulty value of a test, which actually measures degree of easiness of a test. It can be proved that 0 ⩽ D ⩽ 1.

Item difficulty value

In the N-dimensional person space, scores of an item may be characterized by an N-dimensional vector whose components are 0s and 1s. For an item, the vector I represents maximum possible score with components $I_{1}, I_{2}, \dots, I_{N}$ , where $I_{i} = I_{j} = 1$ $\forall i, j = 1, 2, \dots N$ . Thus, $‖ I ‖ = \sqrt{N}$ . If K-persons (K ⩽ N) answer the ith item correctly, then $‖ X ‖ = \sqrt{K}$ . Thus, difficulty value of the ith item is defined by

D_{i} = \frac{{‖ X ‖}^{2}}{{‖ I ‖}^{2}} = \frac{K}{N}

(25)

Clearly, $0 ⩽ D_{i} ⩽ 1$ . It may be observed that difficulty value of an item as per equation (25), in terms of the ratio of the squared length of the observed score vector to the squared length of the ideal vector, coincides with the usual idea of proportion of persons passing an item.
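Equations (24) and (25) can be illustrated with a small persons-by-items matrix of binary responses. The data below are hypothetical; rows are persons and columns are items.

```python
# Hypothetical responses of N = 5 persons (rows) to n = 4 binary items (columns).
responses = [
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
N = len(responses)
n = len(responses[0])

# Item score vectors in person space: one N-dimensional vector per item.
item_vectors = [[row[j] for row in responses] for j in range(n)]

# Equation (25): squared length of the item vector over squared length of
# the all-ones ideal vector, i.e. K / N.
item_difficulty = [sum(v * v for v in vec) / N for vec in item_vectors]

# Equation (24): test difficulty as test mean divided by number of items.
X = [sum(row) for row in responses]
test_difficulty = (sum(X) / N) / n

print(item_difficulty, test_difficulty)
```

For binary items the test difficulty obtained this way equals the average of the item difficulty values, since the test mean is the sum of the item means.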

Discriminating value of a test

If the vector X makes a zero degree angle with the vector I, then the test fails to discriminate the subjects. So $θ_{X}$ or a suitably defined function of $θ_{X}$ will measure the discriminating value of a test. Since it is desirable for the discriminating value to be non-negative and to equal zero when $θ_{X} = 0$ , $Tan θ_{X}$ is taken to measure the discriminating value of a test. Thus

D^{1} = Tan θ_{X} = \frac{S_{X}}{\bar{X}} \text{ (from equations (2) and (9))}

(26)

where $D^{1}$ denotes the discriminating value of a test.

Thus, discriminating value of a test equals the tangent of the angle between observed score vector and ideal vector, which is the ratio of SD and mean of the test score. In other words, discriminating value of a test is a linear function of its coefficient of variation (CV), a well-known measure of relative precision which is independent of change of scale but not of origin.

Discriminating value of an item

The method also helps to find discriminating value of an item as follows

D_{i}^{1} = Tan θ_{X i}

(27)

where $D_{i}^{1}$ denotes the discriminating value of the ith item and $θ_{X i}$ is the angle between the observed score vector for the ith item and ideal vector of an item with each component equal to unity.

Note that this method of finding discriminating value of an item does not consider total test score. The method considers entire data relating to the item and avoids consideration of top 27% and bottom 27% of data. However, item discriminating value tends to be lower for tests measuring different content areas and cognitive skills.

Evaluation of $Tan θ_{X}$

For the ith item, the components of a vector X are K-numbers of 1s and rest 0s, if K-persons could answer the item correctly, where K ⩽ N. Each of the N-components of the ideal vector is equal to unity. Let $θ_{X i}$ denote the angle between the vectors X and I. Here

Cos θ_{X i} = \sqrt{\frac{K}{N}} \Rightarrow Tan θ_{X i} = \sqrt{\frac{N - K}{K}}

(28)

Note that

1. The discriminating value of an item is equal to the ratio of SD of the item score and mean of the item score, and can exceed unity. It is a function of CV.

2. If K = 0, that is, if all the subjects fail in an item, then discriminating value is not defined for the item.

3. If K = N, that is, if all the subjects pass an item, then discriminatory value is zero for that item.

4. It can be shown that square of discriminating value of an item is equal to reciprocal of difficulty value minus one. That is

{[D_{i}^{1}]}^{2} = \frac{1}{D_{i}} - 1

(29)

Equation (29) gives the non-linear relationship between difficulty value and discriminating value of items.

5. Relationship between difficulty value and discriminating value of a test can be found as follows

D = \frac{\bar{X}}{n} \text{ and } D^{1} = Tan θ_{X} = \frac{S_{X}}{\bar{X}} \Rightarrow D D^{1} = \frac{S_{X}}{n}

(30)

Thus, product of difficulty value and discriminating value of a test is equal to SD of the test divided by the number of items.
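The relationships in equations (28) to (30) can be verified numerically. The sketch below uses hypothetical values of K and N for the item-level check and a hypothetical score vector for the test-level check; the function name `item_discrimination` is an assumption of this example.

```python
import math

def item_discrimination(K, N):
    """Tangent of the angle between a binary item vector with K ones
    and the all-ones ideal vector (equation (28))."""
    if K == 0:
        return None  # not defined: the item score vector has zero length
    cos_t = math.sqrt(K / N)
    sin_t = math.sqrt(1.0 - K / N)
    return sin_t / cos_t

# Item level: check equation (29) for every K from 1 to N.
N = 5
for K in range(1, N + 1):
    d1 = item_discrimination(K, N)
    D_item = K / N
    assert abs(d1 ** 2 - (1.0 / D_item - 1.0)) < 1e-12

# Test level: check equation (30) on hypothetical total scores (n = 4 items).
X = [3, 1, 4, 2, 2]
n = 4
mean = sum(X) / len(X)
sd = math.sqrt(sum(v * v for v in X) / len(X) - mean ** 2)
D_test = mean / n          # equation (24)
D1_test = sd / mean        # equation (26)
print(D_test * D1_test, sd / n)
```

The case K = N returns zero and the case K = 0 is left undefined, in line with the notes above.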

Relationships between test reliability and discriminating value of a test

Since $\bar{X} = \bar{T}$ and test reliability $r_{t t} = S_{T}^{2} / S_{X}^{2}$

r_{t t} {(D^{1})}^{2} = \frac{S_{T}^{2}}{S_{X}^{2}} \cdot \frac{S_{X}^{2}}{{\bar{X}}^{2}} = {(\frac{S_{T}}{\bar{X}})}^{2} = {(\frac{S_{T}}{\bar{T}})}^{2}

Thus, product of test reliability and square of test discriminating value is equal to square of CV of true scores. However, verification of the relationship may require finding test reliability as per the definition, which is described in equation (21).

Item correlations

Phi coefficient $(Φ_{s t})$ is the proper statistic to estimate the degree of relationship between the scores of sth and tth items (both dichotomous variables) and is defined as $Φ_{s t} = \frac{k_{11} k_{00} - k_{10} k_{01}}{\sqrt{k_{1 .} k_{0 .} k_{. 0} k_{. 1}}}$ , where the non-negative cell and marginal frequencies are taken from the 2 × 2 table for two different items (s and t):

	s = 1	s = 0	Total
t = 1	$k_{11}$	$k_{10}$	$k_{1 .}$
t = 0	$k_{01}$	$k_{00}$	$k_{0 .}$
Total	$k_{. 1}$	$k_{. 0}$	N

Correlation between two items can also be viewed through N-dimensional person space. Let $θ_{s t}$ be the angle between the vectors $X_{s}$ and $X_{t}$ . Then, ${‖ X_{t} ‖}^{2} = k_{1 .}$ , that is, the number of persons answering the tth item correctly. Similarly, ${‖ X_{s} ‖}^{2} = k_{. 1}$ , that is, the number of persons answering the sth item correctly. The number of persons who answered both the sth and tth items correctly is $k_{11} = \sum_{i = 1}^{N} X_{i s} X_{i t} = X_{s}^{T} X_{t} .$

Clearly, $k_{10} = k_{1 .} - k_{11} = {‖ X_{t} ‖}^{2} - X_{s}^{T} X_{t}$ . Similarly, $k_{01} = k_{. 1} - k_{11} = {‖ X_{s} ‖}^{2} - X_{s}^{T} X_{t}$ . The number of persons who answered both items incorrectly is $k_{00} = k_{0 .} - k_{01} = (N - {‖ X_{t} ‖}^{2}) - {‖ X_{s} ‖}^{2} + X_{s}^{T} X_{t} .$

Thus, each term of the formula for $Φ_{s t}$ can be written in terms of ${‖ X_{t} ‖}^{2}$ , ${‖ X_{s} ‖}^{2}$ , $X_{s}^{T} X_{t}$ , and N. In other words, $Φ_{s t}$ can be viewed and computed through N-dimensional person space as

Φ_{s t} = \frac{X_{s}^{T} X_{t} (N - {‖ X_{t} ‖}^{2} - {‖ X_{s} ‖}^{2} + X_{s}^{T} X_{t}) - ({‖ X_{t} ‖}^{2} - X_{s}^{T} X_{t}) ({‖ X_{s} ‖}^{2} - X_{s}^{T} X_{t})}{\sqrt{{‖ X_{t} ‖}^{2} {‖ X_{s} ‖}^{2} (N - {‖ X_{t} ‖}^{2}) (N - {‖ X_{s} ‖}^{2})}} = \frac{N X_{s}^{T} X_{t} - {‖ X_{s} ‖}^{2} {‖ X_{t} ‖}^{2}}{\sqrt{{‖ X_{t} ‖}^{2} {‖ X_{s} ‖}^{2} (N - {‖ X_{t} ‖}^{2}) (N - {‖ X_{s} ‖}^{2})}}

(31)
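As a sketch of how the Phi coefficient may be obtained from quantities in person space, the function below builds the four cells of the 2 × 2 table from the squared lengths of the two item vectors and their dot product, with $k_{00}$ taken as the number of persons answering both items incorrectly. The item vectors and the function name are hypothetical.

```python
import math

def phi_from_person_space(Xs, Xt):
    """Phi coefficient of two binary item vectors, using only squared
    lengths, the dot product, and N."""
    N = len(Xs)
    a = sum(v * v for v in Xt)               # ||X_t||^2, persons passing item t
    b = sum(v * v for v in Xs)               # ||X_s||^2, persons passing item s
    c = sum(u * v for u, v in zip(Xs, Xt))   # dot product, persons passing both
    k11, k10, k01 = c, a - c, b - c
    k00 = N - a - b + c                      # persons failing both items
    # The denominator is zero if either item is passed by all or by none.
    return (k11 * k00 - k10 * k01) / math.sqrt(a * b * (N - a) * (N - b))

# Hypothetical responses of N = 8 persons to items s and t.
Xs = [1, 1, 1, 1, 0, 0, 1, 0]
Xt = [1, 1, 0, 1, 0, 1, 1, 0]
phi = phi_from_person_space(Xs, Xt)
print(phi)
```

Algebraically the numerator $k_{11} k_{00} - k_{10} k_{01}$ reduces to N times the dot product minus the product of the two squared lengths, which gives a quick check on the computation.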

Item-total correlations

Point-biserial correlation coefficient $(r_{p b})$ is the proper statistic to estimate the degree of relationship between score of an item (dichotomous variable) and test score (interval/ratio scale) and is defined as $r_{p b} = \frac{(M_{p} - M_{q}) \sqrt{p q}}{S_{X}}$ , where $r_{p b}$ denotes point-biserial correlation coefficient, $M_{p}$ denotes test mean for persons answering the item correctly (i.e. those coded as 1s), $M_{q}$ denotes test mean for persons answering the item incorrectly (i.e. those coded as 0s), $S_{X}$ denotes SD of the test scores, p denotes proportion of persons answering correctly (i.e. those coded as 1s), and q denotes proportion of persons answering incorrectly (i.e. those coded as 0s).

Let $X_{s}$ be the vector showing scores of the sth item in the N-dimensional person space. If K-persons could answer the sth item correctly, then ${‖ X_{s} ‖}^{2} = K$ .

Now $X^{⊺} X_{s} = \sum_{i = 1}^{N} X_{i} X_{i s}$ = sum of test scores for persons answering the sth item correctly. The sum of test scores for persons answering the sth item incorrectly = sum of test scores – $\sum X_{i} X_{i s} = \sqrt{N} ‖ X ‖ Cos θ_{X} - \sum X_{i} X_{i s}$ (using equation (2)).

Here

M_{p s} = \text{whole-test mean for persons answering the } s \text{th item correctly} = \frac{\sum X_{i} X_{i s}}{K}

M_{q s} = \text{whole-test mean for persons answering the } s \text{th item incorrectly} = \frac{\sqrt{N} ‖ X ‖ Cos θ_{X} - \sum X_{i} X_{i s}}{N - K}

S_{X} = \text{SD of the whole test} = \frac{‖ X ‖ Sin θ_{X}}{\sqrt{N}}

p_{s} = \text{proportion of persons answering the } s \text{th item correctly} = \frac{K}{N}

q_{s} = \text{proportion of persons answering the } s \text{th item incorrectly} = \frac{N - K}{N}

Thus, point-biserial correlation between the sth item $(r_{p b (s)})$ and test score may be obtained by

\begin{array}{l} r_{p b (s)} = \frac{(\frac{\sum X_{i} X_{i s}}{K} - \frac{\sqrt{N} ‖ X ‖ Cos θ_{X} - \sum X_{i} X_{i s}}{N - K}) \frac{1}{N} \sqrt{K (N - K)}}{\frac{‖ X ‖ Sin θ_{X}}{\sqrt{N}}} \\ = \frac{(\frac{\sum X_{i} X_{i s}}{K} - \frac{\sqrt{N} ‖ X ‖ Cos θ_{X} - \sum X_{i} X_{i s}}{N - K}) \sqrt{\frac{K (N - K)}{N}}}{‖ X ‖ Sin θ_{X}} \end{array}

(32)

This may be simplified further as

r_{p b (s)} = \frac{(\frac{X^{⊺} X_{s}}{K} - \frac{\sum X_{i} - X^{⊺} X_{s}}{N - K}) \sqrt{\frac{K (N - K)}{N}}}{\sqrt{N} S_{X}}

(33)
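Equation (33) can be sketched in code as follows. The total score vector and the item vector are hypothetical, and the function name `point_biserial` is an assumption of this example; the result can be compared with the ordinary product-moment correlation between item score and total score.

```python
import math

def point_biserial(X, Xs):
    """Point-biserial correlation between total scores X and a binary
    item vector Xs, following equation (33)."""
    N = len(X)
    K = sum(Xs)                                   # persons passing the item
    dot = sum(x * s for x, s in zip(X, Xs))       # X'X_s
    mean = sum(X) / N
    S_X = math.sqrt(sum(x * x for x in X) / N - mean ** 2)
    M_p = dot / K                                 # mean of those passing
    M_q = (sum(X) - dot) / (N - K)                # mean of those failing
    return (M_p - M_q) * math.sqrt(K * (N - K) / N) / (math.sqrt(N) * S_X)

# Hypothetical data: total scores of 5 persons and their scores on one item.
X = [3, 1, 4, 2, 2]
Xs = [1, 0, 1, 0, 1]
r_pb = point_biserial(X, Xs)
print(r_pb)
```

The function is not defined when K = 0 or K = N, or when all total scores are equal, because a denominator vanishes in those cases.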

Item reliability

In addition to item difficulty and item discrimination, the concept of item reliability is also used in item statistics. Item reliability is defined as a product of SD of item scores and a correlational discrimination index, usually item-total correlation. Thus, reliability of the ith item could be taken as

S_{i} r_{p b (i)} = \frac{\sqrt{N} (M_{p} - M_{q}) p q}{S_{X}} = \frac{K (N - K) (M_{p} - M_{q})}{N \sqrt{N} S_{X}}

(34)

where $S_{i} = \sqrt{N p q}$ denotes SD of the score of ith item, $r_{p b (i)}$ denotes point-biserial correlation coefficient of the ith item, and K is the number of persons answering the ith item correctly, so that $p = K / N$ and $q = (N - K) / N$ . However, $Cos θ_{i X}$ , where $θ_{i X}$ is the angle between the score vector of the ith item and the total score vector of the test X, provides a better measure of reliability of the ith item.

Higher values of item reliability are desirable. If two items have equal item discriminating value, then the item with higher variance is preferred to be retained.

Likert-type test

Background

Suppose n-respondents have answered each of the m-items of a Likert-type questionnaire, where each item has k-numbers of response categories, marked as $1, 2, \dots, k$ . Let $X_{i j}$ be a general element of the basic data matrix where n-individuals are in rows and m-items are in columns. $X_{i j}$ represents score of the ith individual for the jth item, where $1 ⩽ X_{i j} ⩽ k$ .

Note $\sum_{i = 1}^{n} X_{i j}$ = Sum of scores of all individuals for the jth item (item score for the jth item); $\sum_{j = 1}^{m} X_{i j}$ = Sum of scores of all the items for ith individual, that is, total score of the ith individual (individual score); and $\sum_{i = 1}^{n} \sum_{j = 1}^{m} X_{i j}$ = Sum of scores of all the individuals on all the items, that is, total test score. In addition, one can have another matrix $((f_{i j}))$ of order m × k showing the frequency with which the jth response category was chosen for the ith item. Each row total equals the sample size (n). Similarly, a column total indicates the total number of times that response category was chosen by all the respondents. Denote the column total of the jth response category by $f_{0 j}$ for $j = 1, 2, \dots, k$ . Clearly, $\sum_{j = 1}^{k} f_{0 j} = \text{grand total} = (\text{sample size} \times \text{number of items}) = m n$ .

After administration of the questionnaire to n-respondents, we can calculate empirical probabilities of each response category of the ith item as $P_{i} = {(p_{i 1}, p_{i 2}, \dots, p_{i k})}^{T} .$ Note that $\sum_{j = 1}^{k} p_{i j} = \sum_{j = 1}^{k} p_{0 j} = 1$ . The ith item will fail to discriminate or it will have zero discriminating value if all respondents choose the jth response category, that is, $p_{i j} = 1$ or if $p_{i j} = \frac{1}{k} \forall j = 1, 2, 3, \dots, k$ . Discriminating value of the ith item may be found as a measure of dissimilarity between the two vectors $P_{i}$ and $P_{0}$ , where $P_{0}$ represents the vector of equal probabilities. Similarly, for the scale, the vector showing empirical probabilities will be $T = {(\frac{f_{01}}{m n}, \frac{f_{02}}{m n}, \dots, \frac{f_{0 k}}{m n})}^{T}$ . Clearly, $\sum_{j = 1}^{k} \frac{f_{0 j}}{m n} = 1$ . Discriminating value of the scale may be found as the dissimilarity between the vectors T and $P_{0}$ .

Reliability of an item can be measured as cosine similarity between $P_{i}$ and T. While reliability is a measure of association of two vectors, discrimination is a measure of dissimilarities between the vectors. Thus, a negative relationship is expected between reliability and discriminating value. Computation of the vectors $P_{i}$ (one for each item), $P_{0}$ , and T sets the stage for obtaining discriminating values and reliability of Likert-type scale and items from the data.

Measure for reliability of Likert-type test

Chakrabartty (2018) proposed two nonparametric methods of finding reliability of Likert-type scale and items considering only the frequency or proportion for each cell of the item-response category matrix without involving any assumptions of continuous nature or linearity or normality for the observed variables, or the underlying variable being measured.

Cosine similarity between ith and jth item is

Cos θ_{i j} = \frac{P_{i}^{T} P_{j}}{‖ P_{i} ‖ ‖ P_{j} ‖}

(35)

where $θ_{i j}$ is the angle between the vectors $P_{i}$ and $P_{j}$ .

Item reliability in terms of correlation between the ith item score and total score can be obtained by

Cos θ_{i T} = \frac{P_{i}^{T} T}{‖ P_{i} ‖ ‖ T ‖}

(36)

where $θ_{i T}$ is the angle between the vectors $P_{i}$ and T.

Test reliability with m-items was found by replacing the polychoric correlation between items i and j by $Cos θ_{i j}$

r_{t t} = \frac{m}{m - 1} (1 - \frac{m}{m + \sum \sum_{i \neq j} Cos θ_{i j}})

(37)
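The cosine-based quantities in equations (35) to (37) can be sketched from an item-by-category frequency matrix. The frequencies below are hypothetical (m = 3 items, k = 5 categories, n = 10 respondents), and the function names are assumptions of this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal dimension (equation (35))."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def likert_reliability(freq):
    """Equation (37) from an m x k frequency matrix, where freq[i][j] is the
    number of respondents choosing category j for item i."""
    m = len(freq)
    n = sum(freq[0])                       # each row total equals the sample size
    P = [[f / n for f in row] for row in freq]
    off_diagonal = sum(cosine(P[i], P[j])
                       for i in range(m) for j in range(m) if i != j)
    return (m / (m - 1)) * (1.0 - m / (m + off_diagonal))

def item_total_cosines(freq):
    """Equation (36): cosine between each item vector P_i and the scale vector T."""
    m = len(freq)
    n = sum(freq[0])
    k = len(freq[0])
    T = [sum(freq[i][j] for i in range(m)) / (m * n) for j in range(k)]
    return [cosine([f / n for f in row], T) for row in freq]

# Hypothetical frequency matrix.
freq = [
    [1, 2, 3, 3, 1],
    [0, 2, 4, 3, 1],
    [2, 2, 2, 3, 1],
]
r_tt = likert_reliability(freq)
item_cos = item_total_cosines(freq)
print(r_tt, item_cos)
```

Only cell frequencies are used, so no distributional assumption enters the computation. When all items have identical response distributions every cosine equals one and the expression in equation (37) reaches its maximum of one.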

For Bhattacharyya’s similarity, each vector $P_{i}$ and $P_{j}$ was converted to unit vector $π_{i}$ and $π_{j}$ , where $π_{i} = \sqrt{P_{i} / ‖ P_{i} ‖}$ and $π_{j} = \sqrt{P_{j} / ‖ P_{j} ‖}$ so that ${‖ π_{i} ‖}^{2} = {‖ π_{j} ‖}^{2} = 1$ . Here, item reliability in terms of item-test correlation was computed from

ρ (π_{i}, \sqrt{T}) = \sum_{j = 1}^{k} \frac{f_{i j}}{m n} p_{i j}

(38)

and test reliability was taken as

Cos (\bar{\phi}) = Cos ({Cot}^{- 1} \frac{\sum Cos \phi_{i}}{\sum Sin \phi_{i}})

(39)

where $\bar{\phi}$ denotes the mean or most preferred direction among the angles $\phi_{1}, \phi_{2}, \phi_{3}, \dots, \phi_{k}$ between unit vectors.

Each method makes it possible to find reliability irrespective of the distribution of the observed or underlying variables, and avoids the test of unidimensionality or the assumption of normality for Cronbach’s alpha or bivariate normality for polychoric correlations. Bhattacharyya’s similarity helps to find the range of reliability of the entire Likert-type scale.

Discriminating value of Likert-type item and Likert-type scale

The concept of CV as a measure of discriminating value of a binary item can be extended to find discriminating value of a Likert-type item with multi-response alternatives and define $D_{i} = S D_{i} / {Mean}_{i}$ for the ith item. Discriminating value of the test is defined accordingly, that is, $D_{T e s t} = \frac{S D_{T e s t}}{{Mean}_{T e s t}}$ . Note that mean of the ith item is $\frac{1}{n} \sum X_{i j} = \frac{f_{0 i}}{n}$ and variance of the ith item is $n {‖ P_{i} ‖}^{2} - {(\frac{f_{0 i}}{n})}^{2}$ .

If $θ_{X}$ is the angle between the test score vector X and the maximum possible vector I (each component equal to k for a k-point scale), it can be proved that

{CV}_{T e s t} = \frac{S D_{T e s t}}{{Mean}_{T e s t}} = \frac{‖ X ‖ Sin θ_{X}}{‖ X ‖ Cos θ_{X}} = Tan θ_{X}

(40)
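
Equation (40) can be checked numerically. The sketch below assumes that SD is the population standard deviation (divisor n) and uses hypothetical test scores; the function name `cv_and_tan` is introduced only for this illustration.

```python
import numpy as np

def cv_and_tan(x, k=5):
    """Return (SD/Mean, Tan θ_X) for score vector x, where θ_X is the angle
    between x and the vector whose components all equal k."""
    x = np.asarray(x, dtype=float)
    ref = np.full_like(x, k)  # maximum possible vector I
    cos_t = x @ ref / (np.linalg.norm(x) * np.linalg.norm(ref))
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0))
    cv = x.std() / x.mean()  # np.std uses divisor n by default (population SD)
    return float(cv), float(np.tan(theta))

# Hypothetical total scores of five persons
cv, tan_theta = cv_and_tan([12, 18, 25, 9, 20])
print(cv, tan_theta)
```

The two values agree because $‖ X ‖ Cos θ_{X} = \sqrt{n} \, Mean$ and $‖ X ‖ Sin θ_{X} = \sqrt{n} \, SD$ when SD is computed with divisor n. The angle does not depend on k, since rescaling I leaves its direction unchanged.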

CV is a well-known measure of relative precision that is independent of change of scale but not of origin. Note the following properties of $S D / Mean$:

it is non-negative if response categories are marked with positive numbers;

it is defined even if $p_{i j} = 0$ for a particular response category of the ith item;

$CV = 0$ ⇔ either $p_{i j} = \frac{1}{k} \forall j = 1, 2, \dots, k$ or $p_{i j} = 1$ for a particular jth category;

$D_{i} = D_{j}$ ⇔ $P_{i} = P_{j}$;

it is possible to express $D_{T e s t}$ as a function of the $D_{i}$'s, since ${Mean}_{T e s t}$ and $S D_{T e s t}$ are the combined mean and combined SD, respectively;

the estimate of the population CV is $σ / μ$ , where $μ$ and $σ$ are estimates of the population mean and SD, respectively;

statistical inferences for the CV for normally distributed data are often based on McKay’s $χ^{2}$ approximation for the CV (Forkman, 2013).
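
The discriminating value $D_{i} = S D_{i} / {Mean}_{i}$ of a single Likert-type item can be computed from its category frequencies alone. The following is a minimal sketch, assuming that the k response categories are scored 1, 2, ..., k and that SD is the population standard deviation; the function name `item_cv` and the frequencies are hypothetical.

```python
import numpy as np

def item_cv(freq):
    """SD/Mean of a Likert-type item from its category frequencies,
    with categories scored 1..k (an assumption of this sketch)."""
    freq = np.asarray(freq, dtype=float)
    p = freq / freq.sum()                   # category proportions p_ij
    values = np.arange(1, len(freq) + 1)    # assumed category scores 1..k
    mean = float(p @ values)
    var = float(p @ values**2) - mean**2
    return float(np.sqrt(max(var, 0.0)) / mean)  # guard against tiny negatives

# All responses in one category: no discrimination
print(item_cv([0, 0, 10, 0, 0]))
# Responses split between the two extreme categories; empty categories are allowed
print(item_cv([5, 0, 0, 0, 5]))
```

The second call shows that the measure stays defined when some categories receive no responses.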

Findings and conclusion

Through N-dimensional person space, measures of test parameters and item statistics including difficulty value, discriminating value of test, Phi coefficient to depict correlations between a pair of items, and point-biserial correlation to depict item-total correlations with binary items were presented using angular similarity between two vectors and considering the entire data. Non-linear relationships between difficulty value and discriminating value of items and test were derived, including relationship between test reliability and test discriminating value.

Dichotomization of the test into parallel halves helps to find the error variance, the true score variance, and the reliability of a test as per the theoretical definition from a single administration, in terms of the lengths of the score vectors of the two parallel tests and the angle between these vectors. The method was extended to find the reliability of a battery of tests.

Reliability and discriminating value of a Likert-type scale and Likert-type item were found in terms of angular similarity using only the permissible operations for a Likert-type scale, that is, frequencies or probabilities of item-response categories, without involving assumptions of continuous nature, linearity, or normality for the observed variables or for the underlying variable being measured. Thus, such reliabilities and discriminating values are in fact nonparametric. The proposed methods also avoid the test of unidimensionality and the assumptions of normality or bivariate normality associated with polychoric correlations. Thus, the proposed methods are considered an improvement over the existing ones.

Reliability as a measure of association of two vectors and discrimination as a measure of distance between the vectors are likely to show a negative relationship. Further studies may be undertaken to facilitate comparison of the proposed methods.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Satyendra Nath Chakrabartty

Author biography

Satyendra Nath Chakrabartty is an M.Stat. from Indian Statistical Institute. He was a research scholar at Psychometric Research and Service Unit of Indian Statistical Institute. He has taught postgraduate courses at Indian Statistical Institute, University of Calcutta, Galgotias Business School, and so on. He has 50 publications to his credit, including presentations in national and international seminars. After serving Kolkata Port Trust for 25 years in various managerial positions, he joined Mumbai Port Trust as director (Planning and Research) and subsequently took over as director, Indian Institute of Port Management, and retired from the position of director, Kolkata Campus of the Indian Maritime University. Currently, he is associated with Indian Ports Association, New Delhi, as a consultant.

References

Chakrabartty (2018) Cosine similarity approaches to reliability of Likert scale and items. Romanian Journal of Psychological Studies 6(1): 3–16.

Chong, Sandra, David, et al. (2002) Teaching factor analysis in terms of variable space and subject space using multimedia visualisation. Journal of Statistics Education. Epub ahead of print 1 December. DOI: 10.1080/10691898.2002.11910546.

Forkman (2013) Estimator and tests for common coefficients of variation in normal distributions. Communications in Statistics 38(2): 233–251.

Glass, Collins (1970) Geometric proof of the restriction on the possible values of rXY when rXZ and rYZ are fixed. Educational and Psychological Measurement 30: 37–39.

Huberty (1994) Why multivariate analyses? Educational and Psychological Measurement 54: 620–627.

Reyment, Joreskog (1993) Applied Factor Analysis in the Natural Sciences. Cambridge: Cambridge University Press.
