Abstract
This study investigated the effectiveness of the Mantel-Haenszel (MH) statistic in detecting test items that exhibit differential item functioning (DIF) when the internal criterion was varied. Item response sets from a sample of 1,000 Anglo-American and 1,000 Native American examinees, drawn from a statewide administration of a life skills examination, were analyzed. The MH procedure was first applied to all the items. The items were then categorized as belonging to one or more of four subtests based on the skills or knowledge needed to select the correct response. Each subtest was then analyzed as a separate test using the MH procedure. Three control subtests were also formed by random assignment of test items and were analyzed using the MH procedure. The results revealed that the choice of criterion, total test score versus subtest score, had a substantial influence on whether items were classified as differentially functioning for the Anglo-American and Native American groups. Evidence for the convergence of judgmental and statistical procedures was found in the unusually high proportion of DIF items within one of the subtest classifications and in the results of the reanalysis of this group of items.
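To make the procedure concrete, the following is a minimal Python sketch, not the authors' implementation, of the Mantel-Haenszel common odds ratio and continuity-corrected chi-square for a single dichotomous item. It assumes items scored 0/1, the reference group coded 0 and the focal group coded 1, and examinees stratified by their score on the matching criterion; the function name `mantel_haenszel` and its parameters are hypothetical.

```python
from collections import defaultdict


def mantel_haenszel(item, group, criterion):
    """Hypothetical helper: MH common odds ratio and MH chi-square for one item.

    item      -- 0/1 item scores, one per examinee
    group     -- 0 = reference group, 1 = focal group (assumed coding)
    criterion -- matching score per examinee (e.g. total or subtest score)
    """
    # One 2x2 table per criterion score level: [A, B, C, D] =
    # [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for x, g, s in zip(item, group, criterion):
        cell = (0 if x == 1 else 1) if g == 0 else (2 if x == 1 else 3)
        tables[s][cell] += 1

    num = den = sum_a = sum_e = sum_v = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        if n < 2:  # variance term undefined for a single examinee
            continue
        n_ref, n_foc = a + b, c + d
        m1, m0 = a + c, b + d  # totals correct / incorrect in this stratum
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_e += n_ref * m1 / n  # expected A under no DIF
        sum_v += n_ref * n_foc * m1 * m0 / (n * n * (n - 1))

    # Common odds ratio; NaN when no stratum contributes to the denominator
    alpha = num / den if den > 0 else float("nan")
    # Chi-square with the usual 0.5 continuity correction
    chi2 = (abs(sum_a - sum_e) - 0.5) ** 2 / sum_v if sum_v > 0 else 0.0
    return alpha, chi2
```

An odds ratio near 1 indicates that, within matched score levels, the two groups answer the item correctly at similar rates. Passing total test scores or subtest scores as `criterion` re-stratifies the same responses, which is the manipulation examined in this study.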
