Abstract
The application of item response theory (IRT) methodology to test equating has been a research topic of considerable interest over the past two decades. Despite the volume of research, it has been difficult to draw conclusions and make generalizations because different studies have used different types of tests, different types of samples, and different methods for assessing the accuracy of equating results. The purpose of this paper is threefold: (a) to review some of the major studies thus far and synthesize their results, (b) to discuss what questions remain unanswered and what problems exist with research methodology, and (c) to provide direction for future research. Whereas earlier research focused on comparing equating methods and IRT models, recent research has addressed such statistical concerns as standard errors of equating, parameter stability, and the robustness of IRT models to violations of their assumptions. A major finding from the research so far is that it is unreasonable to expect a single equating method to provide the best results for equating all types of tests. Future research must determine how conditions such as multidimensionality and test content affect IRT equating.