Promoting Validity in the Assessment of English Learners
