Abstract
Although students are often taught to look for keywords when solving word problems, this strategy is erroneous. It is especially problematic when students solve
In this study, we examined the performance of Grade 3 students on consistent and inconsistent word problems. Half of these students experienced mathematics difficulty (MD), and many were dual-language learners (DLLs). In this introduction, we review the research on inconsistent word problems and the use of a keyword strategy to solve word problems. Then, we describe students with MD, DLLs, and the word-problem features that might cause difficulty for these students. Finally, we present the purpose and research questions guiding our research.
Prior Research on Inconsistent Word Problems
Much of the prior research related to inconsistent word problems involved undergraduate participants (e.g., Hegarty et al., 1992, 1995; Jaffe & Bolger, 2023; Lewis & Mayer, 1987; Lubin et al., 2016). Across studies, undergraduate participants spent more time working on inconsistent problems compared with consistent problems and solved inconsistent problems with less accuracy. Hegarty et al. (1995) compared participants who successfully solved inconsistent problems with those who did not. Using eye-tracking technology, the researchers determined that participants who were unsuccessful at solving inconsistent word problems tended to fixate more on the numbers and relational terms (e.g.,
Other researchers have focused on participants in elementary and middle school (e.g., Boonen et al., 2016; Pape, 2003; Passolunghi et al., 2022; Shum & Chan, 2020). Bartalis et al. (2023) conducted a similar study to Hegarty et al. (1995) with Grade 4 participants. In this eye-tracking study, students who were less successful at solving word problems also tended to fixate more on the relational term (e.g.,
To date, few studies have examined the impact of keywords on the word-problem performance of a more diverse sample of elementary students. As such, implications for elementary classrooms, particularly in the United States, are limited. According to the National Center for Education Statistics (NCES), approximately 15% of U.S. public school students receive special education services, and approximately 11% are DLLs (NCES, 2024a, 2024b). Thus, it is crucial to extend the research on consistent versus inconsistent word problems by including a broader range of elementary students.
Keywords Instruction
Despite a breadth of research related to how difficult it is for students to solve inconsistent word problems such as those in which
However, not all word problems have consistent language. Powell, Namkung, and Lin (2022) analyzed 214 word problems from high-stakes tests in the United States. Among the one-step word problems, the keywords strategy would be effective less than 50% of the time. For multi-step problems, the keywords strategy would be effective less than 10% of the time. Importantly, even without instruction that ties keywords directly to operations, students can form these assumptions on their own, making inconsistent problems particularly difficult (Van Dooren et al., 2005). To strengthen the call for teachers to avoid reinforcing the ineffective keywords strategy, our study extends previous research by including students with MD.
Mathematics Difficulty
Students with MD demonstrate persistent low performance in mathematics. Students with MD may or may not have a school-identified learning disability. Often, researchers identify students as having MD if they score at or below a specific percentile (e.g., 25th) on a mathematics screener (Nelson & Powell, 2018). In this study, we categorized students as having MD if they performed at or below the 25th percentile on a word-problem screener described later in the manuscript. Students with MD often experience challenges with a variety of mathematical tasks, including fact fluency, computation, and word-problem solving (Andersson, 2008; Jordan et al., 2003; Mabbott & Bisanz, 2008). When solving word problems, students with MD often struggle to plan and effectively solve problems without explicit instruction on strategies (Powell, Doabler, et al., 2020). Another compounding factor related to word-problem proficiency is dual-language status.
Dual-Language Learners
We define DLLs as students who speak a language other than English in the home. The average percentage of DLLs enrolled in public schools, by state, is approximately 11% (NCES, 2024b). The largest percentage is in Texas, with more than 20% of students categorized as DLLs. To increase the generalizability of education research, it is essential for researchers to recruit diverse samples that are representative of today’s classrooms. Previous researchers have examined DLLs’ performance on word problems (e.g., Abedi & Lord, 2001; King & Powell, 2023; Martin & Fuchs, 2019; Powell, Urrutia, et al., 2022). For example, Abedi and Lord (2001) administered word problems to Grade 8 students, and DLLs scored significantly lower than non-DLLs. However, after revising the word problems to reduce linguistic complexity, the majority of students’ scores increased, particularly those of DLLs.
To examine the interaction between DLL status and MD status, Martin and Fuchs (2019) administered word problems to Grade 1 students in the fall and spring. In the fall, DLLs and non-DLLs with MD performed comparably. However, among students without MD, DLLs performed lower than their non-DLL peers. By spring, DLLs both with and without MD performed significantly lower than their non-DLL peers. This finding differed from that of Powell, Urrutia, et al. (2022), who administered word problems to Grade 3 students with MD, both with and without DLL status. Among these students with MD, DLL status did not result in significantly different word-problem performance.
To further examine the interaction between DLL status and MD, King and Powell (2023) analyzed the data of Grade 3 DLLs with MD. Prior to implementing a word-problem intervention, DLLs’ scores on an English-proficiency assessment in the areas of speaking, listening, reading, and writing correlated strongly with their word-problem performance. However, upon implementing the word-problem intervention, these correlations weakened, leaving only reading and writing as significant predictors of DLLs’ word-problem performance. In addition to MD and DLL status, word-problem performance can vary due to word-problem features, a topic we explore in the next section.
Word-Problem Features
Arsenault and Powell (2022) analyzed which word-problem features were the most challenging for students with and without MD. Nearly one third of their participants were DLLs. The features of interest included schema (i.e., problem type), position of the unknown, inclusion of irrelevant information, and relevant information presented in charts or graphs. Arsenault and Powell (2022) conducted their analysis using data from the same larger study (i.e., Powell et al., 2021) with which we conducted our analysis. Their sample included 692 Grade 3 students with MD and 2,149 Grade 3 students without MD. Word problems were scored as having correct or incorrect numerical answers.
First, Arsenault and Powell (2022) analyzed students’ performance on word problems according to the
Next, Arsenault and Powell (2022) analyzed students’ performance on word problems according to the location of the unknown. In one-step word problems, the unknown information may be in the initial, medial, or final position. For Total problems, the unknown information may be the total or one of the parts. For Difference problems, the greater amount, lesser amount, or difference can be unknown. For example:
The results of the Arsenault and Powell (2022) analysis demonstrated conflicting patterns of student performance among schemas and positions of the unknown, due in part to the inclusion of irrelevant information and information presented in charts or graphs. For example, previous research has suggested that Difference problems with an unknown difference typically elicit a high rate of accuracy (García et al., 2006; Powell et al., 2009). However, on such a word problem, students in Arsenault and Powell (2022) performed poorly. Importantly, this item also included irrelevant information and relevant information presented in a graph, which contributed to the difficulty. Conversely, students with and without MD had a markedly higher percentage of accuracy on the Difference problem with the greater amount unknown. Arsenault and Powell hypothesized this increase was due to the correct operation being addition, citing students’ tendency to choose addition rather than subtraction when solving word problems. However, this item also included the word
Lastly, coinciding with previous research, students with MD generally had a higher percentage of accuracy on Change problems with an unknown end amount compared with those with an unknown start or change amount (Arsenault & Powell, 2022). However, there was an outlier. A Change decrease problem with an unknown change amount had one of the highest percentages of accuracy among students with MD. The authors hypothesized this may have occurred because the item included keywords related to subtraction (i.e.,
We aimed to build on the work of Arsenault and Powell (2022) by examining an additional word-problem feature that could adversely affect word-problem performance: the inclusion of keywords that may be tied directly to operations (e.g.,
Purpose and Research Questions
The purpose of our study was to further isolate and examine the impact of the inclusion of keywords on the word-problem performance of Grade 3 students. We focused specifically on word problems that included the word
How does the inclusion of the word
How does the accuracy of constructed equations differ between students with and without MD, and students with and without DLL status?
Method
Context
We analyzed screening data collected during a randomized-controlled trial about the efficacy of a word-problem intervention (Powell et al., 2021) that had been approved by our university’s Institutional Review Board. This study was conducted in a large, urban school district in the Southwest of the United States, and we had received approval from the school district to conduct this study in their schools with Grade 3 students. Each year, for 3 years, we screened Grade 3 students from 1 of 26 elementary schools for eligibility into a study focused on efficacy of a word-problem intervention. At the time, this public school district served more than 75,000 students. On average, the district reported 55.5% of students as Hispanic, 29.6% as White, 7.1% as African American, and 7.7% as belonging to another race or ethnic category. Overall, 27.1% of students identified as DLLs, 52.4% qualified as economically disadvantaged, and 12.1% received special education services. In Cohort 1 (2015–2016), we screened 1,109 students. In Cohort 2 (2016–2017), we screened 914 students, and we screened 818 students in Cohort 3 (2017–2018).
Measure
Before describing the participants, we describe the measure of focus for this study because participants were selected based on their performance on this measure. We screened all Grade 3 students with the screening measure of
Of these four word problems with the word
The other two items were Change problems (i.e., a starting amount increases or decreases to a new amount); one was consistent, and the other was inconsistent. Consider this item:
Participants
Grade 3 students (

Procedures for Participant Inclusion.
Participants With MD
Students were designated as having MD if they answered 50% or fewer of the items correctly on an additional screener,
To be included in this analysis, students had to have (a) demographic information on record, (b) constructed an equation for all four items of interest on the
In an effort to include a wider range of students with MD, we included students who wrote equations but did not include the operational symbol if their intended operation was clear (e.g., 59 + 34 = 93). We also included students who misplaced the minuend and subtrahend in their subtraction equations (e.g., writing 26 − 85 = ? instead of 85 − 26 = ?) because we were solely interested in students’ choices in operations. Finally, we included students who made slight errors when copying the numbers from the word problem into their equations (e.g., writing 59 + 32 = ? instead of 59 + 34 = ?).
Of the 473 Grade 3 students with MD, 56 students met these inclusion criteria. Next, we selected a comparison sample of students without MD.
Participants Without MD
To form the comparison sample of students without MD, we began by coding the assessments of students who shared the same school and teacher as those in the MD sample. This included 295 students in Cohort 1, 282 students in Cohort 2, and 155 students in Cohort 3, for a total of 732 Grade 3 students without MD. Of these, 123 students without MD met our inclusion criteria (i.e., demographic information on record, constructed an equation for all four items of interest, and used more than one operation on the assessment). We randomly selected 56 of these students to form our comparison sample of students without MD, matching the year of the study, school, and teacher with those in our MD sample when possible.
Demographic Information
Table 1 presents the demographic information for the 56 students with MD and 56 students without MD. Gender and special education status were comparable. The majority of students with and without MD were Hispanic/Latine, but the non-MD sample had a slightly smaller proportion of Hispanic/Latine students and a slightly larger proportion of White students. DLL status was also comparable, as the MD sample had 37 DLLs, and the non-MD sample had 33 DLLs.
Participant Demographics for Word Problem Performance Study.
Coding and Data Analysis
We recorded students’ equations for the four items of interest and categorized them as accurate or inaccurate based on whether solving them would result in the correct answer. If a student wrote a plus sign, but clearly subtracted, we coded these students as having intended to subtract. If a student wrote a minus sign, but clearly added, we coded these as having intended to add. After the initial coding, one of the authors double-coded by indicating agreement or disagreement. This resulted in five discrepancies (98.9% agreement), which we then resolved.
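The agreement figure can be checked with a short calculation. The sketch below assumes that every equation from all 112 students on all four items was double-coded, giving 448 codes; the text does not state the denominator, so that count is an assumption rather than a reported value.

```python
# Assumption (not stated in the text): all 112 students x 4 items were double-coded.
students = 112
items = 4
discrepancies = 5  # reported in the text

total_codes = students * items
agreement = (total_codes - discrepancies) / total_codes * 100

print(f"{agreement:.1f}% agreement over {total_codes} codes")
```

Under this assumption, the result rounds to the reported 98.9%.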
We calculated the accuracy rate for each of the four items by dividing the number of students who constructed an accurate equation by the total sample. First, we calculated the accuracy rates for students with MD compared to students without MD. Then, we calculated accuracy rates for DLLs with MD, non-DLLs with MD, DLLs without MD, and non-DLLs without MD. We also calculated the percentage of students with and without MD who constructed accurate equations across all four items, and the percentage of students with and without MD who added the numbers in the word problem across all four items.
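As an illustration of this calculation, the following minimal sketch computes accuracy rates by MD status, DLL status, and item. The record layout, the field names (`md`, `dll`, `item`, `accurate`), and the four example records are hypothetical and are not the study's data.

```python
from collections import defaultdict

def accuracy_rates(records):
    """Return {(md, dll, item): percent of accurate equations}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        key = (r["md"], r["dll"], r["item"])
        total[key] += 1
        correct[key] += r["accurate"]  # True counts as 1, False as 0
    return {k: 100 * correct[k] / total[k] for k in total}

# Hypothetical coded equations, for illustration only.
records = [
    {"md": True, "dll": True, "item": "inconsistent_difference", "accurate": False},
    {"md": True, "dll": True, "item": "inconsistent_difference", "accurate": True},
    {"md": True, "dll": False, "item": "inconsistent_difference", "accurate": True},
    {"md": False, "dll": True, "item": "inconsistent_difference", "accurate": True},
]

rates = accuracy_rates(records)
for key, rate in sorted(rates.items()):
    print(key, f"{rate:.1f}%")
```

Dropping the `dll` field from the grouping key would give the coarser comparison of students with and without MD.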
Results
This analysis explored how the inclusion of the word
Percentage of Accurate Equations of Students With and Without Mathematics Difficulty (MD).
For the inconsistent Difference problem, only 42.9% of students with MD constructed an accurate equation. All of these students constructed the equation 85 – 26 = ?. Conversely, all of the students with MD who constructed an inaccurate equation added the numbers in the word problem together. Of the students without MD, 67.9% constructed an accurate equation with the majority of these students constructing the subtraction equation. Only one of the students without MD constructed the equation 26 + ? = 85. Similarly, all of the students without MD who constructed an inaccurate equation added the two numbers together.
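The accuracy criterion (an equation is accurate if solving it yields the correct answer) can be sketched in code. The tuple representation and the function names `solve` and `is_accurate` are hypothetical, and the correct answer of 59 is inferred from the accurate equations reported above rather than quoted from the item. The sketch does not model the leniency applied to reversed subtraction or miscopied numbers.

```python
def solve(a, op, b, result):
    """Solve 'a op b = result', where exactly one of a, b, result is None (the '?')."""
    if result is None:
        return a + b if op == "+" else a - b
    if op == "+":
        # '? + b = result' or 'a + ? = result'
        return result - (b if a is None else a)
    # Subtraction: '? - b = result' or 'a - ? = result'
    return result + b if a is None else a - result

def is_accurate(equation, correct_answer):
    """An equation is accurate if solving it gives the correct answer."""
    return solve(*equation) == correct_answer

CORRECT = 59  # inferred from 85 - 26, not quoted from the item

print(is_accurate((85, "-", 26, None), CORRECT))  # 85 - 26 = ?
print(is_accurate((26, "+", None, 85), CORRECT))  # 26 + ? = 85
print(is_accurate((85, "+", 26, None), CORRECT))  # 85 + 26 = ? (adding both numbers)
```

Both the subtraction equation and the missing-addend equation pass the check, whereas adding the two numbers does not.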
For the inconsistent Change problem, 39.3% of students with MD constructed an accurate equation. The majority of these students constructed the equation 34 – 19 = ?. One student constructed the equation ? + 19 = 34. Conversely, all of the students with MD who constructed an inaccurate equation added the numbers in the word problem together. Of the students without MD, 80.4% constructed an accurate equation. Of these students without MD, most constructed an accurate equation by subtracting. Seven students without MD (12.5%) constructed the equation ? + 19 = 34. Again, all of the students without MD who constructed an inaccurate equation added the two numbers together.
Overall, only 17.9% of students with MD constructed accurate equations for all four items, compared with 53.6% of students without MD. To explore our hypothesis that many students would add due to the inclusion of the word
The percentages of accurate equations for DLLs and non-DLLs, with and without MD, are displayed in Table 3. On the consistent Difference problem, percentages were similar regardless of DLL and MD status, with a range of 84.2%–95.7%. On the consistent Change problem, 86.5% of DLLs with MD constructed an accurate equation, compared with 100% of non-DLLs with MD. Similarly, 93.9% of DLLs without MD constructed an accurate equation, compared with 100% of non-DLLs without MD.
Status Comparison of Dual Language Learners (DLLs) and Non-Dual Language Learners.
The range between scores increased on the two inconsistent problems. On the inconsistent Difference problem, only 37.8% of DLLs with MD constructed an accurate equation, compared with 52.6% of non-DLLs with MD. Non-MD students performed comparably regardless of DLL status, with 66.7% of DLLs without MD constructing an accurate equation compared with 69.7% of non-DLLs without MD. On the inconsistent Change problem, only 35.1% of DLLs with MD constructed an accurate equation, compared with 47.4% of non-DLLs with MD. Similar to the inconsistent Difference problem, students without MD performed comparably regardless of DLL status, with a range of 78.2%–81.8%.
Discussion
To investigate the potential impact of keywords instruction, we analyzed the constructed equations of 112 Grade 3 students on four word problems that included the word
Furthermore, this analysis suggests that many of the participants may have been relying on the ineffective keywords strategy. On both inconsistent problems that included the word
Our analysis suggests that students with MD may be more likely to use the ineffective keywords strategy than students without MD. Evidence of this is that 39.3% of the students with MD added across all four items, compared with only 10.7% of the students without MD. In fact, we identified several students with MD who explicitly underlined or circled keywords while solving word problems. See Figure 2 for one of these students’ items of interest in which they circled the word

Circling of the Word “More” by a Student With Mathematics Difficulty.
Finally, further disaggregating accuracy rates by dual-language status demonstrated an interesting trend. Generally, non-DLLs constructed accurate equations more frequently than DLLs. This trend is the strongest among the students with MD, particularly on the inconsistent word problems. On the inconsistent Difference problem, the accuracy rate of DLLs with MD was 14.8 percentage points lower than that of non-DLLs with MD. Similarly, on the inconsistent Change problem, the accuracy rate of DLLs with MD was 12.3 percentage points lower than that of non-DLLs with MD. This indicates that among students with MD, DLLs may particularly struggle with inconsistent word problems. In fact, 16 of the 37 DLLs with MD (43.2%) added across all four items, compared with 6 of the 19 non-DLLs with MD (31.6%). This suggests that DLLs with MD may be particularly vulnerable to the ineffective keywords strategy.
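The subgroup comparisons above can be reproduced from the reported figures. The sketch below uses only values given in the text (37 DLLs among the 56 students with MD, the four reported accuracy rates, and the counts of students who added on all four items). The variable names are illustrative, and the size of the non-DLL subgroup is derived by subtraction.

```python
# Subgroup sizes within the MD sample (37 DLLs reported; non-DLLs derived as 56 - 37).
md_dll_n = 37
md_non_dll_n = 56 - md_dll_n

# Gaps between reported accuracy rates, in percentage points.
gap_difference = 52.6 - 37.8  # inconsistent Difference problem
gap_change = 47.4 - 35.1      # inconsistent Change problem

# Share of each subgroup that added on all four items.
adders_dll = 16 / md_dll_n * 100
adders_non_dll = 6 / md_non_dll_n * 100

print(f"Difference gap: {gap_difference:.1f} points")
print(f"Change gap: {gap_change:.1f} points")
print(f"Added on all items: DLL {adders_dll:.1f}%, non-DLL {adders_non_dll:.1f}%")
```

Expressing the counts as proportions matters because the subgroups differ in size: 16 of 37 and 6 of 19 are closer than the raw counts suggest.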
In summary, the results of this analysis suggest that, due to a possible reliance on the ineffective keywords strategy, students struggle with inconsistent word problems. Students with MD, particularly those who are DLLs, may especially need support in solving inconsistent problems. Next, we describe implications for assessment, implications for instruction, and future directions for research.
Implications for Assessment
Features that influence the difficulty of a word problem include schema, position of the unknown, inclusion of irrelevant information, and relevant information presented in charts and graphs (Arsenault & Powell, 2022). This analysis demonstrated that the inclusion of a keyword (e.g.,
Implications for Instruction
Crucially, these findings support prior calls for educators to avoid teaching students to associate isolated words (e.g.,
Similarly, De Koning et al. (2022) investigated students’ use of schematic diagrams when solving inconsistent word problems. In this study, students drew and labeled bar diagrams as part of their problem-solving process. The analysis showed that students who drew accurate bar diagrams were more likely to solve inconsistent word problems successfully. In a different study, De Koning et al. (2017) implemented verbal instruction, which involved making students aware of inconsistent word problems. Students were taught to pay close attention to word problems in their entirety. They were explicitly told that, at times, word problems include words that may imply the need to add when the required operation is subtraction (or vice versa). Students who received the verbal instruction improved at solving inconsistent word problems.
In summary, educators should not directly tie keywords to operations. It is important to expose students to both consistent and inconsistent word problems and explicitly teach them to be aware of the features of inconsistent word problems. To support students in solving both consistent and inconsistent word problems, educators may implement schema instruction and the use of bar diagrams. Although all students would likely benefit from these instructional strategies, this analysis suggests that they may be particularly important for students with MD, and especially those who are DLLs.
Future Directions for Research
This analysis demonstrated that both students with and without MD have more difficulty constructing accurate equations for inconsistent word problems. A much greater percentage of students with MD constructed inaccurate addition equations while solving word problems that included the word
Moreover, researchers should continue to examine the impact of schema instruction, bar models, and verbal instruction on students’ proficiency with inconsistent word problems. This research should be expanded by including DLLs, and participants in expanded grade levels and with a variety of disability statuses. Additionally, research should be conducted that involves multiple schemas and a variety of included keywords.
Limitations
Although this study supports findings from previous research on inconsistent word problems (e.g., Pape, 2003; Passolunghi et al., 2022), there are some notable limitations. First, we only analyzed four word problems that included the
Conclusion
Across the four items on the
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
This research was supported in part by the Institute of Education Sciences in the U.S. Department of Education to the University of Texas at Austin (grant no. R324A150078). The content is solely the responsibility of the authors and does not necessarily represent the official views of the U.S. Department of Education.
