Abstract
The Tower of London (TOL) is a set of problem-solving tasks that are commonly used to measure cognition. No studies have developed construct specification equations (CSEs) to mathematically quantify how the characteristics of TOL problems relate to problem difficulty. We aimed to investigate the relationship between TOL problem characteristics and problem difficulty in Veterans with mild traumatic brain injury (mTBI). For each problem, the sample averages of moves, time, and optimal moves from 77 Veterans with mTBI were used to quantify TOL problem difficulty. The problem characteristics of minimum moves, optimal paths, move choices, start position, and goal position for 29 TOL problems were linearly regressed against these quantifications of problem difficulty. Only the problem characteristic of minimum moves showed a significant correlation with all three quantifications of problem difficulty (|r| = .460 to .851). Minimum moves accounted for 71.4% and 51.2% of the adjusted variance of problem difficulty quantified by average moves and average time, respectively. A CSE depicting the relationship of the TOL problem characteristic of minimum moves to problem difficulty, as measured by average moves, was highly accurate. These findings have implications for selecting primary TOL performance variables for research studies and provide insight into creating shorter TOL versions.
Introduction
Mild traumatic brain injury (mTBI) can be the beginning of a disease process that continues years after injury, resulting in debilitating chronic TBI symptomatology (Masel & DeWitt, 2010; Whitnall et al., 2006). Some studies find that up to 30% of people who have sustained mTBI have persistent symptoms (Mercier et al., 2018; Zumstein et al., 2011). The combination of mTBI and post-traumatic stress disorder (PTSD) may contribute to enduring cognitive deficits in Veterans (Esopenko et al., 2022; Lange et al., 2022). Attention problems are reported by 50% of combat Veterans (Lew et al., 2006); attention is requisite for other cognitive processes that are vital to everyday functioning, such as memory, problem-solving, language skills, and the cognitive control of behavior. Executive dysfunction, poor attention–concentration, and memory difficulties are the most persistent disabilities faced by mTBI survivors (Belanger et al., 2007; Bogdanova & Verfaellie, 2012). The Tower of London (TOL) test is an appropriate measure in this population since it captures the constellation of cognitive abilities that are required in completing everyday problems, such as identifying a goal and holding the goal in working memory while problem-solving the path to successful completion.
The TOL is commonly used to measure executive function in TBI research (Buttner-Kunert et al., 2022; Tornas et al., 2019; Waid-Ebbs et al., 2014, 2023). Several aspects of executive function are measured using the TOL, including planning and problem-solving of a multiple-step visual-spatial problem. Shallice (1982) first presented the TOL, and over time, the test has morphed into various versions and methods of scoring (Shallice, 1982). TOL problems consist of three differently colored balls (red, green, blue) that are placed on three pegs of different heights that can only hold three balls, two balls, or one ball. Participants are asked to match a goal image by moving one ball at a time from a set starting position, as quickly as they can, using the fewest number of moves possible. This results in a variety of problems that can range in different levels of difficulty. An underlying assumption is that the ability to solve problems requiring more moves (referred to as minimum moves) represents higher abilities since these problems require greater working memory load and increase the number of opportunities for missteps (Berg et al., 2010).
A computerized version of the TOL was developed to reduce the administrative burden, improve reliability, and allow the examination of many aspects of performance (Berg & Byrd, 2002; Korkman et al., 1998). The TOL produces a multitude of performance variables including moves, time it takes to solve the problem, time to first move, excess moves, optimal moves, and rule violations (Berg & Byrd, 2002; Berg et al., 2010; Debelak et al., 2016; Kaller et al., 2012; Trakoshis et al., 2022; Unterrainer et al., 2005, 2003). The many performance variables generated by the computerized TOL offer a wide selection of variables that can be used in the determination of the hierarchy of problems based on the difficulty and construct validity of the TOL.
Traditionally, most approaches for determining construct validity are inferred from person score variation (Stenner & Smith, 1982). For example, the construct validity of a newly developed test is commonly determined by correlating the performance of a sample of individuals taking a newly developed test with their performance on an existing, well-established test. This indirect approach to determining construct validity according to Stenner and Smith (1982) is “unsatisfyingly primitive.” Thus, Stenner and Smith (1982) proposed establishing construct validity using construct specification methodology. This methodology investigates the relation between items (in the case of the TOL, “problems”) and a person’s performance or scores on these problems. The underlying premise of measurement is that a person scoring higher on an instrument shows more of the construct (e.g., problem-solving ability) than a person scoring lower on an instrument. Problems that are more difficult (e.g., problems requiring many moves to solve) than problems that are easy (e.g., problems requiring few moves to solve) are considered as requiring more of the construct (Stenner & Smith, 1982). Once this problem-score relationship is established, it can be taken a step further to be described in a construct specification equation (CSE) which explains the quantity of a construct (problem difficulty) in terms of explanatory variables (problem characteristics). Melin et al. (2021) suggest that CSEs are the highest level of construct theory examination in social science measurement (Melin et al., 2021). An example of the use of construct specification methodology in addressing construct validity is an investigation of the Knox Cube Test. The Knox Cube Test is a memory test that involves replicating the tapping pattern of four cubes in increasingly complex sequences (Stone, 2002). 
Stenner and Smith (1982) found that the task characteristics, distance covered, and number of taps explained 93% of the variance of the performance variable, task difficulty (Stenner & Smith, 1982). Initial evidence of the construct validity of the TOL has been examined using factor and regression analysis, but several limitations have been identified (Berg et al., 2010). Shallice (1982), in the original development of the TOL, graded the difficulty of problems based on the minimal number of moves to optimally solve a problem (problem characteristic referred to as minimum moves). Berg et al. (2010) suggest that minimum moves represent the construct of a working memory load, with more minimal moves requiring a greater working memory load (Berg et al., 2010). In a study of 104 college students, Berg et al. (2010) provide insight into this assumption by investigating problem characteristics that contribute to the TOL problem difficulty of 128 TOL problems with 4 to 7 minimum moves. Using factor analysis of six TOL performance variables, perfect solutions, optimal move score, extra moves, solution time, average solution move time, and first move time, these investigators generated four quantifications of problem difficulty: move efficiency, solution speed, planning speed, and overall quality (combination of the three previous performance variables). Using separate regression analyses, Berg and colleagues regressed 5 problem characteristics described below in section 2.3 (minimum moves, start position, goal position, number of paths, and move choices) against the four performance variables used to quantify problem-solving difficulty. The authors found that the problem characteristic, minimum moves, accounted for the majority of variance of problem-solving (21.6–34.3%), while goal position hierarchy, start position hierarchy, and the number of solution paths available accounted for 5.1 to 18.3% of the variance of problem-solving (Berg et al., 2010).
While the Berg et al. (2010) study provides initial evidence of the construct validity of the TOL, they did not provide a rationale for the performance measures they selected to quantify problem difficulty (e.g., no explanation of why “planning time” should logically represent problem difficulty) nor did they select problem characteristics with a clear hypothesized relationship to TOL problem difficulty (e.g., rationale for why the problem characteristic “start position” should influence problem difficulty). The present study is intended to address both issues by providing a rationale for the performance variables selected to quantify problem difficulty and by providing hypothesized relationships between the TOL problem characteristics selected and problem difficulty.
The purpose of this study is to address the limitations of previous examinations of the construct validity of the TOL by providing a rationale for selecting the performance variables that quantify problem difficulty and determine a hypothesis-based relationship between the TOL problem characteristics and problem difficulty using a CSE in a sample of Veterans with mTBI.
It was hypothesized that our model would explain a significant amount of adjusted variance in TOL problem difficulty and that a generated CSE would accurately determine the difficulty of a TOL problem using relevant problem characteristics.
Methods
Participants
Veterans were recruited using flyers posted in Veteran Health System clinics and from referrals to Speech Services for complaints of cognitive deficits. Data were collected as part of two studies that included 77 Veterans consecutively screened to participate in a cognitive intervention, who passed validity testing (score of 45 or greater on the Test of Memory Malingering; Tombaugh, 1997) and completed the TOL. Inclusion criteria were: Veterans from recent wars with an mTBI, as defined by the Veterans Affairs/Department of Defense classification, that occurred during deployment; at least 6 months since the most recent TBI; 19 to 55 years of age; and access to a home computer or smartphone with internet access. Exclusion criteria were: a history of psychiatric diagnosis sufficiently severe to have resulted in inpatient hospitalization; neurological disease unrelated to TBI; substance abuse within the past year; and current enrollment in other cognitive therapy.
Instrumentation
The computerized TOL is a computer application (version 1.1) developed by Berg (2002) based on the Shallice (1982) TOL test (Berg & Byrd, 2002; Shallice, 1982). Two pictures are shown simultaneously: a goal board that shows the final position of the balls and a move board that allows the participant to move the balls to match the goal board (Figure 1). Each picture shows three balls of different colors (red, green, and blue) arranged on three pegs. The three pegs hold three, two, and one ball, respectively. Participants are instructed to match the move board to the goal board using the fewest number of moves, as quickly as possible, within 60 s. The screen displays the minimum moves needed to solve the problem and provides feedback as to whether the problem was solved correctly.

Figure 1. Screenshot of the Computerized TOL With the Goal Board at the Top, Minimum Moves of 6 to the Right, and the Move Board at the Bottom.
Participants practiced four problems (1–4 move problems) before beginning the test. They then completed 29 problems that ranged from 4 to 7 moves selected from Berg et al. (2010).
Analysis
Data were analyzed using IBM SPSS (Version 28.0.1.0 [142]). The methodological approach for identifying problem characteristics contributing to problem difficulty and generating a CSE involved the following steps: (a) quantify problem difficulty, (b) select problem characteristics, (c) perform stepwise regressions, and (d) generate a CSE that accurately determines problem difficulty using relevant problem characteristics. In contrast to traditional regression analysis, where power is based on the number of participants, power in a CSE analysis is based on the number of problems administered. For the present study, the stepwise regressions included, at most, 2 problem characteristics (independent variables) relative to 29 problems (number of problems administered), resulting in a ratio of 1:14.5, and the final CSE included 1 problem characteristic relative to 29 problems (1:29), both of which satisfy the “one in ten rule” for the recommended ratio of independent variables to events (Harrell et al., 1984, 1996).
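The ratios of problems to independent variables quoted above can be checked with simple arithmetic. The sketch below is illustrative only: the function name `problems_per_predictor` is hypothetical, and the threshold of 10 is the "one in ten" rule of thumb cited in the text.

```python
def problems_per_predictor(n_problems: int, n_predictors: int) -> float:
    """Return the number of problems (observations) per independent variable."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return n_problems / n_predictors


N_PROBLEMS = 29      # problems administered, as stated in the text
RULE_OF_THUMB = 10   # "one in ten" rule: at least 10 observations per predictor

stepwise_ratio = problems_per_predictor(N_PROBLEMS, 2)  # at most 2 predictors
cse_ratio = problems_per_predictor(N_PROBLEMS, 1)       # final CSE: 1 predictor

for label, ratio in [("stepwise", stepwise_ratio), ("CSE", cse_ratio)]:
    print(f"{label}: 1:{ratio:g}, meets rule of thumb: {ratio >= RULE_OF_THUMB}")
```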
Quantify Problem Difficulty
Three TOL performance variables generated from the computer program were hypothesized to quantify problem difficulty: (a) moves, (b) time, and (c) optimal moves.
Select Problem Characteristics
Five TOL problem characteristics that were studied by Berg et al. (2010) were investigated: (a) minimum moves, (b) optimal paths, (c) move choices, (d) start position, and (e) goal position (Berg et al., 2010). Minimum moves are the fewest number of moves needed to solve the TOL problem. It was hypothesized that minimum moves should be related to problem difficulty (i.e., problems with fewer minimum moves should be easier to perform than problems with more minimum moves). Optimal paths are the possible directions balls may be placed to solve the problem with the minimum possible moves. It was hypothesized that a problem’s optimal paths should be related to problem difficulty (i.e., problems with four minimum moves and more optimal paths should be easier to perform than problems with four minimum moves and fewer optimal paths). Move choices are the number of positions available to place a ball for an upcoming move. It was hypothesized that problems with fewer move choices should be easier to solve than problems with more move choices. Start position and goal position are the starting and ending arrangements of the balls, with a value of one representing a single ball on each column, a value of two representing two balls on one column and one ball on a second column, and a value of three representing all three balls on one column. It was hypothesized that these three arrangements (a single ball on each column, two balls on one column and one ball on a second column, and all three balls on one column) reflect increasing difficulty for both the start position and the goal position. The problem characteristic values for each of the selected 29 TOL problems are presented in Table 1, selected from Berg (2002; Berg et al., 2010). Note that problem #12 was a duplicate of problem #1 and was therefore removed from the analysis.
Table 1. Problem Characteristics.
Perform Stepwise Regressions
Pearson’s correlations were used to quantify the level of association among the three quantifications of problem difficulty (performance variables: moves, time, and optimal moves). High correlations among the quantifications of problem difficulty would suggest that any of the three performance variables could be used to quantify problem difficulty and therefore be used as dependent variables in the regression equation. Correlations were also generated between the five problem characteristics (minimum moves, optimal paths, move choices, start position, and goal position) and the three quantifications of problem difficulty (moves, time, and optimal moves) to test for linear relationships. A separate regression equation was generated for each quantification of problem difficulty (dependent variable). For each regression equation, only problem characteristics (independent variables) that were significantly correlated with the quantification of problem difficulty were entered and retained in the stepwise regression analysis; if multiple problem characteristics were significantly correlated, the problem characteristic with the highest correlation was entered first. If multiple problem characteristics were found to be significantly related to problem difficulty, the collinearity of those problem characteristics was investigated using the variance inflation factor. We considered a variance inflation factor greater than 10 as the criterion that two variables were collinear (Belsley et al., 1980). If collinearity was found, the variable accounting for the least amount of variance was removed from the regression analysis.
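The correlation and collinearity checks described above can be sketched in a few lines. This is a minimal illustration, not the SPSS procedure used in the study: the function names are hypothetical, the data are made-up values rather than study data, and the variance inflation factor is computed with the two-predictor identity VIF = 1 / (1 − r²), where r is the correlation between the two predictors.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical problem characteristics (NOT the study data)
minimum_moves = [4, 4, 5, 5, 6, 6, 7, 7]
start_position = [1, 2, 2, 1, 1, 2, 2, 1]

vif = vif_two_predictors(minimum_moves, start_position)
VIF_CUTOFF = 10  # collinearity criterion stated in the text
print(f"VIF = {vif:.2f}, collinear: {vif > VIF_CUTOFF}")
```

With more than two predictors, each VIF would instead come from regressing that predictor on all of the others; the shortcut above applies only to the two-predictor case.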
Generate CSE
For the regression analysis that identified the largest amount of variance, a CSE was generated using the formula Y = a + bX, where Y is the problem difficulty, X is the problem characteristic, b is the slope coefficient (beta), and a is the intercept. Values for the slope and intercept were retrieved from the results of the regression analysis.
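A minimal sketch of how the intercept and slope in Y = a + bX can be estimated by ordinary least squares is given below. The function name `fit_line` is hypothetical, and the data are made-up values chosen to lie exactly on a line; they are not the study data or the study's coefficients.

```python
def fit_line(x, y):
    """Ordinary least-squares estimates (a, b) for the line Y = a + bX."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical values (NOT the study data): X = minimum moves, Y = average moves
minimum_moves = [4, 5, 6, 7]
average_moves = [5.0, 7.0, 9.0, 11.0]

a, b = fit_line(minimum_moves, average_moves)
predicted_for_6 = a + b * 6  # predicted difficulty for a 6-move problem
print(f"Y = {a:.2f} + {b:.2f}X; predicted moves for X = 6: {predicted_for_6:.2f}")
```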
Finally, differences between predicted problem difficulty and actual problem difficulty were determined by generating a predicted value from the final CSE (Melin et al., 2021; Sterner, 1996). Residuals (differences between the predicted value and the actual value) were generated and then standardized to identify problems whose predicted values most differed from actual values. Problems with an absolute standardized residual of 2 or greater were identified as outliers, since approximately 95% of normally distributed standardized residuals fall within ±2 (Pedhazur, 1997).
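The outlier screen described above can be sketched as follows. This assumes that residuals are standardized by dividing by the standard error of the estimate, which is one common convention; the function names are hypothetical and the values are made up, not the study data.

```python
import math


def standardized_residuals(actual, predicted, n_predictors=1):
    """Residuals divided by the standard error of the estimate."""
    n = len(actual)
    dof = n - n_predictors - 1  # residual degrees of freedom
    if n != len(predicted) or dof <= 0:
        raise ValueError("need matching lengths and positive degrees of freedom")
    residuals = [y - y_hat for y, y_hat in zip(actual, predicted)]
    see = math.sqrt(sum(e ** 2 for e in residuals) / dof)
    return [e / see for e in residuals]


def flag_outliers(z_scores, threshold=2.0):
    """Indices of problems whose |standardized residual| meets the threshold."""
    return [i for i, z in enumerate(z_scores) if abs(z) >= threshold]


# Hypothetical actual and predicted average moves (NOT the study data)
predicted = [5.0, 5.0, 6.0, 6.0, 7.0, 7.0, 8.0, 8.0, 9.0, 9.0]
actual = [8.0, 5.2, 5.8, 6.1, 6.9, 7.3, 7.7, 8.0, 9.1, 8.9]

z = standardized_residuals(actual, predicted)
outliers = flag_outliers(z)
print("Outlier problem indices:", outliers)
```

In this made-up example only the first problem is flagged, because its residual is far larger than the others.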
Results
Participants
A total of 77 Veterans with mTBI completed the TOL as part of outcome variables in Goal Management Training studies. The participants were on average 38 years old and experienced an average of 3 traumatic brain injuries. Approximately 94% were male, 60% were white, and 78% had post-high school education. Only 16% were employed full-time. Table 2 presents detailed demographic information for the participants.
Table 2. Demographics Information.
Note. SD = standard deviation.
Construct Specification Equation
Identify Quantifications of Problem Difficulty
A correlation matrix of the performance variables (moves, time, and optimal moves) is presented in Table 3. Correlations were in the expected direction (moves and time being positively correlated, and optimal moves negatively correlated with moves and time). Correlations among the performance variables were strong (|r| = .809 to .920).
Table 3. Correlation Across Performance Variables.
Significant at p < .01
Correlation of Problem Difficulty with Problem Characteristics
The correlations of problem characteristics (minimum moves, optimal paths, move choices, start position, and goal position) versus problem difficulty (moves, time, and optimal moves) are presented in Table 4. The problem characteristic minimum moves correlated significantly across all problem difficulties (p < .01). In addition to the problem characteristic, minimum moves, start position significantly correlated with the problem difficulty, optimal moves (p < .05). The strongest correlation was demonstrated across the problem characteristic minimum moves and the problem difficulty moves (r = .851).
Table 4. Correlations of Dependent Variables With Problem Characteristics.
Significant at p < .05.
Significant at p < .01.
Regression of Problem Characteristics (Minimum Moves and Start Position) Against the Three Problem Difficulties (Moves, Time, and Optimal Moves)
Two problem characteristics significantly correlated with problem difficulty variables. Minimum moves significantly correlated with all problem difficulty variables (moves, time, and optimal moves), while start position significantly correlated only with the optimal moves problem difficulty variable. Only the significantly correlated problem characteristics were entered into the regression analyses. The problem characteristic minimum moves accounted for 71.4% and 51.2% of the adjusted variance for the problem difficulties moves and time, respectively (Table 5). For the problem difficulty optimal moves, both problem characteristics (minimum moves and start position) were entered into the regression analysis and accounted for 34.1% of the adjusted variance (Table 5). The variance inflation factor for minimum moves and start position was 1.00 (less than 10); therefore, both problem characteristics were retained in the regression analysis for the problem difficulty optimal moves.
Table 5. Regression Results for Problem Characteristic Minimum Moves for Each Quantification of Problem Difficulty.
Note. RMSE = root mean square error; SE = standard error.
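As a consistency check, the adjusted variance reported for moves can be approximately recovered from the reported correlation between minimum moves and moves (r = .851) using the standard adjusted R-squared formula, with n = 29 problems and one predictor. The function name below is illustrative, and because r is rounded to three decimals the result is only approximate.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("not enough observations for the number of predictors")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / dof


r = 0.851          # reported correlation of minimum moves with moves
n_problems = 29    # number of problems analyzed
adj = adjusted_r_squared(r ** 2, n_problems, n_predictors=1)
print(f"Adjusted R^2 = {adj:.3f}")  # prints 0.714, matching the reported 71.4%
```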
The results of the linear regression analysis of the problem difficulty variable moves, and problem characteristic minimum moves were further investigated since they accounted for the most variance. Figure 2 presents a scatterplot of problem characteristics (minimum moves) against problem difficulty (moves) and the regression line. The scatterplot demonstrates a strong linear relationship between the problem characteristic of minimum moves and the problem difficulty of moves.

Figure 2. Scatterplot and Regression Line of the Problem Characteristic, Minimum Moves (x-axis) Against the Problem Difficulty, Average Moves for Each Problem (y-axis) for the 29 Problems of 4 to 7 Minimum Moves Investigated in This Study.
The following CSE was generated and used to predict problem difficulty:
Y = a + bX
where Y is the predicted problem difficulty (moves), X is the problem characteristic (minimum moves), and a and b are the intercept and slope obtained from the regression analysis.
Table 6 presents the problem characteristics (minimum moves), problem difficulty (actual moves), predicted problem difficulty (predicted moves), and residuals. The CSE was accurate in predicting problem difficulty (moves) from minimum moves, with only the first problem demonstrating an absolute standardized residual greater than 2.0.
Table 6. Problem Characteristics, Problem Difficulty, Predicted Difficulty, and Residuals.
Discussion
The purpose of this study was to determine the construct validity of the TOL by relating problem characteristics based on hypothesized rationale to problem difficulty and to generate a CSE for TOL problem-solving difficulty for Veterans with mTBI. The study findings support the hypothesis that a CSE derived from a linear regression model would explain a significant amount of variance. The CSE explained 71.4% of the adjusted variance in TOL problem difficulty as measured by the number of moves.
Three Performance Variables Were Strongly Correlated
In identifying performance variables to quantify problem difficulty, it was found that the three performance variables of moves, time, and optimal moves were strongly correlated (|r| = .809 to .920). Similarly, Berg et al. (2010) found that many of their performance variables were strongly correlated; for example, optimal move score, extra moves, and solution time correlated .69 to .92 (Berg et al., 2010). Berg et al. (2010) combined performance variables to create factor scores to quantify problem difficulty. There are a number of challenges to this approach. First, the preferred methodology for analyzing two or more dependent variables in a regression model would be to apply a multivariate multiple regression, rather than to combine the dependent variables factorially and run a multiple regression (Dattalo, 2013). Second, high correlations between dependent variables suggest that the dependent variables are measuring the same construct; therefore, it is logical to select only one of those variables for the multiple regression or to run separate multiple regression models for each dependent variable. We chose the latter approach. Finally, individual variables may be easier for clinicians to understand than combined variables, since the individual variables are automatically generated by the computerized version of the TOL.
Minimum Moves Correlated With all Problem Difficulty Quantifiers
Among the problem characteristics investigated (minimum moves, optimal paths, move choices, start position, and goal position), minimum moves was the only problem characteristic that significantly correlated with each quantification of problem difficulty (moves, time, and optimal moves). Furthermore, when entered into the multiple regression analyses, minimum moves accounted for 71.4% and 51.2% of the adjusted variance for problem difficulty as quantified by moves and time, respectively, and minimum moves with start position accounted for 34.1% of the adjusted variance in problem difficulty when quantified by optimal moves. Despite different samples (college students vs. Veterans with mTBI) and different quantifications of problem difficulty (factorially derived vs. individual performance variables), these findings replicate those of Berg et al. (2010), who found that minimum moves accounted for most of the variance in problem difficulty (Berg et al., 2010). The different amounts of variance accounted for between studies appear to be a function of the performance variable used to quantify problem difficulty, with results in this study showing that the problem characteristic of minimum moves accounts for the most adjusted variance (71.4%) when problem difficulty is quantified by average moves. Furthermore, the general finding that minimum moves account for the most variance in regression models when problem difficulty is quantified in samples of college students (Berg et al., 2010) and Veterans with mTBI provides initial evidence that the construct validity of the TOL may be independent of the sample, whether or not individuals have a brain injury.
Construct Specification Equation
While previous studies demonstrated that the TOL problems increase in difficulty with the number of minimum moves, the present study is the first to provide a formula quantifying the relationship of problem characteristics to problem difficulty. The accuracy of our CSE was determined by comparing actual average problem difficulty with predicted problem difficulty. Our CSE was very accurate, with an RMSE below 1 (0.82) and absolute standardized residuals below 2 for 28 of the 29 problems.
The findings from this study have implications for future versions of the TOL. The CSE generated from this study can quantify the expected difficulty of a TOL problem simply from the minimum moves of a problem. While this equation is only valid for problems with between 4 and 7 minimum moves (the range of minimum moves for the problems used in this study), it has implications for generating different TOL problem sets. The equation may allow for new, equivalent problems to be created based on minimum moves. Furthermore, equivalent forms of the TOL may be created by having different problems with the same distribution of problems with minimum moves. Also, since minimum moves are the critical problem characteristic contributing to problem difficulty, the findings from this study suggest that there is considerable redundancy in the TOL variables. The present version of the TOL used in this study had 5 problems with 4 minimum moves, 7 problems with 5 minimum moves, 9 problems with 6 minimum moves, and 9 problems with 7 minimum moves. Since minimum moves are the critical variable in determining the challenge of the TOL, shorter forms of the instrument may provide equivalent results, for example, using only 2 to 3 problems with a specific number of minimum moves. Reduction of the number of TOL problems should be effective in reducing the response burden for participants and shortening data collection time for investigators and clinicians.
Finally, findings from this study have implications for the selection of a primary TOL performance variable for research studies. While the TOL produces many performance variables, the performance variable moves showed the strongest relationship to the problem characteristic, minimum moves. That is, the performance variable moves best describes the implied challenge of the TOL problems. When applying an intervention expected to result in improvements in problem-solving ability, the selection of the performance variable that most closely reflects the varying challenge of the TOL problems seems to be the logical choice as a primary outcome variable.
Limitations
The present study has several limitations. First, the regression equations and CSE were based on only 29 problems, which limits the power of the analyses. Unfortunately, virtually all CSE studies have this limitation since most assessments have 10 to 30 items. Second, only problems with 4 to 7 minimum moves were used in the present study. With the addition of problems with a wider range of minimum moves, we may account for more or less adjusted variance in our regression analyses. The restricted range also limits the applicability of the CSE to problems with 4 to 7 minimum moves. Finally, we only investigated TOL performance in Veterans with mTBI, a sample that included few females. While our study’s findings were similar to those found for college students, the generalizability of the findings to other populations, including females, will require further investigation.
Clinical Implications
Studies involving the TOL report multiple outcome variables in determining whether an intervention results in significant problem-solving improvement. The present study suggests that research studies investigating change in problem-solving ability should report moves as the primary dependent measure, since moves is the quantification of TOL problem difficulty for which minimum moves explains the highest amount of adjusted variance. The use of a common variable will improve the potential for comparison across studies and future meta-analysis of intervention outcomes.
The present study also suggests that the minimum moves of the problem are the critical variable, independent of the sample. That is, minimum moves explain the most variance for both college students and Veterans with mTBI. Further studies are needed to determine if this relationship persists for other diagnostic groups.
Finally, due to the redundancy of problems with similar minimum moves, the present study suggests that shorter forms of the TOL (reducing the number of problems with similar minimum moves) may be adequate for measuring problem-solving. Of course, this will require further investigation.
Footnotes
Acknowledgements
This work was supported by Career Development Award #B0902-W (IK2RX000902), and Pilot Project Award #N3189-P (1I21RX003189) from the United States Department of Veterans Affairs, Rehabilitation Research and Development Service.
Special thanks to W.K. Berg for providing the computerized TOL problems and his invaluable guidance.
Abbreviations
Mild traumatic brain injury (mTBI), Tower of London (TOL), construct specification equation (CSE), Knox Cube Test (KCT).
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Career Development Award #B0902-W (IK2RX000902), and Pilot Project Award #N3189-P (1I21RX003189) from the United States Department of Veterans Affairs, Rehabilitation Research and Development Service.
Ethical Approval and Informed Consent Statements
The two studies were approved by the University of Florida IRB (approval no. 201902069 and no. 201601606) in February 2019 and January 2016. All participants gave written consent.
Data Availability
The data are available upon request.
