Abstract
Background
Learning from feedback is an important part of skill improvement. A common assumption is that failure offers superior learning opportunities when compared to success. However, this assumption remains underexplored in natural time-constrained settings.
Objective
To examine the impact of self-analysis on performance improvement in online speed chess, with a focus on whether analyzing wins versus losses differentially predicts gains in player ratings.
Methods
Drawing on observational data from roughly 2 million online speed chess games, we used regression models to estimate the association between post-game analysis and rating change.
Results
Analyzing game outcomes was positively associated with rating improvement, with stronger effects observed when players analyzed wins rather than losses. The total number of games played showed only a weak association with improvement, suggesting that repetition alone, without reflective practice, is insufficient.
Conclusions
The findings challenge the notion that failure is inherently more instructive than success, suggesting instead that analyzing successes may also offer valuable opportunities for learning. While based on observational data, this study contributes to understanding how feedback mechanisms support learning and performance, particularly in domains requiring rapid, repeated decisions under time constraints.
Background
People can benefit from information about past performances in a variety of activities, but there is evidence that some people overlook it. The ‘ostrich effect’ describes how people avoid low- or no-cost feedback (Webb et al., 2013); for example, in financial matters, people are less likely to consult information during market downturns (Olafsson & Pagel, 2017; Sicherman et al., 2016). People also avoid negative health information related to diet and alcohol consumption when it may affect their sense of identity or make them feel uncomfortable (Barbour et al., 2012). However, much of this literature focuses on information about goal progress, and on the observation that information avoidance is primarily motivated by a desire to preserve a sense of self-worth (Webb et al., 2013) or to avoid stress and discomfort (Karlsson et al., 2009).
It is widely assumed that people can learn from failure, and that the ostrich effect hinders improvement and skill development because it prevents us from learning from mistakes (Metcalfe, 2017). While this may be true in the abstract, experimental evidence suggests that people may learn less from feedback about failure than previously assumed (Eskreis-Winkler & Fishbach, 2019). The explanations appear to be both cognitive and emotional. Cognitively, failures often require more effort to convert into useful information, making them harder to process in a way that lends itself to future success. Emotionally, failure can undermine self-esteem and, as a result, be harder to process and retain (Eskreis-Winkler & Fishbach, 2022).
Early in the development of a skill, just ‘doing’ may efficiently advance learning and skill acquisition, but over time a greater investment in deliberate practice is necessary to continue the learning process (Ericsson et al., 1993). At the same time, research on deliberate practice suggests that there may be limits to what can be gained through studied reflection on performance (Macnamara et al., 2014). Understanding skill acquisition therefore seems to require understanding the balance between these processes: on the one hand, the improvement that comes from simply performing an activity; on the other, the further gains that careful deliberation on that activity (and the form that deliberation takes) can produce.
In this exploratory study, we investigate learning from information in the context of playing a game, specifically online speed chess. In recent years chess has seen considerable growth in online play (Keener, 2022). Online games typically have short time controls (for example, both players may have 5 minutes to complete all their moves in a game), and chess is an example of a sequential and cognitively demanding decision-making activity. On most online chess platforms it is possible for players to analyze their games after playing them. This can involve reviewing each move, typically supported by a chess engine that identifies the impact of each move on the game state. For example, a chess engine can identify a significant mistake (a ‘blunder’) that a player can then learn from, providing a guide for future play. Using a subset of data from an online chess platform, we ask two questions: 1) do analyses of wins have the same impact on future performance as analyses of losses? and 2) does playing games (just ‘doing’) have a greater positive impact on improvement than analyzing games (‘deliberate practice’)?
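To make this concrete, the sketch below illustrates the kind of engine-assisted review described above: it replays a game and flags moves where the engine evaluation swings sharply against the player who moved. It is a minimal sketch assuming the python-chess library and a local Stockfish binary; the input file name and the 200-centipawn blunder threshold are illustrative choices, not details of our study.

```python
import chess
import chess.engine
import chess.pgn

# Minimal sketch of engine-assisted review: replay a game and flag moves
# where the evaluation swings sharply against the player who moved.
# Assumes python-chess and a local Stockfish binary; the input file name
# and the 200-centipawn threshold are illustrative choices.
engine = chess.engine.SimpleEngine.popen_uci("stockfish")

with open("game.pgn") as f:  # hypothetical input file
    game = chess.pgn.read_game(f)

board = game.board()
prev_eval = 0  # evaluation in centipawns, from White's point of view
for move in game.mainline_moves():
    mover = board.turn  # side to move before the move is pushed
    board.push(move)
    info = engine.analyse(board, chess.engine.Limit(depth=12))
    cur_eval = info["score"].white().score(mate_score=10000)
    swing = cur_eval - prev_eval
    # A large evaluation drop for the side that just moved suggests a blunder.
    if (mover == chess.WHITE and swing <= -200) or (
        mover == chess.BLACK and swing >= 200
    ):
        print(f"Possible blunder: {move.uci()} (eval swing {swing:+d} cp)")
    prev_eval = cur_eval

engine.quit()
```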
Methods
Data Source
We used data from Lichess.org through their application programming interface (API). Lichess is an open-source online chess platform that allows people to play chess without cost, and it has been used previously to study decision-making style (McIlroy-Young et al., 2021), the impact of social status on decision-making (Tay, 2023), and differences between expert and non-expert behaviour (Chowdhary et al., 2023).
The first step involved downloading an entire month of games (May 2023). We then filtered for blitz games (3-5 minutes per player) and restricted our search to games involving players with a chess rating between 1600 and 1800 on the Glicko rating scale (Glickman, 1999). This rating is calculated based only on games played within Lichess; the range was chosen because players at this level are stronger than beginners but still have considerable room to improve. This resulted in 35,815 unique players.
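As an illustration of this filtering step, a minimal sketch (assuming the python-chess library and a decompressed monthly PGN export; the file name is hypothetical) might look as follows:

```python
import chess.pgn

# Minimal sketch of the filtering step: scan a (decompressed) monthly PGN
# export and collect players from rated blitz games where both sides are
# rated 1600-1800. Assumes python-chess; the file name is hypothetical.
players = set()
with open("lichess_db_standard_rated_2023-05.pgn") as pgn:
    while True:
        headers = chess.pgn.read_headers(pgn)  # headers only, skips movetext
        if headers is None:
            break
        if "Blitz" not in headers.get("Event", ""):
            continue
        try:
            white_elo = int(headers.get("WhiteElo", 0))
            black_elo = int(headers.get("BlackElo", 0))
        except ValueError:
            continue  # unrated or malformed ratings (e.g., "?")
        if 1600 <= white_elo <= 1800 and 1600 <= black_elo <= 1800:
            players.add(headers["White"])
            players.add(headers["Black"])
print(f"{len(players)} unique players")
```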
We then used the Lichess API to request details about all games played by these players between May 1, 2023 and May 31, 2024. To stay within the API's rate limits, instead of using all 35,815 players we used a two-stage cluster sampling approach: we first randomly sampled 6,000 players, and then sampled up to 90 games per month for each player. We also restricted our analysis to players who were active over the entire period, including only those who played at least one blitz game in every month. The final step involved cleaning the data by excluding games tagged as cheating and games that contained no moves (likely due to a dropped internet connection). To avoid data loss through rate limiting, we added a delay of 30 seconds between the extraction of each player's data. The end result is 2,137 players and 1,955,774 games. Downloading the games using the API took roughly 46 hours.
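A minimal sketch of the sampling and rate-limited download loop is shown below. It assumes the requests library and the public Lichess game-export endpoint (/api/games/user/{username}); the parameter names follow the Lichess API documentation, the timestamp bounds are approximate, and the per-month cap of 90 games is omitted for brevity:

```python
import os
import random
import time

import requests

# Minimal sketch of the two-stage sampling and rate-limited download.
# `players` is the set built in the filtering step above; the epoch-
# millisecond bounds approximate May 1, 2023 - May 31, 2024 (UTC).
random.seed(42)
sample = random.sample(sorted(players), 6000)  # stage 1: sample players
os.makedirs("games", exist_ok=True)

for username in sample:
    # Lichess game-export endpoint; parameters per the public API docs.
    resp = requests.get(
        f"https://lichess.org/api/games/user/{username}",
        params={
            "since": 1682899200000,  # 2023-05-01 00:00 UTC, in ms
            "until": 1717113600000,  # 2024-05-31 00:00 UTC, in ms
            "perfType": "blitz",
            "evals": "true",  # keep computer evaluations where present
        },
        headers={"Accept": "application/x-chess-pgn"},
        timeout=300,
    )
    with open(f"games/{username}.pgn", "w") as f:
        f.write(resp.text)
    time.sleep(30)  # 30-second pause between players to respect rate limits
```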
We extracted the Glicko rating of both players in each game, the date of play, how the game ended, who won, and whether the game was evaluated with the assistance of a chess engine. This last measure is used to determine whether a player analyzed their game, and it is admittedly blunt. First, it does not capture analyses performed without computer assistance: a player who analyzes their game unaided will be mischaracterized as a non-analyzer. Second, it does not indicate which of the two players in a game performed the analysis, only that someone did, so some players will be characterized as having analyzed a game they did not. Importantly, both issues bias our results towards the null; that is, whatever effect we observe is likely an underestimate of the true effect, because misclassification adds statistical noise to the analysis variable and weakens the estimated association between analysis and performance.
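To sketch how such a flag can be derived: PGN exports requested with evals=true carry [%eval ...] annotations in the move comments when server-side computer analysis exists for a game, and checking for that marker is one way to construct the indicator. The helper below is illustrative, and the file name is hypothetical:

```python
import chess.pgn

def game_was_analyzed(game: chess.pgn.Game) -> bool:
    """Illustrative check: exports requested with evals=true carry
    [%eval ...] annotations in the move comments when server-side
    computer analysis exists for the game."""
    return any("%eval" in node.comment for node in game.mainline())

rows = []
with open("games/example_user.pgn") as f:  # hypothetical file
    while (game := chess.pgn.read_game(f)) is not None:
        rows.append({
            "white": game.headers.get("White"),
            "black": game.headers.get("Black"),
            "white_elo": game.headers.get("WhiteElo"),
            "black_elo": game.headers.get("BlackElo"),
            "date": game.headers.get("UTCDate"),
            "result": game.headers.get("Result"),  # "1-0", "0-1", "1/2-1/2"
            "termination": game.headers.get("Termination"),
            "analyzed": game_was_analyzed(game),
        })
```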
Statistical Analysis
We first explore the games as a time series, observing changes in rating over the series for each player. In addition, we predict each player's change in chess rating (delta rating) from their first to their last game as a function of their rating at the beginning of the series, the total number of games played, the number of wins analyzed, the number of losses analyzed, and the percentage of games won. This last variable controls for the potential confounding influence of success, which could correlate with both rating change and analysis behaviour. We include all main effects in the models, but include interaction and polynomial terms only if they are statistically significant at the α=0.05 level.
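A minimal sketch of these models, assuming a per-player summary table with hypothetical column names (delta_rating, initial_rating, games_played, wins_analyzed, losses_analyzed, win_pct), might look as follows; the quadratic term shown is an example of the kind of polynomial term retained only if significant:

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per player; file and column names here are hypothetical.
players_df = pd.read_csv("players_summary.csv")

# Main-effects model.
main = smf.ols(
    "delta_rating ~ initial_rating + games_played"
    " + wins_analyzed + losses_analyzed + win_pct",
    data=players_df,
).fit()
print(main.summary())

# Example of screening a polynomial term: keep it only if p < 0.05.
quad = smf.ols(
    "delta_rating ~ initial_rating + games_played + wins_analyzed"
    " + I(wins_analyzed ** 2) + losses_analyzed + win_pct",
    data=players_df,
).fit()
if quad.pvalues["I(wins_analyzed ** 2)"] < 0.05:
    print("Quadratic term retained")
```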
Results
Players played a median of 982 games (minimum 140, maximum 1,170). Of the 1,955,774 games played, 119,398 (6.105%) were flagged as analyzed with a computer (henceforth referred to as ‘analyzed’). On the chess server, a game can end in a draw or stalemate (both tied outcomes), or in a checkmate, loss on time, or resignation, in which one player wins and the other loses. A loss on time occurs when a player runs out of time on their clock. Analysis occurred in 5.286% of draws/stalemates, 6.483% of losses, and 5.793% of wins.
On average, players gained 3.992 rating points over the 13 months, with a maximum gain of 569 rating points and a maximum loss of 407 rating points. Each player's performance can be represented as a time series over the 13 months for which data were extracted. Figure 1 shows this pattern for a random sample of 1,000 players, with each thin line corresponding to a player's daily rating. The thicker horizontal lines represent +/- one standard deviation calculated for each day in the series and show that, other than in the first few weeks, most players stay within a narrow rating band; over the course of a year, most player improvement is modest. There are exceptions, however, with some players seeing rapid rises or falls over time.

[Figure: The relationship between rating change and the number of games played]
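A sketch of how a figure like Figure 1 can be constructed is shown below, assuming a long-format table of daily ratings with hypothetical columns player, date, and rating:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sketch of the Figure 1 construction; assumes a long-format table with
# hypothetical columns player, date, rating (one row per player-day).
ratings = pd.read_csv("daily_ratings.csv", parse_dates=["date"])

# Thin lines: daily ratings for a random sample of 1,000 players.
sample_ids = ratings["player"].drop_duplicates().sample(1000, random_state=1)
sample = ratings[ratings["player"].isin(sample_ids)]

fig, ax = plt.subplots()
for _, series in sample.groupby("player"):
    ax.plot(series["date"], series["rating"], lw=0.2, alpha=0.3)

# Thicker lines: mean +/- one standard deviation of rating for each day.
daily = ratings.groupby("date")["rating"].agg(["mean", "std"])
ax.plot(daily.index, daily["mean"] + daily["std"], lw=2)
ax.plot(daily.index, daily["mean"] - daily["std"], lw=2)
ax.set_xlabel("Date")
ax.set_ylabel("Glicko rating")
plt.show()
```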
[Table 1: Main effects models for analyses of wins and losses separately]

[Table: Final model]

[Figure: Model-predicted rating by games played and percentage of games analyzed]
Discussion
Online speed chess is a game comprising dozens of quick sequential decisions, each of which affects whether the result will be favourable (a win) or unfavourable (a loss). If analysis benefits performance, it would be expected to do so by gradually improving the decisions players make over time or by speeding up the rate at which they make them. In this study we have examined the impact of both 1) deliberate practice (as analysis of wins and losses) and 2) playing games on chess rating improvement over time. The findings may speak more broadly to how self-analysis could improve decision-making and skill development in general, but this is largely speculative. The overall effects we observed are modest, and much of the variation in performance over the year seems to be explained by factors exogenous to our analysis, including playing other games, mentorship, the use of chess coaches, reading chess books and, perhaps most significantly, the vagaries of online game playing. For this reason, the results should not be read as a guide to improving chess play.
Concerning research question 1, our results are consistent with the findings of Eskreis-Winkler and Fishbach (2019) suggesting that people may learn less from feedback about failures than from feedback about successes. In speed chess, the effect does not appear to be linear; reviewing more than roughly 20% of one's games may provide little additional rating benefit. This is also consistent with previous work suggesting that there are limits to the benefits of deliberate practice (Macnamara et al., 2014). Concerning research question 2, our results suggest that playing chess may contribute less to rating improvement than analyzing chess games. Indeed, the difference in magnitude is noteworthy: based on the models in Table 1, for every 10 games played, a player can expect a 1- to 2-point increase in rating, while for every 10 games analyzed, a player could expect an 11- to 13-point increase. This may be a self-selection effect; players who choose to analyze games may be more interested in improving than players with no expectation of improvement.
Our work is based on secondary observational rather than experimental data, but the effects we have observed are probably an underestimate of the true effect size, since there is very likely considerable variation in the degree to which players actually review their games. As noted earlier, the analysis flag does not indicate which of the two players in a game performed the analysis, or the care with which either player reviewed the game, merely that analysis was requested by one of the two players. This means we are very likely overestimating the frequency of deliberate practice. Moreover, we do not know how rigorous the analysis is; on average, players may spend mere seconds reviewing a game, glancing at major mistakes or the overall trend rather than diving deeply into their play. As such, our analysis may only measure the effects of cursory review, while the effects of deeper analysis could be larger still.
Our results add to the existing literature on how self-analysis can contribute to improved decision-making, particularly in time-restricted, sequential decision-making contexts. Outside of chess, our results best generalize to domains in which fast sequential decisions are required. In the real world, rapid sequential decision-making is demanded of airplane pilots, air traffic controllers, emergency service personnel, financiers, military personnel and medical professionals. Training in these professions often includes simulation followed by reflection on case studies and scenarios to help learners understand the impact of right and wrong decisions on outcomes (Mavin et al., 2018; McKenna et al., 2015). Our results suggest that it may be especially important for learners to reflect on their successful decision sequences, and that such reflection may help reinforce good patterns that advance learning. This does not mean they cannot learn from failures, but that learning from successes could be an untapped opportunity for improvement.
Chess has outcomes that are easy to evaluate. A chess learner can see which moves lead to clear positional weaknesses and tactical opportunities by looking at computer evaluations and game results. In the real world, many activities do not come with objectively measurable success and failure, especially in the short term. This affects the way that success and failure can be analyzed; for a medical student training to be a family physician, there are often no simple metrics that provide clear and immediate feedback on every decision. As such, it may be that our results apply more to learning activities in which success and failure can be measured objectively and quickly. For example, learning from success may be more useful to explore for an investment manager, whose success can be quantified in the performance of their portfolio, or for an athlete who can see how their decisions affect the outcome of a game.
Limitations and Suggestions for Future Research
As noted above, the measure of analysis we use does not precisely capture the level of self-analysis. This very likely biases our results towards the null, since our measurement is contaminated by misclassification. However, we cannot rule out other systematic biases. Because this is an observational study, there is also a risk of omitted-variable confounding: analyzing games may be confounded with a trait that explains one's proficiency at improving at chess, rather than reflecting the effect of analysis itself. In that case, the effect we observe would not measure self-analysis but the level of this latent trait; people with more of this latent proficiency analyze more games and improve more over time. Another limitation is that ratings do not give a complete picture of improvement. Players in this rating range play opponents with similar ratings, and the rating points gained by one player are lost by another. Indeed, all players may be improving over time by some unknown absolute measure, but this cannot be detected by a rating system based on performance against other players. This helps explain the time-series pattern showing little net change for players on average.
Future work should explore the causes of the observed effects experimentally; for example, whether reflecting on successes increases confidence in a useful way, or whether successes are easier to process cognitively. Future work should also examine the specifics of analysis in more detail, ideally identifying the depth of analysis a person applies when reflecting on past decisions.
Conclusion
The results here suggest that reflecting on positive feedback may be a useful strategy for improvement in activities that require sequential decisions under time pressure. For such activities, it may be useful to develop more tools that help learners study their successes. Our results also show that playing speed chess by itself does not lead to timely improvement in player rating, consistent with literature suggesting that deliberate practice is an important part of learning and skill improvement.
Footnotes
Acknowledgement
We would like to thank the anonymous reviewers for their helpful suggestions on the manuscript.
Informed Consent
Informed consent was not required. The research used open data available online and did not collect primary data.
Ethical Considerations
No ethics approval was required or sought in this research.
Funding
No funding supported this research.
Declaration of Conflicting Interests
I confirm that this manuscript has not been published elsewhere and is not under consideration by another journal. The work is original, and as the sole author, I have approved the manuscript and its submission. There are no conflicts of interest to declare.
