Abstract
This study investigates how competition participation during youth relates to dropout risk in competitive swimming, while also examining whether participation variables can predict key sporting success outcomes such as: (a) effective career length; (b) competitive activity between ages 20 and 26; (c) number of World Championships and Olympic Games appearances; and (d) highest status achieved at those meets (medallist, non-medallist or non-participant). Using data from 2154 swimmers (F = 1197; M = 957) obtained from the World Aquatics database spanning 1980 to 2019, athletes aged 13 to 32 years were analysed. Dropout was defined as a consecutive four-year period without competition results following debut. A Cox proportional hazards model assessed the influence of debut age, medal success, race volume and engagement in multiple disciplines between ages 13–18 on dropout risk. Median time to dropout was 6 years for men and 5 years for women. Women were 1.47 times as likely as men to drop out. Athletes from Asia were more likely to drop out than athletes from Europe, Oceania and South America. Older debut age, youth medal success, higher total races with lower annual average, and greater discipline diversity were associated with lower dropout risk. For predicting sporting success outcomes, Random Forest and XGBoost machine learning models trained on a standardised set of predictors demonstrated predictive ability above chance, though their performance suggested that incorporating additional variables could enhance accuracy. Overall, findings support encouraging multi-discipline engagement and managing competition load during youth to reduce dropout risk.
Introduction
The World Aquatics Championships (WC), formerly known as the FINA World Swimming Championships, and the Olympic Games (OG) represent the highest level of swimming competition, where athletes and their respective home nations compete in events covering different swimming disciplines and distances. Success in swimming is typically measured through qualification, finishing positions, performance times and medal achievements in these competitions.1–5 Researchers have explored the contributions of participation in Junior World Championships1–3,5 and previous participation at the WCs1,5 to the aforementioned success metrics, noting that similar success at the junior level does not appear to be a requirement. However, a narrow focus on these outcomes may overlook the importance of longer-term development of youth athletes.
A recent scoping review highlighted alternative forms of sporting success that extend beyond traditional metrics like medals and high rankings (Masismadi et al., manuscript in review). These include fostering athlete retention for longer careers and ensuring active participation during the peak performance age in the sport.6,7 Such measures not only complement traditional definitions of success but also underscore the importance of sustained engagement in providing athletes with more opportunities to achieve competitive milestones. The development of sport hinges on the cyclical process of recruiting a broad base of participants, sustaining their engagement through progressive stages of involvement, and systematically developing a select group of athletes for high-performance pathways. 8
Dropout from competitive sport carries implications not only for elite athlete development but also for the broader aims of national sport systems. Sport contributes to a healthier, more active population, supporting national productivity and reduced healthcare costs.9–11 Elite athletes can provide informal mentorship during their competitive careers, offering valuable guidance that supports the development of emerging talent and strengthens the broader sporting ecosystem.12,13 Beyond this, they may serve as role models – not only within their sport, but also in wider society – by exemplifying qualities such as resilience, integrity, kindness and purposeful effort.14,15 They can also elevate the profile of their sport, driving fan engagement, sponsorship, and media visibility. 16 Retaining their involvement beyond retirement in leadership, developmental or support roles preserves critical expertise and reinforces long-term system sustainability – making the minimisation of attrition in competitive pathways a strategic priority for sustaining both athletic excellence and broader system growth. 17 Although increased dropout is often observed during the age range in which puberty typically occurs,18–20 the present study focuses specifically on cessation from structured, sanctioned competition. Within the context of high-performance pathways, ‘dropout’ refers to this form of competitive withdrawal rather than complete disengagement from the activity of swimming. Bronfenbrenner and Morris’ Process-Person-Context-Time (PPCT) model provides a valuable theoretical lens for examining the complex interplay of factors contributing to youth sport dropout. 21 Drawing from the PPCT framework, Moulds et al. 20 conducted a systematic review of 69 studies on youth sport dropout, highlighting that 42% of them failed to clearly define or measure dropout. 
Among those that did, most used a relatively short-term disengagement period of one to two consecutive years, while a few reported a period of 3 to 6 years. The absence of standardised and transparent definitions compromises the accuracy of dropout estimates and hinders cross-study comparisons, an ongoing issue in talent identification research in sport, where inconsistent terminology remains problematic.22,23
The youth years, typically spanning ages 6 to 15, represent critical development years in which the athlete engages in sampling and specialisation of sport. 24 While youth sport development broadly encompasses this range, the present study focuses specifically on the later youth years of 13 to 18, corresponding to specialisation and early investment phases relevant to competitive swimming. The timing, frequency and diversity of competition events during this period shape athletes' experience in sport and can influence whether they persist or drop out. Examining the structure of a youth athlete's competitive schedule can further illuminate its influence on dropout risk, providing actionable insights for sport organisations and coaches to optimise athlete retention and development for long-term success. Despite its potential significance, no study to date has explored dropout in relation to competition participation patterns during youth years. To ensure academic rigour in examining youth sport dropout, this study references the Youth Sport Dropout – Study Checklist to guide the framing of research in this area. 20
The primary objective of talent identification and development (TID) programmes is to produce internationally successful athletes, as measured by participation and medal success in major competitions.25–27 While athlete retention contributes to this goal, understanding and quantifying these additional outcomes remains essential. Research has shown that aspects of the youth sport experience can influence later success at the senior level.2,5,28–30 Athlete trajectories vary in both duration and competitive achievement, 31 making the prediction of medal potential a critical focus in high-performance sport, particularly given the finite resources available. These finite resources include direct monetary aid as well as indirect support through the establishment of TID programmes, provision of access to training centres, coaching support, and research initiatives. 32 Recent TID research has leveraged machine learning models to distinguish more and less successful athletes from self-reported training history.33–35 The increasing availability of publicly accessible competition datasets offers new opportunities to complement existing approaches and extend analysis to a more international scale.
Given the gap in literature, the primary aim of the current study is to investigate the link between competition participation history at youth ages and competitive dropout in swimmers. Specifically, this study addresses the influence of competitive debut age, experience of medal success, volume of competitive participation, and engagement in multiple swimming disciplines on dropout risk.
A secondary aim is to examine the predictive value of these variables for other success outcomes: (a) effective career length; (b) active competition status between 20 and 26 years old; (c) number of major swimming competitions (WC and OG) competed in; and (d) highest status achieved at these meets as either a medallist, non-medallist or non-participant.
Methods
Participants
Data were sourced from the World Aquatics (WA) database, covering sanctioned meets over the 40 years from 1980 to 2019. Results after 2019 were excluded to prevent confounding effects of the COVID-19 pandemic on competition schedules. The raw data contained: athlete name, country represented, event raced, race time, medal (where applicable), pool length, age, competition, country competition was based in, date of competition, race ranking, and WA swimming points. Athlete ages were standardised to their age as of December 31 of the corresponding year. Race data from 2154 swimmers (F = 1197; M = 957) born between 1967 and 1987 were tracked over a 20-year period, spanning ages 13 (the age of specialisation in swimming as per De Bosscher et al. 36) to 32. For the purposes of this study, ‘youth years’ refers to ages 13 to 18, and all swimmers included in the dataset had recorded at least one race result during this period. Events included backstroke (50 m, 100 m, 200 m), breaststroke (50 m, 100 m, 200 m), butterfly (50 m, 100 m, 200 m), freestyle (50 m, 100 m, 200 m, 400 m, 800 m, 1500 m), and individual medley (100 m, 200 m, 400 m) for both men and women. Data points were removed if no meet date or race time was provided. Athletes and their corresponding data were removed if a year of birth or biological sex could not be ascertained. The study was approved by the institutional review board of La Trobe University (IRB number: HEC21129). Participation consent was not needed as the study examined competition results obtained after the fact from a publicly available source. All statistical analyses were performed using the R statistical programming language in RStudio (RStudio Team, 2023).
Assessing dropout risk
Definition of competitive debut and dropout
An athlete's debut age (WA_debut) was defined as the age at their first recorded competition result in the dataset. For this study, WA_debut ranged from 13 to 18, reflecting the youth age range of the included athletes.
An athlete was classified as ‘active’ in a calendar year if they recorded at least one competition result and as ‘inactive’ otherwise. Dropout, defined as cessation from sanctioned competition in this study, was determined from four consecutive years of inactivity following an initial period of competitive activity (i.e., after debut). This threshold was chosen through a sensitivity analysis as the one that produced the lowest summed percentage of false positives and false negatives (Figure S1 in Appendix I). The age of dropout (dropout_age) was assigned as the age at the start of the first inactive year within this four-year period. For instance, an athlete who competed until age 25 but had no results between ages 26 and 29 would have a dropout_age of 26. Dropout_age ranged from 13 to 29, with the latter applying to athletes inactive from ages 29 to 32. Athletes who missed competition for 1 to 3 years but returned with a result in the following year were considered to have taken a break and remained classified as ‘active’ in ensuing calculations for those years (Figure S2 in Appendix I).
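The dropout rule described above can be illustrated with a short Python sketch. It is not the authors' code (the analyses were run in R); the function name `dropout_age` and the representation of a career as a set of active ages are assumptions made for illustration, while the four-year threshold and the tracking window ending at age 32 are taken from the text.

```python
from typing import Optional, Set

DROPOUT_GAP = 4   # consecutive inactive years that define dropout (from the text)
LAST_AGE = 32     # last age tracked in the dataset (from the text)


def dropout_age(active_ages: Set[int], last_age: int = LAST_AGE,
                gap: int = DROPOUT_GAP) -> Optional[int]:
    """Return the age at the first inactive year of the first `gap`-year run
    of inactivity after debut, or None if no such run is observed.

    `active_ages` must contain at least one age (the debut)."""
    debut = min(active_ages)
    run_start, run_len = None, 0
    for age in range(debut + 1, last_age + 1):
        if age in active_ages:
            # A recorded result ends the current run: shorter gaps are "breaks".
            run_start, run_len = None, 0
        else:
            if run_len == 0:
                run_start = age
            run_len += 1
            if run_len == gap:
                return run_start
    return None  # no four-year gap within the window (right-censored)


# Worked example from the text: competed until 25, no results at ages 26 to 29.
print(dropout_age(set(range(18, 26))))  # 26
```

In this sketch a swimmer with a three-year gap followed by a result is not flagged at the gap, which mirrors the treatment of breaks described above; a swimmer still competing at age 30 returns `None` because a full four-year gap cannot be observed before age 32.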
Cox proportional hazards regression model
Survival analysis offers a robust methodological approach to examining time-to-dropout trends and their influencing factors across different fields. The Cox Proportional Hazards Regression37,38 assesses the effect of multiple risk factors on survival time and has been applied to investigate risk of dropout in athletes.18,19,39 The model accounts for right-censored data, representing swimmers whose survival time (i.e., time to dropout) was unknown due to the absence of the event during the data collection period.
Model covariates
The Cox proportional hazards regression model comprised 12 covariates to examine their influence on dropout risk (Table 1). Guided by the PPCT framework, 21 all four aspects were considered in the selection of covariates. Person factors included biological sex, debut age and medal success thus far. Biological sex (sex) was determined based on the category of the events in which athletes competed, either Men's or Women's races. 55.6% were women and the remaining 44.4% were men. The age at which an athlete first competed in a recorded event (WA_debut), was also included as a covariate – 1.9% debuted at age 13, 5.8% at age 14, 14.7% at age 15, 20.8% at age 16, 28.6% at age 17 and 28.2% at age 18. To represent medal success, the total number of medals won during youth years (noOfMedals_youth) was used. Process factors included the frequency of competition as well as the diversity of swim strokes. Competitive experience, instead of being measured by the number of active years as in prior studies,1,5,40 was quantified by race volume. Two variables captured this: the cumulative number of races competed during youth (totalRaces_youth) and the average number of races per active year of youth (meanRaces_youth). For example, an athlete who competed in 24 races from age 13 through to 18 would have totalRaces_youth = 24 and meanRaces_youth = 4. However, an athlete who competed in 24 races only from age 16 through to 18 would have totalRaces_youth = 24 and meanRaces_youth = 8. The breadth of stroke specialisation was measured by the number of different swim disciplines an athlete competed in during youth age (noOfDisciplines_youth). The maximum value of noOfDisciplines_youth is five for athletes who competed in the backstroke, breaststroke, butterfly, freestyle and individual medley events. 
Additional variables (backstroke_youth, breaststroke_youth, butterfly_youth, freestyle_youth and medley_youth) were created to capture the number of races competed in for each of the above strokes, regardless of distance. Context factors encompass the environments, systems, and structures surrounding the athlete. This study considered the geographical continent of the country the athletes represented (continent). In cases where multiple athletes shared the same name but represented different countries, they were treated as separate individuals. 5.6% were from Africa, 21.8% from Asia, 42.2% from Europe, 15.1% from North America, 9.3% from Oceania and 6.1% from South America. The distributions of these covariates are summarised in Tables S1 and S2 (Appendix I).
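As an illustration of how the race-volume and discipline covariates relate to raw results, the following Python sketch derives totalRaces_youth, meanRaces_youth and noOfDisciplines_youth from a list of races. The function name `youth_covariates` and the `(age, discipline)` input format are hypothetical, and the mean is taken over years with at least one result, which is an assumption consistent with the two worked examples in the text.

```python
from collections import Counter
from typing import Dict, List, Tuple

YOUTH_AGES = range(13, 19)  # ages 13 to 18 inclusive (from the text)


def youth_covariates(races: List[Tuple[int, str]]) -> Dict[str, float]:
    """races: one (age, discipline) pair per recorded race result."""
    youth = [(age, stroke) for age, stroke in races if age in YOUTH_AGES]
    total = len(youth)
    active_years = len({age for age, _ in youth})      # years with >= 1 result
    races_per_stroke = Counter(stroke for _, stroke in youth)
    return {
        "totalRaces_youth": total,
        "meanRaces_youth": total / active_years if active_years else 0.0,
        "noOfDisciplines_youth": len(races_per_stroke),
    }


# 24 races spread over ages 13 to 18 versus 24 races packed into ages 16 to 18.
spread = [(age, "freestyle") for age in range(13, 19) for _ in range(4)]
packed = [(age, "freestyle") for age in range(16, 19) for _ in range(8)]
print(youth_covariates(spread)["meanRaces_youth"])  # 4.0
print(youth_covariates(packed)["meanRaces_youth"])  # 8.0
```

Both swimmers share the same totalRaces_youth of 24, so only meanRaces_youth separates a spread schedule from a compressed one.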
Table 1. Description of the covariates used for the Cox proportional hazards model.
Relating to the Time aspect of the athlete's experiences, nine covariates were identified as being specific to the youth years: noOfMedals_youth, totalRaces_youth, meanRaces_youth, noOfDisciplines_youth, backstroke_youth, breaststroke_youth, butterfly_youth, freestyle_youth, and medley_youth. These covariates were treated as time-varying risk factors, meaning their values change over time as the athlete progresses through this period.41,42 Starting from age 13 or debut age, whichever is earlier, the values of these covariates may increase year on year up to age 18 or dropout age, whichever is earlier. From age 18 onwards, the values of these covariates remain constant until the athlete reaches age 32 or drops out, whichever occurs first. Thus, a time-dependent analysis that examined time intervals and event as a function of the covariates was employed.41,42 The final hazard ratio (HR) represents the weighted average of HRs specific to each one-year time interval. 41 The proportional hazards assumption was evaluated using Schoenfeld residual plots 38 (Figure S3 in Appendix I). Based on visual inspection of the curves, none of the covariates were deemed to have violated the assumption; thus, no further adjustment was needed to account for time-varying coefficients.
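One common way to prepare time-varying covariates for a Cox model is a one-row-per-interval layout, in which each year at risk carries the covariate value that applied during that year. The Python sketch below shows that layout for totalRaces_youth only. It is an assumed layout rather than the authors' data preparation; the function name `yearly_intervals`, the tuple format, and the convention that a year's races count from the start of that year's interval are all illustrative choices.

```python
from typing import Dict, List, Tuple

YOUTH_END = 18  # youth covariates stop accumulating after this age (from the text)


def yearly_intervals(races_by_age: Dict[int, int], debut: int, end_age: int,
                     dropped_out: bool) -> List[Tuple[int, int, int, int]]:
    """Return rows of (start_age, stop_age, event, totalRaces_youth).

    `end_age` is the dropout age, or the censoring age if the swimmer never
    dropped out; the event flag is set only on the final interval."""
    rows, cumulative = [], 0
    for age in range(debut, end_age):
        if age <= YOUTH_END:
            cumulative += races_by_age.get(age, 0)   # grows during youth only
        event = int(dropped_out and age + 1 == end_age)
        rows.append((age, age + 1, event, cumulative))
    return rows


for row in yearly_intervals({16: 3, 17: 5, 18: 2, 19: 7}, debut=16,
                            end_age=21, dropped_out=True):
    print(row)
# (16, 17, 0, 3)
# (17, 18, 0, 8)
# (18, 19, 0, 10)
# (19, 20, 0, 10)
# (20, 21, 1, 10)
```

The seven races at age 19 do not change the covariate, reflecting the description above that youth values remain constant from age 18 onwards.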
A systematic approach was employed to evaluate all possible combinations of 12 predictors, incorporating main effects and, where applicable, a single two-way interaction term. To ensure model consistency, interaction terms were only included if both corresponding main effects were also present. Cox proportional hazards models were generated for each combination, with a constraint of no more than one interaction term per model to preserve interpretability. Model performance was assessed using the concordance index (C-index) which evaluates the discriminatory power of a survival model. 43 This method facilitated a comprehensive exploration of predictor combinations while maintaining a balance between model complexity and predictive accuracy.
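The search space described above can be enumerated with a few lines of Python. This is a sketch of the stated rules only (every non-empty set of main effects, with either no interaction or exactly one two-way interaction between main effects that are both present); the function name `enumerate_models` is hypothetical and any further exclusions the authors may have applied are not modelled.

```python
from itertools import combinations
from typing import Iterator, Optional, Sequence, Tuple


def enumerate_models(
    predictors: Sequence[str],
) -> Iterator[Tuple[Tuple[str, ...], Optional[Tuple[str, str]]]]:
    """Yield (main_effects, interaction) pairs; interaction is None or a pair
    of predictors that both appear among the main effects."""
    for size in range(1, len(predictors) + 1):
        for mains in combinations(predictors, size):
            yield mains, None                      # main effects only
            for pair in combinations(mains, 2):
                yield mains, pair                  # plus one interaction


specs = list(enumerate_models(["sex", "WA_debut", "continent"]))
print(len(specs))  # 13
```

Under these rules alone, 12 predictors give 71,679 candidate specifications, which is close to but not identical to the number of models reported in the Results; the sketch should therefore be read as an approximation of the search, not a reproduction of it.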
Statistical analysis
A higher C-index, closer to 1, indicates better model performance, as it means the model more accurately distinguishes between individuals with different survival outcomes. The model estimated regression coefficients (B), standard errors (SE), Wald statistics, p-values, HR, and 95% confidence intervals (CI). An alpha level of 0.05 was set to determine the statistical significance of each covariate. The HRs provide insight into the direction and strength of the relationship between each covariate and the risk of dropout. A HR greater than 1 indicates that an increase in the covariate is associated with a higher risk of dropout, while a HR less than 1 suggests a lower risk.
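To make the C-index concrete, the sketch below computes Harrell's concordance for a small set of swimmers with fixed risk scores. It is a simplified illustration, assuming time-fixed scores and ignoring tied event times, and is not the routine used for the time-dependent model in this study; the function name `concordance_index` is hypothetical.

```python
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[int],
                      risk_scores: Sequence[float]) -> float:
    """Share of comparable pairs in which the swimmer with the higher risk
    score drops out first. Ties in risk score count as half-concordant."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        for j in range(len(times)):
            # A pair is comparable when i has an observed dropout before j's time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


years_to_dropout = [2, 4, 6, 8]
observed = [1, 1, 0, 1]          # 0 marks a right-censored swimmer
print(concordance_index(years_to_dropout, observed, [4, 3, 2, 1]))  # 1.0
print(concordance_index(years_to_dropout, observed, [1, 1, 1, 1]))  # 0.5
```

A value of 0.5 corresponds to uninformative scores and 1.0 to perfect ordering, which is the scale on which the model's C-index is interpreted.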
Predicting other sporting success outcomes
Predictor variables
The independent variables used for predicting other sporting success outcomes are the same covariates employed in the Cox regression model, with the distinction that these covariates are not treated as time-dependent in this analysis. An additional predictor, specialisation_youth, captured multi-discipline engagement during youth based on the number of disciplines participated in and the proportion of races in the dominant discipline. Athletes were categorised as specialised (competing in one discipline), less specialised (competing in multiple disciplines with one accounting for over 50% of races), or diversified (competing in multiple disciplines, none exceeding 50%).
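The three-level specialisation rule can be written as a short Python function. The function name and the input format (one discipline label per youth race) are assumptions for illustration; the category names and the 50% threshold follow the description above.

```python
from collections import Counter
from typing import Sequence


def specialisation_youth(disciplines: Sequence[str]) -> str:
    """Classify a swimmer from the discipline of each youth race.

    Expects at least one race, as all swimmers in the dataset have one."""
    counts = Counter(disciplines)
    if len(counts) == 1:
        return "specialised"
    dominant_share = max(counts.values()) / sum(counts.values())
    # "over 50%" is strict, so an even split between two strokes is diversified.
    return "less specialised" if dominant_share > 0.5 else "diversified"


print(specialisation_youth(["freestyle"] * 5))                      # specialised
print(specialisation_youth(["freestyle"] * 6 + ["butterfly"] * 2))  # less specialised
print(specialisation_youth(["freestyle"] * 2 + ["butterfly"] * 2))  # diversified
```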
Outcome variables
Four outcome variables were selected to represent diverse dimensions of sporting success (Table 2). Competition-based success was assessed through athletes’ participation and medal achievements at major senior competitions, specifically the WC and OG, between ages 13 and 32. The variable WCOG_status assigned athletes to one of three categories: medallist, for those who had won at least one medal at the WC or OG; non-medallist, for those who had participated in at least one WC or OG but had not medalled; and non-participant, for those who had never competed in the WC or OG. The variable WCOG_rep was defined as the total number of WC or OG editions in which an athlete competed and was assigned to one of three levels: ‘0 to 1’, ‘2 to 4’ or ‘More than 4’.
Table 2. Description of the four sporting success outcome variables to be predicted.
Non-competition-based success was evaluated using two other outcome variables. The first variable, statusAtPeak, assessed athletes’ competitive status during the peak performance age for swimmers, recognised as 20 to 26 years.44,45 Athletes were classified as ‘active’ if they had at least one recorded performance during this age range or ‘inactive’ if they had none. The second variable related to career length. Career length, commonly defined as the total number of years between an athlete's debut and their last recorded competition before age 33, was refined in this study as careerLength_eff. This variable captured the total number of years the athlete actively competed during this period, excluding years of inactivity. For instance, an athlete with a career spanning 20 years but with an 8-year competitive break would have a careerLength_eff of 12 years. careerLength_eff comprised three levels: ‘1 to 2’, ‘3 to 8’ or ‘More than 8’. The distributions of these outcome variables can be found in Table S3 (Appendix I).
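The two non-competition-based outcomes can be derived from the set of ages at which a swimmer recorded a result, as the following Python sketch shows. The function names are hypothetical; the level labels, the peak age range of 20 to 26 and the exclusion of inactive years follow the definitions above.

```python
from typing import Iterable


def career_length_eff(active_ages: Iterable[int]) -> str:
    """Bin the number of distinct active years into the three reported levels."""
    years = len(set(active_ages))
    if years <= 2:
        return "1 to 2"
    if years <= 8:
        return "3 to 8"
    return "More than 8"


def status_at_peak(active_ages: Iterable[int]) -> str:
    """'active' if any result falls within the peak ages of 20 to 26."""
    return "active" if any(20 <= age <= 26 for age in active_ages) else "inactive"


# Career spanning ages 13 to 32 (20 years) with an 8-year break at ages 18 to 25.
ages = list(range(13, 18)) + list(range(26, 33))
print(len(set(ages)))           # 12
print(career_length_eff(ages))  # More than 8
print(status_at_peak(ages))     # active
```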
Machine learning models
The task of predicting the selected sporting success outcomes was framed as a classification problem. To address this, two ensemble learning algorithms based on decision trees – RandomForest 46 and XGBoost 47 – were employed, both of which have been used in similar research.33–35 These models were built using the 13 predictor variables. The dataset was partitioned into training and test sets, with an 80%–20% split. Hyperparameter tuning was performed through a grid search across 20 different hyperparameter combinations, with performance evaluated via five-fold cross-validation. Given the class imbalances, class-specific recall, Brier score and weighted F1-score were used as primary evaluation metrics on the test set. For the binary statusAtPeak classification task, the Matthews Correlation Coefficient (MCC) was additionally computed to assess model performance, where a positive MCC implies predictive value beyond random guessing. 48
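The sketch below shows, in plain Python, how two of the evaluation metrics named above behave under class imbalance. It uses the standard definitions of the Matthews Correlation Coefficient and the support-weighted F1-score; the function names and toy labels are illustrative, and the study's own values come from the authors' R workflow rather than from this code.

```python
from math import sqrt
from typing import Sequence


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from a 2x2 confusion matrix."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1 averaged with weights equal to each class's true count."""
    total = 0.0
    for label in set(y_true):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        total += f1 * (tp + fn)          # tp + fn is the class support
    return total / len(y_true)


# A classifier that always predicts the majority class on a 9:1 split.
truth = ["active"] * 9 + ["inactive"]
always_active = ["active"] * 10
print(round(weighted_f1(truth, always_active), 3))  # 0.853
print(mcc(tp=0, tn=9, fp=0, fn=1))                  # 0.0
```

The always-majority classifier scores a high weighted F1 yet an MCC of zero, which is why a positive MCC is a useful additional check that a binary model predicts better than chance.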
Results
Assessing dropout risk
Out of 71,670 models generated, the model that produced the highest C-index score of 0.644 demonstrated a moderate ability to distinguish between athletes who drop out sooner versus later. This model included the following predictor variables: sex, WA_debut, noOfMedals_youth, totalRaces_youth, meanRaces_youth, noOfDisciplines_youth, backstroke_youth, butterfly_youth, medley_youth, continent and the interaction term noOfMedals_youth:totalRaces_youth. The likelihood ratio, Wald, log-rank and robust score tests all produced p-values < 2e-16, indicating that the model is significant. Of the 2154 athletes examined, 1936 athletes – 1113 women and 823 men – had dropped out, with approximately 10.1% still participating at the end of the data inspection period. Within the first year, 15.2% dropped out, followed by an additional 6.9% and 8.3% by the second and third years respectively, while 9.7% dropped out after more than 10 years. The median time to dropout was 6 years for men and 5 years for women.
The Cox regression analysis identified several significant predictors of dropout risk, including sex, WA_debut, noOfMedals_youth, totalRaces_youth, meanRaces_youth, noOfDisciplines_youth, continent and the interaction term noOfMedals_youth:totalRaces_youth (Table 3). Conversely, none of the stroke-specific covariates retained in the model (backstroke_youth, butterfly_youth and medley_youth) were significantly associated with dropout risk.
Table 3. Cox regression analysis results.
Note. B = regression coefficient; SE = standard error; CI = confidence intervals
Person factors
Women (sexWomen) were 1.47 times as likely as men to drop out (B = 0.383, HR = 1.466, p < 0.001, 95% CI = [1.326, 1.622]). Athletes who debuted in WA competitions at an older age exhibited a lower dropout risk than those who debuted earlier (WA_debut: B = −0.230, HR = 0.795, p < 0.001, 95% CI = [0.745, 0.848]). Specifically, each additional year of delay in debut age corresponded to a 20.5% reduction in dropout risk. Winning more medals during youth was associated with a lower risk of dropout (noOfMedals_youth: B = −0.109, HR = 0.897, p < 0.001, 95% CI = [0.870, 0.925]), with each additional youth medal corresponding to a 10.3% reduction in dropout risk.
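The relationship between the reported coefficients, hazard ratios and percentage changes can be checked with a few lines of Python. The helper names are hypothetical; the inputs are the rounded values reported above, so results may differ from the published figures in the third decimal place.

```python
from math import exp


def hazard_ratio(b: float) -> float:
    """HR is the exponential of the Cox regression coefficient B."""
    return exp(b)


def percent_change(hr: float) -> float:
    """Percentage change in dropout hazard per one-unit covariate increase."""
    return (hr - 1.0) * 100.0


print(round(hazard_ratio(-0.230), 3))    # 0.795 (WA_debut)
print(round(percent_change(0.795), 1))   # -20.5
print(round(percent_change(0.897), 1))   # -10.3 (noOfMedals_youth)

# Effects compound multiplicatively: debuting two years later.
print(round(0.795 ** 2, 3))              # 0.632
```

Because hazard ratios multiply, a two-year delay in debut corresponds under the model to a hazard ratio of about 0.632, i.e. roughly a 36.8% lower hazard rather than 41%.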
Process factors
Participating in a higher total number of races during youth was associated with lower dropout risk (totalRaces_youth: B = −0.041, HR = 0.960, p < 0.001, 95% CI = [0.942, 0.978]), with dropout risk decreasing by 4.0% for each additional race. However, a higher average number of races per year was associated with a 10.8% increase in dropout risk per additional race (meanRaces_youth: B = 0.103, HR = 1.108, p < 0.001, 95% CI = [1.065, 1.153]). The interaction term between the number of medals and total races during youth further revealed that the association between total races and dropout risk was moderated by the number of medals won. Specifically, the hazard ratio associated with each additional race was 0.2% higher for every additional youth medal won, so the protective association of racing volume weakened as medal count increased (noOfMedals_youth:totalRaces_youth: B = 0.002, HR = 1.002, p < 0.001, 95% CI = [1.002, 1.003]). Greater participation in different disciplines was protective, being associated with reduced dropout risk (noOfDisciplines_youth: B = −0.087, HR = 0.917, p = 0.014, 95% CI = [0.855, 0.982]). The number of races in specific strokes, namely backstroke (backstroke_youth: p = 0.353), butterfly (butterfly_youth: p = 0.656) and individual medley (medley_youth: p = 0.173), was not significantly associated with dropout risk.
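To illustrate how an interaction term modifies a main effect in a Cox model, the Python sketch below combines the two reported coefficients into a per-race hazard ratio that depends on medal count. It is a reading aid only: the coefficients are the three-decimal values reported above, the function name `hr_per_race` is hypothetical, and because the interaction coefficient is so small relative to its rounding, the medal count at which the ratio crosses 1 is highly uncertain and should not be interpreted as an estimate.

```python
from math import exp

B_RACES = -0.041        # totalRaces_youth coefficient (rounded, from the text)
B_INTERACTION = 0.002   # noOfMedals_youth:totalRaces_youth (rounded, from the text)


def hr_per_race(medals: float) -> float:
    """Hazard ratio for one additional youth race, given the number of youth
    medals, holding all other covariates fixed."""
    return exp(B_RACES + B_INTERACTION * medals)


for medals in (0, 5, 10, 20):
    print(medals, round(hr_per_race(medals), 3))
# 0 0.96
# 5 0.969
# 10 0.979
# 20 0.999
```

With these rounded inputs the per-race hazard ratio moves from 0.960 towards 1 as medals accumulate, which is the sense in which the protective association of racing volume weakens for more decorated youth swimmers.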
Context factors
Lastly, compared to athletes from Asia, those from Europe (continentEurope: B = −0.505, HR = 0.603, p < 0.001, 95% CI = [0.527, 0.692]), Oceania (continentOceania: B = −0.332, HR = 0.718, p < 0.001, 95% CI = [0.599, 0.860]) and South America (continentSouth America: B = −0.732, HR = 0.481, p < 0.001, 95% CI = [0.377, 0.613]) were less likely to drop out. Athletes from Africa and North America also had lower dropout risk, but the effect was not significant for either (continentAfrica: p = 0.875; continentNorth America: p = 0.062).
Predicting other sporting success outcomes
The set of predictors related to competition participation at youth age performed best in predicting WCOG_status and statusAtPeak based on weighted F1-scores (Table 4). The models performed worse in predicting the other success outcomes WCOG_rep and careerLength_eff.
Table 4. Results of the RandomForest and XGBoost models predicting WCOG_status, WCOG_rep, statusAtPeak and careerLength_eff.
Note: Confusion matrices can be found in the supplementary material Tables S4-S7.
WCOG_status
The XGBoost model better predicted whether an athlete's highest status at the WC and OG was as a medallist, non-medallist or non-participant, with a weighted F1-score of 73.3% compared to the RandomForest model that scored 62.8%, and comparable Brier scores of 0.206 and 0.207 respectively. The RandomForest model had greater recall in predicting medallists and non-medallists. Neither model could predict non-participants.
WCOG_rep
The RandomForest and XGBoost models had comparable weighted F1-scores of 48.0% and 48.3% and Brier scores of 0.289 and 0.299 in predicting the number of WC and OG appearances of an athlete between ages 13 and 32. The RandomForest model had better recall in predicting which athletes had competed in at most one WC or OG meet. However, the XGBoost model fared better in predicting those who had competed in two to four and more than four WC and OG meets.
statusAtPeak
The RandomForest and XGBoost models had comparable weighted F1-scores of 69.5% and 70.2% and Brier scores of 0.181 and 0.196 in predicting whether the athlete was still active or had stopped competing between the ages of 20 and 26 years. Additionally, the former had an MCC of 0.305 and the latter 0.316, suggesting that both models had limited to moderate predictive value above chance. The RandomForest model had greater recall in predicting inactive athletes while the XGBoost model fared better at predicting active athletes.
careerLength_eff
The XGBoost model better predicted the number of years in which the athlete competed in at least one race between ages 13 and 32, with weighted F1-score of 54.5% and lower Brier score of 0.269 compared to the RandomForest model that had 51.4% weighted F1-score and Brier score of 0.277. The XGBoost model had greater recall across all classes.
Discussion
This study aimed to highlight how various aspects of an athlete's competition participation history may influence dropout risk and identify the predictive value of such aspects for different sporting success outcomes.
Factors contributing to dropout risk
Person factors
Women were found to be at a higher risk of dropout than men, a trend observed in swimming and other sports.18,20,49–51 Pizzuto et al. 51 proposed that women in track and field may disengage due to the longer time required to reach peak performance compared to men. In swimming, however, women tend to reach peak performance earlier than men, 45 yet this study, along with two others on French and Australian swimmers, still reports higher dropout rates among women.18,19 One contributing factor in some contexts may be the societal expectation that women take on primary caregiving responsibilities. 50 While men may also have caregiving responsibilities, they do not face the same prolonged, enforced break from sport due to pregnancy and childbirth. The ongoing childcare demands that follow can hinder women's return to high-performance training and competition. In fact, the lack of financial and logistical support post-partum has been identified as a significant barrier to re-entry.52,53 Other reasons associated with dropout in women include pressure, intensity of training and disinclination towards competition, 20 but these could just as likely apply to men.
A later debut was found to correlate with lower dropout risk. Moulds et al. 19 similarly identified that younger swimmers in Australia were at a higher risk of dropout, suggesting this may be linked to the sampling nature of early participation in the sport. Elite Olympic swimmers typically specialise around the age of 13, with a variation of about 4.5 years. 36 Younger athletes may not yet possess the coping strategies to manage competitive stress, potentially increasing vulnerability to burnout. 54 Therefore, a later competitive debut could indicate that athletes who choose to compete at this stage have already made a commitment to specialise in swimming and are likely more serious about their career, thus less prone to dropout. Additionally, early specialisation when associated with high volumes of training and competition could lead to burnout from fatigue, injury and/or monotony at an early age.55,56
Regarding medal success at youth age, each additional medal earned was associated with a 10.3% reduction in dropout risk. While previous research suggests that youth medal success does not necessarily translate to success at the senior level,1,3 our findings indicate that it positively influences athlete retention. One possible explanation lies in the role of basic psychological needs: low satisfaction in these areas has been identified as a key factor in adolescent sports dropout.57,58 Medals can serve as a tangible marker of competence and progress, likely reinforcing motivation and commitment to continued participation in the sport. This aligns with literature citing performance stagnation as a contributor to dropout. 50 However, the effects of early success are not universally positive. For instance, Kristiansen et al. 59 found no significant link between Youth Olympic Games (YOG) medal success and retention in elite sport, noting that some youth athletes viewed medalling at the YOG as the culmination of their careers. More broadly, early medal success may have complex psychological effects: while it can enhance motivation by validating progress, it may also reduce long-term drive if perceived goals are prematurely fulfilled or increase pressure and anxiety about meeting heightened expectations.59,60 MacNamara et al. 60 highlighted a duality in athletes’ developmental responses to early performance outcomes: while some athletes used a lack of early success as motivation to continue competing and demonstrate their abilities, others who achieved early success reported increased pressure related to sustaining high performance and meeting elevated expectations. 
These effects likely coexist with a range of additional factors that fall outside the scope of this analysis such as injuries, socioeconomic background, parental influence and coaching support.50,58,61 Without direct input from athletes, coaches or parents, interpreting the underlying reasons behind observed dropout patterns remains speculative.
Process factors
Regarding the volume of competitive participation during youth, our findings caution against competing in a high number of races within a single year, while supporting a higher total number of races overall. Gaining competitive experience plays a crucial role in athlete retention, with each competition providing an additional opportunity to develop and hone competitive coping strategies,54 but managing competition load is essential to prevent fatigue and burnout.55 Born et al.40 found that high-level swimmers typically had at least eight years of competitive experience, reinforcing the importance of sustained participation over more years for success. Our findings extend this by highlighting the significance of sustained participation for athlete retention.
The results indicate that while a higher number of races and greater medal success at the youth level were each individually associated with a reduced risk of dropout, the interaction between the two paradoxically elevated the risk of dropout. Previous literature on YOG athletes found no significant association between medalling and continued elite sport participation, potentially due to the prestige of the YOG leading athletes to view it as a career endpoint. 59 By contrast, our study includes medals from any sanctioned competition, and we hypothesise that this broader conceptualisation reflects a different motivational mechanism. Athletes who demonstrate both high racing volume and consistent medal success during youth may be at risk of overexposure, potentially reflecting the cumulative demands of sustained training and performance from an early age. The pressures associated with maintaining such high standards over time may contribute to psychological strain or burnout,55,56,60 which could increase the likelihood of dropout. Moreover, repeated success with limited exposure to failure may constrain opportunities for developing coping strategies and resilience 62 – skills that facilitate adaptation to the conditions of adult-level competition.
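One way to read this interaction, offered as a sketch rather than the fitted model, is to write the hazard with youth race count R and youth medal count M entering linearly alongside their product. The symbols and the omission of all other covariates are assumptions made for illustration; only the signs of the coefficients follow from the results described above.

```latex
h(t \mid R, M) = h_0(t)\,\exp\!\left(\beta_R R + \beta_M M + \beta_{RM} R M\right),
\qquad \beta_R < 0,\quad \beta_M < 0,\quad \beta_{RM} > 0

\frac{\partial \log h(t \mid R, M)}{\partial M} = \beta_M + \beta_{RM} R
```

Under this form, the protective effect of an additional medal shrinks as race count rises, and it would reverse only where R exceeds -β_M / β_RM. Whether that threshold falls within the observed range of race counts cannot be determined from the signs alone.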
Engaging in multiple swim disciplines during youth was associated with an approximately 8% reduction in dropout risk per additional discipline. As with multi-sport participation, competing in multiple swim disciplines may break the monotony of training and increase enjoyment. Exposure to varied experiences could allow athletes to better understand their preferences, develop adaptive skills and mitigate overuse injuries.63 Athletes who disengage from sport often report having accumulated many hours of deliberate practice with fewer opportunities for diverse experiences.31 Additionally, better-performing swimmers were more likely to compete in multiple disciplines,29 which may reinforce continued participation by enhancing perceived competence and motivation. Conversely, specialising in a single discipline from a young age may limit skill variety and expose athletes to higher psychological and physical strain, both known contributors to dropout.55,56
Context factors
Athletes from Asia were more likely to drop out compared to those from other continents, with statistically significant differences observed relative to athletes from Europe, Oceania and South America. The sample was predominantly composed of athletes from Europe, followed by those from Asia and North America. The dropout trends highlight the influence of broader sociocultural and institutional factors. For example, national sport policies can play a key contextual role. In Norway, youths are restricted from competing at national and international championships before age 13 to safeguard developmental wellbeing. 59 While this policy may not significantly influence regional-level trends given Norway's relatively small population in the European context, it serves to illustrate how national-level frameworks can shape sport participation and athlete retention. Although such regulations help mitigate early exposure to competitive stress, they may also delay opportunities to develop competition-specific skills. Another commonly cited barrier to sustained sport participation is conflict with non-sporting commitments including academic pressure.50,58 This tension may be especially pronounced in cultures that prioritise educational achievement over sport or lack adequate support for dual career pathways. These examples show how cultural and institutional contexts shape the competitive sport environment and may either support or constrain youth athlete retention. The elevated dropout risk among athletes from Asia in this study aligns with findings from Pizzuto et al. 51 on track and field athletes, suggesting cultural factors in certain Asian contexts may play a significant role.
Based on these findings, several practical applications within swimming can be considered. Initiatives that address barriers specific to women, such as support for maternity breaks, should be implemented to improve retention. Competition selection should be carefully managed to ensure athletes face an appropriate level of challenge, maximising opportunities for medal success, which has been linked to lower dropout risk. Additionally, competition load should be balanced by limiting the frequency of races within a single year to prevent burnout while promoting consistent participation throughout youth. Lastly, encouraging multi-discipline exposure at a young age may help sustain motivation, enhance skill development, and reduce the likelihood of monotony-related dropout.
In this study, ‘specialisation’ is understood within the context of swimming, where participation across multiple disciplines (e.g., butterfly, breaststroke) reflects a form of intra-sport diversification. While conceptual parallels can be drawn with multi-sport participation, the structure and physical or technical demands of individual disciplines differ markedly between sports (e.g., gymnastics, athletics), limiting the direct transferability of findings.
Predicting success outcomes
Predicting medal potential and career longevity is a key focus in talent development research, driven by the need to allocate limited developmental resources effectively and the challenge of limited access to comprehensive data on international competitors. These constraints necessitate reliance on publicly available information and underscore the importance of accurate forecasting in high-performance sport. The machine learning models in this study performed better than chance. Weighted F1-scores exceeded 50% for the two-category outcome statusAtPeak and 33.3% for the three-category outcomes WCOG_status, WCOG_rep and careerLength_eff. These results demonstrate that competition participation patterns during youth hold meaningful predictive signal for sporting success, though they are not strong predictors on their own. Other factors, such as injury, can significantly influence long-term outcomes, particularly in relation to career length. Nonetheless, these participation patterns provide value and may help inform broader evaluations of success.
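To make the chance comparison concrete, the following minimal sketch computes a support-weighted F1-score in plain Python and sets it against the 1/k reference level used above for k outcome categories. The labels and predictions are toy values invented for illustration, not study data, and the helper name `weighted_f1` is hypothetical; this is not a reproduction of the study's modelling pipeline.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Return the mean of per-class F1 scores, weighted by class support."""
    support = Counter(y_true)
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))
    score = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (count / total) * f1
    return score


# Toy three-category example (invented values, not study data).
y_true = ["medallist", "medallist", "non-medallist", "non-medallist",
          "non-participant", "non-participant"]
y_pred = ["medallist", "non-medallist", "non-medallist", "non-medallist",
          "non-participant", "medallist"]

score = weighted_f1(y_true, y_pred)
chance = 1 / len(set(y_true))  # 1/3 for three categories
print(f"weighted F1 = {score:.3f}, chance reference = {chance:.3f}")
```

A score only modestly above the reference level, as in the results described here, indicates usable but limited signal.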
Earlier studies, such as that of Pion et al.,39 applied discriminant analysis to test battery results to predict dropout in gymnastics after 3 to 5 years with progressively higher accuracy rates (68.7%, 79.4%, and 87.7%). Our study extends this predictive work by attempting to forecast athlete dropout in swimming over a much longer time frame, between 5 and 15 years, using the statusAtPeak outcome and competition history as primary predictors. This aims to provide a deeper understanding of how early competition participation and other factors influence athlete trajectories over the long term, in contrast to previous studies focused on shorter time windows. Unfortunately, the results indicate that the accuracy of the predictive model remains limited and does not meet the desired level of precision.
Strengths and limitations of the study
Defining dropout presents a challenge due to the fluid nature of athletic participation, particularly in youth sport, where various interpretations exist.20,57 To our knowledge, this study is the first to incorporate a sensitivity analysis to establish an appropriate dropout threshold, accounting for temporary absences due to factors such as injury or other interruptions. While this approach may not be directly applicable to prospective data collection, it provides a valuable framework for analysing retrospectively collected longitudinal data. Previous applications of survival analysis to examine dropout rate in gymnastics and swimming did not report the model's C-index,18,19,39 leaving this study as the first to offer a basis for comparison.
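Because the C-index is offered above as a basis for comparison, a minimal sketch of one common definition, Harrell's concordance index, may help readers interpret it. The data below are toy values, the function name is hypothetical, and it is an assumption that this variant corresponds to the one reported for the model in this study.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs ordered correctly by risk.

    times  -- observed follow-up time for each athlete
    events -- True if dropout was observed, False if censored
    risks  -- model risk score (higher means earlier expected dropout)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored athlete cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (invented): years to dropout or censoring, and risk scores.
times = [2, 4, 5, 7]
events = [True, True, False, True]
risks = [0.9, 0.4, 0.6, 0.2]

c_index = concordance_index(times, events, risks)
print(f"C-index = {c_index:.2f}")
```

A value of 0.5 corresponds to risk scores that order athletes no better than chance, and 1.0 to a perfect ordering.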
A key limitation of this study is that, while it quantifies the association between youth competition participation and dropout risk, it does not identify the definitive causal mechanisms. A complementary qualitative approach, such as interviews, could provide deeper insights into athletes’ motivations for leaving the sport, which could be attributed to a number of factors including injuries, changing priorities and psychological reasons.50,58,61 However, given that this analysis relies on publicly available data spanning multiple countries, conducting such qualitative investigations remains impractical.
Conclusion
This study highlights the significant role of competition history in understanding youth dropout in competitive swimming. Key factors such as competition volume, multi-discipline engagement, and medal success have been shown to influence dropout risk. Dropout risk was also observed to be higher for women than men, and for athletes from Asia compared to other continents. A more comprehensive understanding requires considering additional variables, such as training history, while acknowledging that factors like injuries, though difficult to predict, also influence dropout risk and retention. Future models should incorporate a broader range of influencing factors to improve predictions of long-term retention in sport and achievements in major swimming events. To mitigate early talent loss and foster sustainable athlete development, sport governing bodies and coaches should prioritise balanced competition schedules and encourage multi-discipline participation in youth swimming programmes. Additionally, there is a need to monitor the combined impact of competitive volume and early success, and to consider integrating structured challenges, including exposure to setbacks, to support long-term athlete development and retention.
Supplemental Material
sj-pdf-1-spo-10.1177_17479541251372398 - Supplemental material for Utilising competition participation history at youth age to forecast dropout risk and sporting success likelihoods in competitive swimming
Supplemental material, sj-pdf-1-spo-10.1177_17479541251372398 for Utilising competition participation history at youth age to forecast dropout risk and sporting success likelihoods in competitive swimming by Nur Adilah Masismadi, Matthew Wylde, Minh Huynh, Paul B. Gastin, Esther Chia and Haresh T. Suppiah in International Journal of Sports Science & Coaching
Footnotes
Author contributions
Conceptualisation: Nur Adilah Masismadi, Matthew Wylde, Minh Huynh, Paul B. Gastin, & Haresh T. Suppiah; Methodology: Nur Adilah Masismadi, Matthew Wylde, Minh Huynh, & Haresh T. Suppiah; Data analysis: Nur Adilah Masismadi, Minh Huynh & Haresh T. Suppiah; Writing – original draft preparation: Nur Adilah Masismadi; Writing – review and editing: Matthew Wylde, Minh Huynh, Paul B. Gastin, Esther Chia & Haresh T. Suppiah
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Data availability
Not applicable.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical considerations
The study was approved by the Ethics Committee of La Trobe University (no. HEC21129) on May 07, 2024, with the need for written informed consent waived.
Funding
This work was supported by the National Youth Sports Institute-La Trobe University Industry-Funded Graduate Research Scholarship and a La Trobe University Full Fee Research Scholarship.
Supplemental material
Supplemental material for this article is available online.
References
