Abstract
The nature and scope of developmental participation patterns that lead to international senior-level success have been controversially discussed in the literature for many years. The present article aimed to extend the existing literature in two respects. First, we reviewed studies comparing developmental sport activities of international-level and national-level athletes. The results indicated that comparisons among the highest success levels are infrequent and that findings partly varied across studies, while the practice volume in other sports, but not in the athlete’s main sport, mostly distinguished international-level from national-level athletes. Second, a new methodological approach combining decision trees and gradient boosting (conducted in the R environment) was applied to data from a previously published study. It allowed for multivariate, interactive, and nonlinear analysis and promised a relatively better explanation than earlier, traditional procedures. The results indicate that some previously reported differences between international-level and national-level athletes in the volume of main-sport and other-sports practice may represent artifacts of uncontrolled age effects rather than variables that actually differentiate success. In the context of the specialization–diversification debate, the present findings suggest that the debate addresses a “production function,” the structure of which is still unknown. Practice-related recommendations on developmental participation patterns are apparently expressions of highly rationalized myths rather than evidence-based efficient norms.
Introduction
Independent of the importance of genetic variation for reaching exceptional performance, authors seem to agree on the essential role of (structured) practice (input side) for success in international senior-level competitions (hereinafter: international success, representing the output side; Güllich, 2017; Tucker & Collins, 2012).
Apart from this broad consensus statement, the nature and scope of activities and developmental participation that lead to extraordinary success in sports have been controversially discussed in international literature for many years (Côté, Baker, & Abernethy, 2007; Güllich, 2017; Sieghartsleitner, Zuber, Zibung, & Conzelmann, 2018). Given the increasing number of international competitions with growing governmental involvement (Heinilä, 1982; Houlihan & Green, 2008) and the belief that senior sports success is producible (De Bosscher & De Rycke, 2017; De Bosscher, Shibli, Westerbeek, & Van Bottenburg, 2015), there is a clear need for more detailed research on the career of athletes achieving international success from different theoretical perspectives (Barth, Emrich, & Daumann, 2018; Barth, Güllich, & Emrich, 2018; Güllich & Emrich, 2014).
This article contributes to the existing body of literature in two respects. First, it reviews the existing literature on the developmental activities of athletes who have achieved international success at least once (hereinafter: international-level athletes) and of (internationally) less successful senior athletes (hereinafter: national-level athletes), with a special focus on the statistical methods applied in the empirical studies. Developmental activities here comprise the volume of domain-specific structured practice (institutionalized practice in organized settings such as sports clubs, extracurricular high-school sports, or sport academies in the main sport of the athlete; hereinafter: main-sport practice), the volume of structured practice outside the domain (institutionalized practice in other sports; hereinafter: other-sports practice), and, additionally, the age at the beginning of main-sport practice. Second, a new methodological approach, combining decision trees and gradient boosting, is applied to revisit data from an early study by Emrich and Güllich (2005). 1 This approach not only allows for a multivariate analysis but also gives reasonable hope of achieving a relatively better explanation than the procedures applied in the past.
This article is structured as follows: Starting from two well-established theoretical frameworks about developmental activity patterns leading to expertise and/or extraordinary success in sports, the question of operationalization for empirical testing is discussed. This makes it possible to restrict the literature to be reviewed. After a brief presentation of the excluded articles and the reasons for excluding them, the findings of the literature review are reported. Then, the methods used in the empirical study are described. Afterward, the results are presented, followed by a discussion of the study’s limitations as well as future directions in the study of athlete development in sport. Finally, a conclusion is drawn.
Problem and State of Research
This part of the article sets thematic limits: A restrictive approach is applied to avoid extrapolating the scope of findings and drawing inappropriate conclusions. In this context, reasons for including the examined articles and excluding others are given, with respect to both the input side and the output side.
Input Side
The debate on developmental activity patterns leading to expertise and/or exceptional success in sports has been particularly marked by a controversial discussion of the “Deliberate Practice” concept (Ericsson, 2006; Ericsson, Chase, & Faloon, 1980; Ericsson, Krampe, & Tesch-Römer, 1993; hereinafter: DP) and the “elite performance through sampling and deliberate play” pathway of the “Developmental Model of Sport Participation” (Côté et al., 2007; hereinafter: DMSP) (Sieghartsleitner et al., 2018). Côté et al. (2007) themselves describe this pathway as “elite performance through sampling.” However, it should be noted that the sampling and deliberate play pathway is marked by three dimensions (sampling, inherent enjoyment/playfulness, and peer-led/youth-led/loosely supervised). In this context, deliberate play already incorporates two of these dimensions (playfulness and peer-led/youth-led; cf. Côté & Erickson, 2015). “Elite performance through early specialization,” the second pathway of the DMSP (Côté et al., 2007), aligns with the DP (Güllich, 2017).
Contrasting the DP with the sampling and deliberate play pathway of the DMSP enables the construction of a three-dimensional space (Figure 1). The first axis represents the personal value the activity provides to the participant (intrinsic values, that is, activities done for inherent enjoyment/playfulness vs. extrinsic values, that is, activities performed to improve skills or performance), the second axis the social structure of the activity (peer-led/youth-led/loosely supervised vs. coach-led/adults-led/highly supervised), and the third axis the domain specificity of the activity (sampling/diversification vs. specialization; for the first two dimensions compare Côté & Erickson, 2015). Although all dimensions seem to form a continuum with a broad spectrum of different possibilities, many researchers treat them as dichotomous counterparts (Sieghartsleitner et al., 2018).

Figure 1. Schematic illustration of specialization and sampling in structured/institutionalized practice (authors’ illustration).
Not least because of its strict definition, DP has been heavily criticized (e.g., Abernethy, Farrow, & Berry, 2003; Helsen, Starkes, & Hodges, 1998). However, there seems to be a broad consensus in the literature that the concept of DP is related to the idea of maximizing domain specificity/specialization and structured (coach-led/adults-led/highly supervised) practice (performed to improve skills/performance) from an early stage onward. In contrast, the sampling and deliberate play pathway of the DMSP recommends early involvement in several sports (sampling) in combination with a high amount of deliberate play (inherent enjoyment/playfulness and peer-led/youth-led/loosely supervised) and a low extent of DP at an early age (“sampling years”: 6-12 years; “specializing years”: 13-15 years; “investment years”: age 16 years and older; Côté et al., 2007, pp. 196-197).
When these two concepts are tested empirically, the first question of operationalization arises on the input side. Particularly for two of the dimensions (the personal value the activity provides to the participant and the social structure of the activity), different approaches can be observed: first, studies categorizing activities on the basis of their reported nature and purposes (e.g., Berry, Abernethy, & Côté, 2008; Memmert, Baker, & Bertsch, 2010) and, second, studies using the organizational structure of the athletes’ activities to distinguish between structured practice and deliberate play. In the latter case, authors assume that institutionalized sports participation is connected to structured practice activities, whereas activities outside sports clubs are associated with a playful and loosely supervised form of participation (Côté, Baker, & Abernethy, 2003). Basically, Côté et al. (2003) describe structured practice as “activities typical of organized sport” (p. 95). Although this approach seems appropriate for distinguishing between structured practice and deliberate play, it is questionable whether a differentiation between main-sport practice and DP as defined by Ericsson et al. (1993) is possible. Considering Côté et al. (2003, p. 95), who compare “free play,” “deliberate play,” “structured practice,” and “deliberate practice,” it seems reasonable to view DP as a special form of structured practice and of main-sport practice. We thus consider DP a special form of structured practice, not only because of its sport specificity but also because of its characteristics on the different dimensions as described by Côté et al. (2003).
Furthermore, it seems reasonable to describe other-sports practice as structured practice in other sports, thus as highly supervised performance-oriented activities in other sports and therefore with low domain specificity. Consequently, this article contributes primarily to the specialization/sampling-debate in structured/institutionalized practice, but not to the DP/sampling and deliberate play debate.
After defining our main variables of interest on the input side, we take a closer look at the output side, that is, at the question of defining expertise (and success, respectively).
Output Side
Many articles with a broadly diversified thematic spectrum have been published on the topic of talent development in sports, including several articles reviewing the literature on developmental patterns of expert sports performers (e.g., Baker, 2003; Baker, Cobley, Fraser, & Thomas, 2009; Baker & Young, 2014; Côté et al., 2007; Coutinho, Mesquita, & Fonseca, 2016; Davids & Baker, 2007; Davids, Güllich, Shuttleworth, & Araújo, 2017; Güllich & Cobley, 2017; Güllich & Emrich, 2014; Rees et al., 2016; Vaeyens, Güllich, Warr, & Philippaerts, 2009). Recently, Macnamara, Hambrick, and Oswald (2014) published a meta-analysis on “Deliberate Practice and Performance in Music, Games, Sports, Education, and Professions” (as well as a corrected version of it in Macnamara, Hambrick, & Oswald, 2018); in addition, a meta-analysis on the relationship between DP and performance in sports was presented by Macnamara, Moreau, and Hambrick (2016).
Examining empirical articles in more detail shows that most of them are based on “the relative approach,” comparing groups with higher and lower performance levels (Coutinho et al., 2016), while using different and inconsistent criteria for what defines an expert athlete. Mostly, the differentiation of expert and nonexpert athletes is based on the level of proficiency attained. Interestingly, Coutinho et al. (2016) stated that nonexpert athletes are usually classified only by their failure to meet expert athletes’ criteria. We clearly follow Coutinho et al.’s (2016) statement that a detailed description of the criteria defining an athlete’s level of expertise should be given. However, we want to emphasize that these criteria should be given not only to increase the understanding of what an expert is, but also to better understand what a nonexpert is, because the way this group was built and possibly restricted downward also influences the findings. In the light of the results of Emrich and Güllich (2005), Güllich and Emrich (2006), and Güllich and Emrich (2014), Güllich (2014) states that “conditions for international senior-level success cannot be concluded by simply extrapolating the scope of findings from athletes with moderate success level or from junior athletes” (p. 764). This means that, when determining the level of the athletes and assigning them to groups for comparison in prospective empirical studies, we should take at least two variables into account: first, the athletes’ age level, which is commonly determined in studies by the level of the competition/league in which athletes are competing and/or by the teams with which athletes are affiliated. Some studies additionally apply an age limit, which is oriented toward the international competition regulations of the athletes’ respective sports.
Second, a variable for distinguishing experts and nonexperts (e.g., success in competition) and the level each group has to achieve (e.g., medalist at Senior World Championship) should be considered.
Due to the article’s purposes, the literature review therefore comprises (based on a relative approach) studies comparing developmental activities (i.e., practice in main sport and other sports 2 and additionally the age at onset of main-sport practice) of international-level and national-level athletes. We used Macnamara et al.’s (2016) search methods and results as a starting point for our literature search, extended the search through July 14, 2018, and added the search terms “talent development” and “early specialization.” The search resulted in 81 additional eligible records that were scanned together with those previously identified by Macnamara et al. (2016). Ten studies met the criteria defined for the purpose of the present study: comparison of international-level and national-level athletes, reporting age of starting main-sport practice, accumulated amount of main-sport practice and/or of other-sports practice, and being published in English in a peer-reviewed journal. 3
Because clarifying and operationalizing are normative issues in this context, we decided to develop our classification system inductively, and with it the limits for determining international-level and national-level athletes. International-level athletes were defined considering the entire range of articles, namely as athletes having achieved international success at least once. This means that studies did not necessarily have to have used a cut-off age to be included in our review.
The international-level athletes’ peers for comparison, the less successful athletes (national-level athletes), are described as athletes belonging to a nation’s best senior athletes (later, we will further distinguish between national best and nation’s best athletes) but having never achieved international success. The senior level could be determined through the athlete’s affiliation with a nation’s senior national team or national squad or through participation in open-age leagues/competitions. We note that this is not entirely unproblematic, because national squads of sports governing bodies commonly encompass junior athletes (and even youth athletes). Table 1 shows the respective classification of international-level and national-level athletes.
Table 1. International-Level and National-Level Athletes Classification (Age Level and Level of Achievement).
Although there is a correlation between the level of competition and the age limits in (international) competition regulations, the one does not automatically determine the other. However, in view of the criteria used in most of the studies, it can be assumed that the athletes’ age level can be described as senior.
Besides the age level of athletes and the success level achieved in competitions, we used a third variable, called selection level, to characterize the groups in the examined studies exactly. It distinguishes between athletes selected for a senior national team (i.e., the highest squad level of a national sports governing body) and athletes affiliated with a club playing in the highest national league/division or with a senior squad of a national sports governing body, but not with the senior national team. Therefore, less successful athletes have to be at least national best athletes. However, it should be noted that describing national-level athletes only by their level of selection seems problematic, because nations’ levels in a given sport are not congruent. Unfortunately, the same is true for success in national competitions.
Tables 2 and 3 present a detailed description of the success level in competitions and the selection level.
Table 2. Description of Levels of Success.
Table 3. Levels of Selection.
Note. Athletes were classified via the affiliation given in the respective study or via their level of success (e.g., medalists at Senior World Championships have to be part of the senior national team). However, this means the affiliation can correspond to the time of the survey or the time at which the athlete achieved the success.
These operational definitions were primarily used to select studies appropriate for inclusion in our review. This review is aligned with the works by Güllich and Emrich (2014), Rees et al. (2016), and Davids et al. (2017), but new articles were added. Before the findings of this review are concisely described, the reasons for omission from the pool of (apparently) eligible studies are given. We feel this to be important because reviews commonly comprise a broader range. However, such approaches may run the risk of extrapolating results; thus, we used quite a restrictive approach to avoid inappropriate conclusions.
Excluded Articles
Table 4 shows the excluded articles 4 and reasons for excluding them from this review.
Table 4. Articles Excluded From the Review.
Included Articles
Based on the criteria stated above, we identified 10 studies, including one additional study (De Bosscher & De Rycke, 2017), although it does not report on the volume of practice in main sport and other sports. The rationale is that this study was published recently, includes a very large sample, and defines the comparison groups applied. Although our study deals particularly with the question of the success relevance of practice in main sport and other sports, it seems important to show the results of our analysis, which is limited to comparing the groups’ success levels, as presented in Table 5.
Table 5. Success Levels of International-Level and National-Level Athletes in Included Studies.
Note.
International-level athletes
National-level athletes
I = individual sport(s), T = team sport(s).
The study distinguished between three levels of success. Here, the highest and lowest levels (national level) are compared.
In addition to the variation in the absolute success level of both groups as well as “the difference in the differences of groups” between the studies, the analysis shows that some studies have additionally applied age limits, while others have not. Furthermore, some studies compare successful national team athletes with less successful ones, whereas others compare successful national team athletes with national best athletes. In addition, as international-level athletes were older than national-level athletes in some studies (cf. Table 6), the question arises whether these studies compared successful and less successful athletes or rather “not yet successful athletes.” With regard to the statistical procedures to be applied, these problems clearly speak in favor of using covariates or introducing matching procedures to control, among other factors, for the age of the athletes.
Table 6. Description of the Sample, Age at Onset of Main-Sport Practice, and Its Success Relevance Within the Included Studies.
Note. + = international-level athletes younger than national-level athletes; – = international-level athletes older than national-level athletes; O = no significant difference; n.a. = information not available.
Effect sizes were calculated using an estimation function for Hedges’ g (g*) (Fröhlich & Pieter, 2009). It should be noted that, in the absence of further details in the contributions, the effect sizes were calculated using the total sample sizes.
Results were not reported for the whole group.
Using data from an identical survey.
Retrieved from Vaeyens et al. (2009).
According to the description of the achievement levels, the study distinguished between three levels of success. Here, the highest and lowest levels (national level) are compared.
Unclear, whether the age is related to the onset of structured/institutionalized practice or overall participation.
No inferential statistical results were reported.
Before describing the included studies’ samples and the reported ages at the onset of main-sport practice, it should be mentioned that four of the 10 studies have used data from an identical survey. These are marked in Table 6.
Four of the 10 studies have small sample sizes (one of the groups with n < 20), and these are exactly the studies reporting no significant difference between international-level and national-level athletes concerning the age at onset of main-sport practice. At this point, the importance of reporting effect sizes becomes clear. For this reason, the effect sizes were calculated, where possible, according to an estimation function for Hedges’ g (g*). In four cases, the effect sizes can be described as “small” and in one case as “medium” (Rasch, Friese, Hofmann, & Naumann, 2010).
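To illustrate the kind of effect size referred to above, the following minimal sketch computes Hedges’ g for two independent groups in its common textbook form (pooled standard deviation with the approximate small-sample correction). It is not necessarily identical to the estimation function g* of Fröhlich and Pieter (2009) used for Table 6. The function name and all input values are hypothetical, and Python is used purely for illustration.

```python
import math


def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardized mean difference with small-sample bias correction."""
    df = n_1 + n_2 - 2
    # Pooled variance across both groups.
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / df
    d = (mean_1 - mean_2) / math.sqrt(pooled_var)
    # Approximate correction factor J = 1 - 3 / (4 * df - 1).
    correction = 1 - 3 / (4 * df - 1)
    return d * correction


# Hypothetical values (not taken from any reviewed study):
# age at onset of main-sport practice in years, 15 athletes per group.
g = hedges_g(mean_1=11.4, sd_1=3.0, n_1=15, mean_2=10.2, sd_2=3.0, n_2=15)
print(round(g, 3))
```

With groups this small, a difference of this size would typically not reach significance, which is why reporting the effect size alongside the test result is informative.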
The four studies using data from an identical survey (Studies 2, 4, 6, and 9 in Table 6) reported that international-level athletes start main-sport practice at a significantly older age than national-level athletes. However, De Bosscher and De Rycke (2017) came to the opposite result. Although Güllich and Emrich (2014) report that no contrary findings were revealed in any sports category, a difference in the relative frequency of sports (categories) may have led to the different results in the studies. To rule out an influence of differing relative frequencies of sports categories between international-level and national-level athletes, the sample has to be tested for homogeneity of the respective distributions. Surprisingly, none of the studies involving multisport samples reported such a test. This is a matter of concern, and not only for results comparing the age at onset of main-sport practice. An exception among the multisport studies is the study by Güllich (2017), because it applies a matching procedure with the type of sport as one of the matching variables.
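One way such a homogeneity test could look is sketched below, assuming a chi-square test on a table of counts (groups in rows, sports categories in columns). The text does not prescribe a particular test, so this choice, the function name, and the counts are assumptions made for illustration only.

```python
import math


def chi_square_homogeneity(table):
    """Return (statistic, degrees_of_freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if both groups shared one distribution.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical numbers of athletes in three sports categories.
counts = [
    [30, 20, 10],  # international-level athletes
    [20, 20, 20],  # national-level athletes
]
statistic, dof = chi_square_homogeneity(counts)

# For exactly 2 degrees of freedom the chi-square survival function
# has the closed form exp(-x / 2); other cases need a statistics library.
p_value = math.exp(-statistic / 2)
print(round(statistic, 3), dof, round(p_value, 3))
```

A small p-value would indicate that the two groups differ in their composition by sports category, so that group differences in practice variables could partly reflect the sports mix.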
Table 7 shows the (aggregated) results for the (success) relevance of practice in main sport and in other sports, respectively, for international success.
Table 7. (Success) Relevance of Practice in Main Sport and Other Sports for International Success (Referring to Inference Statistical Results).
Note. n.a. = no information available. Relevance for success: + = significant positive correlation (international-level athletes train more compared with national-level athletes); – = significant negative correlation (international-level athletes train less compared with national-level athletes); O = no correlation; x/y = the majority of the results in this category correspond to x, but y was also found; x//y = x and y were determined the same number of times.
Regarding the effect of main-sport practice on international success in juvenile age categories/during childhood and adolescence (≤18 years), the results are inconsistent, with studies finding either no effect or negative effects. Interestingly, only one study found a positive effect of main-sport practice. However, the sample of that study consists of rhythmic gymnasts, that is, athletes in a sport that can be described as an “early specialization sport” (cf. De Bosscher & De Rycke, 2017). Interestingly, six studies reported a positive effect of other-sports practice on international success.
Analyzing the evaluation of practice in main sport and other sports reveals that different variables have been used to measure the volume of training and test for differences. Differences are to be found not only in the way measurement was done (hours vs. numbers of sessions) but also in the way data were analyzed (expressed p.a., accumulated within age categories).
We are particularly interested in whether authors have used multivariate inferential statistics to analyze the training history of athletes. As already mentioned, one study collected no data on this aspect; therefore, nine studies were analyzed in this context. Interestingly, only three studies applied multifactorial (but univariate) methods: Two studies used the factors success level and sports category and then performed univariate analyses on practice in main sport and other sports. 6 The third study used a univariate MANOVA (an ANOVA with repeated measurement, whereas the two other studies used ANOVAs only and, interestingly, made no use of differences between age categories) and thus used stages in age and success levels as factors.
Two further additions should be made before summing up the findings. First, the study not reporting any data on the athletes’ practice history (De Bosscher & De Rycke, 2017) introduced age as an important covariate. Second, as already mentioned above, two of the studies applied a matching procedure (variables taken together for both studies: gender, age, sports category, discipline, and performance at age 19 years) to control for confounding.
Literature Review Summary
Summing up, our literature review reveals that
experts are defined in the literature both by their performance in specific aspects/skills and directly by their competitive success. In both cases, this is a normative setting;
several studies cited in other studies and/or reviews examining the relevance of main-sport practice (often interpreted in the broader context of DP) and other-sports practice for success (not seldom interpreted in the sense of international success) have not compared international-level with national-level athletes, but athletes at lower age levels or senior athletes across lower and/or more heterogeneous success levels;
investigations on the most accomplished performers are still relatively scarce (Güllich & Emrich, 2014);
although, in comparison to the existing literature, we used a rather restrictive definition for studies to be included in our review, our analysis reveals not only that studies have compared groups at different success levels, but also that there exists a “difference in the differences of groups”;
no study has differentiated between single success and multiple success (on the concentration of success and the resulting precariousness of this categorization compare Barth, Güllich, et al., 2018);
too little attention has been paid to the description of national-level athletes;
the selection and description of groups should include at least three variables: 7 (a) the athlete’s “age”: age category (e.g., above the junior age limit of the sport’s respective international competition regulations; senior/adult) and age level of most competitions on national and international level (e.g., internationally: youth; nationally: senior) in the last 12 months; (b) the athlete’s “affiliation”: age category (e.g., senior) and performance level (e.g., National Team, B-squad) at the time of the survey; and (c) the athlete’s (greatest) success at national and international level: age category (e.g., not restricted; senior), level (e.g., World Championships), and rank (e.g., 2nd place); international competition results are to be preferred, because the national level may vary between nations in sports;
studies are inconsistent in their findings regarding the relevance of practice in main sport and other sports for international success. On the one hand, a positive effect of main-sport practice or a negative effect of other-sports practice in childhood and adolescence has not yet been reliably demonstrated. Although main-sport practice and DP may not be assessed as congruent (cf. problem and state of research), the findings obtained to date have called into question the prediction of the DP framework assuming that “early specialization”8,9 is overrepresented in international-level athletes and that they have accumulated more main-sport practice (cf. Güllich, 2017). On the other hand, of the seven studies reporting on the relevance of other-sports practice for international success, six reported, at least partially, a positive effect. However, the empirical verification of a positive effect of sampling (in structured practice) is still questionable, especially against the background of the weaknesses shown in the studies’ data analyses;
none of the studies involving multisport samples has reported testing on homogeneity of the relative frequency of sports or sports categories between the groups of international-level and national-level athletes. This may not only confound previous studies’ results, 10 but also impede the comparability of the studies. Prospectively, studies should report the (relative) frequency of sports or at least sports categories in both groups;
different variables have been used to represent main-sport practice;
only three studies have applied multifactorial statistical procedures to analyze the success-relevance of practice in main sport and other sports; only one of those used a MANOVA;
only two studies applied a matching-procedure to control for confounding;
no study analyzing the success-relevance of practice in main sport and/or other sports used age as covariate;
no study used multivariate inference statistics to jointly analyze practice in main sport and other sports 6;
no study analyzed an interaction effect between practice in main sport and other sports 6;
with exception of the above-mentioned study analyzing data gathered in a longitudinal design (Güllich & Emrich, 2014), no study used hierarchical discriminant function analyses or regression analysis; however, this has been recommended by several authors (e.g., Coutinho et al., 2016; Ericsson, 2013, 2016);
there exists a clear lack of empirically verified theories, and therefore research in this field has to be described as explorative.
Given that most studies in the broader context of talent development in sports have used a relative approach, the results of our literature review seem alarming in several ways. First, some studies have small sample sizes, which may be justified by the fact that the population of international-level athletes is small per se. However, considering the power of inferential statistical procedures, this is a matter of concern, as the tests applied will produce significant results only if the effect size is large enough (the beta error increases with smaller sample size). Second, only a few studies have applied statistical procedures able to control for confounding variables. Third, no study analyzed a possible interaction effect of practice in main sport and other sports. 6 Fourth, it must generally be said that the relative approach very rarely allows causal conclusions (Furley, Schul, & Memmert, 2016).
Therefore, the aim of this study is to analyze the relevance of practice in main sport and other sports completed in childhood and adolescence for international success by applying a new methodological approach, combining decision trees and gradient boosting, to data from the early study by Emrich and Güllich (2005). This approach allows a multivariate analysis and thus control for possible confounders. Furthermore, the method permits us to discover nonlinear relationships and interaction effects. It therefore seems reasonable to hope for a relatively better explanation with this approach than with procedures applied in the past. To the best of our knowledge, the application of gradient boosting to talent development in sports has not been fully documented to date. For a first application of the Gradient Boosting Machine (GBM), compare Barth and Emrich (2018).
Methods—in Search of an Appropriate Procedure
The statistical procedures applied in the analyzed studies have hardly met the recommendation to use more complex methods for data analysis, such as multilevel modeling, structural equation modeling, or regression analysis (e.g., Coutinho et al., 2016; Ericsson, 2013, 2016). Furthermore, against the background of our literature review, which shows a clear lack of empirically verified theories, research in this field has to be described as explorative. Therefore, the proposed statistical approaches, which start by assuming an appropriate data model and then estimate the parameters of this model from the data, seem less appropriate. Approaches from machine learning seem more suitable because they avoid starting with a data model and instead use general-purpose learning algorithms to learn about the relationship between the response and its predictors (Bzdok, Altman, & Krzywinski, 2018; Elith, Leathwick, & Hastie, 2008). With such an approach, we are able not only to carry out a multivariate analysis of the data, and thereby control for a potential age bias, but also to discover more complex or nonlinear relationships.
“Among the machine learning methods used in practice, gradient tree boosting is one technique that shines in many applications” (Chen & Guestrin, 2016, p. 785). The basic idea of decision tree–based regression analysis is to divide the data set by means of the independent variables into classes that are as homogeneous as possible regarding the response variable. The goal of training a decision tree is to learn a sequence of yes/no questions (tests) that leads to the correct answer regarding the outcome variable as quickly as possible (Müller & Guido, 2017). A tree is therefore a representation of a complex relationship within the data by means of a collection of binary decision rules (Adler, 2010). One of the main disadvantages of decision trees is their tendency to overfit the training data. Furthermore, trees generally do not achieve the same level of predictive accuracy as some other regression and classification approaches (James, Witten, Hastie, & Tibshirani, 2013). However, their predictive performance can be substantially improved by aggregating many of them. For these reasons, so-called ensemble models, which combine the predictions from multiple models, are usually used instead of single decision trees (James et al., 2013; Schmidberger & Stahl, 2018). The underlying idea is that it seems easier to find and combine several rules of thumb than to define a single highly accurate rule (Elith et al., 2008).
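To make the yes/no logic of a single tree concrete, the following minimal Python sketch searches one feature for the threshold that yields the most homogeneous groups. The use of Gini impurity as the homogeneity measure, the function names, and the toy data are assumptions chosen for illustration; they are not taken from the analyzed data set or from the XGBoost implementation.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 = perfectly homogeneous)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best yes/no question
    'value <= threshold?' on a single feature."""
    n = len(labels)
    best = (None, gini(labels))  # start from the unsplit data
    for t in sorted(set(values))[:-1]:  # the largest value cannot split the data
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: age at survey and success level (1 = international).
ages = [19, 20, 21, 22, 25, 26, 28, 30]
level = [0, 0, 0, 0, 1, 0, 1, 1]

threshold, impurity = best_split(ages, level)
print(threshold, impurity)
```

A full tree would repeat this search over all features and then recursively within each resulting group, which is also why a single deep tree can adapt too closely to the training data.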
A specific strength of an additive model of decision trees is that nonlinearities and interactions need not be explicitly specified beforehand (Miller, McArtor, & Lubke, 2017; Schonlau, 2005). A further advantage of decision tree–based models is their robustness against collinearities among the features (Luo et al., 2018). In the relevant literature on machine learning, there is a multitude of methods that can be assigned to the category of “ensemblers.” Three of the best-known methods used in the context of decision trees are bagging, random forests, and boosting. Bagging stands for bootstrap aggregation. Random forests provide an improvement over bagged trees. In bagging, each decision tree is built on a bootstrap data set, and the trees are built independently before they are finally combined to create a single predictive model. In boosting, by contrast, trees are grown sequentially, that is, each new tree uses the information from previous trees (James et al., 2013). Today there is a variety of boosting algorithms for decision trees, including adaptive boosting (AdaBoost), gradient boosting (GBM; Friedman, 2001), and extreme gradient boosting (XGBoost), an advancement of GBM. XGBoost belongs to the group of tree learning algorithms, whose impact has been widely recognized in a number of machine learning challenges (Chen & Guestrin, 2016; for a comparison of different supervised machine learning methods compare, for example, Caruana & Niculescu-Mizil, 2006). XGBoost in particular has recently gained a strong reputation by consistently winning competitions hosted by the machine learning competition site Kaggle (Chen & Guestrin, 2016; Wang & Ross, 2018).
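The sequential character of boosting can be sketched in a few lines of Python. The example below is a deliberately simplified, generic gradient boosting loop for a binary outcome with one-split trees (stumps) as base learners; it is not the XGBoost algorithm, which adds regularization and second-order information. All function names, the learning rate, and the toy data are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Least-squares regression stump on one feature.
    Returns (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]


def boost(xs, ys, rounds=50, learning_rate=0.3):
    """Gradient boosting for the log-loss with stumps as base learners."""
    base = math.log(sum(ys) / (len(ys) - sum(ys)))  # initial log-odds
    scores = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        # Negative gradient of the log-loss: label minus predicted probability.
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        # Each new stump corrects what the previous stumps left unexplained.
        scores = [s + learning_rate * (lv if x <= t else rv)
                  for x, s in zip(xs, scores)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    score = base + sum(learning_rate * (lv if x <= t else rv)
                       for t, lv, rv in stumps)
    return sigmoid(score)


# Hypothetical toy data with an inverted-U relationship that no single
# threshold can capture, but that a sum of stumps can.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 1, 1, 1, 0, 0]
model = boost(xs, ys)
print([round(predict(model, x), 2) for x in xs])
```

The nonlinear shape is never specified in advance; it emerges from the sum of many simple rules, which is the property the paragraph above describes.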
Although using these new approaches might require some reorientation in thinking, the clear evidence of their strong predictive performance and reliable identification of relevant variables and interactions (Elith et al., 2008) make them an extremely promising approach for the research problem at hand.
Data from Emrich and Güllich (2005) were reanalyzed using XGBoost. This reanalysis seems particularly relevant as four of the 10 studies discussed in this review used data from the same project, which were also used in this work. The analysis was implemented with the freely available package XGBoost (version 0.71.2) in the R environment (version 3.5.1; cf. Chen & Guestrin, 2016). Default values were used except for the maximum depth of a tree, which was set to four. The number of boosting rounds was 200. For documentation of XGBoost, compare Chen et al. (2018). For a description of the methods for data collection, compare Emrich and Güllich (2005) and Güllich and Emrich (2014).
By following a relative approach, we differentiated between international-level and national-level athletes; our task can therefore be described as a binary classification problem. We define international-level and national-level athletes in accordance with Emrich and Güllich (2005): International-level athletes are athletes who have reached a 1st to 10th place at Olympic Games, Senior World Championships, or European Championships. After deleting, among others, the records of athletes who had not exceeded the junior-age limit according to the international competition regulations of the respective sport, the number of participants in the data set decreased from n = 1,566 to n = 595 athletes. The senior-age level sample includes 211 national-level athletes and 384 international-level athletes. Concerning the success level of national-level athletes, around 75% of the athletes achieved a medal at senior national-level competitions and/or had participated (without success) in the mentioned international competitions. Therefore, describing this group as national-level senior athletes, in brief national-level athletes, seems justified. The percentage distribution of national-level athletes and international-level athletes within the individual sports groups does not differ significantly from one sports group to another (χ² = 7.33; df = 4; p = .119; n = 595). The results are shown in Table 8.
Distribution of National-Level Athletes and International-Level Athletes Within Different Sports Categories.
Furthermore, we tested for each group if it differs from all other groups taken together. The analysis reveals a significant result only for artistic composition sports compared to all other groups taken together (χ² = 5.32; df = 1; p = .021; n = 595).
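The p values reported for the two chi-square tests can be reproduced from the test statistics alone. The short Python sketch below uses the closed-form upper-tail probabilities of the chi-square distribution for one degree of freedom and for an even number of degrees of freedom; the function names are chosen here for illustration, and only the statistics stated in the text are used as input.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with df = 1."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_sf_even_df(x, df):
    """Upper-tail probability for even df, using the closed form
    exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


p_all_groups = chi2_sf_even_df(7.33, 4)  # comparison across the sports groups
p_artistic = chi2_sf_df1(5.32)           # artistic composition sports vs. rest
print(round(p_all_groups, 3), round(p_artistic, 3))
```

Both values agree with the p values given in the text to three decimal places.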
An inspection of the athletes’ age at the time of their interview shows that national-level athletes (Mdn = 22.0; interquartile range [IQR] = 4.8; n = 210) were significantly younger than international-level athletes (Mdn = 24.0; IQR = 7.0; n = 384) (U = 28,667, p < .01). This speaks in favor of the need to control for age. To classify the groups, we used the following predictors: age of athletes at the time of the survey, gender (dummy for male), categorization of sports (dummies), cumulative main-sport practice as well as other-sports practice sessions in each age category during childhood and adolescence (up to 10 years, 11-14 years, and 15-18 years), age at onset of training in the athletes’ main sports and in other sports, and age at first admission to a national squad and to an Olympic Training Center (OTC). The volume of structured/institutionalized practice in a sport reflected the total accumulated volume, but did not differentiate the “micro-structure” within the practice sessions (e.g., proportions of technical skills exercises, physical conditioning, playing forms, stretching, etc.).
The model’s accuracy is assessed by means of a 10-fold cross-validation and the subsequent averaging of the respective areas under the receiver operating characteristic curves (AUROC). Furthermore, the relative importance of the model’s features as well as two-dimensional partial dependency plots for four selected variables are presented; the plots were implemented with the “pdp” package, version 0.6.0 (Greenwell, 2017), in the R environment. Finally, the interaction effect of main-sport practice and other-sports practice in the age category 15 to 18 years is investigated by using a three-dimensional partial dependency plot. We acknowledge that the investigation of an interaction effect would require further analysis. Therefore, we want to emphasize that the results of this study in the mentioned context are a first step. The intention was more to demonstrate the potential of the procedures used for data analyses and less to investigate this question in detail.
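The evaluation scheme described above (k-fold cross-validation with averaging of the fold-wise AUROC) can be sketched in plain Python as follows. This is an illustration under assumptions, not the code used in the study: the folds are stratified by class here so that every fold contains both groups, the one-feature "model" is a hypothetical stand-in for the boosted trees, and the data are simulated.

```python
import random
import statistics


def auroc(labels, scores):
    """Probability that a randomly drawn positive case receives a higher
    score than a randomly drawn negative case (ties count one half).
    A value of 0.5 corresponds to chance."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, seed=0):
    """Deal the indices of each class round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def cross_validated_auroc(xs, ys, fit, k=10, seed=0):
    """Fit on k-1 folds, score the held-out fold, and return the mean and
    standard deviation of the k fold-wise AUROC values."""
    results = []
    for held_out in stratified_folds(ys, k, seed):
        held = set(held_out)
        train = [i for i in range(len(ys)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        results.append(auroc([ys[i] for i in held_out],
                             [model(xs[i]) for i in held_out]))
    return statistics.mean(results), statistics.stdev(results)


def fit_direction(train_x, train_y):
    """Hypothetical one-feature model: score by x, with the sign learned
    from the training folds only."""
    pos = [x for x, y in zip(train_x, train_y) if y == 1]
    neg = [x for x, y in zip(train_x, train_y) if y == 0]
    sign = 1.0 if statistics.mean(pos) >= statistics.mean(neg) else -1.0
    return lambda x: sign * x


# Simulated data: 20 athletes per group, with a small shift in one feature.
rng = random.Random(1)
ys = [0] * 20 + [1] * 20
xs = [rng.gauss(22 + 2 * y, 4) for y in ys]
mean_auc, sd_auc = cross_validated_auroc(xs, ys, fit_direction)
print(round(mean_auc, 2), round(sd_auc, 2))
```

Because the model is always scored on athletes it has not seen during fitting, the averaged AUROC estimates how well the classification generalizes rather than how well it fits the training data.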
Results
The mean AUROC is 0.60 (±0.06). Thus, the model hardly succeeds in making a prediction better than chance (0.50). The relative importance of the model’s features is presented in Figure 2.

Importance of features in the model.
Interestingly, the age of the athletes at the time of the survey is the model’s most important feature. This clearly indicates the necessity of controlling for age when analyzing data; otherwise, results may be age-biased. The two-dimensional partial dependency plot of this feature (Figure 3) shows that the older the athlete at this time, the higher the probability of being classified as a successful/international-level athlete. There seems to be a weak indication of a negative correlation between the age at first admission to an OTC and the probability of being classified as a successful/international-level athlete. In particular, the findings for main-sport practice at 15-18 years demonstrate the need to take nonlinear relationships into account.

Two-dimensional partial dependency plots.
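How a partial dependency curve of this kind is obtained can be illustrated with a short Python sketch: one feature is set to a fixed value for all athletes, the model's predictions are averaged, and this is repeated over a grid of values. The classifier below is a hypothetical toy function, not the fitted model, and the feature names and data are assumptions made for illustration.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over the data with one feature fixed at each
    grid value; all other features keep their observed values."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_model(row):
    """Hypothetical classifier output: rises with age, is elevated for a
    moderate volume of main-sport practice, and ignores other-sports
    practice."""
    p = 0.4 + 0.01 * (row["age"] - 18)
    if 300 <= row["main_15_18"] <= 600:
        p += 0.1
    return p


# Hypothetical athletes (practice volumes in sessions).
rows = [
    {"age": 20, "main_15_18": 250, "other_15_18": 40},
    {"age": 23, "main_15_18": 450, "other_15_18": 0},
    {"age": 27, "main_15_18": 700, "other_15_18": 120},
]

age_curve = partial_dependence(toy_model, rows, "age", [18, 22, 26, 30])
main_curve = partial_dependence(toy_model, rows, "main_15_18", [100, 450, 800])
other_curve = partial_dependence(toy_model, rows, "other_15_18", [0, 100, 200])
print(age_curve, main_curve, other_curve)
```

In this toy case, the age curve rises steadily, the main-sport curve is nonlinear, and the other-sports curve is flat, which is the pattern a feature without influence on the prediction produces.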
Although the model’s performance is very weak, a three-dimensional plot for investigating an interaction effect of main-sport practice at 15-18 years and other-sports practice at 15-18 years is presented in Figure 4. It is intended to illustrate the potential of the machine learning approach applied here for data analysis in the context of talent development in sports.

Three-dimensional partial dependency plot for investigating an interaction effect of main-sport practice at 15-18 years and other-sports practice at 15-18 years.
Keeping in mind the model’s weakness, Figure 4 shows that there does not seem to exist any interaction effect of main-sport practice at 15-18 years and other-sports practice at 15-18 years. This does not seem surprising, considering that other-sports practice in the age category 15 to 18 years has a relative importance of 2.4% in the (weak) model. Other-sports practice at 15-18 years seems to have no influence; this is also demonstrated by the flat course of the feature in the two-dimensional plot (cf. Figure 3).
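One simple way to read such a three-dimensional surface numerically is a double difference across its corners: if the two features act additively, the effect of one feature is the same at every level of the other, and the contrast is zero. The Python sketch below illustrates this idea with two hypothetical toy models; the contrast is an illustrative device chosen here, not a statistic computed in the study, and all names and values are assumptions.

```python
def partial_dependence_2d(predict, rows, feat_a, grid_a, feat_b, grid_b):
    """Average prediction with two features fixed at each pair of grid values."""
    surface = {}
    for a in grid_a:
        for b in grid_b:
            preds = [predict({**row, feat_a: a, feat_b: b}) for row in rows]
            surface[(a, b)] = sum(preds) / len(preds)
    return surface


def interaction_contrast(surface, a_low, a_high, b_low, b_high):
    """Double difference over the corners of the surface; it is zero when
    the effects of the two features are purely additive."""
    return (surface[(a_high, b_high)] - surface[(a_low, b_high)]
            - surface[(a_high, b_low)] + surface[(a_low, b_low)])


# Two hypothetical models (neither is the fitted model from the study).
def additive_model(row):
    return 0.3 + 0.0002 * row["main_15_18"] + 0.0001 * row["other_15_18"]


def interactive_model(row):
    return 0.3 + 0.0000005 * row["main_15_18"] * row["other_15_18"]


rows = [
    {"age": 21, "main_15_18": 400, "other_15_18": 50},
    {"age": 26, "main_15_18": 900, "other_15_18": 0},
]
grid_main, grid_other = [200, 1000], [0, 200]

flat = interaction_contrast(
    partial_dependence_2d(additive_model, rows,
                          "main_15_18", grid_main, "other_15_18", grid_other),
    200, 1000, 0, 200)
curved = interaction_contrast(
    partial_dependence_2d(interactive_model, rows,
                          "main_15_18", grid_main, "other_15_18", grid_other),
    200, 1000, 0, 200)
print(flat, curved)
```

A contrast close to zero, as for the additive toy model, corresponds to a surface without twist; a clearly nonzero contrast indicates that the effect of one feature changes with the level of the other.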
Discussion
This article addresses one of the central questions in talent development in sports, namely how relevant practice in main sport and other sports completed in childhood and adolescence is for international success, and reanalyzes an existing data set with a more efficient procedure to answer this question.
As our review of the literature reveals, there seems to be a kind of terminological confusion on the output as well as the input side of the function for producing sporting success. Owing to insufficient operationalization, this might increase the risk of extrapolating results or drawing unjustified conclusions about the validity of theoretical models. The detailed analysis of the literature in conjunction with our restrictive approach clearly shows that there is a lack of empirical studies examining the relevance of practice in childhood and adolescence for sports talent development and the achievement of international success. Remarkably, only some of these few studies applied statistical procedures able to control for confounding variables (e.g., age), and no study applied procedures such as hierarchical discriminant function analyses or regression analysis. Furthermore, there is a lack of empirically verified theories, and research in this field therefore has to be described as explorative. These considerations call not only for a multivariate approach to data analysis but also for procedures that do not start by assuming an appropriate data model. That is why we have adopted a machine learning approach.
Despite the application of XGBoost, seemingly one of the most promising procedures in supervised machine learning (Song, Chen, Deng, & Li, 2016), it must be noted that on the basis of the features used in this study the model is hardly capable of making a correct classification and thus of differentiating between international-level and national-level athletes. Interestingly, the age of the athletes at the time the survey was conducted is the feature with the relatively highest influence.
The fact that age is the most important predictor in a model hardly able to correctly classify international-level and national-level athletes seems to indicate that the significant differences between international-level and national-level athletes in main-sport practice (e.g., Güllich, 2014; Güllich & Emrich, 2006, 2014; Law, Côté, & Ericsson, 2007) and other-sports practice (Güllich & Emrich, 2006, 2014; Hornig, Aust, & Güllich, 2016; Vaeyens et al., 2009) may represent artifacts of uncontrolled age effects rather than variables that differentiate the two groups. In this study, international-level athletes were older than national-level athletes. Indications of this kind can also be found in other studies. The fact that in five of the 10 studies included in the literature analysis the data were not sufficient to determine this seems extremely problematic. Furthermore, the literature review showed that out of the 10 existing studies only one reports a positive effect of main-sport practice in childhood and adolescence on international success. However, the analysis clearly showed that this study is problematic. Its comparison groups consisted of only six athletes each. Furthermore, athletes of two different nations (cf. Güllich, 2018) were compared, which means that we have to consider a possible effect of differences in the nations’ sport systems. Therefore, the question of whether Law et al.’s (2007) result, in which the successful athletes were just over 2 years younger than the comparatively less successful athletes at the time of the survey, is due to the respective sports must also remain open at this point. Interestingly, this study is cited in several studies without consideration of its methodological weaknesses.
However, the results should not be interpreted in a way that assumes training has no effect; instead, the empirical results of this study indicate that main-sport practice and/or other-sports practice completed during childhood and adolescence are not variables that discriminate between international-level and national-level athletes. In addition, the results illustrate the need to take nonlinear relationships into account in modeling.
Interestingly and in contrast to the results of this study, Güllich (2017) found significant differences in main-sport practice and other-sports practice completed during childhood and adolescence when comparing successful and relatively less successful athletes, with small to medium effect sizes for main-sport practice and medium to large effect sizes for other-sports practice. What makes these results so interesting is the fact that the author applied a matching procedure (with age being one of the matching variables) and therefore had already controlled for a possible age bias.
A potential source of the divergent results might be that success was operationalized in different ways. Compared with Güllich’s study (2017), we used a quite broad definition of success. In addition, the applied matching procedure reduced the sample size to n = 166, which makes it difficult to compare the two data sets—despite the fact that they originate from identical surveys. Furthermore, Güllich’s (2017) data analysis was restricted to a statistical comparison of groups, which means that neither potential nonlinear effects nor interaction effects were taken into account.
Given this first-time investigation of a (nonlinear) interaction effect, the importance of the results should not be exaggerated. We did not discover an interaction effect; however, it has to be said that only an effect within the same age category was investigated. There seems to be a weak indication for a negative correlation between the age at first admission to an OTC and the probability to be classified as successful/international-level athlete.
This study does have limitations. First of all, it has to be said that the model’s response variable is the athletes’ individual success and not the success at the collective level of a nation. Due to the fact that the type and volume of scholastic physical education lessons and of deliberate play in leisure (cf. Côté et al., 2007) were not recorded in the data set, implementing these variables in the model was not possible. However, a positive effect of deliberate play on senior success has not been demonstrated reliably to date (Güllich & Emrich, 2014).
Further predictors should be considered in future projects. However, the intention of the present article was not to create the best possible model, but to work on the question of how relevant juvenile engagements in main sport and other sports are for international success, taking into account further (control) variables (robustness check). In addition, it has to be said that, in this first step, no further cross-validations were made to improve the parameters of the model.
In addition to possible (small) further model improvements by parameter optimization, the question arises as to whether a narrower version of “success” would engender better results in classification. A next step could be to compare this model with a model based on a “narrower version” of success, comparable to that of Güllich (2017), as a response variable. A further model could be calculated based on Güllich’s (2017) data set, which was created by a matching procedure (without using the matching variables as predictors). We should also apply different approaches to our classification problem, compare them, and maybe combine them in an ensemble to possibly improve overall prediction (cf. Oppel et al., 2012). Perhaps the procedure applied in this article is not optimal (basically, for “no free lunch theorems for optimization” compare Wolpert & Macready, 1997). Other projects seem to suggest that the relative performance of different procedures depends on the data at hand. Thinking in bigger steps, multifactorial research on “nature and nurture” as well as the incorporation of other factors such as contextual factors (e.g., socioeconomic factors; Coutinho et al., 2016) seems to be promising, even more so with new approaches from machine learning to data analysis, which make it possible to develop more interactive and dynamic models. Furthermore, we should not only rely on “habitual factors” but also implement “situational factors” (e.g., competition anxiety immediately before the event). Moreover, further features would enable us to better exploit the potential of machine learning methods. In our first implementation (Barth & Emrich, 2018) we used more variables. However, we find it more valuable to limit our variables to those in childhood and adolescence, as we wanted to work on the question of whether these variables have the potential to make predictions.
Moreover, against the background of the significant age differences found in this study, the question of whether this approach of operationalization does actually compare experts and nonexperts (relative approach) or rather experts and “not-yet-experts” arises. In general, the categorization in experts and nonexperts does not seem to do justice to the multidimensional, bipolar structure of a relative model. Completing athletes’ biographies by means of document analysis will not always be possible. In terms of the validity of the approach, however, this attempt seems desirable.
Conclusion
In the context of the specialization–diversification debate, the present results indicate that, from today’s perspective, the debate concerns a “production function,” the structure of which is still unknown. Even by implementing one of the seemingly most promising procedures in supervised machine learning (Song et al., 2016), we have not (yet) been able to discover a model with an acceptable detection rate.
Apparently, the institutionalized programs for talent development, which in essence consist of an expansion of the available training time and a more intensive usage of the individual units of time (extensive and intensive time economy; for economics of time in training, see Emrich & Güllich, 2005; Fröhlich, Emrich, & Büch, 2007), are an expression of highly rationalized myths rather than evidence-based efficient norms. The function of these myths for sports organizations thus seems to lie in the area of the legitimization function rather than the production function (for differentiation see, among others, Emrich & Güllich, 2005). Although, as stated above, we have to be careful in drawing conclusions on deliberate practice (DP), the results seem to be in line with Hambrick, Burgoyne, Macnamara, and Ullén (2018), stating that “key claims of the deliberate practice view are not well supported by the available evidence” (p. 287).
It became evident that every methodological advance requires a willingness, in the sense of organized skepticism, to fundamentally revise previously provisionally confirmed correlations. Research, especially in a field that is, as shown above, largely without theory and therefore explorative, must be aware that generally only the contemporary state of error is reported. It also became clear, however, that the production of sporting success through the production of high performance for the purpose of standing out in international competitions is not only highly uncertain in its results; there is also obviously no really reliable knowledge of the means that can be used to achieve these goals.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
