Abstract
In recent years, scientific output on youth soccer has increased. However, no panoramic map of research on this subject exists. The aim of this study was to identify global research trends in youth soccer over time across the main levels of analysis: sources, authors, documents, and keywords. The bibliometric software Biblioshiny was used to analyze 2606 articles in Web of Science (WoS) published between 2012 and 2021. The main conclusions are that US and UK scholars dominate the research; that research topics shift with practical needs, with performance remaining of consistent interest; and that talent identification and development, performance, injury prevention, and concussion are the topics of greatest interest to scholars in this area. These findings, which offer a global picture of youth soccer research over time, can help future research in this or similar domains.
Introduction
As the most popular sport in the world, soccer has generated enormous social wealth and a sizable human market. 1 According to the FIFA Big Count 2006, there are 270 million active soccer participants worldwide (4% of the world's population), including referees and officials. 2 Players’ pay has increased significantly as a result of the sport’s growing professionalization and commercialization, 3 as well as the need for clubs to succeed at a high level of competition. 4 According to the FIFA Global Transfer Report 2021, 5 while there were 185 transactions among FIFA’s 211 member associations in 2021, two fewer than in 2020, there were 4544 international moves by clubs, and the total transfer fee was USD 4.86 billion.
The requirement for excellent performance and rising costs have sparked a global arms race to find youth talent. 6 Additionally, knowledge about the performance of youth soccer players can help different professionals decide how this information can be used in the future. The World Health Organization and UNICEF define youth as the group aged 15 to 24 years. However, the age range of adolescents differs across soccer-related studies, which makes it difficult to form age group criteria for some indicators. 7 In a review by McCalman, 8 the age range of adolescents was 9–23 years, whereas in another study, 9 the starting age of this group was 7 years. Given the difficulty in defining the age range, the literature for the current study was selected by reviewing only articles whose titles contained the word youth. To the best of our knowledge, studies of youth soccer have targeted groups that are more likely to be players with the potential to become professionals rather than recreational and amateur soccer groups in the same age range.
Inspired by deliberate practice theory, 10 researchers have conducted numerous studies related to youth soccer. These studies encompass various aspects such as talent identification and development (TID) in soccer, 11,12 which focuses on the process of recognizing and nurturing young players’ potential. Additionally, the relative age effect 13,14 refers to the phenomenon where players born earlier in the selection year tend to have a competitive advantage due to age-related physical and psychological differences. Furthermore, researchers have also explored incident injuries 15,16 in youth soccer, which refers to injuries that occur during matches or training sessions. Understanding the causes, patterns, and consequences of such injuries is crucial for ensuring the safety and well-being of young soccer players. In addition to these areas of study, researchers have also examined match performance, 17 among other factors, to gain insights into various aspects of youth soccer.
Literature searches reveal growth in research on these linked topics. Since more academic articles are being published every year, it is becoming increasingly difficult to keep up with everything that is published. 18 It is therefore difficult to observe, from a holistic perspective, information implicit in the research, such as research trends, hot spots, collaborator networks, and citations. To illustrate the knowledge structures and dynamic evolution of a particular research subject, bibliometric analysis uses statistical methodologies that, as a quantitative method, can give a detailed overview of a study area, topic, or issue. 19 Building on previous studies, we aimed to map the youth soccer literature by analyzing the influence of various nations, organizations, authors, journals, and terms using Biblioshiny. The study’s findings may offer broad perspectives and recommendations about the current state of youth soccer research worldwide.
Methodology
The bibliometric method is widely used in multiple fields, such as climate change, 20 patent mining, 21 and sports, 22 with commonly used software including CiteSpace, 23 VOSviewer, 24 SciMAT, 25 and R-tool. 26
The purpose of this article is to evaluate publications, citations, and information sources addressing the developments in the field of youth soccer by following the steps of a bibliometric study. To characterize the bibliographic data, number of publications, citations, authors, and citations per nation, this paper employs bibliometric indicators. The study also examines the intellectual frameworks that show how articles are received by the scientific community as well as the social networks of authors, universities, and collaboration between nations.
Biblioshiny, the web interface of the bibliometrix R package, was used for statistical processing. The process was divided into two parts: data collection and bibliometric analysis.
Data collection
In this process, published research was retrieved from the Web of Science (WoS) Core Collection by Clarivate, the most widely recommended 27 and reliable 28 international bibliometric database. The following search strategy was used: TS (topic search) = (youth OR juven*) AND ((football OR soccer) NOT rugby), with time span = Jan. 1, 2012–Dec. 31, 2021. The search took place on Nov. 15, 2022.
To enhance the precision of this investigation, the retrieved papers were screened independently by BL and CJZ based on title and abstract, with oversight from BG, to ascertain adherence to the predetermined inclusion and exclusion criteria. When there was a discrepancy between the two reviewers, an internal discussion was first conducted to reach a consensus. If the disagreement persisted after the discussion, a third reviewer, BG, made the final decision. The inclusion criteria for this study were as follows: (1) the articles were required to have a specific focus on the domain of youth soccer; (2) only articles written in English were considered eligible. The exclusion criteria were as follows: (1) announcements, book reviews, and conference papers were excluded, so that only regular research papers remained and the focus stayed on journal distribution analysis; (2) studies focusing on recreational soccer populations were excluded, while those primarily concerned with competitive youth soccer populations were retained. A total of 2803 records were retrieved. Reviews, books, and records in other languages were eliminated, as we aimed to map articles written in English, leaving a final set of 2606 articles published between 2012 and 2021 (Figure 1).

Data collection flow diagram.
Analysis
In the analysis process, bibliometric techniques were used to extract networks at different levels (Table 1). Co-citation analysis was first proposed by the American information scientist Small as a technique to gauge the degree of linkage between documents. Two (or more) papers that are simultaneously cited by one or more subsequent papers are said to constitute a co-citation relationship. 29 For example, if document A cites both documents C and D, then C and D are co-cited. The number of documents that cite both is called the co-citation intensity; in this case the co-citation intensity is 1, because only document A cites both C and D. Co-word analysis is a technique for examining the actual content of a publication. Words in co-word analysis are usually derived from “author keywords,” but in their absence, words of interest can also be extracted from “article title,” “abstract,” and “full text” for analysis. Similar to co-citation analysis, co-word analysis assumes that words that frequently occur together are thematically related. 30 A co-author network is a common form of scientific collaboration network, established through the cooperative relationships among the authors of a paper. If two authors have co-authored one or more papers, a link is created between them. Co-authorship networks can provide a powerful tool for studying the structure of academic relationship networks. 31
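The three network types described above share one counting step: for every document, each unordered pair of items that appear together (cited references, keywords, or authors) gains one unit of weight. The following is a minimal Python sketch of that step, not the implementation used by Biblioshiny; the function name `pair_counts` and the toy data are hypothetical and chosen only to mirror the A, C, D example in the text.

```python
from collections import Counter
from itertools import combinations


def pair_counts(groups):
    """Count how many groups contain each unordered pair of items.

    Each group is one document's list of cited references (co-citation),
    keywords (co-word), or authors (co-authorship).
    """
    counts = Counter()
    for group in groups:
        # Deduplicate and sort so each pair is counted once per document
        # and always stored in the same order.
        for a, b in combinations(sorted(set(group)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical reference lists: document A cites C and D, document B cites only C.
reference_lists = {"A": ["C", "D"], "B": ["C"]}
co_citation = pair_counts(reference_lists.values())
print(co_citation[("C", "D")])  # 1, because only A cites both C and D
```

Passing author lists or keyword lists instead of reference lists yields co-authorship or co-word link weights in the same way.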
Specifications of the analysis.
Source: Adapted from “Bibliometrix: An R-tool for comprehensive science mapping analysis.” 18
Result and discussion
Research trends
According to the annual scientific production on youth soccer from 2012 to 2021 (Figure 2) and the main bibliometric information (Table 2), studies in the last decade have grown rapidly at an annual rate of 14.86%. As shown in Table 2, during the previous decade, 2455 papers were written by 7077 authors worldwide. Notably, the analysis revealed large figures for co-authors per document and international co-authorship. This indicates that authors in this field prefer to collaborate and that there is a high percentage of international collaboration, which is in line with the globalization of soccer.
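An annual growth rate of this kind is often computed as a compound rate between the first and last year of the period. The text does not state the exact formula, so the sketch below is an assumption, and the function name and the publication counts are hypothetical rather than taken from Table 2.

```python
def annual_growth_rate(first_count, last_count, n_intervals):
    """Compound annual growth rate, in percent, between two yearly counts.

    n_intervals is the number of year-to-year steps (e.g. 9 for 2012 to 2021).
    """
    if first_count <= 0 or n_intervals <= 0:
        raise ValueError("first_count and n_intervals must be positive")
    return ((last_count / first_count) ** (1 / n_intervals) - 1) * 100


# Hypothetical counts (not from the paper): 100 papers in the first year
# and 121 papers two years later.
rate = annual_growth_rate(100, 121, 2)
print(round(rate, 2))  # 10.0
```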

Growth in sources.
Main information of the papers.
Sources
With regard to sources, the most productive journals are shown in Table 3. In the past ten years, the numbers of articles published in the Journal of Sports Sciences and the Journal of Strength and Conditioning Research were exceptionally high, at 168 and 138, respectively, accounting for 6.8% and 5.6% of the total number of articles published.
Top 10 productive journals in youth soccer.
With regard to the most locally cited sources, 15 journals were cited more than 1000 times. The Journal of Sports Sciences, Journal of Strength and Conditioning Research, and British Journal of Sports Medicine make up the top three, with 6870, 5426, and 4470 citations, respectively, far exceeding peer journals. The ranking of these three journals is unchanged when the perspective shifts to Source Impact, measured by H-index. Interestingly, although Sports Medicine ranks 18th in the number of publications (36), its H-index of 18 equals that of the British Journal of Sports Medicine. This suggests that although Sports Medicine does not publish a high volume of articles on youth soccer, those it does publish tend to be highly cited.
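For readers less familiar with the H-index used in this comparison: a source (or author) has an H-index of h when h of its papers have each been cited at least h times. The sketch below applies that standard definition to a list of per-paper citation counts; the function name and the sample counts are hypothetical, and the values reported in this study come from Biblioshiny rather than from this code.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for the papers of one journal.
print(h_index([10, 8, 5, 4, 3]))  # 4
```

This is why a journal with few papers can match a more prolific one: a small set of consistently well-cited papers is enough to reach the same h.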
Regarding the Source Dynamics of the top 10 journals (Figure 3), the cumulative publication volume of most journals showed steady growth, with two exceptions. The British Journal of Sports Medicine was firmly in third place in terms of cumulative number of publications until 2016; however, it was subsequently overtaken by seven journals, including the International Journal of Environmental Research and Public Health, International Journal of Sports Science & Coaching, and Journal of Human Kinetics. The journal with the largest recent increase in papers on youth soccer is the International Journal of Environmental Research and Public Health, which rose from tenth in 2019 to third in 2021, closely following the Journal of Sports Sciences and Journal of Strength and Conditioning Research.

Source dynamics of the top 10 productive journals.
Three clusters were discovered in the source co-citation network (Figure 4). In general, the journals at each cluster's furthest edges show a weaker relationship to the other clusters, while the size of the nodes shows the intensity of connection. 32 Notably, cluster 3 is located between the other two clusters and connects them, and it also shows an advantage in terms of both betweenness and number of publications. Analysis shows that more than half of the top ten productive journals are located in cluster 3, led by the Journal of Sports Sciences and Sports Medicine. Although cluster 1 does not have more publications than cluster 3, its betweenness is larger than that of the other clusters, driven by the British Journal of Sports Medicine and Medicine & Science in Sports & Exercise. In contrast, the journals in cluster 2 rank below the other two clusters in both number of publications and centrality.

Co-citation network of sources.
According to the titles of the journals, the red cluster is dominated by sports medicine journals, such as the British Journal of Sports Medicine, American Journal of Sports Medicine, and Clinical Journal of Sport Medicine. These journals are more inclined to publish research on medical topics, such as the ECG of high-level junior soccer players, 33 the left ventricular function of pre-adolescent soccer players, 34 and head impacts. 35 The Journal of Sports Sciences, the Journal of Strength and Conditioning Research, and Sports Medicine are in the green cluster, which is more focused on physical fitness, 36 training, 37 and sports medicine. 38 The blue cluster is dominated by sports psychology topics such as motivation 39 and attitudes toward moral decisions, 40 represented by journals such as Psychology of Sport and Exercise, Journal of Applied Sport Psychology, and The Sport Psychologist.
Authors
Regarding the analysis at the author level, the results present first the most relevant authors and some bibliometric indicators (Table 4), then the authors’ production over time (Figure 5), and finally the author collaboration network (Figure 6).

Top authors’ production over time.

Author collaboration network.
Most relevant authors.
As can be observed in Table 4, the number of an author's publications and the fractionalized frequency follow a consistent trend. The H-index and total citations, by contrast, fluctuate. Clemente FM ranks fifth in the top 10 over the last decade but has a low citation count of merely 427. In contrast, Vaeyens R has 30 publications but a high H-index of 21, only 2 points lower than the top two authors. His total citations are similarly high, which indicates that this author has not only a quantitative advantage but also higher-quality articles.
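Fractionalized frequency is commonly understood as fractional counting of authorship: each paper contributes one unit of credit in total, split equally among its co-authors. The sketch below illustrates that idea under this assumption; the function name and the author labels are hypothetical and the numbers are not those of Table 4.

```python
from collections import defaultdict


def fractionalized_frequency(author_lists):
    """Give each author 1/n credit for every paper written with n distinct authors."""
    totals = defaultdict(float)
    for authors in author_lists:
        unique = set(authors)
        if not unique:
            continue  # skip records without author data
        for author in unique:
            totals[author] += 1 / len(unique)
    return dict(totals)


# Hypothetical author lists for three papers.
papers = [["Author A", "Author B"], ["Author A"],
          ["Author A", "Author B", "Author C", "Author D"]]
freq = fractionalized_frequency(papers)
print(freq["Author A"])  # 0.5 + 1 + 0.25 = 1.75
```

Because the credit sums to one per paper, an author who publishes mostly in large teams has a fractionalized frequency well below the raw publication count.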
Top authors’ production over time can effectively reflect the time distribution and quantity of the authors’ articles over a certain period. As shown in Figure 5, five authors, including Lloyd RS, Oliver JL, and Malina RM, are both productive and consistent: they have published papers every year for the past decade and have sustained high total citations. Clemente FM has only been involved in the field of youth soccer since 2016; however, he has the best performance in terms of output and total citations in the last two years. He has 28 research outputs in 2020 and 2021, which explains why he has high output but does not rank high in total citations among the most relevant authors.
By analyzing the collaboration network, a total of 10 clusters were generated, and most of them were interconnected, except for cluster 4 and cluster 10. Cluster 1 had the largest number of highly productive authors, accounting for half of the top ten, and studies by authors in this cluster focused on maturation, 41 relative age effects, 42 mental fatigue, 43 and injuries 44 in youth soccer players. Although cluster 2 had fewer authors than cluster 1, the authors with the highest number of publications, Lloyd RS, Oliver JL, and Myer GD, were in cluster 2, and their research focused on the youth physical development model, 45 injury risk screening, 46 bio-banding, 47 and neuromuscular control. 48 Authors in cluster 3 focused on themes related to small-sided games, 49 plyometric training, 50 and physical fitness training. 51
Country collaboration, production, and citation
The country cooperation social networks, national and international collaboration, as well as productivity and citation networks, were investigated to analyze the nations with the highest levels of production based on the country of authors.
Figure 7 shows the collaboration network among the top 30 countries. Node size reflects a country's overall amount of collaboration, while link thickness reflects the number of collaborations between the two linked countries. The countries that collaborate most with others are the United States (US) and the United Kingdom (UK). The UK collaborates with 47 countries, and the US with 44. The most frequent collaborations were between the US and the UK and between the UK and Australia, with 86 and 60 co-authored papers, respectively.

Country collaboration network.
With regard to the most cited countries (Table 5), the US and the UK dominate again. The US had the highest total number of citations at 11,289, followed by the UK at 7929. Canada, Australia, and Spain are in third to fifth place, but their total citations are well behind the top two. An interesting pattern emerged for average citations per year (ACP): of the top 10 countries by total citations, only Switzerland, the Netherlands, and Canada also appear in the top 10 by ACP. Colombia, Kuwait, and Nigeria ranked in the top three by ACP, even though their total citations were not as high.
Most cited countries.
TC: total citations; ACP: average citations per year.
Cited local documents
The 10 most locally cited documents (Table 6) are mainly from the Journal of Sports Sciences, British Journal of Sports Medicine, Annals of Biomedical Engineering, Strength and Conditioning Journal, and Journal of Strength and Conditioning Research. Based on document clusters coupled by abstracts, two clusters were identified (Figure 8). The literature in cluster 1 (blue cluster) is centered on Sarmento H 52 and focuses on research related to youth talent identification. In cluster 2, by contrast, scholars are more focused on injury prevention in football, with Steffen as the main representative. 53 Cluster 2 has a high impact factor and centrality, as observed from the axes, indicating that this is a well-defined and long-running set of research topics. Despite the low impact factor and centrality of cluster 1, Cobo MJ 54 argues that a dynamic analysis of the literature in this zone is needed to understand its contribution. Following this suggestion, we analyzed the normalized local citation score of Sarmento H, which was as high as 10.24, indicating far more local citations than his peers.
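The text does not define how the normalized local citation score is computed. One common normalization, assumed here purely for illustration, divides a document's citation count by the mean citation count of all documents in the collection published in the same year, so that a score of 1 means "average for its year". The function name and the sample records below are hypothetical.

```python
from collections import defaultdict


def normalized_citations(docs):
    """Divide each document's citations by the mean for its publication year.

    docs is a list of (doc_id, year, citations) tuples.
    """
    by_year = defaultdict(list)
    for _, year, cites in docs:
        by_year[year].append(cites)

    scores = {}
    for doc_id, year, cites in docs:
        mean = sum(by_year[year]) / len(by_year[year])
        # A year in which no document was cited gets a score of 0 by convention here.
        scores[doc_id] = cites / mean if mean > 0 else 0.0
    return scores


# Hypothetical records (not from Table 6).
docs = [("d1", 2016, 30), ("d2", 2016, 10), ("d3", 2020, 4)]
scores = normalized_citations(docs)
print(scores["d1"])  # 30 / mean(30, 10) = 1.5
```

Under this assumption, a score of 10.24 would mean roughly ten times the average local citations of documents from the same year, which is what makes recent papers comparable with older ones.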

Documents coupling.
The most local cited documents.
LC: local citations; GC: global citations.
Keyword
The words most frequently used by the authors are soccer (football), youth, concussion, youth (sport), adolescents, talent identification, performance, injury prevention, and so on (Figure 9). An analysis of research trends (Figure 10) shows that the number of studies on the themes of velocity, blood lactate, high-intensity running, and aerobic endurance has decreased since 2013, whereas the hot-spot themes in recent years are head impact, coach-athlete relationship, and performance.

Word cloud of youth soccer.

Trend topic of youth soccer.
According to the co-word analysis (Figure 11), four clusters were found, reflecting four main areas of research. Because the co-occurrence network was extracted from the authors' keywords, the analysis provides an understanding of the research areas on which academics are focusing.

Keywords co-occurrences network.
In cluster 1, scholars have mainly conducted research on TID in youth soccer. The broad topic of TID is a vibrant area of research, with a large number of books, systematic reviews, narrative reviews, and empirical papers published in the last 20 years. 55 The sub-themes in this area are predictors for the selection of youth players, 56 the relative age effect, 57 biological maturity, 58 and methodological issues. 7
The studies in cluster 2 are relatively comprehensive, as the words with the highest betweenness centrality, such as soccer, football, and youth, are in this cluster. This cluster overlaps with both the sports performance topics of cluster 3 and the injury prevention studies of cluster 4. Additionally, it contains studies related to training 36 and fatigue. 59
The theme of cluster 3 is sports performance. In terms of physical performance, indicators such as speed, 60 strength, 61 and agility 62 are included. A common testing protocol is to use Global Positioning System 63 and other devices to monitor heart rate 64 and other relevant athletic indicators 65 during small-sided games, 66 which are soccer matches played with fewer players on a smaller field, promoting enhanced player involvement, skill development, and tactical understanding.
Cluster 4 has fewer nodes, compared to other clusters. However, in this cluster, the research is more focused on the theme of concussion in youth soccer. 67 Compared to other sports injuries, concussions are not as easily noticed and have a significantly negative impact on players’ health. Concussion risk has been a hot topic in soccer, with several retired players falling ill from the effects of head impacts in recent years, 68 which has sparked public concern. Research hotspots in the field of youth soccer regarding concussion include prevention and identification strategies, 69 clinical management and rehabilitation, 70 long-term consequences and monitoring, 71 education and awareness, 72 and policy development. 73
Theme map and thematic evolution
A thematic map was analyzed to understand the research themes (Figure 12). The results show that studies related to youth soccer over the decade can be divided into five thematic clusters, with the representative labels children, performance, reliability, perception, and epidemiology. To better understand how themes changed, we analyzed the thematic evolution of the publications (Figure 13). The number of themes shows an increasing trend, growing from four in the first phase to six in 2021. Notably, the performance theme exists in all phases, and its proportion of studies shows an upward trend. Research topics in this field have evolved from epidemiology, football, sport, and performance in 2012 to performance, reliability, concussion, and risk factors in 2021.

Thematic map.

Thematic evolution.
Conclusions
This bibliometric study examined global trends in scientific production on youth soccer from 2012 to 2021. The analysis revealed an increase in the number of sources, authors, and documents, indicating growing interest in youth soccer research. Collaboration among authors increased, with a significant international collaboration rate of 38.13%. Journals such as Journal of Sports Sciences and Journal of Strength and Conditioning Research played prominent roles in publishing youth soccer articles. Key authors like Lloyd RS, Malina RM, and Clemente FM contributed significantly to the field. Important research themes focused on talent identification, physical fitness, and sports injury prevention in youth soccer players. The findings highlight the international scope and evolving research topics within the field of youth soccer. Future research in this area could explore emerging areas such as head impact, coach-athlete relationships, relative age effects, body composition, monitoring, and performance, further advancing our understanding of youth soccer and its implications for player development and injury prevention.
Limitations
This study has several limitations that should be acknowledged. Firstly, the literature search was conducted using the WoS database, which may result in the omission of relevant publications not indexed in WoS. Although efforts were made to minimize this limitation by using one of the most influential databases, it is possible that some relevant studies were not included in the analysis. Secondly, it is important to note that the term “youth” is commonly used in international journal publications to refer to the age group of interest in youth soccer research. However, the age range associated with the term “youth” varies across different countries and contexts. This poses challenges in accurately defining the age range in this study. It is recommended that future research on youth soccer clearly report the specific age range under investigation to ensure consistency and comparability across studies. Despite these limitations, the findings of this study provide valuable insights into the trends and research themes in the field of youth soccer. The limitations do not undermine the overall assessment of the trends or the implications for future research in this area.
Footnotes
Acknowledgements
The authors thank the National Social Science Foundation of China for funding this study.
Contributorship
Bo Gong proposed the research design, Bo Liu conducted the article, and Changjing Zhou and Haowei Ma proposed revisions to the paper. All authors participated in the revision and finalization of the manuscript, and read and agreed to the published version of the manuscript.
Consent statement
All contributing authors of this manuscript have given the submission consent. In addition, patient consent is not necessary for this manuscript, as it is a bibliometric study.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical approval
Not applicable, because this article does not contain any studies with human or animal subjects.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Social Science Foundation of China (grant number 20BTY089).
Guarantor
BL.
