Abstract
Statistical methods are essential in sports sciences for decision-making in performance analysis, injury prevention, and athlete outcomes. Generalized Linear Mixed Models (GLMMs) are widely used to estimate fixed and random effects, particularly when dependent variables are binary, ordinal, count, or non-normally distributed quantitative data. Alternative models, such as Vector Generalized Additive Models (VGAM) and transformation mixed-effects models (tramME), may also be appropriate for specific data structures, especially in repeated measures contexts. This scoping review, following PRISMA guidelines, examines the use and reporting of GLMMs in sports sciences. A search of articles published before March 14, 2023, identified 55 studies from databases such as PubMed and Web of Science. GLMMs were primarily applied in soccer (20%) and multidisciplinary sports (16.4%). The most common response variable distributions were Poisson and binary (25.7% each), while overdispersion was not evaluated in 75% of the studies with count or multinomial responses. R was the most frequently used software (41.8%), but only 34.3% of articles specified the statistical package. Data and/or code sharing was reported in 17.1% of articles. Key information about the GLMMs was not reported in most articles, indicating a need to improve reporting quality in line with current recommendations for the use of GLMMs.
Introduction
Sports statistics has emerged as a field of considerable interest, spurred by the availability of resources and the drive to enhance sports performance and management. Notably, the American Statistical Association (ASA) established a Section on Statistics in Sports in 1992, responding to the growing demand for the development and application of statistical methodologies within the sports domain (https://community.amstat.org/sis/home). Similarly, the Special Interest Group in Sports Statistics of the International Statistical Institute (https://www.isi-web.org/committee/special-interest-group-sports-statistics) aims to promote global understanding, development, and best practices in sports statistics. The creation of the SCORE Network (https://scorenetwork.org/) in 2023 represents an important step forward, aligning with the burgeoning interest in sports analytics. Numerous journals, including Chance, The American Statistician, and The Statistician, have featured articles on the statistical analysis of sports data. The launch of the Journal of Quantitative Analysis in Sports in 2005 marked a notable milestone, as it was the first academic journal dedicated exclusively to the statistical analysis of sports. Conferences such as those organized by MathSport International and the Joint Statistical Meetings (JSM), together with seminars and webinars by the S-Training group (Sports – Training and Research in Data Science Methods for Analytics and Injury Prevention Group, https://s-training.eu/Homepage.html), further underline the significance of sports statistics within the scientific community.
Sports and statistics share a long history, which has evolved into sports analytics, a field that has grown significantly in recent years. Sports biostatistics, a subfield focused on applying statistical methods to sports data related to injury prevention, athletes' health, and performance, has made notable contributions (Casals and Finch, 2017; Knudson, 2020; Sainani et al., 2021; Vagenas et al., 2018). The rapid growth of this discipline is reflected in its advances in statistical modeling, concepts, and meta-research (Bullock et al., 2022; Mansournia et al., 2021; Nielsen et al., 2017; Nielsen et al., 2020a, 2020b; Sainani et al., 2021; Schulz et al., 2022; Warmenhoven et al., 2025). The demand for professionals with expertise in this area is increasing as sports organizations and academies seek to capitalize on the vast amounts of available data for competitive advantage (Evans et al., 2016; Miller, 2015; Mondello and Kamke, 2014; Reddy, 2023).
As is common in other fields, sports science problems are typically approached with objectives that can be exploratory, predictive, or causal (Nielsen et al., 2020a). The complexity of research designs needed to adequately test relevant hypotheses often results in hierarchical data structures. In sports and exercise medicine research, it is common to encounter players or athletes observed within larger units such as teams, games, competitions, and seasons. This data structure is also prevalent in longitudinal studies where some of the variables are collected as repeated measurements within subjects.
Generalized Linear Mixed Models (GLMMs) are an extension of Generalized Linear Models (GLMs) that incorporate random effects into the linear predictor (Breslow and Clayton, 1993). GLMMs offer a more flexible analysis when the response variable follows a non-Gaussian distribution and the assumption of independence is violated, because they model grouped data through random effects. Moreover, GLMMs are valuable for addressing issues such as overdispersion (McGilchrist, 1994) and autocorrelation in Poisson or binomial models. In the literature, GLMMs are also referred to as hierarchical generalized linear models (HGLMs) or multilevel generalized linear models (MGLMs), depending on the research field in which they are applied. Other statistical methods, such as gradient boosting, random forests, and neural networks, provide alternative approaches for analyzing non-Gaussian outcomes with mixed (fixed and random) effects, though from a different perspective. These methods fall under the domain of statistical learning. They offer additional flexibility and are especially useful for large and complex datasets. Such non-parametric models can capture intricate patterns in the data, making them viable alternatives when traditional GLMMs face limitations.
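The general model and one of its consequences can be made concrete with some notation. The notation is ours and is not taken from the reviewed articles. Here y_ij is observation j in cluster i (for example, a match within a player), beta holds the fixed effects, b_i the cluster-specific random effects with covariance G, and g is the link function.

The second part of the block takes a Poisson model with a single normally distributed random intercept. It shows that the marginal variance exceeds the marginal mean. This is one way to see how random effects accommodate overdispersion.

```latex
\begin{aligned}
g\bigl(\mathrm{E}[y_{ij}\mid \mathbf{b}_i]\bigr) &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + \mathbf{z}_{ij}^{\top}\mathbf{b}_i,
  \qquad \mathbf{b}_i \sim N(\mathbf{0}, \mathbf{G}),\\[4pt]
y_{ij}\mid b_i \sim \mathrm{Poisson}(\lambda_{ij}),\quad
\log\lambda_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + b_i,
  \qquad b_i \sim N(0,\sigma_b^2),\\[4pt]
\mu_{ij} = \mathrm{E}[y_{ij}] = \mathrm{E}[\lambda_{ij}]
  &= \exp\!\bigl(\mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + \sigma_b^2/2\bigr),\\[4pt]
\mathrm{Var}(y_{ij}) = \mathrm{E}[\lambda_{ij}] + \mathrm{Var}(\lambda_{ij})
  &= \mu_{ij} + \bigl(e^{\sigma_b^2}-1\bigr)\,\mu_{ij}^{2} \;>\; \mu_{ij}
  \quad \text{whenever } \sigma_b^2 > 0 .
\end{aligned}
```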
In sports, GLMMs have been used in various disciplines. For example, they have been applied to identify the variables that most influence scoring among NBA players, by fitting a GLMM with a Poisson-distributed response variable (points scored) (Casals and Martinez, 2013). Similarly, they have been used to analyze whether the level of competition (club, state, or international) affects the intensity of women's rugby matches (Newans et al., 2022), with the response variable (number of fouls committed) following a Poisson distribution. Furthermore, studies in American football and basketball have used GLMMs to model normal or Poisson responses with binary predictions (Broatch and Karl, 2018). However, we suspect that, while GLMMs have become more accessible through statistical software such as R, JASP, Jamovi, and SPSS, they are often applied without a comprehensive understanding of the model's assumptions, specification, and validation. Studies such as Newans et al. (2022) and Iannaccone et al. (2021) highlight the advantages of mixed models in handling hierarchical and repeated measures data. They also suggest that these models are sometimes used as a default statistical approach, without adequate consideration of key issues such as overdispersion, multicollinearity, and proper model validation. Such misuse of GLMMs can lead to biased or misleading results, as discussed by Bolker et al. (2009). The need for rigor in applying these models has been highlighted across fields, including medicine and psychology (Bono et al., 2021; Casals et al., 2014).
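Clustering matters even for a simple count outcome, and a short simulation can show why. The Python sketch below uses hypothetical data, not those of the studies cited above. It simulates point counts for players nested in teams under a Poisson GLMM with a team random intercept, log(lambda_ij) = beta0 + b_i with b_i ~ N(0, sigma_b^2).

Each count is Poisson given its team effect, yet the pooled counts are clearly overdispersed. The helper `marginal_moments` returns the closed-form marginal mean and variance of this lognormal-Poisson mixture for comparison. The values of `beta0`, `sigma_b` and the numbers of teams and players are illustrative assumptions.

```python
import math
import random
import statistics


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_points(n_teams=200, players_per_team=12, beta0=2.0, sigma_b=0.5, seed=1):
    """Hypothetical counts from log(lambda_ij) = beta0 + b_i, with b_i ~ N(0, sigma_b^2)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_teams):
        b_i = rng.gauss(0.0, sigma_b)   # team-level random intercept
        lam = math.exp(beta0 + b_i)     # Poisson mean shared by the team's players
        counts.extend(poisson_draw(lam, rng) for _ in range(players_per_team))
    return counts


def marginal_moments(beta0, sigma_b):
    """Theoretical marginal mean and variance of the pooled (unconditional) counts."""
    mu = math.exp(beta0 + sigma_b ** 2 / 2)
    var = mu + (math.exp(sigma_b ** 2) - 1) * mu ** 2
    return mu, var


counts = simulate_points()
print("simulated mean, variance:", round(statistics.fmean(counts), 2), round(statistics.variance(counts), 2))
print("theoretical mean, variance:", tuple(round(v, 2) for v in marginal_moments(2.0, 0.5)))
```

A plain Poisson GLM fitted to these pooled counts would ignore the team structure. It would therefore understate the variability, which is the situation GLMMs are meant to address.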
Despite the widespread use of GLMMs in sports, there has been no comprehensive review of their application and the quality of reporting in this domain. To our knowledge, this study represents the first scoping review of GLMM application in sports sciences. The aim is to map the existing literature on this topic, identify research gaps, and provide an overview of the available evidence. By doing so, we seek to inform best practices in GLMM reporting and foster a more rigorous approach to their use in sports research.
Methods
Study design
We carried out a scoping review on the application of GLMMs in the sports sciences, following the guidelines of the PRISMA Extension for Scoping Reviews (PRISMA-ScR) statement (Tricco et al., 2018).
Search strategy
We conducted a comprehensive search across multiple databases, including PubMed Central (PMC) and Web of Science (WoS). We also searched specialized sports statistics journals not indexed in those databases, such as the Journal of Quantitative Analysis in Sports (JQAS) and the South African Statistical Journal (SASJ). In all sources, we applied the same Boolean operators and keywords:
((“sport*”) AND (“mixe*” OR “multilevel*” OR “hierarch*” OR “generali*” OR “GLMM*” OR “HGLM*” OR “MGLM*”) AND (“ordinal” OR “binary” OR “count” OR “nominal” OR “categorical” OR “dichot*” OR “polyto*”)).
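The Boolean logic of this search string can be made explicit with a short Python sketch: OR within each parenthesized block, AND across blocks, and `*` as truncation. It is only an approximation under stated assumptions. It matches terms anywhere in a plain-text title or abstract and treats un-truncated terms as whole words. It also ignores the database-specific rules (field tags, phrase handling, automatic term mapping) that PubMed and Web of Science actually apply. The names `CONCEPT_BLOCKS`, `term_to_regex` and `matches_query` are hypothetical.

```python
import re

# The three AND-ed concept blocks of the search string; '*' denotes truncation.
CONCEPT_BLOCKS = [
    ["sport*"],
    ["mixe*", "multilevel*", "hierarch*", "generali*", "GLMM*", "HGLM*", "MGLM*"],
    ["ordinal", "binary", "count", "nominal", "categorical", "dichot*", "polyto*"],
]


def term_to_regex(term):
    """Translate a term such as 'dichot*' into a case-insensitive, word-initial regex."""
    stem = re.escape(term.rstrip("*"))
    tail = r"\w*" if term.endswith("*") else r"\b"  # truncated: any ending; else whole word
    return re.compile(r"\b" + stem + tail, re.IGNORECASE)


def matches_query(text):
    """True if the text contains at least one term of every block (OR within, AND across)."""
    return all(
        any(term_to_regex(term).search(text) for term in block)
        for block in CONCEPT_BLOCKS
    )


print(matches_query("A generalized linear mixed model for count outcomes in sports"))  # True
print(matches_query("Mixed models for binary outcomes in cardiology"))  # False: no sport* term
```

Word-initial matching means that, for example, "transport" does not satisfy the `sport*` block, because the stem must start a word.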
The final search was performed on March 14, 2023.
Selection of studies
To determine the eligibility of the studies, the following inclusion and exclusion criteria were defined:
Inclusion criteria: Studies were included in the scoping review if they were original research articles, written in English, that applied GLMM regression models within the context of sports sciences.
Exclusion criteria: Excluded from the review were conference proceedings, theses, dissertations, and non-original articles (such as special issues, opinions, and reviews). Additionally, articles written in languages other than English and those unrelated to sports sciences were excluded. It is important to note that activities classified as physical activity, exercise, or recreational sports were not considered part of sports sciences for this scoping review.
The study selection process involved two stages. First, studies were screened on their titles and abstracts. Second, when the title and abstract were insufficient to determine eligibility, the full text of the article was reviewed to confirm that it met the inclusion criteria.
Identification of studies
Figure 1 shows the PRISMA-ScR flowchart (Tricco et al., 2018), which summarizes all stages of the selection process. In the first phase, a total of 4341 articles were retrieved using the Boolean terms and keywords mentioned above. They came from PubMed Central (PMC) (n = 2611), Web of Science (n = 1672), the Journal of Quantitative Analysis in Sports (n = 49) and the South African Statistical Journal (n = 9). Then, 45 duplicate articles were excluded. In the second phase, after inspection of the abstracts, articles that were non-original (n = 316), written in a language other than English (n = 17), or covered a non-sports topic (n = 3717) were excluded. In the third phase, the full texts of 246 potentially eligible articles were obtained, and 191 articles were excluded for not using GLMM methodology. Finally, 55 articles were included for detailed review (Table S1 of the Supplementary Material).
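The record counts in this flow are internally consistent, and a few lines of Python can check the arithmetic of each stage. The dictionary keys are our own labels; all numbers are those reported above.

```python
# Record counts as reported in the PRISMA-ScR flow.
identified = {"PubMed Central": 2611, "Web of Science": 1672, "JQAS": 49, "SASJ": 9}
duplicates = 45
excluded_at_screening = {"non-original": 316, "non-English": 17, "non-sports topic": 3717}
excluded_at_full_text = 191  # did not use GLMM methodology

total_identified = sum(identified.values())
after_deduplication = total_identified - duplicates
full_text_assessed = after_deduplication - sum(excluded_at_screening.values())
included = full_text_assessed - excluded_at_full_text

print(total_identified, after_deduplication, full_text_assessed, included)  # 4341 4296 246 55
```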

PRISMA flowchart of the scoping review of the application of GLMMs in original articles in the field of sports science.
Data extraction
The information collected from the selected studies was categorized into three main groups: 1) general characteristics of the articles; 2) characteristics of the sport itself; and 3) characteristics of the GLMM regression models. The first group included information such as authors, the country of origin of the data, the year of publication, the name and type of journal, the number of participants and their age, the purpose of the study, and whether the article was open access (Table S2 of the Supplementary Material).
The second group included variables such as the type of sport, the gender of the participants (male, female, or both), the category of participants (professional, amateur, or both), the source of the data, and the specific field of sports science addressed in the article (e.g. sports performance analysis, sports technology, movement integration and health) (Table S3 of the Supplementary Material).
For the last group, we extracted variables related to the inference and estimation methods reported in the studies, as well as details on model specification, validation, and construction (Table S4 of the Supplementary Material). In line with our study objectives, special attention was given to aspects that could reflect suboptimal methodological implementation, insufficient transparency in reporting, or both. For instance, the specification of fixed and random effects is a key modeling decision, and when it is not clearly documented, it also raises concerns about reproducibility. All data were collected and stored in a database for further analysis.
Statistical analysis
A comprehensive descriptive analysis was conducted to explore the information recorded in the scoping review. Categorical variables were summarized with frequencies and percentages, and continuous variables with the median and interquartile range (IQR). The results are presented in both tabular and graphical formats. The statistical analysis was performed using R version 4.4.1 (R Core Team 2022). The reproducible code used in this study is available in a public GitHub repository (https://github.com/marticasals/GLMM_SR_Sports), which makes the analysis transparent and reproducible. Other researchers and practitioners can use it to reproduce the study's findings and build upon the work. They can also update the analysis by including studies published after the date of our final search (March 14, 2023).
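The two kinds of summaries can be sketched in Python. This is only an illustrative re-expression: the authors' actual analysis was carried out in R. The study-design labels reproduce the counts reported in Table 1, whereas the participant numbers are hypothetical.

Quartiles are computed with `statistics.quantiles(..., method="inclusive")`, which corresponds to R's default quantile definition (type 7). Other definitions can give slightly different IQRs. The function names are our own.

```python
import statistics
from collections import Counter


def frequency_table(values):
    """Counts and percentages (1 decimal) per category, most frequent first."""
    n = len(values)
    return {cat: (c, round(100 * c / n, 1)) for cat, c in Counter(values).most_common()}


def median_iqr(values):
    """Median and (Q1, Q3); 'inclusive' quartiles correspond to R's default type 7."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), (q1, q3)


# Study-design labels reproducing the counts reported in Table 1 (55 articles).
design = ["longitudinal"] * 35 + ["not reported"] * 15 + ["cross-sectional"] * 5
print(frequency_table(design))
# {'longitudinal': (35, 63.6), 'not reported': (15, 27.3), 'cross-sectional': (5, 9.1)}

# Hypothetical participant numbers for five articles.
print(median_iqr([20, 45, 130, 210, 400]))
# (130, (45.0, 210.0))
```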
Results
General characteristics of the articles
Table 1 presents the general characteristics of the selected articles. In total, 32 articles (58.2%) were published in sports journals, 19 articles (34.5%) in multidisciplinary journals, and 4 articles (7.3%) in statistics journals. Regarding study design, 43 articles (78.2%) employed a multilevel design. Additionally, 35 articles (63.6%) were longitudinal/repeated measures studies, 5 articles (9.1%) were cross-sectional studies, and 15 articles (27.3%) did not report their study design. The median number of participants in the reviewed articles was 130, and the median age of the participants was 22.5 years. Moreover, 36 articles (65.5%) were published as open access, while 19 articles (34.5%) were not. The largest share of the data (20 articles, 36.4%) originated from the USA, and 15 articles (27.3%) were published in the Journal of Quantitative Analysis in Sports (Figure 2).

Distribution of the countries where the data from the articles come from (panel (A)) and the name of the journal where the articles were published (panel (B)) among the 55 selected articles.
General characteristics of the 55 selected articles in the scoping review. Frequencies (percentages) are shown for categorical variables and median (IQR) for continuous variables.
ᵃ Not reported = 9; ᵇ Not reported = 35.
Figure 3 presents the annual distribution of the reviewed articles, suggesting an overall increasing trend in publications using GLMMs in sports sciences. A noticeable rise is observed from 2015 onwards, with a peak in 2021. While 20.0% of the articles were published up to 2014, this proportion increased to 29.1% in each of the two most recent periods (2018–2020 and 2021–2023).

Number of articles per year of publication (2011–2023).
As shown in Figure 4, most of the selected articles (35/55; 63.6%) had a longitudinal (repeated measures) design, and 43 articles (78.2%) used a multilevel design. Furthermore, 23 articles (41.8%) used the statistical software R for their analyses.

Distribution of study designs (panel (A)), use of multilevel models (panel (B)), and statistical software employed (panel (C)) across the 55 selected articles.
General characteristics of the sports
Of the sports analyzed, the three most frequently studied were soccer (11 articles, 20%), multidisciplinary sports (9 articles, 16.4%), and baseball (7 articles, 12.7%). A total of 39 articles (70.9%) focused on sports performance analysis, 13 articles (23.6%) on health, and 3 articles (5.5%) on academic performance. Regarding the participants, 35 articles (63.6%) studied professional athletes, while 18 articles (32.7%) involved amateurs. In terms of gender, 3 articles (5.5%) included only female participants, and 27 articles (49.1%) included only male participants. Another 15 articles (27.3%) included both female and male participants, and 10 articles (18.2%) did not report the gender of the participants (Table 2). Figure 5 illustrates the distribution of the sports analyzed (panel A), the gender of participants (panel B), and the participant category (panel C) across the selected articles. These panels complement Table 2. They highlight the predominant sports studied, the focus on male participants, and the higher proportion of studies involving professional athletes.

Distribution of the sports disciplines studied (panel (A)), participant gender (panel (B)) and the professional category of the participants (panel (C)) according to the type of sport category in the 55 selected articles.
General characteristics of the sports in the 55 selected articles in the scoping review. Frequencies (percentages) are shown.
General characteristics of the GLMMs
Table 3 presents the general characteristics of the GLMMs in the 55 selected articles. Fifty articles (90.9%) did not report the estimation method used. Of the rest, 3 articles (5.5%) used Gauss-Hermite quadrature, 1 article (1.8%) used Markov chain Monte Carlo (MCMC) methods, and another article (1.8%) used penalized quasi-likelihood/Laplace approximation. Of the 16 articles that modeled Poisson- or multinomial-distributed response variables, 12 (75%) did not assess overdispersion. Additionally, 50 articles (90.9%) did not report whether any validation was performed for their GLMMs. Most articles did not share their data or code: only 9 articles (16.4%) shared data and 8 articles (14.5%) shared code.
General characteristics of the GLMMs in the 55 selected articles of the scoping review. Frequencies (percentages) are shown.
Furthermore, 7 articles (12.7%) did not report the distribution of the response variable. Binary responses were used in 17 articles (32.7%), Poisson-distributed responses in 10 articles (18.2%), normal responses in 8 articles (14.5%), and ordinal responses in 6 articles (10.9%). In addition, 43 articles (78.2%) did not report the method used for model selection (Table S5 of the Supplementary Material).
Discussion
To the best of our knowledge, this is the first review focusing specifically on the application and reporting of GLMMs within the sports sciences. The sports sciences field, which remains relatively unexplored in terms of methodological rigor for GLMMs, greatly benefits from statistical methods that handle complex data structures. Our study aims to address this gap by offering an overview of how GLMMs are currently used and reported and providing insights into the appropriate analysis and communication of such models.
The results of this review suggest an increasing trend in the use of GLMMs in sports sciences over the years (Figure 3). This mirrors trends observed in other disciplines, such as clinical medicine and psychology (Bono et al., 2021; Casals et al., 2014), where GLMMs have gained popularity due to their versatility in modeling both fixed and random effects. However, as in those disciplines, the comprehensive reporting and validation of these models in sports sciences have not kept pace with their increasing use.
Despite the increasing use of GLMMs, our findings indicate substantial deficiencies in reporting quality. Most notably, key methodological details, such as the evaluation of overdispersion (reported in only 9.1% of the articles) and model selection methods (reported in 21.8%), were frequently omitted. Additionally, data and code sharing were limited: only 16.4% and 14.5% of the articles, respectively, provided open access to these essential research materials (Moher et al., 2024; Tenan and Alejo, 2024).
The lack of overdispersion checks is particularly concerning. Unaccounted-for overdispersion typically leads to underestimated standard errors and overly narrow confidence intervals. It therefore produces misleading inferences in models that assume a specific mean–variance relationship for the response variable (e.g. Poisson or binomial distributions). Furthermore, without clear information on model selection processes, it is difficult to assess whether the chosen models were appropriate for the data. These reporting gaps substantially hinder the reproducibility and reliability of GLMM analyses in sports research.
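A common first check for Poisson-type models is the ratio of the sum of squared Pearson residuals to the residual degrees of freedom, as described for instance in the GLMM FAQ (https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html). Values well above 1 suggest overdispersion. The Python sketch below computes this ratio from observed counts and fitted Poisson means.

It is a simplified illustration under stated assumptions. The counts are hypothetical, and for a GLMM the number of parameters used in the residual degrees of freedom is itself an approximation. For an intercept-only Poisson model the fitted mean is the sample mean, so the ratio reduces to the sample variance-to-mean ratio. The function name `pearson_dispersion` is our own.

```python
import statistics


def pearson_dispersion(observed, fitted, n_params):
    """Sum of squared Pearson residuals, (y - mu)^2 / mu, divided by residual df (n - p)."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    residual_df = len(observed) - n_params
    return chi2 / residual_df


# Hypothetical counts (e.g. injuries per athlete); intercept-only Poisson fit, so mu = sample mean.
counts = [0, 1, 1, 2, 2, 3, 9, 12]
mu_hat = statistics.fmean(counts)
ratio = pearson_dispersion(counts, [mu_hat] * len(counts), n_params=1)
print(round(ratio, 2))  # 5.01, well above 1
```

In practice, the fitted means would come from the fitted GLMM. A formal test, or a comparison with a negative binomial or observation-level random-effect model, would usually follow such a screening value.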
Comparison with other disciplines
When comparing our findings with those in other fields, it is clear that the reporting standards in sports sciences need improvement. Similar reviews in clinical medicine (Casals et al., 2014) and psychology (Bono et al., 2021) have identified comparable challenges in reporting model assumptions, selection methods, and validation, and have emphasized the proper reporting of these aspects. For example, 78.2% of GLMM applications in sports sciences did not report the method of model selection, compared with 84.3% in clinical medicine and 69.5% in psychology. Sports sciences thus occupy an intermediate position between these two fields. Nevertheless, across all three disciplines the proportion of insufficient reporting is alarmingly high, which may seriously affect the validity, reproducibility, and decision-making value of the scientific findings. This gap underscores the need for sports science researchers to adopt best practices from other disciplines. One such practice is adhering to established guidelines for reporting mixed models where available (e.g., https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html, TRIPOD, CHAMP) (Mansournia et al., 2021). As a discipline-specific and immediately actionable resource, we propose the “Checklist for GLMM Reporting in Sports Sciences” (see Table S6 in the Supplementary Material). This checklist, developed based on our review findings, provides concrete questions covering study design, data structure, fixed and random effects, overdispersion, model selection and validation, interpretation of parameters, and reproducibility.
We believe that applying this checklist at the manuscript preparation stage will help authors improve the quality, transparency, and consistency of statistical reporting in sports sciences.
Implications for future research
The deficiencies identified in this review present an opportunity to improve the methodological quality of future research in sports sciences. Researchers should prioritize the clear reporting of model assumptions, overdispersion checks, and variable selection methods. Additionally, journals should encourage or mandate the sharing of data and code to promote transparency and reproducibility. In addition to data and code sharing, the open access status of publications also plays a key role in research dissemination. In our review, only 65.5% of the articles were open access. Promoting open access publication can enhance visibility, reproducibility, and collaboration in sports science research. Moreover, we encourage journals to adopt open science practices more forcefully, including the requirement of data and code availability statements in all empirical research. This is particularly relevant in the context of GLMMs, where reproducibility and clarity in model implementation are essential for evaluating methodological soundness and enabling future replication studies.
One research gap identified in this review is the unequal representation of athletes by gender. Only male athletes were included in 49.1% of the studies, compared with only female athletes in 5.5%, reflecting a prevailing focus on male and professional sports. Amateur sports were represented in only 32.7% of the studies examined. A representation of studies on amateur sports between 50% and 70% would better reflect the real distribution of the sporting population, since the majority of sport participants are amateurs. However, we acknowledge that the current underrepresentation may be partly due to challenges such as limited data accessibility, resource constraints, and funding issues. Empowering relevant stakeholders, including researchers, sports organizations, and policymakers, is essential to overcome these barriers and facilitate more inclusive and representative research in sports sciences. These findings mirror those of previous reviews on ordinal models (Fernández et al., 2025) and statistical software use in sports science (Casals et al., 2023). This suggests that gender and sport-type imbalances are not unique to GLMM-based research but are systemic across multiple areas of the field. These disparities have implications for the generalizability of the findings and highlight the need for more inclusive and representative study designs.
To address these dual biases, future research should actively promote the inclusion of women's sports and minority or amateur disciplines. This would improve the relevance and applicability of statistical models across the full spectrum of sporting contexts. Initiatives such as the SCORE network (Sports Content for Outreach, Research, and Education) play a key role in fostering visibility and research in underrepresented areas, particularly with a focus on women's sports and less mainstream disciplines.
Furthermore, there is a need for more training and education on the correct application of GLMMs in sports sciences. As the use of these models continues to grow, it is critical that researchers are equipped with the necessary statistical knowledge to avoid common pitfalls, such as misinterpreting random effects or failing to assess the adequacy of model fit. To assist researchers in improving the quality and transparency of GLMM reporting, we have compiled a checklist of key questions that authors should consider when publishing a paper involving GLMMs (Table S6 of the Supplementary Material). The checklist is structured according to the two aspects outlined in Table S7 (Supplementary Material), “Transparency in reporting” and “Methodological issues”, providing a coherent framework for both reflection and practical application.
Limitations and strengths of the study
This study has some limitations. Firstly, our review only included articles published up to March 2023, and thus, more recent studies could potentially provide additional insights. Secondly, our review was limited to specific databases, which may have excluded some relevant articles from niche journals. It should be noted that this review included only studies explicitly focused on sports, excluding those dealing with physical activity and exercise. Although the broader category of sports sciences may encompass these areas, our aim was to focus on competitive sports contexts, as done in previous reviews (Fernández et al., 2025). This scope should be taken into account when interpreting the generalizability of the findings. Despite these limitations, this review provides a comprehensive overview of the use of GLMMs in sports sciences, identifying key areas where reporting can be improved and offering a foundation for future methodological improvements in the field.
Conclusion
In conclusion, while the use of GLMMs in sports sciences is increasing, our review highlights significant gaps in the reporting and application of these models. To ensure the validity and reliability of future research, it is important that researchers adopt best practices for GLMM reporting, including overdispersion checks, model selection transparency, and data sharing. Addressing these gaps will enhance the quality of sports science research, ultimately benefiting decision-making processes related to performance analysis, injury prevention, and athlete health.
Supplemental Material
sj-docx-1-san-10.1177_22150218251384557 - Supplemental material for Reporting of generalized linear mixed models (GLMM) in sports sciences: A scoping review
Supplemental material, sj-docx-1-san-10.1177_22150218251384557 for Reporting of generalized linear mixed models (GLMM) in sports sciences: A scoping review by Martí Casals, Daniel Fernández, Lore Zumeta-Olaskoaga, Arnau Sánchez and Paola Zuccolotto in Journal of Sports Analytics
Footnotes
Acknowledgements
The authors would like to thank Ben Bolker for his helpful comments on earlier drafts of this article. Daniel Fernández is a Serra-Húnter Fellow and a member of the Centro de Investigación Biomédica en Red de Salud Mental (Instituto de Salud Carlos III). The work of Martí Casals and Daniel Fernández has been supported by MICIU/AEI/10.13039/501100011033 (Spain) and by FEDER (EU) [PID2023-148033OB-C21], and by grant 2021 SGR 01421 (GRBIO) administered by the Departament de Recerca i Universitats de la Generalitat de Catalunya (Spain).
Ethics statement
We did not seek ethical approval for this work, as all information used and reported is freely available via online sources.
Authors’ contributions
All authors wrote the article and critically read it. All authors have read and approved the final version of the manuscript and agree with the order of presentation of the authors.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Ministerio de Ciencia, Innovación y Universidades (Spain) (grant number PID2023-148033OB-C21).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
References
