Abstract
The aim of this study was to develop an objective and detailed notational analysis system for 3 vs. 2 + GK small-sided soccer games, in which three roles are examined: attacker with ball, attacker without ball and defender. The actions and the outcomes of the actions were registered for each player and in each role. Players earn points for each action and outcome according to an a priori determined scheme. Performance scores for each role are calculated as the average number of points a participant earns per trial. This notation system was tested on 19 highly talented female soccer players, and the validity and reliability of the system were determined. In addition, practical applications were discussed, the most important items of the notation system were determined and, using only these items, a simplified notation system was proposed. The notation system has high ecological validity and can discriminate between the high and low categorized players, but further development is necessary to increase the reliability of the system.
Introduction
Assessing tactical skills of team sport players is challenging but interesting for both sport practice and science. In sport practice, trainers, coaches and scouts want an easy tool to determine the quality of performance, identify strengths and weaknesses and follow the developments of players. Scientifically, an objective method to assess tactical skills of team sport players on the field would be valuable for research on expertise and decision making.
Bard and Fleury 1 were the first to attempt to objectively examine decision making skills by presenting slides of offensive basketball game situations to experienced basketball players and novices, after which they had to verbalize their response. However, the validity and reliability of this test were not reported. Better ecological validity would be acquired using film clips, as argued by Helsen and Pauwels. 2 They were among the first to develop a film-based decision-making test, which has been used frequently ever since.3–6
However, the most ecologically valid way of measuring decision making or tactical skills is by using game play.7–9 By coding behaviours exhibited during game play, actual performance can be assessed. This is more authentic and represents one’s ability more accurately. 10 In sports and physical education, there is an increasing interest in developing performance assessment instruments that can be used on game play performances. In a review, Arias and Castejón 11 showed that the two most often cited assessment instruments are the Team Sport Assessment Procedure (TSAP) and the Game Performance Assessment Instrument (GPAI).
The TSAP of Gréhaigne et al. 12 was designed for invasion games and examines how players gain ball possession and how they play the ball. Ball possession can be gained by conquering or receiving the ball, and then, the player can play a neutral ball, lose the ball, play an offensive ball or execute a successful shot. Based on the frequencies of occurrence, the volume of play and efficiency index can be calculated and those two combined yield a performance score. Although this is an easy-to-use assessment instrument, its major limitation is that it only examines the player in possession of the ball. Since a player carries the ball for less than 2% of the game,13–15 it is essential that a performance assessment instrument for team sports also includes the performances of players off-the-ball.
The GPAI designed by Oslin et al. 16 is the most frequently used assessment instrument 11 and includes both on-the-ball and off-the-ball movements. Oslin et al. 16 aimed for a performance assessment instrument that can be used for any kind of game and identified general game components for which the observer has to assess the appropriateness of the player’s behaviour. For example, each time a player is in ball possession, the observer assesses the decisions made. These are coded as appropriate if the player chooses to shoot or pass to an open teammate when the opportunity is available, and as inappropriate if the player does not pass at an appropriate time or passes to a marked teammate. Thus, the observer has to decide whether players are open or marked, whether a pass is given at the appropriate time or not, etc., and this leads to a high level of subjectivity in the assessment process.
Other, more recent, performance assessment instruments used general tactical principles of the game (e.g. ‘penetration’ or ‘offensive coverage’ as in FUT-SAT17,18) or did not assess the performances of all the players (i.e. attackers and defenders) involved in the game (e.g. Game Performance Evaluation Tool 19 ; for an overview of performance assessment instruments, see 11 or 20 ). This inspired us to develop a detailed and, in our view, more objective notation system in which the performances of all players are assessed, that is, attacker with ball, attacker without ball and defender. For each role, the actions of the participants are registered as well as the outcomes of the actions. Depending on the outcome of the action, the participant earns points for each action according to the a priori determined point distribution, so that the user of the system is not required to judge the quality or appropriateness of the actions performed by the players. Performance scores for each role are calculated as the average number of points a participant earns per trial in that role.
The aim of the current study was to examine the validity and reliability of the notation system among highly talented soccer players. Validity was determined with regard to ecological, content, concurrent and construct validity. To determine the reliability of the notation system, inter- and intra-observer reliability were assessed. Subsequently, the most important items of the notation system were determined and, using only these items, a simplified notation system was proposed. Finally, practical applications were discussed.
Method
Participants
A total of 19 highly talented female soccer players participated in this study, with a mean age of 16.3 years (
Procedure
To assess the performances of the players (i.e. attackers and defenders), we chose to use 3 vs. 2 + GK small-sided games (i.e. 3 attackers vs. 2 defenders and a goalkeeper) since these are less complex than 11 vs. 11 matches, facilitate more ball touches per player and are the basics of soccer according to the Royal Netherlands Football Association. 21 The small-sided game was played on a 40-m long and 25-m wide field (dimensions were advised by the head coach of the national soccer talent team) with official sized goals, and official soccer rules, including offside, were applied.
The six players were instructed to start at specific locations (Figure 1). The attackers’ task was to try to score as quickly as possible, whereas the defenders had to prevent that. If the defenders obtained ball possession, they had to try to score at the opposite goal. However, the turnover was included only for motivational reasons; the notational analysis was carried out only on the performance prior to the change of ball possession (the participants were unaware of this). The trial ended if a goal was scored, a foul was made or the ball went out of play. The variables that were measured are explained in the section ‘Notation system’. After five trials, the participants switched roles (except for the goalkeeper), so that all participants played on each position. Thus, in one test, a participant played 15 attacking trials and 10 defending trials. In total, eight tests were conducted, spread out over 4.5 months. Participants who attended fewer than five tests were excluded from analysis. A total of 733 trials were analysed; on average, a participant played 34 trials (
Overview of the small-sided game. Players are located at their specific starting positions.
The tests took place on the regular training pitch of the national soccer talent team and were video recorded with a GoPro Hero 3 camera (Black Edition, resolution 1920 × 1080, 30 Hz; GoPro, USA) that was fixed on a 6.5-m high platform (Showtec LTB-200/6 Lifting Tower, The Netherlands), and analysed afterwards using the notation system.
Notation system
Actions, outcomes, definitions and allocation of points of notation system.
Depending on the outcome, the participants earned points for the actions they performed. The allocation of points was determined a priori by soccer experts and is shown in Table 1. For example, when a player passes the ball towards a teammate, this teammate receives the ball and the pass was directed forward, then the passing player earns two points. Only for positioning was a slightly different approach used: the registered durations in each of the positioning categories were used to calculate the percentage of time a player spent in each category, and these percentages were then multiplied by the points allocated to each category, as can be found in Table 1. For example, when a player was open, on his own half, in the centre of the field, for 25% of the total time, then this player got 0.25 × 2 = 0.5 points for this category. By adding up the points per trial for each role, and calculating the average number of points a player received per trial, a performance score for each role was computed. There were no minimum or maximum scores, as the performance scores depend on the actions that a player made and on the outcomes of these actions.
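The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ implementation: only the two point values quoted in the text (two points for a received forward pass, two points for being open on the own half in the centre) come from the source; the category names, the zero-point `marked` category and the choice to normalise positioning time within each trial are assumptions made for the example.

```python
# Point values: the two non-zero entries are the examples quoted in the text.
# All keys and the "marked" entry are hypothetical stand-ins for Table 1.
ACTION_POINTS = {("pass", "received_forward"): 2}
POSITIONING_POINTS = {"open_own_half_centre": 2, "marked": 0}


def trial_points(events, positioning_seconds):
    """Points earned in one trial for one role.

    events: list of (action, outcome) tuples registered in the trial.
    positioning_seconds: dict mapping positioning category to duration.
    """
    points = sum(ACTION_POINTS.get(event, 0) for event in events)
    total_time = sum(positioning_seconds.values())
    if total_time > 0:
        for category, seconds in positioning_seconds.items():
            share = seconds / total_time  # fraction of time in this category
            points += share * POSITIONING_POINTS.get(category, 0)
    return points


def performance_score(trials):
    """Average number of points per trial for one role."""
    if not trials:
        return 0.0
    return sum(trial_points(e, p) for e, p in trials) / len(trials)


# Worked example: 25% of the time open on own half in the centre -> 0.5 points.
positioning_only = trial_points([], {"open_own_half_centre": 2.5, "marked": 7.5})
one_forward_pass = trial_points([("pass", "received_forward")], {})
```

Averaging these two hypothetical trials with `performance_score` gives (2 + 0.5) / 2 = 1.25 points per trial.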
Data analysis
Validity
In addition to descriptions of the ecological and content validity of the notation system, the concurrent validity and construct validity were calculated.
Ecological validity
Ecological validity reflects the congruency between the constraints during assessment and real-life situations. Using a representative design, in which the task constraints are similar to the natural performance setting, a high ecological validity is achieved. 22 Our notation system was applied to 3 vs. 2 + GK small-sided games, which enabled the participants to behave naturally; thus, with regard to the task constraints of the assessment method, the ecological validity of our notation system is high. With regard to the actual soccer game, however, the ecological validity can be improved by assessing the performances of the players while playing 11 vs. 11 on a regular-sized pitch instead of 3 vs. 2 + GK small-sided games. Nevertheless, in comparison with previous research, the assessment method used in the current study is a proper representation of the actual performance environment.
Content validity
Content validity was determined by two experts with over 25 years of experience in coaching soccer at national and international level. They provided feedback on the terms and definitions of the notation system and discussed the allocation of points until consensus was reached.
Concurrent validity
Concurrent validity can be determined by correlating the results of a new measurement technique with a reference criterion that is administered at about the same time. 23 In this study, the head coach judged the performances of the players and categorized them as high, medium or low. Categorizations were made for their general performance in the 3 vs. 2 + GK tests and for their specific performances as attacker with ball, attacker without ball and defender. As an indication of concurrent validity, Kendall’s tau correlations 24 were determined between the categorizations of the coach and the performance scores attained with the notation system.
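To make the correlation step concrete, the sketch below computes Kendall’s tau between ordinal coach categories and numeric performance scores. It assumes the tau-b variant, which corrects for ties and therefore suits a three-level categorization; the text does not state which variant was used. The numeric coding of the categories and the sample data are hypothetical.

```python
import math

# Hypothetical ordinal coding of the coach's categories.
COACH_RANK = {"low": 0, "medium": 1, "high": 2}


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equally long sequences of numbers."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from all counts
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denominator = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denominator == 0:
        return float("nan")
    return (concordant - discordant) / denominator


# Illustrative data only, not taken from the study.
categories = ["low", "low", "medium", "high", "high"]
scores = [1.0, 1.5, 2.0, 2.5, 3.0]
tau = kendall_tau_b([COACH_RANK[c] for c in categories], scores)
```

With these made-up values every untied pair is concordant, so tau is high but below 1 because two pairs share the same coach category.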
Construct validity
Construct validity of the notation system was determined by its success in differentiating between the high and low categorized players. Performance scores for the three roles of the high and low categorized players were compared separately using independent
Reliability
The reliability of the notation system was determined using intra-observer and inter-observer reliability.
Intra-observer reliability
A total of 75 trials (10% of the complete dataset) were coded twice by the main researcher to determine intra-observer reliability. Hughes et al. 25 recommend using percentage error as an indicator of reliability for categorical data, with values below 5% considered acceptable. With the exception of positioning, percentage error was calculated for each action and outcome separately, to give insight into the reliability of the separate items. For positioning, the duration of being open or marked was registered, and thus the Pearson correlation between the two data sets was determined as the reliability score.
Inter-observer reliability
Although the main researcher coded all data, an assistant was also trained for 5 h to use the notation system. After training, a total of 118 trials (16% of the complete dataset) were coded by the assistant to assess inter-observer reliability. The percentage error 25 was calculated for all actions and outcomes separately, except for positioning, for which the Pearson correlation between the two coders was assessed.
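The two reliability measures can be illustrated as follows. This is a sketch under stated assumptions: the percentage error is written in one commonly used form, the absolute difference between two counts divided by their mean, and the exact formula applied in the study is not given in the text. The function names and sample numbers are hypothetical.

```python
import math


def percentage_error(count_a, count_b):
    """Absolute difference between two codings relative to their mean, in %."""
    mean_count = (count_a + count_b) / 2
    if mean_count == 0:
        return 0.0  # item never coded by either observer
    return abs(count_a - count_b) / mean_count * 100


def pearson_r(xs, ys):
    """Pearson correlation, e.g. for positioning durations from two codings."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative: an action coded 40 times in one pass and 42 times in the other.
error = percentage_error(40, 42)
acceptable = error < 5  # 5% threshold mentioned in the text
```

Computing the error per action and per outcome, as the text describes, shows which individual items fall above the threshold instead of hiding them in a single overall figure.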
Simplification of the notation system
As it is labour-intensive to register all actions and outcomes for each role, we also examined whether it is possible to simplify the notation system. For each role, we calculated the average occurrence of each action per player per trial and the percentage of points the players earned with each action in relation to the total number of points they earned for that particular role. We also examined the ability to discriminate the high and low categorized players of each action separately by using independent
Practical applications
For coaches, it is valuable to have an easy method to compare the players to each other and to get an overview of the strengths and weaknesses of each individual player. To fulfil this request, we created two easy-to-read graphs based on the results of the notation system. To compare the performances of the players within a team or group, a graphical representation was created of the performance scores for offence (i.e. the sum of the performance scores for the role of attacker with ball and without ball) and defence of each player. Also, the average group scores were displayed. The individual strengths and weaknesses were explored by calculating the points each participant earned for each action separately. We expressed them as
Results
Validity
Concurrent validity
Significant correlations between the categorizations by the coach and the performance scores have been found for general performances, τ = .486,
Construct validity
Construct validity test; comparison of high- and low-skilled players, based on categorizations of head coach.
Reliability
Intra-observer reliability
Intra- and inter-observer reliability, expressed as percentage error, except for positioning, for which Pearson correlation was calculated.
was coded too infrequently in this sample to compute reliability score.
Inter-observer reliability
The inter-observer reliability for each action and outcome that was coded more than 5 times in this sample is displayed in Table 3. The percentage error varied from 0.0% to 45.9%, indicating that some items had high inter-observer reliability and others low. For positioning, a significant correlation was found between the two coders.
Simplification of the notation system
For each action in each role, the mean occurrence per player per trial, the mean percentage of points earned with that action in relation to the total number of points for that role, the mean and standard deviation of the high and low categorized players and the test of the difference between them.
Possibilities to simplify the notation system and effects on construct validity test for each role. Suggested simplification options are given in bold.
Practical applications
The performance scores on offence and defence are displayed in Figure 2 for each participant. Using this graphical representation, it is easy for coaches to see how the players score in comparison to each other. The best players appear in the top right corner and the weakest in the bottom left corner. The defence specialists (i.e. good in defence, weak in offence) are located in the bottom right corner and the offence specialists (i.e. good in offence, weak in defence) in the top left corner. Several soccer coaches have approved the practical relevance of this graph.
Individual scores of each participant for defence and offence. The black lines indicate the average scores of the group.
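The reading of this graph can be expressed as a simple quadrant rule relative to the group averages. The sketch below is illustrative only: the labels, the treatment of scores exactly on the average line and the sample values are assumptions, while the offence score as the sum of the two attacking role scores follows the description in the Method section.

```python
def offence_score(attacker_with_ball, attacker_without_ball):
    """Offence is the sum of the two attacking role scores."""
    return attacker_with_ball + attacker_without_ball


def classify_players(scores):
    """Label each player by quadrant relative to the group means.

    scores: dict mapping player id to (offence, defence) performance scores.
    """
    if not scores:
        return {}
    n = len(scores)
    mean_offence = sum(off for off, _ in scores.values()) / n
    mean_defence = sum(dfn for _, dfn in scores.values()) / n
    labels = {}
    for player, (off, dfn) in scores.items():
        above_offence = off >= mean_offence  # ties count as above (assumption)
        above_defence = dfn >= mean_defence
        if above_offence and above_defence:
            labels[player] = "above average in both"
        elif above_offence:
            labels[player] = "offence specialist"
        elif above_defence:
            labels[player] = "defence specialist"
        else:
            labels[player] = "below average in both"
    return labels


# Hypothetical scores, not the study's data.
example = classify_players({"P1": (4, 4), "P2": (4, 1), "P3": (1, 4), "P4": (1, 1)})
```

Using the group mean as the dividing line mirrors the black lines in the figure; a benchmark from a larger reference group could be substituted if one were available.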
Examples of the individual strengths and weaknesses of two participants are shown in Figure 3. Participant 12 had high performance scores for all three roles, whereas Participant 15 scored low on the roles attacker with ball and defender and above average for the role of attacker without ball. The strengths and weaknesses graphs (Figure 3) reveal that Participant 12 especially excels in passing but may benefit from improving her intercepting skills and although Participant 15 scored on average low on defending, her intercepting skills were above average.
Individual strengths and weaknesses of Participants 12 and 15, expressed as 
Discussion
The aim of this study was to take a first step in developing an objective notation system for small-sided soccer games that examines player performances both on and off the ball. The notation system was tested on highly talented female soccer players from the national talent program. Validity and reliability of the notation system were determined, practical applications were shown and a simplified system was proposed to reduce the workload of the complete notation system.
The notation system has high ecological validity, as a representative design is used in which the task constraints are similar to the natural performance setting, which enables natural behaviour. Assessing the performances of the players while playing regular 11 vs. 11 matches would improve the ecological validity even further and is interesting for future research. Nevertheless, in comparison with previous research, the method we used to assess performance is a proper representation of the actual performance setting. Furthermore, as two experts with over 25 years of experience in coaching soccer at national and international level contributed to the development of the notation system, the content validity of the notation system was warranted.
The concurrent validity of the notation system was found to be significant for each role and for the overall performance score. However, the correlations between the performance scores and the categorizations by the head coach were only medium to large. This could possibly be due to correlating the performance scores with the opinion of one expert instead of a panel of experts. The small sample of 19 players could also have affected the results. Furthermore, these players were all enrolled in the national talent program, meaning that they were all highly skilled and, consequently, large differences were not to be expected. Applying the notation system to a larger and more heterogeneously skilled group of players will probably yield higher concurrent validity.
Construct validity was determined by comparing the performance scores of the high and low categorized players. In each role, the highly skilled players scored significantly higher than the low categorized players, demonstrating the good ability of the notation system to discriminate the high- and low-skilled players.
The intra-observer reliability was good except for running actions and offside. The inter-observer reliability, however, was good for some actions but low for dribbling, 1:1 duel both offensively and defensively, running action, offside, defensive pressure and intercepting. For most of these, recognizing the action was found to be more difficult than determining the outcome of that action, as the reliability scores of the outcomes were more often at an acceptable level than the reliability scores of the actions. The actions that scored low on reliability were all actions that are less objectively identifiable than actions like passes or shots on goal, indicating that improvement in reliability can be expected after clarifying the definitions of those actions. The low reliability of offside is probably due to the fact that it is an item that observers can easily forget to register and, in addition, the camera’s viewpoint (behind the goal) made it difficult to identify offside. The notation system showed reasonably good intra-observer reliability, but the inter-observer reliability requires more attention. The reliability can be improved by defining the actions and outcomes more clearly and by providing more guided training with the notation system than the current 5 h of practice before starting to assess performances.
Another reason for the low reliability scores may be the complexity of the system, as many actions and outcomes need to be registered. Reducing the workload by eliminating actions from the system may also improve the inter-observer reliability. We found that the complexity and workload of the notation system were reduced considerably, while its ability to differentiate the high- from the low-skilled players remained, when only the following actions were included: shooting, dribbling and offensive 1:1 duel for the attacker with ball; running actions, being in promising position and positioning for the attacker without ball; and defensive pressure for the defender.
On the other hand, using specialised cameras and software that can track the positions of the players and ball 26 in combination with specially designed algorithms, the registration of all actions of all players on the field can be automated. An advantage of registering all actions is that it reveals a great deal of specific information about the players, which can be used to create player profiles indicating the strengths and weaknesses of each player, as we showed in the practical applications. These player profiles can be used to evaluate training, to follow the development of the individual players and to set goals for an individualised training program. 27
Also, the comparison of the performances of the players within a team is of practical relevance to coaches and scouts. For example, coaches can easily compare players and choose a more offensively or defensively playing midfielder according to their preferred game strategy. For both practical applications that we showed, a benchmark would be of great value. Then players can be compared to age- and gender-matched top-level players. To achieve this, the performances of many players of different age and gender should be assessed with the notation system.
Until now, the notation system has been used to assess the performances of only 19 players. As these players were all enrolled in the national talent program, and thus preselected on their high skills, large differences in performance among the players were not to be expected. The fact that the notation system was able to discriminate the high from the low categorized players shows the potential of the notation system to assist in talent identification.
Conclusion
The notation system we composed for assessing performances of soccer players in 3 vs. 2 + GK small-sided games seems a good first step towards an objective assessment tool that examines performances both on and off the ball. The notation system differentiates the high- and low-skilled players and has high ecological validity, which may be improved further by examining 11 vs. 11 matches. Further development is necessary to increase the reliability of the system, and a longitudinal study on the use of the system to assist in player evaluation and selection would be valuable.
Footnotes
Acknowledgements
The authors would like to thank head coach Maria van Kortenhof, the other staff members and the players of the CTO Amsterdam Talent Team and Peter van Dort of the KNVB for their cooperation.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partly funded by the Royal Netherlands Football Association (KNVB).
