
Abstract

Masters swimming is a swimming program for adults, featuring both individual and group events. In individual events each swimmer competes against other swimmers in the same age category, whereas in group events any four swimmers can form a relay, and the age category of the relay is given by the sum of the ages of its participants. Setting up relays in masters swimming may be harder than for professional or junior swimming, in which swimmers compete against other athletes in the same age category. In this work, an integer programming model is presented in order to optimize the assembly of relays in masters swimming, and a scheme is implemented to present the results in a friendly format for the coach. These models are applied to data from a team in Buenos Aires City, Argentina.

Keywords

team formation, sports, integer programming

Introduction

Masters swimming is a swimming program for adults, organized by clubs or federations. It is aimed at those who stopped competing due to their age, at people who started practicing the sport in midlife, and in general at those who think that it is never too late to play sports (i-Natación, 2023). The first formal competitions were held in the 1970s and 1980s in Australia, Canada, Great Britain, New Zealand, USA, Japan, Germany, and Italy (Federación Internacional de Natación, 2023). Like many other initiatives, it did not take long to reach the rest of the world with increasing success. The categories in this modality are divided into age groups with a five-year span, starting at 25 years old. There is also a category called “pre-masters”, for swimmers between 20 and 24 years old.

Masters swimming has individual and group events. In individual competitions, each swimmer competes against other athletes in the same age category, i.e., within a range of five years. In group events, though, it is not mandatory to assemble groups of four people in the same age category, since this would impose too strong a constraint given the wide age range of swimmers. Instead, any four people can form a relay, and the category to which they belong is given by the sum of the ages of the participating swimmers, as follows:

Category 1: 100 to 119 years,

Category 2: 120 to 159 years,

Category 3: 160 to 199 years,

Category 4: 200 to 239 years,

Category 5: 240 to 279 years,

Category 6: 280 to 319 years.
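The category rule above can be sketched in a few lines of Python. This is only an illustration: the names `CATEGORY_LIMITS` and `relay_category` are hypothetical and do not come from the authors' implementation.

```python
# Age limits (lower, upper), in years, of the six relay categories listed above.
CATEGORY_LIMITS = {
    1: (100, 119),
    2: (120, 159),
    3: (160, 199),
    4: (200, 239),
    5: (240, 279),
    6: (280, 319),
}


def relay_category(ages):
    """Return the category of a four-swimmer relay from the sum of ages.

    Returns None when the sum falls outside every range, e.g. for a
    pre-masters relay (80 to 99 years).
    """
    if len(ages) != 4:
        raise ValueError("a relay has exactly four swimmers")
    total = sum(ages)
    for category, (lower, upper) in CATEGORY_LIMITS.items():
        if lower <= total <= upper:
            return category
    return None


print(relay_category([25, 31, 47, 62]))  # ages add up to 165, i.e. Category 3
```

Note that swimmers from very different individual age groups (25 and 62 years old in this example) can share a relay, which is what makes the assembly problem combinatorial.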

Pre-masters can only form relays among themselves (i.e., the relay category spans from 80 to 99 years old) and cannot participate in relays of other categories. Due to these facts, swimmers in this category are not considered in this work. Each tournament usually consists of the following swimming events:

Event 1: women’s 4 $\times$ 50m freestyle relay,

Event 2: men’s 4 $\times$ 50m freestyle relay,

Event 3: mixed 4 $\times$ 50m freestyle relay,

Event 4: women’s 4 $\times$ 50m medley relay,

Event 5: men’s 4 $\times$ 50m medley relay,

Event 6: mixed 4 $\times$ 50m medley relay.

In Events 1 and 4, relays are composed of four women. In Events 2 and 5, relays are composed of four men. Finally, in Events 3 and 6, relays are composed of two men and two women. In Events 1 to 3 all swimmers perform freestyle swimming, whereas in Events 4 to 6 each swimmer performs a different style (backstroke, breaststroke, butterfly, and freestyle, in this order).

Due to these rules, setting up relays in masters swimming may be a nontrivial task compared to professional swimming (where athletes are not grouped into categories given by age) and to junior swimming (in which categories span a two-year range and relays are composed of people from the same age category). In this work we tackle this issue by seeking to automatically optimize the assembly of relays prior to a competition, as a tool for assisting the coach’s decisions. Usually, the coach performs this task by hand, having previously registered the swimming times for each athlete and each swimming style on a spreadsheet. This is performed by trying different options and manually calculating the time that would result from each combination the coach comes up with. Clearly, if the team is large, it is not possible to test all possible options by hand.

For these reasons, in this work we propose an integer programming model for optimizing the design of relays prior to a tournament, by taking into account the available swimmers, their age and their best times at each swimming style, and the available events to be performed, in order to maximize the overall chances of winning trophies for the participating team. As we shall show in the next sections, the proposed approach is computationally effective and finds optimal configurations (with respect to the criteria to be defined in the next sections) within reasonable running times.

The remainder of this work is organized as follows. Section “Literature review” reviews similar works in the existing literature. Section “Proposed integer programming model” proposes an integer-programming based approach for optimizing the assembly of relays, and Section “Experimental results” reports our computational experience with this machinery. Finally, Section “Concluding remarks” closes the paper with concluding remarks and lines for future research.

Literature review

The literature contains previous experience applying combinatorial optimization techniques to different problems related to the assembly of swimming teams.

In Nowak and Pollock (2006) high school swimming is considered, in particular when two teams face each other. It is assumed that the opponent’s times are known, and with those times a model is run that estimates the composition of the relays in the opposing team. This is then used to run the final model in order to design the participating relay. The related work (Bailey and Nowak, 2018) also deals with high school swimming and two opposing teams, and is a generalization of Nowak and Pollock (2006). The user has to propose different scenarios for the opposing team’s setup, so that the model can then decide the best setup for its own team, depending on each possible scenario.

In Mancini (2018) masters swimming is considered, but only one age category is analyzed (women from 25 to 29 years old). Therefore, the proposed scheme is analogous to high school swimming. Multiple teams face each other, and the times and setup of the opposing teams are estimated based on previous tournaments. In the similar work Hannan and McKeown (1979), the user must specify the swimming times of the opposing team, and which athletes he/she believes the opposing team is going to present in each event. In all four cases, the aim is to decide in which individual and relay events (only individual events in Hannan and McKeown, 1979 and Mancini, 2018) each swimmer has to participate in order to maximize the points obtained by the team, taking into account the results of the competing teams.

Finally, in Masedu and Angelozzi (2006) only relays are optimized. All athletes are assumed to swim the same style, and age and gender are not mentioned. Different times are considered depending on the starting place, which is more applicable to athletics than to swimming. As a side remark, these models can be applied both to swimming and athletics due to their similarities, although the consideration of swimming styles does not apply in athletics.

Proposed integer programming model

The objective of this work is to propose a computational machinery in order to optimize the assembly of the relays for a team prior to a competition. The motivations for the coach may not always be the same, namely the aim could be to get as many podiums (i.e., top-three finishes) as possible or to set a national/continental record. In the former case as many competitive relays as possible should be assembled, whereas in the latter case just a single competitive relay may be assembled.

In any case, the notion of “competitive relay” is set up against the expected time for the category, which in turn is calculated as a function of the existing record for the corresponding age category and the expected competitiveness of the category. Indeed, assembling the fastest possible relay may not be a good strategy if the age category for the resulting relay is already too competitive and, therefore, the opposing teams are fast (hence the assembled team is less likely to be among the first three teams), or the existing record is too low. If this is the case, maybe slower relays could have more chances in the competition or at setting a record, since the existing record for the corresponding age category or the available competition is not too strong. Due to these facts, we propose to assemble the relays by optimizing the difference between the relay’s time to the corresponding national/continental record plus a certain additional time aiming to capture the competitiveness of the category (more details are provided in the sequel). As a general rule, Category 1 (i.e., the “youngest” category) and Categories 5 and 6 (i.e., the “oldest” categories) usually have a smaller number of participating relays than Categories 2–4 since it is difficult to assemble a relay within the corresponding age ranges, due to the lack of many young/senior swimmers. Hence, a relatively slow relay could have better chances in Categories 1, 5, or 6, compared to a comparatively similar relay in Categories 2–4.

We consider each event, each pool size (25 or 50 meters), and each type of record (national or continental) individually, thus obtaining $6 \times 2 \times 2 = 24$ problems to be solved. We seek a “global solution” by setting up many competitive relays, although the fastest relay may not be identified by this solution. We solve a different problem for each event since, with the exception of some tournaments, the swimmers who can participate in each event do not overlap: the competitions for women and men are held on the first day, whereas mixed relays compete on the second day.

We could also be interested in finding the best relay for each event, pool size, and type of record, i.e., finding a single relay, the one with the shortest distance to the target time for the resulting age category. The obtained solution will not necessarily be the fastest relay since records depend on the age category and, in general, the higher the category, the greater the record time. This solution provides the best attainable relay, and may be an interesting reference for the coach. There may be a preference to set a record in one category rather than another (for example, it may be that in the category with the highest chances of breaking a record, the current record is already held by the club, and it is therefore preferable to aim for a record in a new category). Once the best relay to be submitted has been identified, the global model can be run again with the remaining swimmers.

Tackling these problems involves solving one or several combinatorial optimization problems, i.e., problems asking for the best solution with respect to a certain objective function over a set of solutions defined by combinatorial considerations (Korte and Vygen, 2012). For each event, pool size, and type of record, we propose to solve this combinatorial optimization problem with the following integer programming formulation. Let $N$ be the set of swimmers and let $E = \{1, \dots, 4\}$ be the set of four swimming styles in the order performed in medley events. Let $R = \{1, \dots, r\}$ be the set of relays to be formed, and let $C = \{1, \dots, 6\}$ be the set of age categories. We also assume the following parameters to be given.

For $i \in N$ and $j \in E$, we consider a random variable $γ_{i j}$ representing the time of the swimmer $i$ with the style $j$ in a competition corresponding to the event’s pool size. We define $t_{i j} \in \mathbb{R}_{+}$ to be the mean and $s_{i j} \in \mathbb{R}_{+}$ to be the standard deviation of this random variable.

For $i \in N$, we define $e_{i} \in \mathbb{Z}_{+}$ to be the age in years of the swimmer $i$.

For $i \in N$, we define $s_{i} \in \{0, 1\}$ to be the gender of the swimmer $i$, represented by $s_{i} = 1$ if the swimmer $i$ is a woman and by $s_{i} = 0$ if the swimmer $i$ is a man.

For each category $c \in C$, we define $ℓ_{c}, u_{c} \in \mathbb{Z}_{+}$ to be the lower and upper age limits for that category, respectively. As an example, we have $ℓ_{1} = 100$ and $u_{1} = 119$, whereas $ℓ_{2} = 120$ and $u_{2} = 159$.

For each age category $c \in C$, we define $r_{c} \in \mathbb{R}_{+}$ to be the record (either national or continental, depending on the setting) in seconds for the category $c$. This parameter depends on the particular event being considered.

Finally, for each age category $c \in C$, we define $n_{c} \in \mathbb{R}_{+}$ to be the additional time over the record $r_{c}$ that is expected to be attained during the competition by the age category $c$. This parameter acts as a proxy of the competitiveness of the age category, and should be larger for “young” and “senior” categories, as mentioned before. We refer to this parameter as the normalization term for the age category.

Some comments on the parameters are in order. The random variable representing the time for each swimmer and each style is in general not known and, furthermore, evolves with the swimmers’ age and training (Alshdokhi et al., 2020; Costa et al., 2011). Furthermore, this random variable corresponds to the swimmer’s performance in competitions (and not in training), since the preparation for a competition implies that the times set during a competition are better than the times recorded during training. Due to these facts, we usually have very few data points with which to estimate the parameters of this random variable. Although there are long-term studies that give a general idea of these distributions (Born et al., 2022; Post et al., 2020), this task will have a certain amount of uncertainty associated with the lack of data for amateur swimmers.

Another parameter that is subject to an educated guess is the competitiveness of each age category. Although in general we expect Categories 2–4 to be more competitive due to a larger number of participating relays, in general the number and strength of the rival relays is difficult to estimate. We would like to “normalize” in some sense the distance to the record of the corresponding age category in order to capture the fact that a small distance to the record in a very competitive category will not be as good as a relatively larger distance to the record in a less competitive category. Since we will solve the problem with an integer programming formulation, we propose to consider $r_{c} + n_{c}$ as the target time for each category $c \in C$ and we try to minimize the distance between the relay time and the target time. Competitive categories should be assigned a smaller normalization term.

In order to obtain the expected time of a relay, the mean times of the four swimmers in the relay are added and then 1.5 seconds are subtracted from this figure. This corresponds to half a second for each swimmer from the second to the last, accounting for a shortened reaction time. Indeed, except for the first swimmer of the relay, whose reaction time is the same as in an individual race, the reaction time of the following swimmers is lower since the start can be anticipated by watching the previous swimmer arriving at the start/stop mark. We use the sum of the standard deviations of the random variables for the swimmers/style in the relays as a measure of the risk associated with each relay. Although this expression does not represent the standard deviation of the random variable representing the sum of the times for the swimmers in the relay, it provides a conservative upper bound on the dispersion and can be readily used in an integer programming formulation since it is a linear expression in the decision variables.
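The two quantities described in this paragraph can be illustrated with a short Python sketch. The function names and the sample values are hypothetical, and the last function assumes independent swimmers purely for comparison; the text itself makes no independence assumption.

```python
import math

HANDOVER_GAIN = 0.5  # seconds gained by each swimmer other than the first


def relay_expected_time(mean_times):
    """Sum of the mean times minus 0.5 s per non-leading swimmer (1.5 s for four)."""
    return sum(mean_times) - HANDOVER_GAIN * (len(mean_times) - 1)


def relay_risk(std_devs):
    """Risk measure used in the text: the plain sum of the standard deviations."""
    return sum(std_devs)


def relay_std_if_independent(std_devs):
    """Standard deviation of the total time, ASSUMING independent swimmers."""
    return math.sqrt(sum(s * s for s in std_devs))


# Illustrative values in seconds (medley order), not taken from the dataset.
means = [38.9, 43.0, 34.3, 33.1]
stds = [0.4, 0.6, 0.3, 0.5]

print(relay_expected_time(means))
print(relay_risk(stds), relay_std_if_independent(stds))
```

The sum of the standard deviations is never smaller than the standard deviation of the sum, whatever the correlation between swimmers, which is why it acts as a conservative bound while remaining linear.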

The model includes two sets of binary variables. For each swimmer $i \in N$, each style $j \in E$, and each relay $k \in R$, we define the binary variable $x_{i j k} \in \{0, 1\}$ in such a way that $x_{i j k} = 1$ if the swimmer $i$ participates in the relay $k$ with the style $j$, and $x_{i j k} = 0$ otherwise. For each relay $k \in R$ and each age category $c \in C$, we introduce the binary variable $y_{k c} \in \{0, 1\}$ in such a way that $y_{k c} = 1$ if the relay $k$ belongs to the age category $c$, and $y_{k c} = 0$ otherwise. In this setting, we consider the following integer programming formulation.

$$\begin{aligned} \min \quad & λ \sum_{i \in N} \sum_{j \in E} \sum_{k \in R} x_{i j k} t_{i j} - λ \sum_{k \in R} \sum_{c \in C} (1.5 + r_{c} + n_{c}) y_{k c} \\ & + (1 - λ) \sum_{i \in N} \sum_{j \in E} \sum_{k \in R} s_{i j} x_{i j k}, \end{aligned} \tag{1}$$

$$\sum_{j \in E} \sum_{k \in R} x_{i j k} \leq 1 \qquad \forall i \in N, \tag{2}$$

$$\sum_{k \in R} y_{k c} \leq 1 \qquad \forall c \in C, \tag{3}$$

$$\sum_{c \in C} y_{k c} = 1 \qquad \forall k \in R, \tag{4}$$

$$ℓ_{c} y_{k c} \leq \sum_{i \in N} \sum_{j \in E} x_{i j k} e_{i} \qquad \forall k \in R, \forall c \in C, \tag{5}$$

$$u_{c} + M (1 - y_{k c}) \geq \sum_{i \in N} \sum_{j \in E} x_{i j k} e_{i} \qquad \forall k \in R, \forall c \in C, \tag{6}$$

$$x_{i j k} \in \{0, 1\} \qquad \forall i \in N, \forall j \in E, \forall k \in R, \tag{7}$$

$$y_{k c} \in \{0, 1\} \qquad \forall k \in R, \forall c \in C. \tag{8}$$

The objective function (1) minimizes the weighted sum of two expressions: the distance between the relay times and the record times of the corresponding categories, and the risk associated with the relays, thus following an approach similar to the Markowitz model for portfolio selection (Markowitz, 1952). In the first expression, the first term is the sum of the times of the swimmers that make up each relay, whereas the second term subtracts 1.5 seconds (by the previously mentioned approximation) and the corresponding record time plus the normalization term for each relay. The second expression sums the standard deviations of the corresponding random variables associated with the selected swimmers and the assigned styles. In this objective function, the parameter $λ \in [0, 1]$ is the risk aversion factor, representing the intended balance between performance and risk: $λ = 1$ takes only performance into account, whereas smaller values give more weight to the risk term. By minimizing the weighted sum (1) for a range of values of this parameter, we expect to find weakly nondominated solutions, thus approximating the Pareto front for this problem (Ehrgott, 2005).
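To make the role of $λ$ concrete, the following Python sketch evaluates the weighted objective for two hypothetical candidate relays in the same age category. The candidate names and figures are invented for illustration, and `weighted_objective` only mirrors the structure of (1) for a single relay.

```python
def weighted_objective(distance_to_target, risk, lam):
    """Objective (1) restricted to one relay, for a weight lam in [0, 1].

    distance_to_target: expected relay time minus the target time r_c + n_c.
    risk: sum of the standard deviations of the four swimmers.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * distance_to_target + (1.0 - lam) * risk


# Hypothetical candidates: (distance to target in seconds, risk in seconds).
candidates = {
    "fast_but_erratic": (2.0, 3.0),
    "slower_but_steady": (3.0, 1.0),
}


def preferred(lam):
    """Name of the candidate with the smallest weighted objective."""
    return min(candidates, key=lambda name: weighted_objective(*candidates[name], lam))


for lam in (0.0, 0.5, 1.0):
    print(lam, preferred(lam))
```

With these numbers the steadier relay is preferred for $λ < 2/3$ and the faster one for $λ > 2/3$, so sweeping $λ$ over $[0, 1]$ exposes the trade-off between the two criteria.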

Constraints (2) request that there be no more than one relay and one swimming style per swimmer. Constraints (3) ask that there be no more than one relay per category, while constraints (4) ensure that each relay is in exactly one category. Constraints (5)-(6) bind the variables, in such a way that $y_{k c} = 1$ if and only if the relay $k$ specified by the $x$-variables belongs to the age category $c$, for $k \in R$ and $c \in C$. Here $M$ is a sufficiently large constant, so that constraint (6) becomes inactive whenever $y_{k c} = 0$. Finally, constraints (7)-(8) specify the domains for the variables.
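As a side note on constraints (6), one admissible value for the constant $M$ can be derived from the category limits alone. This is only a sketch: the value actually used in the implementation is not stated in the text. By constraints (4)-(6), every relay $k$ belongs to some category, so the sum of its ages is at most $u_{6} = 319$. When $y_{k c} = 0$, constraint (6) reads

$$u_{c} + M \geq \sum_{i \in N} \sum_{j \in E} x_{i j k} e_{i},$$

which holds for every $c \in C$ as soon as $u_{1} + M \geq u_{6}$, i.e.,

$$M \geq u_{6} - u_{1} = 319 - 119 = 200.$$

Smaller valid values of $M$ generally yield tighter linear relaxations, so $M = 200$ would be a reasonable choice under these assumptions.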

The formulation (1)-(8) is appended with additional constraints for the different executions of the model. For Events 1 to 3 (corresponding to freestyle competitions), all four swimmers must use the freestyle style, i.e.,

$$\begin{aligned} \sum_{i \in N} x_{i j k} & = 0 \qquad \forall k \in R, \forall j \in \{1, 2, 3\}, \\ \sum_{i \in N} x_{i 4 k} & = 4 \qquad \forall k \in R. \end{aligned}$$

For Events 4 to 6 (i.e., medley relays), we must choose exactly one swimmer for each style in each relay, i.e.,

$$\sum_{i \in N} x_{i j k} = 1 \qquad \forall k \in R, \forall j \in E.$$

Women’s relays (i.e., in Events 1 and 4) must contain exactly four women, i.e.,

$$\sum_{i \in N} \sum_{j \in E} x_{i j k} s_{i} = 4 \qquad \forall k \in R.$$

On the other hand, men’s relays (i.e., in Events 2 and 5) must contain exactly four men, i.e.,

$$\sum_{i \in N} \sum_{j \in E} x_{i j k} (1 - s_{i}) = 4 \qquad \forall k \in R.$$

Finally, mixed relays (i.e., in Events 3 and 6) must contain exactly two women and exactly two men, i.e.,

$$\begin{aligned} \sum_{i \in N} \sum_{j \in E} x_{i j k} s_{i} & = 2 \qquad \forall k \in R, \\ \sum_{i \in N} \sum_{j \in E} x_{i j k} (1 - s_{i}) & = 2 \qquad \forall k \in R. \end{aligned}$$

If we need to search for the single best relay, we can run the model by setting $R = \{1\}$.
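For very small teams, the single-relay case can be checked by plain enumeration instead of an integer programming solver. The sketch below does this for a mixed medley relay; it is a brute-force illustration with invented swimmers and target times, not the authors' ZIMPL/SCIP implementation, and every name in it is hypothetical.

```python
from itertools import permutations

CATEGORY_LIMITS = {1: (100, 119), 2: (120, 159), 3: (160, 199),
                   4: (200, 239), 5: (240, 279), 6: (280, 319)}


def category_of(age_sum):
    """Age category of a relay, or None if the sum fits no category."""
    for c, (lower, upper) in CATEGORY_LIMITS.items():
        if lower <= age_sum <= upper:
            return c
    return None


def best_single_medley_relay(swimmers, target, lam, women_required):
    """Enumerate ordered choices of four swimmers (one per style, medley order)
    and return (objective value, names) minimizing objective (1) for one relay.

    target maps a category c to the target time r_c + n_c in seconds.
    """
    best = None
    for combo in permutations(swimmers, 4):
        if sum(s["woman"] for s in combo) != women_required:
            continue
        c = category_of(sum(s["age"] for s in combo))
        if c is None or c not in target:
            continue
        time = sum(s["mean"][j] for j, s in enumerate(combo)) - 1.5
        risk = sum(s["std"][j] for j, s in enumerate(combo))
        value = lam * (time - target[c]) + (1 - lam) * risk
        if best is None or value < best[0]:
            best = (value, [s["name"] for s in combo])
    return best


# Invented data: mean times in medley order (back, breast, fly, free).
swimmers = [
    {"name": "A", "age": 30, "woman": True, "mean": [40.0, 45.0, 38.0, 32.0], "std": [0.5] * 4},
    {"name": "B", "age": 35, "woman": True, "mean": [42.0, 41.0, 39.0, 33.0], "std": [0.5] * 4},
    {"name": "C", "age": 40, "woman": False, "mean": [36.0, 44.0, 37.0, 30.0], "std": [0.5] * 4},
    {"name": "D", "age": 45, "woman": False, "mean": [39.0, 40.0, 33.0, 29.0], "std": [0.5] * 4},
    {"name": "E", "age": 70, "woman": False, "mean": [50.0, 52.0, 48.0, 40.0], "std": [0.5] * 4},
]
target = {2: 140.0, 3: 153.0}  # hypothetical values of r_c + n_c

print(best_single_medley_relay(swimmers, target, lam=1.0, women_required=2))
```

In this toy instance the selected relay (A, B, D, E, with an expected time of 152.5 seconds in Category 3) is slower than the fastest feasible relay (A, B, C, D in some order, 140.5 seconds in Category 2), because it lies closer to the target time of its own category. The enumeration visits every ordered quadruple, so it is only practical for a handful of swimmers.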

Experimental results

We present in this section our experimental results with the models introduced in the previous section, with the objective of evaluating their solvability with state-of-the-art integer programming solvers. We also report the application of this machinery to real data from a team in Buenos Aires City, Argentina.

Depending on their structure and size, integer programming models may be difficult to solve by computational means. To assess this, we implemented the models presented in Section “Proposed integer programming model” with the ZIMPL modeling language (Koch, 2004). Figure 1 reports the number of variables and constraints for the formulation (1)-(8) considering mixed medley relays, which is the most difficult instance from a computational point of view. The model has over 10,000 variables for 104 swimmers and a smaller, albeit non-negligible, number of constraints.

Figure 1.

Number of variables (blue) and constraints (red) present in the model (1)-(8), as a function of the number of swimmers (vertical axis in logarithmic scale).

These observations may cast doubt on the possibility of effectively solving these instances with an integer programming solver. To evaluate this situation, we try to solve this model with the SCIP integer programming solver (Bolusani et al., 2024), which is the best-performing open-source integer programming solver at the time of writing this work. Figure 2 reports the running time to optimality of randomly-generated instances with increasing numbers of swimmers and a deterministic objective function, i.e., we take $s_{i j} = 0$ for every $i \in N$ and $j \in E$. Denoting by $n = |N|$ the number of swimmers, we seek $r = ⌊ n / 4 ⌋ - 2$ relays, and we relax the right-hand side of constraints (3) to $⌈ n / 6 ⌉ + 1$ so that more than one relay for each age category can be formed. These experiments were run with a time limit of 5 minutes, and the instances not solved to optimality within this time limit are not reported in this figure. The results show a clear trend as the number of swimmers increases, suggesting that instances with up to 150 swimmers may be safely solved with this machinery. No instance with more than 300 swimmers could be solved to optimality, and instances having between 150 and 300 swimmers show an increasingly degraded performance.

Figure 2.

Time (in seconds) needed by the SCIP integer programming solver to solve randomly-generated instances to optimality, as a function of the number of swimmers.

We now explore the interplay between the two objectives (setting up fast relays with respect to the normalized records and minimizing the risk associated with the relays). To this end, we have constructed synthetic instances with reasonable times and standard deviations for each swimmer, according to his/her age. Figures 3 and 4 report the performance of SCIP over instances between 10 and 300 swimmers, taking $r = \min \{6, ⌊ n / 5 ⌋\}$ so that at most one relay per age category is assembled. Figure 3 reports the average running time as a function of the risk aversion parameter $λ$, where each dot reports the average of five instances for a fixed number of swimmers. It is interesting to note that time requirements decrease when adding the dispersion to the objective function, a somewhat unexpected outcome. Figure 4 reports the average running time as a function of the number of swimmers in the instance, for the value of $r$ specified before. The computational difficulty peaks at around 40 swimmers (probably due to the combinatorial difficulties of assembling relays) and then slowly increases with the number of swimmers. These results also show that asking for at most six relays is computationally easier than allowing for an arbitrarily large number of relays. The running time and the number of nodes in the enumeration tree are quite correlated, suggesting that the solution of the linear relaxations is not heavily affected by the choice of $λ$.

Figure 3.

Average time as a function of the risk aversion parameter $λ$ for synthetic instances and at most 6 relays.

Figure 4.

Average time as a function of the number $n$ of swimmers for synthetic instances and at most 6 relays.

Figure 5 reports the solution quality for a representative 50-swimmer instance, showing a typical behavior when performance and risk interplay. As the risk aversion factor $λ$ tends to 1, the model puts more emphasis on the performance and neglects the risk. A graphical description as in this figure is easily communicated to the coach, especially for the model in which a single (best) relay is sought. In this single-relay case, the “error bars” show an estimated time range for the whole relay, which can be compared with the record and expected times for the competition, thus helping the coach to evaluate whether the relay is acceptable or not. Figure 6 presents an estimation of the Pareto front for this instance which, although standard in multiobjective optimization, may not be as insightful for a coach as Figure 5.

Figure 5.

Sum of the distances to the target times, plus/minus the sum of the standard deviations, as a function of $λ$ for a representative 50-swimmer synthetic instance.

Figure 6.

Pareto front for a representative 50-swimmer synthetic instance.

We close this section with the application of this machinery to a team in Buenos Aires City, Argentina. The team is composed of 36 swimmers (11 women and 25 men), aged between 24 and 84 years old. The times for each swimmer and each style were taken from tournaments held between 2021 and 2025 in 25- and 50-meter swimming pools (Confederación Argentina de Deportes Acuáticos, Cadda, 2023; Federación de Natación Buenos Aires, FeNaBA, 2023; Torneo Master Open, 2023). Times are taken from individual events and from the first swimmer at each relay, which are official times. As a reference, the average 50m backstroke time is 38.90 seconds, the average breaststroke time is 43.01 seconds, the average butterfly time is 34.31 seconds, and the average freestyle time is 33.11 seconds. The best 50m time is 25.33 seconds, in freestyle, for a 37-year-old swimmer.

The data availability is limited in this case, and we have between one and seven measurements for each swimmer and each style. For swimmers with more than four measurements, the dataset appears consistent with a normal distribution. Descriptive statistics reveal slight skewness and moderate platykurtosis, both within reasonable bounds for normality. A Shapiro-Wilk test for normality yields no evidence against the null hypothesis of normality. Additionally, the Q-Q plots for these datasets show that the sample quantiles align closely with the theoretical quantiles of a normal distribution. Taken together, these results suggest that the data are well approximated by a normal distribution.

Figure 7 depicts the standard deviation of swimming times for swimmers with more than four measurements, showing no clear correlation between age and dispersion in swimming times. It is also remarkable that swimming times have a relatively small variability, especially considering that these swimmers are not professional athletes. It must be noted that these measurements are taken exclusively in competitions (where the performance is expected to peak), and no times recorded during training are taken into account.

Figure 7.

Standard deviation of swimming times as a function of swimmer age for swimmers with more than four measurements.

If we only have measurements for a certain swimmer and style for one pool size, the missing data is completed with the following approximation. If we have the average time for a 50-meter pool, we subtract half a second to obtain the time for the 25-meter pool. Conversely, if we have the time for a 25-meter pool, we add half a second to obtain the time for a 50-meter pool. This estimation is based on the fact that an athlete usually swims faster in a 25-meter pool, since the turn helps to propel him/her off again. When a swimmer does not have a registered time for a style in either the long pool or the short pool, we specify a time of 1000 seconds, which is extremely high. This way, the model will avoid assigning a swimmer to a style with such a high time, unless this is needed in order to complete a relay when no one else is available.
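The completion rule described above can be sketched as follows. The function and constant names are hypothetical, and missing values are represented by `None` purely as a convention of this example.

```python
POOL_ADJUSTMENT = 0.5   # seconds between 25-meter and 50-meter pool times
MISSING_TIME = 1000.0   # prohibitive time for styles without any record


def complete_times(short_pool, long_pool):
    """Fill in missing mean times (None) for one swimmer and one style.

    Returns the pair (25-meter pool time, 50-meter pool time) in seconds.
    """
    if short_pool is None and long_pool is None:
        return MISSING_TIME, MISSING_TIME
    if short_pool is None:
        return long_pool - POOL_ADJUSTMENT, long_pool
    if long_pool is None:
        return short_pool, short_pool + POOL_ADJUSTMENT
    return short_pool, long_pool


print(complete_times(None, 33.0))   # only a long-pool time is available
print(complete_times(32.0, None))   # only a short-pool time is available
print(complete_times(None, None))   # no time registered for this style
```

Because the prohibitive value enters the objective as an ordinary time, a swimmer without a registered time is still eligible, but is selected only when no better alternative exists.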

The Argentine records by category, both for long pool and short pool, were obtained from the Argentine Confederation of Aquatic Sports (Confederación Argentina de Deportes Acuáticos, Cadda, 2023), whereas the South American records by category were obtained from the Brazilian Association of Swimming Masters (Asociação Brasileira de Másters de Natação, 2023). As a reference, the Argentine record for mixed 4 $\times$ 50m freestyle relay in the 160–199 years category is 111.47 seconds, whereas the South American record for this same event is 107.09 seconds. Figure 8 presents the results for this instance with mixed relays. Since the number of swimmers is quite limited, the resulting plot does not have the variability that a larger instance provides, but it nevertheless gives insight into the interplay between the normalized distance to the records and the associated risk given the variability in swimming times.

Figure 8.

Sum of the distances to the target times, plus/minus the sum of the standard deviations, as a function of $λ$, for the 36-swimmer instance at the club that motivated this work.

Once the models are solved, an R Markdown script exports the results into an HTML file. In this step, the data provided by SCIP are presented in a readable format for users, together with complementary information to support decisions, including times, records, time differences, and styles of each swimmer. Then, in a back-and-forth with the coach, he/she can fix a certain number of relays or propose alternative relays. The swimmers involved in these fixed relays are removed from the input data and the optimization is performed again with the remaining swimmers. This way, the coach can fix some decisions and the model will optimize around these decisions for the remaining athletes.

This implementation has been used to put together relays for some tournaments since 2023, although the initial implementation resorted to a deterministic model, which is equivalent to setting $λ = 1$ in the formulation (1)-(8). The obtained results fell within the a priori expectations for most of the executions, and the assembled relays usually matched the coach’s intuition (so in these cases the benefits were the fast execution and the guarantee that the obtained relays are optimal). For some tournaments, however, unexpected results were obtained, such as a swimmer not going in his/her strongest style in a medley relay, but instead going in his/her second or even third best style. As a general conclusion, the proposed methodology helps to get the job done much faster and with a greater certainty of obtaining optimal configurations.

Concluding remarks

In this work we have proposed a computational tool based on combinatorial optimization techniques for designing swimming relays prior to a masters competition. The aim is not to provide a definitive solution but rather to provide various options, so the coach can choose among them according to the strategy that he/she believes is best for the tournament. Manually performing this task can be challenging as the number of swimmers increases, and the greatest help is given in the mixed 4 $\times$ 50m medley relay, which is the event with the largest number of possible relays. The combination of R programs (for data pre- and post-processing) and SCIP (for optimization) provides a clean and fast implementation, which can be tailored for specific needs. The user only has to put together a list of swimmers with their times, sex, and age, and the complete procedure is executed automatically¹.

This tool has been in use since 2023 by a swimming coach and a swimmer who assists him in setting up the relays. We collect their thoughts below, translated from Spanish.

Coach. This is a very useful tool for swimming coaches to set up relays, especially mixed master combined styles, due to the different and wide possibilities that exist taking into account the age ranges, best style of each athlete, and their gender. In this way, we save a lot of time and can view the assembly in a more productive way. I consider global assembly to be of fundamental importance, through which we were able to generate higher scores per relay in the national championships that are defined by points. It is also very important for the assembly of senior combined mixed relays (25 years and older), due to the large number of possibilities presented by each style, age, and gender.

Swimmer. Having a relay tool for masters swimming, in which the needs of both the coach and the entire team are optimized, is extraordinary. It started as an academic project but you can even think of it in a commercial way. Having this form brings us the following benefits:

We have the best information to make better decisions.

We can save a lot of time in assembling the relays.

We have a tool that gives us the assembly of the relays. In general, master swimmers do not know all the options there are to assemble or why some swimmers are left out of a relay.

It is the best tool for a coach to design unbiased relays.

From a technical point of view, it would be interesting to explore whether the integer programming formulation presented in this work can be strengthened in order to solve to optimality instances with larger numbers of swimmers. This may involve studying the formulation in order to find valid inequalities (Aardal and van Hoesel, 1996; Aardal and Van Hoesel, 1999) and/or designing decomposition schemes for tackling these large instances (Nemhauser and Wolsey, 1988). Stating an integer programming formulation that considers both the time to the record (normalized with respect to the expected competitiveness of the age category) and the risk associated with the relay involves some compromises. It would be interesting to explore whether such a normalization can be achieved by taking the relative time to the record (thus generating a nonlinear objective function), and whether the standard deviation of the complete relay can be incorporated into the risk term, instead of the upper bound used in this work. These issues may make the optimization problem quite challenging from a computational point of view.
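The two quantities mentioned above can be illustrated with a small numerical sketch. It rests on assumptions that are not taken from the formulation in this work: the legs of a relay are treated as independent, so that the variance of the total time is the sum of the individual variances, and the sum of the individual standard deviations is used as one example of an upper bound. The function names and figures are purely illustrative.

```python
import math

def relative_gap(relay_time, record_time):
    """Time to the record, expressed as a fraction of the record itself."""
    return (relay_time - record_time) / record_time

def relay_std(stds):
    """Standard deviation of the total time, ASSUMING independent legs."""
    return math.sqrt(sum(s * s for s in stds))

def std_upper_bound(stds):
    """Sum of individual standard deviations (an assumed, linear upper bound)."""
    return sum(stds)

# Invented data: per-swimmer standard deviations in seconds.
stds = [0.4, 0.3, 0.5, 0.2]
exact = relay_std(stds)
bound = std_upper_bound(stds)

# Invented relay time and record time in seconds.
gap = relative_gap(126.0, 120.0)

print(exact, bound, gap)
```

The sum of standard deviations is linear in the selection of swimmers, whereas the square root of a sum of variances is not. This is one way to see why replacing a linear bound by the exact standard deviation would make the model harder to solve.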

Footnotes

Acknowledgements

We would like to take this opportunity to express our gratitude towards the anonymous reviewer for his/her suggestions, which greatly helped to broaden the scope of this manuscript.

ORCID iDs

Florencia De Arca

Javier Marenco

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and publication of this article.

References

Aardal and van Hoesel CPM (1996) Polyhedral techniques in combinatorial optimization I: Theory. Statistica Neerlandica 50(1): 3–26. DOI: https://doi.org/10.1111/j.1467-9574.1996.tb01478.x

Aardal and Van Hoesel CPM (1999) Polyhedral techniques in combinatorial optimization II: Applications and computations. Statistica Neerlandica 53(2): 131–177. DOI: https://doi.org/10.1111/1467-9574.00104

Alshdokhi, Petersen and Clarke (2020) Improvement and variability of adolescent backstroke swimming performance by age. Frontiers in Sports and Active Living 2: 46. DOI: 10.3389/fspor.2020.00046

Asociação Brasileira de Másters de Natação (2023) Recordes (in Portuguese). https://www.abmn.org.br/recordes.

Bailey and Nowak (2018) Meetopt: A multi-event coaching decision support system. Decision Support Systems 112: 60–75.

Bolusani, Besançon, Bestuzheva, et al. (2024) The SCIP Optimization Suite 9.0. ZIB-Report 24-02-29, Zuse Institute Berlin. https://nbn-resolving.org/urn:nbn:de:0297-zib-95528.

Born, Lomax, Rüeger, et al. (2022) Normative data and percentile curves for long-term athlete development in swimming. Journal of Science and Medicine in Sport 25(3): 266–271. DOI: https://doi.org/10.1016/j.jsams.2021.10.002

Confederación Argentina de Deportes Acuáticos, Cadda (2023) Resultados (in Spanish). https://cadda.org.ar.

Costa, Marinho, Bragada, et al. (2011) Stability of elite freestyle performance from childhood to adulthood. Journal of Sports Sciences 29(11): 1183–1189. DOI: 10.1080/02640414.2011.587196

Ehrgott (2005) Multicriteria Optimization. Lecture Notes in Economics and Mathematical Systems, 2nd edition. Berlin: Springer-Verlag.

Federación de Natación Buenos Aires, FeNaBA (2023) Resultados (in Spanish). http://fenaba.org.ar.

Federación Internacional de Natación (2023) Masters (in Spanish). https://www.fina.org/masters/about.

Hannan and McKeown (1979) Matching swimmers to events in a championship swimming meet. Computers & Operations Research 6(4): 225–231.

i-Natación (2023) Natación másters (in Spanish). http://www.i-natacion.com/articulos/modalidades/masters.html.

Koch (2004) Rapid Mathematical Programming. PhD Thesis, Technische Universität Berlin. http://opus4.kobv.de/opus4-zib/frontdoor/index/index/docId/834. ZIB-Report 04–58.

Korte and Vygen (2012) Combinatorial Optimization: Theory and Algorithms. New York, NY: Springer-Verlag. ISBN 9783642244889.

Mancini (2018) Assignment of swimmers to events in a multi-team meeting for team global performance optimization. Annals of Operations Research 264(1): 325–337.

Markowitz (1952) Portfolio selection. The Journal of Finance 7(1): 77–91. DOI: 10.2307/2975974

Masedu and Angelozzi (2006) Modelling optimum fraction assignment in the 4x100 m relay race by integer linear programming. Italian Journal of Sport Sciences 13: 74–7.

Nemhauser and Wolsey (1988) Integer and Combinatorial Optimization. New York: John Wiley & Sons, Ltd. ISBN 9781118627372.

Nowak MEM and Pollock (2006) Assignment of swimmers to dual meet events. Computers & Operations Research 33(7): 1951–1962.

Post, Koning, Visscher, et al. (2020) Multigenerational performance development of male and female top-elite swimmers – a global study of the 100 m freestyle event. Scandinavian Journal of Medicine & Science in Sports 30(3): 564–571. DOI: 10.1111/sms.13599

Torneo Master Open (2023) M2o (in Spanish). https://m2omasteropen.com.