Abstract
Cooperative behavior is the subject of intense study in a wide range of scientific fields, yet its evolutionary origins remain largely unexplained. A leading explanation of cooperation is the mechanism of altruistic punishment, where individuals pay to punish others but receive no material benefit in return. Experiments have shown such punishment can induce cooperative outcomes in social dilemmas, though sometimes at the cost of reduced social welfare. However, experiments typically examine the effects of punishing low contributors without allowing others in the environment to respond. Thus, the full ramifications of punishment may not be well understood. Here, I use evolutionary simulations of agents playing a continuous prisoner's dilemma to study behavior subsequent to an act of punishment, and how that subsequent behavior affects the efficiency of payoffs. Different network configurations are used to better understand the relative effects of social structure and individual strategies. Results show that when agents can either retaliate against their punisher, or punish those who ignore cheaters, the cooperative effects of punishment are reduced or eliminated. The magnitude of this effect is dependent on the density of the network in which the population is embedded. Overall, results suggest that a better understanding of the aftereffects of punishment is needed to assess the relationship between punishment and cooperative outcomes.
Introduction
Explaining the evolution of cooperation is one of the greatest unanswered questions facing evolutionary biologists today (Maynard Smith and Szathmáry, 1997; West, Griffin, and Gardner, 2007). Cooperation is instrumental in maintaining human social institutions (Ostrom, Walker, and Gardner, 1992) and is required among nations to effectively address global-scale problems (Kaul and Mendoza, 2003; Sandler, 1997). Thus, understanding the mechanisms that result in cooperation is important to both scientists and policy makers. Yet, despite its fundamental importance, the evolution of cooperative behavior remains largely unexplained.
Several mechanisms have been previously suggested to explain the evolution of cooperation, including kin selection (Hamilton, 1964), multi-level selection (Fletcher and Zwick, 2004; Goodnight, 2005; Reeve and Hölldobler, 2007; Wilson and Wilson, 2007; Wilson and Hölldobler, 2005), direct reciprocity (Axelrod and Hamilton, 1981; Trivers, 1971), indirect reciprocity (Boyd and Richerson, 1989; Leimar and Hammerstein, 2001; Nowak and Sigmund, 2005), and tag-mediated altruism (Axelrod, Hammond, and Grafen, 2004; Riolo, Cohen, and Axelrod, 2001; Spector and Klein, 2006). While these mechanisms explain some instances of cooperation, they generally apply to limited cases or special circumstances such as genetic relatedness or long-term relationships between individuals. The search for a more broadly applicable explanation has increasingly focused on altruistic punishment, where individuals incur a cost to punish others without receiving any material benefit in return.
Punishment is ubiquitous among social organisms, and wherever cooperating individuals have an incentive to cheat or free-ride, punishment behavior usually exists as a deterrent (Frank, 1995). This includes toxin release in colonial bacteria that affects only non-cooperators (Travisano and Velicer, 2004), the destruction of eggs laid by workers in social insect colonies (Foster and Ratnieks, 2001), and enforcement of dominance and mating hierarchies in non-human mammals (Clutton-Brock and Parker, 1995). Even the process of cellular meiosis can be viewed as a form of policing selfish genes (Michod, 1996). In humans, punishment and policing are common across many diverse societies and cultural groups (Marlowe et al., 2008) and are prevalent in local-scale management of common property (Coleman and Steed, 2009). Policy makers view punishment institutions as key to resolving social conflict both at local scales, in governance of common pool resources (Dietz, Ostrom, and Stern, 2003; Ostrom et al., 1992), and at global scales, where they are considered a prerequisite for successful international agreements (Barrett, 2003). Laboratory and simulation experiments generally support the idea that altruistic punishment can lead to the provisioning of public goods (Boyd, Gintis, Bowles, and Richerson, 2003; Fehr and Gächter, 2000; Gürerk, Irlenbusch, and Rockenbach, 2006; Ostrom et al., 1992; Shutters, 2012), though others have demonstrated exceptions to this finding (Cinyabuguma, Page, and Putterman, 2006; Fehr and Rockenbach, 2003; Herrmann, Thöni, and Gächter, 2008).
While it is fine to propose that altruistic punishment is a mechanism leading to the evolution of cooperation, this only shifts the underlying question from “why should an individual cooperate?” to “why should an individual altruistically punish?” As research begins to focus on the latter question, cultural group selection (Hagen and Hammerstein, 2006; Richerson and Boyd, 2005) and the feedbacks of social structures (Shutters, 2012) have been recently suggested as mechanisms leading to the evolution of altruistic punishment.
What has not been adequately addressed is how punishment affects the efficiency of cooperation, a measure of the net increase in payoffs that result when punishment is used to induce cooperation (Nikiforakis, 2008; Sefton, Shupp, and Walker, 2007). Even if punishment induces a society to cooperate there are costs associated with punishing and being punished that reduce the overall gains from cooperation, and these must be accounted for when discussing the efficacy of punishment as a cooperative mechanism.
Previous experiments using punishment show that its use can increase contributions to a public good even though total payoffs decrease compared to a population composed only of defectors (Fehr and Gächter, 2000; Ostrom et al., 1992). This negative effect on efficiency has been demonstrated when interactions are not repeated sufficiently, though increasing the number of repeat interactions eventually led to positive gains in total payoffs (Gächter, Renner, and Sefton, 2008; Gürerk et al., 2006). Thus, it remains unclear how punishment used to induce cooperation might affect total payoffs in a world of increasingly frequent one-shot interactions.
Understanding how punishment-induced cooperation affects payoff efficiency is especially important when considering the aftermath of punishment. Experiments with punishment typically include only a round of game play and a round in which agents can punish cheaters. These experiments ignore the fact that punishment in real-world situations usually elicits further responses of some type.
Thus, the purpose of the current study is not to support or refute mechanisms that may lead to the evolution of altruistic punishment. Instead, its purpose is to understand how the efficiency of punishment-induced cooperation is affected when a more realistic range of behavior is allowed to take place. In this study, the specific behaviors of retaliation and second-order punishment are allowed in a simulated society and their effects on the efficiency of cooperation are examined.
Second-order punishment
Sanctioning and policing institutions often exist in human societies to deter cheating in the provisioning of public goods. But a paradox arises, known as the second-order free-rider problem, regarding what motivates those who are supposed to punish cheaters (Hodgson, 2009; Sigmund, De Silva, Traulsen, and Hauert, 2010). Without deterrents and/or incentives, enforcement agents are expected to avoid the costs and risks of punishing and to simply ignore cheaters. These agents that avoid their policing duties have an evolutionary advantage over those that do punish (Dreber, Rand, Fudenberg, and Nowak, 2008) and the mechanism of second-order punishment often exists as a deterrent against policing agents that do not do their jobs. Second-order punishment occurs when an agent declines to punish cheaters when given the chance, and is itself punished as a result (Boyd and Richerson, 1992). Even though such individuals may otherwise cooperate and contribute substantially to a public good, they are punished because they take no action against cheaters.
But what is the effect on payoff efficiencies when agents are seemingly coerced into punishing cheaters? One may reason that, since punishing cheaters induces public good contributions, punishing those that ignore cheating will only further enhance public good contributions. On the other hand, laboratory experiments with human participants have demonstrated the opposite, showing that sanctioning otherwise cooperative agents because they ignore cheating can inhibit the emergence of cooperation (Denant-Boemont, Masclet, and Noussair, 2007). This leads to the first question addressed in this study:
Q1) When agents may altruistically punish others that permit cheating, how is efficiency of cooperation affected?
Retaliation
Another under-addressed behavior that often co-occurs with punishment is retaliation. Research has shown that humans and other animals are not indifferent to being punished and often retaliate at a cost to both themselves and their punisher (Clutton-Brock and Parker, 1995; Molm, 1994). The prospect of suffering retaliation can deter agents from punishing free-riders (Nikiforakis and Engelmann, 2011) and ultimately negate the cooperative effects of punishment (Nikiforakis, 2008). This consequence is frequently overlooked in studies of punishment-induced cooperation (Denant-Boemont et al., 2007), which typically allow only punishment of cheaters and do not allow a response from the punished party. Thus the third simulation allows the ability to retaliate when punished and seeks to answer the question:
Q2) When a punished agent may retaliate against its punisher, how are aggregate levels of cooperation affected compared to simulations without retaliation?
Social welfare
In both treatments, second-order punishment and retaliation, the focus of this study is not simply on how contributions to a public good are affected, but on how a population's overall payoffs are affected. Increased contributions to a public good are typically assumed to be due to cooperative behavior but it may also be that contributions increase because of coercion. This is an important distinction that becomes clearer when analyzing how a given treatment affects total net payoffs or payoff efficiency. This study draws a distinction between cooperation, increasing both contributions and payoffs, and coercion, increasing contributions at the expense of lower payoffs. Thus, this study also seeks to answer the question:
Q3) If either retaliation or second-order punishment induces higher levels of cooperation does it also increase aggregate payoffs?
Population structure
Research has demonstrated that populations embedded in spatially explicit grids can evolve different aggregate attributes than non-structured populations (Boyd and Richerson, 2002; Killingback and Doebeli, 1996; Killingback and Studer, 2001; Nowak and May, 1992; Page, Nowak, and Sigmund, 2000). More importantly, several studies show that network structure plays a critical role in the evolution of aggregate behavior such as cooperation (Chen, Fu, and Wang, 2007; Chwe, 1999; Gould, 1993; Huang, Wang, Xu, and Wang, 2008; Hui, Xu, and Zheng, 2007; Ifti, Killingback, and Doebeli, 2004; Ohtsuki, Hauert, Lieberman, and Nowak, 2006; Santos, Rodrigues, and Pacheco, 2006), especially when those networks are dynamic and coevolving with the agents they govern (Hales, 2005; Santos, Pacheco, and Lenaerts, 2006; Shutters and Cutts, 2008; Takács, Janky, and Flache, 2008).
Therefore, it is critical to understand not only how second-order punishment and retaliation affect the efficiency of cooperation but also how differences in population structure influence outcomes. This study examines the role of social structure by analyzing evolutionary outcomes both with and without structured societies.
Materials and Methods
To address the questions outlined above, various punishment options were incorporated into evolutionary simulations of the continuous prisoner's dilemma. Social simulations, including agent-based models, individual-based models, and other evolutionary computational techniques, offer unique insights into dynamic behavior (North, 2005), such as the relationship between individual behavior and emergent properties at higher scales (Anderies, 2002; Harrison and Singer, 2006), that are typically not provided by formal models of social systems (Harrison and Singer, 2006; Sawyer, 2005). Social simulations also allow careful control over factors that may confound empirical studies, such as emotion, reputation, visual cues, anonymity, or cultural influences (Cederman, 2001), while probing vast expanses of evolutionary space that would be impractical in laboratory settings due to cost or time constraints. It should be noted that social simulations are generally designed as a complement to laboratory experiments and case studies, not as a replacement.
To understand the effects of social structure, which are known to significantly influence results of social simulations (Santos, Rodrigues, and Pacheco, 2006), simulations were conducted both with and without social structure. When added, social structure consisted of regular networks of varying density.
The continuous prisoner's dilemma (CPD)
In the standard prisoner's dilemma players are limited to two choices - cooperate or defect. Here, that requirement is relaxed and players select a level of cooperation on the continuum between full cooperation and full defection. This presents an arguably more realistic picture of choices facing those in social dilemmas (Killingback and Doebeli, 2002; Sandler, 1999) and is known as the continuous prisoner's dilemma (CPD).
In a CPD game, players i and j are each given an endowment standardized here to one unit. From this they independently and simultaneously contribute a portion x ∊ [0,1] to a public good pool, while keeping the remainder, so that x = 1 represents full cooperation and x = 0 full defection (Deng and Chu, 2011; Schofield, 1977). For any given contribution by j, i's payoff is maximized when xi = 0. This is the expected rational choice or Nash equilibrium of the CPD. The dilemma arises, however, because total social welfare, measured as total net payoffs, is maximized when both individuals cooperate fully and xi = xj = 1.
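The incentive structure described above can be checked numerically. The sketch below assumes a common linear form of the CPD payoff, p_i = 1 - x_i + r(x_i + x_j)/2 with 1 < r < 2, and sets r = 1.5. Both the functional form and the value of r are illustrative assumptions rather than values taken from Table 1, and the function name cpd_payoff is hypothetical.

```python
def cpd_payoff(x_i, x_j, r=1.5):
    """Payoff to player i in one continuous prisoner's dilemma game.

    Assumed form (not taken from Table 1): each player keeps the
    uncontributed part of a one-unit endowment, the pooled contributions
    are multiplied by r (1 < r < 2), and the pool is split equally.
    """
    if not (0.0 <= x_i <= 1.0 and 0.0 <= x_j <= 1.0):
        raise ValueError("contributions must lie in [0, 1]")
    return 1.0 - x_i + r * (x_i + x_j) / 2.0


# Whatever the partner contributes, i earns more by contributing nothing ...
for x_j in (0.0, 0.5, 1.0):
    assert cpd_payoff(0.0, x_j) > cpd_payoff(1.0, x_j)

# ... yet joint payoffs are highest when both contribute everything.
total_defect = cpd_payoff(0.0, 0.0) + cpd_payoff(0.0, 0.0)
total_coop = cpd_payoff(1.0, 1.0) + cpd_payoff(1.0, 1.0)
assert total_coop > total_defect
```

Under these assumed values, mutual defection yields 1.0 per player per game and mutual cooperation yields 1.5, which reproduces the tension between the individually rational choice and the socially optimal one.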
Social Structure
At the beginning of each simulation, a specific network is generated that structures the population and determines the allowable interactions between agents. All networks are non-directed, unweighted, and static.
To understand the effects of social structure on outcomes, a number of regular networks are used. Regular networks, often represented as lattice structures, are those in which all nodes have the same degree d (or number of neighbors) and are arranged in a regular repeating pattern. In addition, these networks are toroidal, meaning that they have no boundaries but instead wrap around onto themselves like the surface of a torus. Two regular networks used commonly in simulations, including this study, are the von Neumann network (d = 4) and the Moore network (d = 8). Hexagonal networks (d = 6) are also used, as well as one-dimensional rings known as linear networks (d = 2). Though regular networks bear little resemblance to interaction patterns in real-world social systems, their use in simulation studies reduces confounding effects of social structure because they have no variance in degree, no edge effects, and uniform distances among individuals in a population. Regular networks are referenced throughout this paper by their degree d.
In contrast to structured societies, complete networks are used in this study to understand how the absence of social structure affects outcomes. Complete networks are those in which every node is linked to every other node in the population. Though technically a regular network with d = N – 1, where N is the population size, an agent in a complete network has equal probability of interacting with any other agent. Thus, complete networks are analogous to homogeneous, well-mixed systems that have no social structure. Throughout this study, simulations using complete networks are synonymous with unstructured populations.
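The network types described above can be sketched as neighbor lists. The following is a minimal illustration, not the study's implementation: the function names, the node numbering, and the particular offset encoding of the hexagonal neighborhood on a wrapped square grid are all assumptions, and the grid size is left to the caller because the population size is not stated in this section.

```python
def ring_neighbors(n):
    """Linear network (d = 2): each node links to its two ring neighbors."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def torus_neighbors(side, offsets):
    """Regular lattice on a side x side grid that wraps around (no boundaries)."""
    nbrs = {}
    for row in range(side):
        for col in range(side):
            nbrs[row * side + col] = [
                ((row + dr) % side) * side + (col + dc) % side
                for dr, dc in offsets
            ]
    return nbrs


# Neighborhood offsets (row, column). The hexagonal encoding is one common
# way to get six symmetric neighbors on a square grid; it is an assumption.
VON_NEUMANN = [(-1, 0), (1, 0), (0, -1), (0, 1)]             # d = 4
HEXAGONAL = VON_NEUMANN + [(-1, -1), (1, 1)]                 # d = 6
MOORE = VON_NEUMANN + [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # d = 8


def complete_neighbors(n):
    """Complete network (d = N - 1): every node links to every other node."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}
```

For example, `torus_neighbors(10, MOORE)` gives a 100-node Moore network in which every node has exactly eight distinct neighbors, and `complete_neighbors(100)` gives the unstructured counterpart.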
Base game play
In the base game, agents play the CPD followed by a single round of punishment. A single simulation run initiates with creation of a social network. Each node is occupied by a single agent i consisting of strategy (xi, ti, ci, si) where xi = the contribution i makes to the public good in the CPD, ti = the contribution below which the agent will punish another agent in a game being observed by i, ci = how much i spends to punish an observed agent whose contribution is too low, and si = the amount i spends to retaliate when it has been punished (in simulations that allow retaliation). In other words, ti determines if agent i will punish and ci determines how much agent i will punish. Each strategy component xi, ti, ci, si ∊ [0,1] and is generated randomly from a uniform distribution at the beginning of each simulation. To control for other factors that might contribute to the maintenance of cooperation, such as history or reputation, agents have no memory of prior interactions or agents. Every game is effectively one-shot and anonymous.
During a single CPD game an agent i initiates the encounter by randomly selecting j from its neighborhood, which consists of all nodes one link away from i in the given network type. Agents are given their endowment of one unit from which each simultaneously contributes a portion to the public good pool. Payoffs are then calculated as in Table 1. The initiating player i then randomly selects a second neighbor k, who is tasked with observing and evaluating i's contribution. If k judges the contribution to be too low (xi < tk), k pays ck to punish i by the amount ckM, where M is the relative strength of punishment referred to here as the punishment multiplier. During a single generation of the simulation each agent initiates three CPD games and, on average, acts as an observer (and possible punisher) three times. A single simulation run executes for 10,000 generations.
Table 1. Payoffs p in a CPD with:
Each generation consists of three routines - game play, observation and punishment (including retaliation and punishment of non-punishers), and selection and reproduction. During each routine an agent interacts only with its immediate neighbors as defined by the network type and all interactions take place in parallel. For each agent, p represents the net payoffs (benefits – costs) an agent earns during a generation. At the start of a new generation p = 0. It is increased by the amount earned in each CPD but is decreased when the agent is punished by other agents or when the agent pays to retaliate or punish someone else.
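The game-play and first-order punishment steps can be sketched as follows. This is an illustration under stated assumptions: the payoff form and r = 1.5 are assumed rather than taken from Table 1, the observer k is assumed to be a different neighbor from the partner j, and the names Agent and play_and_punish are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    x: float        # contribution to the public good
    t: float        # contribution threshold below which this agent punishes
    c: float        # amount spent to punish a low contributor
    s: float        # amount spent to retaliate (unused in the base game)
    p: float = 0.0  # net payoff accumulated during the current generation


def play_and_punish(agents, nbrs, i, M, rng, r=1.5):
    """One CPD game initiated by agent i, then observation by a neighbor k."""
    j = rng.choice(nbrs[i])
    a, b = agents[i], agents[j]
    share = r * (a.x + b.x) / 2.0   # assumed payoff form: equal split of the pool
    a.p += 1.0 - a.x + share
    b.p += 1.0 - b.x + share

    # Assumption: the observer is drawn from i's neighbors other than j.
    k = rng.choice([n for n in nbrs[i] if n != j])
    observer = agents[k]
    punished = a.x < observer.t
    if punished:
        observer.p -= observer.c    # k pays c_k ...
        a.p -= observer.c * M       # ... and i loses c_k * M
    return j, k, punished
```

Calling this function three times per agent per generation, with p reset to 0 at the start of each generation, would match the schedule described in the text.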
Following game play and punishment, agents compete with one another in the reproduction routine for the ability to pass offspring to the next generation. During this routine each agent i randomly selects a neighbor j with which to compare respective payoffs accumulated during the generation. If pi > pj, i's strategy remains at i's node in the next generation. However, if pi < pj, j's strategy is copied onto i's node for the next generation. In the event that pi = pj, a coin toss determines the prevailing strategy. As strategies are copied to the next generation each strategy component of every agent is subject to mutation with a probability m = 0.10. If selected for mutation, Gaussian noise with mean = 0 and standard deviation = 0.01 is added to the component. Should mutation drive a component's value outside [0,1] the value is adjusted back to the closer boundary value.
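The selection and mutation routine lends itself to a short sketch. The version below is a minimal illustration that assumes strategies are stored as tuples (x, t, c, s) and that all nodes are updated synchronously from the previous generation's payoffs; the function name next_generation and its signature are hypothetical.

```python
import random


def next_generation(strategies, payoffs, nbrs, rng, m=0.10, sd=0.01):
    """Return the strategies occupying each node in the next generation.

    strategies: list of (x, t, c, s) tuples, one per node
    payoffs:    list of net payoffs accumulated this generation
    nbrs:       dict mapping each node to a list of its neighbors
    """
    offspring = []
    for i, own in enumerate(strategies):
        j = rng.choice(nbrs[i])
        if payoffs[i] > payoffs[j]:
            winner = own
        elif payoffs[i] < payoffs[j]:
            winner = strategies[j]
        else:
            # Equal payoffs: a coin toss decides which strategy prevails.
            winner = own if rng.random() < 0.5 else strategies[j]

        # Each component mutates independently with probability m; values
        # pushed outside [0, 1] are moved back to the nearer boundary.
        child = tuple(
            min(1.0, max(0.0, v + rng.gauss(0.0, sd))) if rng.random() < m else v
            for v in winner
        )
        offspring.append(child)
    return offspring
```

Building a new list rather than overwriting strategies in place keeps the comparison for every node based on the same generation, which is one way to honor the statement that interactions take place in parallel.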
Introducing second-order punishment
In a second simulation, second-order punishment was introduced and agents were given the ability to punish observers who were too lenient on cheaters. In a CPD game with observer k, a new agent l makes an assessment of whether k's definition of a cheater is more lenient than l's. Because t is the contribution below which an agent punishes, a lower threshold means greater tolerance of low contributions. Agent l therefore determines whether k's threshold for punishment tk is less than its own tl. If and only if tk < tl then l inflicts second-order punishment on k, and l pays an amount cl to have clM deducted from k's net payoffs.
Introducing retaliation
The third simulation examined the effect of retaliation on cooperative outcomes. The base case simulation was modified so that an agent i automatically retaliated after being punished by paying an amount si ∊ [0,1] to have its punisher sanctioned by the amount siM. Since si could evolve to 0, agents might evolve so that they did not retaliate, even when punished. Three different rules were implemented for calculating how much a punished agent spent on retaliation. All methods of retaliation are arbitrary in the sense that their construction was intentionally limited to existing parameters of the model, but are nonetheless intuitive given the constraint of available variables. The three rules are:
1) si equals the same amount the punished agent would have spent to punish a low contributor (si = ci). This assumes that a single strategy component dictates how much an agent spends to punish others regardless of the reason.
2) si is an independently evolving strategy component (si is independent of ci). This assumes that retaliation is a separate form of punishment and governed by its own strategy component.
3) si equals the amount the agent contributes to the public good in the CPD (si = xi). This reflects the idea that both punishment and public good contributions are non-selfish behaviors, and so may be governed by the same strategy component.
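The three retaliation rules differ only in which strategy component sets the amount spent. A minimal sketch, with hypothetical function names and an assumed flat argument list, might look like this:

```python
def retaliation_spend(rule, x, c, s):
    """Amount a punished agent spends on retaliation under each rule.

    x, c, s are the agent's own strategy components: its public good
    contribution, its punishment spending, and its independent
    retaliation component.
    """
    if rule == 1:
        return c   # rule 1: s_i = c_i
    if rule == 2:
        return s   # rule 2: s_i evolves independently of c_i
    if rule == 3:
        return x   # rule 3: s_i = x_i
    raise ValueError("rule must be 1, 2, or 3")


def apply_retaliation(p_punished, p_punisher, spend, M):
    """Return updated net payoffs (punished agent, punisher) after retaliation."""
    return p_punished - spend, p_punisher - spend * M
```

For example, under rule 3 an agent that contributes x = 0.9 also spends 0.9 on retaliation, so with M = 2 its punisher loses 1.8 while the retaliating agent loses 0.9.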
Simulation variables and output
The important parameter governing the mechanism of altruistic punishment is the ratio of costs incurred by the punishing party to those of the party being punished (Casari, 2005; Shutters, 2009). Defined above as the punishment multiplier M, this parameter is analogous to the strength or efficiency of punishment and, along with network type, is an independent variable in these simulations. The dependent variables of interest are the mean contribution and the mean payoff which evolve in a population after 10,000 generations. The mean contribution represents the population's level of cooperativeness while the mean payoff is a measure of the population's social welfare.
It is important to note that the magnitude of payoff values collected is somewhat arbitrary. A more meaningful measure is the magnitude of change in payoffs due to the various punishment and structural treatments. Thus, payoff results are presented in this study by a measure known as payoff efficiency, where 0% efficiency means that payoffs equal those expected in a population composed entirely of defectors without any form of punishment (6.0 in this case) and 100% means that all members of the population contribute their entire endowment to the public good and that no punishment of any kind takes place (for a mean payoff of 9.0 in this case). While it is not possible for a population to evolve higher than 100% payoff efficiency, it is possible for populations under punishment treatments to evolve negative payoff efficiencies. This is due to the additional costs incurred during acts of punishment, both by the punishee and the punisher.
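Payoff efficiency as described here can be written as a rescaling of the mean payoff between the two reference values given in the text. The sketch below assumes the scaling is linear between the 0% and 100% anchors; the function name is hypothetical.

```python
def payoff_efficiency(mean_payoff, all_defect=6.0, all_cooperate=9.0):
    """Payoff efficiency in percent.

    0% corresponds to a population of pure defectors with no punishment
    (mean payoff 6.0) and 100% to full cooperation with no punishment
    (mean payoff 9.0). Linear scaling between the two is assumed.
    """
    return 100.0 * (mean_payoff - all_defect) / (all_cooperate - all_defect)
```

A mean payoff of 7.5 therefore corresponds to 50% efficiency, and any mean payoff below 6.0 gives a negative efficiency, as can happen when punishment costs outweigh the gains from contributions.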
For any given parameter set (Table 2), 100 replications were conducted at M = 0.0 and then at subsequent values of M in increments of 0.5, up to M = 10.0. Because aggregate outcomes using retaliation still showed considerable variability when M > 10.0, simulations were run additionally from M = 10.0 to M = 30.0 in increments of 5.0.
Table 2. Simulation parameters and their values used in the continuous prisoner's dilemma.
Results and discussion
Control case: effects of first-order punishment only
In the first simulation agents played the CPD followed by a single round of punishment. Agents could pay c to have a low contributor punished by an amount cM. This is the control case as neither second-order punishment nor retaliation was allowed. Consistent with previous studies (Gürerk et al., 2006; Shutters, 2012), cooperation evolved despite the fact that cooperators had no direct incentive to punish and could ignore cheaters without repercussions (Table 5). As M increased in these simulations, cooperation evolved in all simulations with social structure (Figure 1, solid lines). For each regular network, at some threshold value of M (Table 3) the population underwent a rapid transition from nearly full defection to nearly full cooperation. In simulations without social structure cooperation never evolved and mean contributions to the public good evolved to approximately 0.
Table 3. Approximate value of M required for transition from defection to cooperation, without and with second-order punishment (2OP).
no transition occurred with increasing M even at values as high as M = 5,000.
Table 4. Mean ending payoff efficiency under different social structures, both without punishment and with different punishment treatments.
mean ending payoff efficiency of 100 runs at M = 15
Table 5. Mean ending contributions ±SD under different social structures, without punishment, with one round of punishment, and with both a punishment and retaliation (rule 3) round.
mean ending contribution of 100 runs at M = 4
mean ending contribution of 100 runs at each M = 10, 15, 20, 25, 30 (500 total runs)

Figure 1. Results of simulations with first-order punishment only and with both first- and second-order punishment.
Previous studies have suggested that altruistic punishment may only be sustained through group selection (Boyd et al., 2003). One may be puzzled then that this result exhibits sustained punishment without discrete groups. However, Wilson and Wilson (2007) assert that what is important for group selection is not that agents form discretely bounded groups, but that their social interactions are local compared to the entire population. This assertion is supported by the current results from simulations with simple (first-order) punishment only. Not only did punishment, and subsequently cooperation, emerge in networked populations where interactions are local, but the more localized, measured as lower average degree d, the more readily punishment proliferated (Table 3).
The value of M at which populations transitioned to cooperation was particularly influenced by the mean degree of the network. This relates to a debate regarding the effect that network density has on the ability of a networked population to evolve cooperative behavior. Researchers have previously asserted that the more densely connected a population, the more likely that it will evolve cooperation (Marwell and Oliver, 1993; Opp and Gern, 1993), an assertion supported by Jun and Sethi's (2007) simulation experiment. However, many recent studies suggest the opposite, showing that cooperation is inhibited in denser networks (Flache, 2002; Flache and Macy, 1996; Takács et al., 2008) and that increasing average degree requires increasing the relative benefit of cooperative acts before cooperation can emerge (Ohtsuki et al., 2006). Results from this study strongly support the latter view that denser networks inhibit the evolution of cooperation. Though full cooperation eventually evolved on all regular networks, the severity of punishment, M, required to evolve cooperative populations increased as the density of the network increased (Table 3). This finding is similar to that of Ifti et al. (2004) which showed that as neighborhood size increases beyond a critical threshold, cooperation collapses.
Effects of second-order punishment: Structured societies
In the second set of simulations, agents could not only pay c to punish low contributors by an amount cM, they could also pay to punish those who had a higher tolerance for cheaters than themselves. Previous simulations have shown that when using a cultural group selection mechanism, second-order punishment may help to stabilize cooperative behavior in a population (Henrich and Boyd, 2001). However, results here show that instead of enhancing the cooperative effect of punishment, simulations using second-order punishment required higher values of M to induce cooperative behavior than simulations without second-order punishment (Figure 1). In effect, punishment needed to be more severe to achieve cooperation than when there was no option for second-order punishment (Table 3).
One possible reason for this result is that in simulations with second-order punishment, agents that contributed fully to the public good could still suffer punishment for other reasons. Regardless of how cooperative they were, if they were lenient on cheaters, they might be the target of second-order punishment. Thus, many cooperative agents that might have helped move the population toward full cooperation could be injured through sanctions, making these punishers less fit and decreasing the overall effectiveness of punishment. This finding suggests that attempts to incite individuals to police each other through the threat of peer punishment may have unintended and adverse consequences.
It is important to understand that there are multiple ways to implement second-order punishment. In this study, an agent l bases its decision to inflict second-order punishment solely on an assessment of the traits of the observed first-order punisher k. Namely, l compares its own threshold for defining a cheater to the threshold of k. One alternative method of implementing second-order punishment is for l to observe the behavior of k in response to a third party i, where i is a participant in a CPD game. Once k determines whether or not to punish i, l then determines whether it would have taken the same action. If k reacted differently, then l inflicts second-order punishment on k. In other words, if l determines that i was a cheater and that k did not punish i, then l punishes k. Likewise, if l determines that i was a cooperator but was still punished by k, then l punishes k for being overly punitive. These last two cases may be implemented separately as well, leading to many alternative mechanisms for implementing second-order punishment. Therefore, future research should seek to isolate the effects of different mechanisms of second-order punishment.
Effects of second-order punishment: unstructured societies
A surprising result was the ability of second-order punishment to induce cooperative outcomes in unstructured populations. Though simulations on complete networks evolved to full defection in every other treatment in this study, the addition of second-order punishment increased both cooperation and aggregate payoffs to relatively high levels (Table 4). With increasing M, levels of both public good contributions and payoffs using complete networks eventually surpassed those using regular networks (Figure 2).

Figure 2. Effects of second-order punishment using four different networks.
This result suggests that at some point in a continuum of social structures, altruistic punishment alone becomes insufficient as a mechanism for upholding cooperation and second-order punishment emerges as a solution (see also Sigmund et al., 2010). If one considers this structural continuum as describing not simply the average degree of a society, but its overall size and complexity then a plausible example of the need for higher-order punishment can be viewed in the developmental dynamics of police agencies. As cities increase in population, and their policing agencies grow in size, the agencies typically add second-order punishment organizations (Wilson, 1963). Known variously as internal affairs, internal investigations, or similar designations, these organizations are responsible for policing the police. Evidence for this trend toward a need for higher order punishment may be further seen among the largest cities where citizen panels are frequently instituted to monitor the activities of internal affairs divisions. This emergence of third-order punishment may indicate that as societies continue along a continuum of societal size and complexity, increasingly higher order punishment levels are required to maintain cooperation.
Effects of retaliation
In the third set of simulations, a punished agent was allowed to immediately retaliate against its punisher using one of three different rules (described above) to determine the amount s that the retaliating agent spent to impose a cost of sM on its punisher. Using retaliation rule 1 (si = ci), cooperation did not evolve on any network: the ability to retaliate led to the collapse of the cooperation that had evolved in the absence of retaliation. Likewise, under rule 2 (si is independent of ci), full defection evolved on all social structures. In simulations using rule 3 (si = xi), results were more complex. As with simple punishment, simulations with structured populations underwent a rapid transition from almost no contributions to some positive level of contributions with increasing M.
However, contributions did not transition to full cooperation as before but instead plateaued at a value between full cooperation and full defection, a value that varied by network density (Table 5). In addition, payoff efficiency initially rose with increasing M but then fell to negative levels (Figure 3), meaning that populations with the ability to retaliate fared worse than populations composed entirely of defectors with no punishment. Payoff efficiency decreased in the presence of retaliation even though some level of public good contribution was achieved. This result demonstrates the provisioning of public goods through what may be better described as coercion than cooperation.
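The payoff accounting behind the three retaliation rules can be illustrated with a short Python sketch. It is a hedged illustration rather than the simulation code used in this study. It assumes that x is an agent's contribution and c is the amount an agent spends on punishment, and it introduces a hypothetical trait r to stand in for the independent retaliation amount of rule 2; the class and function names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    x: float             # assumed: contribution to the public good
    c: float             # assumed: amount spent when punishing
    r: float             # hypothetical: independent retaliation trait (rule 2)
    payoff: float = 0.0  # running payoff


def retaliation_spend(agent: Agent, rule: int) -> float:
    """Amount s the retaliating agent spends under each rule."""
    if rule == 1:
        return agent.c   # rule 1: s_i = c_i
    if rule == 2:
        return agent.r   # rule 2: s_i independent of c_i
    if rule == 3:
        return agent.x   # rule 3: s_i = x_i
    raise ValueError(f"unknown retaliation rule: {rule}")


def retaliate(punished: Agent, punisher: Agent, rule: int, M: float) -> float:
    """Punished agent pays s; its punisher bears a cost of s * M."""
    s = retaliation_spend(punished, rule)
    punished.payoff -= s
    punisher.payoff -= s * M
    return s


# Usage: a punished agent retaliates under rule 3 with M = 2.
victim = Agent(x=0.4, c=0.1, r=0.25)
punisher = Agent(x=1.0, c=0.3, r=0.0)
spent = retaliate(victim, punisher, rule=3, M=2.0)
```

Because both parties lose payoff in every retaliatory exchange, the sketch makes it easy to see how aggregate payoffs can fall even while contributions remain positive.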

Effects of retaliation using four different networks.
Because humans often do retaliate after being punished (Molm, 1994; Nikiforakis, 2008), these results challenge the idea that cooperation may be the product of altruistic punishment in real-world situations. If altruistic punishment has in fact been an important mechanism in the evolution of cooperation, then it is likely that other mechanisms also existed to suppress or avoid retaliatory behavior. This may explain the frequency of institutional policies like those of the United States Department of Labor, which penalize or otherwise discourage retaliation against whistleblowers (USDL, 2009).
Further considerations of social structure
In this study I have restricted structured populations to homogeneous regular networks to exclude confounding effects of variation among agents in degree, connectivity, edge effects, etc. However, regular networks bear little resemblance to the patterns of interactions among living things, particularly humans, though they are arguably more representative of living systems than complete networks, in which agents interact equally with all other members of a society. To briefly assess the effect of subsequent punishment behavior under more realistic social structures, supplemental simulations were run using small-world networks (Watts and Strogatz, 1998) and scale-free networks (Barabási, 2009; Tomassini, Pestelacci, and Luthi, 2007), both of which are common in complex physical and social systems (Barabási and Albert, 1999; Dorogovtsev and Mendes, 2003).
Under small-world networks, results in all cases were qualitatively equivalent to the results for regular networks presented in Table 6. However, results using scale-free networks present a challenge, as neither second-order punishment nor retaliation appeared to have any effect on simulation outcomes. Both network types present ample opportunities for future research, as they not only generate interesting results but are also more representative of the social structures under which social behavior likely evolved.
Summary of effects of different punishment treatments on cooperative outcomes (public good contributions).
Conclusion
This study has built upon empirical studies that suggest altruistic punishment is a mechanism that leads to cooperation. Specifically, it examines two types of behavior that often occur in the presence of punishing behavior: retaliation and second-order punishment. Using computational social simulations, results show that retaliatory behavior almost always hinders the ability of punishment to induce cooperative behavior and that second-order punishment is most effective when populations are highly connected and/or well-mixed. These results concur qualitatively with observations from human social systems: retaliation is often suppressed, and second-order punishment frequently emerges when social systems grow beyond a certain threshold of size and complexity.
Acknowledgements
Nikos Nikiforakis and Ann Kinzig provided critical feedback on this study. This material is based upon work supported by a National Science Foundation Graduate Research Fellowship. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.
