Reality and risk: A refutation of S. Rendón’s analysis of the Peruvian Truth and Reconciliation Commission’s conflict mortality study

Abstract

We refute S. Rendón’s recent criticism of the 2003 Peruvian Truth and Reconciliation Commission (TRC) conflict mortality study. We first show that his most important result, an alternative estimate of the mortality due to the Maoist guerrillas of Shining Path (Sendero Luminoso), is lower than existing observed data and is therefore impossible. We then analyze his statistical approach and find that it is affected by a subtle form of selection bias. We contrast his approach to the TRC’s using tools from statistical decision theory, and determine that his method is inadequate for this problem—and that the TRC’s approach is, at minimum, better. Without advocating for the TRC’s original results, we conclude that Rendón’s approach and methods are inferior to the TRC’s original work.

Keywords

Casualty count estimation, capture–recapture, estimation risk, Peruvian Truth and Reconciliation Commission

Introduction

Peru experienced a terrible internal armed conflict during the period 1980–2000 between the Maoist guerrillas of Sendero Luminoso (SLU) and agents of the Peruvian state (EST). In 2003, a team at Peru’s Truth and Reconciliation Commission (TRC) estimated conflict mortality due to violence using capture–recapture (CR) methods by combining the TRC’s information with five other databases. Estimates were stratified by location and perpetrator. Killings committed by SLU were infrequently documented by the non-TRC databases. Therefore, instead of obtaining direct CR estimates for SLU, the TRC first estimated a total including both EST and SLU, then estimated EST alone, and estimated SLU by subtraction. Rendón (2019) dismisses this procedure as “unusual” and proposes instead to estimate directly the few strata that are estimable and to extrapolate from them to the rest.

We agree that there are aspects of the TRC’s approach that should be improved—we have been working on this for several years. However, our response here focuses specifically on Rendón’s proposal: we show that his methods are substantially weaker than the TRC’s, and that his results are unsound.

There are three bases for our rejection of Rendón’s methods and findings. First, his estimates are inconsistent with observed data. By combining the data used by the TRC with data published by the Peruvian government between 2004 and 2006 (Ministerio de la Mujer y Desarrollo Social 2006a, 2006b), we see that Rendón’s estimates for SLU are, in most strata and in the aggregate, lower than the number of observed SLU victims—without considering victims who continue to be undocumented. This fact alone is enough to dismiss Rendón’s conclusions. The TRC’s estimates do not suffer from this problem.

Second, Rendón’s methods are flawed. His approach consists of estimating only those strata where overlaps among datasets are frequent enough to allow direct log-linear CR estimation of SLU. This sounds innocuous until we note that this selection criterion is directly related to the potential outcomes. Indeed, CR methods rely on overlap information—and CR methods tend to produce smaller estimates when overlaps are relatively larger. Thus, by selecting these exceptional (9 of 59) strata, Rendón is affecting the outcomes, likely underestimating.

Third, Rendón’s claim that his approach is an improvement over the TRC’s is unsupported. He reasons as follows: the TRC did not employ the most obvious, direct approach, and merely for that, their estimates are dubious; an application of the obvious approach produces different results; therefore the new results discredit both the TRC’s approach and conclusions. This is fallacious argumentation. Furthermore, showing mere differences is not the appropriate way to compare statistical methods.

In this paper, we refute Rendón’s approach and conclusions. We first show that his estimates fail basic external validity tests. We then investigate why his methods fail. We compare his approach to the TRC’s using appropriate tools from statistical decision theory, and show that his methods are unsuitable for this application.

External validity: estimates should not contradict reality

The TRC’s original 2003 data is not the only list of fully identified victims of the Peruvian conflict. Between 2001 and 2006, the Peruvian Ministry of Women and Social Development (Ministerio de la Mujer y Desarrollo Social, MIMDES) conducted a similar, but independent, survey of victims. The results of this “Census for Peace” were published by the Peruvian Government in 2006 as a series of physical volumes which included lists of fatal victims (Ministerio de la Mujer y Desarrollo Social, 2006a, 2006b, 2006c, 2006d). These data have been used in academic investigations since at least Fermi Blanco (2012). Electronic copies of the same documents have also been circulated in PDF format, although at the time of writing they can be difficult to locate. As part of an as-yet-unfinished project, we have re-digitized the MIMDES lists and matched them to the TRC’s. We note that in order to perform the record linkage we had to obtain confidential TRC data through direct agreement with their current steward, the Peruvian Ombudsman Office. For details, including more information about the MIMDES data, see the accompanying technical Online Supplement.

The MIMDES dataset contains a total of 20,468 unique records, of which 13,011 were not already present in the TRC’s dataset. Importantly, it adds a total of 8351 new records attributed to SLU. This raises the number of known victims of SLU to 17,687. Table 1 summarizes this information together with Rendón’s estimates, ${\hat{N}}_{direct}$. We also include details of one of Rendón’s selected strata (stratum 11) as an illustration.

Table 1.

Known cases of deaths attributed to Sendero Luminoso by the time of the Truth and Reconciliation Commission (TRC) report (2003), and after adding Ministry of Women and Social Development (MIMDES) data. We report global totals and an illustrative stratum (11). In both cases Rendón’s estimates of the total, ${\hat{N}}_{direct}$, are smaller than the number of known cases.

In TRC (2003) data	In MIMDES data	Observed (stratum 11)	Observed (global)
No	Yes	590	8351
Yes	No	332	5245
Yes	Yes	261	4091
	Observed (2018)	1183	17,687
	${\hat{N}}_{direct}$	692	15,089

Inspection of Table 1 makes clear that Rendón’s estimates are illogical: in both the aggregate and the example stratum, his projections are smaller than the known absolute lower bound.
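The comparison behind this statement is simple arithmetic, and the following Python sketch spells it out. The counts are copied from Table 1; the dictionary layout and the function names are illustrative assumptions, not part of the original analysis.

```python
# Counts of SLU-attributed deaths copied from Table 1.
# Keys and function names are hypothetical, chosen only for this illustration.
table1 = {
    "stratum 11": {"mimdes_only": 590, "trc_only": 332, "both": 261, "n_direct": 692},
    "global": {"mimdes_only": 8351, "trc_only": 5245, "both": 4091, "n_direct": 15089},
}


def observed_total(row):
    """Distinct documented victims: the union of the TRC (2003) and MIMDES lists."""
    return row["mimdes_only"] + row["trc_only"] + row["both"]


def shortfall(row):
    """Documented victims minus the direct estimate; a positive value is impossible."""
    return observed_total(row) - row["n_direct"]


for name, row in table1.items():
    print(name, "observed:", observed_total(row),
          "estimate:", row["n_direct"], "shortfall:", shortfall(row))
```

A population estimate can never be smaller than the number of distinct individuals already documented, so any positive shortfall marks an estimate that contradicts the data.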

We believe that, even without combining the datasets, reflection on the scale and distribution by perpetrator reported by MIMDES (both similar to the TRC’s) should have raised questions about whether Rendón’s estimates were plausible. Moreover, analysis of the TRC’s data alone suffices to detect this problem. Stratum 25 (Chungui district in Ayacucho) is the only region for which there is an extra data source (Centro de Desarrollo Agropecuario) with plenty of SLU cases. Thus we can test Rendón’s method by excluding that database and checking whether the result is consistent with the observed lower bound. The resulting estimate is ${\hat{N}}_{direct} = 206$ ($CI_{95\%} = [204, 208]$). This is smaller than the observed (as of 2003) total of 422.

In what follows we analyze Rendón’s technical approach on its own and compared to the TRC’s. MIMDES data will only be used as a factual reference against which to judge the plausibility of results.

Technical validity: to select is to sample

With the exception of stratum 25, most of the deaths attributed to SLU in the 2003 dataset originated from the TRC’s own data collection. Fifty out of 59 geographic strata lacked sufficient overlaps between sources to allow direct SLU log-linear CR estimates. The remaining nine strata in principle allow the calculation of at least one estimate. Because the strata that would allow a direct estimate were so few, the TRC chose not to attempt direct calculations, and instead used an indirect approach. Rendón argues that estimating these nine strata and extrapolating from them to the whole country is a better procedure.

The problem with Rendón’s proposal is that it involves selecting strata: a stratum will be selected whenever it is amenable to estimation, that is, when it contains enough overlaps. This is a random event which depends on the particular sample that resulted from the (random) data collection process. We can identify two scenarios. If the unobserved population were actually small, as estimated by Rendón, then samples would tend to overlap, thus making the existence of estimable strata a relatively common event. In the converse case, if the dark figure were larger, overlaps and therefore estimable strata would be rarer. In any case, the probability of selecting a stratum is related inversely to its underlying size. This is what Rubin (1987) calls a non-ignorable observation mechanism. Rendón’s procedure ignores it.

There are two elements that make us believe that we are in the second of the two outlined scenarios, that is, the larger dark figure. First, we have already shown that Rendón has underestimated so severely that his estimates contradict observed data. Second, the fact that only nine out of 59 strata are estimable suggests that being estimable is indeed a low-probability event. If this were the case, Rendón’s method would result in underestimation.
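The selection mechanism can be illustrated with a small simulation. The Python sketch below is a deliberately simplified stand-in: it uses two independent lists and the Chapman two-list estimator rather than the three-list log-linear models discussed here, and the population size, capture probabilities, and overlap threshold are hypothetical values chosen only to mimic sparsely documented strata. It compares the average estimate over all simulated strata with the average over only those strata that pass an "enough overlap" rule.

```python
import random


def simulate_stratum(rng, true_n, p1, p2):
    """Draw two independent lists from a population of size true_n."""
    n1 = n2 = overlap = 0
    for _ in range(true_n):
        in1 = rng.random() < p1
        in2 = rng.random() < p2
        n1 += in1
        n2 += in2
        overlap += in1 and in2
    return n1, n2, overlap


def chapman(n1, n2, overlap):
    """Chapman two-list estimator (a stand-in, not the log-linear CR model)."""
    return (n1 + 1) * (n2 + 1) / (overlap + 1) - 1


def mean(values):
    return sum(values) / len(values)


def run(true_n=500, p1=0.10, p2=0.05, min_overlap=5, reps=2000, seed=0):
    """All parameter values are hypothetical and for illustration only."""
    rng = random.Random(seed)
    all_estimates, selected_estimates = [], []
    for _ in range(reps):
        n1, n2, overlap = simulate_stratum(rng, true_n, p1, p2)
        estimate = chapman(n1, n2, overlap)
        all_estimates.append(estimate)
        # Selection rule: keep the stratum only if the overlap is large enough.
        if overlap >= min_overlap:
            selected_estimates.append(estimate)
    share_selected = len(selected_estimates) / reps
    return mean(all_estimates), mean(selected_estimates), share_selected


all_mean, selected_mean, share_selected = run()
print("mean estimate, all strata:     ", round(all_mean, 1))
print("mean estimate, selected strata:", round(selected_mean, 1))
print("share of strata selected:      ", share_selected)
```

Under these assumptions, strata that pass the overlap rule are a minority, and their average estimate falls well below both the true size and the average over all strata. This is the pattern described above: conditioning on large overlaps favors samples that yield small estimates.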

Comparing statistical methods: estimation risk

Intuitively, an estimator is better than another if it produces estimates that are closer to the truth. Unfortunately, these direct comparisons are not usually possible or even meaningful: true values are usually unknown, and different samples lead to different estimates. Statistical decision theory avoids these pitfalls through the study of estimation risk: the average deviation of estimates, over all possible samples, with respect to the true value of the parameter (see e.g., Schervish, 1996; Shao, 2003). Since true values are usually unknown, statisticians calculate risks for relevant sets of values. A method will be considered better for estimating one given true value whenever its risk at that value is smaller than the risk of the competing method at the same value. In many cases, one of two competing estimators will be better for estimating only a subset of the parameter space. Thus, determining which of the two is “the best” for an application needs to take into consideration which values of the parameter are relevant to that application.

We have calculated the risk of both Rendón’s “direct” approach and the TRC’s method. Our selected risk measure is the mean squared error, $MSE(\hat{N}, N) = E_{N}[(\hat{N} - N)^{2}]$, a common choice for these comparisons (Shao, 2003). In order for the comparison to be meaningful, we have made sure that the distribution of the data, which determines the distributions of both ${\hat{N}}_{direct}$ and ${\hat{N}}_{TRC}$, is such that: (a) the induced distribution on the observable cells is similar to that of the 2003 data; and (b) the probability of the unobservable cell is related to the others through the condition assumed by both methods (no second-order interaction; see Bishop et al., 1975: Chapter 6). We have constructed such distributions as follows: starting from an assumed $N$, we estimated multinomial probabilities on the complete $2 \times 2 \times 2$ contingency table using real stratum data completed to $N$ and a no-second-order-interaction model (see the Online Supplement for details). This approach is similar to the non-parametric bootstrap, but gives the direct method a better chance of success by avoiding the truncation of the support of the distribution due to the presence of random zeros (Bishop et al., 1975). For both estimators, we included the determination of their existence and the model selection as part of their definition.
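For reference, the no-second-order-interaction condition can be written out for three lists. The notation is an assumption introduced here, not taken from the original: $m_{ijk}$ is the expected count of individuals with capture pattern $(i, j, k)$, where 1 means present in a list and 0 absent, so that $m_{000}$ is the unobservable cell. The condition states that the odds ratio between two lists does not depend on the third:

$$\frac{m_{111}\, m_{001}}{m_{101}\, m_{011}} = \frac{m_{110}\, m_{000}}{m_{100}\, m_{010}} \quad\Longrightarrow\quad m_{000} = \frac{m_{111}\, m_{100}\, m_{010}\, m_{001}}{m_{110}\, m_{101}\, m_{011}}$$

The mean squared error itself decomposes in the usual way into variance and squared bias, which makes explicit that a systematically low estimator is penalized even when its variance is small:

$$MSE(\hat{N}, N) = \mathrm{Var}_{N}(\hat{N}) + \left(E_{N}[\hat{N}] - N\right)^{2}$$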

Figure 1 shows the risk of both methods for estimating SLU casualties, based on data from stratum 14. We have evaluated the risks for $N$ ranging from the observed count as of 2003, $n_{obs}(2003)$, to the observed count as of 2018 plus 80%, $n_{obs}(2018) + 80\%$. Even though any $N$ smaller than $n_{obs}(2018)$ is impossible, we have found that inspecting these values helps in understanding why and how Rendón’s method went astray. Aside from the fact that Rendón’s method has resulted in an impossible estimate (${\hat{N}}_{direct}$), a salient feature of Figure 1 is that both methods can behave either adequately or poorly depending on the true $N$. However, Rendón’s method only behaves better than the TRC’s for small values of $N$, most of which are either implausible or, as already shown, impossible. This behavior is consistent with our analysis in the previous section. The TRC’s method behaves poorly for low (but implausible) values of $N$, but it overtakes Rendón’s at approximately $n_{obs}(2018) + 15\%$ and improves from there. This shows that, although far from perfect, the TRC’s method is likely to produce better estimates for reasonable values of $N$. Analysis of most of the other eight selected strata (in the Online Supplement) shows a similar picture. An interesting exception is stratum 47, where both methods have similar risk profiles and produce similar estimates.

Figure 1.

Estimation risk (mean squared error) vs. true population size for the Truth and Reconciliation Commission and direct methods. Smaller is better. The shaded region corresponds to values of $N$ smaller than the known minimum as of 2018. Percentages on the x-axis are with respect to the known minimum.
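Risk curves of this kind are obtained by averaging squared errors over many simulated samples at each assumed $N$. The Python sketch below illustrates the idea only and is not the procedure used for Figure 1. It assumes three independent lists, a special case that satisfies the no-second-order-interaction condition, with hypothetical capture probabilities. It uses a closed-form three-list estimate with no model selection, and it drops replicates in which the estimate does not exist, which is also an assumption.

```python
import random


def draw_counts(rng, true_n, probs):
    """Simulate three independent lists; return counts keyed by capture pattern."""
    counts = {}
    for _ in range(true_n):
        pattern = tuple(int(rng.random() < p) for p in probs)
        counts[pattern] = counts.get(pattern, 0) + 1
    return counts


def nsoi_estimate(counts):
    """Closed-form estimate of N under no second-order interaction.

    Returns None when the estimate does not exist (a zero in the denominator).
    """
    def cell(*key):
        return counts.get(key, 0)

    denominator = cell(1, 1, 0) * cell(1, 0, 1) * cell(0, 1, 1)
    if denominator == 0:
        return None
    numerator = cell(1, 1, 1) * cell(1, 0, 0) * cell(0, 1, 0) * cell(0, 0, 1)
    observed = sum(v for key, v in counts.items() if key != (0, 0, 0))
    return observed + numerator / denominator


def monte_carlo_mse(true_n, probs, reps=300, seed=0):
    """Approximate MSE at true_n; also return the share of usable replicates."""
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(reps):
        estimate = nsoi_estimate(draw_counts(rng, true_n, probs))
        if estimate is not None:  # assumption: undefined estimates are dropped
            squared_errors.append((estimate - true_n) ** 2)
    if not squared_errors:
        return float("nan"), 0.0
    return sum(squared_errors) / len(squared_errors), len(squared_errors) / reps


# Hypothetical grid of true sizes and capture probabilities.
for true_n in (200, 400, 800):
    mse, usable = monte_carlo_mse(true_n, probs=(0.30, 0.25, 0.20))
    print(true_n, round(mse ** 0.5, 1), usable)
```

Repeating the calculation for two competing estimators over the same grid of $N$ gives the pair of curves that a risk comparison requires. Which estimator is preferable then depends on which region of the grid is plausible for the application.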

A final note about “impossible regions.” Their accuracy ultimately rests on the quality of our matching of the MIMDES and 2003 datasets. Undermatching errors have the potential to alter our conclusions. However, we believe that this is unlikely. First, our matching within and among the datasets was aggressive, so much so that we identified 619 duplicates that the MIMDES team themselves did not detect. This makes our counts conservative. Second, Rendón’s SLU estimate is so low ($\hat{N} = 15{,}089$) that for it to be physically possible, at least 6689 SLU records would have to be common to the 2003 data and MIMDES. This is 72% of the 2003 records, and 63% more matches than the ones we detected (4091). We consider this unlikely.
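The 6689 figure follows from inclusion–exclusion applied to the counts in Table 1, as the short Python sketch below shows. The variable names are illustrative assumptions; the inputs are the published counts.

```python
# SLU-attributed records, from Table 1 (global column).
trc_only, mimdes_only, detected_matches = 5245, 8351, 4091
n_direct = 15089  # Rendón's SLU estimate

trc_records = trc_only + detected_matches        # records in the 2003 data
mimdes_records = mimdes_only + detected_matches  # records in MIMDES

# The union of the two lists cannot exceed the true total:
#   trc_records + mimdes_records - matches <= n_direct
# so the number of matches must be at least:
required_matches = trc_records + mimdes_records - n_direct

share_of_trc = required_matches / trc_records
excess_over_detected = required_matches / detected_matches - 1

print("required matches:", required_matches)
print("share of 2003 records:", round(share_of_trc, 3))
print("excess over detected matches:", round(excess_over_detected, 3))
```

The required overlap is therefore 6689 records, a little under three quarters of the 2003 records and a little under two thirds more than the matches actually detected.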

Discussion

The TRC’s study has several limitations. In particular, estimates stratified by perpetrator require accounting for records with missing perpetrator labels, as Rendón does. In addition, we note that both the TRC’s estimates, and Rendón’s, rely on log-linear models which have important limitations (see e.g., Manrique-Vallier, 2016).

This said, existence of limitations in the TRC’s work does not by itself justify any alternative. Alternatives should at minimum stand on their own: they have to be statistically sound, and should produce plausible results. In addition, if they are to contribute anything to the discussion, they should also have some advantage over the original other than merely appearing more obvious. Rendón’s work fails all three of these requirements.

We agree with Rendón’s observation that the magnitude and distribution by perpetrator of killings during the Peruvian conflict are of great importance for Peruvian historical memory. We agree further that these questions merit considerable additional scientific attention. However, the flaws in Rendón’s (2019) article make it unsuitable as a contribution to this discussion.

Supplemental Material

Supplemental material for Reality and risk: A refutation of S. Rendón’s analysis of the Peruvian Truth and Reconciliation Commission’s conflict mortality study by Daniel Manrique-Vallier and Patrick Ball in Research & Politics

Footnotes

Acknowledgements

The authors thank Professor Andrew Womack for his thoughtful suggestions.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Supplemental materials

The supplemental files are available at

The replication files are available at

Carnegie Corporation of New York Grant

This publication was made possible (in part) by a grant from the Carnegie Corporation of New York. The statements made and views expressed are solely the responsibility of the author.

References

Bishop, Fienberg and Holland (1975) Discrete Multivariate Analysis: Theory and Practice. Cambridge, MA: MIT Press.

Fermi Blanco (2012) Las relaciones entre las ciencias sociales y la política: Un ejercicio metodológico sobre los hallazgos de la Comisión de la Verdad y la reconciliación [Relations between the social sciences and politics: A methodological exercise on the findings of the Truth and Reconciliation Commission]. Sociology professional licensing thesis, Pontificia Universidad Católica del Perú. [In Spanish.]

Manrique-Vallier (2016) Bayesian population size estimation using Dirichlet process mixtures. Biometrics 72(4): 1246–1254.

Ministerio de la Mujer y Desarrollo Social (2006a) Censo por la Paz 2001–2003: Relación Preliminar de Personas Desaparecidas por el conflicto armado interno de acuerdo con el Censo por la Paz 1980–2003. Technical report, Ministerio de la Mujer y Desarrollo Social, Lima. Available at: https://www.verdadyreconciliacionperu.com/admin/files/articulos/800_digitalizacion.pdf.

Ministerio de la Mujer y Desarrollo Social (2006b) Censo por la Paz 2001–2003: Relación preliminar de personas muertas por el conflicto armado interno de acuerdo con el Censo por la Paz 1980–2000. Technical Report 2005–9444, Ministerio de la Mujer y Desarrollo Social, Lima. Available at: http://www.verdadyreconciliacionperu.com/admin/files/articulos/800_digitalizacion.pdf.

Ministerio de la Mujer y Desarrollo Social (2006c) Censo por la Paz 2006: Relación Preliminar de Personas Desaparecidas por la Violencia Ocurrida en el Periodo 1980–2000. Technical report, Ministerio de la Mujer y Desarrollo Social, Lima. Available at: https://www.mimp.gob.pe/webs/mimp/sispod/pdf/80.pdf.

Ministerio de la Mujer y Desarrollo Social (2006d) Censo por la Paz 2006: Relación Preliminar de Personas Muertas por la Violencia Ocurrida en el Periodo 1980–2000. Technical report, Ministerio de la Mujer y Desarrollo Social, Lima. Available at: http://www.verdadyreconciliacionperu.com/admin/files/articulos/801_digitalizacion.pdf.

Rendón (2019) Capturing correctly: A reanalysis of the indirect capture–recapture methods in the Peruvian Truth and Reconciliation Commission. Research & Politics 6(1). Epub ahead of print, 25 January 2019. DOI: 10.1177/2053168018820375.

Rubin (1987) Multiple Imputation for Nonresponse in Surveys. New York: John Wiley & Sons.

Schervish (1996) Theory of Statistics. New York: Springer.

Shao (2003) Mathematical Statistics. 2nd edition. New York: Springer.
