Abstract
Rendon (2019) showed that the indirect estimation of total killings in the Peruvian Truth and Reconciliation Commission introduced a distortion. Two of the original analysts, Manrique-Vallier and Ball (2019), provide an indirect defense of their work using new data, and argue that this supports their unprecedented indirect method over the direct estimator. In this rejoinder I show that their new figure of 17,687 killings by the Shining Path is closer to the direct estimate of 18,341 than to their indirect estimate of 31,331 killings. I also show that the indirect method systematically produces impossible negative predicted killings and overfits massively. I reiterate my conclusion that their indirect estimates are unreliable and should be retracted.
Introduction
Truth Commission estimates have not necessarily passed a rigorous evaluation or peer-review by the same standard as scientific journals. They deserve to be subject to public discussion and scrutiny. In Rendon (2019) (hereafter R) I examined some shortcomings in the analysis of the Peruvian Truth and Reconciliation Commission (TRC) and attempted to propose a better alternative.
In their response, Manrique-Vallier and Ball (2019) (MVB hereafter) do not try to explicitly defend their original estimates in Ball et al. (2003) (BASM hereafter), but simply dismiss my estimates as “unsuitable for a discussion.” In the present rejoinder, I show the following.
Their new total figures, which they claim as ground truth, remain provisional data collections and do not undermine my estimates.
Their approach to external validation is an arbitrary comparison of point estimates with data.
There is no substance in claiming selection bias for estimating strata where a direct estimation is possible. On the contrary, BASM selected strata exclusively based on the existence of direct estimates for just one party.
Their indirect method massively produces nonsensical estimates, such as negative predictions of killings by source overlap, and overfitted models.
Their discussion on estimation risk confuses assessments of estimators, estimates, and an out-of-sample validation and is circular (“if a true value is large as they believe it is, the larger estimate is better”). 1
In short, the ad hoc indirect method used by MVB and coauthors in the TRC analysis does not provide sound estimates of killings in Peru.
Provisional merged data
In their reply, MVB use new data from a victim survey conducted by the Peruvian Ministry of Women and Social Development (MIMDES), which they have linked with TRC data in an “as-yet-unfinished project” (p.1). They report aggregate figures for the nine strata that admit direct estimations for the Shining Path. Table 1 indicates that the figures for the parties involved do not coincide with the observations reported by BASM for the TRC.
TRC Counts in BASM and MVB in nine strata.
For the State there are 10% fewer reported observations in MVB than in BASM, whereas for the Shining Path there are 3% more and for Other 14% fewer. These are nontrivial discrepancies, and we need careful work and a transparent explanation of how these data are constructed before the alleged total number of observations can be taken as trustworthy, or “true.”
External validation by point estimates
MVB argue that estimates should be dismissed because they are smaller than observations collected ex post. But in statistics, a prediction must be assessed as a probability distribution. Moreover, due to misreporting and misclassification, reality is usually measured with error, and a naïve comparison of plain numbers is misleading.
The observed killings by the Shining Path were 9243 in 2003 (BASM, Table 3), and MVB (Table 1) report an estimate of 17,687 in 2008. This last figure is very far from the 31,331 estimate in BASM (Table 1) and much closer to my main estimate of 18,341 killings (R, Table 11).
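The distances behind this comparison can be checked directly. The following is a minimal sketch in Python that uses only the three figures quoted above; the variable and function names are illustrative, and the calculation compares point values only, without the uncertainty of either estimate.

```python
# Figures quoted in the text (killings attributed to the Shining Path).
mvb_new_figure = 17_687      # MVB (Table 1)
direct_estimate = 18_341     # R (Table 11)
indirect_estimate = 31_331   # BASM (Table 1)


def gap(estimate, reference):
    """Return the absolute gap and the gap as a share of the reference."""
    absolute = abs(estimate - reference)
    return absolute, absolute / reference


direct_gap, direct_share = gap(direct_estimate, mvb_new_figure)
indirect_gap, indirect_share = gap(indirect_estimate, mvb_new_figure)

print(f"direct:   {direct_gap:>6,} ({direct_share:.1%})")
print(f"indirect: {indirect_gap:>6,} ({indirect_share:.1%})")
```

Under this simple measure, the direct estimate differs from MVB's new figure by 654 killings (about 3.7%), whereas the indirect estimate differs by 13,644 (about 77.1%).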
Table 2 reports confidence intervals for the indirect BASM and direct R estimations. The number of 17,687 lies barely outside the 95% confidence interval when missing data are included in Other, and well inside the 95% confidence interval in the multiple imputation estimation, which MVB ignore. 2 MVB’s claim that the estimates in R are impossible or illogical is thus unfounded.
Confidence Interval for the Shining Path by method.
Direct and indirect methods’ statistics. Shining Path.
Strata choice based on one party
The design choices made by MVB and coauthors make direct estimations impossible for most Shining Path strata. They stratified the sample exclusively based on the existence of direct estimates for the State, without considering the other perpetrators. For comparability, I followed that stratification, but also proposed a stratification based on the existence of direct estimates for all perpetrators, which validates the direct estimates in R.
However, MVB’s criticism about selection bias is misguided, even with their strata selection. The nine Shining Path strata for which there are direct estimates are representative: They constitute 44% of the total observed counts and cover five of the seven regions in the country. Moreover, in R, I performed validation tests that show that estimates in the interpolated strata are larger than in those nine strata. Consequently, there is no support for their allegation that the direct estimation leads to underestimates. 3
Are indirect estimates better?
An estimation method should be properly established before it is applied. MVB and coauthors put the cart before the horse, and developed an ad hoc indirect method that nobody has applied before or since, and in a fuite en avant justify it ex post as better than the direct standard method. 4
MVB rightly claim that “an estimator is better than another if it produces estimates that are closer to the truth” (p. 2). However, before assessing whether a method produces estimates that are closer to hypothetical unobserved true values (or to out-of-sample values), we should first assess whether it produces estimates that are closer to the in-sample observed true values. The indirect estimator may be consistent in theory and fit acceptably the sum of the State and the Shining Path {E+S} and the State {E} separately, but it is a fallacy of division to infer that these two features imply that the indirect estimator fits acceptably their difference, that is, the Shining Path: {S}={E+S}-{E}. In fact, the indirect method very often: (a) produces negative overlap estimates, and thus contradicts reality; (b) delivers perfect predictions of the observed counts, that is, overfits; and (c) produces estimates whose p-values do not exist or are outside BASM’s admissible range of [0.01,0.5].
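The arithmetic behind the fallacy of division can be illustrated with a toy stratum. The sketch below is in Python, and all of its numbers are hypothetical: they are not taken from BASM, MVB, or R, and they only show how two estimates that each miss their target by 8% can imply a negative difference when the component of interest is small relative to the total.

```python
def relative_error(estimate, truth):
    """Absolute error as a share of the true value."""
    return abs(estimate - truth) / truth


# Hypothetical true counts for one stratum (assumed for illustration only).
true_total = 100.0                            # {E+S}
true_state = 90.0                             # {E}
true_shining_path = true_total - true_state   # {S} = {E+S} - {E}

# Hypothetical estimates: each is within 8% of its own target.
total_hat = 92.0   # 8% below the true {E+S}
state_hat = 97.2   # 8% above the true {E}

# The implied estimate for {S} is obtained by subtraction.
shining_path_hat = total_hat - state_hat

print(f"error in {{E+S}}: {relative_error(total_hat, true_total):.0%}")
print(f"error in {{E}}:   {relative_error(state_hat, true_state):.0%}")
print(f"implied {{S}}:    {shining_path_hat:.1f} (true value {true_shining_path:.0f})")
```

In this toy case both inputs look acceptable on their own, yet the implied Shining Path count is negative, because errors that are small relative to {E+S} and {E} are large relative to their difference.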
Yes, estimates should not contradict reality
Table 3 compares source overlap observations and predictions for
Table 4 compares the two methods for the nine strata where a direct estimation is possible. The indirect method delivers negative predicted killings for four strata, p-values outside the admissible range for all strata, and a fit
Comparison of direct and indirect methods. Shining Path. Nine strata.
How severe are these problems for the Shining Path sample? Table 5 reveals that they are massive. There are 19 strata for which all possible log-linear models for {E+S} predict negative killings for at least one count. In the selected models, the number of negative predictions increases to 26 strata. There are also 14 strata where the selected indirect model implies a perfect fit. These two problems imply that the indirect estimates are not sound for 40 strata. Additionally, 49 strata have a p-value outside the admissible interval. The indirect estimator complies with the required standards by BASM for only one out of the 58 strata.
Number of Strata with unsound indirect estimates. Shining Path.
The indirect estimator is not a panacea for capture-recapture estimations with sparse data; it fails to pass the requirements of fit and soundness for the direct method. Hence, we should be skeptical of results from this method.
Estimation risk
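The Mean Squared Error is the variance plus bias squared:
$$\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta})^{2}, \qquad \mathrm{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta .$$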
MVB’s story about estimation risk only holds due to the assumption about “impossibility regions,” which essentially reiterates the flawed out-of-sample validation with point estimates and their own new data. This is a misguided way of choosing among competing estimators. 5
Conclusions
In this rejoinder, I demonstrate the errors by MVB in the external validation by point estimates and the circular thinking in evaluating estimator risk. The new observations they bring up are well inside the confidence intervals for the direct method. I extended the findings of Rendon (2019), which focused on a sensible direct estimation alternative to the indirect method by BASM to address the challenge of an estimation with very sparse data. I have demonstrated that their indirect method produces unreliable results for estimation of sparse data, often generating negative predictions and overfitting. This corroborates and reinforces my previous finding that BASM introduced a big distortion in the estimation of killings in Peru. The Truth Commission has a great responsibility to tell the truth. BASM should acknowledge that their estimates are incorrect and formally retract them.
Supplemental Material
Supplemental material, Online_Appendix for A truth commission did not tell the truth: A rejoinder to Manrique-Vallier and Ball by Silvio Rendon in Research & Politics
Footnotes
Acknowledgements
I thank Kristian Skrede Gleditsch for great editing and thoughtful comments and Ana Tarazona for help with some references. The views expressed in this paper are those of the author and do not represent the views of any organization that the author may currently be affiliated with.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Notes
Carnegie Corporation of New York Grant
This publication was made possible (in part) by a grant from the Carnegie Corporation of New York. The statements made and views expressed are solely the responsibility of the author.
References
