Abstract

The Truth and Reconciliation Commission (TRC) in Peru is usually cited as an example of how capture–recapture methods can help improve our understanding of mass violence from incomplete observed data. Using 25,000 documented death records, the TRC estimated a total of 69,000 killings, and that the Shining Path was the main perpetrator, in contrast with the raw data where the Peruvian State appears to be responsible for the most killings. One feature not often noticed is that the TRC applied an unusual indirect procedure, combining data on different perpetrators and lumping together missing perpetrator data in one group. I show that direct estimations with strict stratification by perpetrator and accounting for missing data do not support the results of the TRC’s indirect approach. I estimate a total of 48,000 killings, substantially lower than the TRC estimate, and the Peruvian State accounts for a significantly larger share than the Shining Path. Rather than an example of correcting biases in the observed data through capture–recapture methods, the TRC actually introduced further distortion.

Keywords

Armed conflict, capture–recapture, human rights data, kriging, multiple system estimation, multiple imputation

Introduction

The quantitative analysis conducted by the Truth and Reconciliation Commission (TRC) of Peru is arguably the most prominent application of capture–recapture methods (CRMs) for estimating deaths in armed conflicts. It is often cited as an example of the ability of CRMs to correct biases in the observed data, as the TRC’s estimated victim numbers and perpetrator shares diverge notably from the raw data. The TRC used three different sources on 25,000 documented killings, in which the Peruvian State (hereafter, the State) appeared to be the main perpetrator. Using CRMs, the TRC estimated 69,000 total killings, and surmised that the insurgent group the Shining Path (Sendero Luminoso in Spanish) accounted for the largest share. Their results have often been cited as evidence that CRMs provide a better understanding of the central patterns in violence than raw observed data. My reanalysis does not support this interpretation, and suggests that the TRC actually introduced further distortion rather than eliminating biases.

The devil is in the details: few have noticed that the TRC applied an unusual indirect procedure. The available data for Peru on the source, group, and geographic strata are too sparse for CRMs to be applied directly in all geographic strata. To address this challenge, the TRC used a nonstandard approach, combining data on different perpetrators and assigning missing perpetrator data to one perpetrator group. I show that a standard CRM application is possible for a subset of the data, and that this allows us to estimate the full data with more accuracy than the indirect application of CRMs. I estimate a total of 48,000 killings, substantially lower than the TRC estimate. Moreover, the State appears to account for a significantly larger share than the Shining Path. Rather than eliminating biases in the original data, the indirect CRM estimation by the TRC seems to have distorted the picture given by the documented deaths. In the direct estimates, the ratio of State-perpetrated to Shining Path-perpetrated deaths is roughly the same as in the documented deaths.

Given the salience of the TRC analysis for Peru, this reanalysis has important lessons for both researchers and policymakers. Application of CRM is increasingly used to provide quantitative accounts of armed conflicts and human rights data. These analyses are often used to support proposed policies for reparation, institutional reforms, criminal prosecutions, as well as historical narratives. It is thus crucial to assess critically the possible limitations of this application and propose alternatives that can help overcome its problems.

Peru suffered a war of insurgency and counterinsurgency from 1980 until around 2000, involving the police and army, paramilitary forces and self-defense groups, on the one hand, and insurgent groups, mainly the Shining Path, on the other (Comisión de la Verdad y Reconciliación (CVR), 2003). By 2000 most of the insurgent leaders had been either captured or killed, and insurgent activity had declined dramatically. In 2001 the government established a TRC to cast light on the facts of the war, and in particular the total number of killings and the responsible actors. Before the TRC, around 9000 killings had been documented by human rights non-governmental organizations (NGOs) and the Ombudsman’s Office, and the State was responsible for more than 90% of these recorded deaths. The TRC changed this picture dramatically: it added around 15,000 documented killings, with the Shining Path as the main perpetrator of about 57% of these. The final TRC dataset contained about 25,000 killings (documented deaths and disappearances), classified according to three distinct sources of information (the TRC itself, the Ombudsman’s Office, and human rights NGOs), and three perpetrators: the State, the Shining Path, and “Other” perpetrators. The quantitative work for the TRC was undertaken by Ball et al. (2003), extending previous CRM applications to human rights in Guatemala (Ball, 2000) and Kosovo (Ball et al., 2002).

From the 25,000 documented killings, with 47% attributed to the State and 37% to the Shining Path, the TRC estimated approximately 69,000 total killings, with 30% carried out by the State and 46% by the Shining Path, that is, the opposite pattern of responsibilities to the documented raw data.

The TRC results were surprising and motivated considerable debate in Peru, because of the large total estimated number of killings, “greater than the number of human losses suffered by Peru in all of the foreign and civil wars that have occurred in its 182 years of independence” (CVR, 2003). The claim that the Shining Path was the main perpetrator of killings was also a surprising finding, and did not conform to common perceptions in Peru about the armed conflict.

These results also influenced views about how CRMs could be applied to human rights violations and violence. The reversal in the assigned proportions of responsibilities compared with the observed data was seen to bolster the claim that CRMs provided a more unbiased account of the true events. It was argued that analyses of the observed data were inaccurate and naïve accounts of an armed conflict (Price and Ball, 2015a), “misleading and biased”,¹ while CRMs allegedly corrected for census undercounts and could uncover true perpetrator and time patterns (Ball, 2012; Ball et al., 2003; Landman and Gohdes, 2013; Manrique-Vallier et al., 2013; Price and Ball, 2015a, 2015b).²

While it is clearly warranted to caution against naïve views about observed non-random samples, it is also naïve to think that CRMs applied to biased samples will deliver unbiased estimates: “the [CRM] estimator can be biased, since we are assuming that the model which describes the observed data also describes the count of the unobserved individuals. We have no way to check this assumption” (Bishop et al., 1975: 254).

In practice, CRMs face several challenges, as exemplified by the indirect nonstandard application used by the TRC in Peru. Left with three samples that implied different pictures of the war and extremely sparse data for the Shining Path, Ball et al. (2003) asserted that a standard application of CRMs to this party was not possible. They sought to get around this by adding the sparse Shining Path data to the dense data for the State to produce a combined estimate, and then computing the estimate for the Shining Path as the estimate for the State plus the Shining Path minus the estimate for the State. Essentially, the TRC authors realized that they could not obtain direct estimates everywhere for the Shining Path, and thus opted to avoid them everywhere. I will show, however, that in many geographical areas or strata, the data are sufficiently dense to apply standard CRMs. I computed direct estimates wherever possible and compared these with the TRC’s indirect estimates. A key finding is that the indirect method has a strong upward bias when estimating the Shining Path deaths. The direct estimates do not display the perpetrator reversal in the TRC estimates, and accentuate the primary role of the State. In other words, by applying CRMs in the normal manner to the strata for which it is possible to do so, we find that the striking TRC result appears to be an artifact of the unconventional approach devised to deal with the sparse data problem. This is an important result, given the extensive campaign to promote the TRC’s striking but incorrect findings, and how other data on documented deaths or events have been downplayed as a result.

The aim of this manuscript is not to challenge the validity of the CRM as a methodology for quantitative conflict analysis, but to provide new and improved CRM-based estimates for the case of Peru. However, these estimates also undermine claims made about the utility of CRMs, namely that they provide an inherently less biased account of events compared with observed raw data. I show that when the TRC analysis is examined more closely, this claim is not supported, thus undermining the arguments often offered by capture–recapture advocates for the importance of the technique to quantitative conflict analyses.

This has more general implications beyond this particular study as CRMs are increasingly being used in several areas of the social sciences.³ These methods work relatively well when data are sufficiently dense, but the proposed indirect application is not an adequate solution when data are sparse and can introduce distortions that are more severe than the alleged biases in the observed data. Researchers should be very cautious when the estimates suggest the opposite pattern to the observed data.

The raw TRC killings data

The TRC collected a sample on killings committed in Peru during the war from 1980 until 2000. Mobile teams were sent to pre-assigned areas with target numbers from each area. The data include retrospective information provided by victims’ relatives or friends, which were further reviewed, verified and qualified by TRC officials. What became evident in many interviews was that respondents did not know who the perpetrator was and thus did not report it. In some cases, TRC officials determined a perpetrator based on the “regional historical context” or known patterns of human rights violations. In addition to the data collected by the TRC itself, the TRC used data from two other sources: the Ombudsman’s Office, and human rights NGOs.

The TRC considered three perpetrators: the State, the Shining Path, and Other. The State includes the police and army units, and self-defense groups, which are State-sponsored citizens’ armed counter-insurgent groups, and paramilitary groups, that is, State-sponsored or protected armed units that do not legally belong to any official armed branch. Besides the Shining Path, other less powerful insurgent groups such as the Túpac Amaru Revolutionary Movement were treated as a third group called Other perpetrators. The TRC included in this group all cases where the perpetrator was unknown or undetermined.

Table 1 summarizes the available information by perpetrator and intersected sources. There is a notable difference between the TRC data and the other two sources. In the latter, the State is overwhelmingly the main perpetrator, while in the TRC data the proportions are more comparable and the Shining Path is the main perpetrator. In terms of source overlap, only observations for the State are fairly evenly distributed across all three sources. Observations recorded exclusively by the TRC make up only 33.6% of the total for the State (3888 out of 11,564), but 94.9% of the total for the Shining Path (8768 out of 9243), leaving limited intersections with the other sources. For Other perpetrators, 3138 out of 3885 (80.8%) observations come from the TRC alone.⁴ These samples suggest two very different pictures of the war in Peru, and call for an examination of the statistical procedures.

Table 1.

Killings by source overlap and responsible party.

Source	State	Shining Path	Other
All
C	6253	8955	3189
D	4517	119	142
N	5483	381	688
Overlaps
C	3888	8768	3138
D	1639	35	49
N	2532	251	571
CD	554	59	10
CN	627	105	34
DN	1140	2	76
CDN	1184	23	7
Total	11,564	9243	3885

Note: C: TRC; D: Ombudsman’s Office; N: human rights NGOs.
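
As a rough illustration of how the overlap cells in Table 1 determine the per-source totals and the exclusive shares quoted in the text, the following Python sketch recomputes them for the State column. The dictionary layout and the function names are illustrative assumptions, not part of the TRC's or this paper's tooling.

```python
# Overlap cells for the State column of Table 1.
# Keys name the sources that recorded a killing:
# C = TRC, D = Ombudsman's Office, N = human rights NGOs.
STATE_CELLS = {
    "C": 3888, "D": 1639, "N": 2532,
    "CD": 554, "CN": 627, "DN": 1140,
    "CDN": 1184,
}

def source_total(cells, source):
    """Sum every overlap cell whose key includes the given source letter."""
    return sum(count for key, count in cells.items() if source in key)

def exclusive_share(cells, source):
    """Percentage of all documented killings recorded by this source alone."""
    return 100.0 * cells[source] / sum(cells.values())

trc_total = source_total(STATE_CELLS, "C")
trc_only_pct = exclusive_share(STATE_CELLS, "C")
print(trc_total, round(trc_only_pct, 1))
```

The same two helpers applied to the Shining Path and Other columns would reproduce the remaining figures in the "All" rows of the table.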

The Truth Commission’s estimation

The TRC applied CRMs, creating “models of interdependence among the three lists by using constrained, hierarchical log-linear models” (Ball et al., 2003: 22) (Agresti, 2002; Bishop et al., 1975; Fienberg, 1972; Sanathanan, 1972; Zwane and van der Heijden, 2005):

$$\log (m_{i j k}) = u + u_{1 (i)} + u_{2 (j)} + u_{3 (k)} + u_{12 (i j)} + u_{13 (i k)} + u_{23 (j k)},$$

where $m_{i j k}$ is the expected number of observations with inclusion pattern $(i, j, k)$ across the three sources, with $i, j, k \in \{0, 1\}$, where 1 denotes that the source contains those observations and 0 that it does not. From the seven possible models of interdependence between sources (one of independence, three with one interaction, and three with two interactions), the TRC kept only those for which the $χ^{2}$ statistic had a p-value between 0.01 and 0.50. Of those models the TRC “chose the model that minimized the $χ^{2}$ statistic divided by its degrees of freedom” (Ball et al., 2003: 26).
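
To make the role of the unobserved cell concrete: for the model written above, which contains all three pairwise interactions and therefore nests the candidate models, the fit to the seven observed cells is exact and the cell missed by every source has the standard closed form $\hat{m}_{000} = m_{111} m_{100} m_{010} m_{001} / (m_{110} m_{101} m_{011})$. The Python sketch below evaluates it on made-up counts. The counts and the function name are hypothetical, and the restricted models that the TRC selected among generally require an iterative fit instead.

```python
def unobserved_cell(cells):
    """Closed-form estimate of the count missed by all three sources under the
    log-linear model with every pairwise interaction and no three-way term.

    `cells` maps inclusion patterns (i, j, k) to observed counts.
    """
    numerator = (cells[(1, 1, 1)] * cells[(1, 0, 0)]
                 * cells[(0, 1, 0)] * cells[(0, 0, 1)])
    denominator = cells[(1, 1, 0)] * cells[(1, 0, 1)] * cells[(0, 1, 1)]
    if denominator == 0:
        raise ValueError("an empty pairwise-overlap cell makes the estimate undefined")
    return numerator / denominator

# Hypothetical counts for one stratum (not taken from the TRC data).
example = {
    (1, 0, 0): 120, (0, 1, 0): 60, (0, 0, 1): 80,
    (1, 1, 0): 30, (1, 0, 1): 40, (0, 1, 1): 20,
    (1, 1, 1): 10,
}
missed = unobserved_cell(example)
total = sum(example.values()) + missed
print(missed, total)
```

An empty overlap cell leaves this particular estimate undefined, which is one concrete way in which sparse data can block a direct estimation.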

The TRC argued that “two levels of stratification,” by geographic strata and by the three perpetrators, was “not possible because of the sparseness of the data for reported deaths attributed to the PCP-Shining Path and other perpetrators” (Ball et al., 2003: 24). The TRC used a nonstandard approach, adding the number of killings by the Shining Path to the killings by the State “[i]n order to reduce the sparseness and to take advantage of the dense information about deaths attributed to the State” (Ball et al., 2003: 24–25). The TRC exploited the fact that the existence of CRM estimates for the dense data for the State implies the existence of estimates for sums of the State with any other perpetrator group, as these sums also must be dense. Under this approach, the TRC created the following four combinations: the State, the State plus the Shining Path, the State plus Other, and All documented killings. The TRC then calculated the killings by the Shining Path by subtracting the estimate for the State from the estimate for the State plus the Shining Path. Similarly, the estimate for Other was derived by subtracting the estimate for the State from the estimate for the State plus Other.
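
The subtraction step described above can be sketched in a few lines of Python. All numbers are hypothetical stratum-level CRM estimates chosen for illustration, and the function name is an assumption of this sketch.

```python
def indirect_estimate(combined_estimate, state_estimate):
    """Indirect estimate for a sparse group: the CRM estimate for
    'State plus group' minus the CRM estimate for the State alone."""
    return combined_estimate - state_estimate

# Hypothetical stratum-level CRM estimates (illustrative only).
state = 900.0
state_plus_shining_path = 1500.0
state_plus_other = 850.0  # fitted separately, so it can fall below `state`

shining_path = indirect_estimate(state_plus_shining_path, state)
other = indirect_estimate(state_plus_other, state)
print(shining_path, other)
```

Because the two terms come from separately fitted models, nothing forces the difference to be positive, and the estimation error of both terms carries over into the derived figure.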

The TRC partitioned the data into 58 geographic strata based on the political subdivisions of Peru, grouping larger units where the armed conflict was less intense and smaller units where it was more intense (Ball et al., 2003: 47). As this indirect method may produce negative estimates for the Shining Path and Other, the TRC excluded models with such results, and used the next lowest $χ^{2}$ over degrees of freedom of the model that generated a positive number.⁵ In disaggregating the data into smaller units, the TRC discarded data with missing province or district identifiers. This reduced the final sample to 21,950 killings. See Online Appendix I1 for the definitions and the number of observations in these strata.
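
The selection rule described in this section (a p-value window, a positive derived estimate, and the smallest $χ^{2}$ per degree of freedom) might be sketched as follows. The `Candidate` class, the model labels, and all statistics are hypothetical, and treating the p-value bounds as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    name: str        # label of the log-linear model (hypothetical)
    chi2: float      # goodness-of-fit chi-square statistic
    df: int          # degrees of freedom
    p_value: float   # p-value of the chi-square statistic
    derived: float   # estimate obtained by subtraction for the sparse group

def select_model(candidates) -> Optional[Candidate]:
    """Keep models with 0.01 <= p <= 0.50 and a positive derived estimate,
    then return the one with the smallest chi2 / df (None if none qualifies)."""
    eligible = [c for c in candidates
                if 0.01 <= c.p_value <= 0.50 and c.df > 0 and c.derived > 0]
    return min(eligible, key=lambda c: c.chi2 / c.df, default=None)

models = [
    Candidate("independence", chi2=9.0, df=3, p_value=0.03, derived=410.0),
    Candidate("C*D", chi2=2.0, df=2, p_value=0.37, derived=-35.0),
    Candidate("C*N", chi2=3.2, df=2, p_value=0.20, derived=280.0),
    Candidate("C*D + D*N", chi2=0.1, df=1, p_value=0.75, derived=300.0),
]
chosen = select_model(models)
print(chosen.name if chosen else None)
```

In this made-up example the best-fitting eligible model by $χ^{2}$ per degree of freedom is skipped because its derived estimate is negative, so the choice falls on the next candidate.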

Table 2 presents the selection that the TRC performed to build the estimation sample. The TRC excluded 2742 killings because of missing information on where the killing occurred. At the same time, the TRC included in the Other group 2944 killings for which the perpetrator is unidentified, which are also missing data. Unfortunately, the TRC did not provide any discussion of sample selection, patterns of missingness in the data, and their possible impact on the estimates of killings by perpetrator. This selection mostly excluded killings by the State (1998 killings), thereby changing the observed shares of responsibility. In the original sample, 46.8% of the killings were caused by the State and 37.4% by the Shining Path, but these percentages changed through selection to 43.6% for the State and 41.3% for the Shining Path.

Table 2.

Sample selection by the TRC.

Responsible party	TRC	Excluded	Total
1. State	9566	1998	11,564
2. Shining Path	9075	168	9243
3. Other	3309	576	3885
Identified	365	121	486
Unidentified	2944	455	3399
Total	21,950	2742	24,692
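
The effect of the selection on the observed shares can be checked directly from the counts in Table 2. The short Python sketch below does so; the variable and function names are illustrative.

```python
# Documented killings by responsible party (Table 2).
full_sample = {"State": 11564, "Shining Path": 9243, "Other": 3885}
trc_sample = {"State": 9566, "Shining Path": 9075, "Other": 3309}

def shares(counts):
    """Percentage share of each party, rounded to one decimal."""
    total = sum(counts.values())
    return {party: round(100.0 * n / total, 1) for party, n in counts.items()}

before = shares(full_sample)
after = shares(trc_sample)
excluded = {party: full_sample[party] - trc_sample[party] for party in full_sample}
print(before, after, excluded)
```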

After estimating total killings for the final 58 strata and three perpetrators, as well as their differences, the TRC aggregated the results into seven regions: 1. Ayacucho, 2. Centro, 3. Nor-Oriente, 4. Sur Andino, 5. Huancavelica, 6. Lima-Callao, and 7. Otras; and into total values for the whole country.

Table 3 shows the TRC results by perpetrator and the total number of killings, 69,280.⁶ The Shining Path was responsible for 31,331 killings, while the State was responsible for 20,458 killings. The fifth column of this table shows the difference between the Shining Path and the State. This last statistic led the TRC to assert that the Shining Path “is responsible for a significantly greater number of deaths than agents of the State,” and to “reject the hypothesis that the number of deaths caused by one group is the same as the number caused by either one of the other two groups” (Ball et al., 2003: 7).

Table 3.

TRC’s estimates of total number of killings by responsible party and differences between killings attributed to the Shining Path and the State. Standard errors are in parentheses.

Total killings				Difference
State	Shining Path	Other	All	Shining Path–State
20,458	31,331	15,967	69,280	10,872
(1718)	(3255)	(2055)	(4136)	(2877)

Direct estimation for a subset of strata

The TRC authors chose to use indirect estimation because they could not perform standard estimations for the Shining Path in all strata. However, there are nine Shining Path strata for which we can use standard estimation for at least one log-linear model of the three sources. It is instructive to compare these standard estimates to the TRC results. These nine strata contain 3996 observations, which is 44% of the total for this party. Table 4 compares the standard estimations in this subset of strata to the TRC’s estimations.

Table 4.

Estimates for the Shining Path killings by method for nine strata that admit standard estimations.

Region Stratum		Obs.	Direct est.		TRC
			Est.	S. E.	Est.	S. E.
1. AYA	32	163	751	468	477	95
	36	426	637	183	727	244
	25	422	958	93	896	62
	35	246	294	53	623	138
2. CEN	51	321	427	123	1106	199
3. NOR	14	1270	1628	204	4214	341
4. SUR	11	570	692	96	2380	350
	47	258	475	86	547	65
5. HUA	48	320	426	123	703	136
Total		3996	6287	594	11,673	626

AYA: Ayacucho, CEN: Centro, NOR: Nor-Oriente, SUR: Sur Andino, HUA: Huancavelica, Obs.: Observations, Direct est.: Direct estimation, Est.: Estimate, S.E.: Standard Error.

A standard estimation results in 6288 total killings for these strata, while the TRC’s estimate is much larger: 11,673 killings.⁷

Table 5 reports the difference between the killings of the Shining Path and the State in these nine strata by method. The direct estimations stand in stark contrast to the result from the TRC’s indirect application of CRMs. Whereas the TRC’s result led to a reversal in the largest perpetrator compared with the observed data, the standard estimation results indicate that the State is responsible for a significantly larger number of killings than the Shining Path. A simulation study – detailed in Online Appendix C – comparing direct and indirect estimations indicates that the indirect estimates have larger and often substantially upward biases and larger standard errors than the direct estimation. Thus, the TRC approach is very likely to contain a large upward bias in these nine strata, and indeed all strata of the sample.

Table 5.

Differences between killings of the Shining Path and the State by method for nine strata that admit direct estimations.

Method	Est.	S.E.	z	p	95% C.I.
Direct est.	−2666	706	−3.78	0.00	−4078	−1254
TRC est.	2722	661	4.12	0.00	1401	4043

Est.: Estimate, S.E.: Standard Error, z: z-statistic, p: p-value, C.I.: Confidence Interval.

In sum, analyses of both the actual data used by the TRC and simulations indicate that the indirect estimation does not eliminate but rather introduces biases for the sample subset. In the next sections I turn to estimates for the entire country beyond the strata that allow a conventional direct estimation as well as addressing how to deal with missing data problems in the unknown perpetrators category.

Direct estimation by geographic interpolation for the whole country

Having established that there exist strata that admit a direct estimation, I used them to estimate total killings in all strata by a geographic interpolation method known as kriging. I show below that the kriging results can be validated by alternative estimation methods that maintain a strict stratification by perpetrator or do not require mixing data of different perpetrators in any way, namely (a) a fixed effects estimation consisting of constructing groups of strata where direct estimation is possible, and (b) an alternative stratification in fewer dense strata where a direct estimation can be done.

Similar to the case of the Shining Path, there are 23 strata for Other perpetrators for which we can conduct a direct estimation, and these comprise 61% of the total number of observations. Geographic interpolation is obtained by first estimating the ratio of the estimated total population over the observed population for a subset of the strata that admits a direct estimation. Then, this ratio is calculated for each stratum as a weighted combination of the ratios in the strata of the first subset. Finally, this interpolated ratio is multiplied by the number of counts in each stratum. Online Appendix D details the kriging procedure.
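
The three steps just described (ratio in the directly estimable strata, weighted combination, scaling of the observed count) can be illustrated with a deliberately simplified Python sketch. It uses normalized inverse-distance weights as a stand-in for kriging weights, which in the actual procedure come from the model detailed in Online Appendix D. The coordinates, counts, and function name are all hypothetical.

```python
import math

def interpolate_ratio(target_xy, known):
    """Weighted combination of known ratios, with normalized inverse-distance
    weights standing in for kriging weights (an assumption of this sketch).

    `known` is a list of ((x, y), ratio) pairs for strata with a direct estimate.
    """
    weights = []
    for xy, ratio in known:
        distance = math.dist(target_xy, xy)
        if distance == 0.0:
            return ratio  # target coincides with a directly estimated stratum
        weights.append(1.0 / distance)
    total_weight = sum(weights)
    return sum(w / total_weight * ratio for w, (_, ratio) in zip(weights, known))

# Step 1: ratio of estimated to observed killings where direct estimation works.
# Each entry is (centroid, estimated, observed); all values are hypothetical.
direct = [((0.0, 0.0), 300, 200), ((4.0, 0.0), 450, 300), ((0.0, 3.0), 260, 130)]
known = [(xy, estimated / observed) for xy, estimated, observed in direct]

# Steps 2 and 3: interpolate the ratio for a sparse stratum, then scale its count.
sparse_xy, sparse_observed = (2.0, 0.0), 40
ratio = interpolate_ratio(sparse_xy, known)
estimate = ratio * sparse_observed
print(round(ratio, 3), round(estimate, 1))
```

Because the weights are non-negative and sum to one, the interpolated ratio in this simplified version always lies between the smallest and largest ratio of the directly estimated strata. Kriging weights need not be non-negative, so that bound is a property of the stand-in rather than of the paper's procedure.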

Table 6 shows the estimated number of killings by perpetrator aggregated by region. The State is responsible for an estimated 20,516 killings, the Shining Path for 15,089 killings and Other for 12,444 killings, in percentages of 42.7%, 31.4%, and 25.9%, respectively. The Total column presents the sum by row, 48,049 killings; a number substantially lower than estimated by the TRC. Note also that the ratio of killings by the State and by the Shining Path is 20,516/15,089 = 1.36, which is very close to the corresponding ratio in the observed unselected data, 11,564/9243 = 1.25.

Table 6.

Direct estimation. Estimates for killings by all parties by region. Standard errors are in parentheses.

Region	State	Shining Path	Other	Total
1. AYA	10,005	8459	3172	21,636
	(604)	(645)	(430)	(983)
2. CEN	1555	2008	2079	5642
	(297)	(603)	(350)	(758)
3. NOR	2954	1656	2935	7545
	(215)	(205)	(1358)	(1390)
4. SUR	1682	1167	1340	4189
	(184)	(129)	(471)	(521)
5. HUA	905	1118	832	2855
	(86)	(325)	(363)	(495)
6. LIM	1756	66	224	2046
	(1544)	(44)	(100)	(1548)
7. OTR	1659	615	1862	4136
	(195)	(202)	(576)	(641)
Total	20,516	15,089	12,444	48,049
	(1721)	(994)	(1687)	(2607)

AYA: Ayacucho, CEN: Centro, NOR: Nor-Oriente, SUR: Sur Andino, HUA: Huancavelica, LIM: Lima-Callao, and OTR: Otras.

Table 7 reports the differences between the estimated number of killings by the Shining Path and the State. We can reject the hypothesis that the total difference is greater than or equal to zero at a high level of confidence. Thereby this test rejects or, more precisely, determines as very unlikely, that the number of killings by the Shining Path is equal to or larger than the number of killings by the State.

Table 7.

Direct estimation. Differences between the killings of the Shining Path and the State by region.

Region	Est.	S.E.	z	p	95% C.I.
1. AYA	−1546	884	−1.75	0.04	−3313	222
2. CEN	453	672	0.67	0.25	−891	1798
3. NOR	−1298	297	−4.38	0.00	−1890	−704
4. SUR	−515	224	−2.30	0.01	−963	−65
5. HUA	213	336	0.63	0.26	−458	885
6. LIM	−1690	1545	−1.09	0.14	−4779	1400
7. OTR	−1044	281	−3.72	0.00	−1605	−481
Total	−5427	1987	−2.73	0.00	−9401	−1451

AYA: Ayacucho, CEN: Centro, NOR: Nor-Oriente, SUR: Sur Andino, HUA: Huancavelica, LIM: Lima-Callao, and OTR: Otras, Est.: Estimate, S.E.: Standard Error, z: z-statistic, p: p-value, C.I.: Confidence Interval.
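
As a hedged illustration of how a row of Table 7 relates to Table 6, the Python sketch below computes a difference, its standard error, the z-statistic, and a one-sided p-value for Ayacucho. Treating the two estimates as independent, reporting the smaller normal tail as the p-value, and using an interval of plus or minus two standard errors are assumptions inferred from the published tables, not procedures stated in the text. The function name is illustrative.

```python
import math

def difference_test(est_a, se_a, est_b, se_b):
    """Difference a - b with a standard error that treats the two estimates as
    independent, its z-statistic, the normal tail probability beyond |z|, and
    an interval of plus or minus two standard errors."""
    diff = est_a - est_b
    se = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se
    p = 0.5 * math.erfc(abs(z) / math.sqrt(2.0))
    return diff, se, z, p, (diff - 2.0 * se, diff + 2.0 * se)

# Ayacucho, Table 6: Shining Path 8459 (645), State 10,005 (604).
diff, se, z, p, interval = difference_test(8459, 645, 10005, 604)
print(diff, round(se), round(z, 2), round(p, 2), [round(x) for x in interval])
```

Small discrepancies in the last digit relative to the published table are to be expected, since the inputs here are already rounded.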

I assessed the accuracy of the kriging geographic interpolation procedure in three ways. First, I performed a leave-one-out cross validation (see Online Appendix E), which suggested that kriging estimates are generally larger than direct estimates, yet smaller than the indirect TRC estimates. This reinforces the finding that the Shining Path carried out fewer killings than the State. Second, I compared kriging estimates to fixed effects estimates, based on groups of strata by geographical vicinity. Online Appendix F explains the fixed effects estimation and its results: 46,000 total killings, with the State as the main party responsible. Finally, I stratified the data into fewer strata so that a direct estimation was possible for all three perpetrators (see Online Appendix G). Once again, this stratification resulted in a total of 46,000 killings, with the State as the main perpetrator.

In sum, estimates for the whole country can be produced from the strata that allow a conventional direct estimation. A direct estimation that strictly stratifies by perpetrator results in 48,000 killings of which the State is the main party responsible.

Multiple imputation for unknown perpetrators

In this section I address the problem of missing perpetrator data using multiple imputation as defined in Rubin (1987). This consists of constructing simulated datasets and then computing averages over their corresponding estimates. I first estimated a multinomial logit for the probability of reporting a specific perpetrator depending on observed variables such as the victim’s gender and age, whether the victim is disappeared or dead, the year of the killing, and the source to which the event was reported. With these parameter estimates I simulated 20 samples in which the missing perpetrator data in Other are imputed to the three perpetrator groups. Online Appendix H describes in detail the construction of the samples by multiple imputation.
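
A minimal Python sketch of the two mechanical pieces of this procedure follows: drawing a perpetrator for each record from predicted probabilities, and pooling results across imputed datasets. The predicted probabilities are hypothetical stand-ins for the output of the multinomial logit. The pooled variance uses Rubin's standard combining rule; whether the paper pools variances in exactly this way is not stated in this section, so that part is an assumption.

```python
import random
from statistics import mean, variance

PARTIES = ["State", "Shining Path", "Other"]

def impute_once(probabilities, rng):
    """Draw one perpetrator per record from that record's predicted probabilities."""
    return [rng.choices(PARTIES, weights=p, k=1)[0] for p in probabilities]

def pool(estimates, variances):
    """Rubin's rules: pooled estimate and total variance across M imputations."""
    m = len(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor M - 1
    return mean(estimates), within + (1.0 + 1.0 / m) * between

# Hypothetical predicted probabilities for four records with missing perpetrator.
predicted = [(0.6, 0.3, 0.1), (0.2, 0.7, 0.1), (0.5, 0.4, 0.1), (0.8, 0.1, 0.1)]
rng = random.Random(0)
imputed_sets = [impute_once(predicted, rng) for _ in range(20)]
state_counts = [labels.count("State") for labels in imputed_sets]

# Pooling three hypothetical estimates, each with variance 1.
pooled_estimate, pooled_variance = pool([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
print(mean(state_counts), pooled_estimate, pooled_variance)
```

In the full procedure each imputed dataset would be passed through the stratified estimation before pooling; the sketch only counts imputed labels to keep the example short.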

Table 8 shows the perpetrator composition in the sample used for the previous estimation and the averages of the 20 imputed samples. This is essentially a reassignment of perpetrator data: 2863 observations of Other are attributed to the State and the Shining Path, 1573 and 1290 observations, respectively. In these imputed samples, on average the State is responsible for 51% of the killings and the Shining Path for 47%, while identified Other are responsible for only 2% of the killings.

Table 8.

Total counts and percentage by responsible party for the original sample and samples of imputed missing data.

	State	Shining Path	Other	Total
Missing data in Other
Counts	9566	9075	3309	21,950
Percentage	43.6	41.3	15.1	100
Multiple imputation
Counts (average)	11,139	10,365	446	21,950
Percentage	50.8	47.2	2.0	100

For each of the imputed samples, I carried out a kriging interpolation. For the Shining Path there are now 11 strata that allow for direct estimation, which contain 49% of the total number of observations, while for ‘Other’ (identified), there are on average 9 strata where direct estimation is possible, which contain 37% of the total number of observations.

Table 9 presents the results of the multiple imputation estimations by perpetrator aggregated by region. The estimated total number of killings, 47,849, changes only slightly from the total number estimated in the previous section, 48,049. The number of killings by the State increases from 20,516 to 27,872, while the number of killings by the Shining Path increases from 15,089 to 18,341. Most killings that were previously contained in the group of Other, which were actually missing data, are now attributed to these two perpetrators. The composition of estimated killings is 58.3% for the State, 38.3% for the Shining Path, and 3.4% for identified Other.

Table 9.

Multiple imputation estimation for killings of all parties by region. Standard errors are in parentheses.

Region	State	Shining Path	Other	Total
1. AYA	12,647	8944	120	21,711
	(1663)	(748)	(26)	(1823)
2. CEN	3424	3084	448	6956
	(821)	(746)	(247)	(1137)
3. NOR	4313	2551	728	7592
	(490)	(725)	(641)	(1085)
4. SUR	2334	1377	31	3742
	(262)	(151)	(46)	(306)
5. HUA	1184	1279	20	2483
	(124)	(291)	(16)	(317)
6. LIM	1315	156	45	1516
	(938)	(106)	(24)	(944)
7. OTR	2655	950	244	3849
	(359)	(261)	(134)	(464)
Total	27,872	18,341	1636	47,849
	(2185)	(1352)	(703)	(2664)

AYA: Ayacucho, CEN: Centro, NOR: Nor-Oriente, SUR: Sur Andino, HUA: Huancavelica, LIM: Lima-Callao, and OTR: Otras.

Table 10 shows the difference between the killings of the Shining Path and the State for the multiple imputation estimations by region and total. The total negative difference widens from −5427 in Table 7 to −9531, and we can again reject the hypothesis that the difference between the total number of killings of the Shining Path and the State is greater than or equal to zero; that is, the State seems to be responsible for a greater number of killings than the Shining Path.

Table 10.

Multiple imputation estimation. Differences between killings of the Shining Path and the State by region.

Region	Est.	S.E.	z	p	95% C.I.
1. AYA	−3703	1823	−2.03	0.02	−7348	−56
2. CEN	−340	1110	−0.31	0.38	−2559	1880
3. NOR	−1762	875	−2.01	0.02	−3511	−11
4. SUR	−957	302	−3.17	0.00	−1560	−352
5. HUA	95	316	0.30	0.38	−536	727
6. LIM	−1159	944	−1.23	0.11	−3046	729
7. OTR	−1705	444	−3.84	0.00	−2592	−816
Total	−9531	2569	−3.71	0.00	−14,668	−4392

In conclusion, accounting for missing perpetrator data does not change the estimated number of total killings, but it does change the composition of perpetrators. The killings with missing perpetrator that were attributed to Other are attributed in a larger proportion to the State than to the Shining Path, adding to the previous findings where the former is responsible for more killings than the latter.

Conclusions

In this analysis I applied the same CRMs as the TRC of Peru. A key challenge for estimation was that the raw data here were very sparse for perpetrators other than the State. The TRC applied a nonstandard method that adds and then subtracts data of different perpetrators and does not account for missing perpetrator data. I have shown that the TRC’s indirect method had a sizable upward bias compared with a direct estimator. For the Peruvian data, the application of the indirect method reversed the pattern of responsibilities in the observed samples. However, this is an artifact of the indirect method rather than a result of removing bias.

As an alternative, I used a subset of strata where the standard estimation approach was possible, and used this to estimate the whole dataset by means of geographic interpolation. The picture that emerges for the armed conflict in Peru is summarized in Table 11. A direct estimation, strictly stratified both by geography and perpetrator, extended by a multiple imputation of missing perpetrator data, sets the estimated total number of killings at around 48,000, of which about 28,000 were committed by the State, 18,300 by the Shining Path, and 1600 by Other perpetrators. The proportion of responsibilities is thus 58.3% for the State, 38.3% for the Shining Path, and 3.4% for Other perpetrators. I find that it is very unlikely from a statistical point of view that the Shining Path has an equal or larger responsibility for killing than the State; the State seems responsible for a greater number of killings than the Shining Path.

Table 11. Peru 1980–2000: Estimates and confidence interval limits of the total number of killings by responsible party (confidence level: 95%).

Estimates	State	Shining Path	Other	Total
Lower limit	23,503	15,637	231	42,522
Estimate	27,872	18,341	1636	47,849
Upper limit	32,241	21,045	3041	53,176
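The shares of responsibility can be recomputed from the point estimates in Table 11, and the intervals allow a rough check of the gap between the State and the Shining Path. The sketch below rests on two assumptions that are not stated in the text: that the 95% limits are symmetric normal intervals (estimate ± 1.96 standard errors), and that the State and Shining Path estimates are independent. It is an approximation, not the test used in the paper.

```python
from math import erf, sqrt

# (lower limit, estimate, upper limit) from Table 11.
table11 = {
    "State": (23503, 27872, 32241),
    "Shining Path": (15637, 18341, 21045),
    "Other": (231, 1636, 3041),
}

# Shares of responsibility, in percent, from the point estimates.
total = sum(est for _, est, _ in table11.values())
shares = {name: 100 * est / total for name, (_, est, _) in table11.items()}


def std_error(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error implied by a symmetric normal 95% interval (assumption)."""
    return (upper - lower) / (2 * z)


se_state = std_error(table11["State"][0], table11["State"][2])
se_path = std_error(table11["Shining Path"][0], table11["Shining Path"][2])

# z-score of the State minus Shining Path difference, assuming independence.
difference = table11["State"][1] - table11["Shining Path"][1]
z_score = difference / sqrt(se_state**2 + se_path**2)

# One-sided normal tail probability of a difference of zero or less.
p_one_sided = 0.5 * (1 - erf(z_score / sqrt(2)))

for name, share in shares.items():
    print(f"{name}: {share:.2f}%")
print(f"z = {z_score:.2f}, one-sided p = {p_one_sided:.5f}")
```

The recomputed shares agree with the quoted percentages up to rounding of the table entries. Under the stated assumptions the difference of 9531 killings corresponds to a z-score of about 3.6.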

These findings call for caution in the use of CRMs for quantitative analysis of violence, especially when the estimates seem to differ substantially from the patterns of the observed data. Rather than correcting for alleged biases in the observed data, the TRC’s indirect application of CRMs introduced a distortion and a misunderstanding about the proportion of responsibilities in Peru. Direct application of CRMs produces estimates that are closer to the patterns shown in the observed data.

Supplemental Material

Supplemental material for Capturing correctly: A reanalysis of the indirect capture–recapture methods in the Peruvian Truth and Reconciliation Commission by Silvio Rendon in Research & Politics

Footnotes

Acknowledgements

I would like to thank Kristian Skrede Gleditsch for great editing and thoughtful comments throughout the refereeing process for both this and the original paper. I thank two anonymous referees, and R. Alvarado, D. Cedano, D. Manrique-Vallier, K. Pollock, N. Quella, M. Spagat, and former TRC officials E. González-Cueva and J. Ciurlizza for providing the data for this research. I also thank participants of the seminars at the Central Bank of Peru and at Instituto de Matemática y Ciencias Afines in Lima, Peru. I started this research while I was an assistant professor at Stony Brook University, and continued the research as a personal project after I left that role. The views expressed in this paper are those of the author and do not represent the views of any organization that the author may currently be affiliated with. All remaining errors and omissions are solely mine.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Silvio Rendon

Supplemental materials

The supplemental files are available at http://journals.sagepub.com/doi/suppl/10.1177/2053168018820375. The replication files are available at

Notes

Carnegie Corporation of New York Grant

This publication was made possible (in part) by a grant from the Carnegie Corporation of New York. The statements made and views expressed are solely the responsibility of the author.
