Abstract
This paper accomplishes two related goals. Firstly, it presents version 1.1 of the Interstate War Data (IWD) data set. IWD covers all interstate wars, using Correlates of War (COW) rules but improving on COW by repairing its high number of coding errors and by treating multilateral wars more accurately and precisely. IWD 1.1 makes important improvements over IWD 1.0, such as describing interstate wars and war participants previously neglected by COW and IWD 1.0 and extending the temporal range forward to 2013. It is also the first data set to provide a systematic list of all interstate wars back to 1816 with at least 500 battle deaths (COW and IWD 1.0 listed only those conflicts with at least 1000 battle deaths). Secondly, the paper discusses important issues in collecting data on interstate wars, especially collecting reliable data on battle dead. Battle dead data are critical both for constructing data sets and for testing hypotheses, but battle dead estimates frequently vary widely and come from sources of varying quality. The paper discusses the importance of using reliable sources, and in particular of minimizing reliance on military encyclopedias.
Within international relations, explaining the causes, prosecution, outcomes, and effects of interstate wars remains critically important. Doing so requires accurate data. Reiter et al. (2016) described new data on interstate wars, the Interstate War Data (IWD) data set, version 1.0. Building on the Correlates of War (COW) data set, the most widely used data set on interstate war, IWD reduced coding errors in COW and provided more precise information on who fought whom within multilateral wars, while employing COW coding rules. This note introduces IWD version 1.1, describing its improvements over past data sets. The improvements not only demonstrate the importance of collecting accurate interstate war data, but also highlight larger issues in interstate war data collection.
IWD 1.0 identified significant coding errors in more than one third of the interstate wars in COW 4.0. Such a large number of errors could have a substantial effect on empirical findings reported in scholarship, and IWD is already being used to improve international relations scholarship in several areas. Scholars have used IWD 1.0 to show that democracies win the wars they initiate (Ausderan, forthcoming), democracies win their conflicts because they fight in larger coalitions (Graham et al., forthcoming), the frequency of interstate war may not be in decline (Braumoeller, forthcoming), and the rate of compliance with international alliances is somewhat lower than what had been previously reported, about 63% instead of about 75% (Berkemeier and Fuhrmann, 2016; Fjelstul and Reiter, 2016).
IWD 1.1 improves on IWD 1.0 in four ways (see the online appendix for a specific description of the changes in IWD 1.1). Firstly, IWD 1.1 advances the temporal range for data on interstate wars from 2007 to 2013. Secondly, it corrects some remaining errors in COW 4.0. Specifically, IWD 1.1 includes five new wars with more than 1000 battle deaths that COW 4.0 omitted: the 1890 Central American War between Guatemala and El Salvador; the 1944 Lapland War between Finland and Germany; the 1945 Indochina Japan–France War; the 1950 China–Taiwan War over the islands of Hainan and Wanshan; and the 1984 China–Vietnam War. IWD 1.1 also lists five new participants in previously identified wars: the USA, South Africa, and Canada joining the UK–France War in 1942; South Africa joining the UK–Italy War in 1942; and Soviet participation in the Japan–China War from 1937 to 1941. Thirdly, it corrects some trivial errors, such as a typographical error in COW misstating Portugal’s date of entrance into World War I.
Fourthly, IWD 1.1 is, to our knowledge, the first published data set that lists all interstate wars since 1816 with battle dead of at least 500 (COW and IWD 1.0 listed conflicts with battle dead of at least 1000; the Uppsala Conflict Data Program (UCDP) conflict data begin in 1946). It identifies five such wars: the 1939 Italy–Albania War; the 1961 Tunisia–France War; the 1963–1966 Malaysia War; the 1982 Falklands War; and the 1999 Kargil War. Lowering the casualty threshold for defining interstate wars to 500 battle deaths provides several benefits. It enriches our body of conflict data, following the lead of data projects such as UCDP that collect data on conflicts with casualty thresholds below 1000 battle dead. Lowering the casualty threshold also permits a more theoretically consistent treatment of interstate conflicts with between 900 and 1000 battle dead, such as the Falklands and Kargil Wars. Previously, COW either conceded that the Falklands War did not quite reach the 1000 threshold but included it as an interstate war anyway (COW 3.0), or engaged in eccentric coding efforts to push the fatality count up to 1000 (COW 4.0). By creating a category of conflicts with 500–1000 battle dead, IWD 1.1 permits scholars to classify those conflicts more accurately. Moreover, IWD 1.1 provides some validation for COW’s 1000 battle dead minimum for interstate wars. The relatively small number of wars with between 500 and 1000 battle deaths suggests that, in hindsight, the original COW coding rule requiring at least 1000 battle dead was a reasonable approach.
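The two-tier threshold logic described above can be illustrated with a short sketch. This is a minimal example under stated assumptions: the function name, the tier labels, and the treatment of boundary values (a conflict with exactly 1000 battle dead falls in the upper tier) are hypothetical and are not taken from the IWD codebook. The only figure drawn from the text is the 910 battle dead that COW 3.0 reported for the Falklands War.

```python
# Illustrative only: names and tier labels are hypothetical,
# not taken from the IWD codebook.
LOWER_THRESHOLD = 500    # IWD 1.1 minimum for its lower tier
UPPER_THRESHOLD = 1000   # COW / IWD 1.0 minimum for an interstate war


def classify_by_battle_deaths(battle_deaths: int) -> str:
    """Place a conflict in a tier based on its total battle deaths."""
    if battle_deaths < 0:
        raise ValueError("battle_deaths must be non-negative")
    if battle_deaths >= UPPER_THRESHOLD:
        return "war_1000_plus"
    if battle_deaths >= LOWER_THRESHOLD:
        return "war_500_to_999"
    return "below_threshold"


# 910 is the Falklands War total that COW 3.0 reported, per the text.
falklands_tier = classify_by_battle_deaths(910)
```

Under these assumptions the Falklands War lands in the lower tier rather than being forced above 1000, which is the consistency gain the paragraph describes.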
IWD 1.1 provides support for some existing empirical findings. None of the new conflicts involve wars between mature democracies, bolstering the democratic peace proposition. The data are also consistent with the hypothesis that democratic initiators and joiners are more likely to win their wars. Among wars of more than 1000 battle dead, in two an autocratic initiator lost (1890 Central American; 1944 Lapland War); one autocratic joiner experienced a draw (Soviet entry into the 1937 China–Japan War); and two democratic joiners experienced victory (the USA and Canada joining the 1942 UK–France War).
Notably, many of the conflicts that COW misses and IWD (1.0 and 1.1) includes were fought by autocracies. This is not surprising given the difficulties of gathering information on conflicts fought by autocracies, actors that have the ability and willingness to conceal information about these conflicts. They include Soviet participation in the 1937 China–Japan War, Soviet participation in the Korean War, the 1950 China–Taiwan clashes, the 1984 China–Vietnam War, and others. This suggests there may be bias in missing war data if wars involving autocracies are systematically less likely to be included. Missing data bias is a dynamic that scholars have also described regarding data on militant groups; autocracies are better able than democracies to conceal terrorist attacks occurring in their territory (Drakos and Gofas, 2006).
The development of IWD 1.1 also suggests some broader lessons about collecting data on interstate wars, and battle deaths data in particular (collecting accurate battlefield data is often essential in assessing whether or not a conflict meets the 500 or 1000 battle deaths minimum). Broadly, collecting quality data on battle deaths in interstate wars (to say nothing of other types of conflicts) is extremely difficult. Battle death data are among the least reliable in political science, as compared with data on voting, gross domestic product, international institutions, legislative seats, trade flows, or other phenomena. Most battlefield data come from official sources, and both national governments and military organizations have incentives to exaggerate enemy casualties and undercount friendly casualties. In the best circumstances, professional military historians eventually examine a variety of sources, including after action reports, military cemeteries, interviews with participants, and demographic data, to produce more accurate assessments. Even careful historical work can have limits. The American Civil War is perhaps one of the most carefully documented conflicts since 1815, and in a respected piece of historical scholarship Hacker (2011) argued that the longtime estimate of 618,000 battle dead was 20% too low.
IWD 1.1 also revealed problems in relying exclusively on military encyclopedias as sources for building interstate war data sets, especially encyclopedias that aim at broad coverage (some encyclopedias covering a single conflict, such as Dear and Foot (1995), face fewer of the problems we describe below). These encyclopedias generally list and describe many conflicts, often providing casualty estimates. Encyclopedias are attractive sources for scholars building conflict data sets, because a tremendous amount of information is presented in a compact fashion; data on key variables such as casualties and participants are often provided, and encyclopedias are sometimes the only easily available source. Some data sets, including COW 4.0, rely on military encyclopedias, especially for more obscure conflicts.
However, military encyclopedias often lack the accuracy of careful historical scholarship or primary sources. Encyclopedias are not intended to be rigorous data sources in the social scientific sense. They often do not apply systematic rules for including or excluding the conflicts entered into the encyclopedia. Neither are they systematic in assessing key questions, such as who started or won a war. Encyclopedias often do not include extensive documentation, such as sources for battle death estimates, and are sometimes inconsistent in their use of terms such as “casualties,” which could refer to battle dead, wounded, and/or missing. Casualty figures in encyclopedias sometimes rely on official estimates or are simply guesses, with no discussion of the methodology or information used to arrive at the estimate. For example, without providing a citation, Clodfelter (2008: 396) says of the 1941 Franco–Thai conflict that “Total French casualties [sic] were about 200” (in comparison, using French and Thai language sources, IWD 1.0 estimated 481 French and Thai battle dead).
Consider as a brief case study the March 1969 Sino–Soviet border clashes. Clodfelter (2008: 676) reports 906 Chinese and Soviet dead (Ciment, 1999, vol. 2: 444, estimates “casualties” in the several hundreds), a fatality level comparable to that experienced in the Falklands and Kargil Wars, perhaps justifying inclusion in COW (note that COW 3.0 coded the 1982 Falklands War as an interstate war with 910 overall battle dead). The existence of such a conflict would have important implications for several scholarly debates, as it would be evidence against the nuclear peace and the proposition that bipolarity breeds peace. However, as we document in the IWD 1.1 codebook, the estimate of more than 900 dead for this conflict likely relies on official Soviet and Chinese casualty estimates issued at the time of the conflict. IWD 1.1 summarizes contemporary work demonstrating that the 900 dead estimate is too high. IWD uses a variety of Soviet and Chinese language archival and other sources, including military cemeteries, defector reports, participant interviews, and others, which suggest a battle dead estimate of perhaps one-tenth of that initial figure of 900, that is, fewer than 100.
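To see how strongly a coding decision can depend on the source, the sketch below checks whether a conflict clears the 500 battle-death minimum under each of several estimates. It is a minimal illustration, not the procedure used to build IWD. The 906 figure is the Clodfelter (2008: 676) total quoted above; the value 90 is only an assumed stand-in for "about one-tenth of 900" and is not an IWD estimate; all function and variable names are hypothetical.

```python
from typing import Dict

MIN_BATTLE_DEATHS = 500  # IWD 1.1 lower threshold


def inclusion_by_source(estimates: Dict[str, int],
                        minimum: int = MIN_BATTLE_DEATHS) -> Dict[str, bool]:
    """Return, for each source, whether its estimate meets the minimum."""
    return {source: deaths >= minimum for source, deaths in estimates.items()}


def coding_is_source_dependent(estimates: Dict[str, int],
                               minimum: int = MIN_BATTLE_DEATHS) -> bool:
    """True when sources disagree about whether the minimum is met."""
    decisions = set(inclusion_by_source(estimates, minimum).values())
    return len(decisions) > 1


sino_soviet_1969 = {
    "encyclopedia_total": 906,    # Clodfelter (2008: 676), as quoted in the text
    "archival_placeholder": 90,   # ASSUMED stand-in for "about one-tenth of 900"
}

decisions = inclusion_by_source(sino_soviet_1969)
disputed = coding_is_source_dependent(sino_soviet_1969)
```

With these inputs the encyclopedia figure would place the clashes above the 500 minimum while the lower figure would not, so a check of this kind flags the conflict as one whose coding hinges on source quality.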
Any scholarship attempting to produce quality battle death data should try to move beyond military encyclopedias, using primary and secondary sources, including sources in languages other than English. IWD employed sources in Spanish, Chinese, French, Russian, Polish, and Thai. In the IWD 1.0 codebook, there was a handful of instances in which our only reference for a factual coding was Clodfelter (2008), a commonly used military encyclopedia. In the IWD 1.1 codebook, we add material to support those factual codings.
IWD 1.1 is not the last word on interstate war data. New information will continue to appear that may encourage alternative codings. Other political scientists are also trying to improve the quality of war and casualty data, such as Jason Lyall as part of the monumental data collection effort in his Paths to Ruin project. We welcome ongoing dialogue, especially challenges to our particular coding decisions. Our goal over time is for the field to produce the highest quality interstate war data possible.
Coding interstate war data is both important and challenging. IWD 1.1 improves the quality of previous interstate war data, including our own, and we hope it will contribute to higher quality scholarship. We look forward to continuing to engage on this topic.
Footnotes
Acknowledgements
For research assistance, thanks to Kathryn Dura, Laura Huber, Drew Wagstaff, and Karen Whisler.
Declaration of conflicting interest
The authors declare that there is no conflict of interest.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Supplemental material
The online appendix is available at: http://journals.sagepub.com/doi/suppl/10.1177/2053168016683840. The replication files are available at:
.
Notes
Carnegie Corporation of New York Grant
This publication was made possible (in part) by a grant from Carnegie Corporation of New York. The statements made and views expressed are solely the responsibility of the author.
References
