When open data is a Trojan Horse: The weaponization of transparency in science and governance

Abstract

Openness and transparency are becoming hallmarks of responsible data practice in science and governance. Concerns about data falsification, erroneous analysis, and misleading presentation of research results have recently strengthened the call for new procedures that ensure public accountability for data-driven decisions. Though we generally count ourselves in favor of increased transparency in data practice, this Commentary highlights a caveat. We suggest that legislative efforts that invoke the language of data transparency can sometimes function as “Trojan Horses” through which other political goals are pursued. Framing these maneuvers in the language of transparency can be strategic, because approaches that emphasize open access to data carry tremendous appeal, particularly in current political and technological contexts. We illustrate our argument through two examples of pro-transparency policy efforts, one historical and one current: industry-backed “sound science” initiatives in the 1990s, and contemporary legislative efforts to open environmental data to public inspection. Rules that exist mainly to impede science-based policy processes weaponize the concept of data transparency. The discussion illustrates that, much as Big Data itself requires critical assessment, the processes and principles that attend it—like transparency—also carry political valence, and, as such, warrant careful analysis.

Keywords

Transparency openness data policy governance science

Openness, transparency, and reproducibility have become the new watchwords of responsible data practice in science. A series of recent high-profile scandals has brought to light the problems of erroneous analysis, falsified data, problematic methodology, and cherry-picked presentation of research results (Carey and Belluck, 2015; Coy, 2013; Economist, 2013; Kolata, 2011). Some evidence suggests these problems, whether intentional or accidental, are more widespread than has been previously recognized, and that existing norms and processes of academic publishing and peer review are poorly configured to detect or deter them (Gelman, 2013; Horton, 2015). For instance, in one recent large-scale reanalysis of 100 published psychology studies, only about one-third of the results could be replicated (Open Science Collaboration, 2015); in another recent analysis of 60 published economics papers, fewer than half of the main results could be replicated (Chang and Li, 2015). In response, a number of standard-bearers in the scientific process have recently called for or instituted pro-transparency policies, from requiring researchers to make their data publicly available to insisting that study results be independently replicated prior to publication (Ablin, 2014; Alberts et al., 2015; Institute of Medicine, 2015; Jacoby, 2015).

Similar pro-transparency principles have also gained traction in government. The ethos of open government, advocated in recent years by the Obama administration (Ellman and Suh, 2013), posits that accessible datasets and transparent decision-making processes are necessary precursors to government accountability, responsible governance, public trust, and, ultimately, improved policy outcomes. Providing open access to information produced in federally funded research is said to be a core function of democracy, an effective means of accelerating job growth and innovation, and an essential strategy for promoting an engaged and informed public (Holdren, 2013; Seife and Thacker, 2015).

As social scientists, we count ourselves in support of the (often overlapping) agendas of the open science and open governance movements. In numerous cases, accessibility and replication have strengthened the integrity of data-driven decisions and increased the accountability of decision-makers. Indeed, it is difficult to imagine many principled arguments against transparency (except to the extent necessary to protect personal privacy or otherwise sensitive information), especially when data are analyzed as part of public governance processes, or when public money has been used to produce them.

However, in this Commentary, we highlight a critical challenge to the growing movement toward increased data transparency in science and public policy. We note that legislative efforts that invoke the language of data transparency can sometimes function as “Trojan Horses” designed to advance goals that have little to do with good science or good governance. Framing these maneuvers in the language of transparency can be politically strategic, because approaches that emphasize open access carry tremendous appeal, particularly in current political, technological, and institutional contexts. We illustrate our argument through two examples of pro-transparency policy efforts, one historical and one current: industry-backed “sound science” lobbying in the 1990s, and contemporary legislative efforts to open environmental data to public inspection.

“Sound science,” data quality, and the institutionalization of uncertainty

In the 1990s, the tobacco company Philip Morris launched a “sound science” initiative aimed at casting doubt on the link between secondhand tobacco smoke and lung cancer by challenging prevailing interpretations of key studies (Baba et al., 2005). Among the campaign’s first objectives was to “legislate public access to epidemiological data used in support of federal laws and regulations” (SRIC Innovation, 1997). Recognizing that frustration in the oil and coal industries over proposed clean air regulations represented a “hook” that could provide an opportunity for political coalition-building, the sound science team presented data access legislation to Capitol Hill allies (Philip Morris, 1997). In 1998, Sen. Richard Shelby (R-Ala.) added a rider called the Data Access Act (DAA) to an appropriations bill that mandated public access to all data produced by federally funded scientists employed by nonprofit institutions (Michaels, 2008). The law, commonly known as the Shelby Amendment and passed in September 1998, did not apply to privately funded studies.

In 2000, the sound science team achieved an even bigger victory when a tobacco industry lobbyist convinced Rep. Jo Ann Emerson (R-Mo.) to slip a two-sentence rider into a 712-page appropriations bill requiring all federal agencies to issue guidelines “ensuring and maximizing the quality, objectivity, utility, and integrity of information (including statistical information) disseminated by the agency.” The Data Quality Act (DQA)—also known as the Information Quality Act—required agencies to establish a mechanism through which “affected persons” could “seek and obtain correction of information” promulgated by the government. No hearings were held on the DQA, and it is not clear that most members were aware it was being passed (Wagner, 2003; Weiss, 2004).

Many scientists have denounced the DAA and DQA as attempts to magnify and institutionalize the uncertainty inherent in the science policy enterprise (Michaels and Monforton, 2005; Rosenstock, 2006; Schick et al., 2007). David Michaels, current chief of the Occupational Safety and Health Administration, has referred to the DAA as an invitation to “dredge and manipulate” government data in an effort to muddy the scientific waters (Michaels, 2008: 177). Michaels has written that the DQA has “successfully slowed agency activities” by consuming scarce resources and staff (Michaels, 2008: 190). DQA corrections have become a favored tactic for delaying agency actions that run counter to industry interests. An analysis by the Washington Post found that DQA petitions had been filed predominantly by regulated industries, lobbyists, and trade organizations (32 of 39 petitions analyzed) (Weiss, 2004). Public health scholars have warned that the DQA may prompt agencies to “self-censor” important information that is likely to come under challenge (Rosenstock, 2006). Because these purportedly pro-transparency laws do not apply to industry-funded science and are invoked only in the face of agency action, they are like “a knife that cuts only one way”—against federal intervention (Houck, 2003).

“Secret science” in environmental regulation

Similar legislative efforts are unfolding today. The Secret Science Reform Act (SSRA) is currently pending in Congress. (It was passed by the House of Representatives in March 2015 and is awaiting Senate action.) The bill would prohibit the Environmental Protection Agency (EPA) from proposing or implementing regulations “based on science that is not transparent or reproducible,” and would require the public release of all data used in the EPA’s assessments in a manner that allows for independent analysis and reproduction of results. Advocacy for the SSRA has been couched firmly in the language of data transparency: as its sponsor Lamar Smith (R-Tex.) put it, “[c]ostly regulations should not be created behind closed doors and out of public view” (US Congress, 2015). Notably, the notion of using the catchphrase “secret science” to advocate for data disclosure was discussed in private meetings of consultants to the tobacco industry as early as 1998 (Gianelli, 1998).

The SSRA hits squarely at the intersection of open science and open government. For the science community, the SSRA appears to respond to calls for increased replicability, open access, and data sharing. On the governance side, many informed advocates on the left have pushed to open up government datasets to fight corruption and cast “sunlight” on policy processes.

But upon closer inspection, the SSRA appears to use data transparency as a Trojan Horse through which to advance a different goal: namely, to hamstring the processes of the EPA. The EPA relies upon approximately 50,000 scientific studies annually to make environmental policy. The Congressional Budget Office forecasts that the SSRA would severely restrict the EPA’s ability to enact new regulations, due to both the costs of making so much data publicly available, and the fact that certain classes of data (for example, industry-held or medical data) would be impossible to release (and thus to rely upon) according to the terms of the proposed law (Congressional Budget Office, 2014)—creating a “catch-22” for the agency (Jaffe, 2015; Rosenberg, 2014). These effects do not appear to have been unforeseen: Congressional votes on the SSRA have divided along party lines, and some of its chief advocates are active in the climate-change denial movement. On the other side, more than 50 scientific societies and universities have signed statements in opposition to the bill (American Association for the Advancement of Science, 2014).

Data dredging and the risks of “scientific cacophony”

Beyond these legislative actions, the open data movement has provoked a complex debate among scientists who wish to ensure that vested interests do not take advantage of open-access policies to advance their goals to the detriment of public health. For example, some clinical and public health researchers, informed by prior scientific battles with the tobacco industry and other powerful interests, have expressed concern that biased actors may “dredge” existing data sets to generate new analyses that contradict established scientific and public health positions (Christakis and Zimmerman, 2013; Kaiser, 2003; Sacks et al., 2003). Open access to research data is said to be a “double-edged sword” (Spertus, 2012). To prevent the “scientific cacophony” that might ensue from truly open access, some have proposed that data sharing may not be useful when those requesting data have strong vested interests (Christakis and Zimmerman, 2013).

Scientific proponents of data sharing, in contrast, assert that de-identified raw data “should eventually be put into the public domain for unconditional, universal access” (Strom et al., 2014). For such advocates of unrestricted access, status quo arrangements in which data are tightly held and the original investigators’ interpretation prevails “may be just as or more harmful” than a situation in which diverse private actors are empowered to challenge the accepted wisdom with their own assessments of the evidence (Krumholz and Peterson, 2014).

These competing perspectives invoke different assumptions about the institutional processes of science-based governance. Regulatory open-data advocates propose essentially a democratic, free-market approach to the evaluation of scientific findings: release the data, they suggest, and the “best” findings will rise to the top, while promoting accountability for decision-makers. Other researchers fear that open-data policies might, paradoxically, increase industry “capture” of regulatory processes, as resource-rich special interests exploit scientific uncertainty to impose undue administrative delay.

The limits of transparency in science and governance

In an era of increased skepticism toward science, anything less than unqualified openness on the part of regulatory agencies may be taken as an indication that something is being hidden. In this Commentary, we emphasize that the principle of data transparency is subject to limits (Jasanoff, 2006). Data processes in science and governance face at least three sources of constraint that demands for increased data transparency may strategically exploit. First, institutional resource limitations are unavoidable: open processes require substantial outlays of time and money, which can hamstring agencies to the point that they cannot accomplish their aims. Second, countervailing interests, such as privacy protection, may make it impossible to fully release data; while techniques like anonymization can provide a middle ground, initiatives like the SSRA do not make allowances for them.

In time, technological and methodological advancements may ameliorate these two concerns somewhat, as infrastructures for data sharing become standardized and institutional norms shift. But a final constraint remains: the fact that epistemological limitations constrain data-driven political decision-making. Agencies charged with protecting public health and the environment must make decisions in the face of scientific uncertainty, because science by its nature is incomplete and only rarely provides precise answers to the complex questions policymakers pose. Sarewitz (2000) compares the goals of science and politics:

The goal of politics is the achievement … of an operational consensus that enables action. This is a very different goal from that of science, which seeks to expand insight and knowledge about nature through an ongoing process of questioning, hypothesizing, validation, and refutation … . When a scientific problem is contentious and the object of a vibrant research effort, consensus is extremely difficult to achieve—the process of scientific investigation intrinsically militates against, is designed to inhibit, premature consensus.

Data transparency, even with its many virtues, cannot alter this fundamental aspect of scientific inquiry. Cacophony and contention are core elements of the scientific enterprise. By invoking a “narrow, idealized portrayal of science” in which research reliably produces clear and reproducible facts, the special interests behind the DAA, DQA, and SSRA mischaracterize science’s “inevitably incomplete, uncertain, contested, and … often unreliable” nature in their efforts to stymie regulatory activities with which they disagree (Sarewitz, 2015). Technical experts at regulatory agencies frequently commit this same mistake by failing to delineate the extent to which their science policy decisions inevitably are informed by value judgments that go well beyond the available science (Wagner, 2003). Regulators and their opponents thus co-produce a false impression of the contribution of science to public policy development.

Rules that invoke the specter of “secret science” or that exist mainly to impede policy processes weaponize the concept of data transparency. They also, ironically, may themselves violate principles of transparency: the DAA and DQA were covertly inserted into large appropriations bills—a strikingly opaque approach. The SSRA proposes that evidence that has not been released publicly should be excluded from EPA analyses even if it might improve agency decision-making. Transparency requires that relevant research not be dismissed from policy processes even if it is incomplete or less than definitive (Wagner and Steinzor, 2006). There are other recent examples of legislative efforts that invoke the language of transparency and scientific quality control in the name of democratic values but that appear to be camouflaged efforts to improve the lot of special interests (Marcos, 2015; Urology Times, 2013).

The political valence of data transparency is a critical reminder of the inherently sociopolitical nature of all technologies, including institutional data practices. Though transparency is often framed as an unalloyed good (provided that privacy interests can be adequately protected), in practice it provides a means through which diverse stakeholders attempt to achieve diverse political goals. Politically motivated proponents of transparency, in some cases, may exploit the epistemological and institutional realities that accompany the production of science and science-based policy. Policies that allow for open sharing of data may improve perceptions that science-based decisions are credible, but data access approaches must be carefully designed to ensure they make science and governance better, not worse. Just as Big Data itself creates phenomenological and epistemological challenges that must be critically assessed, its attendant processes also warrant careful analysis.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

This article is part of the special theme on Critical Data Studies. A full list of the articles in this special theme is available at: http://bds.sagepub.com/content/critical-data-studies.

References

Ablin (2014) The problem with prostate screening. The New York Times, 25 November. Available at: http://www.nytimes.com/2014/11/26/opinion/the-problem-with-prostate-screening.html (accessed 9 October 2015).

Alberts, Cicerone, Fienberg (2015) Self-correction in science at work. Science 348(6242): 1420–1422.

American Association for the Advancement of Science (2014) Letter to house majority whip Kevin McCarthy, 31 July. Available at: http://democrats.science.house.gov/sites/democrats.science.house.gov/files/documents/Coalition-secret_science_house.pdf (accessed 9 October 2015).

Baba, Cook, McGarity (2005) Legislating ‘sound science’: The role of the tobacco industry. American Journal of Public Health 95(S1): S20–S27.

Carey and Belluck (2015) Doubts about study of gay canvassers rattle the field. The New York Times, 25 May. Available at: http://www.nytimes.com/2015/05/26/science/maligned-study-on-gay-marriage-is-shaking-trust.html (accessed 9 October 2015).

Chang and Li (2015) Is economics research replicable? Sixty published papers from thirteen journals say “usually not.” Finance and Economics Discussion Series 2015–083, Washington, DC: Board of Governors of the Federal Reserve System. Available at: http://www.federalreserve.gov/econresdata/feds/2015/files/2015083pap.pdf (accessed 9 October 2015).

Christakis and Zimmerman (2013) Rethinking reanalysis. Journal of the American Medical Association 310(23): 2499–2500.

Congressional Budget Office (2014) Cost estimate, H.R. 4012, Secret Science Reform Act of 2014, 3 October. Available at: https://www.cbo.gov/publication/49443 (accessed 9 October 2015).

Coy (2013) Reinhart, Rogoff, and the Excel error that changed history. Bloomberg Business, 18 April. Available at: http://www.bloomberg.com/bw/articles/2013-04-18/faq-reinhart-rogoff-and-the-excel-error-that-changed-history (accessed 9 October 2015).

Economist (2013) Trouble at the lab: Scientists like to think of science as self-correcting. To an alarming degree, it is not, 19 October. Available at: http://www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble (accessed 9 October 2015).

Ellman L and Suh R (2013) Sunshine week: In celebration of transparency. In: White House Open Government Initiative blog. Available at: https://www.whitehouse.gov/blog/2013/03/14/sunshine-week-celebration-transparency (accessed 9 October 2015).

Gelman (2013) Ethics and statistics: It's too hard to publish criticisms and obtain data for replication. Chance 26(3): 49–52.

Gianelli L (1998) Memorandum to “Secret Science” Work Group, Philip Morris, 10 April. Available at: http://industrydocuments.library.ucsf.edu/tobacco/docs/klyc0069 (accessed 9 October 2015).

Holdren (2013) Memorandum for the Heads of Executive Departments and Agencies: Increasing Access to the Results of Federally Funded Scientific Research. Available at: https://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf (accessed 9 October 2015).

Horton (2015) Offline: What is medicine's 5 sigma? The Lancet 385(9976): 1380.

Houck (2003) Tales from a troubled marriage: Science and law in environmental policy. Science 302(5652): 1926–1929.

Institute of Medicine (2015) Sharing Clinical Trial Data: Maximizing Benefits, Minimizing Risk, Washington, DC: The National Academies Press.

Jacoby W (2015) The AJPS replication policy: Innovations and revisions. In: American Journal of Political Science Editor Blog. Available at: http://ajps.org/2015/03/26/the-ajps-replication-policy-innovations-and-revisions/ (accessed 9 October 2015).

Jaffe (2015) Republicans’ bills target science at US environment agency: Proposed legislation would change how the US Environmental Protection Agency uses science to determine pollution limits. The Lancet 385(9974): 1167–1168.

Jasanoff (2006) Transparency in public science: Purposes, reasons, limits. Law and Contemporary Problems 69(3): 21–45.

Kaiser (2003) Industry groups petition for data on salt and hypertension. Science 300(5624): 1350.

Kolata (2011) How bright promise in cancer testing fell apart. The New York Times, 7 July. Available at: http://www.nytimes.com/2011/07/08/health/research/08genes.html (accessed 9 October 2015).

Krumholz and Peterson (2014) Open access to clinical trials data. Journal of the American Medical Association 312(10): 1002–1003.

Marcos (2015) House passes bill to overhaul EPA Scientific Advisory Board. The Hill, 17 March. Available at: http://thehill.com/blogs/floor-action/house/235988-house-passes-bill-to-overhaul-epa-advisory-board (accessed 9 October 2015).

Michaels (2008) Doubt is Their Product, New York, NY: Oxford University Press, pp. 176–178.

Michaels and Monforton (2005) Manufacturing uncertainty: Contested science and the protection of the public's health and environment. American Journal of Public Health 95(S1): S39–S48.

Open Science Collaboration (2015) Estimating the reproducibility of psychological science. Science 349(6251): aac4716.

Philip Morris (1997) Sound Science Project Plan, December. Available at: http://industrydocuments.library.ucsf.edu/tobacco/docs/zqng0155 (accessed 9 October 2015).

Rosenberg (2014) Congress must block these attacks on independent science. Roll Call, 17 November. Available at: http://www.rollcall.com/news/congress_must_block_these_attacks_on_independent_science_commentary-237993-1.html (accessed 9 October 2015).

Rosenstock (2006) Protecting special interests in the name of ‘good science’. Journal of the American Medical Association 295(20): 2407–2410.

Sacks, Appel, Bray (2003) Sodium and blood pressure: No data dredging, please! American Journal of Hypertension 16(7): 614–616.

Sarewitz (2000) Science and environmental policy: An excess of objectivity. In: Frodeman (ed.) Earth Matters: The Earth Sciences, Philosophy, and the Claims of Community, Upper Saddle River, NJ: Prentice Hall, pp. 79–98.

Sarewitz (2015) Reproducibility will not cure what ails science. Nature 525(7568): 159.

Schick, Bero, Cook (2007) The tobacco industry and the Data Quality Act. Science 317(5840): 898.

Seife C and Thacker P (2015) Why it’s OK for taxpayers to ‘snoop’ on scientists. Los Angeles Times, 21 August. Available at: http://www.latimes.com/opinion/op-ed/la-oe-0821-seife-thacker-science-transparency-20150821-story.html (accessed 9 October 2015).

Spertus (2012) The double-edged sword of open access to research data. Circulation: Cardiovascular Quality and Outcomes 5(2): 143–144.

SRIC Innovation (1997) Sound Science Project. Philip Morris, May. Available at: http://industrydocuments.library.ucsf.edu/tobacco/docs/snyc0069 (accessed 9 October 2015).

Strom, Buyse, Hughes (2014) Data sharing, year 1—Access to data from industry-sponsored clinical trials. New England Journal of Medicine 371(22): 2052–2054.

Urology Times (2013) AUA throws its support behind USPSTF reform bill, 3 June. Available at: http://urologytimes.modernmedicine.com/urology-times/content/tags/aua/aua-throws-its-support-behind-uspstf-reform-bill?page=full (accessed 9 October 2015).

US Congress (2015) House Committee on Science, Space, and Technology press release. House, Senate Introduce Bill to Ensure Open EPA Science, 24 February. Available at: http://science.house.gov/press-release/house-senate-introduce-bill-ensure-open-epa-science (accessed 9 October 2015).

Wagner (2003) The ‘bad science’ fiction: Reclaiming the debate over the role of science in public health and environmental regulation. Law and Contemporary Problems 66(4): 63–133.

Wagner and Steinzor (2006) Transparency and honesty. In: Wagner and Steinzor (eds) Rescuing Science from Politics: Regulation and the Distortion of Scientific Research, New York, NY: Cambridge University Press, pp. 99–102.

Weiss (2004) A policy puts science on trial: “Data quality” law is nemesis of regulation. Washington Post, 16 August. Available at: http://www.washingtonpost.com/wp-dyn/articles/A3733-2004Aug15.html (accessed 9 October 2015).