Abstract

Metagenomic Early Detection of Future Engineered Outbreaks
Engineered pandemics, the topic of this commentary, are in some ways a more critical global health concern than zoonotic disease outbreaks. For the latter, being as dangerous and as disruptive as possible would be a mere coincidence, not a design feature.
Against a fast-spreading engineered pathogen with asymptomatic transmission in most individuals (like polio), or especially one with delayed symptoms (like HIV or tuberculosis), early warning could provide crucial time to respond—limiting transmission and buying more time for countermeasure development. Therein lies the need to develop disease monitoring systems for surveillance and early detection that incorporate metagenomic sequencing.1,3 By that, we mean methods that extract genetic sequence data from many or all organisms in a sample, collected primarily from environmental sources like sewage, air, soil, and farm dust, and secondarily from individuals who attend clinics or otherwise contribute samples, such as flight crews. Experts point out the many advantages and uses of including metagenomic sequencing in our pandemic detection arsenal, especially against engineered outbreaks of catastrophic potential:
Metagenomic sequencing does not require knowing what virus or variant is spreading—it is pathogen agnostic.4
For example, it may simply look for any sequence that displays exponential growth, a proxy of an infectious disease outbreak.5
That is a crucial advantage given that novel pathogens—whether engineered or arising naturally—pose the greatest risk of a pandemic. Engineered pathogens may also be deliberately designed to evade easy detection. Being pathogen agnostic, metagenomic sequencing does not require a computer program listing viruses of concern, which is an information hazard.6
Unlike exclusive reliance on syndromic monitoring or clinical testing, environmental metagenomic sequencing can start before patients become symptomatic. That helps with detecting viruses that may spread presymptomatically or asymptomatically. Unlike exclusive reliance on clinical testing, metagenomic sequencing does not always encounter the many legal constraints on use of clinical data (more on this below). Metagenomic sequencing that employs a high-throughput method, such as next-generation sequencing adapted to simultaneously detect any pathogens present, could be faster than running a large number of individual tests based on current methods, even in detecting a known virus. The same metagenomic sequencing tools that tell us whether a virus is growing exponentially can also be used to identify its genome and characterize its likely properties, such as virulence, antigenicity, and susceptibility to antivirals.4
Later in the epidemic, metagenomic sequencing tools created for detection could continue to help monitor ongoing spread,7 and the original genomic map could help track mutations. The same tools that help reveal outbreaks of especially dangerous engineered viruses can also serve us in preventing natural pandemics. For example, they might detect the “next HIV” before it spreads widely. That obviously bolsters their justification, but the argument below relies mainly on their help in the especially acute and undertheorized event of a pandemic from an engineered pathogen.
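The pathogen-agnostic detection idea above, flagging any sequence whose abundance grows exponentially across sampling dates, can be illustrated with a minimal sketch. The cluster names, read counts, and slope threshold below are hypothetical, and real systems operate on billions of reads with far more careful statistics; this is only a toy illustration of the principle.

```python
import math

def growth_slope(daily_counts):
    """Least-squares slope of log(count + 1) against day index.

    A persistently positive slope is a proxy for exponential growth
    of one sequence cluster in the sampled population.
    """
    n = len(daily_counts)
    xs = range(n)
    ys = [math.log(c + 1) for c in daily_counts]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    return cov / var

def flag_exponential(clusters, min_slope=0.2):
    """Return identifiers of clusters whose counts grow fast enough.

    min_slope=0.2 roughly corresponds to a doubling time of about
    3.5 days; the threshold is an illustrative choice, not a
    published standard.
    """
    return [name for name, counts in clusters.items()
            if growth_slope(counts) >= min_slope]

# Hypothetical daily read counts for two sequence clusters.
clusters = {
    "cluster_A": [2, 4, 9, 16, 33],    # roughly doubling each day
    "cluster_B": [10, 11, 9, 10, 10],  # stable background organism
}
print(flag_exponential(clusters))  # only cluster_A should be flagged
```

A slope of log counts near ln(2) ≈ 0.69 would indicate daily doubling; the point of the threshold is simply to separate growing signals from stable background flora without ever asking which organism the sequence belongs to.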
In this commentary, we imagine how a global system of metagenomic sequencing could be governed in order to reap the benefits and manage the risks. Global institutional engineers and their academic partners would need to develop a new governance model to ensure that all partners involved in a global metagenomic surveillance system cooperate as needed. If you will, a smart equivalent of the International Atomic Energy Agency is needed, but one with far sharper legal “teeth.” International expert monitors could routinely circulate between states with metagenomic surveillance stations and have collective powers to mete out formidable and decisive punishments for transgression. The sanctions used should avoid the “own goal” of reciprocal defection, in which 1 country’s defection prompts others to defect too, for example, by leaving the metagenomic sequencing network. Instead, sanctions such as forfeiting a large sum of money deposited for safekeeping, or still other sanctions, should be used.
Such a global system for metagenomic surveillance may raise worries about privacy, on which this commentary expands. It is true that many privacy-preserving options—such as discarding sequencing reads matching the human genome, environmental sampling, basic deidentification, and thoughtful privacy impact assessment processes—will be compatible with an effective metagenomic sequencing surveillance system. However, the greater scale of potential harm from engineered pandemics may justify giving greater priority to detection over privacy than would be appropriate for less dangerous diseases. Privacy and public health experts alike should prepare the public now for the possibility that early detection of potentially catastrophic pathogens may require a sober and well-informed debate about privacy and greater collective tolerance for different levels or types of privacy protection.
Initial Privacy Concerns in Metagenomic Monitoring
The benefits of metagenomic monitoring will not be realized if privacy regulations or public distrust prevent such systems from being established. In related health settings, privacy concerns are regularly managed. Samples from clinical care settings, for example, can contain identifying metadata about patients. They may also reveal symptoms and could trigger efforts to detect individual patients and their contacts. Samples can include patient cells that contain genetic and other biological data. They can also reveal who has a novel disease (potentially leading to ostracization of the patient and their contacts) or other diseases (including stigmatized infections or ones that can prompt denial of insurance or of employment).8,9 More generally in public health surveillance, “individuals and groups can be chilled in their personal lives, stigmatized or threatened, and used for the benefit of others when health information is wrongfully collected or used.” 10
Environmental samples carry fewer privacy risks than clinical samples because they include no direct identifiers. Any human DNA can typically be removed from collected samples during preparation for sequencing, with any sequence reads matching human sequences discarded. 11 And even if one tried, it is exceedingly difficult to assemble full human genomes from environmental samples even with long-read sequencing due to the similarity and repetitiveness of human genomes. Nevertheless, a lot can be revealed and inferred about individuals from environmental samples.
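The filtering step just mentioned, discarding sequence reads that match human sequences, can be sketched as a toy k-mer screen. Production pipelines instead align reads against the full human reference genome using dedicated tools such as read aligners or k-mer classifiers; the short reference string, the read sequences, and the 0.5 hit-fraction threshold below are illustrative assumptions only.

```python
def build_kmer_set(reference, k=8):
    """All k-length substrings of a reference sequence (a toy
    stand-in for an index over the full human genome)."""
    return {reference[i:i + k] for i in range(len(reference) - k + 1)}

def filter_human_reads(reads, human_kmers, k=8, max_hit_frac=0.5):
    """Discard reads sharing too many k-mers with the human reference.

    max_hit_frac is an illustrative cutoff; real pipelines use full
    alignment or probabilistic classification, not this simple ratio.
    """
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        hits = sum(kmer in human_kmers for kmer in kmers)
        if not kmers or hits / len(kmers) < max_hit_frac:
            kept.append(read)  # likely non-human: retain for analysis
    return kept

# Toy "human reference" and two sampled reads (hypothetical sequences).
human_kmers = build_kmer_set("ACGTACGTGGCCTTAACCGGATAT")
reads = ["GTACGTGGCCTT",       # fragment of the reference: discarded
         "TTTTCCCCAAAAGGGG"]   # shares no k-mers: retained
print(filter_human_reads(reads, human_kmers))
```

The design point is that discarding happens during preprocessing, before any human-derived sequence reaches downstream storage or analysis, which is what limits the privacy exposure of the retained data.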
First, environmental samples can reveal genetic information. While difficult (and impossible if human DNA is reliably separated in advance from viruses and the rest of the sample), medically relevant fractions of human genomes can plausibly be constructed even from environmental samples if sufficiently few individuals from different ancestral backgrounds have contributed to the sample. While fractions of human genomes may not provide information on who the DNA belongs to, triangulating it with more information—for example, from an ancestry or public research DNA database—may reveal identities or at least narrow the possibilities down to particular families.12,13 That may already reveal private information, for example, that a person was in a different location than they told their family. Additional inferences could also be made about the person, such as likely physical attributes, likely personality traits, and likely medical conditions.14,15
Likewise, the microbiome found in environmental samples might, with further work, yield a unique signature that under certain conditions could be linked to individual identities.16,17 As tools for its use (eg, in forensic medicine) continue to develop, the microbiome may become a powerful clue for identifying individuals by looking for unique signatures and matching them to an existing library, though perhaps not all types of environmental samples will be amenable to this process. Crucially, unlike human genome fragments, which might be removed from the sample, the microbiome cannot be excluded from the tested sample, as the goal is to check all microbiota for signatures of engineering, weaponization, and/or exponential growth. Once a person is identified from a sample, the microbiome information in that sample could someday support further statistical inferences about their health state18,19 and personality.20
It is possible that technical advances will allow metagenomic sequencing for early detection without these privacy concerns. But if a silently spreading outbreak is detected, especially one that appears to be adversarially engineered, there is real public health value in identifying individual carriers for treatment, contact tracing, or isolation—sometimes, individual identification may even be an essential public health need. In the case of a pathogen with a long incubation period and asymptomatic spread, determining whether delayed harm will ensue may require medical examination of infected individuals by expert responders, which in turn requires individual identification.2 Public health authorities may therefore be tempted to triangulate human genome and/or human microbiome data with other information, such as passenger manifests, to identify carriers. If the pathogen is dangerous enough, a government may even feel morally or practically compelled to identify them. Perhaps privacy laws and norms should anticipate the circumstances in which a public authority should or must attempt to identify a person or disease vector because doing so could avert catastrophe.
Directly put, when metagenomic monitoring is intended to remain statistical and anonymous, the following methods are commonly used to manage privacy rights and risks. There is, however, a limit to how much they can protect privacy and especially to how much they can do so while maximizing public health:
Deidentifying data – For most systems, data can be collected without directly identifying information like personal names and without indirect identifiers that are known to facilitate fairly easy linkage attacks, like age, sex, and zip code.21 To preserve the possibility of identifying individuals for contact tracing or isolation in the event of a likely pandemic outbreak, systems that originally collect clinical data or other identified records could keep such information secure by separating it from the public health research dataset until an emergency is declared or exigent circumstances justify the identification of a human carrier. But the risk of reidentification will always be nonzero: the collected data could be triangulated with data from external sources or, where identifying information is stored separately, hacked.

Filtering out human genomic information – Alternatively, a system can filter out and discard human genomic information, leaving only nonhuman genomic material available for analysis. That would further reduce the risk of reidentification, but it would come at some cost to both research utility and public health actors’ capacity to identify a human carrier for isolation or for contact tracing.

Regularly reviewing privacy standards and privacy impact assessments – A metagenomic pandemic monitoring system can include a regular privacy impact assessment, possibly in consultation with stakeholders (including representatives of the surveilled communities), in order to engage in regular self-reflection and to gain public understanding and a social contract of sorts. These mechanisms can enhance public acceptability, but they risk creating an expectation of zero risk of privacy intrusions—as though the reason we engage stakeholders is to ensure that their privacy is secure. That risk increases if the bodies making privacy assessments are charged only with overseeing privacy interests, without being held accountable for setbacks to public health.

Requiring opt-in consent for monitoring systems – Opt-in systems of consent are the gold standard for privacy regimes that attempt to place full autonomy and control with the data subject. But individual consent could feasibly be considered only for clinical monitoring or for environmental monitoring that samples from a small, identifiable population. Opt-in requirements would burden monitoring systems with delay, cost, selection bias, and lower sensitivity.

Permitting opt-out of monitoring – Systems that allow individuals to opt out of metagenomic pandemic monitoring would be difficult to implement for environmental collection (ironically, executing an opt-out requires identifying every individual whose information was collected). By reducing the sensitivity of testing and introducing some bias, they would also compromise the quality of monitoring.

Tellingly, while many privacy laws around the world attempt to maximize the data subject’s control over information, public health exceptions abound. This is because either opt-in or opt-out privacy rules run the risk of undermining collective and government goals of the highest order.
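The first listed method, separating direct identifiers from the research dataset and coarsening quasi-identifiers such as age and zip code, can be sketched as follows. The field names, the token scheme, and the generalization rules are hypothetical illustrations, loosely in the spirit of safe-harbor-style deidentification rather than any particular statute.

```python
import uuid

# Hypothetical direct identifiers; a real system would enumerate many more.
DIRECT_IDENTIFIERS = {"name", "phone", "email"}

def deidentify(record, link_store):
    """Split a sample record into a research-safe part and a sealed part.

    Direct identifiers are moved to link_store under a random token;
    quasi-identifiers (age, zip code) are coarsened to blunt linkage
    attacks. The sealed part would be released only under a declared
    emergency. All field names and rules here are illustrative.
    """
    token = str(uuid.uuid4())
    sealed = {k: record[k] for k in DIRECT_IDENTIFIERS if k in record}
    link_store[token] = sealed          # kept encrypted and separate
    safe = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "zip" in safe:
        safe["zip"] = safe["zip"][:3] + "**"          # coarsen geography
    if "age" in safe:
        safe["age"] = f"{(safe['age'] // 10) * 10}s"  # 10-year bands
    safe["token"] = token               # re-linkage only via link_store
    return safe

sealed_store = {}
sample = {"name": "Jane Doe", "zip": "02139", "age": 34, "sample_id": "s-17"}
print(deidentify(sample, sealed_store))
```

The token is the whole design: the research dataset alone cannot name anyone, yet in a declared emergency, authorized responders holding the sealed store can re-link a flagged sample to a person, which is exactly the conditional reidentification the passage above describes.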
For these reasons, but especially the likely importance of locating infected individuals when a silently spreading pathogen is detected, Liang et al are right to warn that privacy “concerns will become increasingly salient as metagenomic monitoring becomes ubiquitous.” 7 A genuine tension exists here between optimizing privacy and optimizing public health. Privacy-preserving technologies will often diminish the public health value of data, forcing difficult tradeoffs.
At other times, Liang et al seem to endorse a middle-of-the-road solution: “Data and metadata collected by the distributed sequencing network are at least partly public, but systems also account for societal preferences regarding privacy. Maintaining privacy is necessary to earn social licensing and trust in storing potentially identifying information.” 7 That sounds right, but how extensive are the privacy considerations proposed? Inasmuch as privacy can be protected with little or no cost to monitoring quality, of course it should be. However, real tradeoffs may be unavoidable, especially when a pathogen is detected. As COVID-19 has shown, we will have very little time to act once a highly infectious virus emerges, and little wiggle room once a technology and a legal system are in place. We should therefore build a system for protecting information and privacy now, and we should clarify what type of system to put in place, how to follow up after pathogen detection, and what regulations and promises on privacy should be made to the public.
When tradeoffs are real, challenges that are ethical, legal, or concerned with public acceptability are also real. In this commentary, we review the strength of these challenges and what could be done to mitigate or otherwise address them. We propose an agenda of research in ethics, law, and public health risk communication to address these privacy concerns. This discussion is primarily concerned with a public health surveillance system rather than any research the system may enable—the latter raises other ethical considerations.
Ethical Challenges
An invasion of someone’s privacy can come in different degrees. Some privacy invasions are dramatic—they cost the person terrible harms or setbacks like utter humiliation or deep shame, discrimination in important matters such as denial of employment or health insurance, severe stigma, and related sanctions or violence. Other transgressions of privacy are relatively small. There are 4 reasons to think that when metagenomic monitoring is done right, with such protections institutionalized, any related privacy transgressions would be limited almost entirely to small ones or would otherwise be easily answerable in this setting.
First, firm protections from certain abuses of data, such as reidentification attacks, would make those abuses very costly to attackers—and therefore much rarer. Deidentifying open data would likewise make such attacks harder and costlier. That would make metagenomic sequencing (when it carries little if any damage to its public health utility) clearly worthwhile. The privacy transgressions that remain once protections against serious abuse are in place and data are deidentified tend to be fairly innocuous.
Admittedly, deidentification notwithstanding, a computer or a Microsoft Excel sheet may still “discover” that someone with certain identifiers was somewhere at some time. But suppose that discrimination cannot follow, because protections make abuses of data too costly to attackers. Then that “discovery” is given no meaning, and the occasional “discovery” does not shame anyone.
This position may seem cavalier. Does the sheer fact that someone’s computer or Excel sheet obtains information linked to a person’s unique genetic code or social security number not constitute a gross privacy violation—even if use of that information to expose their health condition to people or to deny them health insurance, for example, is blocked? But we are not ashamed by our showerhead “seeing” us naked. So why should a computer “figuring” something about “us” be any different? As Jean-Paul Sartre wrote, all shame is “in its primary structure shame before somebody […] shame is shame of oneself before the Other” 22 —not before a showerhead or a computer. Thus, when a computer or dataset merely matches up our numbers to other numbers corresponding to a certain health condition, the process has little bearing on privacy harm so long as the data controllers are barred from sharing, using, or having a human look at the data. When an individual who was in a certain catchment area but whose interpretable “identity” is likely to remain unknown is discovered by a computer to have a certain genetic code or fragment or health condition, that in itself means nothing.
Second, from a utilitarian point of view, limited transgressions of privacy are warranted if they enable an effective response to a megapandemic: the utility loss from such a pandemic is far, far greater than that from limited privacy transgressions. Of course, not everyone is a utilitarian. Some may object on the ground that acute individual interests, including privacy interests, should not be sacrificed on the altar of collective utility. In contemporary antiutilitarian philosophy, “antiaggregationism” forbids setting back the interests of even a few individuals dramatically for the sake of minute individual gains, even when, by accruing to billions of other people, the aggregated gains far exceed the cumulative losses. 23
Yet in our case, antiaggregationists could agree that privacy interests should give way. Note that antiaggregationism can focus on gains and losses in personal realizations (eg, premature death) or, alternatively, on personal prospects (eg, life expectancy). For antiaggregationists who focus on personal realizations, the billions are not suffering something smaller than compromises to privacy (a missed TV show or a sore throat is a typical illustration). They suffer something very bad and far worse than most compromises of privacy—severe disease or death. It is appropriate to count those severe outcomes. Move on to those antiaggregationists who focus on personal prospects. They can also agree that preventing a pandemic matters more. Each individual benefits in prospect when, thanks to the minor privacy transgressions that occur in early detection, their personal risk of death or illness sharply goes down thanks to response measures that require early detection.
To this, some might answer that, chronologically, metagenomic monitoring starts operating when both the cumulative risks and the risks per individual are still modest. At that early point, the prospects of pandemic harm to each individual—his or her medical risk—are still very small. However, even at that early point, the medical risk is not so small as to be negligible—any small probability of a truly catastrophic risk warrants reasonable prophylactic precautions. Besides, the Nucleic Acid Observatory, the first planned metagenomic monitoring project in the United States, 7 is reportedly planning to perform analyses of environmental metagenomic sequencing data with very little potential for privacy encumbrance; it would only aim to identify the infected—through every means available—if and when a pathogen that is suspected to be particularly dangerous is detected. Therefore, we can safely assume that a breach of privacy would be for good reason, in terms of the risk to life and limb. As long as computer security and cloud technology can safely contain undeciphered data up to that point, no antiaggregationist case founded on our privacy interests can take off.
Third, in the metagenomic detection and surveillance setting, those whose privacy is slightly transgressed also benefit from the existence of that protection system—because it protects their lives, loved ones, and functional societies. Indeed, they benefit a lot from the existence of the system—enough that even extreme privacy violations may be worth it to them ex post compared to alternatives like dying or having their civilization collapse. It is true that they would benefit much more from freeriding—from the potentially catastrophic outbreak getting detected early and thwarted by curbing everyone else’s privacy but not their own. But this presents a classic tragedy of the commons, a collective action problem. Freeriding is no one’s entitlement.24,25
That said, enforceable international agreements that require firm protections against misuse of data by all governments involved and by commercial or other third parties remain necessary. And the fruits of metagenomic monitoring should benefit all or nearly all, for example, by enabling all to obtain pandemic countermeasures in a timely manner, because “just distribution of the benefits of monitoring […] are core elements in the justification of monitoring practices.” 10 It would also be important to ensure that disease surveillance practices are minimally burdensome. Thankfully, environmental disease surveillance may reduce the requirements for human testing, thereby alleviating the burden of existing programs, which rely on clinical samples. However, if a metagenomic sequencing surveillance program were to disproportionately burden some individuals or groups, then processes should be established to ensure appropriate compensation. 25
Fourth, the potential loss of privacy from reidentification is justified because when an individual has been infected with or exposed to a dangerous virus, identifying that individual for the purpose of notice, quarantine, and health services may be beneficial not only to society at large but also to that individual. It could save their health. Even in the extreme case where nothing could be done for the individual medically (eg, they may die before the development of therapeutics), reidentification could save the people they love. And one could argue that having dramatically assisted humanity to fight the scourge that robbed them of their life would lend meaning to their death—analogous to contributing to medical research that promotes human life and health. 26 Even if, on balance, the prospective privacy loss were worse for the individual than any potential personal gains, these prospective gains diminish somewhat the net risk to that individual from that exposure, and lower the already not-very-high threshold for permitting privacy transgressions for societal purposes.
What about the occasional large privacy transgressions—when there are occasional glitches in the system, when a regime reneges on its agreements and abuses data, or when it sells data to abusive parties who engage in very disturbing privacy transgressions? Those concerns are real and need to be addressed seriously and in advance. Once a governance system that meets the requirements noted in the opening section is in place to preempt abuses, we believe that any remaining risks to privacy would be justified, on balance, for 2 reasons.
The first reason is that privacy is a paradigm case of a set of duties that can be transgressed when critical interests are at stake. Consider the argument that person A is permitted to peek into innocent B’s private diary in order to glean where C is planning to meet D and assassinate the latter, and thereby preempt the assassination. Ethicist Jasmine Gunkel responded to this reasoning as follows:
I can agree that A is permitted to peek in B’s diary if A knows or reasonably suspects that they can prevent an assassination, or will in fact prevent one. But that would be more parallel to examining the wastewater that came from a single apartment complex or neighborhood if we knew or reasonably suspected that we could glean information that would prevent the spread of a pandemic. And that is not what is being suggested here, nor would such an approach, of monitoring only when we already know or suspect people have contracted a pathogen, be particularly effective. Rather, the appeal of metagenomic monitoring is for early detection, which means we will lack the justification and narrowness of invasion that makes the initial diary case so compelling.
Metagenomic sequencing is more like reading everyone’s diaries all the time, because we know that doing so will prevent some assassinations. And it is much less obvious that this sort of invasion would be permissible. […] it is certainly not widely agreed that such invasions are permissible. Such suggestions are incredibly unpopular, and I think with very good reason. 27
Metagenomic sequencing is unlikely to be anything remotely like reading everyone’s diaries all the time: the real privacy tradeoff from metagenomic sequencing is that some people could get detected, with only limited exposure of their secrets. We do not yet know the extent to which everyone’s privacy must be slightly degraded to more quickly identify those who have been infected, whose privacy will be seriously compromised. Still, Gunkel makes a good point. Non-narrow privacy invasions are so unpopular that people are happy to forgo even some lifesaving just to keep their privacy. In Thomas More’s Utopia (1516), for example, a fictitious society is described in which doors have no locks, so everyone can walk in on everyone else at any moment. The felicitous result is that no one can afford to plan any crime. 28 Modern commentators, however, typically describe that feature of More’s society as disturbing. It is very possible that most of us would reject a balance that wholly sacrifices privacy just to prevent crime. This may initially suggest that privacy is more important and that the privacy costs of metagenomic sequencing are as serious as the health risks from not sequencing. However, the tradeoff in metagenomic monitoring is nothing like the near obliteration of privacy in More’s Utopia. The limited compromise of optimal privacy in our own tradeoff is unlike a situation where someone could suddenly open the door and see whatever we might be doing. And what is prevented in our situation is not sporadic violent crime; what metagenomic early detection may prevent, if conjoined with very robust response measures, is a pandemic far worse than has previously been experienced.
There is a second reason why an effective governance mechanism as described in the opening section would make remaining privacy transgressions justified on balance. Any system offering a significant reduction in nonnegligible catastrophic risks will typically warrant an exception to many otherwise strict moral rules. Preventing public health emergencies is the paradigm case of a situation that even privacy absolutists recognize would warrant transgressions of generally binding rules. 29 And engineered pandemics may be public health emergencies of the highest order. If credible instructions for engineering pandemic pathogens were to become publicly available, these risks would have a meaningful chance of materializing, especially in the absence of preventive measures. 30 All in all, outside the most absolutist views, philosophical approaches to privacy treat it as a threshold, not an absolute constraint.
This is not to deny that, pragmatically, even privacy infringements that are on balance morally permitted might be rejected by legislators and the public. But these are concerns of a pragmatic, not an ethical sort. We now turn to these pragmatic concerns.
Legal Challenges
Any privacy laws that prevent a metagenomic monitoring system from being effective (or from being implemented at all) will reduce the incentive to develop such a system in the first place. In some contexts, metagenomic pandemic monitoring could proceed without any legal restrictions. Alan Rozenshtein 31 has explained why US law does not currently prohibit many technology-assisted methods of disease surveillance. Even where a privacy law applies, metagenomic pandemic monitoring could fall under exceptions. This is true even with respect to relatively demanding rules such as the US Health Insurance Portability and Accountability Act of 1996 (HIPAA) 32 and, we might add, the European Union’s General Data Protection Regulation (GDPR), 33 for 3 reasons.
First, when the authorities’ purpose is responding to a public health emergency, even sequencing genomic samples usually falls under long-established exceptions to privacy rules. Restrictive laws like HIPAA and GDPR, which generally require individual consent in order for data to be collected or repurposed, would not constrain a metagenomic sequencing program of the sort described here as long as the collection and processing are limited to pandemic-related purposes. See, for example, 45 CFR §164.512(b)(1)(i) 34 and GDPR Article 6 and Recital 46. 33 These same provisions may also cover data processing for the purpose of detecting public health emergencies.
Second, in environmental metagenomic sequencing, the source of information is often wastewater or air collected in public spaces, and not, for example, samples from clinical care. In many jurisdictions, accessing the former is unregulated. For example, the US Environmental Protection Agency frequently conducts analyses of wastewater in order to detect contaminations or other violations of environmental protection laws, and these analyses do not conflict with the Fourth Amendment to the US Constitution 35 or any other applicable US law (eg, Riverdale Mills Corp v Pimpare). 36 One might have responded that since clinical data collection will remain necessary in addition to metagenomic testing, pandemic monitoring would require changes to privacy laws. However, Liang et al 7 clarify that metagenomic testing “should supplant PCR [polymerase chain reaction], lateral flow, or other testing, not just supplement them.”
Third, data collection for metagenomic sequencing programs can be performed with some degree of deidentification that meets today’s standards for deidentified (and therefore nonpersonal) data. Deidentified data typically fall outside the scope of privacy laws.
While the law need not pose an obstacle to effective metagenomic monitoring, it might do so, directly and indirectly. Directly, some jurisdictions may promulgate privacy laws that hold data controllers to a higher standard, increasing uncertainty and the time to detection and response. That would protect their own citizens’ privacy at the expense of the acute health interests of the entire world. Notwithstanding the lack of legal necessity or moral justification, regulators may do so. Indeed, at the height of the COVID-19 crisis, before vaccines were deployed, privacy regulators in the European Union promulgated guidance discouraging the use of cell phone data for exposure risk notifications without individual user consent, despite the sufficiency of public health management as legal justification for the processing. And recently proposed privacy legislation in the United States lacks a public health exception, so far at least, although it does exempt government agencies and individuals acting on their behalf from the scope (ie, the American Privacy Rights Act of 2024, §101(13)(C)(i), exempts federal agencies from the definition of a “covered entity”). 37 Thus, particularly given the trend in privacy laws over the past 10 years toward greater privacy protection, and greater emphasis on data subject control, legal interventions in some jurisdictions remain a real possibility.
Unfortunately, the entire international system of metagenomic monitoring would be pointless if the adoption of such legal interventions in even 1 country enabled substantially more effective bioattacks through the release of engineered pathogens in that country. Nonetheless, dense monitoring systems everywhere else would catch the fast-spreading pathogen soon thereafter, permitting fairly early detection.
Laws can also do indirect damage by encouraging the misguided public perception that each of us has moral authority to veto at will anyone’s deriving information from material we shed. Such perceptions are discussed in the next section.
Implementers of pandemic early detection systems must study legal restrictions to understand and predict how they may vary across jurisdictions. Such research would identify policy mechanisms that allow these early detection systems to develop within the local limitations on use or on disclosure. Ideally, law and policy researchers would propose basic principles that could help guide changes to the privacy rules or future uses of these systems. Scholarly debate over the development and implementation of pandemic response mechanisms must be specific and realistic about the tradeoffs between personal data control and the efficacy of public health programs. Only then can privacy laws be designed, interpreted, and adjusted to prevent negative impacts on critical global public health monitoring.
The orientation of this legal research and analysis must diverge from current practices. Too many privacy and public health commentators describe the right to privacy as if it must be achieved first, with the implication that only those public health initiatives that operate within those bounds may proceed. We suggest that in this area of policy, a different approach is needed. When catastrophic risk is at issue, as it is with engineered pandemics, we should first consider how to create effective monitoring, and then consider how privacy can be protected to the greatest extent possible while still maintaining effectiveness. While there may be exceptions—for example, if a tiny improvement in the efficacy of testing is bound to carry an extreme toll on privacy—there is generally a good case for prioritizing effective pandemic monitoring.
Public Acceptability Challenges
As Liang et al commented, “it is unclear if there are legal restrictions on analysis of sewage and similar sources, but the data are potentially predictive of otherwise personal information, so public discussion of privacy tradeoffs and preventing misuse is important.” 7 Even when there is no sound ethical or legal concern about privacy in metagenomic pandemic monitoring, public perception challenges could defeat the entire enterprise. Even an imagined privacy concern could spread distrust, exacerbating justified and unjustified suspicions of public health actors, governments, and international organizations. Moreover, “new technologies are often the subject of suspicion and misinformation.” 7 The opposition in various countries to establishing and using datasets that would have facilitated observational medical research illustrates that the public can prioritize its own optimal medical privacy over protecting global health through better public health knowledge.
While we believe that, in principle, this is an area where the public lacks rights against coercive intervention, trying to implement a program over a nation’s opposition is bound to fail in democratic societies. An effective global detection system can afford only very few exceptions. The question is how to encourage societies everywhere to support metagenomic monitoring that would help protect us all. Research is needed on what is likely to make people tolerate monitoring in their own “backyards,” so to speak. What education, messaging, incentives, and other interventions are likely to work?
One form of research could use surveys and focus groups to identify empirically what messaging is likely to work best in different societies. The fruits of such research could help public health actors preemptively explain the ethical case for metagenomic monitoring, privacy issues notwithstanding. There might be no escaping the need to openly address with democratic voters the moral questions around privacy and to educate societies on why we lack absolute privacy rights, at least in this critical setting. Health communication research may also advise us on where we must prepare for limited compromises on monitoring efficacy because people are expected to insist on strong privacy protections, despite our communication efforts.
Such research on how to encourage people everywhere to cooperate fully with these early detection efforts should be broader than identifying the best messages for convincing the public. Additional candidate interventions to assuage privacy concerns about metagenomic monitoring are needed. For example, it would be sensible to arrange for automated individual monitoring to become a routine operation that does not require repeated opt-ins and reconvincing the public. Likewise, “initial deployments in countries with higher institutional trust may be preferred.” 7
It is also advisable to engage people in the sense of hearing from stakeholders in advance, understanding the basis of any concerns, and learning which countermessages work best. Ongoing engagement may also help to identify any unanticipated problems with public perception of the monitoring system. What might be counterproductive is public “engagement” in another widely used sense: giving communities and other stakeholders veto power over practices that involve no evidence of excessive loss of privacy whatsoever, a folly in our view, given the high stakes and the collective action problem. Nor should any public deliberation be allowed to delay urgent action once an outbreak starts. Because the fundamental ethics are so robust, some form of engagement is advisable, but engagement in these extreme senses is unnecessary and, in light of its costs and risks, detrimental overall.
Ensuring that substantial personal and public benefits exist where the stations for collecting and analyzing samples are located could also go a long way toward containing privacy concerns and enabling smooth cooperation. To ensure that systems bring benefits to communities hosting monitoring stations, messages might emphasize that such monitoring would protect local health, among other things, and increase the efficacy of the public health response to somewhat better understood seasonal and zoonotic outbreaks. Messages should also emphasize the strictly individual benefits to participants, should they get infected, including further diagnosis, connection to personal care, and improved protection of their personal contacts. It would help to make the analogy to popular apps like Waze and Google Maps, which collect (admittedly with personal consent) invasive information from all participating drivers about their exact locations; yet drivers are happy to provide it (hence the high consent rates) given the strong privacy protections and the immediate benefits to them. There are also benefits to nonusers in terms of fewer traffic jams, but users get the most benefits.
Finally, we should point out the arguable “altruistic benefits” to communities that host metagenomic monitoring stations. By that we mean the potential pride, reputation, collective self-esteem, and happiness imparted by acting responsibly and helping others. While for individuals, this psychological benefit would be more pronounced for opt-in clinical monitoring than for automated environmental monitoring, it could facilitate communities’ decisions to approve local environmental monitoring.
The most relevant end point of the many social science studies needed is what would help sway the public to support what is right for global public health. Genuine ethical qualms about privacy transgression have already been addressed. The guiding thought should be that within limits (eg, not lying or deceit), almost anything that protects the public by motivating widespread adherence is permissible. Thankfully, a major policy to that effect is likely to be sharing the truth: the currently theoretical and, in the worst case, quite minor privacy transgressions per person from metagenomic monitoring are ethical given the important protections that they may buy us all. Mechanisms that help policymakers and the general public understand how privacy is being protected, and why health and safety require some sacrifices of data control (as they always have), could encourage privacy regulation that avoids adverse effects on critical metagenomic monitoring.
Zoonotic Outbreaks
How many of the privacy challenges described earlier apply also to typically lesser zoonotic pandemics and seasonal outbreaks? Do such outbreaks pass the threshold of public health urgency to warrant the compromise of optimal privacy that remains inevitable despite our best efforts? Initially, this may seem to depend on exactly how much privacy can be preserved in a metagenomic early detection system.
Luckily, this question is moot. Once the infrastructure is in place for the purposes of early detection of engineered pandemics, it will likely detect any pathogen spreading exponentially, including ones lacking catastrophic potential. The proposed infrastructure will detect (if successful) fast-spreading pathogens before determining whether they are natural or engineered, highly lethal or benign. Therefore, it will inevitably help us even against these lesser diseases as well, at no additional cost to privacy. Surely that is a good thing.
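The pathogen-agnostic detection step described above can be made concrete with a minimal illustrative sketch. Assuming a metagenomic pipeline yields daily read counts for a candidate sequence, a simple log-linear fit can flag sequences whose abundance grows exponentially; the function name and thresholds here are hypothetical, not part of any deployed system.

```python
import math

def flags_exponential_growth(daily_counts, min_slope=0.1, min_r2=0.9):
    """Flag a sequence whose read counts appear to grow exponentially.

    daily_counts: reads matching the sequence per day (hypothetical input).
    Fits log(count) against day by least squares; a clearly positive slope
    with a good fit is a crude proxy for exponential spread.
    """
    days = range(len(daily_counts))
    # Add 1 before taking logs to avoid log(0) on days with no matches.
    logs = [math.log(c + 1) for c in daily_counts]
    n = len(logs)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    slope = sxy / sxx
    # Coefficient of determination: how well the log-linear model fits.
    ss_tot = sum((y - mean_y) ** 2 for y in logs)
    ss_res = sum((y - (mean_y + slope * (x - mean_x))) ** 2
                 for x, y in zip(days, logs))
    r2 = 1 - ss_res / ss_tot if ss_tot else 0.0
    return slope >= min_slope and r2 >= min_r2
```

Note that this check inspects only counts, never the identity of the sequence or of any individual, which is why such a trigger can run before any pathogen characterization or reidentification decision arises.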
For some later decision nodes (eg, on whether, once a spreading pathogen is characterized, to take extraordinary measures to identify the individuals infected), there will be a legitimate divergence of opinion depending on the pathogen’s destructive potential. Does a reasonable weight on privacy tolerate personal reidentification when a pathogen is merely a seasonal flu or “only” as bad as seasonal flu? We do not, for the purposes of this commentary, have a firm opinion on these questions. Our claims are more modest. When the pathogen is either known to be of the worst kind or turns out to be the worst kind, there is no real ethical or fundamental legal question of whether we may compromise privacy to the limited degree needed in the interests of rescuing global public health.
Conclusion
While many of the concerns about loss of privacy in metagenomic monitoring for early detection of engineered pandemics are resolvable on a philosophical and, in principle, on a legal level, merely perceived or morally exaggerated privacy concerns might pose unnecessary barriers. Combined with the collective action problem, those may tempt the public to maintain optimal privacy protections and free ride on other publics. If so, there will be a need to engage with the global public and deploy nudges, incentives, and other interventions to assuage potential worries about this important tool for global health security. Fortunately, one means to increase acceptance of especially effective monitoring for early pandemic detection is relaying, to key stakeholders and to the public at large, a truth: there is a strong technological and ethical case for the early detection of potentially catastrophic pandemics. Our hope is that, done right, that would be enough.
Footnotes
Acknowledgments
The authors would like to thank Open Philanthropy (Bridget Williams, Kevin Esvelt), the Aphorism Foundation and the MIT Media Lab (Kevin Esvelt), and
