Abstract
The use of artificial intelligence (AI) to generate automated early warnings in epidemic surveillance by harnessing vast open-source data with minimal human intervention has the potential to be both revolutionary and highly sustainable. AI can overcome the challenges faced by weak health systems by detecting epidemic signals much earlier than traditional surveillance. AI-based digital surveillance complements rather than replaces traditional surveillance and can trigger early investigation, diagnostics and responses at the regional level. This narrative review focuses on the role of AI in epidemic surveillance and summarises several current epidemic intelligence systems, including ProMED-mail, HealthMap, Epidemic Intelligence from Open Sources, BlueDot, Metabiota, the Global Biosurveillance Portal, Epitweetr and EPIWATCH. Not all of these systems are AI-based, and some are accessible only to paying users. Most systems produce large volumes of unfiltered data; only a few can sort and filter data to provide users with curated intelligence. However, uptake of these systems by public health authorities, who have been slower to embrace AI than their clinical counterparts, is low. The widespread adoption of digital open-source surveillance and AI technology is needed to prevent serious epidemics.
Keywords
Introduction
Artificial intelligence (AI) has been adopted in a wide spectrum of clinical medicine applications; 1 however, the uptake of AI technologies in public health remains slow. 2 The coronavirus disease 2019 (COVID-19) pandemic has led to substantial investment in AI tools for epidemic surveillance. With rapid advancements in AI and machine learning algorithms, state-of-the-art epidemic surveillance systems have been developed to detect early signs of epidemics by processing open-source data, including news reports and social media data. The implementation of AI in outbreak detection involves 1) natural language processing (NLP) of large quantities of multi-dimensional open-source data to detect early warning signals, 2) identification of local and regional patterns in the detected signals, 3) modelling and simulation of outbreak behaviour and 4) rapid identification of misinformation and disinformation that can cripple pandemic responses.3,4 In this narrative review, we focus on the role of AI in the early detection of and response to outbreaks and summarise the current Internet-based epidemic intelligence systems available. This review aims to update knowledge of AI in epidemic surveillance and assess the need for widespread adoption of AI-driven open-source surveillance in public health.
The use of AI to generate automated early warnings for epidemics by harnessing vast open-source data with minimal human intervention can be revolutionary and highly sustainable. In low-income countries, AI has the potential to overcome the shortfall in human resources for traditional labour-intensive disease surveillance, which relies on doctors or laboratories to report infections and is a passive and untimely system that requires multi-level reporting structures. 3 AI can also help address politically sensitive issues such as data censorship. Epidemics grow exponentially and have often spread widely by the time health authorities become aware of them. Although notifications based on confirmations from laboratories and healthcare systems are valid, early detection can be expedited and enhanced by using open-source data, such as news reports, social media and geospatial, temporal, environmental and meteorological satellite data, as early epidemic signals. 3 Time is critical in an epidemic. For example, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) epidemic in Wuhan, China may have started with a single case that rapidly increased to a handful of cases in a short timeframe. 5 Before the virus spread outside China, transmission could have been contained through case isolation, contact tracing and quarantine, and the global pandemic could have been prevented.
Infectious disease outbreaks are further characterised by non-linear complex dynamics that are not well captured by conventional statistical approaches. AI technology applied to open-source data, followed up with formal outbreak investigation, can generate rapid epidemic signals that help detect and prevent serious epidemics. Open-source systems generate large quantities of unfiltered data of uncertain meaning, which may overwhelm users or lead to misleading conclusions. AI can be used to curate, filter and interpret such data to provide more valid early warning signals. AI technology can also predict spread at a granular scale, guiding data-driven early local responses that can be critical in the initial stages of a pandemic. 6 Given the non-linear complex spread dynamics and the uncertainty inherent in early epidemic evolution, complex dynamic AI-based modelling frameworks such as multi-agent models can help simulate the evolution of epidemics over time and space, allowing targeted and effective public health responses. 6 In addition, these frameworks can be used to identify the most effective interventions and their importance in limiting spread. Finally, AI technology can help overcome the challenges faced by weak health systems and issues such as poor surveillance and the censorship of outbreak reporting. 3
Early detection through AI-based surveillance and monitoring
A vast array of uncurated, open-source data, including social media and news reports, captures the concerns and discussions of the community. If mined regularly, these data can provide early signals of epidemics before official detection by health authorities. 3 Public health has been slow to adopt AI technology and use its capability to enable rapid and early detection of epidemic signals. AI can further address the issue of data censorship and the challenges faced by weak healthcare systems in low-income countries with limited human resources for public health surveillance. Figure 1 illustrates the genesis and exponential growth of epidemics and why early detection affords the best prospect of preventing global spread. Currently, epidemics are detected using traditional surveillance methods that rely on disease surveillance data formally reported by healthcare workers and laboratories. Given the time lag between symptom onset and formal laboratory confirmation, traditional surveillance methods are not rapid enough to allow the early detection of serious epidemics. 3 The COVID-19 pandemic is an example: SARS-CoV-2 had already spread worldwide before the virus was first reported. The first COVID-19 cases presenting with severe pneumonia of unknown origin were officially reported in China on approximately 8 December 2019. However, a retrospective study using open-source intelligence data identified signals of COVID-19 in China as early as mid-November 2019. 7 In Spain, the first COVID-19 case was officially reported on 25 February 2020; however, the virus was detected in sewage water in Barcelona 41 days before that date. 5 The COVID-19 pandemic demonstrates the critical need for the early detection of epidemics using AI technology in public health. 5 Currently, investment in epidemic preparedness predominantly supports the development of drugs and vaccines. 
Although essential, drugs and vaccines tend to become available a considerable time after a serious new infection has emerged and spread widely. 8 Figure 1 highlights currently missing system capability—rapid epidemic intelligence, recognition and risk analysis—and the potential gains of AI-based epidemic intelligence.

Figure 1. Enhancement of epidemic detection by rapid epidemic intelligence and risk analysis.
As demonstrated during the 2020 COVID-19 pandemic and the 2014 Ebola epidemic, non-pharmaceutical interventions are critical during epidemics, especially when drugs or vaccines are unavailable. 9 Epidemic growth was exponential in both cases, making rapid intervention critical to mitigating spread. 9 The Ebola epidemic could have been detected in late 2013, 3 months before the World Health Organisation (WHO) was informed, using rapid social media-based intelligence—even in Guinea, which has relatively low smartphone penetration. 10 Similarly, a retrospective study using open-source data on Weibo detected a signal of unknown severe pneumonia in Hubei Province in mid-November 2019, a month before the COVID-19 outbreak was officially reported. 7
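Why timing matters so much can be made concrete with the standard exponential growth model, in which case numbers I(t) grow at a constant rate r and double every T_d days. This is a worked illustration only: the doubling time used below is an assumed value, not an estimate for Ebola or COVID-19, and real epidemics grow exponentially only in their early phase.

```latex
I(t) = I_0\, e^{r t}, \qquad T_d = \frac{\ln 2}{r}, \qquad
\frac{I(t + \Delta t)}{I(t)} = 2^{\Delta t / T_d}

% Illustration with an assumed doubling time T_d = 7 days and a signal detected \Delta t = 41 days earlier:
2^{41/7} \approx 2^{5.86} \approx 58
```

Under this assumption, acting on a signal 41 days earlier (the interval reported for sewage detection in Barcelona) would mean intervening when there are roughly 58 times fewer cases; a shorter doubling time would make the gain larger still.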
The most widely used outbreak alert system is the Program for Monitoring Emerging Diseases (ProMED-mail), a qualitative reporting system that relies on clinicians and other health professionals notifying moderators of unusual outbreaks. 11 While the system has improved on the speed of traditional health system surveillance and has been the first to detect many important epidemics, it still depends largely on human reporting and does not harness the full capabilities of open-source data and AI. An ideal system would harness and process multiple sources of unstructured data and display them in a curated, filtered and structured format that can inform rapid public health action.3,12
Google Flu Trends, which estimated influenza activity from search queries, was also used for outbreak alerts from 2008 to 2015 but was discontinued because of errors in estimation. 13 DEFENDER, 14 an outbreak detection, surveillance, forecasting and nowcasting system, was developed as a research tool in the United Kingdom. It integrated geocoded symptom data from Twitter with news reports and used this information for outbreak detection, situational awareness and nowcasting. 14 DEFENDER was not a public system and is not currently in use. This review focuses on systems in current use.
Current Internet-based epidemic intelligence systems
Several digital systems for the early identification of important public health events are currently available. 15 The systems vary in scope according to the events they cover, ranging from human diseases alone to human, animal and plant diseases, and from infectious diseases alone to all health events, including non-communicable diseases, natural disasters and humanitarian emergencies. 16 A One Health approach, which considers human, animal and environmental health together, is ideal for infectious diseases. 17 Ideally, an end-user, whether a public health official from a national agency or a lay person, should be able to access a range of services, such as news reports, geographical heat maps and risk analysis tools, on the web or via mobile applications. Some services are commercially available, and others are internal systems accessible only to public health agencies.
Internet-based surveillance systems are breaking new ground in the surveillance of public health events, which otherwise relies exclusively on laboratory diagnostic capability and timely notification by health professionals. Although vast quantities of open-source data are freely available online, much of these data may be irrelevant or lead to false positive reports. Research on the level of human moderation required to operate a valid epidemic intelligence system is currently limited.3,18 However, the incorporation of AI technology, such as machine learning and text mining using NLP, is essential for processing and filtering large amounts of unstructured data and generating valid early warning signals of serious epidemics. 19 We describe eight known early warning systems below.
ProMED-mail
ProMED-mail is a system established by the International Society for Infectious Diseases in 1994 to identify unusual health events affecting humans, animals and plants.20,21 This free service is moderated by expert staff who review reports from health professionals, the Internet (e.g., official government websites) and traditional media. 22 The system relies largely on a network of health personnel to provide qualitative reports of events of interest and on expert assessment of risk. In the absence of an automated data collection process, the importance of articles is judged on a case-by-case basis, introducing the potential for the human error and personal bias inherent in human moderation.21,23 With a network of staff from at least 30 countries working in different time zones and nearly 80,000 subscribers from approximately 200 countries, ProMED-mail operates 24/7 and has identified several important epidemics. 24 ProMED-mail has ongoing collaborations with other services such as HealthMap, the United States Agency for International Development and Public Health England.24,25 Human moderation continues while research to automate the data collection and curation processes is in progress. 23 On average, ProMED-mail displays six posts daily. 26
HealthMap
HealthMap is a fully automated system that does not rely on human moderation. It reports all health events, including non-communicable diseases, and is therefore not specific to epidemics. 27 Developed in 2006, HealthMap uses Fisher–Robinson Bayesian filtering in a Linux/Apache/MySQL/PHP application together with other products such as Google Maps, GoogleMapAPI for PHP, the Google Translate API and a single AJAX library in PHP. 28 HealthMap uses a text processing algorithm to automatically identify, classify and overlay relevant information on a map. 29 The system comprises multiple modules, including a data acquisition engine, a classification engine, a back-end web application, a database and a front-end web application. 30 HealthMap processes approximately 80 infectious disease alerts daily. 31
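The Fisher–Robinson approach comes from spam filtering: each word receives a smoothed probability that a document containing it is relevant, and the per-word probabilities are combined with Fisher's inverse chi-square method. The Python sketch below illustrates that general technique applied to sorting articles into outbreak-relevant and irrelevant. It is not HealthMap's code; the class name, the toy training headlines and the default parameters (strength s = 1, prior x = 0.5) are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphabetic tokens; each word is counted once per document."""
    return set(re.findall(r"[a-z]+", text.lower()))


def chi2q(x2, df):
    """P(chi-square >= x2) for an even number of degrees of freedom df."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


class FisherRobinsonFilter:
    """Hypothetical relevant/irrelevant article filter for illustration."""

    def __init__(self, strength=1.0, prior=0.5):
        self.s = strength   # weight given to the prior for rarely seen words (assumed)
        self.x = prior      # assumed probability for a word with no evidence
        self.relevant, self.irrelevant = Counter(), Counter()
        self.n_relevant = self.n_irrelevant = 0

    def train(self, text, is_relevant):
        if is_relevant:
            self.relevant.update(tokenize(text))
            self.n_relevant += 1
        else:
            self.irrelevant.update(tokenize(text))
            self.n_irrelevant += 1

    def word_probability(self, word):
        """Robinson's smoothed f(w) for a word seen at least once in training."""
        b = self.relevant[word] / max(self.n_relevant, 1)
        g = self.irrelevant[word] / max(self.n_irrelevant, 1)
        n = self.relevant[word] + self.irrelevant[word]
        p = b / (b + g)
        return (self.s * self.x + n * p) / (self.s + n)

    def score(self, text):
        """Combine word probabilities with Fisher's method; 1 = relevant, 0 = irrelevant."""
        words = [w for w in tokenize(text) if self.relevant[w] + self.irrelevant[w] > 0]
        if not words:
            return 0.5  # no evidence either way
        probs = [self.word_probability(w) for w in words]
        df = 2 * len(probs)
        # Evidence for relevance: many small (1 - f) values push this towards 1.
        rel = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - f) for f in probs), df)
        # Evidence for irrelevance: many small f values push this towards 1.
        irr = 1.0 - chi2q(-2.0 * sum(math.log(f) for f in probs), df)
        return (rel - irr + 1.0) / 2.0


# Toy training headlines, invented purely for illustration.
article_filter = FisherRobinsonFilter()
for headline in ("cholera outbreak cases reported hospital",
                 "measles outbreak confirmed cases rising",
                 "unknown pneumonia cluster hospital cases"):
    article_filter.train(headline, is_relevant=True)
for headline in ("football match tickets sold out",
                 "stock market rally continues",
                 "new phone launch event tickets"):
    article_filter.train(headline, is_relevant=False)

print(article_filter.score("outbreak cases hospital"))  # high: outbreak-like wording
print(article_filter.score("tickets sold football"))    # low: unrelated wording
```

The Fisher combination rewards agreement among many moderately informative words, which suits short news snippets where no single word is decisive.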
Epidemic Intelligence from Open Sources (EIOS)
The EIOS was developed as a collaboration between the WHO and the Joint Research Centre of the European Commission.17,32 Developed in 2017, the EIOS is an automated system with roots in the Global Public Health Intelligence Network and the Global Health Security Initiative, and it builds on the Early Alerting and Reporting system and the Hazard Detection and Risk Assessment System. 33 The EIOS is estimated to have a capacity of at least 40 million news items from 12,000 web sources, including social media, in multiple languages. 34 The system uses NLP recognition technology, article classification and prioritisation algorithms to identify, tag and categorise reports. 33 Reports also undergo human review before they are made available to users, 33 and the EIOS covers a broad scope of events, ranging from human health events to natural disasters, conflicts and mass gatherings. During the COVID-19 pandemic, the EIOS provided a public COVID-19 news map 35 and access to COVID-19 data with accompanying graphs through a public dashboard 36 that displayed data from the WHO, Johns Hopkins University, the European Centre for Disease Prevention and Control (ECDC) and Worldometer. 32 The EIOS was operational in 2019 but did not contribute to the early detection of the COVID-19 pandemic. System access is granted exclusively to the WHO and specific agencies or countries. The EIOS is integrated with the INFORM suite, which includes risk analysis tools. 37
BlueDot
BlueDot is a commercial system that began as a transport network modelling tool and later added open-source intelligence and clustering tools to allow the identification of potential hot spots for infectious disease outbreaks. 38 BlueDot uses both AI and human moderation and includes a search capacity in multiple languages. However, this system is not available for public use and is only available to paying clients. 38 In addition, the system has access to closed-source information such as government data, which is usually provided by clients.
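Transport network modelling of the kind BlueDot started with rests on the idea that the risk of exporting infections from an outbreak area rises with the volume of travel out of it. BlueDot's own models are proprietary and are not described here; the Python sketch below only illustrates that general idea under a deliberately simple, hypothetical assumption: the expected number of infected travellers to each destination equals the prevalence at the origin multiplied by the passenger volume to that destination. Destination names, volumes and the prevalence value are invented.

```python
def expected_infected_travellers(prevalence, outbound_passengers):
    """Hypothetical first-order estimate of exported infections.

    Expected infected travellers to a destination = prevalence at the origin
    x passenger volume to that destination. Results are sorted so that the
    destinations most likely to receive cases (candidate hot spots) come first.
    """
    estimates = {dest: prevalence * volume for dest, volume in outbound_passengers.items()}
    return sorted(estimates.items(), key=lambda item: item[1], reverse=True)


# Invented monthly passenger volumes and an assumed prevalence of 5 per 10,000.
volumes = {"Destination A": 20_000, "Destination B": 5_000, "Destination C": 80_000}
for destination, expected in expected_infected_travellers(5e-4, volumes):
    print(f"{destination}: {expected:.1f} expected infected travellers")
```

Ranking destinations this way directs surveillance towards likely hot spots first; a realistic model would also account for incubation periods, length of stay and onward connections.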
Metabiota
Accessible to the public, the Metabiota Epidemic Tracker provides a heat map showing the geographic distribution of event-based epidemics for 208 pathogens. 39 Metabiota couples disease impact with economic impact by calculating a Pathogen Sentiment Index, a unique feature of potential interest to insurance companies and the travel and tourism industry. 40 Metabiota provides a large set of simulated events, with up to 18 million simulations for a single pathogen, using big data and cloud computing platforms. 41 Metabiota also maintains a library of disease models, validated through a peer-review process, that facilitates risk analysis and informs planning and response activities.
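Large sets of simulated events are typically produced by running a stochastic model many times and summarising the spread of outcomes. As an illustration only (Metabiota's models are not described in the source), the Python sketch below simulates outbreaks with a Galton–Watson branching process in which each case infects a Poisson-distributed number of others, then summarises the distribution of final outbreak sizes. The reproduction number, size cap, run count and seed are assumed values.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    limit, k, product = math.exp(-lam), 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return k
        k += 1


def outbreak_size(rng, r0, cap):
    """Total cases in one simulated outbreak seeded by a single case, where each
    case infects Poisson(r0) others; the simulation stops once `cap` is reached."""
    active, total = 1, 1
    while active and total < cap:
        new_cases = sum(poisson(rng, r0) for _ in range(active))
        total += new_cases
        active = new_cases
    return min(total, cap)


def simulate_events(r0, runs=20_000, cap=1_000, seed=42):
    """Run many independent simulated outbreaks (all parameters are assumptions)."""
    rng = random.Random(seed)
    return [outbreak_size(rng, r0, cap) for _ in range(runs)]


sizes = simulate_events(r0=0.8)
print("mean outbreak size:", sum(sizes) / len(sizes))
print("share of runs with 10+ cases:", sum(s >= 10 for s in sizes) / len(sizes))
```

For a subcritical process (R0 < 1) the expected total outbreak size is 1/(1 − R0), so the simulated mean should be close to 5 here, which is a useful sanity check before using such simulations for risk analysis.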
Global Biosurveillance Portal (GBSP)
The web-based information-sharing system GBSP 42 facilitates timely responses and decision-making to support the detection and management of natural and unnatural biological hazards. Based on the Ozone Widget Framework architecture, an open-source data integration framework, the GBSP provides end-users with a single web front-end to access reports in HTML frames from multiple web applications. 43 The GBSP further integrates systems in a whole-of-government approach by including data from the United States Department of Defense and other government agencies. To support the COVID-19 response, the GBSP provides near real-time data sharing, mapping and AI-based predictive analysis models for users and has partnerships with various health organisations and industry partners across countries. 44 However, the GBSP is not publicly available, and its early warning sensitivity is currently unknown.
Epitweetr
Epitweetr, an R-based tool developed in 2018 by the ECDC, is an open-source system that monitors tweets on infectious diseases. 45 To identify potential public health threats, individual detection signals can be sorted by geolocation, time and language. 46 The system was created because monitoring tweets and other social media has proven valuable to public health responses.3,10,47 It is publicly available, provides open-source code and can be customised by users.
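Signal detection on a social media stream is usually framed as spotting an unexpected rise in daily counts for a topic. The Python sketch below applies a simple EARS C1-style rule (flag a day when its count exceeds the mean of the preceding seven days by more than three standard deviations) to daily tweet counts. It illustrates the general technique only; epitweetr's own detection algorithm and thresholds are documented by the ECDC and may differ, and the counts and the minimum standard deviation floor are invented assumptions.

```python
import statistics


def ears_c1_alerts(counts, baseline_days=7, threshold=3.0, min_sd=1.0):
    """Return indices of days whose count exceeds the mean of the preceding
    `baseline_days` days by more than `threshold` standard deviations.
    `min_sd` is an assumed floor that avoids division by zero on flat baselines."""
    alerts = []
    for day in range(baseline_days, len(counts)):
        window = counts[day - baseline_days:day]
        mean = statistics.mean(window)
        sd = max(statistics.stdev(window), min_sd)
        if (counts[day] - mean) / sd > threshold:
            alerts.append(day)
    return alerts


# Invented daily counts of tweets mentioning one disease keyword in one country.
daily_tweets = [12, 9, 11, 10, 13, 8, 11, 10, 12, 41, 55, 14]
print(ears_c1_alerts(daily_tweets))  # -> [9, 10]
```

Because the baseline moves with the data, a sustained rise stops triggering alerts once it becomes the new normal, which is why such signals are best reviewed by a person alongside their geolocation and language.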
EPIWATCH
EPIWATCH is an AI-based system that harnesses open-source data to generate automated early warnings of epidemics worldwide. A free public dashboard provides a searchable and sortable table of outbreak reports, analytics capability and geographic information system mapping functionality. EPIWATCH provides AI-based event filtering, prioritisation and curation, together with human review of reports. These features ensure that the user is not overwhelmed with an impractical volume of data and provide a more reliable and trustworthy prediction of disease outbreaks.
EPIWATCH captures specific infectious diseases and clinical syndromes that may signal new and emerging infections. The system uses AI techniques that incorporate contemporary NLP and named entity recognition algorithms to automatically detect data points within scanned articles. A second AI sub-system for classification and prioritisation is based on bidirectional encoder representations from transformers (BERT)48,49 and can assess with 88.2% accuracy whether articles contain relevant outbreak information. BERT produces contextualised representations of the text in each article 49 and achieves state-of-the-art results in many downstream NLP tasks, including text classification, named entity recognition and text summarisation. 50 Applying pre-trained BERT to datasets such as Google News and fine-tuning it on a smaller dataset using transfer learning techniques has proven effective in increasing the robustness of the model. 51 This AI system is trained and validated on datasets of articles.
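EPIWATCH's entity recognition is model-based and is not reproduced here. To show what detecting data points within scanned articles means in practice, the Python sketch below uses a much simpler, swapped-in technique: dictionary (gazetteer) lookup for disease and place names plus a regular expression for case and death counts. The small gazetteers, the qualifier words and the output field names are hypothetical.

```python
import re

# Tiny hypothetical gazetteers; a production system would use trained NER models.
DISEASES = {"cholera", "measles", "dengue", "avian influenza", "pneumonia"}
PLACES = {"dhaka", "lagos", "manila", "hubei"}

# A number (optionally with thousands separators) followed by "cases" or "deaths",
# allowing one qualifier such as "new", "suspected" or "confirmed" in between.
COUNT_PATTERN = re.compile(
    r"\b(\d[\d,]*)\s+(?:(?:new|suspected|confirmed)\s+)?(cases|deaths)\b",
    re.IGNORECASE,
)


def find_terms(text, gazetteer):
    """Return gazetteer entries that occur in the text as whole words."""
    lowered = text.lower()
    return sorted(term for term in gazetteer
                  if re.search(r"\b" + re.escape(term) + r"\b", lowered))


def extract_data_points(text):
    """Pull disease names, places and (number, kind) count pairs from one article."""
    counts = [(int(number.replace(",", "")), kind.lower())
              for number, kind in COUNT_PATTERN.findall(text)]
    return {
        "diseases": find_terms(text, DISEASES),
        "places": find_terms(text, PLACES),
        "counts": counts,
    }


article = ("Health officials in Dhaka reported 1,200 suspected cases of cholera "
           "and 35 deaths this week.")
print(extract_data_points(article))
```

Rule-based extraction like this is brittle (it misses spelling variants and cannot tell which place a count refers to), which is the gap that contextual models such as BERT are used to close.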
In addition to EPIWATCH’s public dashboard, extra functionality is provided with an internal dashboard. EPIWATCH also has a suite of risk analysis tools such as FLUCAST, 52 EPIRISK 53 and ORIGINS 54 that are designed to forecast the severity of an emerging influenza season, prioritise serious developing epidemics and provide insights into the origins of epidemics, respectively.
Using the Centers for Disease Control and Prevention’s guideline for evaluating a public health surveillance system 55 , we compared available systems using 15 specific parameters (Table 1).
Table 1. Current Internet-based epidemic intelligence systems.
WHO, World Health Organization; CDC, United States Centers for Disease Control and Prevention; GPHIN, Global Public Health Intelligence Network; app, application.
Geospatiotemporal forecasting
Understanding and predicting the geospatial risk of outbreaks and the evolution and spread of epidemics can further inform public health responses. In this context, machine learning methods can identify and predict the risk of outbreaks at a granular level, both geospatially and temporally. Processing such multi-dimensional (i.e., geospatiotemporal) data for analysis and forecasting requires machine learning approaches that can use these features to develop prediction models without losing salient information that carries important signals. Convolutional neural networks, 109 transfer learning, support vector machines, random forests, 110 deep learning and gradient boosting machines have been applied with high accuracy to these challenges in different contexts. In research settings, these models typically use regional data on past outbreaks, environmental factors, travel data, social factors, vector distribution and satellite meteorological data (e.g., temperature and rainfall). These data can be highly predictive of the occurrence and timing of regional outbreaks, providing a framework for early preparedness and response. None of the available epidemic intelligence systems has automated capability for geospatial risk prediction.
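A toy version of such a prediction model is sketched below. It substitutes logistic regression, fitted by batch gradient descent in plain Python, for the random forests and neural networks named above, and the feature set (rainfall, mean temperature, an outbreak in the previous season) and every data point are synthetic assumptions chosen only to show the workflow: fit on past region-season records, then score a new region-season.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_scaler(rows):
    """Column means and population standard deviations used to standardise features."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    sds = [max(math.sqrt(sum((v - m) ** 2 for v in col) / len(col)), 1e-12)
           for col, m in zip(columns, means)]
    return means, sds


def scale(row, means, sds):
    return [(v - m) / s for v, m, s in zip(row, means, sds)]


def fit_logistic(X, y, learning_rate=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss of a logistic regression."""
    weights, bias, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for xi, yi in zip(X, y):
            error = sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) - yi
            grad_w = [g + error * x for g, x in zip(grad_w, xi)]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def outbreak_risk(row, model, means, sds):
    """Predicted outbreak probability for one (unscaled) region-season record."""
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, scale(row, means, sds))) + bias)


# Synthetic region-season records: [rainfall_mm, mean_temp_c, outbreak_last_season]
history = [[250, 29, 1], [220, 28, 0], [240, 30, 1], [60, 22, 0], [40, 20, 0], [80, 24, 1]]
outbreak = [1, 1, 1, 0, 0, 0]

means, sds = fit_scaler(history)
model = fit_logistic([scale(r, means, sds) for r in history], outbreak)
print(f"wet, warm region: {outbreak_risk([260, 31, 1], model, means, sds):.2f}")
print(f"dry, cool region: {outbreak_risk([30, 19, 0], model, means, sds):.2f}")
```

Standardising features before fitting keeps rainfall (hundreds of millimetres) from dominating temperature (tens of degrees); with real data, a held-out period would be needed to judge accuracy.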
Modelling of interventions and response
Early warning systems can be enhanced by modelling pandemic growth and the effectiveness of interventions. This requires modelling complex dynamic systems with non-linearities, applied to time-series data with lags between interventions and responses. A variety of AI-based approaches have been applied to develop frameworks for these data, and these frameworks can be automated, customised and added to early warning systems. These approaches include long short-term memory (LSTM) networks, which are well suited to modelling temporal trends and can be trained to retain memory of features that are important for prediction at a given point in time while 'forgetting' unimportant features. 13 These networks retain 'memory' over time as needed and can therefore account for lags between intervention and response. Support vector machine models and transformers have also been used to flexibly model the impact of interventions on pandemic growth globally; these models have identified the most effective interventions employed during the SARS-CoV-2 pandemic.
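For reference, one common formulation of the LSTM cell makes this selective memory explicit. Here x_t is the input at time t (for example, recent case counts and intervention indicators, an illustrative choice), h_{t-1} is the previous hidden state, c_t is the cell state that carries memory, σ is the logistic function, ⊙ is element-wise multiplication, and the W, U and b terms are learned weights and biases.

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{forget gate} \\
i_t &= \sigma\!\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{input gate} \\
o_t &= \sigma\!\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{output gate} \\
\tilde{c}_t &= \tanh\!\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{candidate memory} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell state} \\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state}
\end{aligned}
```

Where an element of f_t is close to 0, the corresponding part of the stored memory c_{t-1} is discarded; where it is close to 1, that memory is carried forward. This is how the effect of an intervention introduced weeks earlier can still influence today's prediction.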
Agent-based simulation models can further provide a flexible alternative to conventional susceptible-infectious-removed (SIR) compartmental models for modelling geospatial dynamics and spread. These models can create synthetic populations and use available granular data on geospatial context, contact rates, behaviour, mobility and infrastructure to model the spread and impact of interventions at a fine scale.
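The contrast between the two modelling styles can be sketched in a few lines of Python. The first function is a conventional discrete-time SIR model that tracks only compartment totals; the second is a minimal agent-based version in which every individual has a state and each infectious agent contacts a random sample of others every day, so contact patterns could in principle be replaced by household, school or mobility data. Population size, contacts per day, transmission and recovery probabilities, and the random seed are assumptions for illustration, not estimates for any pathogen.

```python
import random


def sir_compartmental(n, initial_infected, beta, gamma, days):
    """Discrete-time (daily) SIR model tracking only compartment totals."""
    s, i, r = float(n - initial_infected), float(initial_infected), 0.0
    history = []
    for _ in range(days):
        new_infections = beta * s * i / n
        new_recoveries = gamma * i
        s, i, r = s - new_infections, i + new_infections - new_recoveries, r + new_recoveries
        history.append((s, i, r))
    return history


def sir_agent_based(n, initial_infected, contacts_per_day, p_transmit, p_recover, days, seed=1):
    """Minimal agent-based SIR: each agent has its own state, and each infectious
    agent meets `contacts_per_day` randomly chosen agents per day (homogeneous
    mixing here; real models would draw contacts from synthetic-population data)."""
    rng = random.Random(seed)
    state = ["I"] * initial_infected + ["S"] * (n - initial_infected)
    history = []
    for _ in range(days):
        infectious = [agent for agent, st in enumerate(state) if st == "I"]
        newly_infected = set()
        for agent in infectious:
            for contact in rng.sample(range(n), contacts_per_day):
                if state[contact] == "S" and rng.random() < p_transmit:
                    newly_infected.add(contact)
        for agent in infectious:          # recoveries among agents infectious today
            if rng.random() < p_recover:
                state[agent] = "R"
        for contact in newly_infected:    # new cases become infectious from tomorrow
            state[contact] = "I"
        history.append((state.count("S"), state.count("I"), state.count("R")))
    return history


# Assumed parameters: 8 contacts/day x 5% transmission gives beta = 0.4; recovery 0.2/day (R0 = 2).
N = 2_000
compartmental = sir_compartmental(N, 5, beta=0.4, gamma=0.2, days=200)
agents = sir_agent_based(N, 5, contacts_per_day=8, p_transmit=0.05, p_recover=0.2, days=200)
print("compartmental final recovered:", round(compartmental[-1][2]))
print("agent-based final recovered:", agents[-1][2])
```

With homogeneous mixing the two approaches should behave similarly on average; the agent-based version earns its extra cost once contacts, behaviour or interventions differ between individuals or places.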
The potential of AI in public health
The potential of AI in public health is illustrated through the development and use of the epidemic intelligence systems described in this review. Nevertheless, compared with its use in clinical medicine, AI is not widely implemented at an operational level in everyday public health practice. By generating early epidemic warnings even in low-resource settings or in areas in which data are censored by governments, AI can be revolutionary. AI enables early identification and intervention, making it feasible for newly emergent epidemics to be managed early enough to be eradicated. When added to late-stage interventions such as diagnostics, drugs and vaccines, AI can considerably improve health security and the prospect of preventing pandemics. AI can be used to identify not only specific diseases but also clinical syndromes that may signal new and emerging infections. Innovations in novel digital syndromic surveillance systems using open-source data can support the early detection of serious emerging infectious epidemics. The key features of an optimised AI system are:
1) Rapid intelligence drawn from open-source data to generate higher-level and earlier epidemic alerts compared with traditional surveillance without the need for human reporting. These alerts can be followed up with formal investigation and traditional surveillance methods such as laboratory confirmation by public health authorities.
2) The capability to rapidly and globally identify key serious syndromes that may result from new emerging infections or biowarfare events.
3) The ability to address the issues of censorship of reporting and reliance on human reporting and the challenges faced by weak health systems.
4) The capability to predict in real time the likelihood of serious outcomes of identified events using a suite of decision support tools (e.g., risk analysis, modelling and simulation), prioritise responses and determine the urgency of intervention.
5) Tailored user interfaces on the Web, mobile applications for real-time decision support and tools that can be adapted for use in health and defence across government and non-government sectors that require early warning and intelligence on serious epidemics.
The COVID-19 pandemic has prompted substantial investment in AI tools for epidemic surveillance. The United States Centers for Disease Control and Prevention established the Center for Forecasting and Outbreak Analytics in 2021, 111 and the United Kingdom announced its Global Pandemic Radar the same year. 112 In Berlin, the WHO further established the Pandemic Hub, 113 which is co-funded by the German government. The ECDC established an open-source tool called Epitweetr in August 2018.45,46 The unprecedented 2022 monkeypox epidemic in non-endemic countries provided a test case for applying lessons learned from epidemic intelligence systems' responses to the COVID-19 pandemic. All systems reported on monkeypox; however, we are aware of only two special initiatives. HealthMap created a monkeypox dashboard with daily updates of case counts, 114 and EPIWATCH created a weekly summary of syndromic surveillance for rash and fever illnesses that could represent monkeypox misdiagnosed as other illnesses. 115
The past decade has seen the emergence of epidemics such as novel zoonotic influenza,116,117 Middle East respiratory syndrome coronavirus, Ebola, Zika virus and SARS-CoV-2. 118 These events highlight the increasing risk of emerging infectious diseases and the need for early warning signals. Strategies to adopt open-source early warning systems and share their source code would allow collaborative design on a global scale. Harnessing the creative talents of health and software engineering experts working together in interdisciplinary teams could help optimise global early warning systems. Creating freely available tools with user interfaces in all major international languages can increase access to open-source intelligence, including for communities and local health authorities.
Conclusion
The widespread adoption of AI technology in public health and clinical medicine can revolutionise disease prevention and control. Currently, the use of available systems is not widespread at the grassroots level of public health practice. AI technology can generate early epidemic warnings without reliance on passive human reporting, enable intervention early in the timeline of an epidemic and allow newly emergent epidemics to be identified and eradicated as quickly as possible. In this review of existing epidemic intelligence systems, EPIWATCH is identified as having substantial value in epidemic intelligence collection, the identification of outbreak alerts and early epidemic signal detection. Widespread adoption of digital surveillance by public health agencies at the global, national and local operational levels offers the best prospect of preventing the next pandemic.
Footnotes
Author contributions
CRM: Conception and design of the study, manuscript drafting and critical revision of the article. XC: manuscript drafting, data collection, revision and manuscript submission. MPK: manuscript drafting, data collection and revision. AQ: manuscript drafting, data interpretation and revision. SL: manuscript drafting and revision. HS, HP, LY, DH, WW, and IS: manuscript drafting. DG: manuscript drafting and study conception.
Declaration of conflicting interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: The authors are part of the EPIWATCH system.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Funding was provided by the Medical Research Future Fund Research Grant (ID RFRHPI000280), Stage 1 from the Australian Government, and from The Balvi Filantropic Fund. Raina MacIntyre is supported by an NHMRC Principal Research Fellowship, grant number 1137582.
