Abstract
Human immunodeficiency virus vaccination and pre-exposure prophylaxis represent two different emerging preventive tools. Google Trends was used to assess the public interest toward these tools in terms of digital activities. Worldwide web searches concerning the human immunodeficiency virus vaccine represented 0.34 percent, 0.03 percent, and 46.97 percent of human immunodeficiency virus, acquired immune deficiency syndrome, and human immunodeficiency virus/acquired immune deficiency syndrome treatment–related Google Trends queries, respectively. Concerning temporal trends, Truvada-related digital activities increased from 0 percent as of 1 January 2004 to 46 percent as of 8 October 2017, with two spikes observed in May and July 2012, coinciding with the US Food and Drug Administration approval. Bursts in search number and volume were recorded as human immunodeficiency virus vaccine trials emerged. Human immunodeficiency virus vaccine–related searches have decreased in the past decade in parallel to the increase in Truvada-related topics. Concentrated searches were noticed among African countries with high human immunodeficiency virus/acquired immune deficiency syndrome prevalence. Stakeholders should take advantage of public interest, especially in preventive medicine, in high disease burden countries.
Introduction
Since the emergence of the human immunodeficiency virus (HIV) epidemic in the early 1980s, the development of an effective vaccine to prevent HIV infection has attracted considerable interest from researchers, becoming an ambitious goal. 1 However, numerous efforts to develop such a vaccine have been unsuccessful. 2 Several obstacles have hindered its development. First, viral vaccines mimic the natural immunity against reinfection that is found in individuals who have recovered from infection; however, there are almost no recovered acquired immune deficiency syndrome (AIDS) patients. Second, the main role of immunization is usually to prevent virus-associated disease rather than viral acquisition, whereas for HIV, prevention of viral acquisition is required. Third, the development of an effective HIV vaccine is challenging regardless of the nature of the vaccine itself, whether killed or live attenuated. On the one hand, live attenuated vaccines would be unsafe and might cause an infection, whereas, on the other hand, a killed HIV-based vaccine would lack the capability of generating an adequate immune response.2–5
Antiretroviral therapy (ART) has changed the current paradigm of HIV pandemic management since its introduction in 1996. 6 In general, the treatment of HIV infection consists of a combination of several drugs, usually three or more, which is also known as highly active ART (HAART). ART has been shown to be effective in reducing the replication of viral particles to undetectable levels. However, it has been shown to neither eradicate the virus nor halt its spread. Moreover, developments in HIV infection prophylaxis have gained traction in the past decade. The Phase III “Iniciativa Profilaxis Pre-Exposición” (iPrEx) study, published in 2010, demonstrated the efficacy of daily Truvada (Emtricitabine + Tenofovir) as an HIV pre-exposure prophylaxis (PrEP). 7 In 2012, Truvada was approved by the US Food and Drug Administration (FDA). 8 Subsequent studies, such as the French clinical trial IPERGAY or the UK trial PROUD, demonstrated comparable effectiveness.9,10
Nevertheless, HIV/AIDS is considered to be a major cause of death in developing countries, and new cases are continually reported in affected regions. Thus, vaccination remains the best option to overcome this burden. In this article, we performed an analytical search through Google Trends (GT) to investigate public interest in topics associated with HIV prevention, such as vaccination and PrEP. Notably, we aimed to address (1) whether web-related search patterns in terms of search volume correlate with epidemiological figures, (2) whether a geographical correlation between search queries and epidemiological figures exists, and (3) whether media coverage of news concerning major HIV vaccine or PrEP studies or events is a driver of HIV-related digital behavior.
Materials and methods
GT is a freely available online tracking system of Internet search volumes. It was used to assess HIV vaccine–related Internet activity. For this purpose, GT was mined from inception (last search carried out on 8 October 2017). Searches can be performed using two different strategies, namely, the “search term” and the “search topic” options. While the first approach searches for exactly the keyword or keywords entered by the user, the second results in a broader search, in which GT does not limit itself to the entered keyword(s) but systematically includes all web searches containing related terms.
Web queries are reported not as absolute, raw figures but as normalized figures (relative search volumes (RSVs)). In detail, to make comparisons, every query is divided by the total searches performed in that given country and time range and is, then, scaled on a range from 0 to 100 based on the topic’s proportion with respect to all searches carried out on all searchable topics.
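As a rough illustration of the normalization just described, the following Python sketch divides topic counts by total search counts for each period and rescales the resulting shares so that the peak equals 100. Google does not publish raw counts or the exact details of its procedure (for example, sampling and rounding), so the function name `relative_search_volume` and all numbers below are hypothetical and serve only to show the arithmetic.

```python
def relative_search_volume(topic_counts, total_counts):
    """Scale a topic's share of all searches to a 0-100 index (peak = 100)."""
    if len(topic_counts) != len(total_counts):
        raise ValueError("series must have the same length")
    # Share of all searches in each period (0 when no searches were made).
    shares = [t / n if n > 0 else 0.0 for t, n in zip(topic_counts, total_counts)]
    peak = max(shares, default=0.0)
    if peak == 0:
        return [0 for _ in shares]
    # Rescale so the period with the highest share scores 100, then round.
    return [round(100 * s / peak) for s in shares]


# Hypothetical monthly counts, not real Google data.
topic = [50, 120, 300, 90]
total = [100_000, 120_000, 150_000, 180_000]
rsv = relative_search_volume(topic, total)
print(rsv)
```

Because the index is relative to the peak of the selected period and region, the same raw counts can yield different RSVs when the time window or geography changes, which is why RSVs from separate queries are not directly comparable.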
In our analysis, we used the “search topic” option. In particular, we looked for “HIV vaccine (Topic),” “Tenofovir/Emtricitabine (Drug)” (Truvada), “HIV (Virus),” “AIDS (Topic),” and “HIV/AIDS treatment (Topic).” Searches were performed worldwide.
PubMed is an online database of scholarly peer-reviewed articles, based on MEDLINE, a large bibliographic repository covering almost all medical fields. A PubMed search was performed in October (last search carried out on 8 October 2017) for all HIV- and Truvada-related peer-reviewed articles written in English.
Data regarding the incidence and prevalence of HIV worldwide were obtained from the official website of “The Joint United Nations Programme on HIV/AIDS” (UNAIDS) (http://www.unaids.org), and data concerning Internet access in all African countries were obtained from the official website of “Internet World Usage Statistics for all countries and regions of the world” (InternetWorldStats 2017) (http://www.internetworldstats.com).
For further details concerning GT, the reader is referred to Nuti et al. 11 All novel data streams used in this study are briefly overviewed in Table 1.
Table 1. An overview of the novel data streams used in this study.
GT: Google Trends; GN: Google News.
Mann–Kendall test and Sen’s robust estimator analysis were carried out using MAKESENS 1.0 (Mann–Kendall test and Sen’s slope estimates for the trend of annual data, Finnish Meteorological Institute, 2002).
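For readers unfamiliar with these two procedures, the sketch below shows one common way to compute the Mann–Kendall statistic and Sen’s slope in plain Python. It is not the MAKESENS implementation: it always uses the normal approximation with the usual correction for ties, and the yearly series is a hypothetical example rather than the study data.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import median


def mann_kendall(values):
    """Return (S, z, two-sided p) using the normal approximation."""
    n = len(values)
    # S counts later-minus-earlier increases minus decreases over all pairs.
    s = sum((b > a) - (b < a) for a, b in combinations(values, 2))
    # Variance of S with the standard correction for tied values.
    ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values())
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


def sens_slope(values):
    """Median of all pairwise slopes (change per time step)."""
    slopes = [(values[j] - values[i]) / (j - i)
              for i, j in combinations(range(len(values)), 2)]
    return median(slopes)


# Hypothetical yearly RSVs for 2004-2017, for illustration only.
yearly_rsv = [28, 26, 25, 22, 20, 19, 17, 15, 14, 13, 12, 12, 11, 11]
s, z, p = mann_kendall(yearly_rsv)
slope = sens_slope(yearly_rsv)
print(f"S={s}, z={z:.2f}, p={p:.2g}, Sen slope={slope:.2f} per year")
```

A negative z with a small p-value indicates a monotonic downward trend, and Sen’s slope gives its magnitude in a way that is not sensitive to isolated spikes.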
Concerning web traffic-related data, it is still debated among scholars which regression model best captures the unique features and complexity of such data. In the current literature, GT-based data or data obtained with similar tools (for example, Yandex) have sometimes been modeled using the Poisson regression model. For example, it has been used to estimate the incidence of influenza-like illness in Argentina or in the Americas using GT-generated data as proxies,12,13 or to model sexually transmitted infection–related web queries. 14 Conversely, Roadknight et al. 15 have challenged the view that web traffic data follow a Poisson distribution. Other scholars have used different regression techniques, including generalized linear models (GLMs), 16 logistic regressions, 17 or negative binomial models. 18 These differing methodological choices may reflect the particular type of website searched (entertainment, commercial, institutional-governmental, etc.), the chosen time window, the geographic region, and other features characterizing digital behavior and information-seeking attitudes. 19
In this article, GT-generated data were modeled using a robust Poisson log-normal regression. This type of regression analysis was chosen in light of the particular kind of data provided by GT: normalized and scaled figures of website traffic and search engine volumes, amassed from users over a certain time period and spatial location, and aggregated on a given time basis (hourly, daily, weekly, or monthly). According to Tierney and Pan, 20 “website traffic data is a non-negative count variable that does not have an upper bound.” Count data series are characterized by a number of statistical properties, including under- and over-dispersion and heteroscedasticity. All the operations performed automatically by GT (namely, amassing, aggregating, normalizing, and scaling) may transform and truncate the raw data, introducing a loss of information and potentially distorting them (for instance, introducing extreme over-dispersion in a series of frequency data originally characterized by under-dispersion, or turning low frequencies into high frequencies and vice versa). Normalized and scaled data as returned by GT are therefore provided not as pure frequencies/count data but as discrete, truncated indexes. However, a robust Poisson model can accommodate the constraints needed to model a series of discrete, normalized, and scaled data.
In one of the very few investigations assessing the statistical effect of normalization and scaling procedures, Tierney and Pan 20 systematically analyzed different regression models (including non-linear regression, negative binomial and Poisson regression models, and the two-step negative binomial quasi-maximum likelihood (QML) equation (QMLE)). Non-linear models were discarded due to their assumption of homoskedasticity, whereas negative binomial models, in which the variance is a quadratic function of the mean, were considered to not sufficiently take into account under-dispersion, which could characterize web traffic-related data. Similarly, the two-step negative binomial QMLE approach did not produce sufficiently robust findings. The authors concluded that the most effective way to analyze GT-generated data was the Poisson QML model, which, unlike the classical Poisson count model (PCM) based on the maximum-likelihood estimating function, does not require the conditional mean to be equal to the conditional variance of the data series.
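The practical meaning of the last point can be sketched in a few lines of Python. The example below fits a Poisson log-linear model with a single predictor by Newton’s method and then reports two standard errors for the slope: the classical one, which assumes that the variance equals the mean, and a heteroscedasticity-robust (sandwich) one, which does not. This is a generic Poisson QML illustration under simplifying assumptions, not the robust Poisson log-normal model fitted in XLSTAT for this study; the function name `fit_poisson_qml` and the data are hypothetical.

```python
import math


def _weighted_moments(x, weights):
    """Entries (m00, m01, m11) of sum(w_i * [1, x_i][1, x_i]')."""
    m00 = sum(weights)
    m01 = sum(w * xi for w, xi in zip(weights, x))
    m11 = sum(w * xi * xi for w, xi in zip(weights, x))
    return m00, m01, m11


def fit_poisson_qml(x, y, max_iter=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by Newton's method.

    Returns (b0, b1, naive_se_b1, robust_se_b1).
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        resid = [yi - mi for yi, mi in zip(y, mu)]
        # Score vector and information matrix A = sum(mu_i * x_i x_i').
        g0 = sum(resid)
        g1 = sum(r * xi for r, xi in zip(resid, x))
        a00, a01, a11 = _weighted_moments(x, mu)
        det = a00 * a11 - a01 * a01
        step0 = (a11 * g0 - a01 * g1) / det
        step1 = (a00 * g1 - a01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    a00, a01, a11 = _weighted_moments(x, mu)
    det = a00 * a11 - a01 * a01
    inv01, inv11 = -a01 / det, a00 / det
    # Classical Poisson variance of b1 assumes Var(y) = E[y].
    naive_se = math.sqrt(inv11)
    # Sandwich variance inv(A) B inv(A), with B built from squared residuals.
    squared = [(yi - mi) ** 2 for yi, mi in zip(y, mu)]
    m00, m01, m11 = _weighted_moments(x, squared)
    robust_var = inv01 ** 2 * m00 + 2 * inv01 * inv11 * m01 + inv11 ** 2 * m11
    return b0, b1, naive_se, math.sqrt(robust_var)


# Hypothetical yearly RSVs (0 = 2004, ..., 13 = 2017), for illustration only.
years = list(range(14))
rsv = [28, 26, 25, 22, 20, 19, 17, 15, 14, 13, 12, 12, 11, 11]
b0, b1, naive_se, robust_se = fit_poisson_qml(years, rsv)
print(f"slope={b1:.4f}, naive SE={naive_se:.4f}, robust SE={robust_se:.4f}")
```

In this smooth, under-dispersed toy series the robust standard error is smaller than the classical one; with over-dispersed data the ordering would typically reverse. In both cases the point estimates are identical, and only the uncertainty attached to them changes.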
Pervaiz et al. 21 compared three different statistical approaches (namely, based on normal distribution, Poisson distribution, and negative binomial distribution) applied to data derived from Google Flu Trends, which is a fully automated system for early warning of influenza epidemics. The authors found that Poisson- and negative binomial–based algorithms generally tended to perform better than normal distribution–based algorithms, with the Poisson method having a higher predictive power.
These theoretical considerations were further corroborated by extensive distribution fitting tests. The distribution that best fit the data was chosen on the basis of the p-value of the goodness-of-fit test. Based on this distribution, different regression models were run with different independent predictors, and the best model was chosen according to fitting parameters such as −2 log(likelihood), pseudo-R² (according to McFadden, Cox and Snell, and Nagelkerke), the Akaike information criterion, the Schwarz–Bayesian criterion, deviance, and Pearson’s chi-square.
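Several of the fit indices listed above are simple functions of the log-likelihoods of the fitted model and of an intercept-only (null) model. The Python sketch below uses the standard textbook definitions; the exact conventions used by XLSTAT may differ in detail, and the log-likelihood values, sample size, and parameter count are hypothetical.

```python
import math


def fit_statistics(ll_model, ll_null, n_obs, n_params):
    """Fit indices from the log-likelihoods of a fitted and a null model."""
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n_obs)
    # Upper bound of the Cox and Snell index, used to rescale it to [0, 1].
    max_cox_snell = 1 - math.exp(2 * ll_null / n_obs)
    return {
        "minus_2_log_likelihood": -2 * ll_model,
        "mcfadden_r2": 1 - ll_model / ll_null,
        "cox_snell_r2": cox_snell,
        "nagelkerke_r2": cox_snell / max_cox_snell,
        "aic": 2 * n_params - 2 * ll_model,
        "sbc": n_params * math.log(n_obs) - 2 * ll_model,
    }


# Hypothetical values, for illustration only.
stats = fit_statistics(ll_model=-90.0, ll_null=-120.0, n_obs=100, n_params=3)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Higher pseudo-R² values and lower −2 log(likelihood), AIC, and Schwarz–Bayesian criterion values indicate a better fit; the two information criteria additionally penalize models with more parameters.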
All these analyses were performed using the commercial software XLSTAT Premium version 19.7 for Windows (Addinsoft, France). All results with a p-value less than 0.05 were considered statistically significant.
Results
As shown in Figure 1(a), average HIV vaccine–related digital activities, expressed as RSVs, were 16.45 percent ± 10.35 percent (median 13%). Concerning the time trends, they tended to decrease over time (from 28% as of 1 January 2004 to 11% as of 8 October 2017). This decrease was statistically significant both on a yearly (Mann–Kendall z = −4.55, p < 0.001) and a monthly basis (all months p < 0.001, except September, Mann–Kendall z = −2.48, p < 0.05, and December, Mann–Kendall z = −2.94, p < 0.01; Figure 2(a)). Web searches concerning “HIV vaccine” represented 0.34 percent, 0.03 percent, and 46.97 percent of HIV, AIDS, and HIV/AIDS treatment–related GT queries, respectively.

Figure 1. Time trend of HIV vaccine/HIV pre-exposure prophylaxis/Truvada-related queries as captured by Google Trends from 2004 to present and geospatial distribution of HIV vaccine/HIV pre-exposure prophylaxis/Truvada-related queries as captured by Google Trends. (a) HIV vaccine–related interest as shown by Google Trends search volumes and geographic concentration. (b) HIV pre-exposure prophylaxis/Truvada-related interest as shown by Google Trends search volumes and geographic concentration.

Figure 2. (a) Yearly and monthly trend of HIV vaccine–related queries as captured by Google Trends in the study period. (b) Yearly and monthly trend of HIV pre-exposure prophylaxis–related queries as captured by Google Trends in the study period.
From a geospatial standpoint, HIV vaccine–related GT queries were concentrated in Uganda (100%), Botswana (98%), Zimbabwe (89%), Malawi (74%), and Zambia (65%). This distribution was comparable with that of AIDS-related searches, which were concentrated in Zambia (100%), Zimbabwe (97%), Mozambique (87%), Uganda (76%), and Kenya (65%). HIV-related queries were concentrated in Zimbabwe (100%), Malawi (91%), Zambia (89%), Namibia (79%), and Mozambique (77%). HIV/AIDS treatment–related queries were mainly concentrated in Swaziland (100%), Lesotho (65%), Zambia (48%), Zimbabwe (41%), and Malawi (41%).
After performing a distribution fitting test, HIV vaccine–related data were found to follow a Poisson log-normal distribution, with a p-value of 0.960 (Table 2), and were therefore modeled using robust Poisson log-normal regression.
Table 2. Distribution fitting for HIV vaccine– and Truvada-related queries as captured by Google Trends in the study period.
The best model incorporated all the independent predictors (namely, epidemiological figures, scientific interest as captured by PubMed, and media coverage as captured by Google News; Figure 3(a)), as shown in Table 3. Considering both prevalence and incidence, all the regressors were statistically significant and positively associated with the regressand (Table 4).

Figure 3. Comparison of different novel data streams (PubMed/MEDLINE, Google Trends, and Google News) related to (a) HIV vaccine and (b) pre-exposure prophylaxis with epidemiological incident and prevalent HIV cases in the study period.
Table 3. Goodness-of-fit statistics for different models predicting HIV vaccine–related queries as captured by Google Trends in the study period.
Table 4. Estimates of the different models predicting HIV vaccine–related queries as captured by Google Trends in the study period.
GN: Google News.
Concerning Truvada-related GT queries, as shown in Figure 1(b), the average RSV was 21.10 percent ± 13.30 percent (median 16%). Concerning temporal trends, digital activities tended to increase (from 0% as of 1 January 2004 to 46% as of 8 October 2017), with two major spikes in May and July 2012 (80% and 100%, respectively). This increase was statistically significant both on a yearly (Mann–Kendall z = 3.39, p < 0.001) and a monthly basis (all months p < 0.001, except for January, April, July, October, November, and December, p < 0.01, and May, p < 0.05; Figure 2(b)). Web searches concerning Truvada represented 0.56%, 0.03%, and 57.49% of HIV, AIDS, and HIV/AIDS treatment–related GT queries, respectively.
From a geospatial point of view, queries were concentrated in Botswana (100%), Zambia (58%), Togo (44%), Zimbabwe (40%), and Namibia (38%).
Similar to HIV vaccine–related queries, search volumes concerning PrEP were also distributed according to a Poisson log-normal distribution, with a p-value of 0.721 (Table 2). Once again, the model incorporating all the predictors (Figure 3(b)) was the best one in terms of goodness of fit (Table 5). Considering prevalence, cases and scientific interest were statistically significant, but media coverage as captured by Google News was not (p = 0.729), as reported in Table 6. Considering incidence, only scientific interest, in terms of peer-reviewed article production as captured by PubMed, reached statistical significance (Table 6).
Table 5. Goodness-of-fit statistics for different models predicting HIV pre-exposure prophylaxis/Truvada-related queries as captured by Google Trends in the study period.
Table 6. Estimates of the different models predicting HIV pre-exposure prophylaxis–related queries as captured by Google Trends in the study period.
GN: Google News.
Discussion
Investigating information-seeking behavior on the web concerning health-related issues, and exploiting tracking/monitoring tools such as GT, is known as infodemiology, 22 a well-established practical and informative method to analyze and predict different patterns of diseases. The Internet is a powerful tool that can be used to inform the public on various healthcare issues and policies. Novel data streams are extremely valuable in the field of infectious diseases and outbreaks. Once they are evaluated and addressed by public health authorities, better management of outbreaks can be achieved. The response to the Ebola outbreak in 2014 demonstrated the usefulness of GT as an effective tool in identifying and planning necessary strategies for the management of health-threatening events. 23 Furthermore, Twitter has been shown to be useful in disseminating accurate information regarding the Ebola disease outbreak in Western Africa, thus illustrating the potential role of the Internet and web searches during health disasters. 24
Concerning HIV vaccine–related web searches, our study showed a decrease in searches in the past decade. This decrease was also significant on a monthly basis (Figure 2(a)). However, the decrease was less pronounced in certain months, such as December. Such findings may be attributed to the fact that, since 1988, December has been the AIDS awareness month, which may contribute to increased public awareness concerning HIV/AIDS. The model we developed could enable real-time monitoring of the impact of news coverage by media or scholarly publications on public interest and engagement, providing, for instance, useful information to workers in the field of health communication and to institutional bodies and authorities.
In the literature, some studies have been conducted to assess HIV/AIDS-related web searches. For example, Chiu et al. 25 showed a positive impact of news trends on online search behavior regarding HIV/AIDS and males who have sex with males (MSM) during a 10-year period of web queries in Hong Kong. In addition, the authors noticed that such search patterns peaked 2–10 weeks after news was published. Moreover, a significant correlation between the patterns of chronic diseases, including HIV, and online activity during a 10-year period has been reported. 26 However, no significant correlation between HIV campaigns and search activity was found. Domnich et al. 14 demonstrated a high correlation between notification rates of HIV and syphilis and search volumes in different regions of Russia. Furthermore, the authors showed a geographical correlation between search volume and disease incidence.
Interestingly, the burst of HIV vaccination web searches from December 2004 through early 2005 reported in our analysis overlaps with the start of enrollment in the STEP study. 27 The study failed to demonstrate the efficacy of a newly devised HIV vaccine containing adenovirus synthetically modified to contain HIV proteins. It was terminated and reviewed in September 2007, showing that enrolled men and women at increased risk of acquiring HIV in the vaccine arm had a higher HIV infection rate compared to the placebo group. Contrary to expectations, no burst in searches was recorded when the study stopped or when the results were finally released and published in 2008.
The striking burst in September 2009 was attributed to the publicly released data of the RV 144 vaccine trial. The results showed a 30 percent decrease in HIV rate among volunteering participants who received the vaccine compared to placebo. 28 A less significant, albeit high, burst was recorded in October 2009 when the full data results were published.
Interestingly, another burst in September 2013 was noted corresponding to the FDA Phase I trial using the SAV-001 vaccine. Vaccination with SAV-001, the first known killed and genetically modified HIV vaccine, was shown to significantly increase the titer of antibodies against HIV gp120 and p24 by 6- and 64-fold, respectively, in 33 recruited HIV-positive individuals treated with anti-retroviral drugs. 29
Our findings differ in several respects from those of the aforementioned studies. While Chiu et al. 25 showed a late peak in search volumes after news publication, ranging from 2 to 10 weeks, the peak in HIV vaccine–related web searches reported in our study coincides with the trial initiation dates, a finding that can be explained by the attractiveness of, and high interest in, HIV vaccination. Furthermore, while Ling and Lee 26 showed no correlation between HIV campaigns and HIV search activity in Canada, we found a direct correlation between the trial enrollment campaign and search behavior.
The high incidence and burden of HIV/AIDS in Africa may explain the high volume of HIV/AIDS-related searches among African countries, especially in Uganda and Zimbabwe. According to UNAIDS 2016 figures, there are close to 1.3 and 1.4 million people living with HIV in Zimbabwe and Uganda, respectively, with a prevalence rate of 13.5 percent in Zimbabwe and 6.5 percent in Uganda. 30 While there is no significant difference in HIV prevalence among the different African countries, people in Uganda and Zimbabwe have the highest rate of access to the Internet (45% and 41% of the population, respectively). However, this speculation fails to explain why Zambia comes first in terms of AIDS-related searches while having a very low percentage of access to the Internet (8.1%). 31 Moreover, a geographical correlation was also found in the study of Domnich et al. 14
Another point that deserves special attention is the link between FDA approval of Truvada as PrEP and web search patterns. FDA approval of Truvada as PrEP in 2012 was accompanied by two striking bursts during the same year. Other correlations between FDA announcements and peaks in web searches have been previously reported. For instance, a peak in searches for heparin in March 2008 was attributed to the well-known recall of heparin by the FDA after discovering that some stocks contained an over-sulfated derivative of chondroitin sulfate. 32
In our study, the statistically significant effect of HIV cases/new cases on GT search patterns pertinent to HIV vaccine–related topics is an additional valuable finding. Similarly, the impact of disease incidence on GT has been studied by others. For example, Seifter et al. 33 showed a seasonal and geographical correlation between GT patterns and Lyme disease incidence. In another study, big data analysis showed a significant correlation between Zika virus outbreaks and increased public interest through all web data sources in terms of searches and interactions. 34 Therefore, an increase in HIV case prevalence is a strong driver of public concern about preventing HIV, expressed by searching for a possible vaccine. Moreover, interest in HIV vaccine trial participation, which presents an obstacle in enrollment, has been addressed by several studies. Vlahov et al. 35 interviewed 375 HIV-negative injecting drug users and showed a high level of interest in participating in HIV vaccine trials, although more education regarding the vaccine and its associated risks was needed. Koblin et al. 36 assessed the likelihood of enrollees participating in vaccine trials and the effect on this willingness of media events about the decision not to proceed to Phase III trials because of an increased rate of infection among participants. In this study, a high percentage of willingness was demonstrated along with a negative impact of the media on such willingness. Nevertheless, other studies were able to show a correlation between media coverage, for instance, Google News, and GT. Segev and Baram-Tsabari 37 discussed the impact of media and education on web search behavior, concluding that focused media coverage of current events and concerns may create a teachable moment, which motivates people to independently search for related information. Other studies38–40 showed a direct and significant impact of media, either positive or negative, on vaccination coverage.
In our study, a geographical pattern was found to overlap with high disease burden in African countries, with significant changes in time trends on both a yearly and a monthly basis.
In conclusion, despite obstacles and disappointments in the process of developing an effective HIV vaccine, it remains a common web search topic, with remarkable bursts coinciding with the start of large trials or the publication of their data. HIV vaccine–related searches have decreased significantly during the past decade, probably because of the approval of PrEP. Our hypothesis is strengthened by the fact that Truvada-related web searches have in turn increased in number. African countries with a higher HIV/AIDS burden are the countries with higher web search volumes on this topic. International organizations, especially those dealing with disease prevention, should take advantage of public interest in preventive medicine in countries with a high disease burden, such as those affected by HIV/AIDS in Africa. This is extremely valuable if the high costs of ART, when used as prevention, are taken into consideration in resource-limited countries.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
