
Abstract

Objective: To audit and compare search autocomplete results in Spanish and English during the early COVID-19 pandemic in the New York metropolitan area. The pandemic led to significant online search activity about the disease, its spread, and remedies. As gatekeepers, search engines like Google can influence public opinion. Autocomplete predictions help users complete searches faster but may also shape their views. Understanding these differences is crucial to identify biases and ensure equitable information dissemination. Methods: The study tracked autocomplete results daily for five COVID-19 related search terms in English and Spanish over 100+ days in 2020, yielding a total of 9164 autocomplete predictions. Results: Queries in Spanish yielded fewer autocomplete options and often included more negative content than English autocompletes. The topical coverage differed, with Spanish autocompletes including themes related to religion and spirituality that were absent in the English search autocompletes. Conclusion: The contrast in search autocomplete results could lead to divergent impressions about the pandemic and remedial actions among different sections of society. Continuous auditing of autocompletes by public health stakeholders and search engine organizations is recommended to reduce potential bias and misinformation.

Keywords

algorithmic bias, COVID, health information, health disparity, search autocompletes, search bias

Introduction

The global COVID-19 pandemic resulted in sharp increases in online search activity about the disease, its spread, and remedial actions.¹ Search engines can significantly influence public perception of the disease and the efforts undertaken by the public.^2,3 Autocomplete results are auto-generated queries that allow users to complete searches faster by populating the search engine’s text box as they type. For example, when a user types “coronavirus is,” Google suggests autocompletes such as “coronavirus is contagious” and “coronavirus is man-made.” While convenient, the function may contribute to bias that, if left unchecked, can lead to health inequality experienced by marginalized and racial minority groups by providing different results for similar inquiries.⁴ This becomes extremely important in pandemic scenarios where the public has little information and is likely to react sharply based on the (potentially biased) data they might receive.³

The example in Figure 1 shows the Google autocomplete results for “coronavirus is …” (in English) and “coronavirus es …” (in Spanish) as observed via the web interface during the early phase of the pandemic (Mar 6, 2020) in the New York area. The first three autocomplete results for “coronavirus is …” in English include one question and two neutral-sounding statements. However, in the Spanish version, “coronavirus es …”, all the top autocompletes are perceived as negative. The information generated by autocomplete searches may therefore differ depending on the language users prefer for the search, even though they face the same medical and health threats. The autocompletes could shape public perception of the disease, the public’s reaction to the disease, and, ultimately, the outcomes of the disease.^3,5–7

Figure 1.

Differences across languages in search engine autocompletes about COVID-19 as observed in the New York region on Mar 6, 2020.

The above example motivated the need to systematically examine the differences in COVID-19 related search autocompletes, raise awareness, and create scientific knowledge to mitigate an understudied form of cross-language health information bias. Hence, we proceeded to systematically collect and analyze Google autocomplete data in the New York region with the following goals:

(1) Introduce the public health problem of cross-language bias in health-related search autocompletes.

(2) Audit COVID-19 autocomplete search results in English and Spanish using data logged in 2020 in the New York region from the Google search application programming interface.

Background and significance

The internet is a primary source of health information for many US adults, and during the COVID-19 pandemic, Google searches have been shown to correlate with case trends, offering predictive insights.^5,8,9 The accessibility of online information has democratized health knowledge, yet it has also led to the spread of misinformation and bias, challenging the integrity of health information dissemination.^5,8–10

Algorithms like Google’s autocomplete influence daily decisions, including those related to health, and may carry inherent biases.^4,10–12 In Google autocomplete, as soon as a user starts typing a query term in the search box, a set of predictions is shown below the search box. Autocomplete, initially intended to help users with hand disorders (e.g., carpal tunnel syndrome) by reducing keystrokes, now guides a broader audience’s search experience, saving time but potentially shaping user queries and perceptions.^13,14,15 According to Google, these suggestions reduce typing significantly (about 25%) and save hundreds of years of typing time each day.¹⁶ Autocomplete algorithms thus mediate an immense volume of search. While these algorithms facilitate efficient information retrieval, they also risk reinforcing stereotypes and spreading misinformation, which can have legal and societal implications.^4,5,15

Research conducted by Noble has shown that autocomplete search results often reinforce stereotypes, for example by presenting Black females in negative contexts more often than White females.⁴ As autocompletes often mediate one’s information seeking, systematic bias in autocompletes affecting certain groups is problematic.^4,17,18 Because health disparities are present in racial and ethnic groups, algorithms might intensify or generate disparate outcomes by reinforcing racial, gender, and cultural bias, which can be especially problematic during a pandemic such as COVID-19.^18,19

In this project, we focus on a large segment of the Spanish-speaking population because Spanish is the second most frequently spoken language among Latinos in the US.²⁰ The Latino population represented 19% of the US population in 2021, making them (including native-born and foreign-born from Latin America and the Caribbean) the largest ethnic and racial minority group in the US.²¹ By 2050, the Latino population will total roughly 132.8 million people, or 30% of the total population.²¹ Nationwide, Latinos are more likely to live below the poverty line than White Americans, less likely to complete post-secondary education, and have poorer health outcomes than their White counterparts.^21,22 Latinos in the US have had significantly less access to information technology, and many continue to be impacted by the digital divide.²³ This gap remains despite the growth of mobile technology and, in the context of COVID-19, can worsen health outcomes.²⁴

Health equity is an essential concern in the COVID-19 pandemic.²⁵ Multiple studies have reported that Latinos are more likely to experience severe COVID-19 illness and mortality than other racial and ethnic groups.^26,27 In fact, the New York City Department of Health and Mental Hygiene determined that the spread of misinformation about COVID-19 was having a harmful health impact, particularly on communities of color with low vaccination rates.¹⁰ The department set up a specialized ‘Misinformation Response Unit’ tasked with tracking and addressing the spread of hazardous misinformation across various media outlets. This includes vigilance over social media, non-English language media, international publications, and discussions within local community forums.¹⁰ However, little empirical evidence exists in the literature as to whether autocomplete results during a pandemic differed between English and Spanish speakers accessing COVID-19 information using Google search. Hence, this study’s central research question was: Are there any systematic differences between English and Spanish search autocompletes for COVID-related terms?

We examine the research question in three dimensions:

(1) the number of autocompletes: a higher number of autocompletes allows for more comprehensive coverage and a wider range of options;

(2) the average sentiment of the autocompletes: the sentiment of the autocompletes can create a more positive or more negative impression of the pandemic for the users; and

(3) the topics covered by the autocompletes: different languages may provide different levels of discussion on the phenomenon’s health, economic, and religious aspects.

Materials and methods

The primary goal of this study is to audit and compare search autocomplete results in Spanish and English during the early COVID-19 pandemic in the New York metropolitan area. For this purpose, we collected daily autocomplete results for multiple COVID-19-related query terms from March 12, 2020, to July 9, 2020, using Google’s search Application Programming Interface (API), which allows one to specify the language for search.²⁸ Five query terms were considered: three related to the overall COVID-19 situation (i.e., “Coronavirus is …”, “Flu is …”, and “Pandemic is …”) and two about precautions (i.e., “Hand washing is …” and “Face Mask is …”). While the first three terms were selected to cover some of the most common terms related to the overall phenomenon, the other two were intended to provide insights into behaviors that users might undertake to avoid being infected by the COVID-19 virus. The style of these queries (i.e., “X is …”) was chosen to resemble that described by Safiya U. Noble in past work comparing autocompletes for different terms.⁴

Each of the query terms was translated into Spanish, double-checked for accuracy by two bicultural and bilingual (English and Spanish) team members, and passed to the Spanish-language version of the search API.²⁸ The API supports the REST protocol and returns a list of autocomplete suggestions for the provided query in JSON format. Queries were run using Python code in a Jupyter notebook opened in the browser’s “incognito mode” to minimize the odds of customization. The computer collecting the data was physically based in the New York metropolitan area. The data are missing for some days due to technical issues and are available for a total of 109 out of 120 days. Between 0 and 20 autocompletes were found for each query term on each of the days, yielding a total of 9164 individual autocomplete predictions. Data cleaning was undertaken in Microsoft Excel (Office 365), and SPSS version 28 was used for statistical analysis.

Since all the autocompletes for a query term on a given day are presented as a group to the user, we decided to consider such a query autocomplete daily group (QACDG) as the basic unit of analysis. Thus, 10 such QACDGs (5 query terms × 2 languages) were considered for each of the 109 days in the analysis. These data were analyzed in terms of the following three dimensions.
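The grouping into QACDGs can be illustrated with a minimal Python sketch. The record layout (`Prediction` and its field names), the helper `group_into_qacdgs`, and the sample entries are hypothetical and chosen only for illustration; they are not the authors' actual data format.

```python
from collections import defaultdict
from typing import NamedTuple


class Prediction(NamedTuple):
    """One logged autocomplete prediction (field names are illustrative)."""
    date: str       # collection day, e.g. "2020-03-12"
    term: str       # query term, e.g. "coronavirus is"
    language: str   # "en" or "es"
    text: str       # the autocomplete prediction itself


def group_into_qacdgs(predictions):
    """Group predictions by (date, term, language); each group is one QACDG."""
    groups = defaultdict(list)
    for p in predictions:
        groups[(p.date, p.term, p.language)].append(p.text)
    return dict(groups)


# Made-up log entries, used only to show the grouping.
log = [
    Prediction("2020-03-12", "coronavirus is", "en", "coronavirus is airborne"),
    Prediction("2020-03-12", "coronavirus is", "en", "coronavirus is contagious"),
    Prediction("2020-03-12", "coronavirus es", "es", "coronavirus es mortal"),
]
qacdgs = group_into_qacdgs(log)
counts = {key: len(texts) for key, texts in qacdgs.items()}
```

In this sketch, a query that returned no autocompletes on a given day would not appear as a key, so such groups would need to be added explicitly with an empty list if zero counts are to be included in the averages.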

The number of autocompletes

The first dimension considered for comparison was the number of autocomplete options available in the QACDG. There were between 0 and 20 autocompletes found for each query term on each of the days. A significant difference in the number of autocompletes for comparable QACDGs, i.e., those that differed only in terms of the language used, such as “coronavirus is” versus “coronavirus es”, was considered an indicator of bias.

The average sentiment of the autocompletes

The second dimension of the study was the average sentiment conveyed by the autocompletes in a QACDG. To quantify this aspect, we obtained a sentiment label (positive, negative, or neutral) for each of the autocompletes using Amazon’s sentiment recognition software, which works for English and Spanish.²⁹ The autocompletes that are likely to evoke positive emotion in the reader (e.g., “coronavirus is curable”) are assigned a positive sentiment label by the software, and those likely to evoke negative emotion in the reader (e.g., “coronavirus is deadly”) are assigned a negative sentiment. We denoted the scores numerically as +1 for positive, −1 for negative, and 0 for neutral autocompletes. As mentioned above, there were up to 20 such scores obtained for each QACDG. Next, we calculated four aggregate statistics: the number of positive autocompletes (+1), the number of negatives (−1), the number of neutral (0), and the total number of autocompletes. Once we obtained these statistics, we calculated the average sentiment score as $\frac{\#\,\text{positives}_{i}^{l} - \#\,\text{negatives}_{i}^{l}}{\text{TotalCount}_{i}^{l}}$, where ‘i’ iterates over all the COVID-19 related terms and ‘l’ over the languages, i.e., English and Spanish. We acknowledge the challenges associated with the use of an automated sentiment classifier, which could in turn be biased; however, we performed sanity checks on the results for correctness and consider this method a complement to the other dimensions of the analysis, namely the number of options and human topic labeling. We interpret only significant differences in the average sentiment for comparable QACDGs as indicators of bias.
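As a worked illustration of the average sentiment score, the following Python sketch applies the formula above to one group of labels. The function name, the label strings, and the handling of an empty group are assumptions made for this example; the labels themselves would come from the sentiment classifier, which is not called here.

```python
def average_sentiment(labels):
    """Return (#positives - #negatives) / total for one QACDG.

    `labels` holds one sentiment label per autocomplete:
    "positive", "negative", or "neutral".
    Returns None for an empty group (this handling is an assumption;
    the text does not say how empty groups were treated).
    """
    total = len(labels)
    if total == 0:
        return None
    positives = sum(1 for label in labels if label == "positive")
    negatives = sum(1 for label in labels if label == "negative")
    return (positives - negatives) / total


# Hypothetical group of 10 autocompletes: 2 positive, 5 negative, 3 neutral.
example = ["positive"] * 2 + ["negative"] * 5 + ["neutral"] * 3
score = average_sentiment(example)  # (2 - 5) / 10
```

Because neutral autocompletes count toward the denominator but not the numerator, a group dominated by neutral predictions is pulled toward zero regardless of how the remaining predictions split.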

Topics covered by the autocompletes

The third aspect of this study analyzed the variation in topics covered by the QACDGs. The primary objective was to identify the main themes or subjects that would resonate with users upon exposure to a QACDG. Any differences observed help to describe the relative levels of discussion on health, economic, and religious aspects between Spanish and English. For the labeling process we followed the IAB Taxonomy, a taxonomy developed by a nonprofit research and development consortium that focuses on helping organizations implement global industry technical standards and solutions.³⁰ We used the Tier 1 categories to label our list of grouped autocompletes. These included medical health, healthy living, science, travel, and religion and spirituality, among others.

For the analysis, we developed a codebook for the autocomplete queries; it was created through agreement among the four coders (two of whom were native or bilingual Spanish speakers). Content analysis was used to analyze autocompletes to facilitate data management and determine codes.^31,32 The team developed the first-level codes by reviewing the autocompletes in English and Spanish, and the team reconciled discrepancies between coders. Coded data were organized into appropriate content, examined for relevance, coherence, and consistency, and then checked against the original autocompletes to ensure accuracy. A final codebook was generated to analyze the autocomplete data (see Table 1). The constant comparison method was used to reveal distinctions between the coded responses’ categories and generate themes.³² The inter-coder reliability between the two sets of coders was 80%, indicating that they agreed on the labels 80% of the time. This is typically considered a high level of agreement between coders.³²
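The reported reliability figure reads as simple percent agreement, and the following Python sketch shows that calculation under this assumption. The function name and the example labels are hypothetical; this measure is not corrected for chance agreement.

```python
def percent_agreement(labels_a, labels_b):
    """Share of items on which two coders assigned the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both coders must label the same items")
    if not labels_a:
        raise ValueError("no items to compare")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical labels for five autocompletes; the coders differ on one.
coder_1 = ["Medical health", "Travel", "Healthy living", "Other", "Medical health"]
coder_2 = ["Medical health", "Travel", "Healthy living", "Other", "Style & fashion"]
agreement = percent_agreement(coder_1, coder_2)  # 4 of 5 items match
```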

Table 1.

Codebook for labeling daily autocomplete data.

Code	Inclusion	Exclusion	Examples (query term shown only once)
Healthy living	Any mention of handwashing; primary prevention; disinfection process	If the statement focuses only on the coronavirus, airborne	Hand washing is: an example of, the number one, an example of primary prevention, the most effective, the single most effective, considered an engineering control, an example of a disinfection process important because
Medical health	Any mention of coronavirus, airborne; symptoms of skin such as itchy, stinging, burning	Any mention of handwashing, soap	A face mask is: burning, good, good for skin, good for how long, for what, burning my face. Itchy, used for too, big. Stinging
Style & fashion	Any mention of face mask too big; reusable; too loose uncomfortable	No mention of facemask and comfort, but only medical symptoms	A face mask is: too big, hot, hurting my ears, burning. Mandatory. Loose, good, required, reusable
Travel	Any mention of flying, safe to travel	No mention of flying or travel	coronavirus is: airborne, man made, it contagious, it serious, not new, it dangerous, our future, it safe to fly, deadly, it curable, airborne, man-made, it deadly, from, it safe to travel, caused by, curable, biological weapon
Religion & spirituality	Any mention of biblical news, divine punishment	No mention of religion	coronavirus es: mortal, mentira, real, peligroso, una pandemia, verdad, biblico
Other	Any mention of sentiments that cannot be labeled in the first level of the taxonomy	Sentiments that can be labeled in the first level taxonomy	Pandemic is a lie

Statistical analysis

We employed statistical tests to evaluate the differences observed in autocomplete results. We computed the average number of autocompletes and the average sentiment of autocompletes for the five keywords daily. Student’s t-test (2-sided, pairwise) was used to determine whether there were significant differences between the English and Spanish autocompletes.³³ In addition, the Chi-square test was employed to test whether language played a significant role in the relative distribution of autocompletes across different topics.³³
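The paired comparison can be sketched in Python using only the standard library. The daily counts below are hypothetical and are not the study's data, and the pooled-standard-deviation form of Cohen's d is an assumption, since the text does not state which variant was used. The p-value is omitted because it requires the t-distribution, which a statistics package would supply (the authors used SPSS).

```python
import math
import statistics


def paired_t_statistic(x, y):
    """t statistic and degrees of freedom for a paired (dependent) t-test."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all paired differences are identical")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


def cohens_d_pooled(x, y):
    """Cohen's d using the pooled standard deviation (assumed variant)."""
    pooled_var = (statistics.variance(x) + statistics.variance(y)) / 2
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled_var)


# Hypothetical daily autocomplete counts for one keyword (not study data).
english = [10, 10, 9, 10, 10, 10]
spanish = [7, 6, 8, 7, 6, 8]
t_stat, df = paired_t_statistic(english, spanish)
d = cohens_d_pooled(english, spanish)
```

Pairing by day means each English value is compared with the Spanish value collected on the same date, so day-to-day fluctuations that affect both languages cancel out in the differences.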

Results

The number of autocompletes

English search queries generated more search autocompletes (10.06 vs 6.75) than Spanish queries. We examined the average counts for the five keywords to understand the phenomena further, reported in Table 2.

Table 2.

Number of search autocompletes in English and Spanish and differences between them.

S. No	Keyword	English: Number of daily autocompletes (Mean ± margin of error)	Spanish: Number of daily autocompletes (Mean ± margin of error)	Results for t-test comparing means. (p-value)	Effect size (Cohen’s d)
1	Coronavirus is/Coronavirus es	10.48 ± 0.37	6.23 ± 0.68	p-value <.0001	1.43
2	Flu is/Flu es	9.91 ± 0.18	8.61 ± 0.83	p-value <.0001	1.52
3	Pandemic is/Pandemia es	9.94 ± 0.06	3.20 ± 0.55	p-value <.0001	3.22
4	Handwashing/Lavarse	9.97 ± 0.03	7.79 ± 0.31	p-value <.0001	1.85
5	Face mask is/Mascarilla es	10.00 ± 0.00	7.94 ± 0.46	p-value <.0001	1.28
	ALL	10.06 ± 0.21	6.75 ± 0.67	p-value <.0001	1.50

As shown in Table 2, the average number of autocompletes was lower in Spanish than in English for all five keywords considered. We used a Student’s t-test (2-sided, pairwise) with the average daily count values for English and Spanish to check whether the differences were statistically significant. The results indicated that the differences were statistically significant for each of the five terms, with counts in English being higher than those in Spanish.

The average sentiment of the autocompletes

We analyzed the average sentiment of autocompletes for different COVID-19 related terms in both English and Spanish. The overall average of the daily sentiment across the five keywords was −0.18 in English and −0.29 in Spanish. While the overall sentiment was negative across the languages (the keywords are associated with a pandemic after all), the average sentiment was more negative in Spanish than in English. We looked at the five keywords’ average values to further understand the phenomenon, reported in Table 3.

Table 3.

Average sentiment conveyed by the search autocompletes in English and Spanish and differences between them.

S. No	Keyword	English: Average daily sentiment of autocompletes (Mean ± margin of error)	Spanish: Average daily sentiment of autocompletes (Mean ± margin of error)	Results for T-test comparing means. (p-value)	Effect size. (Cohen’s d)
1	Coronavirus is/Coronavirus es	−0.06 ± 0.02	−0.25 ± 0.04	p-value <.0001	1.35
2	Flu is/Flu es	−0.14 ± 0.02	−0.44 ± 0.08	p-value <.0001	1.08
3	Pandemic is/Pandemia es	−0.40 ± 0.02	−0.54 ± 0.06	p-value = .0009	0.72
4	Handwashing/Lavarse	0.17 ± 0.03	−0.07 ± 0.05	p-value <.0001	1.04
5	Face mask is/Mascarilla es	−0.30 ± 0.05	0.48 ± 0.06	p-value <.0001	2.72
	ALL	−0.14 ± 0.01	−0.16 ± 0.03	p-value = .3899	0.11

For four of the five COVID-19 related terms (‘Coronavirus is,’ ‘Flu is,’ ‘Pandemic is,’ ‘Handwashing is’), the average sentiment score for autocompletes in English is more favorable than for the corresponding terms in Spanish. A noteworthy exception is the keyword “face mask,” where the sentiment for autocompletes in Spanish was more positive than in the English version. One possible reason is that face masks have a cultural association with festivities for the Latino community (e.g., Carnival and Diablada). This could explain the more positive connotation of face masks in Spanish autocompletes during the early phase of the pandemic. None of the other terms (coronavirus, flu, pandemic, handwashing) has a similar history of linkage with traditional festivities. The results of the repeated measures t-test indicated that the differences were statistically significant for each comparison between English and Spanish terms.

Topics covered by the autocompletes

We observed the differences in the topics of the autocompletes collected by examining the distributions of these topics using histograms. Histograms help plot the frequency of occurrences for the different topics/themes divided into classes, called “bins” for different keywords in the two languages considered. Figure 2 presents the histogram of the topics described in the two languages.

Figure 2.

Topics covered in the search results.

As can be observed in Figure 2, the distribution appears non-uniform across the different topics in the two languages. While “medical health” is the most common topic in both languages, the second most frequent topic differs for the two languages. “Healthy Living” is the second most common topic in Spanish, while it is “Style and Fashion” in English. The “Healthy Living” topic was frequently associated with hand washing in both English and Spanish. However, it had a more negative tone in Spanish, where it often focused on aspects like whether it causes chemical or physical changes to hands. The emergence of “Style and Fashion” as the second most common topic in English indicates that the discussions on aesthetics and comfort were much more common in English autocompletes than in Spanish. Similarly, “Travel” as a topic is only present in English. Interestingly, “Religion and Spirituality” as a topic appears only in Spanish. It included autocompletes like “coronavirus es biblico,” which connected coronavirus with a religious text. These results suggest that autocompletes focused on different aspects across the two languages.

To statistically test whether language played a significant role in the relative distribution of autocompletes across different topics, we used the Chi-Square test, which indicated that the differences were significant with a p-value less than 0.001 (statistic = 273.252, df = 5).
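The Chi-square test of independence on a language-by-topic table can be sketched in Python as follows. The counts are hypothetical and are not the study's data; with two languages and the six codes of Table 1, the degrees of freedom come out as (2 − 1) × (6 − 1) = 5, matching the reported df. The p-value is omitted because it requires the chi-square distribution from a statistics package.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if topic and language were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical topic counts (not the study's data).
# Columns: medical health, healthy living, style & fashion, travel,
# religion & spirituality, other.
counts = [
    [50, 20, 25, 5, 0, 10],   # English
    [40, 30, 5, 0, 5, 10],    # Spanish
]
stat, df = chi_square_statistic(counts)
```

This sketch assumes every topic column has at least one observation overall; a column with a zero total would produce a zero expected count and would have to be dropped before the calculation.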

Discussion

Overall, our results suggest significant differences between the autocomplete results observed in English and Spanish in terms of the frequency counts, sentiment, and the topics covered. These observations should be considered alongside various theoretical frameworks and empirical findings presented in the existing literature.

Socio-technical complexity and legal debates surrounding Google Autocompletes

Google Autocompletes (GA) operate within a socio-technical system where user behavior, data collection, and algorithmic processing are deeply intertwined. This system’s complexity is reflected in the diverse outcomes of autocomplete suggestions, ranging from enhancing the search experience to inadvertently spreading misinformation.^13,34 The legal discourse surrounding GA further adds to this complexity, with courts worldwide grappling with the liability of search engines for the content suggested by their algorithms.^15,18 The varying judicial perspectives analyzed by Karapapa and Borghi (2015) demonstrate the challenge of establishing a consistent legal stance on the issue.³⁵

The ethical responsibilities of search engines like Google, as gatekeepers of information, are at the forefront of this debate. The legal ambiguity surrounding autocomplete suggestions (whether they are viewed as protected expressions of thought or as technical processes) raises significant ethical questions about the management of these predictions.³⁵ While some autocomplete suggestions may be legally permissible, they can still have substantial consequences for public discourse. It is crucial for Google to navigate this landscape responsibly, ensuring that the amplification of information through GA is balanced against the need to minimize biases.^15,36,37 As the custodian of a system that shapes public perception, Google must consider both the legal precedents and the broader ethical implications of its autocomplete function, particularly in times of health crises when accurate information is paramount.

Public health and health equity impact

The disparities in autocomplete suggestions between English and Spanish, as revealed by this study, have significant implications for Spanish speakers and the Latino community, which has been disproportionately affected by the COVID-19 pandemic.^38,39 The misinformation presented through autocompletes could influence Latinos’ health behaviors and beliefs, potentially exacerbating existing health inequities.^3,40,41 Addressing these biases is therefore not just a matter of algorithmic fairness but also a public health imperative.

Moreover, the study’s findings underscore the importance of equitable access to accurate health information as a determinant of health outcomes. As search engines increasingly become primary sources of health information, the role of algorithms in shaping access to this information cannot be overlooked.^42,43 The research calls for a collaborative effort between public health professionals, information technologists, and policymakers to ensure that search autocompletes do not perpetuate health disparities. By highlighting the need for algorithmic audits and bias correction, the study contributes to the ongoing efforts to achieve health equity and improve public health outcomes.

A recent related work by Valera et al. (2023) has shed light on the social implications of these biases.³ Their findings from focus group meetings with English and Spanish speakers illustrate that autocompletes can influence the preselection of searches. Further, the limited choice of results for Spanish-speaking searchers is particularly concerning as it may contribute to the hesitation and concerns expressed by Spanish speakers, stemming from both social factors and a lack of comprehensive information about COVID-19.³ It is not surprising that the New York City Department of Health and Mental Hygiene determined that the spread of misinformation about COVID-19 was having a harmful health impact, particularly on communities of color with low vaccination rates, and decided to launch a misinformation response unit.¹⁰

Methodological advancement: language equity audits for autocompletes

This study marks a methodological advancement to support language equity audits for search autocompletes, offering a scalable and replicable framework for future research. By combining automated analysis with human labeling of topics, the study comprehensively captures the breadth and depth of autocomplete suggestions, revealing nuanced biases.

The study’s approach underscores the importance of addressing digital equity, especially as disparities in digital access—such as internet and mobile phones—have been recognized as significant factors in health equity.^23,24,40 By highlighting the latent layers of digital infrastructure, like search autocompletes, this research moves the conversation forward, emphasizing how unequal access in these areas can be a critical issue. The work intersects with current discussions on health misinformation and builds upon the gatekeeper theory suggesting that the role of search engines as “information gatekeepers” has evolved.^10,44 With the shift from traditional sources like librarians and healthcare providers to search engines for health information, the potential for bias in autocomplete suggestions becomes a significant concern with far-reaching impacts on vulnerable populations.

Implications to public health and informatics

Based on the findings of this work, we propose specific policy recommendations to mitigate biases in autocomplete results.^45,46 Each of these can be mandated by the governments, self-adopted by search engines, or actively demanded by users and advocacy groups.

• Regular Audits: Implement mandatory, regular audits of search autocomplete algorithms (e.g., using the proposed methodology) to identify and correct biases.

• Diverse Data Sets: Encourage the use of diverse data sets that include multilingual and multicultural perspectives to train algorithms.

• Designing Holistic Information Widgets: Outputs to same/similar queries across languages can be combined via widgets that ensure that information across languages is consistent and presented holistically rather than being siloed and disparate.

• Transparency in Algorithm Design: Advocate for transparency from search engine companies in the design and operation of their algorithms.

• Public Health Collaboration: Establish partnerships between search engine companies and public health entities to ensure that health-related autocomplete suggestions are given specific attention to ensure that they are accurate and beneficial.

• User Education: Develop educational campaigns to inform the public about the potential biases in search autocompletes and promote critical evaluation of search results.

Limitations

While providing valuable insights into language biases in search autocompletes, this study has limitations that impact the generalizability of the findings. The data was collected at a single location—New York City—using only Google’s search engine. While Google holds a dominant market position in search and New York City was a significant epicenter of the COVID-19 outbreak in Spring 2020, with a notably high number of cases within the Latino population, these factors may not fully represent global search behaviors or the nuances of autocomplete functions in other contexts.^47,48 To improve the generalizability, as part of future research, we plan to expand the study to multiple geographical locations and include various search engines. This broader approach will enable a more thorough understanding of autocomplete biases and their global implications. Additionally, while the current study tried to minimize the influence of search personalization by utilizing an API in incognito mode, the efficacy of this approach is debated.^17,49,50 As such, the impact of personalized search histories on autocomplete results remains an area ripe for exploration.

The reliance on automated analysis for quantifying the number of options and sentiment analysis presents another limitation, as it may constrain the depth of interpretability. Despite this, the study’s methodology is strengthened by an iterative human labeling process for topic analysis, which adds a layer of qualitative assessment to the quantitative data. Future studies could benefit from integrating more nuanced human interpretation to complement automated metrics, thereby enriching the analysis of autocomplete suggestions and their potential biases.

Conclusions

This study provides a pioneering systematic audit of language disparities in search autocompletes during a critical phase of the COVID-19 pandemic. It reveals that English and Spanish speakers received autocomplete suggestions that varied not only in quantity but also in sentiment and thematic content, despite confronting identical health threats. While these findings are specific to the early stages of the pandemic, they raise important considerations for the generalization of results to other health emergencies. As such, it is crucial for search engine providers, public health officials, and policymakers to consider the impact of autocomplete functions on public health messaging and individual health choices. In light of these insights, we call upon stakeholders to recognize the importance of equitable information access during health crises. Proactive audits and adjustments to autocomplete algorithms are essential steps to prevent the perpetuation of biases and ensure that all communities have access to empowering and accurate health information. By doing so, we can work towards minimizing the risk of unequal health outcomes and support informed decision-making across diverse linguistic groups in any future health emergency.

Footnotes

Author contribution

Conception and development were carried out by V.K.S. and P.V. Theory and computations were led by I.S. and R.S., with support from Y.B., Y.R., and V.K.S. Qualitative analysis was undertaken by Y.B. and P.V. The work was supervised and coordinated by P.V. and V.K.S. All authors discussed the results and contributed to the final manuscript.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Science Foundation under Grant 2027784.

Ethical statement

ORCID iD

Vivek K Singh

References

1. Google Trends. COVID: Google Trends. Available from: https://trends.google.com/trends/explore?geo=US&q=COVID

2. Yang, Mei, Zheng, et al. Query log analysis of an electronic health record search engine. In: AMIA Annu Symp Proc, Vol. 2011. American Medical Informatics Association, 2011, pp. 915–924.

3. Valera, Carmona, Singh, et al. Understanding search autocompletes from the perspectives of English and Spanish speakers during the early months of the COVID-19 pandemic. J Community Psychol 2023; 52: 665–683.

4. Noble. Algorithms of oppression: how search engines reinforce racism. New York: NYU Press, 2018.

5. Houli, Radford, Singh. “COVID19 is”: the perpetuation of coronavirus conspiracy theories via Google autocomplete. Proc Assoc Inf Sci Technol 2021; 58(1): 218–229.

6. Vijay, Field, Gollnow, et al. Using internet search data to understand information seeking behavior for health and conservation topics during the COVID-19 pandemic. Biol Conserv 2021; 257: 109078.

7. Pogacar, Ghenai, Smucker, et al. The positive and negative influence of search results on people’s decisions about the efficacy of medical treatments. In: Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, 2017, pp. 209–216.

8. Marty, Ramos-Maqueda, Khan, et al. The evolution of the COVID-19 pandemic through the lens of Google searches. Sci Rep 2023; 13(1): 19843.

9. Bari, Khubchandani, Wang, et al. COVID-19 early-alert signals using human behavior alternative data. Soc Netw Anal Min 2021; 11: 18.

10. Knudsen, Perlman-Gabel, Uccelli, et al. Combating misinformation as a core function of public health. NEJM Catalyst Innovations in Care Delivery 2023; 4(2): CAT–22.

11. UN Women. UN Women ad series reveals widespread sexism. Available from: https://www.unwomen.org/en/news/stories/2013/10/women-should-ads

12. Buolamwini, Gebru. Gender shades: intersectional accuracy disparities in commercial gender classification. In: Conference on Fairness, Accountability and Transparency. PMLR, 2018, pp. 77–91.

13. Sullivan. How Google autocomplete works in search. Available from: https://www.blog.google/products/search/how-google-autocomplete-works-search

14. Ward, Hahn, Feist. Autocomplete as research tool: a study on providing search suggestions. Inf Technol Libr 2012; 31(4): 6–19.

15. Popyer. Cache-22: the fine line between information and defamation in Google’s autocomplete function. Cardozo Arts & Ent LJ 2016; 34: 835.

16. Google. How Google autocomplete works in search [Internet]. 2018. Available from: https://www.blog.google/products/search/how-google-autocomplete-works-search

17. Robertson, Jiang, Lazer, et al. Auditing autocomplete: suggestion networks and recursive algorithm interrogation. In: Proceedings of the 10th ACM Conference on Web Science, 2019, pp. 235–244.

18. Lin, Gao, et al. Trapped in the search box: an examination of algorithmic bias in search engine autocomplete predictions. Telematics Inf 2023; 85: 102068.

19. Röösli, Rice, Hernandez-Boussard. Bias at warp speed: how AI may contribute to the disparities gap in the time of COVID-19. J Am Med Inf Assoc 2021; 28(1): 190–192.

20. Babbel Magazine. How many people speak Spanish, and where is it spoken? 2023. Available from: https://www.babbel.com/en/magazine/how-many-people-speak-spanish-and-where-is-it-spoken

21. Pew Research Center. Facts on Latinos in the U.S. Available from: https://www.pewresearch.org/hispanic/fact-sheet/latinos-in-the-u-s-fact-sheet

22. United States Census Bureau. Hispanic poverty rate hit an all-time low in 2017. Available from: https://www.census.gov/library/stories/2019/02/hispanic-poverty-rate-hit-an-all-time-low-in-2017.html

23. Lopez, Gonzalez-Barrera, Patten. Closing the digital divide: Latinos and technology adoption [Internet]. Pew Research Center. Available from: https://assets.pewresearch.org/wp-content/uploads/sites/7/2013/03/Latinos_Social_Media_and_Mobile_Tech_03-2013_final.pdf

24. Hamilton, Saiyed, Miller, et al. The digital divide in adoption and use of mobile health technology among caregivers of pediatric surgery patients. J Pediatr Surg 2018; 53(8): 1478–1493.

25. Jensen, Kelly, Avendano. Health equity and health system strengthening – Time for a WHO re-think. Global Publ Health 2021; 17: 377–390.

26. Centers for Disease Control and Prevention. COVID-19 racial and ethnic disparities. Available from: https://www.cdc.gov/coronavirus/2019-ncov/community/health-equity/racial-ethnic-disparities/disparities-illness.html

27. NPR. What do coronavirus racial disparities look like state by state? Available from: https://www.npr.org/sections/health-shots/2020/05/30/865413079/what-do-coronavirus-racial-disparities-look-like-state-by-state

28. Google Developers. Custom search JSON API: introduction. Available from: https://developers.google.com/custom-search/v1/introduction

29. Amazon Web Services. Amazon comprehend - determine sentiment. Available from: https://docs.aws.amazon.com/comprehend/latest/dg/how-sentiment.html

30. Interactive Advertising Bureau. Content taxonomy. Available from: https://www.iab.com/guidelines/content-taxonomy/

31. Hsieh, Shannon. Three approaches to qualitative content analysis. Qual Health Res 2005; 15(9): 1277–1288.

32. Elo, Kääriäinen, Kanste, et al. Qualitative content analysis: a focus on trustworthiness. Sage Open 2014; 4(1): 2158244014522633.

33. Kanji. 100 statistical tests. London: Sage, 2006.

34. Al-Rawi, Celestini, Stewart, et al. How Google autocomplete algorithms about conspiracy theorists mislead the public. M Comput J 2022; 25(1).

35. Karapapa, Borghi. Search engine liability for autocomplete suggestions: personality, privacy and the power of the algorithm. Int J Law Info Technol 2015; 23(3): 261–289.

36. Morozov. Don’t be evil. New Republ 2011; 242(11): 18–24.

37. Rogers. Algorithmic probing: prompting offensive Google results and their moderation. Big Data & Society 2023; 10(1): 20539517231176228.

38. Cohen-Cline, Gill, et al. Major disparities in COVID-19 test positivity for patients with non-English preferred language even after accounting for race and social factors in the United States in 2020. BMC Publ Health 2021; 21: 2121–2129.

39. Webb Hooper, Nápoles, Pérez-Stable. COVID-19 and racial/ethnic disparities. JAMA 2020; 323(24): 2466–2467.

40. National Institute for Health Care Management. Systemic racism, disparities & COVID-19: impacts on Latino health. Available from: https://nihcm.org/publications/systemic-racism-disparities-covid-19-impacts-on-latino-health

41. Smith, Sudore, Pérez-Stable. Palliative care for Latino patients and their families: whenever we prayed, she wept. JAMA 2009; 301(10): 1047–1057.

42. Bautista, Zhang, Gwizdka. Healthcare professionals’ acts of correcting health misinformation on social media. Int J Med Inf 2021; 148: 104375.

43. Chin, Afsar-Manesh, Bierman, et al. Guiding principles to address the impact of algorithm bias on racial and ethnic disparities in health and health care. JAMA Netw Open 2023; 6(12): e2345050.

44. Lukenbill, Immroth. School and public youth librarians as health information gatekeepers: research from the Lower Rio Grande Valley of Texas. Sch Libr Media Res 2009; 12.

45. Hislop. Codified racism in digital health platforms: a meta-analysis of COVID-19 prediction algorithms and their policy implications. Springer Science and Business Media LLC, 2023.

46. Chin, Afsar-Manesh, Bierman, et al. Guiding principles to address the impact of algorithm bias on racial and ethnic disparities in health and health care. JAMA Netw Open 2023; 6(12): e2345050.

47. Thompson, Baumgartner, Pichardo, et al. COVID-19 outbreak: New York City, February 29–June 1, 2020. MMWR Morb Mortal Wkly Rep 2020; 69(46): 1725–1729.

48. Chonka, Diepeveen, Haile. Algorithmic power and African indigenous languages: search engine autocomplete and the global multilingual Internet. Media Cult Soc 2023; 45(2): 246–265.

49. DuckDuckGo. Measuring the “filter bubble”: how Google is influencing what you click. Available from: https://spreadprivacy.com/google-filter-bubble-study/

50. Robertson, Lazer, Wilson. Auditing the personalization and composition of politically-related search engine results pages. In: Proceedings of the 2018 World Wide Web Conference, 2018, pp. 955–965.

Language disparities in pandemic information: Autocomplete analysis of COVID-19 searches in New York
