Abstract
Health care-related queries are a common source of information seeking online, and, in light of the June 2022 U.S. Supreme Court decision to overturn Roe v. Wade, information concerning reproductive health in the United States is now of particular interest. Increasingly, search applications turn to semantic models to assist in responding to user queries, with the knowledgebase Wikidata playing a prominent role. However, Wikidata’s representation of topics around reproductive health care, as well as the implications for user searches, are currently unclear. To further examine this, the present study analyzed Wikidata’s treatment of abortion, compared this to representations in three medical domain models, and tested web searches to gauge Wikidata’s influence on search results. Results show that, as a semantic model, Wikidata attempts to represent topics around abortion in a manner that is at once both multi-perspective and simplified, leading to logical inconsistencies when compared to domain models. Wikidata’s influence on semantically supported web search is more difficult to ascertain as search engines’ treatment of abortion appears purposely exceptional, though a strong influence from Wikipedia was noted. Findings from this study demonstrate the importance of how semantic models address the medical and health domain, and suggest the need for greater transparency in how health care information is treated within web search applications.
Introduction
Search engines are now a common starting point for many user information requests, and this is particularly true concerning health information. Consumer health queries are a frequent source of online search activity, as noted for over 20 years now (Cline & Haynes, 2001). Reliance on online searching for health information, as well as associated pitfalls such as health misinformation, has only grown over time, something highlighted by the advent of the COVID-19 pandemic (Bin Naeem & Kamel Boulos, 2021). While online consumer health information requests have remained, the ways in which web search applications respond to queries have been changing. The growth of semantic media approaches has enabled such applications to shape results and even respond to user requests with direct answers. This can be seen, for example, in the form of knowledge panels provided in the upper right-hand portion of search engine results, as well as direct responses from virtual assistants such as Siri (Vrandečić et al., 2023). Underlying such approaches are semantic models and resources, which seek to represent human knowledge in a machine-actionable format. Prominent among these resources is the popular knowledgebase Wikidata. This project of the larger Wikimedia Foundation hosts millions of semantic statements about a wide array of concepts and is leveraged as a semantic resource by a number of search engines and tools, particularly since its incorporation of Google’s previous Freebase knowledgebase (Pellissier Tanon et al., 2016). The knowledge and assumptions encoded within such semantic resources can be used to aid in answering queries and, thus, in a sense, hold the potential to influence the ways in which searchers understand the world.
One prominent area of online health information seeking is reproductive health. The World Health Organization (WHO, 2017) defines reproductive health as “a state of complete physical, mental and social well-being and not merely the absence of disease or infirmity, in all matters relating to the reproductive system and to its functions and processes.” Implicit from this definition is the capability and freedom to reproduce if and when desired (WHO, 2017). This capability and freedom can include safe abortions, thus positioning abortion within the broader topic of reproductive health (WHO, 2017). Within the United States, recent events have sparked increased public discourse, debate, and information-seeking concerning reproductive health, particularly concerning the topic of abortion. In June 2022, the U.S. Supreme Court overturned Roe v. Wade, the 1973 ruling that had guaranteed a constitutional right to an abortion. While lauded by some, this decision was met with surprise and disapproval from a large segment of the U.S. public (Blazina, 2022). This turn of events has sparked increased vigor in the public and legal debates around abortion and reproductive health, especially at the state level. Across the country, states have fallen back on existing abortion laws or rushed to enact new laws prohibiting or protecting abortion. The result has been an increasingly confusing and complex reproductive health care environment, with laws and options varying state by state, and sometimes day by day.
Now more than ever, topics around reproductive health are likely a frequent source of online information searching. As more users ask questions online concerning reproductive health, particularly about abortion, it is worth examining how these topics have been represented in semantic models and the media that utilize them. In this regard, it is first useful to note the distinction between a term and a concept: a concept is a class of phenomena that tends to get grouped together (e.g., the color blue), while various terms may be used to identify the same concept (blue, azul, niebieski, etc.) (Gnoli, 2020). Semantic models offer arrangements of concepts, though which concepts are included and which terms are used to label them vary from model to model (e.g., Munsell Color System, Natural Color System, etc.). How do semantic resources involved in search conceptualize topics around reproductive health care, that is, which concepts are included and how are they labeled? How does this conceptualization compare to other models designed to reflect specific understandings of the medical community? And how does this play out in search engines’ responses to queries concerning reproductive health?
In the present study, researchers examine semantic commitments around reproductive health within the popular and frequently used knowledgebase Wikidata, compare these to other semantic models used to depict professional understandings of this domain, and look for their influence in current web search applications. Using the concept of abortion as a case study, this work compares the Wikidata knowledgebase to three specialized semantic models used by medical communities. Drawing from the ontology evaluation levels described by Brank et al. (2005), this examination offers hierarchical, semantic, and contextual evaluations, including an examination of how data from Wikidata may influence semantically supported web searches.
In doing so, the researchers hope to better illuminate how topics concerning health care are represented through semantic models and media, the perspectives that may be present in such representations, and the implications for web users searching for health information. This is of particular import as search engines increasingly respond to users’ health-related queries with knowledge panels and even direct answers. Such semantically enriched responses rely on available metadata; probing the underlying semantic models that generated this metadata is one means of better understanding this trend. More broadly, the present work stands to contribute to the larger bodies of literature examining user-generated wiki content, how health information is spread and understood online, and ontologies as meaningful social representations.
Background
Wikidata’s treatment of abortion and other topics around reproductive health has not been addressed in current scholarship. Even so, literature relevant to the current study is available and will be briefly surveyed here. First, further background on Wikidata and its relationship to end-user searching will be addressed before moving on to literature concerning the medical domain and reproductive health in semantic models.
Wikidata and Search
First established in 2012, Wikidata is an open knowledgebase and part of the larger suite of Wikimedia Foundation projects. Like other knowledgebases, Wikidata contains a collection of standardized knowledge-bearing statements. In the past decade, it has grown rapidly via peer production, and currently includes semantic statements on over 90 million entities, including over a billion relationships (Turki et al., 2022). Wikidata’s size is just one of many reasons for its current prominence. Its open, collaborative nature, multilingual and multidisciplinary coverage, ease of use, and use of accepted semantic standards have all been cited as reasons for its influence (Tharani, 2021; Turki et al., 2022; Vrandečić & Krötzsch, 2014). In particular, data in this knowledgebase is available in Resource Description Framework (RDF), a data model associated with linked data and Semantic Web approaches. This allows data from Wikidata to be more easily queried, reused, and connected to other data sources. Not only are semantic statements from Wikidata incorporated into Wikimedia’s online encyclopedia Wikipedia, these statements have also found use in web development and searching.
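As a minimal sketch of what this RDF availability permits, the following Python snippet assembles a SPARQL query asking for the direct superclasses of a Wikidata item, using Abortion (Q8452), the concept examined later in this study, as the example. The helper name build_superclass_query is hypothetical and is not part of the study's method. The wd: and wdt: prefixes and the P279 ("subclass of") property follow standard Wikidata conventions. The query is only constructed here and is not sent to any endpoint.

```python
def build_superclass_query(qid: str, lang: str = "en") -> str:
    """Build a SPARQL query listing the direct superclasses of a Wikidata item.

    Hypothetical helper for illustration only; it returns the query text
    and does not contact any query service.
    """
    # Basic sanity check on the identifier format (e.g., "Q8452").
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item identifier: {qid!r}")
    return (
        "SELECT ?superclass ?label WHERE {\n"
        f"  wd:{qid} wdt:P279 ?superclass .\n"      # P279 = "subclass of"
        "  ?superclass rdfs:label ?label .\n"
        f'  FILTER(LANG(?label) = "{lang}")\n'      # keep labels in one language
        "}"
    )


query = build_superclass_query("Q8452")
print(query)
```

A query of this kind could be pasted into a SPARQL client of the reader's choosing; the returned superclasses would correspond to the hierarchical relationships examined in the Results.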
Knowledgebases have become a valuable resource for any manner of project or application that stands to benefit from semantic support. Common examples include virtual personal assistants and knowledge panels presented to search engine end users (e.g., Google Knowledge Graph panels) (Clark et al., 2022). Wikidata in particular has been a useful knowledgebase for search applications to leverage due to its open nature, wide linguistic and topical coverage, and high level of integration with other datasets (Sarasua et al., 2019). Prior to 2014, Google utilized a separate knowledgebase, Freebase, to provide semantic support for searches; after this project was shut down, the semantic content was migrated into Wikidata, and further levels of integration between Google Knowledge Graph and Wikidata were established (Pellissier Tanon et al., 2016). Since then, Wikidata’s continued influence on Google and other web search applications has been noted (Vrandečić et al., 2023). This connection is even leveraged in search engine optimization efforts (Akhtar, 2023). In one recent example, a study by Clark et al. (2022) found that publishing data about Montana State University resources via Wikidata resulted in increased discovery by search engines and increased website traffic.
Medical Knowledge and Reproductive Health in Semantic Models
As a knowledgebase, Wikidata represents just one of many kinds of models of semantic representation available. Many systems representing semantics are studied in the literature concerning knowledge organization systems (KOS), systematic models capturing the connections between and among concepts and the terms used to represent them. Ranging from pick lists to full ontologies, KOS exhibit common features such as labels, classes, and relationships, and enable common functions including disambiguation, establishing hierarchy and other associations, and capturing data properties (Zeng, 2008). KOS appear in all areas of information organization and feature prominently within many web resources; Amazon’s product taxonomy, for example, is a KOS. Among different kinds of KOS, ontologies are the most semantically rich, representing knowledge within a specific domain through highly organized and articulated structures of entities and semantic relationships; a knowledgebase takes this further, filling in semantic statements about individual instances in the domain as well. The result is a robust semantic model capable of representing large amounts of real-world knowledge and understanding. Wikidata is an example of a large, general knowledgebase.
In contrast to general knowledgebases are smaller, domain-focused semantic models. A vast number of specialized vocabularies, thesauri, and ontologies have been developed to represent the understandings of medical communities and assist in their knowledge organization and retrieval. Among the most prominently used semantic resources in this domain are Gene Ontology, International Classification of Diseases (ICD), SNOMED CT, and Medical Subject Headings (MeSH) (Couto, 2019). Within the medical domain, such semantic models hold promise for supporting a number of tasks. In outlining the benefits of semantic approaches in health care data, Karami and Rahimi (2019) noted potential improvements in knowledge discovery and knowledge sharing among researchers and health care providers.
Concerning health and web search, one of the first search engines to incorporate semantics was GOPubMed, which leveraged vocabulary from the Gene Ontology in categorizing results (Couto, 2019). In recent years, researchers and data managers have become increasingly interested in leveraging data from Wikidata as a source of semantics in medical computer systems and clinical decision-making processes and applications (Turki et al., 2019). Several recent research projects have examined the benefits of Wikidata integration into such settings. For example, Koshman et al. (2022) used Wikidata as a semantic source for annotating electronic medical records, finding greater success in using Wikidata for semantic annotation than in using domain vocabularies. Regarding information concerning the COVID-19 pandemic, Turki et al. (2022) found Wikidata to be a particularly suitable knowledgebase due to its flexibility, multidisciplinary and multilingual nature, and alignment to external databases.
Though Wikidata’s semantic representation of topics around broad concepts of reproductive health has not been examined, some relevant studies of the related Wikimedia project Wikipedia will be addressed here. While Wikidata originated from data extracted from Wikipedia, Wikipedia itself now incorporates structured data from Wikidata (Clark et al., 2022), and research has shown that, based on search engine results, Wikipedia is a prominent source of health information, even more so than some dedicated resources such as MedlinePlus and NHS (Laurent & Vickers, 2009). Wikipedia’s treatment of reproductive health, particularly the topic of abortion, has been the subject of several studies, though these projects typically use Wikipedia’s page on abortion as an example of a controversial entry to other ends. For example, Borra et al. (2015) used this Wikipedia article as a means of illustrating disagreeing edits and the development of controversial articles on the platform. Jhandir et al. (2017) similarly used Wikipedia’s article on abortion as a case study in an examination of controversy detection at the edit level on the platform. In such works, the focus has been on work practices in Wikipedia editing rather than on semantics around abortion or reproductive health itself, with no conclusions being drawn on the latter subject.
Research Questions and Methodology
The goal of the present study is to better understand the implications of Wikidata as a semantic model of the domain of reproductive health. Using the concept of abortion as a case study, researchers sought to better understand semantic commitments around this concept in the Wikidata knowledgebase, how these commitments compare to other semantic models depicting reproductive health, and how these commitments may impact end-user activities on the web.
Specifically, the following research questions guided this study:
RQ1. How is the concept of abortion represented in Wikidata?
RQ2. How does this representation compare to other established semantic models of reproductive health?
RQ3. What are the current implications of these representations for web search activities concerning reproductive health?
Source Selection and Data Collection
To compare with Wikidata’s depiction of reproductive health concepts, researchers sought alternative semantic models representing professional and scientific understandings of this domain. While no medical resource is comparable in scope or implementation level to Wikidata, a number of medical systems have seen widespread adoption and influence in a multitude of settings. Based on the literature (see Couto, 2019; Hong & Zeng, 2022), researchers chose three systems for comparison with Wikidata: MeSH, ICD-10, and SNOMED CT. These systems were selected due to their cited popularity and usage, their availability through the common Unified Medical Language System interface, and their inclusion of at least one abortion-related concept. A brief description of each of these systems follows.
MeSH began as the Index Medicus (Coletti & Bleich, 2001). The Index Medicus was created by John Shaw Billings, Associate Surgeon General of the United States, to index the growing collection of books, journals, and pamphlets in the U.S. Surgeon General’s office (Coletti & Bleich, 2001). Index Medicus continued to grow and eventually became what is known today as the MeSH thesaurus (Coletti & Bleich, 2001). To this day, MeSH is used to index the biomedical literature in the publicly available MEDLINE database (Coletti & Bleich, 2001). Despite the availability of MeSH terms on the articles in MEDLINE, many searchers and consumers rely on the search system to map entered terms to the correct MeSH terms (Gault et al., 2002). Of the three systems presented here, MeSH is likely the best known and the most used by end users, often with the help of librarians (Coletti & Bleich, 2001).
The ICD had its beginnings in the 1890s as the International List of Causes of Death (Hirsch et al., 2016). Around the late 1890s, North America adopted this system for reporting causes of death (Hirsch et al., 2016). By 1949, the World Health Organization (WHO) had expanded the system to also include causes of morbidity and renamed it the ICD (Hirsch et al., 2016). From there, the ICD went through many revisions up to the current tenth revision, with granularity increasing at each revision (Hirsch et al., 2016). In the United States, the Public Health Service adapted the ICD terminology to index hospital records to be used for billing purposes (Hirsch et al., 2016). As such, it is likely that ICD vocabularies are not understood or used by most health information consumers.
The final semantic resource presented here is the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). SNOMED CT began in 1965 as the Systematized Nomenclature of Pathology (SNOP) (Bodenreider et al., 2018). Over the years, SNOP changed in content and structure to become the current SNOMED CT terminology (Bodenreider et al., 2018). Currently, SNOMED CT contains terms related to such topics as pathologic conditions, normal and abnormal functions, symptoms and signs of disease, diseases/diagnoses, and procedures (Bodenreider et al., 2018). In its current form, SNOMED CT is used by researchers, clinicians, and those in industry and government, most often in conjunction with the Electronic Health Record (EHR) (Elhanan et al., 2010). In this context, searches are performed in the free-text field of the EHR using concepts from a terminology, such as SNOMED CT, in conjunction with a natural language processing program (Resnick et al., 2021).
In analyzing these models, researchers built on an approach advanced by Feinberg (2017), in which the relationship between structure and content in a system is used to understand its interpretation of the world. For Wikidata and each of these three medical domain models, researchers performed an initial keyword search for the term “abortion.” For each concept returned, researchers collected the preferred and variant labels, any notational representations (i.e., native classification number), definitions, scope notes, structural hierarchical relationships to other classes or instances, and any associative relationships to other classes or instances. During initial data collection, researchers did not attempt to discern among varying meanings of the term “abortion,” though concepts related solely to veterinary medicine were excluded from collection. For each system, collected data was recorded in spreadsheets using Microsoft Excel. Concepts were also arranged using knowledge graph style visualizations to better apprehend the structure and main relationships depicted within each system. These visualizations were originally drawn by hand before being rendered and refined using Microsoft Visio.
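The data elements collected for each concept can be pictured as a simple record. The following Python sketch is illustrative only: the researchers recorded their data in spreadsheets, and the class name ConceptRecord and its field names are hypothetical. The example values are those reported for Wikidata's Abortion concept in the Results; treating the Wikidata item identifier as the "notation" is an assumption made here for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConceptRecord:
    """One collected concept; field names are hypothetical, for illustration."""
    source: str                                   # e.g., "Wikidata", "MeSH"
    preferred_label: str
    variant_labels: List[str] = field(default_factory=list)
    notation: Optional[str] = None                # native classification number or identifier
    definition: Optional[str] = None
    scope_note: Optional[str] = None
    hierarchical_relations: List[str] = field(default_factory=list)
    associative_relations: List[str] = field(default_factory=list)


# Example populated from the values reported for Wikidata in the Results.
record = ConceptRecord(
    source="Wikidata",
    preferred_label="Abortion",
    variant_labels=[
        "Termination of pregnancy",
        "Induced miscarriage",
        "Abortion, induced",
    ],
    notation="Q8452",  # assumption: the item identifier stands in for a notation
    definition="intentional ending of a pregnancy",
    hierarchical_relations=[
        "Pregnancy with abortive outcome",
        "Medical procedure",
        "Feticide",
    ],
)
print(record.preferred_label, len(record.variant_labels))
```

Keeping hierarchical and associative relationships in separate fields mirrors the distinction between the hierarchy/taxonomy and other semantic relations levels used in the analysis.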
Data Analysis
Analysis of all collected data was guided by Brank et al.’s (2005) summary of evaluation levels of ontology assessment. This framework includes six major aspects along which ontologies and similar systems are evaluated. Due to the scope and goals of the present study, evaluation criteria from three of these levels were utilized (Table 1).
Table 1. Select Evaluation Levels from Brank et al. (2005).
For the hierarchy/taxonomy level, researchers examined hierarchical relationships within the Wikidata data, that is, the descending line of “is A” relationships present among concepts related to abortion. For the other semantic relations level, associative relationships (i.e., anything not hierarchical), as well as definitions and scope notes, were reviewed for all collected Wikidata concepts. Finally, for the context/application level, researchers repeated the previous two levels of evaluation for MeSH, ICD-10, and SNOMED CT and compared the results to those from Wikidata.
For the application aspect, researchers ran a series of searches in several web search engines, analyzing the results and attempting to discern the possible influence of metadata from the Wikidata knowledgebase. Searches were conducted using Google, along with Bing and Yahoo! search engines for comparison purposes. Searches were conducted on November 28, 2022, for “abortion,” “reproductive health,” “vasectomy,” and “tubal ligation.” While abortion was the primary focus of the searches, the other reproductive health terms were included as potential controls to identify differences in search result types returned within each search engine. A new Google Chrome incognito window was used for each search. During all searches, a VPN was used to ensure a constant, exact location throughout (Chicago, Illinois). Each result page was printed as a PDF with background images, preserving graphical as well as textual results. Researchers recorded the search result types returned (i.e., top news stories, web results, or places), whether or not results were featured within a knowledge panel near the top of the results page, and any specific instances of controlled medical terminology appearing within search results (for example, an ICD number displaying within a knowledge panel). Related search terms, whether or not ads were displayed with the search result, and other information displayed as part of the search were also analyzed. Each search engine’s policies pertaining to advertising and abortion were investigated as a follow-up step to assist researchers in understanding and interpreting the results.
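The search protocol described above amounts to a small grid of search engines by search terms, with the same observations noted for each cell. The Python sketch below lays out that grid under stated assumptions: the function blank_observation and the dictionary keys are hypothetical names chosen for illustration, and the researchers' actual recording was done from PDF printouts rather than code.

```python
from itertools import product

ENGINES = ["Google", "Bing", "Yahoo!"]
TERMS = ["abortion", "reproductive health", "vasectomy", "tubal ligation"]


def blank_observation(engine: str, term: str) -> dict:
    """Return an empty observation sheet for one search (hypothetical layout)."""
    return {
        "engine": engine,
        "term": term,
        "search_date": "2022-11-28",       # date reported in the text
        "location": "Chicago, Illinois",   # VPN location reported in the text
        "result_types": [],                # e.g., news, web results, places
        "knowledge_panel": None,           # True/False once observed
        "controlled_terminology": [],      # e.g., an ICD number shown in a panel
        "ads_displayed": None,             # True/False once observed
        "related_searches": [],
    }


# One observation sheet per engine/term pair.
observations = [blank_observation(e, t) for e, t in product(ENGINES, TERMS)]
print(len(observations))
```

Holding date and location constant across every cell reflects the controls described above, so that differences between cells can be attributed to the engine or the term.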
Results
Results are first presented for the initial hierarchy/taxonomy and semantic relations evaluations of Wikidata. This is followed by the results of corresponding evaluations of MeSH, ICD-10, and SNOMED CT. Following a brief comparison, the results of the application-level evaluation of Wikidata and search engines are then offered.
Wikidata
The Wikidata knowledgebase contains a single, primary concept for abortion, labeled in English as Abortion (Q8452). Additional labels are provided for this concept in 152 other languages. In English, the definition of this concept is provided as, “intentional ending of a pregnancy.” Though brief, the phrasing of this definition is of note: by including the word “intentional,” Wikidata limits its abortion concept to what might also be known as a voluntary abortion. Three variant terms (“also known as”) are also provided in English: Termination of pregnancy, Induced miscarriage, and Abortion, induced.
Structurally, Wikidata is a polyhierarchy, meaning a single concept can belong to multiple superclasses. As such, Abortion is at once a member of three different superclasses: Pregnancy with abortive outcome, Medical procedure, and Feticide. Abortion has a total of 179 subclasses under it. Of these, 164 are geographic subdivisions, following the pattern “Abortion in. . .,” for example, Abortion in Austria. Two subclasses referred to purely veterinary medicine and were excluded, while another two subclasses referred to historical or statistical studies of abortion and were set aside as associative rather than hierarchical relations. The relationships among the remaining 11 subclasses, along with Abortion and its immediate superclasses, are visualized below in Figure 1. Overall, polyhierarchical relationships in Wikidata simultaneously portray abortion as a medical event, a medical procedure, and an action with legal implications.

Figure 1. The single hierarchical cluster within Wikidata.
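To make the notion of a polyhierarchy concrete, the following Python sketch encodes the three superclasses reported for Abortion, plus one geographic subclass, as a mapping from each concept to its parents. It then walks upward to collect every ancestor and re-checks the subclass tally reported above. The names SUPERCLASSES and ancestors are hypothetical, and the mapping is a small illustrative excerpt rather than the full Wikidata graph.

```python
from typing import Dict, List, Set

# Each concept maps to ALL of its direct superclasses (polyhierarchy).
SUPERCLASSES: Dict[str, List[str]] = {
    "Abortion": [
        "Pregnancy with abortive outcome",
        "Medical procedure",
        "Feticide",
    ],
    "Abortion in Austria": ["Abortion"],
}


def ancestors(concept: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Collect every superclass reachable from a concept, following all parents."""
    seen: Set[str] = set()
    stack = list(graph.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph.get(parent, []))
    return seen


print(sorted(ancestors("Abortion in Austria", SUPERCLASSES)))

# Tally of subclasses as reported in the text.
total_subclasses = 179
geographic = 164
veterinary = 2
historical_or_statistical = 2
remaining = total_subclasses - geographic - veterinary - historical_or_statistical
print(remaining)  # 11
```

Because a concept may list several parents, a single subclass such as Abortion in Austria inherits the medical-event, medical-procedure, and legal framings at once.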
At the semantic relations level, beyond the labels and definitions provided earlier, Wikidata’s Abortion concept also contains a number of associative relationships. Thirty-one statements or properties are provided, relating Abortion to other classes and instances in the Wikidata knowledgebase via 13 distinct relationship types. These classes and instances are not limited to the reproductive health domain but include other legal and social concepts. Included here are the previously noted hierarchical relationships that function as associative relationships instead. All associative relationships are summarized and presented along with examples in Table 2. Among other semantic relations, identifiers for closest matches in 93 external sources are provided, including MeSH and several editions of the ICD. Finally, links to Wikipedia articles for abortion are provided in 143 different languages.
Table 2. Summary of Associative Relationships in Wikidata.
Note. MMS = Mortality and Morbidity Statistics.
Superclass relationships that function as associative rather than hierarchical.
Semantic Models of the Medical Domain
Unlike Wikidata, the three models for medical semantics examined for this study did not contain a single, basic concept for abortion. Rather, abortion appears as a distributed relative (i.e., occurrences of similar concepts that appear in different taxonomic positions in the system) throughout their respective hierarchies of classes and instances. Structurally, these three systems may be seen as aspect classifications: rather than allocating one space for a particular concept, the concept appears throughout the system wherever it is of interest (Broughton, 2015).
In MeSH, a total of 13 abortion concepts were identified. When arranged hierarchically, these concepts can be seen as appearing in four main taxonomic trees: pregnancy complications, persons, crimes, and obstetric surgical procedures. Of particular note is the distinction between “Abortion, Spontaneous” and “Abortion, Induced.” This coverage of both intentional and unintentional early terminations stands in contrast with Wikidata’s focus solely on voluntary abortions. The hierarchical associations among all MeSH classes are visualized in Figure 2; the comparison to Figure 1 reveals the difference between a polyhierarchical model and a distributed, aspect-based model. Overall, separate hierarchical relations in MeSH portray abortion as an intentional procedure, an unintended consequence, and a crime.

Figure 2. The four hierarchical clusters within MeSH.
While associative relations are generally not captured within MeSH, annotations such as classification numbers, history and revision notes, alternate labels, and scope notes are provided for most terms. Table 3 presents definitional scope notes and alternate labels for abortion concepts in MeSH. From this table, the varying senses in which the word “abortion” is used in MeSH can be noted, including both intentional and unintentional abortions.
Table 3. Selected Semantic Annotations for MeSH Abortion Concepts.
In ICD-10, a total of 11 main abortion concepts were identified. When arranged hierarchically, these concepts can be seen as appearing in four main taxonomic trees: pregnancy, maternal disorders, disorders of the female anatomy, and persons encountering health services. Similar to MeSH, ICD-10’s coverage of abortion includes both voluntary and spontaneous cases. The hierarchical associations among ICD-10’s classes are visualized in Figure 3. For most lowest level classes in this figure, a range of enumerated subclasses is also available, offering further subdivision of the topic. For example, O03 Spontaneous Abortion can be subdivided by complicating conditions. Overall, hierarchical relations in ICD-10 portray abortion as a possible pregnancy outcome, a consequence of other medical events, or a situation requiring medical care.

Figure 3. The four hierarchical clusters within ICD-10.
As with MeSH, associative relationships are infrequently indicated within ICD-10. The majority of entries also lack additional semantic annotations such as alternate labels or definitions. Occasionally, clarifying inclusions and exclusions are provided. For example, for N96 Habitual Aborter, included are “investigation or care in a nonpregnant woman” and “relative infertility,” while “currently pregnant” and “with current abortion” are excluded.
Of the three medical KOS, SNOMED CT featured the most complicated arrangement of concepts concerning abortion. A total of 231 concept records related to abortion were identified. As with MeSH and ICD-10, these concepts are not arranged in a single taxonomic tree, but rather are distributed throughout the system in the style of an aspect classification. Arranging these concepts hierarchically, five main areas of SNOMED CT containing abortion concepts were identified: procedures related to pregnancy, disorders related to pregnancy, complications related to pregnancy, death, and pregnancy measures. Figure 4 portrays the hierarchical relationships among major abortion concepts within SNOMED CT. As with ICD-10, many of the lowest level classes in this diagram have enumerated subdivisions below them. For example, Therapeutic Abortion can be further subdivided by the name of the exact procedure employed. Overall, SNOMED CT’s portrayal of abortion places it as a medical procedure or event, and includes both intentional and unintentional outcomes.

Figure 4. The five hierarchical clusters within SNOMED CT.
Due to the relative size and complexity of the SNOMED CT dataset, only a general summary of the findings concerning semantic relations can be provided in the present article. All 231 concepts contained multiple English language labels, sorted into American English and Great Britain English, and marked as either preferred or acceptable. Only English labels were available in the international version of SNOMED CT reviewed for this study, though SNOMED CT is available separately in additional language editions. Associative relationships with other concepts in SNOMED CT were noted in most abortion-related concepts. The number of associative relationships for each concept ranged from 0 to 7, with concepts having an average of 2 associative relationships. The most commonly observed associative relationships included “due to,” “interprets,” “occurrence,” “finding site,” and “procedure site.”
Overall, Wikidata and the three medical models examined for this study varied greatly in their hierarchical structures and their inclusion of semantic relations. While Wikidata represents abortion with a single concept, the other models use an aspect approach that distributes abortion-related concepts throughout the system. While definitions were common in MeSH and Wikidata, ICD-10 and SNOMED CT provided very few. MeSH and ICD-10 contained very few associative relationships. In contrast, SNOMED CT contained an extensive number, and Wikidata’s associative relationships branched outside the domain of reproductive health entirely. All systems provided regular alternate labels with the exception of ICD-10. These differences are due in part to the varying kinds of semantic structure (or KOS) each system represents. Common among all four systems, however, is the structural incorporation of hierarchical relationships, making this the easiest point of immediate comparison among them. Similarities and differences among the systems are explored further in the “Discussion” section below.
Search Application Evaluation
Results from Google, Bing, and Yahoo! all varied in terms of the type of content returned. The types of results identified included news, semantic knowledge panels linking to authoritative information, locations in the form of maps, and ads. Table 4 details the results of searches for abortion, while the results of all searches are summarized in Table 5.
Table 4. Search Results for the Term “Abortion.”
Note. ICD = International Classification of Diseases.
Table 5. Search Results for All Four Reproductive Health Search Terms.
Results from all three search engines showed little immediate influence from semantics within the Wikidata knowledgebase. Notably, at the time of this study, Google did not display a regular knowledge panel in response to searches for the term “abortion.” While it did display controlled terminology from MeSH and ICD, these were given as part of the Wikipedia search result in the result list. Google was the only search engine with a first-page web result linking to Planned Parenthood, and the only search engine with a map and options to return location results that provide abortions. Google’s results also included one advertisement for abortion; this was for at-home delivery of an abortion pill. Further investigation into Google’s advertisement policies revealed that an abortion provider in the United States, United Kingdom, and Ireland must be verified as either an institution that provides abortions, or one that does not provide abortions to advertise on the Google Ads Network (Google, 2022). For the other search terms related to reproductive health, Google did display knowledge panels with information directly populated from Mayo Clinic and Wikipedia, as well as ads for everything but the term “tubal ligation.”
In response to searches for “abortion,” Yahoo! provided a knowledge panel with information directly populated from Wikipedia, while Bing’s quick result set included location-specific local results and social media results. For the other searched terms, Bing and Yahoo! presented knowledge panel results populated directly from Wikipedia, Mayo Clinic, and Medline.
Neither Bing nor Yahoo! displayed ads during searches for the term “abortion,” though ads were displayed in response to some other reproductive health terms. Family planning services and products, including abortion, are included in “pharmacy and health care products and services” for Microsoft Advertising (Microsoft Advertising Policies, 2022). Advertising for abortion is specifically banned by Microsoft’s internal policy. Despite the ban, search results from Bing included a “see more” pop-out tab on the right-hand side of the main search results that included a result for a local business. It should be noted that this business maintains an active social media presence on Yelp, Facebook, Instagram, and Twitter, which may have influenced its appearance. Yahoo!’s advertising policies were more difficult to locate than those of Google or Bing, and an account seemed to be required to return results. Health services and medical devices are required to comply with “legislation,” but what constitutes a health service was not defined (Yahoo!, n.d.).
Discussion and Implications
Results from this study offer insight into semantic representations of reproductive health care and their potential impacts on semantically supported search. The findings concerning Wikidata’s representation of abortion and how this compares to other representations in medical semantic resources will be explored first, followed by further discussion of how these representations play out, or fail to, in web search and the implications for health information seeking online.
Representations in Wikidata and Other Semantic Models
Among the four models examined, Wikidata was the only one with a single, primary concept for abortion. In contrast, the three medical domain models depict multiple abortion concepts through the use of distributed relatives. In KOS, this can be seen as the difference between entity-based and aspect-based classificatory approaches. In offering a single place for a concept, entity classifications are more approachable and intuitive to general users, while aspect classifications are more suited to disciplinary domains or specialized fields of study (Broughton, 2015). Wikidata’s modeling of abortion and other medical topics could thus be seen as more approachable for the average web user and best capable of providing a broad context. The three medical models offer a more specialized, granular representation limited to their respective domain; SNOMED CT, in particular, offers the deepest disciplinary take and may be quite unintuitive for users without some medical domain expertise.
It should also be noted that Wikidata specifically limits the abortion concept to intentional events, in contrast with the other models. This finding likely reflects the difference between colloquial and medical senses of the word “abortion.” In popular discourse, the term “abortion” is often used to refer to a voluntary abortion, while in medical discourse the term simply denotes an early termination of pregnancy, with “voluntary abortion” referring to an intentional procedure, and “spontaneous abortion” denoting something that might colloquially be known as a miscarriage (see for example MeSH in Table 3). While the three medical systems reflect this broader sense of the term and focus more on similar outcomes and conditions, Wikidata limits its focus to voluntary abortion and reflects popular terminology rather than specialized, domain-specific understandings. At the same time, many concepts related to abortion in the three medical systems are labeled with other terms entirely, for example, SNOMED CT’s use of “induced termination” and “operative termination” labels throughout. The medical models, particularly ICD-10 and SNOMED CT, also feature more granular subclasses under their abortion concepts, often enumerated based on specific procedures, conditions, or anatomical sites.
Though their relative complexity may not be surprising, the medical domain models accomplish this while maintaining a relatively singular perspective: concepts associated with abortion are generally depicted as medical events situated within the broader area of reproductive health. Of the three, MeSH is the only model that touches on legal perspectives, depicting a distinction between legal and criminal abortion. This distinction is one that has gained increased import in the United States given the overturning of Roe v. Wade and the ensuing, fragmented legal state of abortion across the country. While the medical systems explored here are largely devoid of this context, Wikidata more actively models legal and social issues around abortion, though as a trade-off it must contend with representing a procedure that is both legal and illegal in varying contexts. Wikidata positions Abortion as a subclass of Feticide, which is itself a subclass of Homicide and includes the subclasses of Illegal Abortion and Abortion, Legal. Wikidata’s abortion concept also falls under two other superclasses, however (Medical procedure and Pregnancy With Abortive Outcome). While the three domain models mostly mirror the WHO’s (2017) perspective on the relationship between reproductive health and abortion, Wikidata attempts to offer a single concept situated within multiple, complex contexts. It is through this polyhierarchical structuring that Wikidata is able to avoid the distributed relatives observed in the other systems, though this comes at the cost of some logical consistency. For example, the subclass Paper Abortion, a purely legal act in which a parent gives up certain rights to a child, is technically also a subclass of Homicide through inheritance.
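The inheritance effect described above can be illustrated with a minimal sketch. The graph below is a toy reconstruction built only from the concept labels named in this discussion; it is not a Wikidata export, and the function name `ancestors` is a hypothetical helper introduced for illustration. In Wikidata itself, these links would correspond to "subclass of" (P279) statements, which this sketch does not query.

```python
from collections import deque

# Toy subclass-of edges, limited to the labels named in the discussion.
# This is an illustrative assumption, not data retrieved from Wikidata.
SUBCLASS_OF = {
    "Paper Abortion": ["Abortion"],
    "Abortion": ["Feticide", "Medical procedure", "Pregnancy With Abortive Outcome"],
    "Feticide": ["Homicide"],
}

def ancestors(concept, edges):
    """Return every superclass reachable from `concept` via subclass-of links."""
    seen = set()
    queue = deque(edges.get(concept, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue  # polyhierarchies can reach one class by several paths
        seen.add(parent)
        queue.extend(edges.get(parent, []))
    return seen

inherited = ancestors("Paper Abortion", SUBCLASS_OF)
print(sorted(inherited))
```

Under these assumed edges, Homicide appears among the inherited superclasses of Paper Abortion even though no direct statement links the two, which is the kind of unintended entailment a polyhierarchy can produce.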
The differing structures among the four systems make the comparison of additional semantic relations and annotations more difficult than comparing their hierarchical relations. Even so, the multilingual, multicultural, and multidisciplinary nature of Wikidata is observable here, particularly through the inclusion of numerous alternate labels. It should be noted, however, that a similar multilingual aspect was not observed in the three medical systems due to the researchers’ intentional sampling from the English language editions. Still, while alternate language versions of these models are available, the semantic connections between labels of different languages are not immediately available to users. Associative relationships are present in Wikidata but vary widely and seem inconsistent and incomplete, especially in comparison to the more systematic associative relationships present in SNOMED CT. For example, in Wikidata, the abortion concept is associated with specific encyclopedias, categories of music, and general medical specialties, among other entities. On the other hand, Wikidata’s associative relationships help place abortion within a broader information context beyond the reproductive health domain. While scope notes and definitions were generally brief or lacking in Wikidata, these annotative features are much more fully realized in MeSH (Table 3). Such annotations were almost entirely absent from ICD-10 and SNOMED CT.
To some extent, differences in the presence and amount of certain semantic and annotative features may reflect differences in the primary design of each of the four systems. Though each can be viewed as a semantic model, as KOS, they differ in their specific types. While ICD-10 represents a classification and MeSH a thesaurus, SNOMED CT lies somewhere between a thesaurus and ontology, and Wikidata is a fully developed knowledgebase (i.e., an ontology with instance data). As per Zeng (2008), these KOS types differ in their included features and intended functions. Of all KOS types, ontologies and knowledgebases are seen as the most semantically rich, particularly due to their emphasis on associative relationships; this emphasis can indeed be observed in both SNOMED CT and Wikidata. While ontologies and their data are more suited to RDF and Semantic Web approaches, they are more complex and potentially less unified in perspective than a simpler structure such as a taxonomy or thesaurus. Differences in scope and history among the systems likely also have an effect. MeSH, ICD, and SNOMED CT all have a much longer history, with previous versions for some dating back as far as the 19th century. Each was developed specifically for the medical domain, under professional oversight, and thus contains more specialized but more consistent representations. In contrast, it could be argued that Wikidata’s scope is almost too broad. In attempting to incorporate all perspectives on an open, global web, Wikidata also introduces inconsistency and semantic uncertainty around concepts.
Even so, Wikidata’s open, socially driven, crowdsourced nature may also offer some advantages over more traditional domain models. Wikidata’s editorial process is governed by the principles of open editing and community control (Vrandečić & Krötzsch, 2014). While seemingly egalitarian in nature, these principles do not inherently account for inevitable disagreements. Policies and procedures for how editing decisions are made were developed in part in response to disagreement between editors; in Wikipedia, for example, who wins has been attributed to the deftness with which an editor can quote policy and precedent (Kittur et al., 2007). Disputed statements must be resolved through editing and discussion to reach a consensus; consensus on a statement is determined by agreement, even if the resulting disputed statement is “barely” agreeable to all parties (Wikipedia, 2022). Wiki “edit wars” are common enough that the concept is defined within Wikimedia as “cycle of edits resulting from disagreement between editors” and is itself a subclass of the concept “conflict” (Wikimedia, 2023). Despite the obvious drawbacks, relying on consensus may make the outcome of a highly contentious article or concept better, as it requires that parties work together to vet facts despite disparate views.
The lack of strict editorial mechanisms also translates into a capacity to quickly respond to changing information, as observed by Turki et al. (2022) concerning COVID-19 pandemic information. Its broad coverage and inclusion of multilingual labeling also led it to outperform domain vocabularies in semantic annotation of electronic medical records (Koshman et al., 2022). Though, as a semantic model, Wikidata may represent health care topics in a manner that is at once both multi-perspective and overly simplified, this manner of representation may be useful in settings where domain model depictions are too monolithic, specialized, and complex. Wikidata may particularly excel in its ability to place reproductive health topics into broader context.
Implications for Web Search
Results from the search engine tests revealed little apparent, direct influence from Wikidata on searches for topics around reproductive health at this time. While Google did provide relevant terms and codes from MeSH and ICD in response to searches for “abortion,” these semantics were part of the Wikipedia page description given in the results list for this topic. No formal knowledge panel was given for this search, and for other searches concerning reproductive health topics, Google populated knowledge panels with information directly from specific, trusted sources (i.e., Mayo Clinic and Wikipedia). In short, Google’s response to searches on reproductive health seems exceptional, particularly so with the term “abortion,” bypassing other forms of semantic support and directing the user to predetermined sources.
Google’s connection to the Wikidata knowledgebase has been documented (Akhtar, 2023; Vrandečić et al., 2023), though no public documentation reveals exactly how data from Wikidata is being leveraged by other search engines. What was clear from all results, however, was a surprisingly strong influence from another Wikimedia project, Wikipedia, on reproductive health-related queries in all three search engines. While Wikidata is a source of data structure and content for Wikipedia (Clark et al., 2022), it is not the only source, and the two must be viewed as separate semantic resources. Though the importance of Wikipedia in online health searching has been noted (Laurent & Vickers, 2009), the present study shows search engines directly utilizing Wikipedia as a semantic support for reproductive health searches. Wikipedia’s influence here, as well as the relationships between Wikipedia and Wikidata for health topics, warrants further scrutiny in future research. This finding also indicates the need for greater transparency from search providers: if search engines are treating queries around abortion and other health care-related topics exceptionally, search providers should be more transparent in their deliberate efforts to shape results here.
The exact type of search results (e.g., web, advertisements, news) varied based on the individual search engine; however, many patterns of web search results were similar. For instance, all three search engines returned the WHO’s “Fact Sheet” page on abortion as the first result. Closely following were the English Wikipedia page on abortion, and Britannica.com’s entry on abortion. Results diverged beyond that, with Yahoo! including the results to a 2022 Vote About Abortion poll from YouGov. Bing and Yahoo! both included links to abortion.procon.org in their top results, while Google did not. Conversely, Google was the only search engine with a first-page web result linking to Planned Parenthood and the only search engine that provided a map with relevant location results.
Some influence from search location was noted during the search tests. As all searches performed for this study were limited via VPN to Chicago, Illinois, additional testing from different locations would be needed to further understand the effect of searcher location on searches for reproductive health. As laws and regulations about abortion have begun to diverge in the United States on a state-by-state basis, this could be particularly important to pursue. Of the three search engines, the findings concerning Bing and search location were somewhat surprising and seem to conflict with Microsoft’s stated advertising policies. Though formal ads are not allowed, a specific commercial institution was highlighted in the Bing results, based perhaps on geolocation and social media presence. These two factors may exert a stronger influence on health-related searches than semantic models such as Wikidata, and their potential impact needs further investigation.
New developments in semantic technologies will continue to change semantically dependent search engine results. One such effort, known as Abstract Wikipedia, seeks to capture the relationships between concepts in Wikidata regardless of language (Vrandečić, 2021). The project does not seek to replace Wikipedia or Wikidata but to augment available Q-notated concepts in such a way that the underlying semantics translate into all spoken languages. The underlying linked conceptual structured data could be leveraged by search engines to clarify and to enrich search results independent of language.
Limitations
While this study employed the evaluative framework reported by Brank et al. (2005), only the three most pertinent levels of evaluation were employed here due to the scope of the study. Additional evaluative levels may hold further insight into how reproductive health and other health topics are modeled in Wikidata. Several limitations must be noted concerning the use of SNOMED CT in the present study as well. While SNOMED CT does have other languages available, these must be accessed via separate interfaces not consulted during this study. As such, these other language editions and the implications of terminology choices therein remain to be considered. The general summary of semantic relations within SNOMED CT was also limited here. Though a much deeper analysis is possible, this is likely more suited to a study with SNOMED CT as the main focus. Concerning the application analysis, it is worth noting that web results vary over time. Specific search results may vary day-to-day, and while general patterns of results over time might be expected, important details such as the presence and contents of knowledge panels may vary. Considering the time delay between dataset, website, and search engine index updates, a longitudinal study of how search results change over time for a public health topic would be more telling than the present methodology. It should also be mentioned that Microsoft Advertising has changed its “Pharmacy and Health care” policies since the initial search and investigation in the present study, indicating that health is an area of constant and rapid change in search engine advertising policy (Microsoft Advertising Policies, 2022). That search engine responses to health topics may change day-by-day further illustrates the need for greater transparency from search providers and continued research into public health information seeking.
Conclusion
Health care-related searches remain a common source of web activity for many users, and information seeking about reproductive health care has only increased in the United States in the wake of the Supreme Court’s 2022 overturning of the long-standing Roe v. Wade ruling (Blazina, 2022). As semantic resources play a more prominent role in web search, opportunities for assessing their perspectives and potential influences on end users are also increasing. In the present study, researchers examined the depiction of an important reproductive health topic, abortion, in the popular knowledgebase Wikidata. In comparing this depiction to those present in dedicated medical domain models, this study showed that Wikidata attempts to represent topics around abortion in a manner that is at once both multi-perspective and simplified. This stands in contrast to the more granular, medical event-focused depictions in domain models, which largely mirror the WHO’s (2017) positioning of abortion as a subtopic of reproductive health. While this leads to a certain degree of logical inconsistency and semantic uncertainty in Wikidata, it also enables this knowledgebase to situate reproductive health topics in a broader social and legal context than domain models are able to.
Though Wikidata’s influence on semantically supported search has been noted (Vrandečić et al., 2023), search engines’ exceptional treatment of abortion-related queries makes its influence on reproductive health information seeking less clear. Rather, the related resource Wikipedia appears to play a surprisingly strong role in search engine responses to queries concerning reproductive health. What are the implications of Wikipedia’s outsized influence over more formal models on health-related web queries? For those studying social media, user-generated content, and online collaborative work, this indicates the need for continued scrutiny over who produces health knowledge online, how it is published, and where and how it is used. Similarly, those engaged in research concerning information (and misinformation) on social and legal topics in online media should also explore the interplay of wiki platforms and web search and the implications for users. Finally, though Wikidata’s semantic statements may place abortion in a broader context that is helpful for web searching, other models may offer more precise semantics around specific conditions and procedures. As such, opportunities may exist to leverage medical domain models alongside popular knowledgebases as semantic resources, particularly if search engines are already treating certain health information requests as exceptions. Regardless, more transparency around how health information requests are treated and how semantic resources are involved in web search is needed.
The present work offers new insight into how various perspectives are incorporated into semantic models and how health and semantics interact in the current information environment. Going forward, continued attention should be paid to both how semantics around health are developed and how they are implemented in search. For the topic of reproductive health, such scrutiny is particularly important. Further examination of semantics around this area, as well as other potentially controversial topics around public health, may offer insight into how semantic media navigate diverging perspectives in attempting to provide information to end users.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
