Abstract
BACKGROUND:
The COVID-19 pandemic is a situation without a tested action plan. Rehabilitation team members have been called to duty with new responsibilities in addition to their conventional roles in the healthcare system. Infectious disease specialists are updating their knowledge base in limited time in clinical settings. The number of COVID-19 articles in PubMed is growing at an increasing rate.
OBJECTIVE:
The purpose of this study is to identify core COVID-19 articles by citation and co-citation network analysis in the PMC subset of PubMed.
METHODS:
Citation and co-citation network analysis methods were used to identify the core articles and the knowledge base.
RESULTS:
A query on COVID-19 terms retrieved 15,387 articles in PubMed. These articles formed a citation network with 6,778 articles and 25,163 PMC-PubMed citations. The main article cluster in the co-citation network consists of 2,811 articles and 78,844 co-citations.
CONCLUSIONS:
The number of COVID-19 articles in PubMed is increasing at a very high rate. Citation and co-citation network analysis are advantageous techniques for identifying the knowledge base of a scientific discipline. These techniques may help rehabilitation specialists to identify core articles efficiently.
COVID-19 outbreak as a worldwide pandemic
Human coronaviruses are enveloped RNA viruses that mostly cause mild infections [1, 2]. The SARS-CoV and MERS-CoV human coronaviruses caused epidemics with more than 10,000 cumulative cases in the past two decades (mortality rates of 10% for SARS-CoV and 37% for MERS-CoV) [2]. A series of pneumonia cases caused by a novel human coronavirus (2019-nCoV, COVID-19, SARS-CoV-2) was reported in December 2019 in Wuhan (city population 11 million), Hubei, China [1, 3]. The common feature of the first cases was the patients' connection with the Huanan seafood market, where live wild animals are sold [4]. When fewer than a thousand cases had been reported in Wuhan, the first cases outside of China were reported in Thailand, Japan, South Korea, and the USA [1, 5–8]. Early confirmed cases in Europe were reported on January 31 in Italy (Rome) and Spain (La Gomera, Canary Islands). While active cases were decreasing in China, Europe became the center of the outbreak. On March 11, 2020, 10,590 active cases were reported in Italy and 2,039 in Spain. On this date, the WHO declared the novel human coronavirus outbreak a global pandemic after 118,000 cases in over 110 countries [9].
New duties and new responsibilities of rehabilitation team members in the COVID-19 pandemic outbreak
The COVID-19 pandemic is a situation without a tested action plan. No country's health care system had sufficient capacity to deal with it. Field hospitals were built, intensive care units were expanded, new ventilators were purchased, and mass production of medical masks and protective clothing was increased in an effort to fight it. All healthcare professionals were called to duty, including retired physicians and nurses in some countries. Rehabilitation team members participated in the re-organized healthcare system with extra duties and responsibilities for COVID-19 in addition to their conventional roles.
A simple PubMed search for COVID-19 retrieves more than fifteen thousand articles. Not only the number of articles but also the publication rate increases daily. Infectious disease specialists are generally well-informed and have the experience to identify core articles. Other healthcare professionals have very limited time to update their knowledge of current publications. Citation network analysis and co-citation network analysis are advanced methods that can help to identify core articles in any scientific field.
Document citation and document co-citation networks
In scientometrics, bibliometrics, and informetrics, the article itself is the object of study. The citations and references of an article are the fundamental quantities of most study methods in these “article sciences”.
Every scholarly article is connected to previous articles by a simple citation relation. The reference list of an article represents the “important studies” in the article’s study topic. Citation network analysis identifies the “importance” of an article according to the assumption that highly-cited articles are likely to have a greater influence on the scientific literature [10, 11].
Co-citation is the simultaneous appearance of two related articles in the reference list of a third article. Co-citation network analysis may identify “more important and related articles” according to the assumption that articles with a high probability of sharing a common theme tend to form clusters around the same co-cited article pairs [12, 13]. The Jaccard Similarity Index is a similarity measure that expresses the thematic similarity of two co-cited articles as a percentage. A high co-citation frequency (edge weight) represents a high probability of relatedness, and it is directly proportional to the strength of co-citation coupling. Co-citation analysis can be used to map the knowledge base of a scientific discipline.
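The co-citation frequency described above can be illustrated with a minimal Python sketch. It is not the script used in this study; the function name and the toy reference lists are hypothetical, and the only assumption is that each citing article is available as a list of the articles it references.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of articles shares a reference list."""
    counts = Counter()
    for refs in reference_lists.values():
        # Deduplicate and sort so each unordered pair has one canonical key.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: citing article -> articles in its reference list.
reference_lists = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
}

weights = cocitation_counts(reference_lists)
for pair, weight in weights.most_common():
    print(pair, weight)
```

In this toy example the couples (A, B) and (B, C) are each co-cited twice and (A, C) once; the count plays the role of the edge weight in the co-citation network.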
PMC subset of PubMed as a non-commercial medical database
PMC is the open-access full-text collection of the National Library of Medicine (NLM), available since 2000 [14]. PMC articles may be the only source of scientific information for medical workers who do not have access to commercial databases. For this reason, bibliometric relations in PMC articles may also represent an important non-commercial foundation for clinical reasoning.
The number of journals indexed by PubMed is slightly lower than in commercial databases, so citation counts can be lower than those reported elsewhere [15].
The highest degree in the citation network and the highest edge weight in the co-citation network may represent higher importance in the COVID-19 literature.
Entrez APIs
The National Center for Biotechnology Information (NCBI) databases include the Entrez Programming Utilities (E-utilities) for developing special queries on PubMed. E-Link functions can be used to retrieve the citation relations of a PMC article in PubMed (PMC-PubMed citations). The acquired citation dataset can be enriched with the PubMed summary dataset.
There are a few packages for the R statistical programming language that provide E-Link functions for PubMed [16]. E-Link functions retrieve a list of PMC-PubMed citations in PMID-PMID format. This raw data can be processed in social network analysis software to develop document citation and co-citation networks.
Raw data on COVID-19 articles were retrieved on 25 May 2020 (query = “COVID-19” OR “SARS-CoV-2”) using an R script developed with the reutils 0.2.3 package. Network analysis was performed with Gephi 0.9.1 and the R igraph 0.7.0 package.
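As an illustration of the PMID-PMID format, the following Python sketch turns an E-Link XML response into a list of (citing, cited) pairs. It is a sketch under stated assumptions, not the R script used in this study: the sample response is abbreviated and uses made-up PMIDs, the link name `pubmed_pubmed_citedin` is assumed to be the one of interest, and each `LinkSet` is assumed to describe a single queried article.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical E-Link response; the PMIDs are invented.
SAMPLE_XML = """
<eLinkResult>
  <LinkSet>
    <DbFrom>pubmed</DbFrom>
    <IdList><Id>111</Id></IdList>
    <LinkSetDb>
      <DbTo>pubmed</DbTo>
      <LinkName>pubmed_pubmed_citedin</LinkName>
      <Link><Id>222</Id></Link>
      <Link><Id>333</Id></Link>
    </LinkSetDb>
  </LinkSet>
</eLinkResult>
"""


def parse_elink_citations(xml_text, linkname="pubmed_pubmed_citedin"):
    """Return (citing PMID, cited PMID) pairs found in an E-Link response."""
    root = ET.fromstring(xml_text)
    edges = []
    for linkset in root.iter("LinkSet"):
        # Assumption: one queried (cited) article per LinkSet.
        cited = linkset.findtext("IdList/Id")
        for linksetdb in linkset.findall("LinkSetDb"):
            if linksetdb.findtext("LinkName") != linkname:
                continue
            for link_id in linksetdb.findall("Link/Id"):
                edges.append((link_id.text, cited))
    return edges


edges = parse_elink_citations(SAMPLE_XML)
print(edges)
```

Each pair can be written out as one row of an edge list, which is the input that network analysis software expects for a directed citation graph.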
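For readers who do not use R, the two request types involved can be sketched in Python by building E-utilities URLs. This is only an illustration and not the script used here; the helper names, the `retmax` default, and the choice of the `pubmed_pubmed_citedin` link name are assumptions, and no request is actually sent.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, retmax=100, retstart=0):
    """Build an ESearch URL that returns PubMed IDs matching a query."""
    params = {"db": "pubmed", "term": term,
              "retmax": retmax, "retstart": retstart}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def elink_url(pmid, linkname="pubmed_pubmed_citedin"):
    """Build an ELink URL that asks which records cite a given PMID."""
    params = {"dbfrom": "pubmed", "db": "pubmed",
              "linkname": linkname, "id": pmid}
    return f"{EUTILS}/elink.fcgi?{urlencode(params)}"


# Query terms as given in the text, written with straight quotes.
query = '"COVID-19" OR "SARS-CoV-2"'
print(esearch_url(query))
print(elink_url("111"))  # "111" is a placeholder PMID
```

Paging through the search results with `retstart` and then requesting links for each returned PMID would yield the raw citation pairs; NCBI usage limits apply to real requests.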
COVID-19 citation network analysis
Citation network analysis helps to identify and visualize the most cited COVID-19 articles. The degree of an article is the total number of citations and references that article has. In-degree represents the number of citations, out-degree represents the number of references in the citation network (Table 1). Each circle represents a single article and each edge (connection) represents a citation in the graph (Fig. 1). The diameter of a circle represents the number of citations. There can be more than one independent cluster of articles in a network and each cluster is called a connected component.
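The degree measures and connected components described above can be computed from a plain edge list. The following Python sketch uses only the standard library and hypothetical toy edges; it is meant to show the definitions, not to reproduce the Gephi or igraph workflow of this study.

```python
from collections import Counter, defaultdict


def degree_table(edges):
    """Map each article to (in-degree, out-degree, degree)."""
    in_deg, out_deg = Counter(), Counter()
    for citing, cited in edges:
        out_deg[citing] += 1   # one more reference made
        in_deg[cited] += 1     # one more citation received
    nodes = set(in_deg) | set(out_deg)
    return {n: (in_deg[n], out_deg[n], in_deg[n] + out_deg[n]) for n in nodes}


def connected_components(edges):
    """Return clusters of articles, ignoring edge direction, largest first."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in neighbours:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Hypothetical (citing, cited) pairs forming two separate clusters.
toy_edges = [("P2", "P1"), ("P3", "P1"), ("P3", "P2"), ("Q2", "Q1")]
degrees = degree_table(toy_edges)
components = connected_components(toy_edges)
print(degrees["P1"])
print([sorted(c) for c in components])
```

Here P1 is cited twice and cites nothing, and the edge list splits into one cluster of three articles and one of two.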
Most Cited COVID-19 Articles

Fig. 1. (a) COVID-19 PMC citation network graph, (b) COVID-19 PMC co-citation network graph. Graphs are in the ForceAtlas2 layout. (a) The COVID-19 PMC citation network is a directed network with 6,650 articles (nodes) and 25,095 citations (edges). Each circle represents an article, each arc represents a citation, and the diameter of a circle represents the number of citations in the citation network. (b) The COVID-19 PMC co-citation network is an undirected network. The main article cluster (connected component) consists of 2,811 nodes and 78,844 edges. The graph is filtered to show the 19 major articles in the PMC co-citation network (co-citation degree ≥ 600). Each circle represents an article, each arc represents a co-citation relation, and the thickness of an edge represents the co-citation frequency.
The COVID-19 terms query retrieved 15,387 articles in PubMed. Almost half of the articles do not have any PMC citations and do not contribute to the citation network. The citation network consists of 6,778 articles and 25,163 PMC-PubMed citations. The largest connected component (article cluster) in the citation network consists of 6,650 (98.1%) nodes and 25,095 (99.7%) edges. The average degree is 3.774 (PMC-PubMed citations and references), the network diameter is 9, the average path length is 2.27, and the average clustering coefficient is 0.035 in the citation network. Of the retrieved articles, 9,398 (61%) have a free full text and 120 (0.8%) are systematic reviews.
The articles by Huang et al., 2020, Zhu et al., 2020, and Chen et al., 2020 are the most PMC-PubMed cited articles in the citation network. All of them give information about the clinical and epidemiological features of the first cases in China. The articles by Holshue et al., 2020 and Rothe et al., 2020 are among the twelve most PMC-PubMed cited articles, and they give information about the first cases in the USA and Germany. The fifty most cited COVID-19 articles are presented in Table 1.
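The shares and the average degree reported above can be rechecked with simple arithmetic. The Python sketch below assumes that the average degree of a directed network is the number of edges divided by the number of nodes, which is how Gephi is understood to report it; under that assumption the reported value of 3.774 matches the largest connected component rather than the whole network.

```python
def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Figures reported in the text.
nodes_all, edges_all = 6778, 25163   # whole citation network
nodes_lcc, edges_lcc = 6650, 25095   # largest connected component

print(share(nodes_lcc, nodes_all))       # 98.1
print(share(edges_lcc, edges_all))       # 99.7
print(round(edges_lcc / nodes_lcc, 3))   # 3.774
print(round(edges_all / nodes_all, 3))   # 3.712
```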
The clusters in the COVID-19 co-citation network represent possible thematic relations. Highly co-cited article couples are the source of information (knowledge-base) in the network. Circle diameter represents the number of co-citation relations and the thickness of edges (co-citation coupling frequency) represents how many times an article couple is cited by a third article in the graph (Fig. 1). The main article cluster (connected component) in the co-citation network consists of 2,811 articles and 78,844 co-citations (edges).
The article by Huang et al., 2020 is a member of the five article couples with the highest weight in the COVID-19 co-citation network. The articles by Chen et al., 2020, Zu et al., 2020, Chan et al., 2020, Li et al., 2020, and Zhou et al., 2020 are the other members of these five couples. The article by Huang et al., 2020 studied the clinical features of early cases in Wuhan and was published in the Lancet. The other five articles also address the clinical and epidemiological properties of the first cases in Wuhan. They were published in the Lancet, the New England Journal of Medicine, and Nature. These five articles are also members of other highly co-cited couples in Table 2. The details of other article couples are available in Table 1.
Most Co-cited COVID-19 Articles
In this study, the fundamental properties of the citation and co-citation networks of COVID-19 PMC articles are presented. Rehabilitation team members are familiar with most-cited article lists from the musculoskeletal rehabilitation literature. COVID-19 PMC articles are arranged around four main topics: first cases in Wuhan, clinical properties, epidemiological properties, and characteristics of the virus itself. The structural findings of the network analysis can be enriched by text-mining techniques.
Text-mining results can be improved by the opinions of expert physicians (infectious disease, genetics, pandemic, and virus specialists) and rehabilitation team members. The final results for each topic could be the subject of separate articles.
The purpose of this study is to describe the current state of a literature that grows every day. The growth of the COVID-19 literature in three months is almost equal to 20–30 years of growth in the musculoskeletal rehabilitation literature. Interpreting the findings with the local medical team, based on the team's goals, could be a more suitable approach for the details.
Co-citation network studies are rare in the medical literature, and readers might be unfamiliar with them. This method can help us to refine the findings of the most-cited article ranking. Co-citation coupling frequency (edge weight) and the Jaccard Similarity Index can help us to identify thematic relations between co-cited couples. The graph of the network can make the co-citation coupling results easier to understand.
The co-citation frequency represents the influence of two articles in the COVID-19 literature. The Jaccard Similarity Index normalizes the possible thematic relation of two articles. 18 articles and 52 most co-cited articles are presented in Table 2. The reader can clearly see that the arrangement in the most co-cited article table is slightly different from that in the most-cited article table. Readers with limited time are advised to read the most co-cited article couples first.
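The normalization mentioned above can be sketched in Python. The text does not spell out the formula, so this example assumes the usual set-based definition: the number of articles citing both members of a couple divided by the number of articles citing at least one of them. The function name and the toy citing sets are hypothetical.

```python
def jaccard_index(citers_a, citers_b):
    """Jaccard similarity of two articles based on who cites them."""
    a, b = set(citers_a), set(citers_b)
    union = a | b
    # Two articles that nobody cites are given a similarity of 0.
    return len(a & b) / len(union) if union else 0.0


# Hypothetical citing sets for two co-cited articles.
citers_of_a = {"P1", "P2", "P3", "P4"}
citers_of_b = {"P2", "P3", "P4", "P5", "P6"}

cocitation_frequency = len(citers_of_a & citers_of_b)
similarity = jaccard_index(citers_of_a, citers_of_b)
print(cocitation_frequency)            # 3
print(f"{100 * similarity:.1f}%")      # 50.0%
```

Two couples with the same co-citation frequency can therefore have different similarity values: a couple whose members are rarely cited apart scores higher than a couple formed by two widely cited articles that only occasionally appear together.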
Conclusion
The COVID-19 pandemic is a situation without an action plan tested in real conditions. Rehabilitation team members have been called to duty with new responsibilities in addition to their conventional role in the healthcare system. Infectious disease specialists are well-informed and can update their knowledge in limited time in clinical settings. The number of articles in PubMed is increasing at a very high rate. Citation and co-citation network analysis are advantageous techniques for identifying the knowledge base of a scientific discipline. These techniques may help rehabilitation specialists to identify the core articles and the knowledge base in limited time.
Conflict of interest
None to report.
