Abstract

Medical care services can be organized into a network. Understanding the structure of this network can not only help analyze common clinical protocols but also help reveal previously unknown patterns of care. The objective of this research is to introduce the concept and methods for constructing and analyzing the network of medical care services. We start by demonstrating how to build the network itself and then develop algorithms, based on principal component analysis and social network analysis, to detect communities of services. Finally, we propose novel graphical techniques for representing and assessing patterns of care. We demonstrate the application of our algorithms using data from an Emergency Department in New York State. One of the implications of our research is that clinical experts could use our algorithms to detect deviations from either existing protocols of care or administrative norms.

Keywords

medical care services, principal component analysis, social network analysis

Introduction

Medical care services (MCS) can be organized into a network that, if analyzed appropriately, can reveal common medical practices, such as clinical pathways. In this article, we introduce methods for constructing such a network and demonstrate how to explore the resulting communities to detect organized patterns of care. Clinical experts could utilize our techniques to confirm adherence to existing clinical protocols or identify atypical correlations of MCS.

A model of the delivery of MCS

Figure 1 portrays a basic model of the delivery of MCS where a physician, or a group of physicians, evaluates the patient and then recommends appropriate services.

Figure 1.

A basic model of the delivery of MCS.

From the model in Figure 1, we appreciate two possible relationships in the delivery of MCS:

Relationship 1. Physicians $x$ and $y$ are related if they evaluate or communicate about the same patient $i$ .

Relationship 2. Services $j$ and $k$ are related if they are provided to the same patient $i$ .

The existing research about the relations in the delivery of MCS has focused on Relationship 1 and only explores the network of physicians who share the same patients.¹ In this research, we propose a study of the relations in the delivery of MCS that emphasizes the network of shared services among patients, a topic that has not been considered as much.

A simulation of a small network of MCS

In Figure 2, we attempt to simulate a few scenarios of medical care delivery to demonstrate how a network of shared MCS can arise. We start by imagining a small health care clinic that only provides five services ⓐ, ⓑ, ⓒ, ⓓ, and ⓔ. In addition, we assume that there are four different patient visits. We do not make a distinction as to whether the same or a different patient visited, or whether the patient(s) saw the same or a different physician.

Figure 2.

A simulation of a small network of MCS: (a) Visit 1, services ⓐ and ⓑ delivered. (b) Visit 2, services ⓐ and ⓓ delivered. (c) Visit 3, services ⓐ, ⓑ, and ⓒ delivered. (d) Visit 4, service ⓒ delivered twice.

In Figure 2(a), a patient comes in and receives services ⓐ and ⓑ. In Figure 2(b), a patient receives services ⓐ and ⓓ. In Figure 2(c), a patient receives services ⓐ, ⓑ, and ⓔ. In Figure 2(d), a patient receives service ⓒ twice. By the fourth visit, we can already appreciate that a network has formed. Each node is a distinct type of service, and the number of links or edges between two nodes signifies how much the corresponding services are related. To analyze the network of MCS, we propose algorithms based on the social network analysis (SNA) and principal component analysis (PCA).

Literature review

SNA related to the network of providers

Landon et al.¹ apply SNA methods to study the properties of patient-sharing networks among physicians. In their model, physicians are nodes, and patients are edges. A more heavily weighted link indicates a higher number of shared patients. In a different study, Landon et al.² use a similar model to evaluate the effects of the network of physicians on patient outcomes and health care spending. Wang et al.³ also use SNA techniques to construct a collaboration network of physicians and then evaluate the network effects on costs and quality outcomes. In this study, physicians are also nodes, but edges represent patient admissions. Similar patient-sharing network models are studied by several authors.^4–6 Creswick et al.⁷ study a slightly different network of providers. In their study, the authors attempt to understand a communication network among clinical and administrative staff in an emergency department (ED). The authors find that communities are formed around different professions. However, they also notice that other communities form for other reasons such as problem-solving, medication advice, and basic socialization. Boyer et al.⁸ utilize a similar model to analyze the properties of a network of communication among health care professionals in a French public hospital. Tang and Yang⁹ study the centrality of users in an online health care social network. The authors develop a userRank algorithm to assess the influence of users based on the nature of links and the content. Meltzer et al.¹⁰ apply SNA methods to model effective quality improvement teams in health care. In this study, the authors recommend a strategic selection of team members based on their centrality in the network. For example, the authors indicate that a team member with a higher “net degree” will potentially receive more influence from outside, a desired outcome but also one that may dampen the cohesion of the team.

Poghosyan et al.¹¹ conduct a systematic review of studies that apply SNA methods to social networks in health care. One of their highlights is that professional and personal networks do have the potential to affect the delivery of medical care, such as in the adoption of new drugs. Chambers et al.¹² also conduct a systematic review of applications of SNA methods in health care. From their investigation, the authors recommend that researchers focus on SNA-interventions-based models.

SNA related to the network of MCS

There is limited literature about the applications of SNA methods to study the network of MCS. The work that we are aware of includes Carroll and Richardson,¹³ where the authors discuss the use of SNA methods to model the dynamics of health care services. Specifically, the authors emphasize the application of SNA models to evaluate a careflow network. The suggested network includes nodes of patients, carers, health care administrators, social workers, and other relevant actors. The links in the network express the exchange of information and resources among the nodes. The authors indicate that using SNA to study the structure of such a network could help reveal potential opportunities to improve patient safety and quality outcomes. Niyirora and Klimek-Yingling¹⁴ also apply SNA methods to analyze connections in health care services. The authors demonstrate how to use SNA centrality measures to identify the most important services in an ED. In this article, we extend the work of these authors by proposing SNA and PCA–based algorithms to evaluate the communities of MCS and thus help clinicians confirm adherence to existing clinical protocols or detect atypical connections.

SNA in medical care

SNA, which relates to graph theory,^15,16 is routinely applied to network problems in medical care. For example, in neuroscience, SNA is commonly utilized in the construction and analysis of graph theoretical models of the brain.¹⁷ Sanz-Arigita et al.¹⁸ and Supekar et al.¹⁹ use graph theoretical models to identify the biomarkers of Alzheimer’s disease. Similar approaches are undertaken in Bassett et al.²⁰ to analyze the biomarkers of schizophrenia. Bath et al.²¹ apply graph-theoretical models to study patterns and clusters of illness using public-health data.

PCA and other clustering techniques

PCA is a statistical technique that is often employed for purposes such as spectral decomposition, dimension reduction, factor analysis, and spectral clustering.^22,23 Several researchers have applied PCA methods to a variety of health care problems, including attempting to identify the phenotypes of acute mountain sickness.²⁴ In Arai et al.,²⁵ PCA is used to predict strains of a bacterial infection in septic patients. In Platts-Mills et al.,²⁶ PCA is used for the dimensional reduction of the data related to motor vehicle accidents in an ED. In Motin et al.,²⁷ PCA is used in the ensemble empirical mode decomposition technique to extract respiratory and heart rates from photoplethysmographic signals. In Widjaja et al.,²⁸ PCA is applied to the electrocardiogram data to obtain respiratory signals, whereas in Sharma et al.,²⁹ PCA is used to compress a similar type of data. In Charkhkar et al.,³⁰ PCA-based clustering methods are applied to diagnose acute coronary syndrome. Nilashi et al.³¹ apply soft computing techniques to cluster diabetes mellitus disease into groups. Some of the methods the authors use include expectation maximization, PCA, and support vector machine.

Besides PCA, other statistical and clustering algorithms that are commonly employed to classify health care data include k-means, hierarchical clustering methods, sphere exclusion techniques, and the lagged linear correlation method. For example, Loane et al.³² use a k-means algorithm along with hierarchical clustering methods to classify the patterns of movement in the homes of older residents. O’Neill et al.³³ also use k-means to cluster patterns of telephone calls regarding mental health. Vazquez Guillamet et al.³⁴ use a sphere exclusion technique to cluster different types of chronic obstructive pulmonary disease, given International Classification of Diseases–Ninth Revision (ICD-9) medical codes. Hripcsak and Albers³⁵ use the lagged linear correlation method to group the laboratory values and clinical concepts.

Contributions

In this research, we do not introduce any new mathematics. We build on existing PCA and SNA theories to develop algorithms to classify communities of MCS and allow for the detection of the patterns of care. Our main contributions are as follows:

We introduce an SNA-based algorithm to create a neighborhood graph of MCS and identify relevant communities of MCS.

We introduce a PCA-based algorithm to develop a loading matrix to allow for the detection of communities of MCS based on the coefficients of correlation.

To detect the pattern of care, we suggest introducing secondary nodes into the network and then inferring the context of each community based on the meaning of these new nodes.

We propose using radar charts and scatter plots to present and visualize communities of MCS.

We provide a detailed example of our methods using data from an ED in New York State.

The organization of the rest of the article follows. In the “Methods” section, we introduce our SNA and PCA algorithms. In the “Application” section, we highlight the performance of our methods using the ED data. We end the article with concluding remarks and plans for future work. In Appendix 1, there is a detailed example of how to implement the proposed algorithms.

Methods

We propose applying SNA and PCA–based algorithms to analyze the network of MCS. We begin with the construction of an adjacency matrix that we later use to create a neighborhood graph and a loading matrix. Subsequently, we attempt to find and interpret the pattern of care in the communities of the network.

Constructing an adjacency matrix

We suppose that there is a population of patient visits in a particular clinical setting. From this population, we take a representative random sample of $n$ number of visits and identify $m$ number of unique services. For each visit $i$ (for $i = 1, \dots, n$ ), if service $j$ (for $j = 1, \dots, m$ ) is present, we encode 1 in the entry $x_{i j}$ of a dataframe $X$ ; otherwise, we encode 0. In the end, $X$ becomes an indicator $n \times m$ matrix, as follows

X = \begin{pmatrix} x_{11} & x_{12} & \dots & x_{1m} \\ x_{21} & x_{22} & \dots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \dots & x_{nm} \end{pmatrix}

(1)

where for a visit $i$

x_{ij} = \begin{cases} 1 & \text{if service } j \text{ is present} \\ 0 & \text{otherwise} \end{cases}

(2)

Next, we use the indicator matrix $X$ to create a weighted adjacency matrix $W$ like this

W = X' X

(3)

where ’ signifies transpose and the expanded version of matrix $W$ looks like this

W = \begin{pmatrix} \sum_{i=1}^{n} x_{i1}^{2} & \sum_{i=1}^{n} x_{i1} x_{i2} & \dots & \sum_{i=1}^{n} x_{i1} x_{im} \\ \sum_{i=1}^{n} x_{i1} x_{i2} & \sum_{i=1}^{n} x_{i2}^{2} & \dots & \sum_{i=1}^{n} x_{i2} x_{im} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=1}^{n} x_{i1} x_{im} & \sum_{i=1}^{n} x_{i2} x_{im} & \dots & \sum_{i=1}^{n} x_{im}^{2} \end{pmatrix}

(4)

\equiv \begin{pmatrix} w_{11} & w_{12} & \dots & w_{1m} \\ w_{21} & w_{22} & \dots & w_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{m1} & w_{m2} & \dots & w_{mm} \end{pmatrix}

(5)

where the ≡ symbol implies equivalence. It follows that $W \in \mathbb{R}^{m \times m}$ is symmetric, meaning that $w_{jk} = w_{kj}$. In the field of correspondence analysis, matrix $W$ is known as the Burt matrix.³⁶ For our purposes, this matrix represents the similarity of services. The diagonal element of matrix $W$, $w_{jj}$, represents the frequency of service $j$, and the off-diagonal element, $w_{jk}$, represents the joint frequency of services $j$ and $k$, for $j \neq k$. The bigger the magnitude of $w_{jk}$, the more services $j$ and $k$ are similar in terms of being rendered together. Since service $j$ can be provided by itself, it follows that

w_{j j} \geq w_{j k}

(6)

which implies that $\sum_{j = 1}^{m} w_{jj} \geq \sum_{j = 1}^{m} w_{jk}$ for $k = 1, \dots, m$. In a related similarity graph $G$, which we define later, $w_{jk}$ is the weight of an edge between nodes $j$ and $k$. The degree of node $j$, denoted by $d_{j}$, can be obtained as

d_{j} = \sum_{k = 1}^{m} w_{j k}

(7)

for $j \neq k$ . Accordingly, the size of graph $G$ is given by

\operatorname{vol}(G) = \sum_{j = 1}^{m} d_{j}

(8)

where $\operatorname{vol}()$ symbolizes the sum of the weights for all the edges. In addition, we can characterize the size of graph $G$ in terms of the total number of nodes, as follows

| G | = m

(9)

Here, $|G|$ denotes the network measure of the number of nodes in graph $G$. As indicated earlier, $m$ is the number of variables in matrix $W$. From the spectral decomposition of matrix $W$, it follows that

\operatorname{tr}(W) = \operatorname{tr}(Λ)

(10)

where $Λ$ is a diagonal matrix corresponding to the eigenvalues of matrix $W$ and $\operatorname{tr}()$ symbolizes the trace of a matrix.
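As a minimal sketch of equations (1) through (10), the Python snippet below builds the indicator matrix and the weighted adjacency matrix for a toy clinic. The six visits, the service labels `a` to `d`, and all variable names are made up for illustration; they are not the ED data used later in this article.

```python
import numpy as np

# Hypothetical toy data: each visit is the set of services delivered.
services = ["a", "b", "c", "d"]
visits = [
    {"a", "b"},
    {"a", "b"},
    {"a", "b", "c"},
    {"c", "d"},
    {"c", "d"},
    {"a", "d"},
]

# Indicator matrix X (n visits x m services), equations (1) and (2).
X = np.array([[int(s in v) for s in services] for v in visits])

# Weighted adjacency (Burt) matrix W = X'X, equation (3).
W = X.T @ X

# Degree of each node: sum of the off-diagonal weights in its row, equation (7).
degree = W.sum(axis=1) - np.diag(W)

# Volume of the graph: sum of all node degrees, equation (8).
vol = degree.sum()

# Eigenvalues of the symmetric matrix W, used to check equation (10).
eigenvalues = np.linalg.eigvalsh(W)

print(W)
print(degree, vol)
print(np.trace(W), eigenvalues.sum())
```

In this toy example, the diagonal of `W` holds how often each service occurs, and no off-diagonal entry exceeds the diagonal entry of its row, which is consistent with equation (6).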

Searching for MCS communities using SNA concepts

From matrix $W$, we build an undirected similarity graph $G = (V, E)$, where $V$ is a set of vertices or nodes and $E$ is a set of edges. The set $V$ is made of unique MCS, whereas the set $E$ is made of the off-diagonal pairwise weights of matrix $W$. There are three basic ways to construct graph $G$. Following Von Luxburg,³⁷ the first option is to fully connect graph $G$. That is, we link node $j$ to node $k$ if $w_{jk} > 0$. The second option is to connect node $j$ to node $k$ only if node $j$ is among the nearest neighbors of node $k$ and vice versa. This second method is known as the mutual k-nearest neighbors graph. There is a third option known as the $ϵ$-neighborhood graph, where we connect node $j$ to node $k$ only if $w_{jk}$ is less than the threshold distance $ϵ$. In this article, we adopt the third option. However, since our matrix $W$ measures the frequency of services, not the distance, we propose creating an $ϵ$-neighborhood graph by connecting nodes $j$ and $k$ only if their joint frequency $w_{jk}$ is at least a fraction $ϵ$ of the frequency of node $k$. The reverse must also be true for node $j$. That is

w_{jk} = \begin{cases} w_{jk} & \text{if } \min \left( \frac{w_{jk}}{w_{jj}}, \frac{w_{jk}}{w_{kk}} \right) \geq ϵ \\ 0 & \text{otherwise} \end{cases}

(11)

where $0 < ϵ < 1$ and $\min ()$ denotes the minimum value.
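The thresholding rule in equation (11) could be sketched in Python as follows. The function name `epsilon_neighborhood`, the toy matrix `W`, and the choice $ϵ = 0.5$ are assumptions made for this illustration; the sketch also assumes that every service occurs at least once, so that no diagonal entry is zero.

```python
import numpy as np

def epsilon_neighborhood(W, eps):
    """Keep w_jk only when min(w_jk / w_jj, w_jk / w_kk) >= eps (equation (11)).

    Returns the thresholded weights with a zero diagonal, so the result can be
    read directly as the weighted adjacency matrix of the similarity graph.
    """
    W = np.asarray(W, dtype=float)
    freq = np.diag(W)                      # w_jj: frequency of each service
    ratio = W / freq[:, None]              # ratio[j, k] = w_jk / w_jj
    # Because W is symmetric, ratio.T[j, k] = w_jk / w_kk.
    keep = np.minimum(ratio, ratio.T) >= eps
    A = np.where(keep, W, 0.0)
    np.fill_diagonal(A, 0.0)               # drop self-edges
    return A

# Hypothetical 4-service weighted adjacency matrix (not the ED data).
W = np.array([
    [4, 3, 1, 1],
    [3, 3, 1, 0],
    [1, 1, 3, 2],
    [1, 0, 2, 3],
])

A = epsilon_neighborhood(W, eps=0.5)
print(A)
```

With these toy numbers, only the two strongest pairs survive the threshold; the weaker links, whose joint frequency is a third or less of either service's own frequency, are set to zero.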

We propose Algorithm 1 to find and analyze the communities of MCS using SNA techniques.

Algorithm 1. Based on SNA concepts.
Step 1. Construct an indicator matrix of patient visits $X_{n \times m}$ per equations (1) and (2).
Step 2. Create a weighted matrix $W = X' X \in \mathbb{R}^{m \times m}$, as indicated in equation (4). An off-diagonal element of this matrix, $w_{jk}$, constitutes the weight of the link between nodes $j$ and $k$ in the network.
Step 3. Construct a similarity graph $G$ by connecting nodes $j$ and $k$ only if $\min (\frac{w_{jk}}{w_{jj}}, \frac{w_{jk}}{w_{kk}}) \geq ϵ$, where $0 < ϵ < 1$.
Step 4. Apply the Modularity option in Gephi software to detect communities of MCS.
Step 5. Search for the context or pattern of care in each community using the expert’s opinion, wordclouds, or other pattern recognition techniques.

We should note that, besides the Modularity option in Gephi, several other software packages such as Cytoscape³⁸ and NetworkX³⁹ are applicable for detecting communities in a graph. For a theoretical discussion about various community detection methods, see Newman.¹⁵
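For readers who prefer a scripted workflow, Steps 2 to 4 of Algorithm 1 might be sketched with NetworkX as shown below. This is only an illustration under stated assumptions: the matrix `W` and the labels are toy values, $ϵ = 0.5$ is chosen arbitrarily, and NetworkX's `greedy_modularity_communities` is swapped in for Gephi's Modularity option. The two tools use different modularity heuristics, so they may disagree on larger graphs.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical weighted adjacency matrix W = X'X for four services.
labels = ["a", "b", "c", "d"]
W = np.array([
    [4, 3, 1, 1],
    [3, 3, 1, 0],
    [1, 1, 3, 2],
    [1, 0, 2, 3],
])
eps = 0.5
m = len(labels)

# Step 3: connect j and k only if min(w_jk/w_jj, w_jk/w_kk) >= eps.
G = nx.Graph()
G.add_nodes_from(labels)
for j in range(m):
    for k in range(j + 1, m):
        if min(W[j, k] / W[j, j], W[j, k] / W[k, k]) >= eps:
            G.add_edge(labels[j], labels[k], weight=int(W[j, k]))

# Step 4 (stand-in for Gephi): modularity-based community detection.
communities = [set(c) for c in greedy_modularity_communities(G, weight="weight")]
print(communities)
```

Step 5, the interpretation of each community, is left to expert review or text-based tools and is not covered by this sketch.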

Searching for MCS communities using PCA concepts

Besides SNA techniques, we can also apply the PCA concept to attempt to find communities or clusters of similar MCS. We follow the existing PCA theory^22,23,40 and decompose $W$ this way

W = P Λ P'

(12)

where $P = [p_{1}, p_{2}, \dots, p_{m}]$ is an $m \times m$ matrix of eigenvectors of matrix $W$. We refer to $P$ as the principal component matrix. Here, $Λ = \operatorname{diag}(λ_{1}, λ_{2}, \dots, λ_{m})$ is an $m \times m$ diagonal matrix containing the eigenvalues of matrix $W$ such that $λ_{1} \geq λ_{2} \geq \dots \geq λ_{m}$. We typically organize matrix $P$ so that column $j$ corresponds to $λ_{j}$ in matrix $Λ$, for $j = 1, \dots, m$. By right-multiplying both sides of equation (12) by $P$, we obtain

W P = P Λ

(13)

since $P' P$ equals the identity matrix $I$ and $Λ I = Λ$. We refer to $W P$ as the principal component score matrix. We refer to the entries of matrix $W P$ as “new variables” and the entries of matrix $W$ as “old variables.” The correlation between new and old variables yields a “loading” matrix $R$ that we use to detect communities of MCS. We construct matrix $R$ as follows

R = ρ (W, W P)

(14)

where $ρ ()$ signifies the Pearson correlation function. The coefficients of matrix $R$ are called “loadings,” and they indicate how influential old variables were in the creation of the principal components of the new variables. In this research, we consider a loading significant when its absolute value is greater than the given cutoff threshold of 0.5, a practice that is consistent with what is typically done in factor analysis.²³ We propose Algorithm 2 to find and analyze the communities of MCS using PCA-based concepts.

Algorithm 2. Based on PCA concepts.
Step 1. Construct an indicator matrix of patient visits $X_{n \times m}$ per equations (1) and (2).
Step 2. Create a weighted matrix $W = X' X \in \mathbb{R}^{m \times m}$, as indicated in equation (4). An off-diagonal element of this matrix, $w_{jk}$, constitutes the weight of the link between nodes $j$ and $k$ in the network.
Step 3. Conduct the spectral decomposition of the weighted matrix so that $W = P Λ P'$.
Step 4. Arrange the columns of matrix $P$ from the largest to the smallest component, following the descending order of the corresponding eigenvalues.
Step 5. Create a principal component score matrix $W P$.
Step 6. Create the loading matrix by $R = ρ (W, W P)$.
Step 7. Visualize communities of services using scatter plots and radar charts.
Step 8. Search for the context or pattern of care in each community using the expert’s opinion, wordclouds, or topic modeling techniques.
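Steps 3 to 6 of Algorithm 2 could be sketched with NumPy as follows. The matrix `W` is a toy example, and the reading of $ρ (W, W P)$ as the Pearson correlation between each column of $W$ and each column of $W P$ is an assumption of this sketch. If an eigenvalue were zero, the corresponding score column would be constant and its correlations undefined; the toy matrix avoids that case.

```python
import numpy as np

# Hypothetical weighted adjacency matrix W = X'X for four services.
W = np.array([
    [4, 3, 1, 1],
    [3, 3, 1, 0],
    [1, 1, 3, 2],
    [1, 0, 2, 3],
], dtype=float)
m = W.shape[0]

# Step 3: spectral decomposition of the symmetric matrix W.
# np.linalg.eigh returns the eigenvalues in ascending order.
eigvals, P = np.linalg.eigh(W)

# Step 4: reorder eigenvalues and eigenvectors in descending order.
order = np.argsort(eigvals)[::-1]
eigvals, P = eigvals[order], P[:, order]

# Step 5: principal component score matrix WP (equal to P times Lambda).
scores = W @ P

# Step 6: loading matrix. R[j, c] is the Pearson correlation between
# column j of W (old variable) and column c of WP (new variable).
R = np.corrcoef(W, scores, rowvar=False)[:m, m:]

# Components with at least one loading above the 0.5 cutoff in absolute value.
keep = (np.abs(R) > 0.5).any(axis=0)

print(np.round(eigvals, 3))
print(np.round(R, 2))
print(keep)
```

The sign of each eigenvector returned by an eigensolver is arbitrary, so the signs of whole columns of `R` may flip between software packages without changing which services load together.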

Searching for the pattern of care

To search for the pattern of care in a given community of MCS, we can request an expert’s opinion or employ topic modeling techniques such as the latent Dirichlet allocation.⁴¹ To get started, we suggest utilizing simple tools like a word cloud⁴² (e.g. see https://www.wordclouds.com/ or https://pypi.org/project/wordcloud/) to help contextualize the distribution of words in the descriptions of MCS. As we will demonstrate later, the most frequent words tend to reveal the pattern of care. Another simple technique that we could employ involves introducing auxiliary nodes into the network (e.g. medical diagnoses) and then inferring the pattern of care based on the meaning of these new nodes. One other benefit of auxiliary nodes is that we may be able to uncover transitive relationships that were not previously noticeable. For example, in the original network, we expect services $j$ and $k$ to belong in the same community only if $\min (w_{j k} / w_{j j}, w_{j k} / w_{k k}) \geq ϵ$ (see Algorithm 1). However, after adding secondary nodes, $\min (w_{j k} / w_{j j}, w_{j k} / w_{k k})$ may be less than $ϵ$, yet services $j$ and $k$ could still belong in the same community through the transitive relationship with service $ℓ$ when the following two conditions are met:

$\min (w_{j ℓ} / w_{j j}, w_{j ℓ} / w_{ℓ ℓ}) \geq ϵ$ .

$\min (w_{k ℓ} / w_{k k}, w_{k ℓ} / w_{ℓ ℓ}) \geq ϵ$ .
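The two conditions above can be checked numerically on a small made-up example. In the sketch below, the five visits, the column order (service $j$, service $k$, auxiliary node $ℓ$), the helper name `linked`, and $ϵ = 0.5$ are all assumptions for illustration. The code only tests the two conditions; whether $j$ and $k$ actually end up in the same community still depends on the community detection method applied afterward.

```python
import numpy as np

# Hypothetical toy data: columns are service j, service k, and an
# auxiliary diagnosis node (written l in the text); rows are visits.
X = np.array([
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 1],
    [0, 1, 1],
    [1, 1, 1],
])
W = X.T @ X
eps = 0.5

def linked(a, b):
    """True when nodes a and b pass the epsilon test of equation (11)."""
    return bool(min(W[a, b] / W[a, a], W[a, b] / W[b, b]) >= eps)

j, k, aux = 0, 1, 2

direct = linked(j, k)                            # j and k on their own
via_aux = linked(j, aux) and linked(k, aux)      # both tied to the auxiliary node

print(direct, via_aux)
```

Here services $j$ and $k$ are rarely delivered together, so they are not linked directly, yet each of them is linked to the shared diagnosis node.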

Visualizing the community

In this article, we propose using radar charts to visualize communities of services in matrix $R$. We also propose visualizing communities using a scatter diagram of the first two components of matrix $R$. From our experiments, we have observed the following:

Scatter plots tend to show more distinct communities when matrix $R$ is created from a normalized matrix $W P$ , meaning that each component is a unit vector.

The values of the first eigenvector of matrix $P$ are all negative. The sorted values of this vector could be used to rank the centrality of MCS.

The second eigenvector of matrix $P$ behaves like the Fiedler vector in graph theory⁴³ and could be used to partition the MCS network into at least two distinct communities.
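The last two observations could be explored on toy data with the sketch below. The matrix `W` and the service labels are hypothetical. Because the overall sign of an eigenvector returned by a numerical solver is arbitrary, the sketch ranks services by the absolute value of the first eigenvector rather than relying on its entries being negative, and it splits services by the sign of the second eigenvector.

```python
import numpy as np

# Hypothetical weighted adjacency matrix W = X'X for four services.
services = ["a", "b", "c", "d"]
W = np.array([
    [4, 3, 1, 1],
    [3, 3, 1, 0],
    [1, 1, 3, 2],
    [1, 0, 2, 3],
], dtype=float)

# Eigenvectors of W, reordered so that column 0 has the largest eigenvalue.
eigvals, P = np.linalg.eigh(W)
order = np.argsort(eigvals)[::-1]
P = P[:, order]
first, second = P[:, 0], P[:, 1]

# Centrality ranking: larger magnitude in the first eigenvector ranks higher.
ranking = [services[i] for i in np.argsort(-np.abs(first))]

# Two-way partition by the sign of the second eigenvector.
group_pos = {s for s, v in zip(services, second) if v >= 0}
group_neg = set(services) - group_pos

print(ranking)
print(group_pos, group_neg)
```

Which of the two groups receives the positive sign can change from one run environment to another; only the split itself is meaningful.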

We will demonstrate how to apply these visualization tools in the “Application” section and Appendix 1 of this article.

Other useful techniques for classifying MCS

Other useful techniques for classifying the communities of MCS include the centroid method (a hierarchical clustering technique) and k-means (a non-hierarchical clustering technique). The k-means method is particularly appealing given its tendency to work well in many applications. However, in several experiments that we ran, we found that k-means did not work as well, especially when the a priori number of clusters $k$ exceeded the actual number of clusters in the data. One reason for this poor performance in the classification of MCS is that our matrix $W$ represents the frequency of services (with self-edges allowed), not the typical Euclidean distance used in traditional clustering techniques. Nevertheless, we have found that spectral clustering algorithms that incorporate the k-means technique^37,44 tend to provide reasonable clusters when the initial number of communities $k$ is carefully chosen. We show the comparison of the results in Appendix 1 of this article.

Application

In this section, we demonstrate how to apply our proposed network algorithms using the data from an ED of a hospital in New York State. We show more details in Appendix 1.

Our data

We obtained a sample of de-identified patient visits $(n = 1931)$ with distinct MCS $(m = 264)$ coded using the Current Procedural Terminology (CPT) nomenclature system.⁴⁵ We also received related medical diagnoses coded using the International Classification of Diseases–Ninth Revision–Clinical Modification (ICD-9-CM) coding system. In the United States, all hospitals that treat Medicare and Medicaid patients are mandated to use CPT codes to report outpatient procedures and services.⁴⁶ As of 1 October 2015, the classification system used to code medical diagnoses is the International Classification of Diseases–Tenth Revision–Clinical Modification (ICD-10-CM).⁴⁷

Nodes of the network

We use CPT codes as the main nodes in the network and ICD-9-CM codes as the secondary nodes that we employ to infer the pattern of care. For the reader not familiar with CPT, this coding system is arranged into distinct sections of medical services by physician specialties, body system, or other well-defined criteria. Within each section, codes are arranged in numerical or alphanumerical order. A valid CPT code is five characters long.⁴⁵ Typically, the first two characters of such a code are enough to reveal a broader meaning of the type of service that was provided. To minimize the sparsity and noise in the data, we only use the first two characters of a CPT code to label nodes in our network. For example, CPT code $21010$ (Arthrotomy, temporomandibular joint) and CPT code $21920$ (Biopsy, soft tissue of back, superficial) relate to a single node labeled $21$ . This node broadly indicates “surgery on the musculoskeletal system.” We apply the same reasoning to ICD-9-CM codes and only use the first character of a code to label secondary nodes in the network. Normally, a valid ICD-9-CM code is between three and five characters long,⁴⁷ but the first character tends to unveil the general diagnosis category. For example, an ICD-9-CM code that starts with number 8 always relates to some type of physical injury. So, we will group all ICD-9-CM codes that start with number $8$ under a single secondary node labeled $8$ , and whenever we see this node in a given community of MCS, we will infer that medical services in this community somehow relate to injury.
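The labeling rule described above is simple enough to express in a few lines of Python. The helper names `cpt_node` and `icd9_node` are hypothetical, and the two diagnosis codes are illustrative sample strings rather than records from our data.

```python
def cpt_node(code: str) -> str:
    """Primary node label: the first two characters of a CPT code."""
    return code.strip()[:2]

def icd9_node(code: str) -> str:
    """Secondary node label: the first character of an ICD-9-CM code."""
    return code.strip().upper()[:1]

cpt_codes = ["21010", "21920"]      # the two CPT examples given in the text
icd9_codes = ["850.0", "E880.9"]    # illustrative sample diagnosis codes

cpt_labels = [cpt_node(c) for c in cpt_codes]
icd9_labels = [icd9_node(c) for c in icd9_codes]

print(cpt_labels)
print(icd9_labels)
```

Both CPT examples collapse to the same primary node, which is how the truncation reduces sparsity in the indicator matrix.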

Creating an indicator matrix $X$

We create the indicator matrix $X$ , as indicated in equations (1) and (2) and exemplified in Table 1.

Table 1.

The setup of matrix $X$ .

Visit no.	10	11	. . .	9	E
1	0	1	. . .	0	0
2	0	0	. . .	1	0
⋮	⋮	⋮	⋮	⋮	⋮
1931	1	0	. . .	0	1

The first column of Table 1 identifies the patient visit number. The rest of the columns represent the nodes in the network. The two-digit labels in the heading of this table relate to CPT codes, and they represent primary nodes in the network. The single-character labels in the heading relate to ICD-9-CM codes, and they represent secondary nodes in the network.

Creating matrices $W$ , $W P$ , and $R$

We create matrix $W$ per equation (4), and decompose it per equation (12). In addition, matrices $W P$ and $R$ follow from equations (13) and (14), respectively. To perform the required matrix operations, we employ the SciPy library in Python.⁴⁸ In Appendix 1, we show examples of matrices $W$, $P$, $W P$, and $R$ from a smaller network of MCS, when only the first character, instead of the first two characters, of a CPT code is considered.

We search for the context of a given community by evaluating the pattern of words in the descriptions of CPT and ICD-9-CM labels. Whenever possible, we suggest that clinical experts be involved in helping identify unusual communities of services or those communities that relate to known medical protocols. In this article, we relied on our background in health information management to describe communities resulting from our analysis.

Results obtained by applying Algorithm 1

In Figure 3, we present an $ϵ$-neighborhood graph that we created by applying Algorithm 1 with $ϵ = 0.05$. In this figure, we only considered CPT nodes. In Figure 4, we added ICD-9-CM nodes, which slightly changed the configuration of the graph but helps identify the pattern of care. In this example, we used the Modularity option in Gephi software⁴⁹ to detect communities of MCS.

Figure 3.

A network of MCS when only CPT nodes are considered.

Figure 4.

A network of MCS when both CPT and ICD-9-CM nodes are used. CPT nodes, labeled using two numerical digits, represent MCS. ICD-9-CM nodes, labeled using a single alphanumeric character, broadly represent medical diagnoses under treatment.

In both Figures 3 and 4, a different color identifies a different community. The bigger the node, the more frequent the label. Also, the thicker the edge, the more related the linked nodes are. To interpret each community, we apply the Wordcloud method⁴² on the descriptions of both the CPT and ICD-9-CM nodes, as exemplified in Figure 5. In this figure, we attempt to identify the pattern of care in the communities portrayed in Figure 3.

Figure 5.

Pattern of care related to the network in Figure 3—(a) {36, 80, 81, . . .}: Lab tests. (b) {73, 12, 29, . . .}: Care of injuries. (c) {43, 49, 88}: Gastrointestinal care. (d) {99, 27, 23}: Care of shoulder conditions. (e) {50, 75}: Nephrostomy care. (f) {31, 92}: Care of cardiac arrest.

To interpret the communities in Figure 4, we can again apply the Wordcloud technique. However, we also have the option of inferring the context of each community using the description of the related ICD-9-CM nodes. Hence, we can deduce that the largest community in this figure likely relates to MCS for medical or non-injury conditions given the context of ICD-9-CM nodes in this community (e.g. labels 2, 3, 4, 5, and 7). For example, node $7$ relates to the signs and symptoms, meaning that the associated CPT nodes likely involve MCS to diagnose the underlying medical conditions (e.g. MCS for blood work). The second-largest community in Figure 4 likely relates to MCS for treating injuries, since most of the ICD-9-CM codes that start with 8, 9, and E tend to characterize injuries. The community that includes node 6 likely relates to MCS for attending to pregnancy conditions, given that several ICD-9-CM codes that start with number 6 tend to be associated with pregnancy diagnoses. Our conclusion here is supported by the fact that this community includes CPT nodes 76 and 86, which do indeed involve pregnancy services.

Smaller communities in Figure 4, with two or three linked nodes, do not have any ICD-9-CM nodes associated with them. So, we can apply the Wordcloud technique to the descriptions of the related CPT nodes to deduce the pattern of care. The unlinked CPT nodes (i.e. 10, 25, 26, 28, 30, etc.) do not exhibit any pattern of care. We consider them rare MCS.
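The idea behind the Wordcloud step can be approximated with a plain word count, as in the sketch below. This is a simplified stand-in for a word cloud tool, not the tool itself: the four descriptions are made-up phrases rather than official CPT descriptors, and the stop-word list and the function name `word_frequencies` are assumptions.

```python
import re
from collections import Counter

# Made-up service descriptions for one hypothetical community.
descriptions = [
    "Radiologic examination, shoulder",
    "Closed treatment of shoulder dislocation",
    "Emergency department visit",
    "Application of shoulder sling",
]

# Small illustrative stop-word list; a real analysis would need a fuller one.
STOPWORDS = {"of", "the", "and", "for", "with"}

def word_frequencies(texts):
    """Count lower-cased words across all texts, skipping stop words."""
    words = []
    for text in texts:
        words += [w for w in re.findall(r"[a-z]+", text.lower())
                  if w not in STOPWORDS]
    return Counter(words)

freq = word_frequencies(descriptions)
top_word, top_count = freq.most_common(1)[0]
print(top_word, top_count)
```

In this toy community, the most frequent word points to shoulder care, which mirrors how the dominant words in a word cloud suggest the pattern of care.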

Results obtained by applying Algorithm 2

In this section, we present results from our PCA-based algorithm. In typical applications of PCA, a scatter plot of the components of the principal score matrix $W P$ is created to detect related points. The general conclusion is that points that are clustered together or load on the same component are related.²³ We show an example of such scatter plots in Appendix 1. Here, we present scatter plots from the components of matrix $R$, as illustrated in Figures 6 and 7. From both of these figures, we appreciate distinct communities of MCS, especially by the quadrant of location. The more proximal the points, the more they are related, which is why some points overlap in these figures. For example, quadrant I of Figure 6 contains a community of MCS given by $\{12, 26, 25, 29, 27, 28, 90, 8\}$. Given that this community includes node 8, we conclude that the pattern of care relates to the treatment of injuries, since all ICD-9-CM codes that start with number 8 tend to represent injuries. We were able to validate this conclusion using the Wordcloud technique. In quadrant II of the same figure, we observe two potential communities of $\{9, 23, E, 99, 73, 57\}$ and $\{70, 72, 75, 97\}$. Both of these communities relate to diagnostic MCS for injuries. In quadrant III of the same figure, we observe a large community $\{50, 55, 56, \dots\}$ that relates to the diagnostic and therapeutic MCS for non-injury conditions. The small community located in quadrant IV of this figure, $\{43, 49\}$, relates to gastrointestinal care.

Figure 6.

A scatter plot of the normalized PC1 and PC2 of matrix $R$, where matrix $R$ was created using the normalized matrix $W P$. Both the CPT and ICD-9-CM nodes are included.

Figure 7.

A scatter plot of the normalized PC1 and PC2 of matrix $R$, where matrix $R$ was created using the normalized matrix $W P$. Only CPT nodes are considered.

A similar analysis can be carried out to detect the pattern of care in the communities of Figure 7.

In our experience, scatter plots from matrix $R$ tend to display more compact communities than their counterparts from matrix $W P$. We recommend normalizing matrix $W P$ before creating matrix $R$, so that each component has unit length. In addition, normalizing the rows of PC1 and PC2 before creating a scatter plot may help to better isolate distinct communities. This idea of normalization is also often applied in spectral clustering algorithms, where the principal components are first normalized before employing the k-means procedure.⁴⁴
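The two normalization steps can be illustrated with a minimal sketch. It assumes that "unit length" means unit Euclidean norm and that a matrix is stored as a list of rows; the function names and the numbers are hypothetical and are not taken from our data.

```python
import math


def normalize_columns(matrix):
    """Scale each column of a row-major matrix to unit Euclidean length."""
    n_cols = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    return [
        [row[j] / norms[j] if norms[j] else 0.0 for j in range(n_cols)]
        for row in matrix
    ]


def normalize_rows(matrix):
    """Scale each row (e.g. a node's PC1 and PC2 values) to unit Euclidean length."""
    result = []
    for row in matrix:
        norm = math.sqrt(sum(value ** 2 for value in row))
        result.append([value / norm if norm else 0.0 for value in row])
    return result


# Hypothetical 2 x 2 examples for illustration only.
wp = [[3.0, 0.0], [4.0, 2.0]]
unit_columns = normalize_columns(wp)   # each column now has length 1

pc12 = [[3.0, 4.0], [-1.0, 0.0]]
unit_rows = normalize_rows(pc12)       # each point now lies on the unit circle
```

After row normalization every point lies on the unit circle, so only the direction of each node in the PC1 and PC2 plane separates the communities.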

In addition to detecting communities of MCS using scatter plots, we also suggest employing radar charts. To demonstrate this idea, we have plotted a few components of matrix $R$ in Figures 8 and 9. The plot in Figure 8 only incorporates CPT nodes, whereas the plot in Figure 9 includes both the CPT and ICD-9-CM nodes. We have adopted generic names for the components, $PC1$, $PC2$, . . . , to indicate that $PC1$ relates to the latent root $λ_{1}$, $PC2$ relates to the latent root $λ_{2}$, and so on. However, these generic names do not carry any meaning beyond the radar chart where they are plotted. For example, $PC1$ in Figure 8 has nothing to do with $PC1$ in Figure 9, as the related components were derived using different data sets. In addition, we should note that all the components that we plotted satisfy the cutoff threshold of 0.5, as is common practice in factor analysis.²³ For example, $PC2$ and $PC3$ were not plotted in Figure 8 since they did not satisfy the indicated cutoff.
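The cutoff rule for deciding which components to plot can be sketched as follows. The sketch assumes that the loadings are available as a dictionary keyed by component name; the function name and all values are hypothetical.

```python
def qualifying_components(loadings, cutoff=0.5):
    """Return the components with at least one absolute loading >= cutoff.

    loadings maps a component name to a dict of {node label: loading}.
    """
    return [
        name
        for name, column in loadings.items()
        if any(abs(value) >= cutoff for value in column.values())
    ]


# Hypothetical loadings for illustration only.
loadings = {
    "PC1": {"80": 0.9, "85": 0.7},
    "PC2": {"80": 0.3, "85": -0.4},   # no loading reaches the cutoff
    "PC3": {"12": -0.6, "90": 0.2},   # qualifies through a negative loading
}
kept = qualifying_components(loadings)
```

Only the components returned by this filter would be drawn on the radar chart.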

Figure 8.

Radar charts portraying a few components of matrix $R$ where at least one absolute loading is greater than or equal to the cutoff threshold of 0.5. Only CPT nodes are used.

Figure 9.

Radar charts portraying a few components of matrix $R$ where at least one absolute loading is greater than or equal to the cutoff threshold of 0.5. Both the CPT and ICD-9-CM nodes are used.

To visualize the communities in the radar charts, we color-code the components. Each distinct color represents a particular community. When the components overlap, the highest loading takes precedence. The pattern of care is interpreted as before. For example, regarding Figure 8, after applying the Wordcloud technique to the descriptions of the nodes of PC1, we deduce that this community incorporates diagnostic MCS for labs. Likewise, after applying the same technique, we conclude that PC11 relates to MCS for lacerations. This conclusion is supported by the fact that node 12 relates to suture repair and node 90 relates to prophylactic immunization. Both PC4 and PC6 in Figure 8 relate to care for musculoskeletal injuries, whereas PC19 relates to MCS for gastrointestinal conditions. Other components of Figure 8 can be interpreted using the same approach.
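The precedence rule for overlapping components can be sketched as follows. The sketch assumes that "highest loading" is measured in absolute value, which matches the cutoff used in the figure captions but is an assumption here, and that nodes whose loadings all fall below the cutoff are left unassigned. The function name and the loadings are hypothetical.

```python
def assign_communities(loadings, cutoff=0.5):
    """Assign each node to the component on which it loads most strongly.

    loadings maps a component name to a dict of {node label: loading}.
    Nodes with no absolute loading >= cutoff are not assigned.
    """
    best = {}  # node label -> (component name, absolute loading)
    for component, column in loadings.items():
        for node, value in column.items():
            strength = abs(value)
            if strength >= cutoff and (node not in best or strength > best[node][1]):
                best[node] = (component, strength)

    communities = {}
    for node, (component, _) in best.items():
        communities.setdefault(component, []).append(node)
    return communities


# Hypothetical loadings for illustration only.
loadings = {
    "PC1": {"12": 0.55, "80": 0.9, "90": 0.1},
    "PC11": {"12": 0.8, "90": -0.7, "80": 0.2},
}
communities = assign_communities(loadings)
# Node 12 qualifies on both components and goes to PC11, where it loads more strongly.
```

The resulting dictionary can also be printed as a table or list when the network has too many nodes to read from a chart.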

Regarding Figure 9, we can conclude that PC1 incorporates MCS for treating or diagnosing injuries, since attributes 8, 9, and E tend to be the first characters of ICD-9-CM codes that represent injuries. Likewise, given that attributes V, 1, 2, 3, 4, 5, 6, and 7 tend to be the first characters of ICD-9-CM codes for medical conditions, we can deduce that PC2 relates to the diagnostic or therapeutic MCS for medical conditions.
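The first-character convention used in this interpretation can be written as a small lookup. This sketch only encodes the tendencies stated above; the function name and the group labels are hypothetical, the example codes are illustrative strings, and any first character not mentioned above is left unclassified.

```python
INJURY_PREFIXES = set("89E")
MEDICAL_PREFIXES = set("V1234567")


def icd9_group(code):
    """Classify an ICD-9-CM code by its first character, following the text above."""
    first = code.strip().upper()[:1]
    if first in INJURY_PREFIXES:
        return "injury"
    if first in MEDICAL_PREFIXES:
        return "medical condition"
    return "unclassified"


# Illustrative code strings only.
examples = {code: icd9_group(code) for code in ["812.00", "E880", "V58.1", "250.00", "0"]}
```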

We want to emphasize that the components of Figure 9 (e.g. PC1) are not related in any way to the components in Figure 8 (e.g. PC1), since different data sets were used to derive them. They simply share the same generic names, as discussed earlier.

Remark 1

Unlike traditional scatter diagrams, where only two components are typically visualized at a time, radar charts allow for the visualization of several components at once. Indeed, we can plot all qualifying components of matrix $R$ to attempt to detect all potential communities in the network. For a network with a large number of nodes, we can alternatively enumerate communities in a table or list format, as we demonstrate in Appendix 1, Table 6.

Practical implications of our research

Our research builds on existing SNA and PCA mathematical theories and introduces new algorithms to help health care practitioners visualize communities of MCS and assess the pattern of care. One practical implication is that, by knowing typical patterns of care, health care managers may be able to better coordinate and schedule resources to provide needed services. This effort may contribute to the efficiency of medical care delivery. Another practical implication is that clinical experts may be able to confirm adherence to existing protocols of medical care. For example, if services $a$ and $b$ are supposed to be provided to patients with condition $x$, but the analysis does not reveal any relationship between $a$, $b$, and $x$, then it is likely that the protocol is not being followed. Likewise, clinical experts may be able to detect unusual clusters of services, which may hint at possible deviations from clinical standards or administrative practices. Regardless of the potential cause, all suspicious-looking communities should be investigated. Another practical implication of our research relates to epidemiological studies. Indeed, one can use our algorithms to build a network of signs and symptoms for various medical conditions. Furthermore, patient demographics and geographic information could be added to the network, as secondary nodes, to help researchers assess the pattern of an epidemic under study.
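The protocol check described above can be sketched as a simple membership test. The sketch assumes that the detected communities are available as a dictionary of node labels; the function name, the community names, and the labels are hypothetical. A negative result is a prompt for expert review rather than proof that a protocol was not followed.

```python
def protocol_supported(communities, services, condition):
    """Return True if all protocol services and the condition share one community.

    communities maps a community name to an iterable of node labels.
    """
    required = set(services) | {condition}
    return any(required <= set(members) for members in communities.values())


# Hypothetical communities for illustration only.
communities = {
    "C1": ["a", "b", "x"],
    "C2": ["c", "y"],
}
followed = protocol_supported(communities, ["a", "b"], "x")
not_found = protocol_supported(communities, ["a", "c"], "x")
```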

Besides the application in health care settings, our algorithms could also be applied to the creation and analysis of any network using an indicator or incidence matrix of interest. We particularly believe that our approach of generating matrix $R$ and applying radar charts to display the results is a useful new tool for SNA.

Limitations and future research

Our research was limited to a network of MCS from a single medical facility. As a result, we are unable to generalize our findings regarding particular communities of services. For future research, we would like to extend our work to multiple facilities and assess similarities in the clinical protocols. In addition, we intend to evaluate the effects of the communities of MCS on quality outcomes and medical costs. On the technical level, we plan to investigate the parity between Algorithm 2 and spectral clustering algorithms that use k-means. Our preliminary results indicate that the number of communities from Algorithm 2 could serve as the input number of clusters, k, in spectral clustering algorithms.

For researchers interested in expanding this work, a convenient problem to work on is how converting adjacency matrices of MCS into distance matrices affects relationships in the network. From the experiments that we ran, with the limited data available to us, we have found that relationships do change, but we have not yet established the theoretical reasons or assessed the practical implications of the resulting communities.

Conclusion

In this research, we introduced techniques for constructing a network of MCS by creating an indicator matrix and then converting it into a weighted adjacency matrix. From there, we developed algorithms to generate a neighborhood graph and a loading matrix to allow for the detection of communities of services. To interpret the context of each community, we proposed applying a Wordcloud technique to find the pattern of care in the descriptions of MCS in the network. Another pattern detection technique that we suggested was adding secondary nodes into the network and then conjecturing the meaning of a community from the descriptions of these new nodes. We demonstrated how to implement our algorithms using data from an ED in New York State. From this ED example, we showed how to generate and visualize communities of services using radar and scatter diagrams. The findings from this study indicate that our algorithms could potentially help clinical experts detect deviations from either existing protocols of care or administrative norms.

Footnotes

Appendix 1

In this appendix, we present a detailed example of how to apply our proposed SNA and PCA algorithms. We still use the same data from an ED of a hospital in New York State, but this time consider a smaller network to generate matrices that can fit on a single page. Accordingly, we only select the first character of CPT codes to label main nodes. As before, we select the first character of ICD-9-CM codes to label secondary nodes. We pad CPT nodes with zeros to distinguish them from ICD-9-CM nodes. For example, the first character of a CPT code 21010 is 2, and we pad it as 20000, so it is distinguishable from 2, a label that we will use to represent all ICD-9-CM codes that begin with the number 2. We proceed by following these steps:

From this matrix $P$, we can derive the results in Figures 10 and 11.
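The node-labeling convention described at the start of this appendix can be sketched as follows. The sketch assumes that a CPT label is padded with zeros to the length of the original code, as in the 21010 to 20000 example; the function names are hypothetical, and the ICD-9-CM string is illustrative.

```python
def cpt_node_label(cpt_code):
    """Keep the first character of a CPT code and pad the rest with zeros."""
    code = cpt_code.strip()
    return code[0] + "0" * (len(code) - 1)


def icd9_node_label(icd_code):
    """Use the first character of an ICD-9-CM code as the secondary node label."""
    return icd_code.strip().upper()[0]


main_node = cpt_node_label("21010")          # "20000"
secondary_node = icd9_node_label("250.00")   # "2"
```

The padding keeps the main node 20000 distinct from the secondary node 2, even though both derive from the same first character.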

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Jerome Niyirora

References

1. Landon, Keating, Barnett, et al. Variation in patient-sharing networks of physicians across the United States. JAMA 2012; 308(3): 265–273.

2. Landon, Keating, Onnela, et al. Patient-sharing networks of physicians and health care utilization and spending among Medicare beneficiaries. JAMA Intern Med 2018; 178(1): 66–73.

3. Wang, Srinivasan, Uddin, et al. Application of network analysis on healthcare. In: Proceedings of the 2014 IEEE/ACM international conference on advances in social networks analysis and mining, Beijing, China, 17–20 August 2014, pp. 596–603. New York: IEEE.

4. Evan Pollack, Wang, Bekelman, et al. Physician social networks and variation in rates of complications after radical prostatectomy. Value Health 2014; 17(5): 611–618.

5. Srinivasan, Arunasalam. Leveraging big data analytics to reduce health care costs. IT Prof 2013; 15(6): 21–28.

6. Barnett, Christakis, O’Malley, et al. Physician patient-sharing networks and the cost and intensity of care in US hospitals. Med Care 2012; 50(2): 152.

7. Creswick, Westbrook, Braithwaite. Understanding communication networks in the emergency department. BMC Health Serv Res 2009; 9: 247.

8. Boyer, Belzeaux, Maurel, et al. A social network analysis of health care professional relationships in a French hospital. Int J Health Care Qual Assur 2010; 23(5): 460–469.

9. Tang, Yang CC. Identifying influential users in an online healthcare social network. In: Proceedings of the 2010 IEEE international conference on intelligence and security informatics, Vancouver, BC, Canada, 23–26 May 2010, pp. 43–48. New York: IEEE.

10. Meltzer, Chung, Khalili, et al. Exploring the use of social network methods in designing health care quality improvement teams. Soc Sci Med 2010; 71(6): 1119–1130.

11. Poghosyan, Lucero, Knutson ARW, et al. Social networks in health care teams: evidence from the United States. J Health Organ Manag 2016; 30(7): 1119–1139.

12. Chambers, Wilson, Thompson, et al. Social network analysis in health care settings: a systematic scoping review. PLoS ONE 2012; 7(8): e41911.

13. Carroll, Richardson. Mapping a careflow network to assess the connectedness of connected health. Health Informatics J 2019; 25(1): 106–125.

14. Niyirora, Klimek-Yingling. Using social network analysis to identify the most central services in an emergency department. Health Syst 2016; 5(1): 29–42.

15. Newman. Networks: an introduction. Oxford: Oxford University Press, 2010.

16. Wasserman, Faust. Social network analysis: methods and applications, vol. 8. Cambridge: Cambridge University Press, 1994.

17. Bullmore, Sporns. Complex brain networks: graph theoretical analysis of structural and functional systems. Nat Rev Neurosci 2009; 10(3): 186.

18. Sanz-Arigita, Schoonheim, Damoiseaux, et al. Loss of “small-world” networks in Alzheimer’s disease: graph analysis of FMRI resting-state functional connectivity. PLoS ONE 2010; 5(11): e13788.

19. Supekar, Menon, Rubin, et al. Network analysis of intrinsic functional brain connectivity in Alzheimer’s disease. PLoS Comput Biol 2008; 4(6): e1000100.

20. Bassett, Bullmore, Verchinski, et al. Hierarchical organization of human cortical networks in health and schizophrenia. J Neurosci 2008; 28(37): 9239–9248.

21. Bath, Craigs, Maheswaran, et al. Validation of graph-theoretical methods for pattern identification in public health datasets. Health Informatics J 2002; 8(4): 167–173.

22. Jolliffe. Principal component analysis. New York: Springer, 2002.

23. Sharma. Applied multivariate techniques. New York: John Wiley, 1996.

24. Bian, Jin, Zhang, et al. Principal component analysis and risk factors for acute mountain sickness upon acute exposure at 3700 m. PLoS ONE 2015; 10(11): e0142375.

25. Arai, Ohta, Tsurukiri, et al. Procalcitonin levels predict to identify bacterial strains in blood cultures of septic patients. Am J Emerg Med 2016; 34(11): 2150–2153.

26. Platts-Mills, Flannigan, Bortsov, et al. Persistent pain among older adults discharged home from the emergency department after motor vehicle crash: a prospective cohort study. Ann Emerg Med 2016; 67(2): 166–176.e1.

27. Motin, Karmakar, Palaniswami. Ensemble empirical mode decomposition with principal component analysis: a novel approach for extracting respiratory rate and heart rate from the photoplethysmographic signal. IEEE J Biomed Health Inform 2018; 22(3): 766–774.

28. Widjaja, Varon, Dorado, et al. Application of kernel principal component analysis for single-lead-ECG-derived respiration. IEEE Trans Biomed Eng 2012; 59(4): 1169–1176.

29. Sharma, Dandapat, Mahanta. Multichannel ECG data compression based on multiscale principal component analysis. IEEE Trans Inf Technol Biomed 2012; 16(4): 730–736.

30. Charkhkar, Vakili, Kheirabadi, et al. Clustering-making model for diagnosis of acute coronary syndrome using PCA based on K-MICA algorithms. Am J Emerg Med 2018; 36(1): 160–161.

31. Nilashi, Bin Ibrahim, Mardani, et al. A soft computing approach for diabetes disease classification. Health Informatics J 2018; 24(4): 379–393.

32. Loane, O’Mullane, Bortz, et al. Looking for similarities in movement between and within homes using cluster analysis. Health Informatics J 2012; 18(3): 202–211.

33. O’Neill, Bond, Grigorash, et al. Data analytics of call log data to identify caller behaviour patterns from a mental health and well-being helpline. Health Informatics J 2018; 25: 1722–1738.

34. Vazquez Guillamet, Ursu, Iwamoto, et al. Chronic obstructive pulmonary disease phenotypes using cluster analysis of electronic medical records. Health Informatics J 2018; 24(4): 394–409.

35. Hripcsak, Albers DJ. Correlating electronic health record concepts with health care process events. J Am Med Inform Assoc 2013; 20(e2): e311–e318.

36. Greenacre. Correspondence analysis in practice. London: Chapman and Hall, 2017.

37. Von Luxburg. A tutorial on spectral clustering. Stat Comput 2007; 17(4): 395–416.

38. Shannon, Markiel, Ozier, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003; 13(11): 2498–2504.

39. Hagberg, Swart, Daniel SC. Exploring network structure, dynamics, and function using NetworkX. Report no. LA-UR-08-05495; LA-UR-08-5495, 1 January 2008. Los Alamos, NM: Los Alamos National Laboratory.

40. Searle, Khuri AI. Matrix algebra useful for statistics. Hoboken, NJ: John Wiley & Sons, 2017.

41. Blei, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res 2003; 3: 993–1022.

42. Felix, Franconeri, Bertini. Taking word clouds apart: an empirical investigation of the design space for keyword summaries. IEEE Trans Visual Comput Graph 2017; 24(1): 657–666.

43. Ding, Zha, et al. A min-max cut algorithm for graph partitioning and data clustering. In: Proceedings 2001 IEEE international conference on data mining, San Jose, CA, 29 November–2 December 2001, pp. 107–114. New York: IEEE.

44. Jordan, Weiss. On spectral clustering: analysis and an algorithm. In: Proceedings of the advances in neural information processing systems, Vancouver, BC, Canada, 9–14 December 2002, pp. 849–856. Cambridge, MA: MIT Press.

45. AMA. Current procedural terminology. Chicago, IL: American Medical Association, 2018.

46. Oachs, Watters (eds). Health information management: concepts, principles, and practice. Chicago, IL: American Health Information Management Association.

47. CDC. International classification of diseases, ninth revision, clinical modification (ICD-9-CM), https://www.cdc.gov/nchs/icd/icd9cm.htm (accessed 30 August 2013).

48. Oliphant TE. Python for scientific computing. Comput Sci Eng 2007; 9(3): 10–20.

49. Bastian, Heymann, Jacomy. Gephi: an open source software for exploring and manipulating networks. In: Proceedings of the third international AAAI conference on weblogs and social media, San Jose, CA, 17–20 May 2009.

50. Pedregosa, Varoquaux, Gramfort, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011; 12: 2825–2830.

Network analysis of medical care services