Highlighting the rules between diagnosis types and laboratory diagnostic tests for patients of an emergency department: Use of association rule mining

Abstract

Diagnostic tests are widely used in emergency departments to investigate diagnoses in detail and to treat patients correctly. However, since these tests are expensive and time-consuming, ordering the correct tests for patients is crucial for efficient use of hospital resources. Thus, understanding the relation between diagnosis and diagnostic test requirement becomes an important issue in emergency departments. Association rule mining was used to extract hidden patterns and relations between diagnosis and diagnostic test requirement in real-life medical data received from an emergency department. Apriori was used as the association rule mining algorithm. Diagnoses were grouped into 21 categories based on the International Classification of Diseases, and laboratory tests were grouped into four main categories (hemogram, biochemistry, cardiac enzyme, urine and human excrement related). Both positive and negative rules were discovered. Since negative values dominated the data, a higher number of negative rules with higher confidences were discovered compared to positive ones. The extracted rules were validated by emergency department experts and practitioners. It was concluded that understanding the association between a patient’s diagnosis and diagnostic test requirement can improve decision-making and efficient use of resources in emergency departments. Association rules can also be used to support physicians in treating patients.

Keywords

Apriori, association rule mining, diagnostic test, emergency department, ICD-10

Introduction

Most health services, such as ambulatory services, hospitals, and clinics, employ information systems to store and manage their patient data. Although these information systems accumulate huge amounts of data in different forms (numbers, text, images, etc.), turning these data into useful information that supports important medical decisions is a major challenge for healthcare practitioners. In this context, data mining techniques have become indispensable for researchers who want to extract useful information from medical databases.

Data mining is an emerging technology combining statistics, artificial intelligence, and machine learning to extract valuable information from vast amounts of data stored in databases. The main functions of data mining are outlier detection, classification, cluster and association analysis, and forecasting. In particular, these functions can be applied in the medical field to improve decision-making in areas such as prognosis, diagnosis, and treatment planning.¹ Owing to their high patient volumes, emergency departments (EDs) are among the main hospital units that may hold vast amounts of raw data in the hospital’s information system. Moreover, overcrowding increases the complexity of operational planning.² Thus, data mining becomes increasingly important for making effective decisions in EDs. In the literature, many studies have used different functions of data mining, such as clustering patients,^3–5 classifying them,⁶ or generating predictions.^7–9 However, to the best of our knowledge, the use of association analysis, or association rule mining (ARM), is very rare in the ED context.

ARM is one of the most important functions of data mining. It is a structured method of discovering all frequent patterns in a data set and forming noteworthy rules among them. In other words, it is a way of discovering relations between items in large data sets. In the medical field, ARM is used to discover frequent diseases in specific areas.¹⁰ In the ED context in particular, in addition to discovering frequent diseases, ARM can highlight relations between different types of diseases and diagnostic tests, which supports rapid decisions and more efficient operational planning.

The aim of this study was to discover frequent rules between diagnosis types and different types of laboratory diagnostic tests (LDTs) for patients of an ED using ARM. LDTs were combined into four main categories: hemogram, coagulation, and blood type; biochemistry; cardiac enzymes; and urine and human excrement. As one of the best and most commonly used algorithms, Apriori was used to extract association rules. Both positive and negative rules were generated to analyze which diagnosis types do or do not require LDTs. The mined rules were discussed and validated by ED management. It is foreseen that considering these rules while ordering LDTs will provide guidance for ED practitioners in decision-making.

Background

In this section, main concepts and definitions of ARM and the applied technique of ARM, namely, Apriori, were initially defined. Then, related studies aiming to discover association rules in the medicine literature were summarized.

Definitions

ARM

Mining past transactions to extract association rules that reveal relationships and dependencies within a data set is called ARM. ARM was first introduced by Agrawal et al.¹¹ Let $D = \{T_{1}, T_{2}, \dots, T_{n}\}$ be a set of $n$ transactions and let $I = \{i_{1}, i_{2}, \dots, i_{m}\}$ be a set of items. Each transaction is a set of items, that is, $T_{i} \subseteq I$. An association rule is an implication of the form $X \Rightarrow Y$, where $X, Y \subset I$ and $X \cap Y = \emptyset$; $X$ is called the antecedent and $Y$ is called the consequent of the rule. In general, a set of items, such as $X$ or $Y$ (in other words, the antecedent or consequent of the rule), is called an item set. For an item set $X \subseteq I$, $\mathrm{support}(X)$ is defined as the fraction of transactions $T_{i} \in D$ such that $X \subseteq T_{i}$. The support of a rule $X \Rightarrow Y$ is defined as $\mathrm{support}(X \Rightarrow Y) = \mathrm{support}(X \cup Y)$. The rule has a measure of reliability named confidence, defined as $\mathrm{confidence}(X \Rightarrow Y) = \mathrm{support}(X \cup Y) / \mathrm{support}(X)$. Lift is another significance metric for association rules; it quantifies the predictive power of the rule $X \Rightarrow Y$ and is defined as $\mathrm{lift}(X \Rightarrow Y) = \mathrm{confidence}(X \Rightarrow Y) / \mathrm{support}(Y)$.

The standard problem of ARM¹¹ is to discover all rules whose metrics are at least equal to user-specified values of minimum support and minimum confidence.
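As a minimal illustration of these definitions, the following Python sketch computes support, confidence, and lift on a small transaction set. The four transactions and the item label `LDT1` are hypothetical and were chosen only to make the arithmetic easy to follow; they are not taken from the study data.

```python
from fractions import Fraction

# Hypothetical toy transactions (not study data): each is a set of items.
transactions = [
    {"I00-I99", "J00-J99", "LDT1"},
    {"I00-I99", "LDT1"},
    {"J00-J99"},
    {"M00-M99"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(x, y, transactions):
    """support(X u Y) / support(X)."""
    return support(x | y, transactions) / support(x, transactions)

def lift(x, y, transactions):
    """confidence(X => Y) / support(Y)."""
    return confidence(x, y, transactions) / support(y, transactions)

x, y = {"I00-I99"}, {"LDT1"}
print(support(x | y, transactions))    # 1/2
print(confidence(x, y, transactions))  # 1
print(lift(x, y, transactions))        # 2
```

A lift above 1, as in this toy case, indicates that the consequent occurs more often among transactions containing the antecedent than in the data set as a whole.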

Apriori

The Apriori algorithm, first introduced by Agrawal and Srikant,¹² has become a well-known and widely used approach for ARM. A data set containing transactions is given to Apriori as input to generate association rules that are built from frequent item sets and whose support and confidence are at least the given thresholds. Apriori relies on the property that an item set of length m can be frequent only if every one of its subsets of length m – 1 is also frequent. Candidates with an infrequent subset can therefore be discarded without being counted, which significantly reduces the search space and allows rule discovery in computationally feasible time. Confidence, which is used to rank the discovered rules in Apriori, is the main accuracy criterion.¹¹
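The level-wise search can be sketched in a few lines of Python. This is a simplified illustration of the join and prune steps under the subset property, not the implementation used in the study; the function name, the toy transactions, and the 0.4 threshold are assumptions made for the example.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} for all item sets with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: unite pairs of frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy data: single letters stand in for items.
toy = [{"I", "J", "T"}, {"I", "T"}, {"I", "J", "T"}, {"J"}, {"M"}]
result = apriori_frequent_itemsets(toy, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

In this toy run the item `M` is infrequent at the first level, so no candidate containing it is ever generated or counted at later levels.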

Related studies

The Apriori algorithm, which was originally proposed for solving the market basket problem,^12,13 has recently been adopted in healthcare services to generate association rules between clinical events and various medications, tests, and many others.¹⁴ In the medical literature, ARM has been widely used. One area of research has been identifying risk patterns in medical data. Li et al.¹⁵ discussed the problem of finding risk patterns in medical data where the risk patterns were defined by a statistical metric, relative risk. In that study, the problem of finding risk patterns was characterized as an optimal rule discovery problem. It was believed that the discovered rules were useful for medical research scholars. Li et al.¹⁶ analyzed the problem of efficiently discovering risk patterns in medical data by proposing an algorithm for mining optimal risk pattern sets based on the anti-monotone property and concluded that the proposed algorithm was efficient in exploring risk patterns.

The majority of the research works in this context were based on hospital data. Paetz and Brause¹⁷ showed results of data-driven rule generation with categorical septic shock patient data by applying an efficient algorithm for frequent pattern generation, and they presented the best rules by rating the performance of the generated rules based on frequency and confidence measures. Brossette et al.¹⁸ analyzed the problem of identifying interesting patterns in hospital infection control and public health surveillance data using association rules. Ohsaki et al.¹⁹ developed a rule discovery support system to discover interesting rules from a data set on chronic hepatitis diagnosis. Ordonez et al.²⁰ focused on finding association rules on a real data set to predict the absence or existence of heart disease by introducing a greedy algorithm. Ordonez et al.²¹ discovered association rules in medical data to predict heart disease. Their study introduced an improved algorithm to find constrained association rules and presented an experimental section summarizing several discovered rules. Another study by Nahar et al.²² also aimed to detect contributing factors to heart disease by using association rules, and analyzed the information available on sick and healthy males and females. Ordonez²³ reported that the main problem with ARM in a medical data set is the huge size of the mined rule set, in which the majority of rules are irrelevant, causing slow search and difficult interpretation by the field expert. Thus, in his study, search constraints were introduced to discover only medically significant association rules in order to make the search more efficient and faster. For the experimental setting, Ordonez²³ used arteries data and found association rules for healthy and diseased arteries.
Lee et al.²⁴ proposed an ARM method that was able to discover interesting patterns in medical data from the Korean acute myocardial infarction registry, where data were collected by 51 participating hospitals. The performance of the target patterns was evaluated in terms of statistical measures such as lift, leverage, and conviction. Cheng et al.²⁵ designed and developed an intensive care unit (ICU) clinical decision support system, namely, icuARM, together with ICU clinicians by using ARM. icuARM was implemented with multiple association rules and an easy-to-use graphical user interface for care providers to perform real-time data and information mining in the ICU setting. It was discussed that icuARM was able to provide valuable insights for ICU physicians to tailor a treatment based on the clinical status of the patient in real time. Another publication by Exarchos et al.²⁶ presented an automated methodology for the detection of ischemic beats in long-duration electrocardiographic recordings. Nahar et al.²⁷ aimed to extract significant prevention factors for specific types of cancer by employing different ARM techniques. Chaves et al.²⁸ proposed a novel voxel selection method based on ARM and tested this method for the early diagnosis of Alzheimer’s disease.

Although ARM has not been widely used in the ED context, a few research works have focused on this particular field. Imberman et al.²⁹ analyzed a clinical head trauma data set to find indications for computed tomography using association rules based on the Boolean analyzer method, and concluded that the ARM method had broad applicability in the medical domain. Petrus et al.³⁰ compared decision tree and optimal risk pattern mining for the analysis of emergency ultra-short stay unit data, and showed that, whereas the decision tree method was inadequate for finding understandable patterns, optimal risk pattern mining was very powerful for medical practitioners. Chan et al.³¹ investigated whether care-seeking patterns involve the use of healthcare services of various types prior to ED visits and examined the associations of these patterns with the severity of the presenting condition for the ED visit and subsequent events. Bergmeir et al.³² aimed to design a more efficient, effective, and safe medical emergency team service using ARM techniques.

From the methodological perspective, different ARM methods have been employed in the literature. These include the data cutting and inner product (DCIP) method,^33,34 extended FP-growth methods,³⁵ the Boolean analyzer,²⁹ and Tertius.²⁷ However, many research works have used Apriori to discover association rules in medical data sets,^16,22,27 which makes Apriori one of the most popular and widely used ARM methods.

Experimental study

Research architecture

The proposed study was composed of two main process layers, data preprocessing and ARM, as indicated in Figure 1. Data preprocessing prepared the data set in a representation model appropriate for ARM. ARM was the second layer, which mined association rules from the resulting structured data set. By applying the Apriori algorithm with customized parameters, positive and negative rule mining were performed, and the resulting rules were submitted for validation.

Figure 1.

Research architecture.

Data preprocessing

EDs form a data-rich ecosystem in which raw data require preprocessing before reasonable ARM models can be built. The raw data of this study were extracted from two distinct databases, which held the patient arrival transactions and the diagnostic test transactions. After missing values in the raw data were eliminated, the first step was to preprocess the patient arrival transactions database, specifically the diagnosis code attribute. In the raw data, diagnoses were encoded as codes listed in the International Classification of Diseases (ICD). Owing to the wide variety of diagnosis codes, the upper-level grouping of the ICD standard was applied, which resulted in 21 diagnosis code groups. Instead of keeping the actual diagnosis code in the transactions, diagnosis codes were converted to these 21 groups and represented as nominal attributes for the further preprocessing steps. Then, duplicate diagnosis code groups were merged so that each patient arrival identifier was matched with its distinct diagnosis code groups.
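The grouping step can be sketched as follows. This is a minimal illustration that assumes raw codes arrive as strings whose first three characters form the ICD-10 category (for example `J18.9`); the group boundaries are those listed in Table 1, while the function names and the example codes are hypothetical.

```python
import re

# Upper-level ICD-10 groups as listed in Table 1: (first code, last code).
ICD10_GROUPS = [
    ("A00", "B99"), ("C00", "D49"), ("D50", "D89"), ("E00", "E89"),
    ("F01", "F99"), ("G00", "G99"), ("H00", "H59"), ("H60", "H95"),
    ("I00", "I99"), ("J00", "J99"), ("K00", "K95"), ("L00", "L99"),
    ("M00", "M99"), ("N00", "N99"), ("O00", "O9A"), ("P00", "P96"),
    ("Q00", "Q99"), ("R00", "R99"), ("S00", "T88"), ("V00", "Y99"),
    ("Z00", "Z99"),
]

def icd10_group(code):
    """Map a raw ICD-10 code such as 'J18.9' to its group label, or None."""
    match = re.match(r"^([A-Z][0-9][0-9A-Z])", code.strip().upper())
    if match is None:
        return None
    category = match.group(1)  # three-character category, e.g. 'J18'
    for first, last in ICD10_GROUPS:
        # Plain string comparison works because all categories share a fixed
        # letter-digit-character layout and digits sort before letters.
        if first <= category <= last:
            return f"{first}-{last}"
    return None

def groups_for_arrival(codes):
    """Collapse one arrival's raw codes into a set of distinct group labels."""
    return {g for g in map(icd10_group, codes) if g is not None}

# Two codes from the same group are merged into a single group label.
print(sorted(groups_for_arrival(["J18.9", "J20", "I10", "R07.4"])))
# ['I00-I99', 'J00-J99', 'R00-R99']
```

Returning a set per arrival performs the duplicate merging described above, since two codes that fall into the same group yield a single label.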

The first step was followed by merging the two data sources on the patient arrival identifier, the attribute common to the patient arrival and diagnostic test databases. Afterward, irrelevant attributes such as name, age, and gender, which would not be involved in mining association relations, were eliminated from the processed data during attribute selection. The merged data set represented the match between patient arrivals, the corresponding diagnosis group codes, and the applied diagnostic tests. Each diagnosis group code and each test was an attribute represented in binary format for the rule mining model. Each transaction was thus converted to a form that states which diagnosis codes existed and which diagnostic tests were applied for each patient arrival.

The next step was to build a data model for ARM in which each patient arrival identifier is mapped to its diagnosis codes and applied tests, so that rules can be mined from the data set. Each diagnosis group and each diagnostic test was modeled as an attribute. To find the association rules between diagnosis types and LDTs (Table 1), the rule mining model held each patient’s arrival transaction, representing whether each diagnosis code group existed and whether each LDT was required. For each transaction, the diagnosis groups that existed and the LDTs that were required were represented as “yes” and all other attributes as “no,” that is, as the binary values true and false. The structured data set, a binary representation of the transactional data, was thus prepared for the ARM task. A sample of the structured data set is illustrated in Table 2.

Table 1.

Diagnosis and laboratory diagnostic test groups.

Diagnosis/laboratory test	Components	Type	Abbreviation
Diagnosis	Certain infectious and parasitic diseases		A00-B99
	Neoplasms		C00-D49
	Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism		D50-D89
	Endocrine, nutritional and metabolic diseases		E00-E89
	Mental, behavioral and neurodevelopmental disorders		F01-F99
	Diseases of the nervous system		G00-G99
	Diseases of the eye and adnexa		H00-H59
	Diseases of the ear and mastoid process		H60-H95
	Diseases of the circulatory system		I00-I99
	Diseases of the respiratory system		J00-J99
	Diseases of the digestive system		K00-K95
	Diseases of the skin and subcutaneous tissue		L00-L99
	Diseases of the musculoskeletal system and connective tissue		M00-M99
	Diseases of the genitourinary system		N00-N99
	Pregnancy, childbirth, and the puerperium		O00-O9A
	Certain conditions originating in the perinatal period		P00-P96
	Congenital malformations, deformations, and chromosomal abnormalities		Q00-Q99
	Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified		R00-R99
	Injury, poisoning and certain other consequences of external causes		S00-T88
	External causes of morbidity		V00-Y99
	Factors influencing health status and contact with health services		Z00-Z99
Laboratory diagnostic tests (LDT)	Hemogram, prothrombin time, APTT, ABO + Rh test, ABO + Rh test (reverse), cross-match	Hemogram, coagulation, blood type	LDT-type 1
	Alanine aminotransferase (ALT), aspartate aminotransferase (AST), bilirubin, CRP, glucose, arterial blood gases, urea, chlorine, creatinine, potassium, sodium, albumin, procalcitonin, sedimentation, amylase, calcium, eGFR, D-dimer, lipase, alkaline phosphatase, protein, ASO, anti-HBs, anti-HIV, HBsAg, anti-HCV, gamma-glutamyl transferase, creatine kinase, lactate dehydrogenase, anti-HAV, anti-HAV IgM, anti-HBC, ethanol, HBeAg, neonatal bilirubin	Biochemistry	LDT-type 2
	n-terminal probrain natriuretic peptide (Pro BNP), CK-MB, CK-MB (mass), troponin T	Cardiac enzyme	LDT-type 3
	Urine analysis, urine microscopy, parasite analysis in human excrement, human excrement microscopy, occult blood analysis, enteric adenoviruses, rotavirus antigen	Urine and human excrement	LDT-type 4
	TSH, T3, T4, beta hCG	Hormone, pregnancy	Other

APTT: activated partial thromboplastin time; ABO: ABO blood group; Rh: Rhesus; CRP: C-reactive protein; eGFR: estimated glomerular filtration rate; ASO: antistreptolysin O; HAV: hepatitis A virus; HBsAg: surface antigen of hepatitis B virus; HCV: hepatitis C virus; HBeAg: hepatitis B e antigen; CK-MB: creatine kinase-muscle/brain; TSH: thyroid stimulating hormone; hCG: human chorionic gonadotropin.

Table 2.

Sample of structured data set.

Patient identifier	A00-B99	C00-D49	D50-D89	E00-E89	F01-F99	G00-G99	H00-H59	H60-H95	I00-I99	J00-J99	K00-K95	L00-L99	M00-M99	N00-N99	O00-O9A	P00-P96	Q00-Q99	R00-R99	S00-T88	V00-Y99	Z00-Z99	LDT-type i
P_k	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	yes	no	no	no	no
P_k + 1	no	no	no	no	no	no	no	no	no	no	no	no	yes	no	no	no	no	no	no	no	no	no
P_k + 2	no	no	no	no	no	no	no	no	no	no	yes	no	no	no	no	no	no	no	no	no	no	yes
P_k + 3	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	yes	no
P_k + 4	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	yes	no
P_k + 5	no	no	no	no	no	no	no	no	yes	yes	no	no	no	no	no	no	no	no	no	no	no	yes
P_k + 6	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	yes	no	no	no	no
P_k + 7	no	no	no	no	no	no	no	no	yes	yes	no	no	no	no	no	no	no	yes	no	no	no	yes
P_k + 8	no	no	no	no	no	no	no	no	no	no	no	no	yes	no	no	no	no	no	no	no	no	yes
P_k + 9	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	no	yes	no	no	no	no

LDT: laboratory diagnostic test.
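The binary representation of Table 2 can be produced with a short routine such as the one below. It is a sketch under stated assumptions: the two input dictionaries stand in for the merged arrival and test records, the arrival identifiers mirror three rows of Table 2, and the specific LDT types assigned to each arrival are hypothetical, because Table 2 shows only a generic “LDT-type i” column.

```python
# Column order of the 21 diagnosis groups, as in Table 2.
DIAGNOSIS_GROUPS = [
    "A00-B99", "C00-D49", "D50-D89", "E00-E89", "F01-F99", "G00-G99",
    "H00-H59", "H60-H95", "I00-I99", "J00-J99", "K00-K95", "L00-L99",
    "M00-M99", "N00-N99", "O00-O9A", "P00-P96", "Q00-Q99", "R00-R99",
    "S00-T88", "V00-Y99", "Z00-Z99",
]

def build_rows(arrival_groups, arrival_tests, ldt_type):
    """Return one yes/no row per arrival: 21 diagnosis flags plus one LDT flag.

    arrival_groups: {arrival_id: set of diagnosis group labels}
    arrival_tests:  {arrival_id: set of LDT type labels ordered for the arrival}
    """
    rows = {}
    for arrival_id, groups in arrival_groups.items():
        flags = ["yes" if g in groups else "no" for g in DIAGNOSIS_GROUPS]
        # Arrivals without any test record get "no" in the LDT column.
        ordered = arrival_tests.get(arrival_id, set())
        flags.append("yes" if ldt_type in ordered else "no")
        rows[arrival_id] = flags
    return rows

# Hypothetical inputs mirroring three rows of Table 2.
arrival_groups = {
    "P_k": {"R00-R99"},
    "P_k+2": {"K00-K95"},
    "P_k+5": {"I00-I99", "J00-J99"},
}
arrival_tests = {
    "P_k+2": {"LDT-type 1"},
    "P_k+5": {"LDT-type 1", "LDT-type 3"},
}

rows = build_rows(arrival_groups, arrival_tests, "LDT-type 1")
for arrival_id, flags in rows.items():
    print(arrival_id, "\t".join(flags))
```

Calling `build_rows` once per LDT type yields one structured data set per test type, each with the class attribute in the last (22nd) column.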

Experimentation

In this study, the experiments were designed to focus on the medical significance and applicability of the predictive rules obtained by the Apriori algorithm. ARM comprises two main ways of rule mining: positive rule mining considers item sets only when they are frequent, whereas negative rule mining considers them for infrequent values. Because “no” values dominated the structured data set, association rules for positive values and for negative values were mined separately in this study. Therefore, two types of ARM were performed for each type of LDT. The association rules mined for each type of LDT by both the positive and the negative rule mining models were extracted for review and validation by the expert team.
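The difference between the two rule types can be illustrated on the ten sample arrivals of Table 2. The sketch below evaluates one rule with the consequent LDT = yes and one with the consequent LDT = no; it only demonstrates the metrics on that small sample and does not reproduce the rules reported for the full data set. The helper name `rule_metrics` is an assumption of this example.

```python
# The ten sample arrivals of Table 2: (diagnosis groups marked "yes", LDT flag).
sample = [
    ({"R00-R99"}, "no"),
    ({"M00-M99"}, "no"),
    ({"K00-K95"}, "yes"),
    ({"Z00-Z99"}, "no"),
    ({"Z00-Z99"}, "no"),
    ({"I00-I99", "J00-J99"}, "yes"),
    ({"R00-R99"}, "no"),
    ({"I00-I99", "J00-J99", "R00-R99"}, "yes"),
    ({"M00-M99"}, "yes"),
    ({"R00-R99"}, "no"),
]

def rule_metrics(antecedent, ldt_value, rows):
    """Support and confidence of IF antecedent = yes THEN LDT = ldt_value."""
    matching = [ldt for groups, ldt in rows if antecedent <= groups]
    both = sum(1 for ldt in matching if ldt == ldt_value)
    support = both / len(rows)
    confidence = both / len(matching) if matching else 0.0
    return support, confidence

# Positive rule: the consequent is LDT = yes.
print(rule_metrics({"I00-I99", "J00-J99"}, "yes", sample))  # (0.2, 1.0)
# Negative rule: same computation, but the consequent is LDT = no.
print(rule_metrics({"R00-R99"}, "no", sample))              # (0.3, 0.75)
```

Six of the ten sample rows carry LDT = no, which shows on a small scale why rules with a negative consequent tend to reach higher support and confidence when “no” values dominate.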

Experiments were performed in the Waikato Environment for Knowledge Analysis (WEKA). Apriori parameters were customized to gain insight, owing to the homogeneity of the structured data set that resulted from the data preprocessing step. These WEKA parameters and their definitions are as follows:

car (class association rules): a Boolean parameter. If car is set to true, only class association rules are generated instead of all rules.

classIndex: index of the class attribute of the rule to be generated.

numRules: numRules is the parameter of upper bound for the number of mined association rules.

metricType and minMetric: metricType specifies the criterion used to sort the mined rules, and minMetric is the minimum value of that metric that a rule must reach. In this study, confidence is used as the performance metric to evaluate the accuracy of a mined rule. Consider a simple rule $X \Rightarrow Y$, where X and Y are item sets drawn from the structured data set. Confidence is the ratio of the probability that X and Y occur together to the probability that X occurs. In other words, confidence denotes the percentage of transactions containing X that also contain Y:

$\mathrm{confidence}(X \Rightarrow Y) = \frac{\Pr(X \cup Y)}{\Pr(X)}$

where $\Pr(X \cup Y)$ is the probability that a transaction contains every item of both X and Y.

lowerBoundMinSupport (lower bound minimum support): support is the frequency of the given rule within the transactions. The threshold parameter that constrains the lower bound of the minimum support value is named lowerBoundMinSupport.

upperBoundMinSupport (upper bound minimum support): the threshold parameter that constrains the upper bound of the minimum support value is named upperBoundMinSupport.

treatZeroAsMissing (treat zero as missing): a Boolean parameter. If treatZeroAsMissing is true, zero values (that is, the first value of a nominal attribute) are treated in the same way as missing values.

For positive rule mining, the rules satisfying car = True, classIndex = 22 (index of the LDT column in the data set), numRules = 10, lowerBoundMinSupport ⩾ 0.015, metricType = confidence, minMetric ⩾ 0.5, and treatZeroAsMissing = True were considered. Because negative values dominated the structured data set, the parameters had to be customized to obtain negative rules; the adjusted parameters were the lower bound of minimum support, the minimum metric value, and the number of rules. For negative rule mining, the rules satisfying car = True, classIndex = 22, numRules = 6 (see section “Predictive association rules”), lowerBoundMinSupport ⩾ 0.02, metricType = confidence, minMetric ⩾ 0.7, and treatZeroAsMissing = True were considered.
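How these thresholds shape the output can be illustrated with a small Python sketch. It does not call WEKA and does not use Apriori pruning; instead it exhaustively enumerates antecedents of up to two diagnosis groups, which is an assumption made to keep the example short. The function name `mine_class_rules` is hypothetical, the threshold values are those stated above, and the data are the ten sample rows of Table 2, so the resulting rules are illustrative only.

```python
from itertools import combinations

# The ten sample arrivals of Table 2: (diagnosis groups marked "yes", LDT flag).
sample = [
    ({"R00-R99"}, "no"), ({"M00-M99"}, "no"), ({"K00-K95"}, "yes"),
    ({"Z00-Z99"}, "no"), ({"Z00-Z99"}, "no"),
    ({"I00-I99", "J00-J99"}, "yes"), ({"R00-R99"}, "no"),
    ({"I00-I99", "J00-J99", "R00-R99"}, "yes"),
    ({"M00-M99"}, "yes"), ({"R00-R99"}, "no"),
]

def mine_class_rules(rows, class_value, min_support, min_confidence,
                     num_rules, max_antecedent=2):
    """Rank rules 'IF groups = yes THEN class = class_value' by confidence."""
    n = len(rows)
    groups = sorted({g for present, _ in rows for g in present})
    rules = []
    for size in range(1, max_antecedent + 1):
        for antecedent in combinations(groups, size):
            a = set(antecedent)
            covered = [label for present, label in rows if a <= present]
            if not covered:
                continue
            hits = sum(1 for label in covered if label == class_value)
            support, confidence = hits / n, hits / len(covered)
            if support >= min_support and confidence >= min_confidence:
                rules.append((antecedent, support, confidence))
    # Highest confidence first; ties broken by support, then by antecedent.
    rules.sort(key=lambda r: (-r[2], -r[1], r[0]))
    return rules[:num_rules]  # cap comparable to numRules

# Thresholds as stated in the text for positive and negative rule mining.
positive = mine_class_rules(sample, "yes", 0.015, 0.5, num_rules=10)
negative = mine_class_rules(sample, "no", 0.02, 0.7, num_rules=6)
for antecedent, support, confidence in positive + negative:
    print(antecedent, support, confidence)
```

Restricting the consequent to the class attribute plays the role of car = True, and truncating the sorted list plays the role of numRules.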

Results

Descriptive results

During the study period (January 2017), the total number of arrivals to the ED of interest was 32,753. In the triage process, 15,516 (47.37%) of these arrivals were categorized as urgent or emergent, while the remaining arrivals were not urgent. The frequency distributions of receiving or not receiving each type of LDT, for both urgent or emergent and not urgent patients, are shown in Figure 2.

Figure 2.

Frequency distributions of receiving/not receiving different types of LDT: (a) urgent or emergent and (b) not urgent.

From Figure 2(a), it was observed that for the patients categorized as urgent or emergent, the frequencies of receiving each type of diagnostic test were 5758 (37.11%), 5228 (33.69%), 3103 (20.00%), 1845 (11.89%), and 290 (1.87%) for LDT-type 1, LDT-type 2, LDT-type 3, LDT-type 4, and other, respectively. However, for not urgent patients (see Figure 2(b)), the respective frequencies were 995 (5.77%), 969 (5.62%), 101 (0.59%), 796 (4.62%), and 187 (1.08%). Thus, it was concluded that the requirement for any type of LDT was very low, below 6 percent, for the patients categorized as not urgent, while it sharply increased for those categorized as urgent or emergent. Since the frequencies of receiving tests were important, especially for extracting positive rules in ARM, it was decided to consider only urgent or emergent cases and to discover the meaningful rules between diagnosis and LDT types in this article. Similarly, since the frequencies of receiving any type of hormone or pregnancy test (the “other” category) were very low even for urgent or emergent patients, these test types were also excluded from the ARM analysis.
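The reported percentages can be rechecked from the counts. The sketch below assumes that the urgent or emergent shares use the 15,516 urgent or emergent arrivals as the denominator and that the not urgent shares use the remaining 32,753 − 15,516 arrivals; the variable and function names are illustrative.

```python
TOTAL_ARRIVALS = 32_753
URGENT = 15_516
NOT_URGENT = TOTAL_ARRIVALS - URGENT  # 17,237 arrivals

urgent_counts = {
    "LDT-type 1": 5758, "LDT-type 2": 5228, "LDT-type 3": 3103,
    "LDT-type 4": 1845, "other": 290,
}
not_urgent_counts = {
    "LDT-type 1": 995, "LDT-type 2": 969, "LDT-type 3": 101,
    "LDT-type 4": 796, "other": 187,
}

def shares(counts, denominator):
    """Percentage of arrivals in a triage class that received each test type."""
    return {name: round(100 * c / denominator, 2) for name, c in counts.items()}

print(round(100 * URGENT / TOTAL_ARRIVALS, 2))  # 47.37
print(shares(urgent_counts, URGENT))
print(shares(not_urgent_counts, NOT_URGENT))
```

Under these assumptions the computed shares match the figures in the text, and every not urgent share falls below 6 percent.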

For each diagnosis category, the frequencies of receiving any type of LDT are additionally represented in Figure 3.

Figure 3.

Frequencies of receiving any type of LDTs based on diagnosis types.

In Figure 3, it was observed that the highest number of LDTs was required by the patients having the diagnosis of “R00-R99,” “J00-J99,” “I00-I99,” “N00-N99,” “M00-M99,” “K00-K95,” and “Z00-Z99.” Thus, it was expected to discover many rules between these diagnosis types and LDTs.

Predictive association rules

Positive rules for four types of LDTs were summarized in Table 3.

Table 3.

Mined positive rules for each type of LDTs.

Type of LDT	Rules	Minimum support	Confidence
LDT-type 1	IF (I00-I99 = yes & J00-J99 = yes), THEN LDT-type 1 = yes	0.02	0.94
	IF (I00-I99 = yes & R00-R99 = yes), THEN LDT-type 1 = yes	0.025	0.9
	IF (E00-E89 = yes), THEN LDT-type 1 = yes	0.025	0.86
	IF (J00-J99 = yes & R00-R99 = yes), THEN LDT-type 1 = yes	0.02	0.82
	IF (I00-I99 = yes), THEN LDT-type 1 = yes	0.015	0.74
	IF (R00-R99 = yes), THEN LDT-type 1 = yes	0.02	0.69
	IF (N00-N99 = yes), THEN LDT-type 1 = yes	0.02	0.58
	IF (K00-K95 = yes), THEN LDT-type 1 = yes	0.025	0.52
LDT-type 2	IF (I00-I99 = yes & J00-J99 = yes), THEN LDT-type 2 = yes	0.015	0.9
	IF (I00-I99 = yes & R00-R99 = yes), THEN LDT-type 2 = yes	0.015	0.86
	IF (E00-E89 = yes), THEN LDT-type 2 = yes	0.02	0.81
	IF (J00-J99 = yes & R00-R99 = yes), THEN LDT-type 2 = yes	0.025	0.77
	IF (I00-I99 = yes), THEN LDT-type 2 = yes	0.025	0.69
	IF (N00-N99 = yes), THEN LDT-type 2 = yes	0.015	0.65
	IF (R00-R99 = yes), THEN LDT-type 2 = yes	0.02	0.63
LDT-type 3	IF (I00-I99 = yes & J00-J99 = yes), THEN LDT-type 3 = yes	0.02	0.89
	IF (I00-I99 = yes & R00-R99 = yes), THEN LDT-type 3 = yes	0.02	0.86
	IF (I00-I99 = yes), THEN LDT-type 3 = yes	0.025	0.66
	IF (J00-J99 = yes & R00-R99 = yes), THEN LDT-type 3 = yes	0.025	0.59
	IF (E00-E89 = yes), THEN LDT-type 3 = yes	0.025	0.57
LDT-type 4	IF (N00-N99 = yes & R00-R99 = yes), THEN LDT-type 4 = yes	0.02	0.57
	IF (N00-N99 = yes), THEN LDT-type 4 = yes	0.015	0.53
	IF (E00-E89 = yes), THEN LDT-type 4 = yes	0.015	0.52

LDT: laboratory diagnostic test.

From the rules presented in Table 3, it was mainly observed that the positive rules for the first three types of LDT (LDT-type 1, LDT-type 2, and LDT-type 3) were generally discovered with the diagnosis types “I00-I99,” “J00-J99,” “R00-R99,” and “E00-E89,” whereas the positive rules for LDT-type 4 were associated with the diagnosis types “N00-N99,” “R00-R99,” and “E00-E89.” When the rules of the first two LDTs (LDT-type 1 and LDT-type 2) were compared, it was observed that for similar diagnosis types, these two tests, namely, hemogram and biochemistry–related tests, were generally ordered together, with higher ordering frequencies for the first type, LDT-type 1. However, for the diagnosis type “N00-N99,” the ordering frequency of LDT-type 2 was higher than that of LDT-type 1 (confidences were 0.65 and 0.58, respectively).

Since negative values dominated the data, a higher number of negative rules were discovered. However, it was decided to present only the first six rules for each type of LDT, in order to keep the presented numbers of positive and negative rules comparable. Negative rules for all types of LDTs are summarized in Table 4.

Table 4.

Mined negative rules for each type of LDTs.

Type of LDT	Rules	Minimum support	Confidence
LDT-type 1	IF (M00-M99 = yes & S00-T88 = yes), THEN LDT-type 1 = no	0.03	0.84
	IF (S00-T88 = yes), THEN LDT-type 1 = no	0.025	0.81
	IF (M00-M99 = yes), THEN LDT-type 1 = no	0.025	0.8
	IF (Z00-Z99 = yes), THEN LDT-type 1 = no	0.02	0.77
	IF (V00-Y99 = yes), THEN LDT-type 1 = no	0.03	0.77
	IF (J00-J99 = yes), THEN LDT-type 1 = no	0.03	0.74
LDT-type 2	IF (M00-M99 = yes & S00-T88 = yes), THEN LDT-type 2 = no	0.02	0.87
	IF (S00-T88 = yes), THEN LDT-type 2 = no	0.025	0.84
	IF (F01-F99 = yes), THEN LDT-type 2 = no	0.025	0.84
	IF (M00-M99 = yes), THEN LDT-type 2 = no	0.03	0.83
	IF (Z00-Z99 = yes), THEN LDT-type 2 = no	0.02	0.81
	IF (J00-J99 = yes), THEN LDT-type 2 = no	0.02	0.79
LDT-type 3	IF (O00-O9A = yes & Z00-Z99 = yes), THEN LDT-type 3 = no	0.03	0.99
	IF (O00-O9A = yes), THEN LDT-type 3 = no	0.03	0.99
	IF (M00-M99 = yes & S00-T88 = yes), THEN LDT-type 3 = no	0.025	0.98
	IF (S00-T88 = yes), THEN LDT-type 3 = no	0.025	0.93
	IF (V00-Y99 = yes), THEN LDT-type 3 = no	0.025	0.91
	IF (Z00-Z99 = yes), THEN LDT-type 3 = no	0.02	0.91
LDT-type 4	IF (V00-Y99 = yes), THEN LDT-type 4 = no	0.03	0.97
	IF (S00-T88 = yes), THEN LDT-type 4 = no	0.025	0.97
	IF (M00-M99 = yes), THEN LDT-type 4 = no	0.025	0.96
	IF (M00-M99 = yes & S00-T88 = yes), THEN LDT-type 4 = no	0.02	0.95
	IF (J00-J99 = yes), THEN LDT-type 4 = no	0.025	0.93
	IF (Z00-Z99 = yes), THEN LDT-type 4 = no	0.025	0.90

LDT: laboratory diagnostic test.

The main point observed in Table 4 was that the minimum support and confidence values of the negative rules were higher than those of the positive rules shown in Table 3. One important result that needs to be highlighted concerns the diagnosis type “J00-J99.” If this diagnosis type was observed in a patient together with either “I00-I99” or “R00-R99,” then it was highly probable that the patient required both LDT-type 1 and LDT-type 2 (see Table 3). However, when this diagnosis type was seen alone, it was more probable that the patient did not require either of these tests. Another result of Table 4 worth emphasizing was that, for the same conditions in the negative rules of LDT-type 1 and LDT-type 2, confidences were higher for the latter, meaning that the frequencies of not ordering LDT-type 1 were lower than those of not ordering LDT-type 2. Recall that the opposite was found in the mined positive rules presented in Table 3.

Practical implications

Easily understandable methodologies and tools that guide decision-makers are very valuable in EDs, where correct decisions must be made in a timely manner. In EDs, especially for patients with specific diagnosis types, diagnostic tests are widely used to treat patients correctly. However, redundant use of diagnostic tests increases costs and waiting times for each patient and results in inefficient use of resources. Understanding the relation between diagnosis types and the requirement for diagnostic tests is therefore important for decision-making in EDs. In this article, ARM, one of the widely used data mining techniques, was applied to discover meaningful rules between diagnosis types and diagnostic tests in EDs. In other words, the aim was to highlight which diagnosis types were most likely to require a given type of LDT (positive rules) and which were most likely not to require it (negative rules). Highlighting these positive and negative rules has several implications in practice. First, the extracted rules offer a guide for ED practitioners when deciding whether a patient with a specific diagnosis type, or with more than one diagnosis type observed together, really requires a given type of LDT. Second, since practitioners are required to make decisions rapidly in EDs, using these rules in practice is expected to improve operational performance in such an overcrowded environment. Saving time on each treated patient reduces overcrowding in EDs, which in turn leads to higher patient satisfaction and a better hospital reputation. Third, using these rules as a benchmark may make the use of resources (laboratories, personnel, equipment, etc.) more efficient, since only the patients who really require these tests will use them. Finally, combining the medical knowledge inferred from the association rules with population demographics can guide operational planning in EDs, such as designing laboratories, planning capacities, and managing stocks, when generating medium-to-long-term plans.
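As a rough illustration of how mined rules might serve as a guide at the point of care, the following Python sketch looks up the most specific rule that matches a patient's diagnosis groups. The `Rule` class, the `advise` function, the "most specific antecedent wins" tie-breaking policy, and all confidence values are assumptions made for this example; the confidences are placeholders, not values reported in Tables 3 or 4, and the sketch is not the procedure used in the study.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]  # diagnosis groups that must all be present
    test: str                   # e.g. "LDT-type 1"
    required: bool              # True = positive rule, False = negative rule
    confidence: float           # placeholder value, not a reported result


def advise(rules: List[Rule], diagnoses: FrozenSet[str], test: str) -> Optional[Rule]:
    """Return the matching rule with the largest antecedent (ties: higher confidence)."""
    matches = [r for r in rules if r.test == test and r.antecedent <= diagnoses]
    if not matches:
        return None  # no rule applies; the decision stays with the practitioner
    return max(matches, key=lambda r: (len(r.antecedent), r.confidence))


# Hypothetical rule base mirroring the J00-J99 pattern described in the text.
rules = [
    Rule(frozenset({"J00-J99"}), "LDT-type 1", False, 0.80),
    Rule(frozenset({"J00-J99", "I00-I99"}), "LDT-type 1", True, 0.70),
    Rule(frozenset({"J00-J99", "R00-R99"}), "LDT-type 1", True, 0.70),
]

alone = advise(rules, frozenset({"J00-J99"}), "LDT-type 1")
combined = advise(rules, frozenset({"J00-J99", "I00-I99"}), "LDT-type 1")
unknown = advise(rules, frozenset({"N00-N99"}), "LDT-type 1")
```

Preferring the rule with the largest matching antecedent is one possible design choice: it lets a rule for co-occurring diagnosis types override the rule for a single type. Any real deployment would require clinical validation of both the rules and this selection policy.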

Conclusion

This research has shown how readily ARM can extract a set of rules between diagnosis types and the requirement for different LDTs, using real-life data received from the ED of a large-scale urban hospital. Apriori, the most common ARM algorithm, was used to extract the rules, and its performance was assessed based on confidence level. Laboratory-related diagnostic tests (LDTs) were grouped into four types: hemogram, coagulation, and blood type (LDT-type 1); biochemistry (LDT-type 2); cardiac enzyme (LDT-type 3); and urine and human excrement (LDT-type 4). Diagnosis types were defined based on the International Statistical Classification of Diseases and Related Health Problems (10th revision; ICD-10) classification system, which combines diseases into 21 groups. The attributes of this study were defined as nominal values representing the existence/non-existence of each of the 21 disease groups and the requirement/non-requirement of each of the four LDT types. Both positive and negative rules were discovered: positive rules show which diagnosis groups, alone or together, were likely to require a given type of LDT, whereas negative rules show which diagnosis groups, alone or together, were highly likely not to require it.
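The encoding and mining steps summarized above can be sketched in a few lines of Python. This is a simplified, level-wise Apriori written for illustration only, not the software used in the study. The eight transactions, the thresholds, and the names `encode_visit`, `apriori`, and `ldt_rules` are hypothetical. As a further simplification, only diagnosis groups that are present are encoded as items, while each test is encoded as an explicit yes/no item so that both positive and negative rules can emerge.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]
TESTS = ("LDT1", "LDT2")  # two of the four LDT types, to keep the toy example small


def encode_visit(diagnoses: Set[str], ordered: Set[str]) -> Itemset:
    """Nominal encoding: present diagnosis groups plus a yes/no item per test."""
    items = {f"{d}=yes" for d in diagnoses}
    items |= {f"{t}={'yes' if t in ordered else 'no'}" for t in TESTS}
    return frozenset(items)


def apriori(transactions: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Level-wise search for all itemsets with support >= min_support."""
    n = len(transactions)
    current = {frozenset([i]) for t in transactions for i in t}
    frequent: Dict[Itemset, float] = {}
    k = 1
    while current:
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-candidates, then prune candidates
        # that have an infrequent (k-1)-subset (downward closure property).
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def ldt_rules(frequent: Dict[Itemset, float],
              min_confidence: float) -> List[Tuple[Itemset, str, float, float]]:
    """Rules of the form {diagnosis items} -> one test item."""
    def is_test(item: str) -> bool:
        return item.split("=")[0] in TESTS

    out = []
    for itemset, sup in frequent.items():
        for item in itemset:
            antecedent = itemset - {item}
            if not is_test(item) or not antecedent or any(map(is_test, antecedent)):
                continue
            conf = sup / frequent[antecedent]  # antecedent is frequent by closure
            if conf >= min_confidence:
                out.append((antecedent, item, sup, conf))
    return out


# Hypothetical toy transactions (NOT the hospital data).
transactions = [
    encode_visit({"I00-I99"}, {"LDT1", "LDT2"}),
    encode_visit({"I00-I99"}, {"LDT1", "LDT2"}),
    encode_visit({"I00-I99", "J00-J99"}, {"LDT1", "LDT2"}),
    encode_visit({"J00-J99"}, set()),
    encode_visit({"J00-J99"}, set()),
    encode_visit({"J00-J99"}, set()),
    encode_visit({"N00-N99"}, set()),
    encode_visit({"N00-N99"}, set()),
]

frequent = apriori(transactions, min_support=0.25)
rules = ldt_rules(frequent, min_confidence=0.7)
for antecedent, consequent, sup, conf in sorted(rules, key=lambda r: -r[3]):
    print(sorted(antecedent), "->", consequent, f"support={sup:.2f} confidence={conf:.2f}")
```

Because most test items in the toy data are `no`, more negative than positive rules pass the thresholds, which loosely echoes the negative dominance reported for the real data set.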

The findings based on the discovered rules can be summarized as follows. Diseases of the circulatory system (I00-I99); diseases of the respiratory system (J00-J99); symptoms, signs, and abnormal clinical and laboratory findings (R00-R99); and endocrine, nutritional, and metabolic diseases (E00-E89) were highly likely to require the first three LDT types (LDT-type 1, LDT-type 2, and LDT-type 3), whereas diseases of the genitourinary system (N00-N99) were most likely to require LDT-type 4. In addition, in certain situations (“I00-I99” and “J00-J99” seen together, “I00-I99” and “R00-R99” seen together, “E00-E89” seen, “J00-J99” and “R00-R99” seen together, “I00-I99” seen, and “R00-R99” seen), it was highly likely that LDT-type 1 and LDT-type 2 would be required together, with higher confidences for the rules related to LDT-type 1. Since negative values dominated the data set, both the confidences and the number of generated rules were much higher for negative rules than for positive rules. One notable finding of this study concerned diseases of the respiratory system (J00-J99). If this diagnosis type was seen alone in a patient, the patient was highly likely not to require any type of LDT; however, if it was seen together with either “I00-I99” or “R00-R99,” the patient was likely to require some types of LDTs.

All the extracted rules were validated by a team of emergency medicine specialists and found to be useful and informative. Although this study was limited by its design, which was based on data from a single hospital, the authors believe that the computer-based medical knowledge it produced could be generalized and could serve as a guideline for ED practitioners to enhance decision-making and operational planning. As a future research direction, it is suggested to develop an ARM module and integrate it into a decision support system for specialized research fields in EDs.

Footnotes

Acknowledgements

The authors acknowledge Dr Mustafa Gökalp Ataman for his technical support. They also acknowledge Dr İlker Kızıloğlu for his general support. For providing writing assistance, the authors acknowledge the School of Foreign Languages and the Academic Writing Center of İzmir University of Economics, İzmir, Turkey.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Ceren Öcal Taşar
