Abstract
Background:
Acute appendicitis is one of the most challenging surgical conditions encountered in emergency departments. Clinical scoring systems were developed to reduce the negative appendectomy rate and to avoid unnecessary diagnostic evaluation.
Objectives:
The primary aim was to compare the clinical adequacy of the Alvarado, Acute Inflammatory Response, and the Raja Isteri Pengiran Anak Saleha Appendicitis scores in patients with right lower quadrant pain for the diagnosis of acute appendicitis.
Methods:
This was a prospective observational study. All patients over the age of 18 years who presented with a complaint of right lower quadrant pain were enrolled. The Alvarado, Acute Inflammatory Response, and Raja Isteri Pengiran Anak Saleha Appendicitis scoring systems were compared. The patients were either admitted or followed up as outpatients. Face-to-face or telephone follow-up visits were arranged for the patients who did not have surgery and were not admitted.
Results:
A total of 232 patients were enrolled, and 14 were excluded from the study. Of the remaining 218 patients, 114 underwent surgery, and 107 of these were pathologically diagnosed with acute appendicitis. The Raja Isteri Pengiran Anak Saleha Appendicitis score had the highest diagnostic accuracy (area under the curve = 0.88), followed by the Acute Inflammatory Response (area under the curve = 0.79) and Alvarado (area under the curve = 0.71) scores.
Conclusion:
The Raja Isteri Pengiran Anak Saleha Appendicitis scoring system was more accurate for the diagnosis of acute appendicitis than the other scores. A Raja Isteri Pengiran Anak Saleha Appendicitis score cut-off threshold of 7.5 points provides a practical, non-invasive, and rapid diagnostic method with greater discriminative power for acute appendicitis in patients presenting with right lower quadrant pain.
Introduction
Acute appendicitis (AA), the inflammation of the appendix, is one of the most common causes of abdominal pain in emergency departments (EDs). The lifetime incidence is 7% in the general population. 1 The diagnosis of AA remains difficult because of atypical presentations, especially in young adults, the elderly, and females. Gynecological and urogenital conditions are easily misdiagnosed as AA because their presenting symptoms and findings are similar. 2
Scoring systems were developed to guide the choice of appropriate clinical and surgical treatment and to reduce negative appendectomy rates.3–5 The Alvarado scoring system, a composite of symptoms, clinical findings, and laboratory findings, has been the most popular.6,7 However, the Alvarado score has been shown to have a lower diagnostic accuracy, especially in Eastern populations.8,9 The Acute Inflammatory Response (AIR) score was developed from the same clinical criteria as the Alvarado score, with the addition of C-reactive protein (CRP) and complete blood count tests to improve the discriminating power.10,11 More recently, a new scoring system called the Raja Isteri Pengiran Anak Saleha Appendicitis (RIPASA) score was developed. This score includes 14 clinical parameters and has a higher sensitivity, specificity, and diagnostic accuracy than the Alvarado scoring system, especially in Asian populations, who have a completely different ethnic origin and diet.12,13 The objective of this study was to evaluate the Alvarado, the AIR, and the RIPASA scoring systems and to compare their performance in predicting the risk of AA.
Methods
Study design
This prospective observational diagnostic accuracy study with cohort design was conducted at Fatih Sultan Mehmet Education and Research Hospital, Istanbul, Turkey, with an annual patient load of 220,000. Our prospective cohort consisted of 232 consecutive patients with right lower quadrant pain who presented to the ED between 1 January and 1 July 2016. Ethical permission was obtained prior to the beginning of this study.
Selection of participants
All consecutive adult patients (aged over 18 years) who were admitted to the ED with right lower quadrant pain were eligible for the study. Patients who met both of the following criteria were included in the study: (1) admitted during the shifts of the researchers and (2) no history of appendectomy. Patients (1) who were diagnosed with pregnancy during the ED work-up or (2) who were lost to follow-up were excluded from the final analysis.
Index tests
The Alvarado score contains 8, the AIR score contains 7, and the RIPASA score contains 16 variables. The scores for each of the variables are shown in Supplementary Tables E1 to E3. The optimal cut-off thresholds accepted as indicating a high probability of AA were ⩾5 for the Alvarado scoring system, ⩾7.5 for the RIPASA scoring system, and ⩾5 for the AIR scoring system.13,14 Scoring charts were filled in by the researchers at the time of presentation. All the data necessary for the calculation of the scores were collected by the researchers observationally from the patient charts or interviews.
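As a minimal sketch of how the cut-off thresholds above translate into a high-probability classification, the following Python snippet applies them to an already calculated score. Only the three thresholds come from the text; the dictionary, the function name, and the example scores are illustrative assumptions and not part of the study protocol.

```python
# Cut-off thresholds as stated in the text: a score at or above the
# threshold is treated as a high probability of acute appendicitis (AA).
CUTOFFS = {"alvarado": 5.0, "air": 5.0, "ripasa": 7.5}


def is_high_probability(system: str, score: float) -> bool:
    """Return True if `score` meets the cut-off for the named scoring system.

    `system` is a hypothetical lookup key ("alvarado", "air", or "ripasa").
    """
    return score >= CUTOFFS[system.lower()]


if __name__ == "__main__":
    # Hypothetical patient scores, for illustration only.
    for system, score in [("RIPASA", 7.5), ("AIR", 4), ("Alvarado", 6)]:
        print(system, score, is_high_probability(system, score))
```

The sketch assumes the total score has already been summed from the variables in the supplementary tables; it does not reproduce the scoring charts themselves.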
Reference standard
The reference standard of this study was a composite outcome, defined as the diagnosis of AA by surgery and histopathological reports or during the follow-up. The surgeon’s clinical judgment, based on all the findings of clinical, laboratory, and radiological investigation, was the basis for deciding between appendectomy and conservative treatment with follow-up. The RIPASA, Alvarado, and AIR scores were calculated only for the purpose of the study. At the end of the study, an independent panel of experts blinded to the calculated scores of the patients made the definitive AA diagnosis by analyzing the radiological imaging reports, surgery reports, histopathological results, patient charts, outpatient clinic follow-up forms, and the results of the follow-up by phone.
Data collection and study protocol
The baseline characteristics, including age, sex, vital findings, medical history, clinical symptoms, physical examination findings, and radiological imaging findings, were recorded for all right lower quadrant pain patients presenting to the ED. The primary survey of all patients was managed by the emergency physicians (EPs). Blood biochemistry, urine analysis, and, if necessary, radiological tests were obtained as a part of the routine workup. When a patient who was eligible according to the criteria mentioned above presented, a researcher on duty was alerted to evaluate the patient for study inclusion. All patients were managed without any intervention. The researchers never intervened in the decisions of the primary or consulting physicians.
Follow-up
After the evaluation process, patients were followed up in three different ways, solely based on the test results and gestalt of the primary or consulting physician: (1) very low probability for AA (other non-surgical diagnoses were more probable, medical treatment was introduced, and patients were scheduled for an outpatient follow-up the day after and in 2 weeks), (2) low probability for AA (the diagnosis of AA could not be excluded, but the risk was low, and the patient was discharged with medical treatment by the consulting surgeon with an outpatient follow-up the day after and in 2 weeks), and (3) moderate-to-high probability for AA (these patients were admitted to a hospital ward for further evaluation or surgery). Patients who had a definitive diagnosis of AA but were scheduled for conservative treatment were called for follow-up in 2 weeks, and patients who did not come were reached by telephone. Patients who were reached by telephone were asked whether they had presented to another health institution with the same complaint in the following 2 weeks or had undergone surgery in another hospital. Those who did not come to follow-up and could not be reached by telephone were excluded from the study.
Outcome
Patients were categorized into two groups. The AA group (Group AA) included the patients who underwent surgery and whose diagnosis of AA was confirmed by histopathological reports. The non-appendicitis group (Group N-A) included the patients who underwent surgery and had a negative appendectomy, those who had pathologies other than AA, and those who were followed up conservatively.
Statistical analysis
The analyses were conducted with MedCalc Statistical Software version 18.6 (MedCalc Software bvba, Ostend, Belgium; http://www.medcalc.org; 2018). Continuous variables were reported as mean values, standard deviations, and 95% confidence intervals (CIs) or as medians and interquartile ranges, according to the distribution pattern of the variable as assessed by the Shapiro–Wilk test. Mean values were compared among independent groups with Student’s t-test or analysis of variance (ANOVA) for normal distributions, and medians with the Mann–Whitney U test or Kruskal–Wallis test for non-normal distributions. Dependent groups were compared using the paired-samples t-test or the Wilcoxon signed-rank test for normal and non-normal distributions, respectively. Proportions were compared among discrete groups by the chi-square test with continuity (Yates) correction and Fisher’s exact test. Receiver operating characteristic (ROC) curves were used to identify the optimal cut-off points. Contingency tables were used to calculate the sensitivity, specificity, and diagnostic accuracy values of the scoring systems.
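The contingency-table calculation mentioned above can be sketched as follows. This is only an illustration of the standard definitions, not the MedCalc procedure used in the study; the class name and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """2x2 table of index test result versus reference standard."""

    tp: int  # score positive, AA confirmed
    fp: int  # score positive, no AA
    fn: int  # score negative, AA confirmed
    tn: int  # score negative, no AA

    @property
    def sensitivity(self) -> float:
        # Proportion of AA patients correctly flagged by the score.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Proportion of non-AA patients correctly not flagged.
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        # Proportion of all patients classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to make the arithmetic easy to follow.
    table = ContingencyTable(tp=40, fp=10, fn=10, tn=40)
    print(table.sensitivity, table.specificity, table.accuracy)
```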
The sample size calculation was based on a significance level (type 1 error) of 0.05. A sample of 144 patients was calculated to achieve 80% power. Assuming that 20% of the patients would be lost to follow-up, the estimated sample size was calculated as 175. A sensitivity analysis was performed for the patients who were lost to follow-up.
Results
A total of 232 patients with right lower quadrant pain were included, and 14 patients were excluded; therefore, 218 patients completed the follow-up of the study. Of the 218 patients, 114 were hospitalized and underwent surgery because they were presumed to have a moderate-to-high pretest probability for AA. Demographics, symptoms, physical examination findings, duration of symptoms, and laboratory findings are summarized in Table 1, and the final diagnoses of the patients in the study are shown in Table 2. A diagnostic radiological test, either computed tomography (CT) or ultrasound, was performed on all of the patients. Histopathologically, AA was confirmed in 107 of the 114 patients, and the remaining 7 were negative appendectomies (lymphoid hyperplasia in 4 patients and fibrous obliteration in 3 patients). The negative appendectomy rate was 7%. The flow diagram of the patients is presented in Figure 1.
The demographics and clinical variables.
Group AA: acute appendicitis group; Group N-A: non-appendicitis group; IQR: interquartile range; CRP: C-reactive protein; AA: acute appendicitis; MPV: mean platelet volume; PMN: polymorphonuclear neutrophils; WBC: white blood cell, N&V: nausea and vomiting, RLQ: right lower quadrant.
Mann–Whitney U test.
Chi-square test.
Final ED diagnosis of the patients who participated in the study.
ED: emergency department; NSAP: non-specific abdominal pain; PID: pelvic inflammatory disease.

The flowchart of patients.
The areas under the curve (AUCs) of the scores were compared, and the RIPASA score was found to have the highest diagnostic accuracy (AUC = 0.88), followed by the AIR (AUC = 0.79) and Alvarado (AUC = 0.71) scores (Table 3). The differences in diagnostic accuracy between the RIPASA, AIR, and Alvarado scores were statistically significant, as shown in Figure 2 and Table 4.
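For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen AA patient has a higher score than a randomly chosen non-AA patient, with ties counted as one half. The sketch below computes it directly from that definition. The function name and the example scores are hypothetical, and this is not the MedCalc routine used to produce the values above.

```python
from itertools import product
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Each pair where the positive patient scores higher counts 1,
    each tied pair counts 0.5, and all other pairs count 0.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical scores, for illustration only (not study data).
    aa_scores = [8, 9, 7.5, 10]
    non_aa_scores = [5, 7.5, 6, 4]
    print(auc(aa_scores, non_aa_scores))
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a cohort of a few hundred; a rank-based formulation would be preferable for much larger samples.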
Cut-off values for the maximum sensitivity, specificity, +LR, and –LR of the scoring systems.
AIR: Acute Inflammatory Response; RIPASA: Raja Isteri Pengiran Anak Saleha Appendicitis; 95% CI: 95% confidence interval.

The ROCs of the AIR, Alvarado, and RIPASA scores.
The diagnostic accuracy of the scores and the comparison of the diagnostic accuracy with each other.
AUC: area under the curve; 95% CI: 95% confidence interval; AIR: Acute Inflammatory Response; RIPASA: Raja Isteri Pengiran Anak Saleha Appendicitis.
Based on the optimal cut-off values with the highest discriminative power, the sensitivity and specificity were 94.39% and 26.13% for an AIR score of ⩾5 points, 72.90% and 54.05% for an Alvarado score of ⩾5 points, and 91.59% and 65.77% for a RIPASA score of ⩾7.5 points, respectively (Table 5). If a cut-off threshold of ⩾7 were used for the RIPASA, the sensitivity of the scoring system to rule out AA would be 99%; however, the specificity would decrease to 50%.
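The positive and negative likelihood ratios follow directly from a sensitivity and specificity pair: +LR = sensitivity / (1 − specificity) and −LR = (1 − sensitivity) / specificity. The sketch below applies these standard formulas to the percentages reported above. The function and variable names are illustrative assumptions, and because the inputs are rounded percentages, the results may differ slightly from the values in the study's tables.

```python
from typing import Dict, Tuple


def likelihood_ratios(sensitivity_pct: float, specificity_pct: float) -> Tuple[float, float]:
    """Return (+LR, -LR) from sensitivity and specificity given in percent."""
    sens = sensitivity_pct / 100.0
    spec = specificity_pct / 100.0
    if not (0.0 < spec < 1.0):
        raise ValueError("specificity must be strictly between 0% and 100%")
    lr_pos = sens / (1.0 - spec)   # how much a positive score raises the odds of AA
    lr_neg = (1.0 - sens) / spec   # how much a negative score lowers the odds of AA
    return lr_pos, lr_neg


# Sensitivity and specificity (%) at the cut-offs reported in the text.
REPORTED: Dict[str, Tuple[float, float]] = {
    "AIR >= 5": (94.39, 26.13),
    "Alvarado >= 5": (72.90, 54.05),
    "RIPASA >= 7.5": (91.59, 65.77),
}

if __name__ == "__main__":
    for name, (sens, spec) in REPORTED.items():
        lr_pos, lr_neg = likelihood_ratios(sens, spec)
        print(f"{name}: +LR = {lr_pos:.2f}, -LR = {lr_neg:.2f}")
```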
Distribution of acute appendicitis scores according to the last diagnosis.
AIR: Acute Inflammatory Response; RIPASA: Raja Isteri Pengiran Anak Saleha Appendicitis.
Percentages are calculated from column total.
According to histopathological results of the appendectomy patients, negative appendectomy rates for RIPASA, AIR, and Alvarado were 14.28%, 14.29%, and 71.43%, respectively. In 85.7% (n = 6) of the negative appendectomy patients, a score of ⩾5 for AIR, a score of ⩾7.5 for RIPASA, and a score of ⩾5 for Alvarado score were reported.
The National Registration Identity Card (NRIC) is an additional parameter of the RIPASA scoring system. NRIC is used only in Singapore. In previous studies, Shuaib et al. 1 and Malik et al. 15 evaluated the utility of RIPASA score in different populations without using this parameter and also reported a good sensitivity, specificity, and diagnostic accuracy. We also performed a sensitivity analysis for the two foreign patients and found that, even if we excluded this parameter, there would be no significant changes in the results.
Sensitivity analysis
To evaluate the effect of the four patients lost to follow-up, a sensitivity analysis was performed. Even when it was assumed that all four patients lost to follow-up had AA despite a very low pretest probability, the AUCs, sensitivity, and specificity values of the scoring systems changed only slightly, without any statistically significant difference (p > 0.05).
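The worst-case assumption described above can be illustrated with a short calculation in which every lost patient is treated as a missed (score-negative) AA case. The counts used here are an assumption back-calculated from the reported figures (a RIPASA sensitivity of 91.59% among 107 AA patients corresponds to 98 of 107); they are not stated in the study, and the function name is hypothetical. The sketch only shows the direction and size of the shift and does not reproduce the significance testing.

```python
def worst_case_sensitivity(tp: int, fn: int, lost: int) -> float:
    """Sensitivity if every patient lost to follow-up was a missed AA case.

    The lost patients are added to the false negatives, which is the
    least favourable assumption for the scoring system.
    """
    return tp / (tp + fn + lost)


if __name__ == "__main__":
    # Assumed counts back-calculated from the reported RIPASA sensitivity:
    # 98 true positives and 9 false negatives among 107 AA patients.
    tp, fn, lost = 98, 9, 4
    observed = worst_case_sensitivity(tp, fn, 0)
    worst = worst_case_sensitivity(tp, fn, lost)
    print(f"observed: {observed:.4f}, worst case: {worst:.4f}")
```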
Discussion
The diagnosis of AA can be challenging for physicians. A delay in appendectomy may increase the risk of perforation. 16 Diagnostic tools (such as clinical signs, symptoms, or laboratory findings) can help to improve the diagnostic accuracy and reduce the negative appendectomy rates. Although radiological imaging procedures (ultrasound or CT) increase the accuracy of the diagnosis of AA, inter-interpreter variability, high costs, and unavailability are unfortunately still their most important disadvantages. 16 Several scoring systems have been developed to increase the sensitivity and specificity of the diagnosis of AA. 1 The main purpose of risk scoring systems is to minimize unnecessary tests and operations and to protect patients from unnecessary interventional procedures. The best-known scoring systems, the Alvarado and the modified Alvarado, have good sensitivity and specificity when applied in Western populations, but they have lower sensitivity and specificity in Asian populations. The AIR score, which has a better diagnostic accuracy than the Alvarado score, is another well-known scoring system that has been used for AA during the last decade. Because of the low sensitivity and specificity rates of these scoring systems in Asian populations, the RIPASA score, with a better sensitivity and specificity, was developed.1,11
In this study, the diagnostic accuracies of the RIPASA, AIR, and Alvarado scoring systems were compared. The RIPASA score, with an AUC of 0.88, was remarkably better than the AIR (AUC = 0.79) or the Alvarado (AUC = 0.71) score in diagnosing AA. The sensitivity of the AIR score (94.39%) was higher than those of the RIPASA (91.59%) and the Alvarado (72.90%) scores. The specificity of the RIPASA score (65.77%), however, was better than those of the Alvarado (54.05%) and the AIR (26.13%) scores. In this study, a high-probability cut-off score of ⩾7 for the RIPASA showed a better sensitivity (99%).
In our study, the Alvarado cut-off point was set at ⩾5 and yielded a sensitivity of 72.90%, a specificity of 54.05%, and a diagnostic accuracy (AUC) of 0.71. In the study of Shuaib et al., 1 the cut-off threshold was set at 7, and a sensitivity of 82.8% and a specificity of 56% were found for the modified Alvarado score. Memon et al. 17 reported a sensitivity of 93.5% and a specificity of 80.6% for the Alvarado score at a cut-off threshold of 6. Gwynn 18 also investigated the diagnostic utility of the Alvarado score and found a sensitivity of 91.6% and a specificity of 84.7%. We assume that the large difference in sensitivity between the study of Gwynn and this study stems from the method used for the AA diagnosis: in that study, radiological imaging findings, not histopathological results, were used for the AA diagnosis. CT is a valuable tool in the evaluation of AA. Based on published reports, the sensitivity and the specificity of CT for the diagnosis of AA were 94.1% and 96.4%, respectively. In our study, 4 of 107 AA patients had negative CT findings, and 6 of 7 negative appendectomy patients had positive CT findings. In the systematic review by Ohle et al., 19 a sensitivity of 99% and a specificity of 43% were reported for the Alvarado score at a cut-off point of 5. Several other factors may also have contributed to the lower utility measures in our study. In our study, all patients were actively followed up. However, as mentioned in the systematic review, a number of studies had no follow-up at all. The lack of active follow-up may have led to the misclassification of patients and may have inflated the estimated sensitivity and specificity. In addition, the review did not search the gray literature, which may also have led to inflated sensitivity and specificity values. 19 Finally, ethnicity may also play an important role in this difference. Since the scoring systems were created in Western populations, the sensitivity and specificity levels may be lower when applied to Asian, Oriental, or Middle-Eastern populations.
In our study, the AIR score showed a higher sensitivity but a lower specificity than the other scoring systems. A score of ⩾5 showed 94% sensitivity for the patients with mild to high probability of AA. Andersson and Andersson 10 demonstrated a sensitivity of 96% and a specificity of 73% for an AIR score of ⩾5. In our study, the diagnostic power of the AIR score (AUC = 0.79) was significantly better than that of the Alvarado score (AUC = 0.71) (p < 0.05). De Castro et al. 14 compared the diagnostic utility of the AIR score and the Alvarado score and reported results similar to ours.
In this study, in a Middle-Eastern population, a RIPASA score of ⩾7.5 showed 91.59% sensitivity and 65.77% specificity. The RIPASA scoring system had a significantly higher AUC value (0.88) than the other scoring systems. In the study by Chong et al., 13 in an Asian population, patients with a RIPASA score higher than 7 were classified as high risk, and the sensitivity and the specificity values were found to be 88% and 67%, respectively. Nanjundaiah et al. 20 compared the diagnostic power of the RIPASA and the Alvarado scores; the sensitivities of the two scoring systems were 96.2% and 58.9%, and the specificities were 90.5% and 85.7%, respectively. In the study of Shuaib et al., 1 a sensitivity of 94.5% and a specificity of 88% were reported.
Limitations
The major limitation of our study was its single-center design. In our study, the diagnosis of AA was established on the basis of the clinical judgment of the surgical specialist, combined with the laboratory and/or radiological imaging findings. Physical examination findings, which constituted some of the parameters of the scoring systems, were subjective variables and could be interpreted differently according to the experience of the examining physician. The laboratory findings included in the scoring systems (e.g. WBC, CRP, and BUN) could influence the outcome according to the time of presentation. Laboratory findings could also be easily affected by accompanying diseases and the time of onset of pain and thus had the potential to affect the accuracy of the calculated results. Moreover, the criterion of foreign patient, which differentiated the RIPASA scoring system from the other scoring systems, could vary for every region and ethnicity. Because of the low number of foreign patients presenting to our hospital, the data obtained were limited. Different results could have been obtained if the same study had been performed in another region.
Conclusion
The diagnostic accuracy of the RIPASA scoring system is better than that of the Alvarado and the AIR scores. We conclude that the RIPASA score is a useful, simple, fast, and non-invasive diagnostic tool. We found a sensitivity of 91.59% and a specificity of 65.77% for the RIPASA score at a cut-off threshold of 7.5. We also demonstrated a greater sensitivity than that reported in Eastern populations (92% vs 88%), but a similar specificity (66% vs 66%).
Supplemental Material
Supplemental material, Supplementary_Tables for Predictive value of scoring systems for the diagnosis of acute appendicitis in emergency department patients: Is there an accurate one? by Rohat Ak, Fatih Doğanay, Ebru Unal Akoğlu, Haldun Akoğlu, Aslı Bahar Uçar, Erdem Kurt, Cansu Arslan Turan and Ozge Onur in Hong Kong Journal of Emergency Medicine
Footnotes
Author contributions
R.A., F.D., C.A.T., E.K., and A.B.U. contributed to the literature search; E.U.A., R.A., H.A., and O.O. contributed to the study design; R.A., E.U.A., H.A., A.B.U., and O.O. contributed to the legislative applications; F.D., R.A., A.B.U., E.K., and C.A.T. contributed to the data collection; E.U.A., H.A., and O.O. contributed to the supervision and quality control; E.U.A. and H.A. contributed to the statistical advice and statistical data analysis; R.A., E.U.A., and H.A. contributed to the data interpretation; R.A., F.D., E.K., E.U.A., and H.A. contributed to the drafting of the manuscript. All authors were involved in the writing and critical revision of the manuscript and approved the final version. R.A. and E.U.A. take responsibility for the paper as a whole.
Availability of data and materials
The authors agree to the conditions of publication including the availability of data and materials in our manuscript.
Ethical approval
This study was approved by the local ethics committee.
Human rights
The principles outlined in the Declaration of Helsinki have been followed.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Informed consent
Informed consent was obtained from the participants or their legally authorized representatives.
Supplemental material
Supplemental material for this article is available online.
