
Abstract

We validate our previously developed (DOI: 10.1101/089227) clinical prediction rule for diagnosing transient ischemic attack on the basis of presenting clinical symptoms and compare its performance with the ABCD2 score in first-contact patient settings. Two independent and prospectively collected patient validation cohorts were used: (a) referral cohort–prospectively referred emergency department and general practitioner patients (N = 877); and (b) SpecTRA cohort–participants recruited as part of the SpecTRA biomarker project (N = 545). Outcome measure consisted of imaging-confirmed clinical diagnosis of mild stroke/transient ischemic attack. Results showed that our clinical prediction rule demonstrated significantly higher accuracy than the ABCD2 score for both the referral cohort (70.5% vs 59.0%; p < 0.001) and SpecTRA cohort (72.8% vs 68.3%; p = 0.028). We discuss the potential of our clinical prediction rule to replace the use of the ABCD2 score in the triage of transient ischemic attack clinic referrals.

Keywords

acute cerebrovascular syndrome, clinical prediction rule, minor stroke, stroke-mimic, transient ischemic attack

Introduction

The initiation of management of acute cerebrovascular syndrome (ACVS),¹ such as transient ischemic attacks (TIA) and mild stroke, necessitates a clinical suspicion of ACVS based solely upon patients’ presenting clinical symptoms. National guidelines^2,3 for ACVS management typically emphasize clinical symptoms representing motor (unilateral limb weakness) or speech deficits, as these symptoms are the most prevalent and characteristic of ACVS. The ABCD2 score,⁴ for example, is a well-known prognostic risk score designed to predict the risk of recurrent stroke after an ischemic event, primarily on the basis of motor/speech deficits. The score is recommended by several guidelines to risk stratify patients^3,5 and is also used by over one-third of TIA clinics to triage patient referrals.⁶

Many non-cerebrovascular or “mimic” conditions, such as hemiplegic migraine, similarly present with motor/speech deficits. First-contact physicians frequently have difficulty in differentiating these conditions from ACVS. Of referrals to fast-track TIA clinics, approximately 30–60 percent of patients are ultimately diagnosed with a mimic condition.^7–11 This difficulty in diagnosing ACVS on the basis of unassisted clinical judgment over-burdens TIA clinics, increasing patient wait-times for consultation and intervention. Limited medical resources, such as brain imaging (computed tomography angiography (CTA) and magnetic resonance imaging (MRI)), are also inappropriately allocated to low-risk, mimic patients. The ABCD2 score is of limited use in these contexts; being prognostic in nature, the score presupposes the very diagnosis of ACVS under deliberation. Therefore, increasing physicians’ ability to differentiate ACVS from mimic patients using only presenting clinical symptoms would be a first step to addressing these resource issues.

Our research group, Spectrometry in TIA Rapid Assessment (SpecTRA),¹² has developed a clinical prediction rule (CPR) to differentiate ACVS from mimic conditions on the basis of presenting clinical symptoms.¹³ Our goal in the current analysis is to prospectively validate the diagnostic performance of our CPR using two independent, prospectively collected datasets. We also examine the performance of the ABCD2 score to better contextualize the performance of our CPR, as the ABCD2 score is the most frequently recommended and used CPR to guide ACVS management.

Method

Design

To validate our CPR, we utilized prospectively presenting successive TIA clinic referrals (i.e. referral cohort) as well as patients from a multi-site, clinical observational research study (i.e. SpecTRA cohort). Institutional review board approval to use the referral cohort data and to conduct the SpecTRA study was provided by the Island Health Research Ethics Board.

Referral cohort

The Stroke Rapid Assessment Unit (SRAU), Victoria, BC, Canada, is a specialized outpatient stroke unit servicing most of Vancouver Island (population 759,366). The SRAU receives referrals from emergency departments, general practice, and specialists (e.g. ophthalmologists). The referral dataset was chronologically collected after the initial phases of CPR derivation and temporal hold-out validation. In November 2014, the form used by physicians to refer patients to the SRAU was updated. By design, the updated referral form includes all fields required for the evaluation of the CPR. Referral forms were completed by referring physicians and entered verbatim into the SRAU electronic medical record (EMR) by unit staff.

The referral dataset consists of patients who had been referred to the SRAU between November 2014 and September 2015. On initial extraction from the SRAU EMR, the referral dataset contained 1475 patients. Patients’ ABCD2 scores were computed on the basis of the data present on the referral form using the standard formula.⁴ Clinical symptoms not selected on the referral form were treated as absent (i.e. not present). Patients’ final diagnoses were derived using the same procedure used during CPR development (as described elsewhere).¹³
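As an illustration of how an ABCD2 score can be derived from form data, the following is a minimal sketch using the item definitions as commonly published for the score (age, blood pressure, clinical features, duration, diabetes). The function name, argument names, and the mapping from referral form fields to arguments are assumptions made for this example and are not taken from the SRAU form.

```python
def abcd2_score(age, systolic_bp, diastolic_bp, unilateral_weakness,
                speech_disturbance, duration_minutes, diabetes):
    """Return an ABCD2 score in the range 0-7 (hypothetical helper)."""
    score = 0
    # A: age of 60 years or older
    if age >= 60:
        score += 1
    # B: blood pressure at presentation (systolic >= 140 or diastolic >= 90 mmHg)
    if systolic_bp >= 140 or diastolic_bp >= 90:
        score += 1
    # C: clinical features; weakness takes precedence over isolated speech deficit
    if unilateral_weakness:
        score += 2
    elif speech_disturbance:
        score += 1
    # D: duration of symptoms
    if duration_minutes >= 60:
        score += 2
    elif duration_minutes >= 10:
        score += 1
    # D: history of diabetes
    if diabetes:
        score += 1
    return score


# Symptoms not selected on a form are passed as False (treated as absent).
example = abcd2_score(age=72, systolic_bp=150, diastolic_bp=85,
                      unilateral_weakness=True, speech_disturbance=False,
                      duration_minutes=45, diabetes=False)
print(example)
```

Passing unselected symptoms as `False` mirrors the rule above that symptoms not selected on the referral form were treated as absent.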

Figure 1 displays the patient flow diagram detailing the treatment of missing data by casewise deletion (N = 598). Table 1 summarizes the demographic characteristics of the evaluable referral dataset (N = 877). Demographic information of the CPR development dataset (N = 4187) is also included for comparison; additional details on the development dataset and methods have been provided elsewhere.¹³

Figure 1.

Participant flow diagram for referral and SpecTRA cohorts.

Table 1.

Demographics.

	Development	Referral	SpecTRA study	Site 1	Site 2	p value*
N	4187	877	545	270	275
Patient age (years), mean (SD)	69.0 (13.8)	71.3 (13.5)	68.9 (15.2)	72.6 (14.4)	65.2 (15.1)	<0.001
Male, N (%)	2073 (49.5)	419 (47.8)	290 (53.2)	137 (50.7)	153 (55.6)	0.095
Diagnosis of ACVS, N (%)	2701 (64.5)	541 (61.7)	386 (70.8)	194 (71.9)	192 (69.8)	<0.001
CTA completed, N (%)	933 (22.3)	528 (60.2)	440 (80.7)	192 (71.1)	248 (90.2)	<0.001
MRI completed, N (%)	791 (18.9)	181 (20.6)	522 (95.8)	268 (99.3)	254 (92.4)	<0.001
ABCD2, N (%)						<0.001
0	48 (1.1)	14 (1.6)	1 (0.2)	1 (0.4)	0 (0.0)
1	174 (4.2)	52 (5.9)	5 (0.9)	2 (0.7)	3 (1.1)
2	495 (11.8)	157 (17.9)	29 (5.3)	18 (6.7)	11 (4.0)
3	813 (19.4)	187 (21.3)	67 (12.3)	34 (12.6)	33 (12.0)
4	1014 (24.2)	218 (24.9)	122 (22.4)	64 (23.7)	58 (21.1)
5	786 (18.8)	149 (17.0)	133 (24.4)	51 (18.9)	82 (29.8)
6	545 (13.0)	87 (9.9)	166 (30.5)	87 (32.2)	79 (28.7)
7	87 (2.1)	13 (1.5)	22 (4.0)	13 (4.8)	9 (3.3)
Missing	225 (5.4)	0 (0.0)	0 (0.0)	0 (0.0)	0 (0.0)
Systolic BP (mmHg), mean (SD)	140.9 (21.8)	149.6 (24.8)	156.2 (27.2)	158.0 (26.8)	154.4 (27.5)	<0.001
Diastolic BP (mmHg), mean (SD)	77.0 (10.9)	81.0 (13.1)	83.7 (14.1)	83.3 (13.7)	84.1 (14.5)	<0.001
Hypertension, N (%)	2503 (59.8)	544 (62.0)	313 (57.4)	167 (61.9)	146 (53.1)	0.07
Hyperlipidemia, N (%)	1754 (41.9)	352 (40.1)	217 (39.8)	107 (39.6)	110 (40.0)	0.719
Atrial fibrillation, N (%)	503 (12.0)	114 (13.0)	67 (12.3)	38 (14.1)	29 (10.5)	0.690
Diabetes, N (%)	737 (17.6)	161 (18.4)	92 (16.9)	40 (14.8)	52 (18.9)	0.6797
Smoking, N (%)	537 (12.8)	107 (12.2)	54 (9.9)	18 (6.7)	36 (13.1)	0.017

SD: standard deviation; ACVS: acute cerebrovascular syndrome; BP: blood pressure; MRI: magnetic resonance imaging; CTA: computed tomography angiography.

*t-test and chi-square homogeneity test p values.

SpecTRA cohort

The SpecTRA cohort comes from the first phase of a multi-site, observational study that the SpecTRA group is conducting. The aim of the SpecTRA study is to develop a blood-based, proteomic biomarker panel to differentiate mimic from ACVS patients in the emergency department (ED). Later phases of the SpecTRA study will validate the performance of the developed biomarker panel, both separately and in combination with some or all of the clinical variables examined in this study. Patient enrollment for this SpecTRA cohort of participants (N = 560) occurred at two urban medical centers (Sites 1 and 2). Site 1 is an urban hospital with a small, centralized stroke program, and Site 2 is an academic hospital with a large, centralized stroke program. Participants were enrolled within 24 h of symptom onset, and case-report forms (CRF) were completed by study nurses at the time of patient enrollment; the CRF included all clinical variables required by the CPR.

The participants for the SpecTRA cohort (N = 560) were recruited between December 2013 and May 2015. Enrolled patients received either MRI or CTA imaging as part of the study protocol. Double data entry was conducted by randomly selecting 10 percent of the patients from each enrolling site. Inter-rater reliability was calculated using Gwet’s AC1,¹⁴ which ranges in value from −1 (perfect disagreement) to 1 (perfect agreement); values can be interpreted using standard benchmarking tables¹⁴ (e.g. > 0.8 = excellent agreement).¹⁵ Gwet’s AC1 was computed for each of the presenting clinical symptoms/variables recorded in the CRF. Mean Gwet’s AC1 was 0.995 (min = 0.97; max = 1.000).
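For readers unfamiliar with the statistic, the following is a minimal sketch of Gwet's AC1 for two raters (here, the two data-entry passes) on a single categorical variable. It follows the standard two-rater definition, AC1 = (pa − pe) / (1 − pe), and is not the routine used in the analysis; the function name and the default binary coding (0 = absent, 1 = present) are assumptions for this example.

```python
from collections import Counter


def gwet_ac1(ratings_a, ratings_b, categories=(0, 1)):
    """Gwet's AC1 agreement coefficient for two raters (illustrative sketch)."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(ratings_a)
    q = len(categories)
    # Observed agreement: proportion of items given the same category twice.
    p_a = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: based on the average prevalence of each category.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    p_e = 0.0
    for c in categories:
        pi_c = (count_a[c] + count_b[c]) / (2 * n)
        p_e += pi_c * (1 - pi_c)
    p_e /= (q - 1)
    return (p_a - p_e) / (1 - p_e)


# Ten records entered twice; the two entries disagree on one record.
first_entry = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
second_entry = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
print(round(gwet_ac1(first_entry, second_entry), 3))
```

Passing the category set explicitly keeps the chance-agreement term well defined even when a rare symptom is never recorded as present in the double-entered sample.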

Patients’ final diagnoses were adjudicated by study neurologists and based upon brain imaging results and neurological assessments. Of the initial sample, 10 patients were removed due to protocol violations, such as missing necessary brain imaging. On consideration of the medical and clinical ambiguity regarding Transient Global Amnesia (TGA)¹⁶ and its potential relation to ACVS, an additional five patients were removed from the sample. In total, 15 patients were excluded from the dataset (see Figure 1). Patients’ ABCD2 scores were computed on the basis of the CRF data. Table 1 displays the demographic characteristics of the final evaluable SpecTRA dataset (N = 545).

Descriptive statistics comparing the individual predictors of the CPR between the development dataset and the referral and SpecTRA study datasets can be found in the supplement.¹⁷

Statistical analysis

In the present analysis, we evaluate our CPR, which takes the form of a fitted logistic regression model with 50 main effects and 12 interaction terms (coefficients can be found in Table 2 of Bibok et al.¹³). The CPR is applied to both validation sets: referral dataset and SpecTRA study dataset. We also apply our CPR after stratifying the SpecTRA study dataset by study sites (Sites 1 and 2) to examine the impact of site-specific effects, as each site represents a different urban setting with potentially different patient and provider populations. Development of the CPR and associated datasets has been documented elsewhere.¹³

Table 2.

Performance measures (95% confidence interval) of models on development, referral, SpecTRA study (combined), site 1, and site 2 datasets.

	AUC	Sensitivity	Specificity	Accuracy
CPR (development)	0.804 (0.790–0.818)	0.873 (0.859–0.885)	0.556 (0.530–0.582)	0.760 (0.747–0.773)
CPR (referral)	0.731 (0.696–0.765)	0.911 (0.884–0.932)	0.372 (0.322–0.425)	0.705 (0.674–0.734)
CPR (SpecTRA)	0.713 (0.665–0.760)	0.842 (0.802–0.875)	0.453 (0.377–0.530)	0.728 (0.690–0.764)
CPR (site 1)	0.717 (0.645–0.789)	0.892 (0.840–0.928)	0.461 (0.353–0.572)	0.770 (0.717–0.817)
CPR (site 2)	0.707 (0.642–0.772)	0.792 (0.729–0.843)	0.446 (0.344–0.553)	0.687 (0.630–0.739)
ABCD2 (development)	0.655 (0.638–0.672)	0.691 (0.673–0.709)	0.526 (0.500–0.552)	0.633 (0.617–0.647)
ABCD2 (referral)	0.605 (0.568–0.642)	0.599 (0.557–0.639)	0.574 (0.521–0.626)	0.590 (0.557–0.622)
ABCD2 (SpecTRA)	0.647 (0.598–0.697)	0.850 (0.811–0.882)	0.277 (0.213–0.351)	0.683 (0.642–0.720)
ABCD2 (site 1)	0.695 (0.623–0.766)	0.856 (0.799–0.898)	0.355 (0.257–0.467)	0.715 (0.658–0.765)
ABCD2 (site 2)	0.600 (0.531–0.669)	0.844 (0.786–0.888)	0.205 (0.132–0.304)	0.651 (0.593–0.705)

AUC: area under the curve; CPR: clinical prediction rule.

We assess the performance of our CPR using two criteria, the area under the curve (AUC) and the threshold for the linear predictor scores which maximizes diagnostic accuracy.¹⁸ The threshold was previously established¹³ on the development dataset and has a value of ≥0.063. For the ABCD2 score, a threshold of ≥4 was used to assess discriminant performance, as suggested by a number of national stroke guidelines.^3,5 McNemar’s test¹⁹ was used to assess differences in sensitivity, specificity, and accuracy between the classifications made by our CPR and the ABCD2 score within the validation datasets; Fisher’s exact test²⁰ was used to assess differences in these measures between the CPR’s performance on the development dataset and the independent validation datasets. DeLong’s test of correlated receiver operating characteristic (ROC) curves²¹ was used to assess differences in the area under the ROC curve (AUC) between the two models within the validation datasets. Calibration plots^22,23 were used for visual assessment of the CPR’s calibration against the final clinical diagnoses.
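The thresholding and paired-comparison steps described above can be sketched as follows. This is an illustration only, not the analysis code: the function names and the toy data are hypothetical, and the exact (binomial) form of McNemar's test is used here to keep the example free of dependencies, whereas the original analysis may have used the chi-square form.

```python
from math import comb


def classify(scores, threshold):
    """Label a patient as ACVS when the score meets or exceeds the threshold."""
    return [s >= threshold for s in scores]


def diagnostic_metrics(predicted, actual):
    """Sensitivity, specificity, and accuracy against the final diagnosis."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one ACVS and one mimic patient")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(actual),
    }


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar p value for paired correct/incorrect calls."""
    # Only discordant pairs carry information about the difference.
    b = sum(x and (not y) for x, y in zip(correct_a, correct_b))
    c = sum(y and (not x) for x, y in zip(correct_a, correct_b))
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, p)


# Toy data: linear predictor scores, ABCD2 scores, and final diagnoses.
diagnosis = [True, True, False, True, False, False]
cpr_pred = classify([0.5, 0.1, -0.2, 0.063, 0.0, 1.2], threshold=0.063)
abcd2_pred = classify([5, 3, 4, 6, 2, 1], threshold=4)
print(diagnostic_metrics(cpr_pred, diagnosis))
cpr_correct = [p == d for p, d in zip(cpr_pred, diagnosis)]
abcd2_correct = [p == d for p, d in zip(abcd2_pred, diagnosis)]
print(mcnemar_exact(cpr_correct, abcd2_correct))
```

Comparing accuracy uses all patients; to compare sensitivity or specificity in the same paired fashion, the correct/incorrect vectors would be restricted to ACVS or mimic patients, respectively.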

Analyses were completed using the ROCR (v1.0.7),²⁴ pROC (v1.9.1),²⁵ immer (v0.5.0),²⁶ Hmisc (v4.0.2),²⁷ rms (v5.1.0),²⁸ and ggplot2 (v2.2.1)²⁹ libraries in the R statistical language (v3.3.2).³⁰

Results

Table 2 displays the discriminant performance of our CPR on the four validation datasets, along with the performance of the ABCD2 score.
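As a reading aid for Table 2 (not part of the original analysis), accuracy at a fixed threshold is the prevalence-weighted average of sensitivity and specificity. The symbol π, denoting the proportion of patients with a final ACVS diagnosis, is introduced here for illustration. Using the referral cohort values from Tables 1 and 2:

```latex
\begin{align*}
\text{Accuracy} &= \pi \cdot \text{Sensitivity} + (1 - \pi) \cdot \text{Specificity} \\
                &= \tfrac{541}{877} \times 0.911 + \left(1 - \tfrac{541}{877}\right) \times 0.372 \\
                &\approx 0.617 \times 0.911 + 0.383 \times 0.372 \\
                &\approx 0.704
\end{align*}
```

This agrees with the reported referral accuracy of 0.705 up to rounding of the inputs, and shows why a rule with high sensitivity but modest specificity can still be more accurate overall in a cohort where most patients have ACVS.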

Figure 2 displays ROC plots for the CPR performance on the validation datasets. With the exception of Site 1, the ROC curve of the ABCD2 score was completely encompassed by that of the CPR. These non-overlapping curves indicate that across the entire range of sensitivities (or specificities) the CPR has better discriminant performance than the ABCD2.
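The AUC values underlying these curves can be interpreted as the probability that a randomly chosen ACVS patient receives a higher score than a randomly chosen mimic patient, with ties counted as one half. The sketch below computes the AUC in that pairwise form; it is an illustration of the quantity, not the pROC implementation used in the analysis, and the function name and toy data are hypothetical.

```python
def auc_pairwise(scores, labels):
    """AUC as the proportion of (ACVS, mimic) pairs ranked correctly."""
    positives = [s for s, is_acvs in zip(scores, labels) if is_acvs]
    negatives = [s for s, is_acvs in zip(scores, labels) if not is_acvs]
    if not positives or not negatives:
        raise ValueError("need at least one ACVS and one mimic patient")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # ACVS patient ranked above mimic patient
            elif p == n:
                wins += 0.5   # ties (common for integer scores) count half
    return wins / (len(positives) * len(negatives))


# Toy example with one tied pair.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.4]
toy_labels = [True, True, False, False, True]
print(auc_pairwise(toy_scores, toy_labels))
```

The explicit handling of ties matters for a coarse integer scale such as the ABCD2 score, where many ACVS and mimic patients share the same value.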

Figure 2.

ROC plots of validation datasets with thresholds: (a) referral, (b) SpecTRA study (combined), (c) site 1, and (d) site 2. CPR: clinical prediction rule.

Figure 3 displays calibration plots for the CPR performance on the validation datasets. The calibration curve for the CPR on the referral dataset ran nearly parallel to the line of equality, with the confidence interval for the curve tracking the line. This suggests that in the context of the referral data, the model is well-specified.

Figure 3.

Calibration plots of validation datasets: (a) referral, (b) SpecTRA study (combined), (c) site 1, and (d) site 2. Solid line: calibration line; dashed line: line of equality; and dotted line: 95% confidence interval. Intercept/slope: intercept and slope of calibration line estimated by logistic regression (intercept = 0 and slope = 1 equals perfect calibration).^22,31 C (ROC): AUC and Tjur’s R² = Yates’/discrimination slope.³² E (mean): mean absolute difference between calibration curve and line of equality.²⁸ Ticks along x-axis: histograms of predicted probabilities separated by outcome.^22,23

On the referral dataset, the CPR was found to have significantly higher accuracy (70.5% vs 59.0%; p < 0.001) and sensitivity (91.1% vs 59.9%; p < 0.001) than the ABCD2 score. DeLong’s test indicated that the CPR had a significantly higher AUC than the ABCD2 score (73.1% vs 60.5%; p < 0.001). The sensitivity, specificity, and accuracy of the CPR differed significantly between the development and referral datasets (each p < 0.01).

On the SpecTRA study dataset, the CPR had significantly higher accuracy than the ABCD2 score (72.8% vs 68.3%; p = 0.028). Sensitivities were not significantly different between the two models, but the CPR had significantly higher specificity (45.3% vs 27.7%; p < 0.001). The CPR had a significantly higher AUC than the ABCD2 score (71.3% vs 64.7%; p = 0.008). Fisher’s exact tests indicated that neither accuracy nor sensitivity for the CPR was significantly different between the development and SpecTRA study datasets (each p > 0.2).

When the SpecTRA dataset was stratified by site, sensitivity and specificity were not significantly different between the models for Site 1, although accuracy was higher for the CPR (77.0% vs 71.5%; p = 0.041). The AUCs of the two models were not significantly different. Fisher’s exact tests indicated that the sensitivity, specificity, and accuracy of the CPR did not differ significantly between the development and Site 1 datasets (each p > 0.1).

For Site 2, only specificity was found to be significantly higher for the CPR than the ABCD2 score (44.6% vs 20.5%; p = 0.001). The CPR had a significantly higher AUC than the ABCD2 score (70.7% vs 60.0%; p = 0.002). Fisher’s exact tests indicated that the sensitivity and accuracy of the CPR differed significantly between the development and Site 2 datasets (each p < 0.05).

Discussion

Our aim in this study was to validate the diagnostic performance of our CPR to distinguish mimic and ACVS patients based upon presenting clinical symptoms. To this end, we evaluated the diagnostic performance and calibration measures of our CPR on data collected from referral forms to a fast-track TIA clinic and prospectively collected clinical study data.

Overall, we found that the CPR evidenced greater accuracy than the ABCD2 score on the validation datasets, with the exception of Site 2 in which there was no difference. On the referral dataset, the CPR displayed promising performance. In the context of TIA clinic referrals, the model is likely to lead to an improvement in clinical decision making over the ABCD2 risk score traditionally used in TIA clinics for triaging patient referrals. This conclusion is further supported by the significantly increased accuracy, sensitivity, and AUC of the CPR compared to the ABCD2 score.

On the SpecTRA study dataset, the CPR displayed both sensitivity and accuracy that were consistent with those observed on the development dataset; moreover, for the Site 1 dataset, the CPR also demonstrated consistent specificity. Both the CPR and the ABCD2 score were found to have similar sensitivities on the SpecTRA study datasets. Specificities, however, were observed to be markedly higher for the CPR than the ABCD2 score, along with AUCs for the combined and Site 2 datasets. This suggests that the CPR would likely be an improvement over the ABCD2 score for ED staff in differentiating ACVS from mimic patients and may have application in the areas of patient screening and triage for advanced neurological services.

Most encouragingly, the CPR performs consistently across different medical datasets despite the obvious variation in case-mixes between the historical training and validation datasets. Debray et al.³³ have suggested that models that display stable model performance across diverse case-mixes are likely to generalize well to other settings. Future work will need to verify whether the robustness of the CPR carries over into other patient populations with different assessment and triage procedures.

Footnotes

Acknowledgements

Research was conducted within the Department of Research and Capacity Building, Vancouver Island Health Authority (Island Health).

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Funding for this study was provided by non-industry sources: the Heart and Stroke Foundation (grant no. PG-08-0415), Genome British Columbia (grant no. 4125-Penn), and Genome Canada (grant no. 4125-Penn).

References

1. Albers. Acute cerebrovascular syndrome: time for new terminology for acute brain ischemia. Nat Clin Pract Cardiovasc Med 2006; 3: 521.

2. Coutts, Wein, Lindsay, et al. Canadian stroke best practice recommendations: secondary prevention of stroke guidelines, update 2014. Int J Stroke 2015; 10: 282–291.

3. National Institute for Health and Care Excellence (NICE). Stroke and transient ischaemic attack in over 16s: diagnosis and initial management (NICE guidelines [CG68]). London: National Institute for Health and Care Excellence, https://www.nice.org.uk/guidance/cg68

4. Johnston, Rothwell, Nguyen-Huynh, et al. Validation and refinement of scores to predict very early stroke risk after transient ischaemic attack. Lancet 2007; 369: 283–292.

5. Intercollegiate Stroke Working Party. National clinical guideline for stroke. 4th ed. London: Royal College of Physicians, 2012, https://www.rcplondon.ac.uk/guidelines-policy/stroke-guidelines

6. Brazzelli, Shuler, Quayyum, et al. Clinical and imaging services for TIA and minor stroke: results of two surveys of practice across the UK. BMJ Open 2013; 3: e003359.

7. Ferro, Falcão, Rodrigues, et al. Diagnosis of transient ischemic attack by the nonneurologist: a validation study. Stroke 1996; 27: 2225–2229.

8. Prabhakaran, Silver, Warrior, et al. Misdiagnosis of transient ischemic attacks in the emergency room. Cerebrovasc Dis 2008; 26: 630–635.

9. Quinn, Cameron, Dawson, et al. ABCD2 scores and prediction of noncerebrovascular diagnoses in an outpatient population: a case-control study. Stroke 2009; 40: 749–753.

10. Sheehan, Merwick, Kelly, et al. Diagnostic usefulness of the ABCD2 score to distinguish transient ischemic attack and minor ischemic stroke from noncerebrovascular events: the North Dublin TIA study. Stroke 2009; 40: 3449–3454.

11. Fonseca, Canhão. Diagnostic difficulties in the classification of transient neurological attacks. Eur J Neurol 2010; 18: 644–648.

12. Spectrometry in TIA and Rapid Assessment, 2016, http://www.viha.ca/rnd/current/spectra.htm

13. Bibok, Penn, Lesperance, et al. Development of a multivariate clinical prediction model for the diagnosis of mild stroke/TIA in physician first-contact patient settings. Biorxiv. Epub Ahead of Print 22 November 2016. DOI: 10.1101/089227.

14. Gwet. Handbook of inter-rater reliability: the definitive guide to measuring the extent of agreement among multiple raters. 4th ed. Gaithersburg, MD: Advanced Analytics, 2014.

15. Landis, Koch. The measurement of observer agreement for categorical data. Biometrics 1977; 33: 159–174.

16. Sedlaczek, Hirsch, Grips, et al. Detection of delayed focal MR changes in the lateral hippocampus in transient global amnesia. Neurology 2004; 62: 2165–2170.

17. Bibok, Penn, Lesperance, et al. Supplement–validation of a multivariate clinical prediction model for the diagnosis of mild stroke/TIA in physician first-contact patient settings. Zenodo. Epub Ahead of Print 23 February 2017. DOI: 10.5281/zenodo.321903.

18. Greiner, Pfeiffer, Smith. Principles and practical application of the receiver-operating characteristic analysis for diagnostic tests. Prev Vet Med 2000; 45: 23–41.

19. McNemar. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 1947; 12: 153–157.

20. Fisher. On the interpretation of χ² from contingency tables, and the calculation of P. J R Stat Soc 1922; 85: 87–94.

21. DeLong, Clarke-Pearson. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988; 44: 837–845.

22. Steyerberg, Vickers, Cook, et al. Assessing the performance of prediction models. Epidemiology 2010; 21: 128–138.

23. Steyerberg. Clinical prediction models: a practical approach to development, validation, and updating. New York: Springer-Verlag, 2009.

24. Sing, Sander, Beerenwinkel, et al. ROCR: visualizing classifier performance in R. Bioinformatics 2005; 21: 3940–3941.

25. Robin, Turck, Hainard, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 2011; 12: 77.

26. Robitzsch, Steinfeld. immer: Item response models for multiple ratings, 2016, https://cran.r-project.org/package=immer

27. Harrell Jr, Dupont. Hmisc: Harrell miscellaneous, 2016, https://cran.r-project.org/package=Hmisc

28. Harrell Jr. rms: Regression modeling strategies, 2016, https://cran.r-project.org/package=rms

29. Wickham. ggplot2: Elegant graphics for data analysis. New York: Springer-Verlag, 2009.

30. R Core Team. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing, https://www.R-project.org

31. Cox. Two further applications of a model for binary regression. Biometrika 1958; 45: 562–565.

32. Tjur. Coefficients of determination in logistic regression models—a new proposal: the coefficient of discrimination. Am Stat 2009; 63: 366–372.

33. Debray, Vergouwe, Koffijberg, et al. A new framework to enhance the interpretation of external validation studies of clinical prediction models. J Clin Epidemiol 2015; 68: 279–289.

Validation of a multivariate clinical prediction model for the diagnosis of mild stroke/transient ischemic attack in physician first-contact patient settings