Abstract
Introduction
In biomedical research, including imaging research, published studies are often incompletely reported,1-5 hindering the reader’s ability to evaluate the strengths and limitations of the study, determine the generalizability of the results, and predict reproducibility.6-9 Furthermore, insufficient reporting of key study details imposes challenges on researchers conducting systematic reviews, as critical information required for proper analysis and risk of bias assessment may be absent.10-12 Incomplete reporting also influences patient outcomes. Although clinical decision making is largely directed by guidelines, clinicians will refer to current literature for assistance in specific cases.13,14
In 2015, the STAndards for Reporting of Diagnostic Accuracy Studies (STARD) group updated its reporting guideline. STARD 2015 contains a 30-item checklist that applies to all types of diagnostic accuracy research, including diagnostic imaging, microbiology, laboratory tests, physical exams, and screening questionnaires.15 A study by Hong et al evaluated the adherence of diagnostic accuracy imaging research published in 2016 to the updated STARD 2015 guideline.16 Across the 142 included studies, the mean adherence rate was 55% (16.6/30 items reported on average). In addition, new STARD items (such as sample size calculation and protocol details) were infrequently reported.16 Hong et al's objective was to establish a baseline level of completeness of reporting against which improvement over time could be measured. Since then, the STARD group has implemented several knowledge translation efforts: an Explanation and Elaboration was published with detailed instructions and examples of how to report each item17; endorsement and dissemination of the updated guideline was promoted via the EQUATOR network18; and an increasing number of journals, such as the Canadian Medical Association Journal (CMAJ), have endorsed the STARD 2015 reporting guideline as part of their manuscript submission guidelines for authors.19,20
The purpose of this study was to evaluate the adherence of diagnostic accuracy studies recently published in imaging journals to STARD 2015 and to assess whether adherence has improved since the baseline evaluation.
Methods
This study is an update of the baseline evaluation performed by Hong et al16 in 2016; the methods are based on those established in the baseline evaluation,16 following the guidelines of the Text Recycling Project,21 to permit comparison between cohorts. The protocol was registered on the Open Science Framework (OSF): (link).
Literature Search
We performed an electronic literature search in MEDLINE (Appendix A), modelled after the search strategy used by Hong et al.16 The search combined a previously published filter for diagnostic accuracy studies with a restriction to 136 imaging journals as defined by Thomson Reuters' Journal Citation Reports.22,23 No new journals were added to the literature search. With the exception of the search date, the literature search remained identical to that of the original study to ensure a fair comparison. The timespan of the search was adjusted to achieve a minimum sample size of 100 included studies: the scope was initially restricted to studies published between May and June of 2024, with the search expanded to progressively earlier dates as required by the sample size requirements. Our sample size calculation was based on a study by Stahl et al,24 whose group was able to detect a significant difference with 66 studies at medium effect sizes of 0.24 to 0.54. Based on an estimated medium effect size of 0.5, 80% power, and a .05 significance level, a minimum sample size of 63 articles was required. Moreover, our goal was to achieve a sample size similar to that of Hong et al,16 which was 142. We included studies working backward from the most recently published until the desired sample size was reached.
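As an illustration, the sample size calculation described above can be reproduced with base R's power.t.test function. This is a minimal sketch under the stated assumptions (a two-sample comparison at Cohen's d = 0.5, 80% power, two-sided α = .05), not the authors' original code.

```r
# Minimum sample size for detecting a medium effect (Cohen's d = 0.5)
# with 80% power at a two-sided significance level of .05.
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05,
             power = 0.80, type = "two.sample",
             alternative = "two.sided")
# Returns n = 63.8 per group, consistent with the minimum of
# 63 articles reported above (rounding conventions may differ).
```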
Study Selection
The study selection criteria, apart from publication dates, remained identical to those of the original study to maximize comparability and minimize variability. Articles were included if they met the following inclusion criteria: (1) reporting on a study in which the diagnostic accuracy of one or more imaging index tests was evaluated against a reference standard in human subjects, and (2) reporting at least one estimate of diagnostic accuracy (eg, sensitivity, specificity, predictive values, likelihood ratios, and/or area under the ROC curve). We excluded studies on the predictive or prognostic accuracy of tests, reviews, short reports, and articles published in languages other than English.
Two reviewers independently screened the titles and abstracts of the retrieved articles to evaluate their potential relevance. Full texts of all potentially relevant articles were then assessed independently and in duplicate by 2 reviewers. Disagreements were discussed and resolved in consensus meetings; a third reviewer was available to resolve complex conflicts but was rarely needed (<2% of entries), as the initial 2 reviewers were typically able to reach agreement.
Data Extraction
The following baseline characteristics were extracted from each study report: first author, country of the corresponding author's institution, journal, journal impact factor (Thomson Reuters' 2023 Journal Citation Reports), journal STARD adoption status, study design (prospective vs retrospective), imaging modality (eg, MRI, CT, ultrasound, X-ray), and subspecialty area (eg, neuroradiology, chest imaging, cardiovascular). In November 2024, we evaluated STARD adoption status by reviewing each journal's instructions for authors; a journal was considered a STARD adopter if its instructions explicitly recommended adherence to STARD and/or STARD 2015.
Adherence to STARD 2015
The assessment of adherence was in line with the baseline evaluation by Hong et al.16 When a STARD 2015 item had more than one component (eg, Item 8: where and when potentially eligible participants were identified [setting, location, and dates]), the item was divided into sub-items (eg, 8.1 setting, 8.2 location, and 8.3 dates) that were scored independently. In addition, items on test methods with aspects potentially unique to imaging research were divided into multiple sub-items. Each item was weighted 1 point out of 30; when an item had sub-items, the sub-items were assigned equal fractional weights summing to 1 point.
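To make the weighting scheme concrete, the following R sketch (with hypothetical ratings and labels) assigns each sub-item an equal fraction of its parent item's single point, so that, for example, sub-items 8.1 to 8.3 are each worth one-third of a point:

```r
# Hypothetical reviewer ratings for one study: one row per item or sub-item.
ratings <- data.frame(
  item    = c("1", "8",   "8",   "8"),
  subitem = c("1", "8.1", "8.2", "8.3"),
  rated   = c("Yes", "Yes", "No", "Yes")
)

# Equal fractional weights: each sub-item is worth 1 / (number of sub-items),
# so every parent item contributes exactly 1 of the 30 available points.
ratings$weight <- ave(rep(1, nrow(ratings)), ratings$item,
                      FUN = function(x) x / length(x))

sum(ratings$weight[ratings$rated == "Yes"])  # 1 + 1/3 + 1/3 = 1.67 points
```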
Before STARD scoring, all extractors performed pilot scoring on 3 studies. The final list of items scored with associated descriptors/criteria is provided in Appendix B. Two reviewers used the final list of items to independently score STARD adherence for each included study. Each reporting requirement was rated as “Yes,” “No,” or “N/A.” Any disagreements were resolved by consensus. If a consensus was not reached, a third reviewer made the final decision.
Data Analysis
Overall adherence to STARD 2015 was measured as the total number of STARD items reported for each study, ranging from 0 (none of the items sufficiently reported) to 30 (all items sufficiently reported). Items rated "N/A" were treated as adherent ("Yes") and therefore received 1 point. Items with multiple sub-items were scored out of a total of 1 point, with equal fractional points awarded for each sub-item. An arbitrary classification of reporting adherence was devised as follows: items reported in <33% of studies were considered "infrequently reported," those reported in 33% to 66% of studies were considered "moderately reported," and those reported in >66% of studies were considered "frequently reported."
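A minimal sketch of these scoring and classification rules, assuming a hypothetical matrix of per-study item ratings (for simplicity, each item is shown as a single binary column; in the actual scoring, sub-items carry the fractional weights illustrated earlier):

```r
# Hypothetical ratings: rows = studies, columns = 30 STARD items,
# entries in {"Yes", "No", "N/A"}; "N/A" is treated as adherent.
set.seed(1)
ratings <- matrix(sample(c("Yes", "No", "N/A"), 126 * 30, replace = TRUE,
                         prob = c(0.55, 0.35, 0.10)),
                  nrow = 126, dimnames = list(NULL, paste0("item", 1:30)))

adherent <- matrix(ratings %in% c("Yes", "N/A"), nrow = nrow(ratings),
                   dimnames = dimnames(ratings))

study_totals <- rowSums(adherent)          # per-study score, 0 to 30
item_pct     <- colMeans(adherent) * 100   # % of studies reporting each item

# Arbitrary classification: <33% infrequent, 33%-66% moderate, >66% frequent.
item_class <- cut(item_pct, breaks = c(0, 33, 66, 100),
                  labels = c("infrequently", "moderately", "frequently"),
                  include.lowest = TRUE)
table(item_class)
```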
Univariable Poisson regression was used to evaluate differences in the number of reported STARD items in relation to journal of publication, imaging modality, study design, country of corresponding author, and imaging subspecialty area. Multivariable Poisson regression was used to evaluate the independent effect of each variable on STARD item adherence. Two-tailed Student's t-tests were used to compare overall STARD adherence between the baseline and current cohorts, and to evaluate differences in adherence by journal impact factor (median split) and STARD adoption status (adopter vs non-adopter). The level of statistical significance was set at P < .05. All statistical analyses were performed using SAS OnDemand for Academics and R (R Core Team, 2023).
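For illustration, the analyses described above might be specified as follows in R; the data frames and variable names (df, df_combined, items_reported, cohort, and the candidate factors) are hypothetical, and this sketch shows the model structure rather than the authors' actual SAS/R code:

```r
# df: one row per included study; items_reported = total STARD items (0-30).

# Univariable Poisson regression for one candidate factor at a time:
uni <- glm(items_reported ~ modality, family = poisson, data = df)

# Multivariable Poisson regression, evaluating each factor adjusted for the others:
multi <- glm(items_reported ~ journal + modality + design + country + subspecialty,
             family = poisson, data = df)
summary(multi)

# Two-tailed t-test comparing overall adherence between cohorts,
# eg, the 2016 baseline vs the current sample, at alpha = .05:
t.test(items_reported ~ cohort, data = df_combined, alternative = "two.sided")
```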
Results
Study Selection
One hundred twenty-six studies were included in the final analysis. The selection process and total number of studies screened can be found in Figure 1. The characteristics of the included studies are outlined in Table 1.

Figure 1. PRISMA flow diagram of included studies.
Table 1. Characteristics of Included Studies.
STARD 2015 Statement Adherence
Of the 126 included studies, overall adherence to STARD 2015 was 61% (mean 18.3/30 items, SD = 2.9), higher than the 55% reported in the baseline study by Hong et al (mean 16.6/30 items, SD = 2.2; P < .0001). Per-study STARD adherence ranged from 12.1 to 26.0 items. The itemized STARD adherence rates can be found in Table 2. The complete list of included studies with overall STARD adherence is provided in Appendix C.
Table 2. Reporting Frequency of STARD 2015 Items.
Note. "*" indicates a new STARD 2015 item. For all included studies, black-shaded items were infrequently reported (<33% of studies), grey-shaded items were moderately reported (33%-66% of studies), and un-shaded items were frequently reported (>66% of studies).
There were 26 frequently reported items or sub-items (reported by >66% of studies), 18 moderately reported (33%-66% of studies), and 10 infrequently reported (<33% of studies). In comparison, in the baseline study by Hong et al there were 25 frequently reported items or sub-items, 16 moderately reported, and 13 infrequently reported. A visual comparison of adherence rates between our study and Hong's study can be found in Figures 2 to 4.

Figure 2. Comparison of frequently reported items between Hong's study and our study (%). This bar chart shows the percentage of studies reporting specific STARD items that were frequently reported in our study. Percentages are shown on the Y-axis, with each item number listed on the X-axis.

Figure 3. Comparison of moderately reported items between Hong's study and our study (%). This bar chart shows the percentage of studies reporting specific STARD items that were moderately reported in our study. Percentages are shown on the Y-axis, with each item number listed on the X-axis.

Figure 4. Comparison of infrequently reported items between Hong's study and our study (%). This bar chart shows the percentage of studies reporting specific STARD items that were infrequently reported in our study. Percentages are shown on the Y-axis, with each item number listed on the X-axis.
Notable improvements in adherence were observed for item 19 "Flow of participants" (from 17% to 60%), item 10.2b "Level of training of readers" (from 60% to 80%), item 26.3 "Generalizability" (from 45% to 82%), and item 15 "Handling of indeterminate results" (from 19% to 47%). In contrast, a notable decline in adherence was observed for item 12.1.a "Definition of test positivity cut-offs or result categories of the index test" (from 86% to 63%).
Subgroup Analysis
A summary of the subgroup analyses can be found in Table 3. Studies published in higher impact factor journals (impact factor >3.8) reported more items than those in lower impact journals: 20.6 versus 18.4 (P < .0001, Appendix I). There was no difference in STARD reporting between retrospective and prospective study designs; however, studies that did not report their study design had significantly lower STARD adherence (P = .002, Appendix F). There was no association between journal STARD adoption status and STARD reporting adherence (P = .55, Appendix J). All other subgroup analyses (imaging modality, imaging subspecialty, journal, and country of publication) showed no significant differences in STARD reporting (Appendices D, E, G, and H).
Table 3. Summary of Subgroup Analyses Performed.
Note. Grey-shaded cells indicate a statistically significant difference (P < .05). The exact number of items reported for each subgroup analysis can be found in Appendices D to J.
Discussion
Recently published diagnostic accuracy studies showed a small improvement in STARD 2015 adherence compared with studies published in 2016 (from 55% to 61%). As in the baseline study, studies published in journals with a higher impact factor reported more STARD items than those in journals with a lower impact factor (20.6 vs 18.4), suggesting a possible relationship between journal resources and the ability to ensure complete reporting. The higher adherence observed in high-impact journals underscores the importance of journal-level policies and editorial practices in promoting the use of standardized reporting guidelines.

Nonetheless, contrary to the baseline results, we did not find a significant difference in reporting adherence when stratified by STARD adoption status. This finding is surprising and suggests that journals that identify as STARD adopters still do not strictly enforce adherence to the guideline, and that mentioning STARD in the instructions for authors alone is insufficient; it may help to incorporate active implementation during the editorial and peer review process. In our study, 68% of the articles were published in journals that were STARD adopters, whereas in the original study16 only 23% were. Furthermore, the original study16 found that studies published in journals considered STARD adopters had higher STARD adherence than articles published in non-adopters. This variability in journals' enforcement of STARD 2015 may be partially addressed by encouraging journals to adopt a defined, more stringent threshold for the minimum number of items that must be reported for a manuscript to be considered for publication; currently, authors may be inclined to view the checklist as a loose recommendation for how to improve the quality of their study.

Furthermore, an ongoing study by Speich et al is examining the impact of short versions of reporting guidelines for peer reviewers, as the complete checklist may be overwhelming for reviewers to apply to every study.25 Depending on its findings, it may be beneficial to create a shortened STARD item list as a supplement, comprising the most important and most infrequently reported STARD items; such a list could serve as a simplified rubric for evaluators to provide feedback to authors. Another study, by Blanco et al, found that involving a reporting guideline expert on the editorial board had a significant impact on the completeness of reporting.26 Such an intervention was performed at the British Medical Journal (BMJ), where a number of editorial board members, such as Dr. Duan and Dr. Dwan, have expertise in reporting guidelines.26 Although more resource-intensive, this may be a preferable approach to mentioning reporting guidelines in the "instructions for authors" section of journal websites alone.
Our study had some limitations. Many items in the STARD checklist are open to subjective interpretation. We tried to mitigate this issue by conducting every step in duplicate. In addition, we conducted pilot assessments to ensure reviewers adequately understood the STARD checklist items; after the pilot assessments, the reviewers met and discussed their conflicts to ensure a consistent method of evaluation. Our methods aimed to acquire a random sample of studies from a comprehensive list of 136 imaging journals. As a result of this sampling, some high impact factor journals, such as Radiology and the Canadian Association of Radiologists Journal, were not captured in this study, whereas others, such as European Radiology and Academic Radiology, could be considered over-represented. However, our subgroup analysis revealed that STARD adherence did not differ based on the journal of publication (P = .70, Poisson regression).
In terms of strengths, the present study included a variety of journals, modalities, subspecialties, and countries in the analysis to ensure a representative sample of the available literature. In addition, the baseline study methods were followed stringently to allow for an accurate evaluation of any changes in adherence.
The small change in the completeness of reporting of STARD items over 9 years indicates that current knowledge translation strategies are insufficient. A similar pattern is seen for other study types, such as randomized trials (CONSORT guideline) and systematic reviews (PRISMA guideline), for which evaluations have also shown improvement over time but persistently substandard reporting in a considerable number of studies and journals. New methods should be evaluated in experimental studies to determine how completeness of reporting might be increased, thereby improving the quality of published research in the domain of diagnostic accuracy studies. In addition, there is a need for more interventional studies of knowledge translation strategies.
Appendices
Appendix J. Subgroup Analysis of STARD Adoption by Journal.
| STARD adoption status | Number of studies (N = 126) | Mean STARD items reported (±SD) |
|---|---|---|
| Yes | 85 | 19.5 ± 2.97 |
| No | 41 | 19.8 ± 2.77 |
Note. Statistical analysis using Student's t-test (P = .55).
Data Sharing
Relevant data and code will be posted on our research group’s Open Science Framework Page.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Dr. McInnes is supported by the University of Ottawa Department of Radiology Research Stipend Program. The project is also supported by the University of Ottawa Medical Student Summer Research Program.
