Abstract
The purpose was to systematically review randomised controlled trials (RCTs) of interventions to change family physicians’ laboratory test ordering. We searched 15 electronic databases (no language/date limitations). We identified 29 RCTs (4,111 physicians, 175,563 patients). Six studies specifically focused on reducing unnecessary tests, 23 on increasing screening tests. Using Cochrane methodology, 48.5% of studies were at low risk of bias for randomisation, 7% for concealment of randomisation, 17% for blinding of participants/personnel, 21% for blinding of outcome assessors, 27.5% for attrition, and 93% for selective reporting. Only six studies were at low risk for both randomisation and attrition. Twelve studies performed a power computation, three an intention-to-treat analysis, and 13 statistically controlled for clustering. Unweighted averages were computed to compare intervention/control groups for tests assessed by >5 studies. Fourteen studies assessed lipids (average 10% more tests than control), 14 diabetes (average 8% more than control), 5 cervical smears, 2 INR, and one each thyroid, fecal occult blood, cotinine, throat swabs, testing after prescribing, and urine cultures. Six studies aimed to decrease test groups (average decrease 18%), and two to increase test groups. Intervention strategies: one study used education (no change); two feedback (one 5% increase, one 27% desired decrease); eight education + feedback (average increase in desired direction 4.9% greater than control); ten system change (average increase 14.9%); one system change + feedback (increases 5-44%); three education + system change (average increase 6%); three education + system change + feedback (average 7.7% increase); one delayed testing. We conclude that only six RCTs were at low risk of bias from both randomisation and attrition. Nevertheless, despite methodological shortcomings, studies that found large changes (e.g. >20%) probably obtained real change.
Introduction
There is concern in several countries about the increasing numbers of laboratory tests ordered by community family physicians and the wide variation in test ordering among them. The increase in testing can be illustrated for several countries. In 2003, the Australian government’s initiative to improve the quality of care of chronic illnesses by family physicians and general practitioners (GPs; defined as general primary care physicians without specialty training in family medicine) had a marked effect on specific areas of laboratory test ordering. Although the number of family physicians/GPs increased by 10.6% between 2003 and 2007/2008, clinical activity increased by 16.7% and test ordering increased even more. Between 2004 and 2008, 20 patient problems that accounted for <20% of all problems managed by family physicians/GPs were responsible for 73% of growth in pathology testing; preventive health interventions accounted for 32% of this pathology test growth, and management of 3 chronic diseases (diabetes, hypertension, and lipid disorders) accounted for a further 27%. 1,2
In the United Kingdom, the quality and outcomes framework offered financial rewards to GPs for more intensive monitoring of patients. Its introduction was associated with a 20% increase in laboratory tests from 2002 to 2005 and a 24.2% increase from 2005 to 2009, mainly due to testing more patients rather than ordering more tests per patient. The largest increases were in fecal occult blood (121%), C-reactive protein (86%), hematinics (75%), immunoglobulins (73.4%), and serum iron testing (72.2%). 3 A review of the United Kingdom National Health Service estimated that 25% of all pathology tests ordered were unnecessary. 4
In Calgary, Alberta, which has a large integrated laboratory system, the number of laboratory tests increased 6% to 8% annually between 2004 and 2014, whereas the annual population growth was 2.2%. 5 During 2005 to 2011, 125 million tests were processed, with a 24% increase/capita in chemistry tests, 10% increase/capita in microbiology, 7% increase/capita in anatomical pathology, and a 15% decrease/capita in cytopathology. 6 There is also a striking variability in test ordering by family physicians (Table 1). For example, Mindemark et al 7 found that test ordering by GPs across 8 counties in Sweden varied on average by a factor of 2.5, and for some tests by a factor of 8, and O’Kane et al 8 found that the number of electrolyte tests ordered across 58 practices in Northern Ireland varied between 158 and 1056/1000 patients.
Table 1. Examples of Variability in Testing Between Physicians and Between Jurisdictions.
Abbreviations: CRP, C-reactive protein; GP, general practitioner; HbA1C, glycated hemoglobin; PSA, prostate-specific antigen; T4/TSH, thyroxine/thyroid-stimulating hormone.
* The metric of comparison differed widely between studies and could not be brought to a common metric.
With such a rapid increase in laboratory testing volumes, identifying effective strategies to slow the rate of increase without affecting the quality of patient treatment is important to restrain health costs. Therefore, we wished to perform a systematic review of test ordering behavior by family physicians/GPs. We identified 3 systematic reviews: one of 70 randomized controlled trials (RCTs) of audit and feedback, which found 4 RCTs on test ordering behavior involving family physicians, 12,13 a systematic review of on-screen point-of-care computer reminders, which identified 3 studies of test ordering in primary care, 14 and a systematic review of laboratory test ordering with 109 RCTs and nonrandomized studies, which also identified only 4 RCTs of test ordering practices. 15 Thus, the purpose of this systematic review and meta-analysis is to identify all published RCTs that educated family physicians about test utilization and to assess whether the studies succeeded in their aims to (a) increase desired testing, (b) decrease undesired testing, and (c) decrease variability among physicians.
Methods
Search Strategy
We searched the following databases using predetermined search strategies discussed between the librarian and the principal and coinvestigators (Figure 1): MEDLINE (1946-February 2015), EMBASE (1980-February 2015), EBM Reviews (1980-February 2015; Cochrane Database of Systematic Reviews, ACP Journal Club, Database of Abstracts of Reviews of Effects, Cochrane Central Register of Controlled Trials, Cochrane Methodology Register, Health Technology Assessment, NHS Economic Evaluation Database), PubMed (1966-February 2015), PubMed Central (1900-February 2015), Scopus (1960-February 2015), Web of Science (1900-February 2015), and CINAHL (1982-February 2015). No limits on publication date were applied; the search included studies in all languages and from all countries. All included studies were entered in the PubMed Single Citation Matcher on October 1, 2015, and all references to these studies were followed up to identify any additional relevant studies.

Figure 1. Literature search strategy.
Searching Other Resources
Reference lists of the included studies were searched to identify additional potentially relevant studies. Studies in systematic reviews of health maintenance and screening interventions; physician education, on-screen, telephone, and paper reminders; audit and feedback; computerized clinical decision support systems; and pathology test use were searched for relevant RCTs. We identified 23 reviews of related areas and searched their reference lists. Experts in the field (ie, laboratory directors and managers) were consulted to identify additional unpublished studies or studies in press.
Inclusion Criteria
We included all RCTs with an intervention to change family physicians’ test ordering behavior.
Exclusion Criteria
We excluded studies that appeared to meet the inclusion criteria on review of the abstract but that, on reading the full text, were not RCTs or did not report the outcomes of family physicians separately from those of other physicians. We wished to identify a “pure intervention cohort” of family physicians so that later systematic reviewers could compare outcomes for other professional groups such as diabetologists or nurse practitioners.
Study Assessment and Data Entry
All titles and abstracts were independently assessed by 2 authors for inclusion, and data were independently entered.
Classification of Interventions
Kobewka et al 15 in 2014 performed a systematic review of the effect of education, audit, and feedback on physicians’ laboratory test ordering but only identified 4 RCTs about family physicians’ test ordering, and nearly all of the RCTs they found were of hospital-based test ordering. To enable comparison to the study by Kobewka et al, 15 we adopted their classification of interventions: educational (teaching appropriate test ordering guidelines), audit and feedback (physicians were presented with their test utilization results compared to a previous period or to peers), system-based interventions (order form modifications, computer clinical decision support systems), and incentives.
Data Extraction and Risk of Bias Assessment
Data were independently extracted by 2 reviewers, and discrepancies were resolved by discussion or referral to a third reviewer. Risk of bias was assessed using the methodology of the Cochrane Handbook. 16,17
Data Analysis
Because there was marked heterogeneity in populations, practice settings, comparators, numbers and types of tests assessed, and outcome measures, a meta-analysis was performed only within groups of similar tests (eg, cholesterol). Studies reported either percentage change or total change in test numbers or both. We modified the approach of Kobewka et al 15 and, as a simple meta-analysis appropriate to the data, computed (tests ordered at follow-up) minus (tests ordered at baseline) for each group and then subtracted the change in the comparator group from the change in the intervention group.
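The computation described above can be sketched as follows. This is a minimal illustration, not the review's actual analysis code: the function name `net_change` and the example figures are hypothetical, and the inputs may be test counts or percentages as long as all four values use the same unit.

```python
def net_change(intervention_baseline, intervention_followup,
               control_baseline, control_followup):
    """Return the change in the intervention group minus the change
    in the comparator group (follow-up minus baseline in each group)."""
    intervention_change = intervention_followup - intervention_baseline
    control_change = control_followup - control_baseline
    return intervention_change - control_change


# Hypothetical figures (not taken from any included study):
# percentage of eligible patients tested at baseline and at follow-up.
example = net_change(
    intervention_baseline=40.0, intervention_followup=55.0,
    control_baseline=42.0, control_followup=47.0,
)
print(example)  # 10.0: the intervention group gained 15 points, the control group 5
```

A positive value means testing rose more (or fell less) in the intervention group than in the comparator group; whether that is the desired direction depends on the aim of the study.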
Results
Search
After exclusion of duplicates, the searches identified 9282 titles and abstracts, of which 238 were read in full text and 29 RCTs were included in this review (Figure 2).

Figure 2. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow sheet of assessment of studies. Interventions to educate family physicians to change test ordering: systematic review.
Description of Studies
The intervention in 10 studies was to reduce unnecessary testing 18–30,32,33 and in the other 19 studies was to increase the numbers of tests to improve screening. There were 7 studies from each of the United States and the Netherlands, 5 from the United Kingdom, 3 from Canada, 2 each from Australia, Norway, and Belgium, and 1 from New Zealand. The studies that reported data included collectively 4111 physicians and 175 563 patients (2 studies did not report the number of physicians, 34,35 and 8 studies did not report the number of patients, 20,22,23,28–30,32–35 and those numbers were also not available in related papers). There were 20 studies 36–55 in which outcomes for family physicians were not separable from those of other professional groups, and these were excluded from this review.
Risk of Bias
The Cochrane Collaboration’s risk of bias tool is widely regarded as the most authoritative method of analyzing risk of bias in RCTs. The Cochrane Handbook 17 asks authors of systematic reviews to independently search the text of each RCT and copy verbatim how the author describes the methods used in order to provide a transparent and reproducible method of recording risk of bias. Assessment of the risks of bias in an RCT is the key information in deciding whether the results of the trial are trustworthy and can be acted upon. The Handbook 17 assesses the risk of bias as low, unclear, or high for 6 key aspects of study execution (randomization, concealment of randomization from the researchers, blinding of participants and personnel, blinding of outcome assessors, attrition, and selective reporting of results). If authors provide no information for a risk of bias category that could place their study at either low or high risk of bias, the risk for that item is assessed as “unclear.” 17 For example, for randomization, the unclear designation most frequently occurs when the authors only say that physicians or patients were “randomly assigned” without stating a strong randomization method as defined by the Handbook. 17 The unclear category thus includes studies in which the authors did not perform the maneuver to reduce the risk of bias, did not report it, or both.
Results of the Risk of Bias Assessments
In Table 2, we present an overview of the risks of bias for the 6 items of research design, a sensitivity analysis identifying 6 RCTs at lowest risk of bias, and whether studies performed a power computation, performed an intention-to-treat analysis, and corrected for clustering in cluster randomized trials (C-RCTs; see also Figure 3).
Overview of the risk of bias assessments: Only 48.5% of studies were at low risk of bias from randomization (they used a strong method of randomization such as by computer), 7% from concealment of allocation from the researchers, 17% from blinding of participants and personnel, 21% from blinding of outcome assessors, and 27.5% from attrition, whereas 93% were at low risk of bias from selective reporting (only 7% selectively reported results).
Sensitivity analysis identifying 6 RCTs at lowest risk of bias: The key aspects of study design and execution are a strong method of randomization and minimal attrition. We identified 6 studies that met both criteria and in whose results we can have confidence: Baker et al 26 (no change); Buntinx et al 56,57 (no change possible as 99% of Pap smears were satisfactory); Holbrook et al 58 (18% improvement); Kenealy et al 59 (8.2%-16.3% change); McClellan et al 60 (0.1%-3.8% change); and van Wyk et al 68 (1.4 fewer tests/form). The lack of clarity about whether a strong method of randomization was used, the lack of clarity about attrition, and the amount of attrition in the other 23 RCTs are major weaknesses of this entire research enterprise. No study performed a differential attrition analysis (showing that those dropping out of the intervention and control groups were similar and thus unlikely to affect the results).
Identification of studies that performed a power computation, intention-to-treat analysis, and corrected for clustering in C-RCTs: Only 12 studies 21,24,25,28–30,32,56–60,62–67 made a power computation for needed sample size, 3 studies 27,58,59 made an intention-to-treat analysis, and only 13 studies used statistical techniques such as generalized estimating equations or multilevel analysis to estimate the effects of clustering on outcomes 21,24,25,28–30,33,58,60,62,64–67,69–73 (Table 2). The failure to correct for clustering in the other studies means that their conclusions need to be treated with considerable caution.
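To illustrate why uncorrected clustering calls for caution, the sketch below uses the standard design-effect approximation, 1 + (m − 1) × ICC, where m is the mean cluster size and ICC is the intracluster correlation coefficient. This is not the method used by the included trials, which relied on generalized estimating equations or multilevel models; it is a simpler approximation chosen for illustration, and the cluster size, ICC, and patient count are hypothetical.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Approximate variance inflation caused by cluster randomization."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n_patients: int, mean_cluster_size: float,
                          icc: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    return n_patients / design_effect(mean_cluster_size, icc)


# Hypothetical trial: 1000 patients, about 20 patients per physician,
# and an assumed intracluster correlation of 0.05.
deff = design_effect(mean_cluster_size=20, icc=0.05)
n_eff = effective_sample_size(n_patients=1000, mean_cluster_size=20, icc=0.05)
print(f"design effect = {deff:.2f}, effective sample size = {n_eff:.0f}")
```

Under these assumed values the trial carries roughly half the information its raw patient count suggests, so an analysis that ignores clustering would report confidence intervals that are too narrow.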
Table 2. Results of the Risk of Bias Assessments of 29 Included RCTs.
Abbreviations: C-RCT, cluster randomized trial; RCT, randomized trial. Key data bolded.

Figure 3. Risk of bias graph for 29 included studies.
Analysis of the Results
We analyzed studies according to 2 criteria of interest: (1) by the tests for which the researchers wished to optimize test ordering (Figure 4, Table 3) and (2) by the 4 intervention strategies used (audit and feedback, system change [computerized reminders, computerized decision support systems, other reminders to physicians or patients], and practice system changes; Figure 5, Table 4).

Figure 4. Unweighted average desired changes in behavior for various tests and groups of tests. For interventions designed to increase test orders, an increase is considered a desired change. For interventions designed to decrease test orders, a decrease is considered a desired change. See Table 3 for an explanation of individual studies and associated statistical significance of individual studies.
Table 3. Interventions to Change Family Physicians’ Test Ordering for Specific Tests.
Abbreviations: ACE, angiotensin-converting enzyme; ALT, alanine transaminase; ARB, angiotensin receptor blocker; AST, aspartate transaminase; CBC, complete blood count; CEA, carcinoembryonic antigen; CI, confidence interval; COPD, chronic obstructive pulmonary disease; ESR, erythrocyte sedimentation rate; FSH, follicle-stimulating hormone; GP, general practitioner; HbA1C, glycated hemoglobin; HDL, high-density lipoprotein; INR, international normalized ratio; LDL, low-density lipoprotein; NS, not significant; RR, relative risk; TG, triglyceride; TSH, thyroid-stimulating hormone; UTI, urinary tract infection.

Figure 5. Unweighted averages of desired changes using different intervention strategies. For interventions designed to increase test orders, an increase is considered a desired change. For interventions designed to decrease test orders, a decrease is considered a desired change. See Table 4 for an explanation of individual studies and associated statistical significance of individual studies.
Table 4. Effects on Test Ordering by Types of Intervention.
Abbreviations: ACE, angiotensin-converting enzyme; ARB, angiotensin receptor blocker; CBC, complete blood count; CEA, carcinoembryonic antigen; CI, confidence interval; FSH, follicle-stimulating hormone; GP, general practitioner; HbA1C, glycated hemoglobin; INR, international normalized ratio; LDL, low-density lipoprotein; NS, not significant; RR, relative risk; TSH, thyroid-stimulating hormone; UTI, urinary tract infection.
Results analyzed according to tests of interest: We present a graphic overview (Figure 4), followed by further details about the studies in Table 3 (studies are listed beginning with the most frequent tests and then for each test in increasing order of magnitude of the intervention effect). Studies with interventions to increase testing for a single condition included 14 for lipids, 14 for diabetes, 5 for cervical smears, 2 for international normalized ratio (INR), and 1 each for thyroid tests, fecal occult blood tests, serum cotinine (to detect smoking), throat swabs, testing after prescribing medications, and urine cultures. Six studies used interventions to decrease groups of tests and 2 to increase groups of tests (numbers of interventions add up to more than the total of 29 studies as some studies attempted to change more than 1 test). Unweighted averages for intervention effects are provided only for tests with >5 studies.
Lipids: Fourteen studies to increase lipid testing: (1) 5 resulted in slightly more testing in the control group, (2) 2 showed no difference between the intervention and control group, and (3) the others ranged from 5% to 44% more testing in the intervention group. Overall, the intervention group averaged 10.2% more tests ordered than the control group.
Diabetes tests: Fourteen studies to increase testing: (1) 2 resulted in slightly more testing in the control group and (2) the others ranged from 2% to 41% more testing in the intervention group. Overall, the intervention group averaged 8% more tests ordered than the control group.
Six studies to reduce use of groups of tests: (1) 1 found no decrease in the intervention group and (2) the others ranged from reductions of 5% to 17% of tests. In the unique study of patients with fatigue by Koch et al, 18,19 which compared immediate to delayed testing, family physicians permitted to test immediately ordered tests on 146 (92.4%) of 158 patients and those asked to delay a month ordered tests immediately on only 27 (19.5%) of 138 patients, a 72.9 percentage-point reduction in immediate testing. The entire set of tests established diagnoses in only 11 patients, and few patients in the delay group reconsulted the GP within 4 weeks. An expanded fatigue-specific set of 13 tests resulted in more false positives than a limited set of 4 tests. Overall, the average was 18% fewer tests in the intervention compared to the control group.
Results analyzed according to the type of intervention: The data are presented in a graphic overview (Figure 5), with more detail about the studies in Table 4 (studies are listed by the type of intervention and then for each intervention in increasing order of magnitude of the intervention effect). Unweighted averages for intervention effects are provided only for groups with >5 studies.
Education: 1 RCT: There was a small increase (1.4%) in the control group. 70
Feedback: 3 RCTs: O’Connor et al 71 found mostly small increases in testing in the control group, Kiefe et al 72 found a 5% increase in testing in the intervention group, and Winkens et al 20 found a net desired 27% decrease in the number of Pap tests for the intervention group compared to the control group.
Education and Feedback: 7 RCTs: (1) 2 found no changes: Baker et al 26 found no changes for any test (lipid, thyroid, and urine tests), and Buntinx et al 56,57 found <1% of Paps were judged unsatisfactory so there was no room for improvement; (2) 3 studies found changes of approximately 5% or less: Lafata et al 76 found no increase in follow-up testing after prescribing digoxin, a 3.3% increase in testing after prescribing angiotensin-converting enzyme/angiotensin receptor blockers, and a 4.9% increase after prescribing a diuretic, and Borgiel et al 63 found that the intervention arm that received continuing medical education and visits from a mentor over 3 years increased the number of Pap smears by 5.3% and decreased cholesterol tests by 1% compared to the less intensive physician assessment report intervention arm; (3) 3 studies found changes of approximately 8% or more: Bunting and Van Walraven 27 in a unique study of 200 family physicians who ordered the most tests in a region found that the intervention produced change in the desired direction with the intervention group ordering 7.9% fewer tests/visit than the control group. Verstappen et al 28–30 found a desired 12% reduction in testing in a physician group asked to solve problems involving 15 laboratory tests and a 5% reduction in a group with problems involving 10 laboratory tests (cf also Verstappen). 31 Thomas et al 32 found a desired 13% reduction of tests in the enhanced feedback group for 9 tests the laboratory regarded as unnecessary and 11% in the group that received brief educational reminders. Overall, for 11 outcomes, the average increase in test ordering in the intervention group compared to the control group was 4.9% (converting the desired reductions for Bunting and Van Walraven, Verstappen et al, and Thomas et al to positive change).
System change: 10 RCTs: System change usually consisted of computer-assisted decision-making. (1) Three found minimal changes. 60,61,62,69 (2) Two studies found change >8%. Frame et al 74 found the intervention group ordered 15% more fecal occult blood tests, 9% more Pap smears, and 8% more cholesterol tests. van Wijk et al 22,23 found that physicians who used a computer system with guidelines ordered a desired 14% fewer INR tests than those who used a computer system without the guidelines. (3) Two studies found change >15%: Kenealy et al 59 found 16.3% more eligible patients were screened for diabetes with a computer reminder, 8.4% with a patient reminder, and 8.2% with combined reminders compared to usual care. Holbrook et al 58 found that the intervention group increased testing for low-density lipoprotein by 18%, glycated hemoglobin (HbA1C) by 20%, and albuminuria by 28% more than the control group. (4) Three studies found changes 26% to 44%: Sequist et al 73 found a 41% increase in annual cholesterol testing for diabetics but no increase in HbA1C and lipid testing for those with coronary artery disease. van Wyk et al 68 found that 39.5% more patients were screened for dyslipidemia with a computer alert, and 9.5% more with an on-demand computer-assisted decision support system the physician had to decide to use, compared to the control group (although screening increased 25.5% in the control group). Smith et al 35 found that for a group of follow-up tests requested to be obtained within 25 days of an intervention, 26.1% more were obtained using an electronic medical record, 43.9% more with automated voice messages to patients, and 59.6% more with a phone call from pharmacy compared to usual care. The unweighted average increase in testing for 26 outcomes in the intervention group compared to the control group was 14.9% (converting the desired reduction for van Wijk to positive change).
System + feedback: 1 RCT: Moher et al 75 found cholesterol screening increased 25% with audit, 35% with a facilitator identifying and recalling patients to clinic to see their GP, and 44% with recall to their nurse. Tobacco screening increased by 5%, 21%, and 24%, respectively (an average over 6 outcomes of 26%).
Education + system change: 3 RCTs: Hobbs et al 34 found no changes in lipids, Bindels et al 33 found a 17% desired decrease in 30 tests, and Hetlevik et al 65–67 found a 3.4% increase in HbA1C and a 15.4% increase in cholesterol tests compared to the control group. The average change for 7 outcomes was 6%.
Education + system change + feedback: 3 RCTs: Flottorp et al 24,25 found a 0.4% decrease in throat swabs in the intervention group and 5.1% fewer urine tests in the intervention group compared to the control groups. Bonevski et al 64 found a 12% increase in cholesterol testing in the intervention group compared to the control group. Claes et al 21 found a 14% improvement in the percentage of time INR results were within 0.5 of the target range in the education group, 11% in the feedback group, 8% in the group that used the INR in-office test, and 8% in the group that used computer-assisted decision-making, compared to the control group. All were significantly (P < .0001) better than control, but there were no significant differences between the 4 physician intervention groups. For 7 outcomes, the average improvement in testing was 7.7%.
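The unweighted averaging used in this section, including the conversion of desired reductions to positive change, can be sketched as follows. This is an illustrative reconstruction under stated assumptions rather than the authors' code: the function name and the (change, goal) data layout are hypothetical, and each change is taken to be the signed intervention-minus-control difference in percent. The six values are the Moher et al outcomes quoted in the text.

```python
def unweighted_mean_desired_change(outcomes):
    """Average change in the desired direction, giving each outcome equal weight.

    outcomes: list of (change_percent, goal) pairs, where change_percent is the
    signed intervention-minus-control difference and goal is "increase" or
    "decrease". A reduction that was the aim of the study counts as positive.
    """
    signed = []
    for change_percent, goal in outcomes:
        if goal not in ("increase", "decrease"):
            raise ValueError(f"unknown goal: {goal!r}")
        signed.append(change_percent if goal == "increase" else -change_percent)
    return sum(signed) / len(signed)


# Moher et al outcomes as reported in the text: cholesterol screening
# (25%, 35%, 44%) and tobacco screening (5%, 21%, 24%).
moher = [(25, "increase"), (35, "increase"), (44, "increase"),
         (5, "increase"), (21, "increase"), (24, "increase")]
print(round(unweighted_mean_desired_change(moher)))  # 26

# A desired reduction, such as 14% fewer tests, is entered as -14 with the
# goal "decrease" and contributes +14 to the average.
print(unweighted_mean_desired_change([(-14, "decrease")]))  # 14.0
```

Because every outcome carries equal weight regardless of study size or precision, this average is descriptive only and is not a substitute for a weighted meta-analytic estimate.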
Discussion and Conclusions
In this review of RCTs to change family physicians’ laboratory test ordering, we found that although some studies achieved no change, the interventions generally produced changes in the desired direction, and some of the changes were very large (20%-40%).
How many studies are at low risk of bias, so that we can place confidence in them? The key aspects of study design and execution are a strong method of randomization and minimal attrition. We identified only 6 such studies in which we can have confidence in their results: Baker et al (no change), 26 Buntinx et al (no change possible as 99% of Pap smears were satisfactory), 56,57 Holbrook et al (18% improvement), 58 Kenealy et al (8.2%-16.3% change), 59 McClellan et al (0.1% and 3.8% change), 60 and van Wyk et al (1.4 fewer tests/form, P = .003). 68 However, some studies without a strong method of randomization and with attrition achieved high change rates (eg, above 20%), and although we should note their methodological problems, the studies probably achieved worthwhile change.
How many studies focused specifically on increasing or decreasing testing rates? Only 6 studies were specifically designed to increase or decrease laboratory testing: Claes et al, 21 van Wijk et al (to reduce INR testing), 22,23 Bunting and Van Walraven (to decrease testing by the 193 physicians who ordered the most laboratory tests during 1 year), 27 Verstappen et al 28,29,30 and Bindels et al (to improve test ordering strategies), 33 and Koch et al and van Bokhoven 2009 (to reduce testing for vague complaints by delaying testing for 1 month). 18,19 These are the studies likely to be of most interest to laboratory directors.
Which tests were investigated? In the remaining studies, investigators were strongly focused on improving screening and monitoring chronic disease (14 RCTs testing lipids and 14 testing diabetes), with the next largest number of 6 RCTs aiming to reduce groups of heterogeneous tests and 4 to improve cervical smear testing. Surprisingly, there was only 1 RCT for each of these areas of frequent testing: thyroid, throat swabs, urine, and fecal occult blood (Table 3). Within each of the groups with enough studies to draw conclusions, the range of improvements in testing was very wide.
Which interventions were tested? The most frequently tested intervention was system change (10 RCTs, average change 14.9%) and then education + feedback (7 RCTs, average change 4.9%). There were much smaller numbers testing other interventions, with 3 each on feedback, education + system change, and education + system change + feedback and 1 each on education and delayed testing, with the numbers in these latter groups too small to draw conclusions, so we do not know if these latter 4 combinations of interventions are effective in increasing testing.
Do we know why the interventions worked or not? Only 3 studies followed up with the physician and staff participants to assess how the RCTs had functioned and detected the sources of problems. Flottorp et al 24,25 conducted telephone interviews with 112 (93%) of the 120 practices and discussed reasons for variation between practices. They identified 3 problems: all relevant staff (such as practice assistants) participated in only 67% of the practices for the intervention (however, 89% of all GPs participated); 10% of practices spent no time discussing the guidelines and 52% spent <1 hour; only 38% had started a change process (but most said they needed more time) and 39% said they did not need to change their practice; and 13% had serious internal communication problems. The researchers themselves reported that it was difficult to run the project in 25% of the practices, 20% of the practices reported serious problems with the software installation, and 11% with the use of the software. Decision support software was available in only 2418 (48%) of 5031 sore throat and 703 (28%) of 2522 urinary tract infection consultations. Hobbs et al 34 encountered many problems with the then available software. The computer program was not loadable onto a central file server in any of the practices so there was only 1 workstation per practice and physicians who wanted to participate had to go to that workstation and enter demographic and clinical data already in their practice computers. The 386 computers were very slow. Three practices were unable to record any data, and the data from another were lost in the post. The software was unable to import and export data successfully from and to the practice medical systems. Buntinx et al 56,57 asked family physicians if the feedback they received about their test ordering was meaningful and desirable. Those who received either a mailed comment or specific advice about their technique rated both types of feedback as 96% meaningful and desirable, whereas monthly overview reports on their tests or comparison to peers were rated lower at 74% to 78% meaningful and desirable.
Do we know why there is marked variability in test ordering between family physicians? A review identified 104 articles about factors that affect physicians’ test ordering and found that test ordering was correlated with physician age, gender, specialization, geographic location, practice setting, belief systems, experience, knowledge, fear of malpractice litigation, physician regret about missed diagnoses, financial incentives, awareness of costs, and provision of written feedback. 77 A review of 38 studies of factors that may influence test ordering in patients with undiagnosed complaints in primary or secondary care identified 5 key factors: diagnostic, therapeutic and prognostic, patient-related, doctor-related, and policy- and organization-related factors. 78 None of the studies assessed in this current review explored why there is variability among physicians or intervened to specifically correct it (other than providing interventions to improve test ordering for all physicians). Smellie et al concluded that “The large differences observed in general practice pathology requesting probably result mostly from individual variation in clinical practice.” 10(p312) Variability between family physicians remains a key large unresolved problem. The 29 studies in this review provided no insight into how to diminish variability between physicians.
Did studies build on previous research? Science usually progresses by improving the work of others and testing the next steps. No study explicitly built upon and improved the studies of others or recorded that they had interviewed the research team and health staff and patients who had participated in previous projects to find out the obstacles encountered and how to improve outcomes. There has been much discussion about why some research projects in primary care falter, and it has been concluded that they falter if the physicians and staff are not interested, are too busy with patient care, already have a quality improvement project, or think that a readymade research project is being imposed on them and there are no benefits for them. An alternative approach to improve participation and decrease attrition is to discover the key problems that family physicians in the practices are interested in and motivated to research and build the change projects from the ground up with their continuing involvement and advice rather than imposing a completed research design. 79 The skill is then to execute the project to the highest standards of research with attention to a strong method of randomization, minimizing attrition, and being present to motivate and solve problems as they arise.
Future Research
The interventions used in these studies are appropriate and practical, but the execution of the research projects, data analysis, and presentation of results require major improvement. Skilled trial coordinators and statisticians need to be involved in future trials from their inception. The apparently most effective interventions to increase rational testing need replicating and improving. Researchers need to engage the medical staff involved in planning the studies so that the studies are of direct interest to them in their practices. Careful attention to adherence to the protocol and manual, minimization of attrition, and ongoing engagement with participants during trials to detect obstacles to participation are essential.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
