
Abstract

Background and objectives

To carry out a cross-sectional survey of laboratory research papers in the medical literature, published in 2012 or later and retrievable through common search engines (PubMed, Google Scholar), assessing the quality of statistical reporting in method comparison studies that used Bland–Altman (B-A) analysis.

Methods

Fifty clinical studies were identified that had undertaken method comparison of laboratory analytes using B-A. The reporting of B-A was evaluated using a predesigned checklist with the following six items: (1) correct representation of the x-axis on the B-A plot, (2) representation and correct definition of limits of agreement (LOA), (3) reporting of the confidence interval (CI) of LOA, (4) comparison of LOA with a priori defined clinical criteria, (5) evaluation of the pattern of the relationship between difference (y-axis) and average (x-axis) and (6) measures of repeatability.

Results and interpretation

The x-axis and LOA were presented correctly in 94% of studies, LOA were compared with a priori clinical criteria in 74%, CIs of LOA were reported in 6%, the pattern of the relationship was evaluated in 28% and repeatability was assessed in 38%.

Conclusions

There is incomplete reporting of B-A in published clinical studies. Despite its simplicity, B-A appears not to be completely understood by researchers, reviewers and editors of journals. There appear to be differences in the reporting of B-A between laboratory medicine journals and other clinical journals. More uniform reporting of B-A analysis would enhance the generalizability of results.

Keywords

Bland–Altman, B-A method, method comparison, laboratory research, agreement analysis

Introduction

Comparison of different measurement techniques for the same analyte or quantitative clinical variable is commonly undertaken in laboratory and clinical medicine. Frequently, this arises from the need to compare a new measurement method with an older and more established method.

Pearson’s correlation coefficient is often used as a surrogate measure of agreement; however, it estimates the strength of a linear relationship rather than agreement. Two methods can show excellent correlation despite significant systematic bias. Correlation also fails to describe the type of association in the data: scatter plots from two different method comparison studies may look quite different yet yield the same correlation coefficient. Moreover, the correlation coefficient is sensitive to the range of values and cannot distinguish between systematic and random differences in two measurements.¹ An average difference alone is not sufficient either: if one method measures consistently lower than the other for half of the subjects but higher for the rest, the average discrepancy may be zero despite large discrepancies for individual subjects.¹ Assessment of agreement is therefore essential in the comparison of two methods.

Altman and Bland² in 1983 provided a novel statistical approach (Bland–Altman [B-A]) for quantitative method comparison of continuous variables. The authors stressed the need to quantify bias and to assess agreement in two respects, i.e. how well the methods agree on average and how well the measurements agree for individual readings. Agreement is not something that is merely present or absent, but something that must be quantified.³ They proposed a simple and visually intuitive plot for method comparison.³ The method has been used extensively to evaluate the agreement of laboratory analytes and physiological variables.

Despite its simplicity and frequent use in clinical laboratory research, B-A is often not properly interpreted or reported in the clinical literature. Previous surveys^4–6 of clinical studies using B-A methodology observed that reporting was often not fully in accordance with the techniques suggested by Bland and Altman.⁷ We sought to ascertain whether reporting of B-A had improved since then. The aim of our study was to undertake a cross-sectional survey of the medical literature on the reporting of results of B-A analysis and to compare it with the reporting format suggested by Bland and Altman.⁷

Material and methods

We conducted a Boolean literature search for studies on the agreement of laboratory analytes in the medical search engines PubMed and Google Scholar. The search was restricted to articles published in 2012 or later and used the search terms: ‘Bland–Altman’ AND ‘Laboratory research’ OR ‘Method comparison’. For the PubMed search, additional filters were applied for medical field, human subjects and English language, to exclude non-human studies and studies in other languages. Only method comparison studies of laboratory analytes were included; review articles were excluded.

B-A analysis

In B-A analysis, a scatter plot is constructed in which the difference between paired measurements is plotted on the y-axis and the average of the two methods’ measurements on the x-axis.⁷ The mean difference between the values obtained with the two methods is called the bias and is represented by a central horizontal line on the plot. The standard deviation (SD) of the differences between paired measurements is then used to draw horizontal lines above and below the central line at mean bias ±1.96 SD; these are the upper and lower 95% limits of agreement (LOA). The plot allows the researcher to assess visually the bias, the scatter of the data and any relationship between the magnitude of the differences and the size of the measurement. A heteroscedastic distribution (i.e. the magnitude of differences increases in proportion to the size of the measurement) is often observed. In that case, logarithmic or percentage transformation of the data may be required to construct a log difference or relative (percentage) difference plot.⁷
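The construction described above translates directly into code. The sketch below is a minimal, dependency-free Python version; the function name `bland_altman` and the sign convention (difference = B − A) are our own choices for illustration, and the paired values are hypothetical. Plotting is left out: the returned averages and differences are the x and y coordinates of the B-A plot.

```python
import statistics

def bland_altman(method_a, method_b):
    """Per-pair averages and differences, bias and 95% limits of agreement (LOA).

    Differences are taken as B - A (an assumed convention; state it when reporting).
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    averages = [(a + b) / 2 for a, b in zip(method_a, method_b)]  # x-axis
    diffs = [b - a for a, b in zip(method_a, method_b)]           # y-axis
    bias = statistics.mean(diffs)                                 # central line
    sd = statistics.stdev(diffs)                                  # sample SD (n - 1)
    return {
        "averages": averages,
        "diffs": diffs,
        "bias": bias,
        "sd": sd,
        "lower_loa": bias - 1.96 * sd,
        "upper_loa": bias + 1.96 * sd,
    }

# Hypothetical paired results from four samples.
result = bland_altman([10.0, 20.0, 30.0, 40.0], [11.0, 19.0, 32.0, 41.0])
print(f"bias = {result['bias']:.2f}, "
      f"LOA = {result['lower_loa']:.2f} to {result['upper_loa']:.2f}")
```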

In contrast to statistical hypothesis testing, B-A analysis is an estimation approach: it estimates the bias and the LOA.⁷ Like all estimates, these are subject to sampling error, so their precision should be quantified by calculating confidence intervals (CIs) for both the bias and the LOA. Conclusions on the agreement and interchangeability of two methods are then based on the width of the LOA compared with a priori defined clinical criteria. This approach also requires data collected in replicate. Replicates are defined as two or more measurements on the same individual by the same method, taken under identical conditions. Assessing repeatability allows the agreement between the two methods to be compared with the agreement of each method with itself.
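The precision estimates mentioned above can be sketched as follows. The SE of the bias is SD/√n, and the SE of each LOA is approximately 1.71 SD/√n, the approximation given by Bland and Altman.⁷ To stay within the Python standard library, this sketch takes the t quantile as an argument (`t_crit`), which is an assumption of the example; with SciPy installed it could be obtained as `scipy.stats.t.ppf(0.975, n - 1)`. The function name and the differences are hypothetical.

```python
import math
import statistics

def ba_confidence_intervals(diffs, t_crit):
    """Bias and 95% LOA, each with an approximate 95% confidence interval.

    diffs  -- paired differences (method B - method A)
    t_crit -- 97.5th percentile of Student's t with len(diffs) - 1 df
    """
    n = len(diffs)
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    se_bias = sd / math.sqrt(n)
    se_loa = 1.71 * sd / math.sqrt(n)  # 1.71 ~= sqrt(1 + 1.96**2 / 2)
    lower_loa = bias - 1.96 * sd
    upper_loa = bias + 1.96 * sd

    def ci(estimate, se):
        return (estimate - t_crit * se, estimate + t_crit * se)

    return {
        "bias": (bias, ci(bias, se_bias)),
        "lower_loa": (lower_loa, ci(lower_loa, se_loa)),
        "upper_loa": (upper_loa, ci(upper_loa, se_loa)),
    }

# Hypothetical differences from 10 paired samples; t(0.975, 9 df) is about 2.262.
diffs = [1.0, -1.0, 2.0, 1.0, 0.0, 1.5, -0.5, 0.5, 1.0, 0.5]
for name, (estimate, (low, high)) in ba_confidence_intervals(diffs, 2.262).items():
    print(f"{name}: {estimate:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Because the SE of a limit is about 1.71 times the SE of the bias, the CI around each LOA is correspondingly wider than the CI around the bias.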

A predesigned checklist was used for detailed evaluation of the publications identified in the literature search. The checklist included details of authors, analyte studied, sample size calculation and the following six items: (1) correct representation of the x-axis on the B-A plot; (2) representation and correct definition of LOA; (3) reporting of CI of LOA; (4) comparison of LOA with a priori defined clinical criteria; (5) evaluation of the pattern of relationship between difference (y-axis) and average (x-axis) and (6) measures of repeatability. For studies in which the relationship pattern had been evaluated (item 5) and the data showed heteroscedastic scatter, we recorded whether an appropriate data transformation (e.g. logarithmic or percentage scale) had been undertaken. Each item (1–6) on the checklist was rated as ‘Yes’ or ‘No’.

Each study was evaluated by two authors (VC, SKK) with previous experience of publishing laboratory research using B-A analysis,^8,9 and a consensus answer was recorded for each item.

Results

We identified 50 clinical studies (Supplementary Table) that fulfilled the prespecified selection criteria. Of these studies, 32% were published in journals of laboratory medicine and the remainder in clinical medicine journals. Reporting of B-A was generally incomplete, and 6% of studies did not present the x-axis and LOA correctly (Table 1). CIs of LOA were reported in 6% of studies and the pattern of the relationship was evaluated in 28%. Of the 14 studies in which the pattern was evaluated, authors reported heteroscedasticity in six. Compared with previous surveys of B-A reporting published in 2000,⁵ 2002⁴ and 2006,⁶ there appears to have been some improvement, particularly in the representation and correct definition of LOA and in the comparison of LOA with predefined clinical criteria. Journals of laboratory medicine reported repeatability (item 6) more frequently than other clinical journals (62.5% versus 23.5%, respectively; P = 0.011) but reported comparison with a priori clinical criteria (item 4) less frequently (50% versus 80.5%; P = 0.014). For items 1, 2, 3 and 5, there was no significant difference between journals of laboratory medicine and other clinical journals.

Table 1.

Quality of reporting of Bland–Altman analysis and comparison with previous studies.

	Authors	Current study	Mantha et al.⁵ (2000)	Dewitte et al.⁴ (2002)	Berthelsen and Nilsson⁶ (2006)
	Total number of studies included	50	42	96	50
1.	Correct representation of x-axis on B-A plot	47 (94%)	36 (94.7%)	85 (86%)	NP*
2.	Representation and correct definition of LOA	47 (94%)	NP*	67 (68%)	NP*
3.	Reporting of CIs of LOA	3 (6%)	1 (2.4%)	NP*	7 (14%)
4.	Comparison of LOA with a priori defined clinical criteria	37 (74%)	3 (7.1%)	2 (2%)	2 (4%)
5.	Evaluation of pattern of relationship between difference (y-axis) and average (x-axis)	14 (28%)	4 (9.5%)	NP*	NP*
6.	Measures of repeatability	19 (38%)	9 (21.4%)	NP*	11 (22%)
	Reporting of heteroscedasticity among studies with evaluation of pattern	6/14	NP*	23 (24%)	NP*
	Use of logarithmic difference plot	1/6	NP*	3 (3%)	NP*
	Use of percentage difference plot	5/6	NP*	20 (21%)	NP*

Reported in studies: number (%); NP*: Findings not presented by authors.

CI: confidence interval; LOA: limits of agreement; B-A: Bland–Altman.

Discussion

The original article by Bland and Altman³ that proposed this method of agreement analysis has received more than 28,000 citations in the biomedical literature, and its use has increased in recent years. Previous investigations of the reporting of B-A have demonstrated deficiencies.^4–6

Bland and Altman recommended plotting the average of the two methods on the x-axis and the differences between measurements on the y-axis. Plotting the difference against either individual method may falsely suggest a significant positive or negative correlation between the two, even when no true relationship exists; this does not occur when the average of the two methods is used as the x-axis.¹⁰ A B-A plot was presented in all of the publications evaluated here, which represents an improvement on a previous study that found the plot was not provided in 22% of cases.⁶ This improvement might reflect the wider availability of software packages such as Analyse-it, GraphPad Prism and EP Evaluator, which automatically use the average of the two methods as the x-axis. Appropriate representation of the x-axis was found in 94% of studies, which is similar to the proportions reported by Dewitte et al. (87%)⁴ and Mantha et al. (94%).⁵

LOA were correctly represented and defined in 94% of studies. The remaining studies either drew the LOA incorrectly or concluded that the methods agreed well solely because 95% of the differences lay within the upper and lower LOA, which is not appropriate: by construction, approximately 95% of differences are expected to fall within the LOA however wide the limits are. CIs of the LOA were reported in only 6% of studies. The LOA are estimates, and reporting them without CIs is equivalent to reporting a sample mean without its CI. The CIs express the uncertainty in the estimated limits, whereas the LOA themselves approximate the range within which the difference for a single new pair of measurements would be expected to lie if taken from the same population.¹¹ Although reporting of CIs of LOA has been strongly recommended by Bland and Altman⁷ and by Hamilton and Stamey,¹² our finding that this is frequently not done confirms that of a previous study.⁵

To assess agreement between two methods, reporting the LOA is not enough; the width of the LOA needs to be compared with a priori defined clinical limits. Acceptable limits for laboratory analytes, derived from biological variation, have been published by Ricos et al.¹³ and Westgard QC.¹⁴ Alternatively, if specifications are lacking, a Delphi survey (a multistage process of group facilitation designed to transform expert opinion into group consensus) can be undertaken to determine acceptable limits.¹⁵ Previous surveys found that comparison with predefined clinical criteria was missing in >90% of publications;^4,5 in the present study it was found in 74% of publications, which represents a major improvement.
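A comparison of this kind can be made explicit as a decision rule. The sketch below applies one conservative rule, chosen here as an assumption for illustration: the methods are judged interchangeable only if the outer confidence limits of both LOA lie within ± the a priori maximum acceptable difference. The function name and the example values (in mmol/L) are hypothetical and are not taken from the cited specifications.

```python
def within_clinical_limits(lower_loa_ci, upper_loa_ci, max_allowed_diff):
    """Return True if both LOA, including their CIs, lie inside +/- max_allowed_diff.

    lower_loa_ci, upper_loa_ci -- (low, high) confidence limits of each LOA
    max_allowed_diff -- a priori clinically acceptable difference (same units)
    """
    if max_allowed_diff <= 0:
        raise ValueError("max_allowed_diff must be positive")
    outer_low = lower_loa_ci[0]   # lower confidence limit of the lower LOA
    outer_high = upper_loa_ci[1]  # upper confidence limit of the upper LOA
    return -max_allowed_diff <= outer_low and outer_high <= max_allowed_diff

# Hypothetical LOA confidence limits, in mmol/L.
lower_ci, upper_ci = (-3.1, -2.2), (2.5, 3.4)
print(within_clinical_limits(lower_ci, upper_ci, 4.0))  # True: both inside +/-4 mmol/L
print(within_clinical_limits(lower_ci, upper_ci, 3.0))  # False: -3.1 and 3.4 fall outside +/-3
```

Using the outer confidence limits rather than the point estimates of the LOA means that a small, imprecise study cannot pass the rule merely by chance.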

The B-A plot also provides a graphical check of the LOA and of the pattern of scatter in the data.¹⁶ Only 28% of studies evaluated this pattern. Drawing parallel LOA on a difference plot of heteroscedastic data may yield LOA that are too wide in the lower concentration range and too narrow in the higher concentration range, which undermines their interpretation. Bland and Altman proposed logarithmic transformation of heteroscedastic data;⁷ to aid interpretation, the LOA obtained on the log scale are back-transformed (antilog), which gives limits for the ratio of the two methods. Alternatively, the ratios of the two methods or the percentage differences between them can be plotted against the average of the two methods for simpler interpretation.⁷ Transformation usually renders the scatter of differences homoscedastic. Twomey¹⁷ recommended drawing funnel-shaped or V-shaped LOA instead of classical parallel LOA for data sets with heteroscedastic scatter. Another option is to split the data into smaller subsets and analyse each subset with an absolute difference plot.¹⁷

Replicate measurements by each method are an essential requirement in B-A analysis. If one or both methods have poor repeatability, the agreement between the two methods is bound to be poor as well. Repeatability was described in 38% of publications, which does not represent a substantial improvement on previous studies (Table 1). The width of the LOA depends on the precision of the methods: LOA are wider when the methods are imprecise and narrower when they are precise. Conclusions drawn from studies without repeatability assessment are therefore likely to be uncertain.⁶
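Repeatability from duplicate measurements can be summarised as a within-subject SD and a repeatability coefficient. The sketch below uses one common formulation for duplicates, s_w = √(Σd²/2n), where d is the difference between the two replicates of a subject, with a repeatability coefficient of 1.96 × √2 × s_w (about 2.77 s_w). The function name and data are hypothetical.

```python
import math

def repeatability(first, second):
    """Within-subject SD (s_w) and repeatability coefficient for duplicate
    measurements made on the same subjects by one method under identical conditions."""
    if len(first) != len(second) or not first:
        raise ValueError("need equal-length, non-empty replicate lists")
    diffs = [x - y for x, y in zip(first, second)]
    s_w = math.sqrt(sum(d * d for d in diffs) / (2 * len(diffs)))
    # About 95% of differences between two replicates are expected to be below this value.
    coefficient = 1.96 * math.sqrt(2) * s_w
    return s_w, coefficient

# Hypothetical duplicate results for each method on the same five samples.
replicates = {
    "A": ([10.0, 12.0, 14.0, 16.0, 18.0], [10.4, 11.7, 14.2, 15.6, 18.3]),
    "B": ([10.5, 12.8, 14.1, 17.2, 18.9], [11.4, 12.0, 15.0, 16.1, 19.8]),
}
for name, (rep1, rep2) in replicates.items():
    s_w, coef = repeatability(rep1, rep2)
    print(f"method {name}: s_w = {s_w:.2f}, repeatability coefficient = {coef:.2f}")
```

Setting each method's repeatability coefficient beside the between-method LOA indicates how much of the disagreement could arise from the methods' own imprecision.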

Sample size calculations were performed in only 15 of the 50 publications reviewed here. Bland and Altman⁷ proposed a formula, SE = 1.71 SD/√n, for the standard error (SE) of the 95% LOA, where SD is the standard deviation of the differences between measurements by the two methods and n is the sample size. As n increases, the SE decreases and the CIs of the LOA become narrower; with an insufficient sample size, the CIs of the LOA are wide. Sample size therefore determines the precision of the LOA, and a small sample makes comparison with a priori defined clinical limits uncertain because of the wide CIs of the LOA.¹⁸
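The formula above can be used in two directions: to report the SE of the LOA for a given n, and, inverted, to plan n for a target CI half-width. This sketch uses the large-sample normal value 1.96 in place of a t quantile and a hypothetical planning SD (for example from a pilot study); both are assumptions of the example, as are the function names.

```python
import math

def se_of_loa(sd_diff, n):
    """Approximate standard error of each 95% LOA: 1.71 * SD / sqrt(n)."""
    return 1.71 * sd_diff / math.sqrt(n)

def n_for_loa_precision(sd_diff, half_width, z=1.96):
    """Smallest n for which the 95% CI of each LOA is no wider than +/- half_width.

    Solves z * 1.71 * sd_diff / sqrt(n) <= half_width for n (normal approximation).
    """
    return math.ceil((z * 1.71 * sd_diff / half_width) ** 2)

# Hypothetical planning values: SD of differences 5 units, desired CI of +/-2 units.
n = n_for_loa_precision(5.0, 2.0)
print(n, round(1.96 * se_of_loa(5.0, n), 2))
```

With these inputs the function returns 71; halving the target half-width roughly quadruples the required n, because n scales with the square of SD divided by the half-width.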

In summary, we found that reporting of B-A analysis in the biomedical literature is still incomplete. A weakness of our paper is that the search was limited to two databases, so other relevant studies may have been missed. Despite its simplicity, B-A analysis still appears not to be completely understood by researchers, reviewers and editors of journals. It is of interest that there appear to be differences in the reporting of B-A analysis between laboratory medicine journals and other clinical journals. More uniform reporting of B-A analysis would enhance the generalizability of results and facilitate the inclusion of studies in systematic reviews.

Footnotes

Acknowledgements

None.

Declaration of conflicting interests

None declared.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Ethical approval

Not applicable. The study is a cross-sectional survey of medical literature and did not involve human or animal subjects; hence, ethical approval was not sought.

Guarantor

VC.

Contributorship

VC and SKK collected data and drafted the manuscript; RB critically reviewed the manuscript.

References

1. Van Stralen, Jager, Zoccali. Agreement between methods. Kidney Int 2008; 74: 1116–1120.

2. Altman, Bland. Measurement in medicine: the analysis of method comparison studies. Statistician 1983; 32: 307–317.

3. Bland, Altman. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986; 1: 307–310.

4. Dewitte, Fierens, Stöckl. Application of the Bland–Altman plot for interpretation of method-comparison studies: a critical investigation of its practice. Clin Chem 2002; 48: 799–801.

5. Mantha, Roizen, Fleisher. Comparing methods of clinical measurement: reporting standards for Bland and Altman analysis. Anesth Analg 2000; 90: 593–602.

6. Berthelsen, Nilsson. Researcher bias and generalization of results in bias and limits of agreement analyses: a commentary based on the review of 50 Acta Anaesthesiologica Scandinavica papers using the Altman-Bland approach. Acta Anaesthesiol Scand 2006; 50: 1111–1113.

7. Bland, Altman. Measuring agreement in method comparison studies. Stat Methods Med Res 1999; 8: 135–160.

8. Chhapola, Kumar, Goyal. Is liquid heparin comparable to dry balanced heparin for blood gas sampling in intensive care unit? Indian J Crit Care Med 2014; 18: 13–18.

9. Chhapola, Kanwal, Sharma. A comparative study on reliability of point of care sodium and potassium estimation in a pediatric intensive care unit. Indian J Pediatr 2013; 80: 731–735.

10. Bland, Altman. Comparing methods of measurement: why plotting difference against standard method is misleading. Lancet 1995; 346: 1085–1087.

11. Ludbrook. Confidence in Altman-Bland plots: a critical review of the method of differences. Clin Exp Pharmacol Physiol 2010; 37: 143–149.

12. Hamilton, Stamey. Using Bland Altman to assess agreement between two medical devices – don’t forget the confidence intervals! J Clin Monit Comput 2007; 21: 331–333.

13. Ricos, Alvarez, Cava. Current databases on biological variation: pros, cons and progress. Scand J Clin Lab Invest 1999; 59: 491–500.

14. Westgard QC. Optimal specifications for total error, imprecision, and bias, derived from intra- and inter-individual biologic variation, http://westgard.com/clia-quality/page-3.html (accessed 22 February 2014).

15. Hasson, Keeney, McKenna. Research guidelines for the Delphi survey technique. J Adv Nurs 2000; 32: 1008–1015.

16. Bland, Altman. Applying the right statistics: analyses of measurement studies. Ultrasound Obstet Gynecol 2003; 22: 85–93.

17. Twomey. How to use difference plots in quantitative method comparison studies. Ann Clin Biochem 2006; 43: 124–129.

18. Stöckl, Cabaleiro, Van Uytfanghe. Interpreting method comparison studies by use of the Bland–Altman plot: reflecting the importance of sample size by incorporating confidence limits and predefined error limits in the graphic. Clin Chem 2004; 50: 2216–2218.

Supplementary Material


For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.


Reporting standards for Bland–Altman agreement analysis in laboratory research: a cross-sectional survey of current practice
