Interval Estimation of Bivariate Correlations With Missing Data on Both Variables: A Bayesian Approach

Abstract

The posterior distribution of the bivariate correlation (ρ_xy) is analytically derived given a data set consisting of N₁ cases measured on both x and y, N₂ cases measured only on x, and N₃ cases measured only on y. The posterior distribution is shown to be a function of the subsample sizes, the sample correlation (r_xy) computed from the N₁ complete cases, a set of four statistics that measure the extent to which the missing data are not missing completely at random, and the specified prior distribution for ρ_xy. A sampling study suggests that in small (N = 20) and moderate (N = 50) samples, Bayesian posterior interval estimates will dominate maximum likelihood based estimates in terms of coverage probability and expected interval width when the prior distribution for ρ_xy is simply uniform on (0, 1). The advantage of the Bayesian method is less consistent when more informative priors based on beta densities are employed.
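
As a rough illustration of what a posterior interval for ρ_xy under a uniform prior on (0, 1) looks like, the sketch below evaluates a posterior on a grid. It is not the posterior derived in the article: it uses only the complete cases (ignoring the N₂ and N₃ partially observed cases and the four missingness statistics), and it assumes x and y are bivariate normal with known zero means and unit variances. The function name `posterior_interval` and its parameters are hypothetical.

```python
import numpy as np

def posterior_interval(x, y, level=0.95, grid_size=2001):
    """Equal-tailed posterior interval for rho on a grid.

    Simplifying assumptions (not the article's model): complete cases only,
    bivariate normal data with known zero means and unit variances,
    and a uniform prior for rho on (0, 1).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sxx, syy, sxy = np.sum(x * x), np.sum(y * y), np.sum(x * y)

    # Grid over the open interval (0, 1); the endpoints are dropped.
    rho = np.linspace(0.0, 1.0, grid_size + 2)[1:-1]
    one_minus = 1.0 - rho ** 2

    # Log-likelihood of standardized bivariate normal data as a function of rho.
    log_lik = -0.5 * n * np.log(one_minus) \
        - (sxx - 2.0 * rho * sxy + syy) / (2.0 * one_minus)

    # A uniform prior only adds a constant, so the log-posterior equals log_lik.
    weights = np.exp(log_lik - log_lik.max())
    weights /= weights.sum()

    # Read the interval limits off the discrete cumulative distribution.
    cdf = np.cumsum(weights)
    alpha = 1.0 - level
    last = rho.size - 1
    lower = rho[min(np.searchsorted(cdf, alpha / 2.0), last)]
    upper = rho[min(np.searchsorted(cdf, 1.0 - alpha / 2.0), last)]
    return lower, upper, rho, weights

# Simulated data with a true correlation of 0.6 (illustrative only).
rng = np.random.default_rng(0)
x_sim = rng.standard_normal(2000)
y_sim = 0.6 * x_sim + np.sqrt(1.0 - 0.6 ** 2) * rng.standard_normal(2000)
lo_small, hi_small, _, _ = posterior_interval(x_sim[:20], y_sim[:20])
lo_large, hi_large, _, _ = posterior_interval(x_sim, y_sim)
print(f"N = 20:   ({lo_small:.3f}, {hi_small:.3f})")
print(f"N = 2000: ({lo_large:.3f}, {hi_large:.3f})")
```

A beta prior could be substituted by adding its log-density to `log_lik` before normalizing; the grid approach is chosen here only because it keeps the normalization and the interval limits easy to inspect.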

Keywords

Bayesian statistics, correlations, missing data
