Procedures for treating missing data in the statistical analysis of survey data are reviewed. The main topics covered are (1) how to assess the nature of missing data, especially with regard to randomness; (2) a comparison of listwise and pairwise deletion; and (3) methods that use the maximum available information to estimate (a) parameters or (b) missing values.
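The contrast between listwise and pairwise deletion, and one simple way to probe whether data are missing at random, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure of this article. The function names (`listwise`, `corr_listwise`, `corr_pairwise`, `mean_gap`) and the five-case toy data set are hypothetical, and the mean-gap check is only one common heuristic for spotting non-random missingness.

```python
from math import sqrt
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: a case is a sequence of values; None = missing.
Case = Sequence[Optional[float]]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length lists with no missing values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def listwise(cases: List[Case]) -> List[Case]:
    """Listwise deletion: keep only cases observed on every variable."""
    return [c for c in cases if all(v is not None for v in c)]


def corr_listwise(cases: List[Case], i: int, j: int) -> Tuple[float, int]:
    """Correlation of variables i and j on the listwise-complete cases."""
    complete = listwise(cases)
    r = pearson([c[i] for c in complete], [c[j] for c in complete])
    return r, len(complete)


def corr_pairwise(cases: List[Case], i: int, j: int) -> Tuple[float, int]:
    """Pairwise deletion: use every case observed on both i and j."""
    pairs = [(c[i], c[j]) for c in cases if c[i] is not None and c[j] is not None]
    r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
    return r, len(pairs)


def mean_gap(cases: List[Case], j: int, k: int) -> float:
    """Mean of variable k among cases missing j, minus its mean among cases
    observing j. A gap far from zero hints that missingness on j is not random."""
    miss = [c[k] for c in cases if c[j] is None and c[k] is not None]
    seen = [c[k] for c in cases if c[j] is not None and c[k] is not None]
    return sum(miss) / len(miss) - sum(seen) / len(seen)


# Toy data (hypothetical): three variables, two cases with one missing value each.
cases = [
    (1.0, 1.0, 2.0),
    (2.0, 3.0, None),
    (3.0, 2.0, 1.0),
    (4.0, None, 3.0),
    (5.0, 5.0, 4.0),
]
print(corr_listwise(cases, 0, 1))  # based on the 3 complete cases
print(corr_pairwise(cases, 0, 1))  # based on the 4 cases observed on both 0 and 1
print(mean_gap(cases, 1, 0))       # 1.25
```

Pairwise deletion keeps more cases for each coefficient, but because each correlation may rest on a different subset of cases, a matrix assembled this way is not guaranteed to be positive semidefinite. Listwise deletion avoids that problem at the cost of discarding every partially observed case.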