Sage Journals: Discover world-class research

Abstract

Objective: Increasing awareness of how missing data affects the analysis of clinical and public health interventions has led to increasing numbers of missing data procedures. There is little advice regarding which procedures should be selected under different circumstances. This paper compares six popular procedures: listwise deletion, item mean substitution, person mean substitution at two levels, regression imputation and hot deck imputation.

Method: Using a complete dataset, each was examined under a variety of sample sizes and differing levels ofmissing data. The criteria were the true t-values for the entire sample.

Results: The results suggest important differences. Ifmissing data are from a scale where about half the items are present, hot deck imputation or person mean substitution are best. Because person mean substitution is computationally simpler, similar in its efficiency, advocated by other researchers and more likely to be an option on statistical software packages, it is the method of choice. If the missing data are from a scale where more than half the items are missing, or with single-item measures, then hot deck imputation is recommended. The findings also showed that listwise deletion and item mean substitution performed poorly.

Conclusions: Person mean and hot deck imputation are preferred. Since listwise deletion and item mean substitution performed poorly, yet are the most widely reported methods, the findings have broad implications.

Keywords

horizontal mean substitution hot deck imputation imputation item mean substitution listwise deletion missing data person mean substitution regression imputation vertical mean substitution

Abbreviations

AUDIT, Alcohol Use Disorders Identification Test [1], [2]; CAPS, Clinician Administered PTSD Scale [3]; EM, expectation-maximization; HADS, Hospital Anxiety and Depression Scale [4], [5]; MAR, missing at random; MCAR, missing completely at random; PCL, Posttraumatic Stress Disorder Checklist – Military version [6]; PTSD, posttraumatic stress disorder; RE, relative efficiency statistic [7], [8].

Cross-sectional missing data

Missing data are problematic: incomplete datasets may lead to results that are different from those that would have been obtained from a complete dataset. Where there is missing data the missing values can be imputed using the technique that minimizes these differences. This paper focuses on cross-sectional missing data, which is where there is missing data at a particular timepoint. For a discussion of missing data in longitudinal studies our companion paper should be consulted [9].

Despite recent advances in understanding missing data and imputation methods most researchers still report listwise deletion, perhaps because of a lack of adequate guidelines for handling missing data [10]. Most clinical researchers are familiar with the more common strategies, such as listwise or pairwise deletion and mean substitution that are readily available in statistical packages, such as the Statistical Package for the Social Sciences (SPSS) [11]. Those who are proficient in data analyses may use more complex methods, such as regression or ‘hot deck’. Others may use statistical packages, such as Mplus [12] or AMOS [13], which include sophisticated approaches, such as the expectation-maximization (EM) algorithm [14] or multiple imputation [15].

This paper provides guidelines for several commonly used and easy to calculate missing-value procedures. The intention is to inform those with limited statistical familiarity about the merits of different strategies.

Cross-sectional missing data occur for several reasons. There may be errors in the data or the data collection process may be inadequate (e.g. where a single page is missing from a particular questionnaire). Participants may refuse to provide data for reasons such as fatigue or the sensitive nature of the information. Finally, researchers may have adopted designs that involved planned missing data, such as surveys where some respondents are asked most questions but others are asked different questions.

The different reasons formissing data give rise to different types of missing data. Generally, there are two important types described by Little and Rubin [16] and Schafer [17] as ignorable and non-ignorable. Non-ignorable is where the probability of a missing datum is dependent upon its value and ignorable missing data is where the probability of a missing datum is not dependent upon its value.

There are three forms of ignorable missing data. The first is associated with sampling. Because in most situations it is neither efficient nor possible to obtain data from a whole population, probability sampling is widely used to obtain a representative population sample. This form of missing data is not considered further. The second form of ignorable missing data is missing at random (MAR) [18], [19] or recoverable [20]. Missing at random occurs where the pattern of missingness for a particular variable (Y) may vary for subsets. For example, MAR may occur if an item is more likely to be completed by men than women. A third form of ignorable missing data is missing completely at random (MCAR) [19], [21], where the missingness occurs at random across the whole dataset. For example, if one questionnaire accidentally had a page missing, the resultant missing values would be MCAR.

Non-ignorablemissing data occurs where the pattern of missingness is such that the missing values of Y cannot be reliably predicted from other dataset variables. This may be caused by an underlying process that leads to bias in the observed data [22], [23]. For example, where a respondent dislikes an interviewer and refuses to answer questions.

Even if the ‘true’ ignorability status of intermittent missing data is unknown, most missing data are recoverable through imputation [20], [24]. This is the position adopted for the present paper which examines the impact of various approaches to handling cross-sectional missing data.

This paper has been written to assist clinicians and researchers who have a basic understanding of missing data techniques. To achieve this, four strategies have been adopted. First, there are no complex algebraic notations. More mathematically and statistically comprehensive descriptions of missing-value approaches can be found elsewhere [25–29]. Second, in the main the approaches are those that clinicians and researchers are likely to be familiar with. Third, the present study uses ‘real’ data collected during clinical treatments rather than simulations [30]. Finally, the t-test is used as the basis for assessing the utility of the various missing data approaches.

Approaches for handling intermittent missing data

Six procedures for handling cross-sectional missing data are considered: listwise deletion; item (or vertical) mean substitution; two levels of person (or horizontal) mean substitution; regression imputation; and ‘hot deck’ substitution. Hair et al. [22] describe listwise deletion, mean substitution and regression imputation techniques. Hot deck imputation is explained succinctly by Fayers and Machin [31]. Two other state-of-the-art approaches are the EM algorithm [14] and multiple imputations [15]. Because these are relatively complex we have not included them in the data analyses although we provide brief descriptions.

Listwise deletion is the simplest procedure and is the default in many statistical packages [10]. This removes a case (case i) from an analysis if a datum is missing for case i on any variable that is included in the analysis [23]. The main concerns for listwise deletion are the loss of data that may have been time-consuming or difficult to collect, the reduction in sample size and the resulting bias in variance estimates (e.g. standard deviation).

Mean substitution takes two forms. In itemmean substitution, a missing value for case i on variable Y (hereafter Y_i) is replaced by the mean value of all other participants that have a valid value for Y [32]. Where other information is not available, Tabachnick and Fidell [10], p. 62] regard the mean as the ‘best guess’. Item mean substitution is conservative because the sample mean does not change. However, the variance is underestimated. For the second form, person mean substitution [32], the imputed value for a variable with missing data is derived from the non-missing items for the case. Consider where a scale score (Y) is summed from 10 items (Y₁, Y₂,…, Y₁₀) and, for case i, Y_i3, Y_i6 and Y_i7 are missing. The value for Y_i (i.e. Y for case i) is calculated:

The implicit assumptions are that: (i) the item response ranges are the same for all items (for a discussion see Curran et al. [23]); (ii) the missed items would have had the same values as the mean of the non-missing items; and (iii) that each item makes an equal contribution to the overall scale score. Personmean substitutions may lead to reduced variances. Obviously this has limited application where single variables are used (e.g. gender) or where all scale items are missing.

Regression imputation uses the relationship between two or more variables such that a missing value of Y_i is estimated from the overall relationship between Y and other dataset variables. However, it reduces residual variance and assumes that other dataset variables have strong relationships with the variable of interest. For regression imputation, a missing value of Y_i can be estimated from the relationship between Y and one or more conceptually different variables (e.g. X₁, X₂,…, X_p) measured at the same time. For the present study, the univariate regression imputation procedure uses a single independent variable.

The hot deck procedure selects a substitute value from another participant in the dataset (the deck). The deck may be selected according to one or more characteristics of case i for whom an imputed value is required. Consider where case i is missing data for the variable of interest (Y_i) and has a value of ‘1’ for the dichotomized independent variable. The deck would constitute all valid values of Y for which cases have a value of ‘1’ for the independent variable. A substitute value for Y_i would be selected at random from this deck. In the present study two decks are used; one for each level of another variable in the dataset (dichotomized Alcohol Use Disorders Identification Test (AUDIT) scores; see Method section).

The EMapproach, strictly, is not an imputationmethod, rather it estimates parameters for datasets with missing data. An iterative two-step (E-step and M-step) process is used. In the E-step, the missing data are estimated using a function (such as the normal distribution). The M-step then uses the (now) complete data to calculate parameters (e.g. means, standard deviations and regression coefficients). This process is iteratively continued until the change in the parameter estimate is below a specified minimum criterion. Unfortunately, descriptions of the EM algorithm are usually complex, although Bilmes [25] has constructed ‘A gentle tutorial of the EM algorithm’. Other useful references can be found in Enders [33], Little [28] and Meng and Rubin [34].

Finally, multiple imputation [15] builds, to some extent, on other missing data methods. It uses Monte Carlo simulation to produce a number (usually less than 10) of complete datasets derived from the initial dataset with missing values. These datasets are then analysed separately. This yields multiple parameter estimates which are combined to produce means and confidence intervals which reflect the uncertainty from the missing data in the original dataset. Further descriptions can be found in Schafer [29] and Schafer and Olsen [24].

Aim of the present study

Six missing-value procedures are compared in terms of the closeness of the average t-test values to the t-value for the full dataset. Several different percentages of missing data are tested. This illustrates differences between the techniques, thus enabling readers to make informed judgements about the approaches they should adopt when confronted with cross-sectional missing data.

Method

Participants

The participants were 1200 Australian male Vietnam veterans with a posttraumatic stress disorder (PTSD) diagnosis who were consecutive admissions to the Australian Department of Veterans' Affairs accredited PTSD treatment programs. The average intake age was 52.10 years (SD=4.89), 81% were married. Eighty-five per cent had served in the army, 10% in the navy and 5% in the air force. The mean level of clinically assessed PTSD, using the Clinician Administered PTSD Scale (CAPS) [3], [35], was 82.72 (SD=16.81).

Measures

The measures used in the study constitute a small component of a battery of instruments completed at program intake. Because the study was statistical in nature, the measures were selected more by reasons of convenience than by any substantive research question. Only those instruments used in this study are described. More details can be found in Creamer et al. [36].

The dependent variablewas the Posttraumatic Stress Disorder Checklist – Military version (PCL) [6] completed by the veterans at intake. The PCL comprises 17 items each scored on a five-point Guttman-type scale [37] from 1 (not at all) to 5 (extremely). Possible scores range from 17 (a low level of stress) to 85 (a high level of stress).

The second measure used was the AUDIT [1], [2], measured at intake. It comprises 10 items each measured on a Guttman-type scale. The first eight are scored from 0 to 4. Items 9 and 10 are initially scored on a scale with 0, 1 and 2, but rescaled to 0, 2 and 4. Possible scores range from 0 (low alcohol use) to 40 (high alcohol use). AUDIT scores were dichotomized using a median split to 0 (low alcohol use) and 1 (high alcohol use) thereby ensuring approximately 50% of cases in each group of the complete dataset. This was the independent variable in the t-tests.

The final measure used was the anxiety subscale of the Hospital Anxiety and Depression Scale (HADS) [4], [5]. It was chosen because it had the highest correlation with the PCL of any scale in the full dataset (r=0.59). It consists of seven items each measured on a Guttman-type scale from 0 to 3. Possible scores range from 0 (low anxiety) to 21 (high anxiety). The anxiety subscale was used in the regression imputation procedures.

Data analyses

Data analyses comprised the comparison of two sets of t-values. The first set were calculated where none of the data were missing. This produced a set of ‘true’ t-values. The second set were calculated for the same data, but where various percentages of the data were randomly made missing. In five of the six methods (all but listwise deletion), the missing values were imputed. For listwise deletion, t-values were calculated for the data that had not been made missing (i.e. the data remaining in the dataset). In all cases, only the PCL scores were made missing. Mean t-values were calculated for each of the six missing data procedures. For each procedure, three different levels of missing data were investigated: 20, 40 and 60%. For person mean substitution, the PCL values were made missing by randomly making some of the 17 PCL items missing. Two levels of missing items were used; nine (i.e. where eight items were present) and 13 items missing (i.e. where four items were present). In addition, the analyses were carried out for randomly selected samples of 25, 50, 100, 200 and 400 cases.

The t-test was used because it is a commonly used procedure which provides opportunities to understand the reasons for different results. The formula for the t-test is

where

{\overset{\cdot}{X}}_{1}

and

{\overset{\cdot}{X}}_{2}

are the mean PCL scores for groups 1 (low alcohol consumption) and 2 (high alcohol consumption), n₁ and n₂ are the sample sizes for groups 1 and 2 and s_p is the pooled standard deviation calculated as

Higher t-values are associated with: (i) greater differences between group means ( $({\overset{\cdot}{X}}_{1} - {\overset{\cdot}{X}}_{2})$ ); (ii) larger sample sizes (n); and (iii) less variation between scores. It is likely that differences between means varied with both changes in sample size and missing data proportions because different participants were included in different t-tests. Hence, it is important to compare the mean values for different methods and levels of missing data both within a particular sample size as well as across particular sample sizes. It is possible that a missing data procedure will work better than others depending upon the particular levels of missingness with particular sample sizes. We were interested in the overall performance of the procedures.

In order to ensure parameter stability, for each combination of sample size, percentage of missing data and missing data procedure, the analyses were repeated 1000 times as an independent block of runs. Hence, the tables in the ‘Results’ section report the mean t-values for each combination of test conditions for each of the missing data approaches over the 1000 runs.

For the regression imputation, PCL scores at intake were regressed upon the HADS anxiety subscale scores.

To compare the different procedures, the difference between the true t-value and the mean obtained t-values for each specification's runs was computed. Average means were obtained and the results used to compute a modified variation of the relative efficiency (RE) statistic [7], [8]:

where Δt₁ is the mean of the differences between the t-values for the complete data (the true t-values) and the t-values for method 1 (the method of interest) and Δt₂ is the mean of the differences between the t-values for the complete data and the t-values for method 2 (i.e. the procedure with the t-values closest to the true t-values across all runs).

Results

The true t-value and the mean t-values using the six different missing data procedures are presented in Table 1.

Table 1.

True t-values and mean t-values for six intermittent missing data methods

		Sample size
Missing data method	% Missing data	25	50	100	200	400
Complete data (true t-value)	20	0.71	1.02	1.47	2.06	2.87
	40	0.66	1.04	1.49	2.08	2.91
	60	0.68	1.04	1.45	2.05	2.92
Listwise deletion	20	0.62	0.90	1.30	1.84	2.56
	40	0.49	0.83	1.13	1.59	2.30
	60	0.43	0.64	0.94	1.29	1.83
Item mean substitution	20	0.61	0.89	1.29	1.84	2.55
	40	0.47	0.81	1.12	1.58	2.29
	60	0.40	0.62	0.92	1.27	1.81
Person mean substitution (eight items present)	20	0.69	0.99	1.43	2.02	2.83
	40	0.62	0.99	1.44	2.01	2.81
	60	0.64	0.98	1.36	1.96	2.76
Person mean substitution (four items present)	20	0.69	0.97	1.38	1.94	2.72
	40	0.60	0.94	1.37	1.89	2.66
	60	0.58	0.93	1.27	1.78	2.54
Regression imputation	20	0.66	0.96	1.40	1.96	2.74
	40	0.57	0.94	1.36	1.88	2.67
	60	0.57	0.89	1.25	1.78	2.51
Hot deck	20	0.73	1.01	1.46	2.08	2.87
	40	0.67	1.17	1.50	2.06	2.99
	60	0.81	1.09	1.57	2.06	2.91

Table 2 shows the rankings of the procedures for each combination of sample size and percentage of missing data in terms of the closeness of the estimated t-values to the true t-values. A lower ranking indicates that the t-value estimated by the missing data method was closer in magnitude to the true t-value.

Table 2.

Rankings of six intermittent missing data procedures in terms of closeness to the true t-values

		Sample size					Mean rank (SD)
Missing data method	% Missing data	25	50	100	200	400
Listwise deletion	20	9.5	10.0	11.0	11.5	11.0	14.20 (2.80)
	40	15.0	15.0	15.0	15.0	15.0
	60	17.0	17.0	17.0	17.0	17.0
Item mean substitution	20	11.5	12.5	12.5	11.5	12.0	15.27 (2.59)
	40	16.0	16.0	16.0	16.0	16.0
	60	18.0	18.0	18.0	18.0	18.0
Person mean substitution (eight items present)	20	3.0	2.0	3.0	4.0	3.0	4.73 (1.65)
	40	5.5	4.0	4.0	5.0	5.0
	60	5.5	6.5	6.5	6.0	8.0
Person mean substitution (four items present)	20	3.0	4.0	6.5	8.0	7.0	8.93 (3.09)
	40	8.0	8.5	8.5	9.0	10.0
	60	11.5	11.0	12.5	13.5	13.0
Regression imputation	20	7.0	6.5	5.0	7.0	6.0	9.80 (3.20)
	40	9.5	8.5	10.0	10.0	9.0
	60	13.0	14.0	14.0	13.5	14.0
Hot deck	20	3.0	1.0	1.5	2.5	1.0	4.00 (4.23)
	40	1.0	12.5	1.5	2.5	4.0

The data in Tables 1 and 2 suggest that the hot deck imputation and the person mean substitution method where there was data for eight of the 17 PCL items produced estimated t-values that were close to the true t-values. As expected, in all cases the t-values increased with sample size increases. In addition, the difference between the true t-values and those of the missing data methods increased as the percentage of missing data increased. Although the hot deck and person mean substitution both performed well overall, there was some evidence that the hot deck imputation was more robust for larger sample sizes.

Table 3 shows the differences between the procedures. The most efficient method was hot deck. When compared with this method, person mean substitution with eight items present was almost as efficient, followed by person mean substitution with four items present and regression imputation. Listwise deletion and item mean substitution performed particularly badly.

Table 3.

Relative efficiency (RE) of six intermittent missing data methods

Missing data procedure	Mean (SD) t-score difference from true t-value	RE statistic
Listwise deletion	0.384 (0.273)	83.59
Item mean substitution (vertical)	0.400 (0.275)	90.70
Person mean substitution (eight items present)	0.061 (0.036)	2.32
Person mean substitution (four items present)	0.146 (0.095)	12.08
Regression imputation	0.154 (0.096)	13.44
Hot deck	0.042 (0.048)	1.00

The lower the value, the more effective the relative performance (e.g. hot deck was over 80 times more effective in estimating the true t-value when compared with listwise deletion).

Discussion

A major issue with missing data relates to its status; it is important to identify ignorable from non-ignorable missing data [17]. If the missing data pattern for a particular variable is linked to the variable scores and it is not possible to confidently estimate the missing values from other dataset variables, then it is not possible to substitute new values for missing data [21]. Almost certainly, the results will be biased. Every effort should be made to collect full and complete datasets. Where data are missing, it is important to identify ignorable missing data and impute missing values.

Six popular methods for doing this were examined, namely listwise deletion, item mean substitution, person mean substitution with two levels of missingness, regression and hot deck imputation. When interpreting the findings, however, it should be borne in mind that these methods constitute only a selection of those potentially available. Better results may have been obtained from the more sophisticated procedures, such as the EMalgorithm or multiple imputation. The study has, nevertheless, examined the methods commonly reported and provides advice concerning the selection of appropriate methods.

The findings suggest that hot deck imputation and person mean substitution (with approximately half the items present) are superior to the other procedures – a finding consistent with that of earlier researchers [38]. Evenwhen less than 25% (four out of 17) of items were completed, person mean substitution performed better than listwise and item mean substitution. Caution, however, should be exercised since increases in the number of missing items will influence the results due to correlations between items within scales [32]. Of particular interest is the observation that the two most popular missing data methods, item mean substitution and listwise deletion, performed poorly, particularly where there were high levels of missing data. This finding is consistent with that reported by Downey and King [32].

We advocate using person mean substitution over hot deck imputation for scale scores because of its ease of computation. Unlike hot deck, because the mean substitution strategy uses the non-missing items in a particular scale, all available data is used.

Of concern is the poor performance of listwise deletion and item mean substitution, given that they are the most commonly reported and are the default methods in many statistical software packages. The t-values in Table 1 suggest that where missing data are MCAR, as is the case in our study, these procedures may lead to excessive null findings or Type II errors. For listwise deletion with MCAR data the group means and variances are likely to stay the same. Hence, the t-value will decrease as the sample size reduces, primarily because of the reduced degrees of freedom. For item mean substitution there are two possible outcomes when imputingMCAR data. If the values are imputed across all study participants, the differences between group means will decrease and, hence, the t-values will also be smaller. If the values are imputed within study groups, the differences between group means will increase and the t-values will be larger.

Regarding the study limitations, there may have been better ways of carrying out the regression analyses. Values for missing data were generated for PCL at intake from HADS anxiety scores. Thus imputed values were calculated regardless of the participants' AUDIT class. The results for the regression imputations may have been closer to those of the complete dataset if separate regression lines were calculated for each AUDIT class. We did not do this because it would have necessitated calculating a new set of missing values for each analysis – a procedure that few researchers would be willing to undertake on a day-to-day basis. Additionally, the results of the regression imputations might have been further improved if the correlations between PCL at intake and the independent variable had been stronger. The correlation with PCL at intake was moderate (r=0.59 with HADS anxiety), a situation that is typical of many research datasets.

The comparisons were based on the disparity between the true t-values and the estimated t-values. It is possible that different results may have been obtained for alternative statistical tests, such as correlation or regression coefficients. Hence, there may be value in comparing other parameter estimates for complete datasets with those derived from datasets with imputed values. There may also be benefit from making further comparisons between person mean substitution and hot deck imputation. This study found that these two methods produced similar results when about half of the items were present. Given that the person mean substitution method with about half of the items present was superior to the person mean substitution method with about one-quarter of the items present, there may be a level of missing items beyond which person mean substitution becomes better than hot deck imputation.

Finally, the results may have been affected by distribution and covariation within the particular dataset used. We have no reason to suspect this was the case, but further research with other datasets is needed.

Conclusion

This study examined several popular and simple procedures for handling missing data. The relative merits of each procedure were examined under a variety of sample sizes and with differing levels of missing data, where the criteria were the true t-values for the entire sample. The results suggest that there are important differences between the procedures. In particular, they suggest that it is difficult to justify using listwise deletion or item mean substitution. Since these are the most widely reported methods of handling missing data and are the default methods in many statistical software packages, the findings have broad implications. Given the limitations of the study, replication is important.

Subject to the study limitations, where data are missing from instrument scales, the results suggest that if there are at least half of the items of a scale present, the appropriate procedure would be person mean substitution or hot deck imputation. Where no items have been completed for a scale, or where the measure consists of a single item, hot deck imputation should be the preferred method.

Footnotes

Acknowledgements

This study was funded as part of a block grant to the Australian Centre for Posttraumatic Mental Health by the Australian Common wealth Department of Veterans' Affairs. We thank Dirk Biddle for his helpful comments on themanuscript and also those veterans who completed long evaluation forms at the time when they were seeking helpwith mental health problems. The ethics approval for data collection from veterans was given by the Australian Common wealth Department of Veterans' Affairs Human Ethics Committee.

References

Babor

de la Fuente

Saunders

Grant

. AUDIT –The Alcohol Use Disorders Indentification Test: guidelines for use in primary health care. World Health Organisation Division of Mental Health, Geneva 1989; 1–23.

Saunders

Aasland

Babor

de la Fuente

Grant

. Development of the Alcohol Use Disorders Identification Test (AUDIT): WHO Collaborative Project on early detection of persons with harmful alcohol consumption –II. Addiction 1993; 88: 791–804.

Blake

Weathers

Nagy

. A clinician rating scale for assessing current and lifetime PTSD: the CAPS-1. Behavior Therapist 1990; 13: 187–188.

Zigmond

Snaith

. The Hospital Anxiety and Depression Scale. Acta Psychiatrica Scandinavica 1983; 67: 361–370.

Snaith

Zigmond

. The Hospital Anxiety and Depression Scale. NFER-Nelson, Windsor 1994.

Blanchard

Jones-Alexander

Buckley

Forneris

. Psychometric properties of the PTSD Checklist (PCL). Behaviour Research and Therapy 1996; 34: 669–673.

Liang

Larson

Cullen

Schwartz

. Comparative measurement efficiency and sensitivity of five health status instruments for arthritis research. Arthritis and Rheumatism 1985; 28: 542–547.

Wright

Young

. A comparison of different indices of responsiveness. Journal of Clinical Epidemiology 1997; 50: 239–246.

Elliott

Hawthorne

. Imputing missing repeated measures data: how should we proceed?. Australian and New Zealand Journal of Psychiatry 2005; 39: 575–582.

10.

Tabachnick

B G

Fidell

L S

. Using multivariate statistics 4th edn. Harper Collins, New York 2001.

11.

SPSS . SPSS for Windows. SPSS, Chicago 2003.

12.

Muthen

. Mplus. Muthen and Muthen, Los Angeles 1998.

13.

Arbuckle

Wothke

. Amos. Small Waters Corporation, Chicago 1999.

14.

Dempster

A P

Laird

N M

Rubin

D B

. Maximum likelihood estimation from incomplete data via the EM algorithm. Journal of the Royal Statistical Society 1977; 39(Ser. B)1–38.

15.

Rubin

. Multiple imputation for nonresponse in surveys. Wiley, New York 1987.

16.

Little

Rubin

. Statistical analysis with missing data. Wiley, New York 1987.

17.

Schafer

. Analysis of incomplete multivariate data. Chapman and Hall, London 1997.

18.

Rubin

D B

. Inference and missing data. Biometrika 1976; 63: 581–592.

19.

Fairclough

. Methods of analysis for longitudinal studies of health-related quality of life. Quality of life assessment in clinical trials, Staquet

Hays

Fayers

. Oxford University Press, Oxford 1998; 227–248.

20.

McArdle

J J

. Structural factor analysis experiments with incomplete data. Multivariate Behavioral Research 1994; 29: 409–454.

21.

King

Bachrach

McArdle

. Contemporary approaches to missing data: the glass is really half full. PTSD Research Quarterly 2001; 12: 1–7.

22.

Hair

Anderson

Tatham

Black

. Multivariate data analysis 5th edn. Prentice Hall, Upper Saddle River 1998.

23.

Curran

Fayers

Molenberghs

Machin

. Analysis of incomplete quality of life data in clinical trials. Quality of life assessment in clinical trials, Staquet

Hays

Fayers

. Oxford University Press, Oxford 1998; 249–280.

24.

Schafer

J L

Olsen

M K

. Multiple imputation for multivariate missing-data problems: a data analyst's perspective. Multivariate Behavioral Research 1998; 33: 545–571.

25.

Bilmes

. A gentle tutorial of the EM algorithm and its applications to parameter estimation for Gaussian mixture and hidden Markov models 1998, International Computer Science Institute Technical Report. International Computer Science Institute.

26.

Conway

. The analysis of repeated categorical measurements subject to nonignorable nonresponse. Journal of the American Statistical Association 1992; 87: 817–824.

27.

Hedeker

Gibbons

R D

. Application of random-effects pattern-mixture models for missing data in longitudinal studies. Psychological Methods 1997; 2: 64–78.

28.

Little

. Robust estimation of the mean and covariance matrix from data with missing values. Applied Statistics 1988; 37: 23–38.

29.

Schafer

J L

. Multiple imputation: a primer. Statistical Methods in Medical Research 1999; 8: 3–15.

30.

Graham

J W

Hofer

S M

MacKinnon

D P

. Maximizing the usefulness of data obtained with planned missing value patterns: an application of maximum likelihood procedures. Multivariate Behavioral Research 1996; 31: 197–218.

31.

Fayers

Machin

. Quality of life: assessment, analysis and interpretation. Wiley, Chichester 2000.

32.

Downey

R G

King

. Missing data in Likert ratings: a comparison of replacement methods. Journal of General Psychology 1998; 125: 175–191.

33.

Enders

C K

. A primer of maximum likelihood algorithms available for use with missing data. Structural Equation Modeling 2001; 8: 128–141.

34.

Meng

Rubin

D B

. Using EM to obtain asymptotic variance–covariance matrices: the SEM algorithm. Journal of the American Statistical Association 1991; 86: 899–909.

35.

Blake

Weathers

Nagy

. Instruction manual: Clinician Administered PTSD Scale (CAPS). National Centre for Posttraumatic Stress Disorder, Boston 1990.

36.

Creamer

Morris

Biddle

Elliott

. Treatment outcome in Australian veterans with combat-related posttraumatic stress disorder: a cause for cautious optimism?. Journal of Traumatic Stress 1999; 12: 545–558.

37.

Guttman

. A basis for scaling qualitative data. American Sociological Review 1944; 9: 139–150.

38.

Roth

P L

Switzer

F S

Switzer

. Missing data in multiple item scales: a Monte Carlo analysis of missing data techniques. Organizational Research Methods 1999; 2: 211–232.

Imputing Cross-Sectional Missing Data: Comparison of Common Techniques

Abstract

Keywords

Abbreviations

Cross-sectional missing data

Approaches for handling intermittent missing data

Aim of the present study

Method

Participants

Measures

Data analyses

Results

Discussion

Conclusion

Footnotes

Acknowledgements

References