The simulation-based imputation methods for handling missing data in double sampling scheme

Abstract

Missing data are pervasive in data science and lead to biased estimates and reduced statistical power. Double sampling schemes have emerged as a cost-effective strategy for addressing this challenge, but selecting the most suitable imputation method for missing data under a double sampling scheme remains a complex task. This paper proposes improved imputation methods under a double sampling setup and uses a simulation-based approach to compare them with existing imputation methods. The methods considered include mean imputation and regression imputation. Each imputation method is applied to incomplete datasets, and the performance of the resulting estimators is evaluated using mean squared error and percentage relative efficiency. The insights gained from this study can assist researchers in making informed decisions when handling missing data, ultimately improving the validity and reliability of statistical inferences in double sampling schemes.
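The kind of simulation comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's procedure: the synthetic population, the sample sizes, the completely-at-random response mechanism, and the function name `simulate` are all assumptions made for illustration, and only the two baseline methods named in the abstract (mean and regression imputation) are included, not the proposed ones. The sketch assumes the auxiliary variable x is observed on the whole first-phase sample, that y is requested only from a second-phase subsample, and that every first-phase unit without an observed y is imputed.

```python
import random
import statistics


def simulate(reps=500, N=2000, n1=400, n2=100, response_rate=0.7, seed=1):
    """Compare mean and regression imputation by simulated MSE and PRE.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Hypothetical finite population: y is roughly linear in the auxiliary x.
    x = [rng.gauss(10, 2) for _ in range(N)]
    y = [5 + 2 * xi + rng.gauss(0, 1) for xi in x]
    true_mean = statistics.fmean(y)

    sq_err = {"mean": 0.0, "regression": 0.0}
    used = 0
    for _ in range(reps):
        first = rng.sample(range(N), n1)   # phase 1: x observed
        second = rng.sample(first, n2)     # phase 2: y requested
        # Assumed response mechanism: missing completely at random.
        resp = [i for i in second if rng.random() < response_rate]
        if len(resp) < 2:
            continue  # too few respondents to fit a slope
        used += 1

        xr = [x[i] for i in resp]
        yr = [y[i] for i in resp]
        xbar_r, ybar_r = statistics.fmean(xr), statistics.fmean(yr)
        sxx = sum((a - xbar_r) ** 2 for a in xr)
        sxy = sum((a - xbar_r) * (b - ybar_r) for a, b in zip(xr, yr))
        slope = sxy / sxx

        resp_set = set(resp)
        # Mean imputation: fill every missing y with the respondent mean.
        mean_filled = [y[i] if i in resp_set else ybar_r for i in first]
        # Regression imputation: fill with the fitted value at the unit's x.
        reg_filled = [
            y[i] if i in resp_set else ybar_r + slope * (x[i] - xbar_r)
            for i in first
        ]
        sq_err["mean"] += (statistics.fmean(mean_filled) - true_mean) ** 2
        sq_err["regression"] += (statistics.fmean(reg_filled) - true_mean) ** 2

    mse = {k: v / used for k, v in sq_err.items()}
    # PRE of regression imputation relative to mean imputation.
    pre = 100 * mse["mean"] / mse["regression"]
    return mse, pre


if __name__ == "__main__":
    mse, pre = simulate()
    print("MSE:", mse)
    print("PRE of regression vs. mean imputation:", round(pre, 1))
```

A percentage relative efficiency above 100 indicates that the second method has a smaller simulated mean squared error than the first. In this sketch the outcome depends heavily on how strongly the assumed x and y are correlated.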

Keywords

two-phase sampling, missing data, imputation, simulation, efficiency

