In this article, we develop finite sample inference based on multiply imputed synthetic data generated under the multiple linear regression model. We consider two methods of generating the synthetic data, namely posterior predictive sampling and plug-in sampling. Simulation results are presented to confirm that the proposed methodology performs as the theory predicts and to numerically compare the proposed methodology with the current state-of-the-art procedures for analysing multiply imputed partially synthetic data.
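The two generation schemes named above can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: it assumes only the response vector is synthesized while the design matrix `X` is released unchanged, it assumes the standard noninformative prior proportional to 1/σ² for the posterior predictive draw, and the function names (`fit_ols`, `plug_in_sample`, `posterior_predictive_sample`) are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit: returns beta_hat, residual sum of squares, (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ y)
    rss = float(np.sum((y - X @ beta_hat) ** 2))
    return beta_hat, rss, XtX_inv


def plug_in_sample(X, y, m, rng):
    """Plug-in sampling: draw m synthetic responses from N(X beta_hat, s^2 I),
    with the point estimates treated as if they were the true parameters."""
    n, p = X.shape
    beta_hat, rss, _ = fit_ols(X, y)
    s2 = rss / (n - p)  # unbiased estimate of the error variance
    return [X @ beta_hat + np.sqrt(s2) * rng.standard_normal(n) for _ in range(m)]


def posterior_predictive_sample(X, y, m, rng):
    """Posterior predictive sampling (assumed prior: proportional to 1/sigma^2).
    Each synthetic response uses a fresh parameter draw from the posterior."""
    n, p = X.shape
    beta_hat, rss, XtX_inv = fit_ols(X, y)
    synthetic = []
    for _ in range(m):
        # sigma^2 | y  ~  RSS / chi-square(n - p)
        sigma2 = rss / rng.chisquare(n - p)
        # beta | sigma^2, y  ~  N(beta_hat, sigma^2 (X'X)^{-1}), drawn via Cholesky
        chol = np.linalg.cholesky(sigma2 * XtX_inv)
        beta = beta_hat + chol @ rng.standard_normal(p)
        # synthetic response given the drawn parameters
        synthetic.append(X @ beta + np.sqrt(sigma2) * rng.standard_normal(n))
    return synthetic
```

The only difference between the two schemes in this sketch is whether the regression parameters are fixed at their estimates (plug-in) or redrawn from the posterior for each of the `m` synthetic data sets (posterior predictive). That extra parameter variability is what any downstream inference procedure has to account for.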