REDI for Binned Data: A Random Empirical Distribution Imputation Method for Estimating Continuous Incomes

Abstract

Researchers often need to work with categorical income data. The nonparametric (including midpoint) and parametric methods typically used to estimate summary statistics from such data each have advantages, but both carry assumptions that cause them to deviate in important ways from real-world income distributions. The method introduced here, random empirical distribution imputation (REDI), imputes discrete observations from binned income data while also calculating summary statistics. REDI achieves this through random cold-deck imputation from a real-world reference data set (demonstrated here using the Current Population Survey Annual Social and Economic Supplement). The method can be used to reconcile bins between data sets or across years and to handle top incomes. REDI has further advantages: its estimates of the income distribution are nonparametric, bin consistent, area and variance preserving, and continuous, and the method is computationally fast. The author provides proof of concept using two years of the American Community Survey. The method is available as the redi command for Stata.
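
The core idea of random cold-deck imputation within bins can be illustrated with a minimal Python sketch. This is not the redi Stata command and does not reproduce its options. It assumes bins are half-open intervals with an optional open-ended top bin, and it ignores survey weights. The function name `impute_from_reference`, the bin labels, and all income values are hypothetical and chosen only for illustration.

```python
import random
from bisect import bisect_left


def impute_from_reference(bins, respondent_bins, reference_incomes, seed=0):
    """Draw one continuous income per respondent from reference incomes in the same bin.

    bins: dict mapping a bin label to (lower, upper); upper=None marks an
          open-ended top bin. Intervals are treated as [lower, upper).
    respondent_bins: list of bin labels, one per respondent (the binned data).
    reference_incomes: continuous incomes from an external reference sample
          (an assumption here; survey weights are ignored in this sketch).
    """
    rng = random.Random(seed)
    ref = sorted(reference_incomes)
    imputed = []
    for label in respondent_bins:
        lower, upper = bins[label]
        # Locate the slice of reference incomes that falls inside this bin.
        start = bisect_left(ref, lower)
        stop = len(ref) if upper is None else bisect_left(ref, upper)
        donors = ref[start:stop]
        if not donors:
            raise ValueError(f"no reference incomes fall in bin {label!r}")
        # Cold-deck step: the donor value comes from the external reference data.
        imputed.append(rng.choice(donors))
    return imputed


# Hypothetical bins, reference sample, and binned responses.
bins = {"low": (0, 25000), "middle": (25000, 75000), "top": (75000, None)}
reference_incomes = [8000, 12500, 21000, 31000, 47000,
                     52000, 68000, 90000, 140000, 310000]
respondent_bins = ["low", "middle", "middle", "top", "low"]

imputed = impute_from_reference(bins, respondent_bins, reference_incomes, seed=42)
print(imputed)
```

Because every draw comes from reference incomes inside the respondent's own bin, each imputed value stays consistent with the reported bin, and the open-ended top bin takes its values from observed top incomes in the reference sample rather than from an assumed parametric tail.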

Keywords

imputation, distribution free, income brackets, grouped data, inequality, interval censored, top-coded
