Effect of Proportion of Missing Data on Application of Data Imputation in Pavement Management Systems

Abstract

Missing data are common in pavement condition–performance databases. A common practice today is to apply statistical imputation methods to replace the missing data with imputed values. Pavement management decision makers must know the uncertainty and errors involved in using data sets that contain imputed values. Equally important in practice is the maximum allowable proportion of missing data (i.e., the level of missing data) at which analysis of the imputed data still yields results with an acceptable magnitude of error or risk. This paper proposes a procedure for determining this information. A numerical example analyzing pavement roughness data is presented to demonstrate the procedure by evaluating the error and reliability characteristics of imputed data. The roughness data of three road sections were obtained from the Long-Term Pavement Performance database. From these data records, data sets with different proportions of missing data were randomly generated to study the effect of the level of missing data. The analysis shows that the errors of imputed data tend to increase with the level of missing data and that their magnitudes are significantly influenced by pavement rehabilitation. On the application of data imputation in pavement management systems, the study suggests that, at a 95% confidence level, 25% missing data appears to be a reasonable maximum allowable limit for analyzing time series data on pavement roughness that include no rehabilitation within the analysis period. When pavement rehabilitation occurs within the analysis period, the maximum proportion of imputed data should be limited to 15%.
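The experiment outlined in the abstract (randomly removing a chosen proportion of records, imputing them, and summarizing the resulting error) can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure used in the paper: the roughness series is synthetic, linear interpolation stands in for whatever imputation method the study applied, and the function names, the number of trials, and the use of the 95th percentile of trial errors as a reliability summary are all hypothetical choices.

```python
import random


def mask_random(series, proportion, rng):
    """Return a copy of `series` with about `proportion` of its points set to None.

    Endpoints are always kept so that interpolation is defined (an assumption
    of this sketch, not a detail taken from the paper).
    """
    n = len(series)
    candidates = list(range(1, n - 1))
    k = min(round(proportion * n), len(candidates))
    missing = set(rng.sample(candidates, k))
    return [None if i in missing else v for i, v in enumerate(series)]


def impute_linear(series):
    """Fill None gaps by linear interpolation between observed neighbors.

    Stand-in imputation method chosen only for simplicity.
    """
    filled = list(series)
    observed = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(observed, observed[1:]):
        for i in range(left + 1, right):
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled


def mean_abs_pct_error(true, masked, imputed):
    """Mean absolute percentage error over the imputed positions only."""
    errs = [
        abs(imputed[i] - true[i]) / true[i] * 100.0
        for i, v in enumerate(masked)
        if v is None
    ]
    return sum(errs) / len(errs) if errs else 0.0


def error_by_missing_level(series, levels, trials=200, seed=0):
    """For each missing-data level, return (mean error, 95th-percentile error)."""
    rng = random.Random(seed)
    summary = {}
    for p in levels:
        errs = []
        for _ in range(trials):
            masked = mask_random(series, p, rng)
            errs.append(mean_abs_pct_error(series, masked, impute_linear(masked)))
        errs.sort()
        p95 = errs[min(len(errs) - 1, int(0.95 * len(errs)))]
        summary[p] = (sum(errs) / len(errs), p95)
    return summary


# Synthetic roughness-like series (hypothetical values, not LTPP data).
iri = [1.00 + 0.05 * t + 0.01 * ((t * 7) % 5) for t in range(20)]
summary = error_by_missing_level(iri, [0.05, 0.15, 0.25, 0.35])
for level, (mean_err, p95_err) in summary.items():
    print(f"{level:.0%} missing: mean error {mean_err:.2f}%, 95th pct {p95_err:.2f}%")
```

Comparing the 95th-percentile error at each level against an acceptable error threshold is one simple way to read off a maximum allowable proportion of missing data. A series that includes a rehabilitation event (a sudden drop in roughness) could be substituted for `iri` to see how a discontinuity affects the errors.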

