Abstract
Missing data are common in pavement condition–performance databases. A common practice today is to replace the missing data with values produced by statistical imputation methods. Pavement management decision makers need to know the uncertainty and errors introduced when data sets containing imputed values are used in their analysis. Equally important in practice is the maximum allowable proportion of missing data (i.e., the level of missing data) that still produces results with an acceptable magnitude of error or risk when the imputed data are used. This paper proposes a procedure for determining this information. A numerical example analyzing pavement roughness data demonstrates the procedure by evaluating the error and reliability characteristics of imputed data. The roughness data of three road sections were obtained from the Long-Term Pavement Performance database. From these data records, data sets with different proportions of missing data were randomly generated to study the effect of the level of missing data. The analysis shows that the errors of imputed data tend to increase with the level of missing data and that their magnitudes are significantly influenced by pavement rehabilitation. For the application of data imputation in pavement management systems, the study suggests that, at a 95% confidence level, a missing-data level of 25% appears to be a reasonable maximum allowable limit for analyzing time series data on pavement roughness that include no rehabilitation within the analysis period. When pavement rehabilitation occurs within the analysis period, the maximum proportion of imputed data should be limited to 15%.
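The general idea of the procedure (randomly remove a given proportion of records, impute them, and examine the resulting errors) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not name the imputation method or the error measure, so linear interpolation and the absolute relative error are assumptions made here for illustration, and the 95th percentile of the errors is used only as a rough stand-in for the 95% confidence level. The function names `mask_series`, `impute_linear`, and `error_quantile` are hypothetical.

```python
import random


def mask_series(values, missing_level, rng):
    """Randomly hide a proportion of interior points.

    The first and last points are always kept so that interpolation
    (the assumed imputation method) is defined for every hidden point.
    """
    n = len(values)
    k = min(round(missing_level * n), n - 2)
    hidden = set(rng.sample(range(1, n - 1), k))
    masked = [None if i in hidden else v for i, v in enumerate(values)]
    return masked, hidden


def impute_linear(masked):
    """Fill each None by linear interpolation between observed neighbours."""
    filled = list(masked)
    known = [i for i, v in enumerate(masked) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            w = (i - left) / (right - left)
            filled[i] = masked[left] + w * (masked[right] - masked[left])
    return filled


def error_quantile(values, missing_level, trials=1000, q=0.95, seed=0):
    """Empirical q-quantile of absolute relative imputation error.

    Assumes all true values are nonzero (true for roughness indices).
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        masked, hidden = mask_series(values, missing_level, rng)
        filled = impute_linear(masked)
        errors.extend(abs(filled[i] - values[i]) / abs(values[i]) for i in hidden)
    if not errors:
        return 0.0
    errors.sort()
    return errors[min(len(errors) - 1, int(q * len(errors)))]


if __name__ == "__main__":
    # Made-up roughness readings (m/km) for illustration only; not LTPP data.
    roughness = [1.10, 1.14, 1.19, 1.22, 1.30, 1.33, 1.41, 1.47,
                 1.52, 1.60, 1.66, 1.75, 1.81, 1.90, 1.98, 2.07]
    for level in (0.05, 0.15, 0.25, 0.35):
        print(f"missing level {level:.0%}: "
              f"95th-percentile error = {error_quantile(roughness, level):.4f}")
```

Running the sketch for several missing levels and comparing each error quantile against an acceptable threshold gives a simple way to read off a maximum allowable level. A series containing a rehabilitation event (a sudden drop in roughness) would need to be tested separately, since the abstract reports that rehabilitation significantly affects the error magnitudes.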
