Abstract
This study develops and compares models for predicting the International Roughness Index (IRI) of 419 Long-Term Pavement Performance (LTPP) pavement sections from seven states in the United States. The IRI is a crucial metric for evaluating pavement roughness, ranging from very smooth to very rough surfaces. Machine learning models have gained popularity for predicting IRI because they can analyze large volumes of data, improve accuracy, and provide cost-effective solutions for pavement management and maintenance. The developed models are a generalized linear model (GLM), support vector machine regression, multivariate adaptive regression splines, an artificial neural network (ANN), and extreme gradient boosting (XGBoost). The XGBoost model outperformed the other models, with the lowest root mean square error and the highest R2. The GLM performed well for lower values of the roughness index but underpredicted higher values. After regularization and feature selection, the input variables common to all models were age, structural number, deflection at an offset of 1.5 times the pavement thickness, traffic load, precipitation, and maintenance and rehabilitation history. With these models, pavement engineers can make informed decisions, allocate resources efficiently, and prioritize maintenance activities based on accurate predictions of pavement roughness.
Introduction
Road roughness is a critical indicator of road condition and is vital for informed road network management decisions. The international roughness index (IRI), introduced by the World Bank in 1982, has since become a prevalent metric for assessing pavement roughness. This metric yields comparable values across diverse scenarios involving different vehicles, or similar vehicles during distinct data collection periods ( 1 ). Because roughness can be measured by many different methods, standardized indices are needed to minimize discrepancies and enhance accuracy. The IRI, measured in units such as meters per kilometer (m/km) or inches per mile (in./mi), offers a stable scale for assessing road roughness. Derived from response-type road roughness measurement systems, the IRI employs a quarter-car simulation to compute the characteristic longitudinal profile of a traveled wheel track ( 2 – 4 ). This approach ensures both time stability and transportability: it accommodates various profilometric methods and yields measurements closely aligned with those from response-type road roughness systems and with subjective opinions, resulting in a versatile and reliable road roughness assessment scale. Because of its transferability across the world and its stability over time, the IRI has been widely used in many countries to measure roughness ( 5 – 8 ).
The IRI ranges from 0 m/km for very smooth pavements to as much as 8 m/km for very rough and deteriorated pavements in some countries ( 9 ). There are several methods for collecting IRI data, including manual methods and automated methods using laser, inertial, or accelerometer-based sensors. Automated methods are generally faster and more efficient, covering larger sections of road in less time, while manual methods are typically more accurate and precise. Recently, many advanced camera-based techniques have been introduced for collecting IRI and pavement defect data ( 10 , 11 ).
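As a quick arithmetic check on the two unit systems mentioned above, the following sketch converts IRI values between m/km and in./mi. The function names are illustrative and are not part of the study; the conversion factor follows from 1 mi = 63,360 in. and 1 km = 1,000 m.

```python
# 1 m/km is a slope of 0.001; over one mile (63,360 in.) that is 63.36 in.
IN_PER_MI_PER_M_PER_KM = 63.36


def m_per_km_to_in_per_mi(iri_m_per_km: float) -> float:
    """Convert an IRI value from m/km to in./mi."""
    return iri_m_per_km * IN_PER_MI_PER_M_PER_KM


def in_per_mi_to_m_per_km(iri_in_per_mi: float) -> float:
    """Convert an IRI value from in./mi to m/km."""
    return iri_in_per_mi / IN_PER_MI_PER_M_PER_KM
```

Under this conversion, an IRI of 8 m/km corresponds to roughly 507 in./mi.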
Many transportation agencies use this roughness index as a quality assurance criterion after construction and as an index indicating the need for maintenance and rehabilitation (M&R) to extend a pavement’s service life ( 12 ). At the network level, managers should know the current condition and have a reasonable estimation of a pavement’s future status. IRI is a commonly used performance metric to represent pavement performance at both network and project levels ( 13 ). Researchers have suggested methods for using IRI for evaluating pavement condition and impact and cost analysis of M&R strategies ( 14 , 15 ).
The Mechanistic-Empirical Pavement Design Guide (MEPDG) uses the IRI as the criterion for pavement design ( 16 ). The MEPDG models ( 16 ) are well-known models that linearly fit the IRI to the independent variables, including pavement distresses and structural factors. However, the MEPDG models need to be calibrated to ensure that the predicted performance is accurate and reliable. The calibrated model is then used to predict the performance of new pavement designs ( 17 – 19 ). Because of the IRI’s importance as a pavement performance criterion and its applicability in pavement design and management, IRI modeling and the prediction of future roughness degradation have been a topic of interest in many studies ( 20 ). Using machine learning models to predict IRI has several benefits, including improved accuracy, time savings, and cost-effectiveness. Machine learning models can process large volumes of data, leading to more accurate predictions of roughness. By analyzing data more efficiently, machine learning models can help reduce the need for costly field tests and surveys and provide pavement managers with a good estimate of current and future pavement roughness. These models improve prediction accuracy and support efficient resource allocation.
This study compares machine learning models in the prediction of IRI and introduces accurate models in IRI predictions, providing better understanding of which factors and variables are most important, allowing for model combinations and improvement, and enabling customized model selection. The remainder of this paper is organized as follows: we first provide background and related work; next we overview the machine learning methods considered; we then describe the data sources, exploratory analysis, and preprocessing; afterward we detail model development and training; we then present the results and discussion; we summarize key findings and outline future research; and finally we provide a conclusion.
Background
An IRI prediction model can estimate the roughness index with high accuracy when given the structural number (SN), road class, climate condition, traffic load, and subgrade and structural information. Table 1 summarizes selected studies that implemented machine learning techniques, their independent variables, and the related statistical measurements. Lin et al. ( 21 ) used neural networks to predict flexible pavement IRI from pavement distresses using 125 flexible pavement sections from the Taiwan pavement management system. This model reported a goodness of fit of 0.94. Choi et al. ( 22 ) applied a neural network for predicting the flexible pavement IRI using pavement properties, including SN, asphalt concrete (AC) thickness and material properties, and traffic load. The model used 117 sections of the long-term pavement performance (LTPP) database and showed a coefficient of determination (R2) of 0.71. Chandra et al. ( 23 ) compared the performance of linear regression, nonlinear regression, and artificial neural network (ANN) models in predicting the IRI from pavement distresses. The ANN model performed better than the linear and nonlinear regression models. The goodness of fit values were reported as R2 and mean square error, which were 0.86 and 0.22 for the training trial and 0.76 and 0.43 for the testing trial.
Table 1. Summary of Studies Applying Machine Learning Techniques for IRI Prediction
Note: ANN = artificial neural network; L = linear; NL = nonlinear; RBF = radial basis function; SVM = support vector machine; ANFIS = adaptive network-based fuzzy inference system; IRI0 = initial international roughness index; SN = structural number; AC = asphalt concrete properties; Base = base layer properties; Sub = subgrade properties; N = number of observations; PMS = pavement management system; LTPP = long-term pavement performance; GMDH = group method of data handling; GPR = ground penetrating radar.
Ziari et al. ( 27 – 29 ) used a complex polynomial model named Group Method of Data Handling (GMDH) to predict the flexible pavement roughness in short and long terms and compared the polynomial model results with a deep neural network model. The independent variables include age, traffic load, annual average precipitation and temperature, AC layer thickness, and the total thickness of pavement. The ANN model shows accuracy of higher than 90% in relation to R2, and the GMDH model shows a performance between 80% and 90%. The ANN model was used in several other studies as a strong tool for IRI prediction ( 24 , 30 , 32 ). Other machine learning methods that were used for the prediction of IRI include random forest regression ( 31 , 33 ), fuzzy and gray model ( 36 ), radial basis function (RBF) networks, CatBoost regression ( 35 ), and support vector machine (SVM) ( 26 ). Kaloop et al. ( 1 ) developed precise prediction models for IRI in pavement performance using Gaussian process regression and locally weighted polynomials. Results showed that these two models can achieve correlation coefficients above 0.9. These models, developed using more than 900 IRI measurements, provide promising IRI predictions for flexible pavements.
The pavement structural integrity can be characterized by SN. The SN is an important parameter in predicting pavement roughness because it is a measure of the thickness and stiffness of the pavement layers. When the pavement layers are not thick enough or stiff enough, they will experience excessive deformation, which can lead to roughness, cracking, and other forms of distress. The intrinsic complexity of the pavement properties and parameters which affect the pavement behavior during the pavement lifetime asserts the need to try different machine learning predictive models. The performance of each model and its accuracy vary based on several factors related to the dataset, including the type of the features, distribution, and correlation between variables, amount and severity of the outliers, and number of available observations. In this study, generalized linear (GLMnet), support vector regression (SVM), multivariate adaptive regression splines (MARS), ANN, and XGBoost models were investigated and used for predicting the IRI of the flexible pavements.
While literature has demonstrated the effectiveness of various machine learning techniques in predicting pavement roughness (IRI), there are certain limitations that need to be acknowledged. For instance, the existing models often focus on a limited set of independent variables. However, this paper considers a wide range of variables, including SN, climate conditions, and traffic load. In addition, some models may exhibit performance variations when applied to different datasets, especially those with varying distributions and levels of outliers. So, this paper emphasizes using a large dataset to cover a wide range of characteristics.
Furthermore, existing research on IRI prediction has often neglected the impact of structural integrity factors, likely as a result of the complexities and higher costs involved in obtaining SN information compared with acquiring IRI measurements. However, acknowledging the crucial importance of examining the influence of SN factors on IRI prediction is essential to address this research gap. Such examination deepens the understanding of the contribution of structural integrity to pavement roughness, aiding in the development of more precise predictive models for road condition assessment and maintenance strategy optimization. Employing this approach in pavement studies presents a unique and valuable contribution to the field, potentially leading to enhanced pavement design, more effective pavement management, and, consequently, safer and smoother roads for all users.
This study highlights that structural integrity parameters significantly affect the accuracy of IRI prediction, with their exclusion negatively affecting model performance. The concept of utilizing SN measurements to predict pavement IRI in the absence of concurrent IRI data is introduced, thereby broadening the scope of these models. Furthermore, technological advancements, such as the rolling wheel deflectometer, are expected to simplify and accelerate the collection of structural integrity data, enhancing data acquisition efforts. The stability of falling weight deflectometer (FWD) data, which evolves more slowly than the rapidly fluctuating IRI values, contributes to the accuracy of these models, particularly in predicting future pavement roughness. The present study derives SN from FWD measurements, providing an SN that reflects the actual in-situ structural condition of the pavement. This approach captures changes in structural capacity over time, whereas previous studies may have used design SN values.
Notably, the models developed, especially the XGBoost model, demonstrate an advantage in handling missing parameters during prediction, ensuring their robustness and wider applicability. These insights contribute new perspectives that broaden the scope of pavement roughness prediction and evaluation. By demonstrating the significant impact of structural parameters on model accuracy, our research provides valuable insights into the factors influencing pavement roughness and offers practical tools for pavement management systems.
SN
SN is a measure of the total structural capacity of a pavement section and is calculated from the thickness and strength of each layer. It is an index that represents the overall strength of the pavement structure and its load-carrying capacity ( 16 ). Equation 1 defines this index as the sum of the pavement layer thicknesses, each multiplied by a coefficient that reflects the layer’s contribution to the pavement’s structural strength. These coefficients were developed during the AASHO road test, and SN has been used as a pavement design parameter in the AASHTO design guide.

$$SN = a_1 D_1 + a_2 D_2 m_2 + a_3 D_3 m_3 \tag{1}$$

where a_i is the layer coefficient of layer i, D_i is the thickness of layer i, and m_i is the drainage coefficient of layer i.

SN can also be estimated from falling weight deflectometer (FWD) deflections through the structural index of pavement (SIP), defined as

$$SIP = D_0 - D_{1.5H_p} \tag{2}$$

where D0 is the deflection under a 9,000 lb FWD load, D1.5Hp is the surface deflection at an offset of 1.5 times Hp, and Hp is the total pavement thickness. Rohde ( 38 ) asserted that the SIP is highly related to the stiffness of the pavement and, consequently, to its SN. The SN of the pavement can therefore be estimated from the total pavement thickness and the SIP using

$$SN = k_1 \, SIP^{k_2} \, H_p^{k_3} \tag{3}$$

where SN is the structural number of the pavement; SIP is the structural index of pavement (microns); Hp is the total pavement thickness (mm); and k1, k2, and k3 are regression coefficients equal to 0.4728, -0.4810, and 0.7581, respectively.
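The FWD-based estimate described above can be sketched in a few lines. This is a minimal illustration that assumes the power-law form SN = k1 · SIP^k2 · Hp^k3 implied by the three regression coefficients, with SIP taken as the difference between D0 and D1.5Hp; the function names and the example deflections and thickness are hypothetical and are not taken from the study's data.

```python
# Regression coefficients quoted in the text.
K1, K2, K3 = 0.4728, -0.4810, 0.7581


def structural_index(d0_microns: float, d15hp_microns: float) -> float:
    """SIP: deflection under the load minus deflection at an offset of 1.5*Hp."""
    return d0_microns - d15hp_microns


def structural_number(sip_microns: float, hp_mm: float) -> float:
    """Estimate SN from SIP (microns) and total pavement thickness Hp (mm)."""
    if sip_microns <= 0 or hp_mm <= 0:
        raise ValueError("SIP and Hp must be positive")
    return K1 * sip_microns ** K2 * hp_mm ** K3


# Hypothetical FWD reading: 400 microns under the load, 150 microns at 1.5*Hp.
sip = structural_index(400.0, 150.0)
sn = structural_number(sip, hp_mm=400.0)
```

Because k2 is negative, a larger SIP (a more flexible pavement) gives a lower SN for the same thickness, which matches the interpretation of SIP as a stiffness indicator.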
Note that the FWD deflection data were corrected to a reference temperature of 68°F to standardize the measurements. This adjustment accounts for the significant effect of temperature on deflections, so that comparisons are consistent regardless of the measurement conditions. Figure 1 presents the effect of temperature adjustment on the average SN calculated from FWD data. For FWD tests conducted at temperatures higher than the reference temperature, the adjustment factor is greater than 1, which means the adjusted SNs are higher after the effect of temperature is eliminated. Conversely, for tests at lower temperatures, the calculated SN values are decreased to adjust them to the reference temperature.

Figure 1. Effect of temperature adjustment on calculated SN from FWD data.
Overview of the Machine Learning Methods
The intrinsic complexity of the pavement properties and parameters that affect the pavement behavior during the pavement lifetime asserts the need to try machine learning prediction models and compare each model’s performance in predicting the IRI. Each model’s performance and accuracy vary based on several factors related to the dataset, including the type of features, distribution, and correlation between variables, amount and severity of the outliers, and number of available observations. The tuning parameters, the type of loss and kernel functions, and validation techniques need to be studied and investigated before applying the machine learning technique. In the following, selected machine learning techniques used in predicting the IRI are explained.
GLM
The GLM extends the concept of linear regression. Linear regression assumes that the response changes at a constant rate with each predictor variable, which is not always applicable. For example, in pavement performance prediction, performance changes slowly early in the pavement life, whereas the rate of deterioration increases as the pavement ages. To address this, the GLM introduces a link function that relates the linear model to the response variable, while allowing the variance to depend on the expected value. Regularization techniques such as lasso and ridge regression enhance regression models by curbing overfitting ( 39 ), and the GLM elastic net (GLMnet) combines these techniques to better handle collinearity and improve performance.
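To make the combination of the two penalties concrete, the sketch below evaluates an elastic-net objective for a linear model with an identity link. It assumes the common glmnet-style parameterization, in which a mixing parameter `alpha` blends the lasso (L1) and ridge (L2) penalties and `lam` sets the overall penalty strength; the function and parameter names are illustrative, and the study's actual settings are not reproduced here.

```python
def elastic_net_penalty(coefs, lam, alpha):
    """lam * (alpha * L1 + (1 - alpha) / 2 * L2^2); alpha=1 is lasso, alpha=0 is ridge."""
    l1 = sum(abs(b) for b in coefs)
    l2_squared = sum(b * b for b in coefs)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)


def elastic_net_objective(X, y, intercept, coefs, lam, alpha):
    """Half mean squared error of the linear fit plus the elastic-net penalty."""
    n = len(y)
    sse = 0.0
    for row, target in zip(X, y):
        prediction = intercept + sum(b * x for b, x in zip(coefs, row))
        sse += (target - prediction) ** 2
    return sse / (2.0 * n) + elastic_net_penalty(coefs, lam, alpha)
```

The intercept is left unpenalized, which is the usual convention; tuning `lam` and `alpha` by cross-validation is what trades off sparsity against shrinkage of correlated predictors.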
SVM Regression
SVM is primarily used for classification but also proves effective as a regressor ( 40 ). SVM overcomes some of linear regression’s limitations by introducing kernels that transform features for more flexible predictions. Diverse kernel functions ( 40 ), including linear and nonlinear kernels such as polynomial, RBF, and sigmoid, can be evaluated using cross-validation to enhance model performance.
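For reference, the kernel families named above can be written as plain functions of two feature vectors. This is a sketch of the standard textbook definitions only; the hyperparameter names (`gamma`, `coef0`, `degree`) and their default values are illustrative assumptions and do not reflect the settings used in the study.

```python
import math


def _dot(x, z):
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    return _dot(x, z)


def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * _dot(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    # Depends only on the squared Euclidean distance between x and z.
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * _dot(x, z) + coef0)
```

In a cross-validated search, the kernel type is treated as one more hyperparameter alongside its own parameters.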
MARS
MARS automatically incorporates nonlinearity and interactions between features. Acting as a piecewise model, MARS captures both nonlinear trends and feature interactions. The model breaks the range of each independent variable, X, into n bins and fits the best line to each bin. Figure 2 shows a schematic of the MARS model fitted to nonlinear data. In this example, the MARS model consists of two knots that break the independent variable into three bins. A linear model is fitted to each bin, which reduces the total error compared with a single linear regression model. The model fitted to each bin can be linear or multivariate nonlinear. In this way, MARS divides continuous variables into segments that together approximate a nonlinear function. Despite its merits, MARS may struggle with missing data and is prone to overfitting, which poses certain limitations.

Figure 2. Schematic of MARS versus linear regression fitted models.
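The two-knot example described above can be expressed with hinge functions, the basis functions that MARS models are built from. The sketch below only evaluates a piecewise-linear model with fixed knots and coefficients; the knot locations and coefficients are made up for illustration, and a real MARS fit would select them from the data through forward and backward passes.

```python
def hinge(x, knot, direction=1):
    """max(0, x - knot) for direction=+1, or max(0, knot - x) for direction=-1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate intercept + sum(coef * hinge(x, knot)) for (coef, knot) terms."""
    return intercept + sum(coef * hinge(x, knot) for coef, knot in terms)


# Hypothetical model with knots at x = 2 and x = 5, giving three bins:
# slope 0 for x < 2, slope 0.5 for 2 <= x < 5, and slope 2.0 for x >= 5.
INTERCEPT = 1.0
TERMS = [(0.5, 2.0), (1.5, 5.0)]

predictions = [mars_predict(x, INTERCEPT, TERMS) for x in (0.0, 4.0, 7.0)]
```

Each additional hinge term changes the slope at its knot, so the fitted curve stays continuous while following the nonlinear trend.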
Neural Network Model
The ANN model consists of simple elements named neurons, which together perform a mathematical process that captures the interaction of the features and passes the results through transfer functions to increase the prediction accuracy. A linear combination of features is passed through a nonlinear transformation in successive layers. When a new layer is added to the model, the outputs of the neurons of the previous layer become the input features for the next layer. Adding more layers increases the level of feature interaction, which usually increases the accuracy of the prediction. On the other hand, increasing the neural network model’s complexity can result in overfitting and an increase in the variance term of the error. The parameters that should be optimized for the neural network model include the number of layers, the number of neurons in each layer, the learning rate, and the activation function type.
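The layer-by-layer computation described above can be illustrated with a minimal forward pass. This sketch assumes a ReLU activation on the hidden layers and a linear output neuron, which is a common choice for regression; the weights and inputs are arbitrary, and training (learning the weights) is not shown.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One layer: each neuron takes a linear combination of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(inputs, layers):
    """Pass inputs through successive layers; the last layer is left linear."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        z = dense(activations, weights, biases)
        is_output_layer = index == len(layers) - 1
        activations = z if is_output_layer else relu(z)
    return activations


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[2.0, 1.0]], [0.1]),
]
output = forward([1.0, 2.0], layers)
```

Here the first hidden neuron receives a negative pre-activation and is switched off by the ReLU, so only the second hidden neuron contributes to the output.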
XGBoost Tree
XGBoost is a powerful tree-based algorithm well suited for structured and tabular data ( 41 ). Its inherent structure aligns with the nature of pavement prediction features because of its ability to capture high nonlinearity between the independent variables. XGBoost employs decision trees, bagging, and gradient boosting to handle complexities in pavement prediction. The model’s ability to manage missing data, parallelization, regularization, and cross-validation enhances its predictive capabilities.
The XGBoost model’s flexibility, robustness, and computational efficiency make it a strong choice for pavement performance prediction, especially in scenarios with complex or missing data. The XGBoost model parameters that should be optimized include the maximum depth of the trees, the total number of trees to grow, the number of variables used in each tree, and the minimum number of samples in a leaf, among others.
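The boosting idea behind such models can be sketched with depth-one trees (stumps) and a squared-error loss. This is plain gradient boosting written for illustration, not the XGBoost library: it omits XGBoost's regularization, second-order gradient information, subsampling, and missing-value handling. The toy age and IRI values are invented.

```python
def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) with the lowest squared error."""
    overall = sum(residuals) / len(residuals)
    best_sse = sum((r - overall) ** 2 for r in residuals)
    best = (0, float("inf"), overall, overall)  # fallback: no useful split
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if sse < best_sse:
                best_sse, best = sse, (j, thr, lmean, rmean)
    return best


def predict_stump(stump, row):
    j, thr, lmean, rmean = stump
    return lmean if row[j] <= thr else rmean


def fit_boosting(X, y, n_trees=50, learning_rate=0.1):
    """Each new stump is fitted to the residuals of the current ensemble."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # For squared loss, the negative gradient is the residual.
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict_boosting(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


# Toy data: IRI growing nonlinearly with pavement age (values are invented).
ages = [[float(a)] for a in range(1, 11)]
iri = [1.0 + 0.02 * a[0] ** 2 for a in ages]
model = fit_boosting(ages, iri)
```

The number of trees, the tree depth (fixed at one here), and the learning rate play the same role as the tuning parameters listed above.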
Data Sources
The main factors affecting the roughness of flexible pavement can be categorized as material properties, structural properties, climate, environmental conditions, and loading volume. The pavement properties were collected and prepared for each section from various sources. The LTPP database ( 42 ) was used as the main source of data for this study. The LTPP program was established to collect pavement performance data as one of the major research areas of the Strategic Highway Research Program. The collected data cover seven modules: inventory, maintenance, monitoring (deflection, distress, and profile), rehabilitation, materials testing, traffic, and climate. For this study, a total of 419 flexible pavement sections from Oklahoma, Texas, Arkansas, Missouri, Kansas, Colorado, and New Mexico were evaluated. Figure 3 shows the location of the selected LTPP sections used for the IRI prediction. These sections vary widely in layer thickness, road functionality, traffic load, climate, and M&R history.

Figure 3. Location of the long-term pavement performance (LTPP) sections. Blue: out-of-study sections. Green: active sections ( 42 ).
Exploratory Data Analysis
In this section, the dataset used for the IRI prediction is explained, and an exploratory data analysis is conducted to obtain detailed information about the dataset, the variables, and the relationships between features. Tables 2 and 3 present the list of derived properties and the sources of data used for developing the IRI prediction models. The numerical features, with their minimum, maximum, mean, and standard deviation values, are presented in Table 2. IRI is the dependent variable, and the rest of the features are the independent variables in the IRI prediction models. The structural properties include the AC thickness, the total thickness of the pavement down to the subgrade level, and features calculated from FWD test results (see the “Background” section) provided by the LTPP database. Average annual daily truck traffic (AADTT) is used as one of the inputs. It represents the average daily number of trucks that pass a particular point on a road over the course of a year. AADTT was derived from the LTPP database for each year for vehicles in FHWA Classes 4–13 combined for the LTPP lane ( 43 ).
Table 2. List of Numerical Pavement Properties Used in Prediction and Maintenance and Rehabilitation Models
Note: Min. = minimum; Max. = maximum; SD = standard deviation; M&R = maintenance and rehabilitation.
Table 3. List of Categorical Pavement Properties Used in Prediction and M&R Models
Note: M&R = maintenance and rehabilitation.
The climate information has been extracted from Modern-Era Retrospective analysis for Research and Applications, which is a climate reanalysis dataset maintained and operated by NASA ( 44 ). The climate features used in this study include the average annual temperature, average annual precipitation, average annual freezing index, and average annual evaporation during the IRI measurement period.
The freezing index is the cumulative number of degree-days when air temperatures are below 32°F. This index is a standard metric for determining the freezing severity during winter season, which triggers thermal distresses in the pavement ( 45 ). The average difference between precipitation and evaporation rate corresponds to the wetting/drying cycles of the subgrade. Subgrade of the pavements with a higher average difference between evaporation and precipitation has a higher settlement/swelling rate, which leads to weakening pavement structure and increasing pavement roughness ( 46 ).
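The freezing index definition above reduces to a simple accumulation of degree-days below freezing. The sketch below assumes that the input is a series of daily mean air temperatures in °F, which is one common way of computing the index; the function name and the sample temperatures are illustrative.

```python
def freezing_index(daily_mean_temps_f, freezing_point_f=32.0):
    """Sum of degree-days below freezing (°F-days) over the given days."""
    return sum(max(0.0, freezing_point_f - t) for t in daily_mean_temps_f)


# Four hypothetical days: only the 30°F and 20°F days contribute.
example_index = freezing_index([40.0, 30.0, 20.0, 32.0])
```

Days at or above 32°F contribute nothing, so the index grows only during freezing periods.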
Figure 4 shows the correlation between the numerical variables in the dataset. When variables are strongly positively or negatively correlated, multicollinearity has a greater effect on model performance; this issue is addressed in the developed algorithms. The size and color of the circles shown in Figure 4 indicate the magnitude of the Pearson correlation and its direction, respectively.

Figure 4. Correlation between numerical variables.
Pavement roughness shows a positive correlation with deflection magnitude, precipitation, evaporation, age, and AADTT. The pavement’s average roughness decreases with increasing asphalt thickness, total thickness, SN, and temperature.
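The Pearson correlation used in this analysis can be computed directly from its definition. The sketch below is a plain-Python illustration; the function name and the sample values are hypothetical and are not drawn from the study's dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical values: roughness rising with age and falling with SN.
age = [2.0, 5.0, 9.0, 14.0]
iri = [1.0, 1.3, 1.9, 2.6]
sn = [5.5, 5.0, 4.1, 3.6]
r_age = pearson(age, iri)
r_sn = pearson(sn, iri)
```

A value near +1 or -1 between two predictors is the situation in which multicollinearity becomes a concern.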
The peak deflection under the load (PD0) and the peak deflection at an offset of 1.5 times the total pavement thickness (PD1.5) were derived from FWD data available in the LTPP database. The FWD test data are available at annual or biannual intervals for 5 to 15 years for each section. SIP and SN were then calculated using Equations 2 and 3 for every available FWD test result. Figure 5 shows the relation between the SN estimated from FWD data and pavement roughness values. Pavement roughness is higher for structurally weaker sections, and including SN among the predictor features can help improve the accuracy of the IRI prediction models.

Figure 5. Relation between the SN values estimated from FWD data and pavement roughness.
Table 3 presents the categorical properties, the levels of each category, and the mean IRI value at each level. The effect of grouping the observations into levels for each variable was tested through statistical mean difference tests. For the two-level variables (base type, plasticity, and road class), a standard two-sample t-test was used; for the M&R variable, which has three levels, a one-way analysis of variance was used. The p-values of the statistical tests for all categorical variables were less than 0.05, indicating a significant difference in mean IRI value between levels.
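For the two-level variables, the test statistic of a standard (pooled-variance) two-sample t-test can be sketched as follows. The function name and the sample values are hypothetical; obtaining a p-value additionally requires the t distribution with the returned degrees of freedom, which is not implemented here.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for the equal-variance two-sample t-test."""
    n_a, n_b = len(group_a), len(group_b)
    dof = n_a + n_b - 2
    # Pooled sample variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / dof
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, dof


# Hypothetical IRI samples (in./mi) for two levels of a categorical variable.
t_stat, dof = pooled_t_statistic([70.0, 78.0, 82.0, 75.0], [88.0, 95.0, 84.0, 91.0])
```

In practice, a library routine such as `scipy.stats.ttest_ind` returns the statistic and the p-value together.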
M&R history is derived from the LTPP section database. Three different construction groups, including pavements with no M&R, pavements with maintenance (e.g., crack sealing, tack coat, fog seal coat), and pavements with rehabilitation (e.g., overlay with AC, mill existing pavement, and overlay with Hot-Mix Asphalt Concrete (HMAC), Reclaimed Asphalt Pavement (RAP), and Warm-Mix Asphalt Concrete (WMAC)) are defined. Maintenance activities focus on regular upkeep and minor repairs to prevent deterioration, while rehabilitation activities involve more extensive interventions to restore the pavement’s structural integrity and functionality after significant distress has occurred. Thus, the age of the pavement will be calculated from the opening traffic date for construction groups 1 and 2, and from the last rehabilitation date for construction group 3. The average IRI for pavements with no M&R is lower than the average IRI for pavements under maintenance. This indicates that pavements requiring maintenance activities tend to have higher roughness levels. Maintenance can slow down the rate of roughness increase but may not significantly reduce roughness levels in the long term. Pavements that underwent rehabilitation have a lower average IRI, suggesting that rehabilitation activities effectively restore pavement smoothness.
Flexible pavements can have either a treated base or a granular base, depending on the design and construction specifications. A treated base in a flexible pavement is typically a layer that is stabilized with additives to improve its engineering properties. Treated bases are commonly used when the underlying soil or natural aggregate base requires improvement to meet the design requirements and provide adequate support to the pavement. A granular base, also known as an unbound base, consists of a layer of well-graded aggregates like crushed stone, gravel, or sand. The granular base distributes traffic loads to the subgrade and provides a stable platform for the pavement layers above. The average IRI for pavements with treated bases is lower compared with pavements with granular bases in our dataset. This observation suggests that, in these cases, treated bases may contribute to smoother pavement surfaces. However, it is important to note that the performance of pavements with unbound granular bases can be comparable to those with treated bases if they are well designed and constructed. Factors such as material quality, construction practices, and environmental conditions play significant roles.
Moreover, the plasticity of the subgrade soil is a crucial consideration in pavement design because it affects the pavement’s load-bearing capacity, shear strength, settlement behavior, resistance to environmental factors (such as frost), and overall long-term performance. The average IRI value is 77.5 in./mi for nonplastic soils and 87.5 in./mi for plastic soils. A plastic subgrade swells more severely and deteriorates the pavement structure.
In addition, road classification is another important feature in developing M&R models. The deterioration of pavement in urban and rural areas can vary because of several factors. Urban roads deteriorate faster as a result of higher traffic volumes, complex designs, and pollution exposure, leading to more frequent maintenance. In contrast, rural roads with lower traffic volumes and more evenly distributed stress undergo slower deterioration but may face limited maintenance resources and vulnerability to natural environmental factors. As a result, the IRI values on these roads can differ: the average IRI value is 82.24 in./mi for pavements on rural roads and 97.69 in./mi for pavements on urban roads.
Data Preprocessing
Most of the numerical variables have nonnormal distributions. In addition, the scales of the numerical variables differ widely. Transforming the independent variables and bringing them to a common scale helps achieve normality of the model residuals. Features with nonnormal and skewed distributions were therefore transformed and normalized to boost the performance of the machine learning algorithms. Figure 6 shows some examples of the raw data distributions and the normalized distributions after applying the Yeo-Johnson power transformation ( 47 ). The Yeo-Johnson transformation is a flexible method that can handle both positive and negative values, transforming the data to approximate a normal distribution. It is defined by a family of power transformations indexed by a parameter λ, which is estimated from the data. This transformation improves the symmetry of the data distribution and can enhance the performance of algorithms that assume normally distributed input variables.

Figure 6. The distribution of the original and normalized numerical variables.
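The Yeo-Johnson family described above has a closed form for a given λ, shown in the sketch below. Only the transformation itself is implemented; estimating λ from the data (typically by maximum likelihood) and the subsequent standardization are not shown, and the function name is illustrative.

```python
import math


def yeo_johnson(y, lam):
    """Yeo-Johnson power transformation of a single value y for parameter lam."""
    if y >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(y)                 # limit as lam -> 0
        return ((y + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-y)                   # limit as lam -> 2
    return -(((1.0 - y) ** (2.0 - lam)) - 1.0) / (2.0 - lam)


# lam = 1 leaves values unchanged; lam < 1 compresses a long right tail.
skewed = [0.5, 1.0, 2.0, 10.0, 50.0]
transformed = [yeo_johnson(v, 0.0) for v in skewed]
```

Unlike the Box-Cox transformation, this family is defined for zero and negative inputs, which is why it suits variables that can take negative values.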
Most machine learning algorithms (e.g., XGBoost, logistic regression, SVM, neural network) require all input variables and output variables to be numeric; thus, the categorical variables need to be encoded. Therefore, the one-hot encoding process was applied to categorical features such as base_type, M&R, road_class, and plasticity to make them act as numerical variables in machine learning models.
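One-hot encoding replaces a categorical column with one 0/1 indicator column per level. The sketch below illustrates this for records stored as dictionaries; the helper name, the `section` field, and the level labels are hypothetical, while `base_type` follows the variable name used in the text.

```python
def one_hot_encode(records, column):
    """Replace `column` in each record with one 0/1 indicator per observed level."""
    levels = sorted({record[column] for record in records})
    encoded = []
    for record in records:
        row = {key: value for key, value in record.items() if key != column}
        for level in levels:
            row[f"{column}_{level}"] = 1 if record[column] == level else 0
        encoded.append(row)
    return encoded


sections = [
    {"section": "A", "base_type": "granular"},
    {"section": "B", "base_type": "treated"},
]
encoded_sections = one_hot_encode(sections, "base_type")
```

For linear models, one indicator per variable is often dropped to avoid perfect collinearity among the indicator columns; tree-based models are generally insensitive to this choice.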
Although the base_type is included as a categorical variable to differentiate between pavements with treated bases and those with granular bases, it is noteworthy to mention that grouping all pavements into a single model may not fully capture the unique deterioration behaviors of each base type. Developing separate models for pavements with treated bases and those with granular bases could potentially improve prediction accuracy by tailoring the models to the specific characteristics and performance patterns of each group.
Because of limitations in our dataset, specifically the sample size for each base type (1,467 observations for granular base and 1,001 observations for treated base), a single model incorporating base_type as a variable was developed.
Model Development
The IRI prediction model aims to predict the IRI change versus time given the properties of the pavement section, including structural features, FWD data, subgrade soil properties, climate, traffic information, and road type. In the following, the models developed for this study are explained. First, the traditional fitting model is presented, followed by the results of the machine learning methods. Five machine learning models were used for the prediction of the IRI. Each machine learning model has parameters that govern and tune the training process. These parameters are named hyperparameters and need to be optimized during the training process. Fivefold cross-validation was used for hyperparameter tuning and model validation. The total dataset of more than 2,500 IRI measurements was randomly divided into training and testing datasets with a ratio of 0.75 and 0.25, respectively. Machine learning models were trained on the training dataset, and their performance was evaluated using the testing dataset. All models were implemented in Python, using the scikit-learn library for GLM, SVM, and MARS; the TensorFlow/Keras framework for ANN; and the XGBoost library for gradient boosting trees. Each section presents the tuning results and the performance of the models on the training and testing datasets, followed by the best model that can be used for IRI prediction.
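The data partitioning described above can be sketched with index lists. This is a generic illustration of a 75/25 random split followed by fivefold cross-validation on the training portion; the function names, the random seed, and the round figure of 2,500 observations are assumptions made for the example, not the study's actual code or sample count.

```python
import random


def train_test_split_indices(n, test_fraction=0.25, seed=0):
    """Shuffle the indices 0..n-1 and split them into (train, test) lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists; each fold is held out exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        validation = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, validation


train_idx, test_idx = train_test_split_indices(2500)
folds = list(k_fold_indices(train_idx, k=5))
```

Hyperparameters are chosen by averaging the validation error over the five folds, and the held-out test set is used only once, for the final performance estimate.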
Database Processing
The pavement roughness in many of the LTPP sections was not reported on a consistent schedule. In addition, the dates of the IRI survey and the FWD test often differ. To reconcile the measured pavement roughness survey date with the visit date for the FWD test, this study followed a method similar to that suggested in the literature ( 48 ). For each section, a regression model was fitted to the measured IRI and pavement age, taking the traffic opening date as the starting date. For pavements that underwent rehabilitation or reconstruction, the IRI model was reset, and a new model was fitted to the reconstructed pavement, with age calculated from the time of reconstruction. Exponential and sigmoidal functions have been used as suitable fitting curves for predicting distresses and material behaviors in pavement studies ( 49 ). The mathematical form of the IRI fitting model used in this study is
where IRI is the international roughness index (m/km), IRI0 is the initial roughness of the pavement, t is the pavement age in years, and α, β, and ρ are the maximum threshold, scale parameter, and shape parameter of the fitting curve, respectively.
Table 4 presents the mean and standard deviation of the three fitting parameters in Equation 4 based on the optimization of gathered data from LTPP sections. In total, 843 curves were generated using 3,325 points, where 1,029 points (254 curves) were before rehab, with the root mean square error (RMSE) of 0.016 m/km and R2 of 0.95. The rest of the data came from after rehab with the RMSE of 0.017 m/km and R2 of 0.91.
Statistics of the Fitting Parameters in Equation 4
Note: IRI = international roughness index.
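For reference, the two goodness-of-fit measures quoted for the fitted curves, RMSE and R2, can be computed as in the short sketch below. The observed and predicted values are hypothetical and serve only to show the calculation.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical IRI values in m/km, not taken from the LTPP data.
observed = [1.00, 1.10, 1.25, 1.45]
predicted = [1.02, 1.08, 1.27, 1.43]
fit_rmse = rmse(observed, predicted)
fit_r2 = r_squared(observed, predicted)
```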
Figure 7 shows IRI change versus time for section 40-0506, construction number 3 fitted by the proposed model.

Fitted model to the IRI change versus time for LTPP section 40-0506-Construction Number 3.
This model was fitted to the experimental sections, and the best fitting parameters were determined for each section. The RMSE and R2 for fitting the model to all experimental sections are 0.02 m/km and 0.93, respectively. These statistics show that the model can be fitted accurately to the IRI change as a function of time. The proposed model can also be fitted closely to the IRI field data for each construction group separately. Figure 8 shows the IRI change versus time and the M&R history for LTPP section 0607, located in the state of Oklahoma. Note that this section is a composite pavement structure in which a jointed concrete pavement is overlaid by AC. This section is not included with the flexible pavements used for model development; it is used solely as an example to demonstrate the parameter fitting process.

Model fit to the measured IRI at LTPP section 0607.
Moreover, Table 5 shows curve parameters for these three curves. By knowing the fitting parameters, the IRI value versus time can be plotted, and the pavement roughness at any given time after the construction can be estimated. The developed fitted model was then used for handling the inconsistency in reported IRI survey date and FWD test date by providing the IRI value at any given date of the pavement age.
Curve Parameters for Each Part of LTPP Section 0607
Note: LTPP = long-term pavement performance; IRI0 = initial international roughness index; RMSE = root mean square error.
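The date alignment step can be illustrated with a short sketch that evaluates a fitted curve at the age corresponding to an FWD test date. Equation 4 is not reproduced in this text, so the functional form below, IRI(t) = IRI0 + α·exp(−(ρ/t)^β), is an assumption based on sigmoidal curves commonly used in pavement studies; the form and parameter roles in the paper's equation may differ. The dates and parameter values are hypothetical.

```python
import math
from datetime import date


def iri_at_age(t_years, iri0, alpha, beta, rho):
    """Assumed sigmoidal form: IRI(t) = IRI0 + alpha * exp(-(rho / t) ** beta)."""
    if t_years <= 0:
        return iri0  # at opening (or reconstruction) the curve starts at IRI0
    return iri0 + alpha * math.exp(-((rho / t_years) ** beta))


def age_in_years(start, when):
    """Pavement age in years between the start date and a later date."""
    return (when - start).days / 365.25


# Hypothetical section: opening date, FWD visit date, and fitted parameters.
opening = date(2000, 6, 1)
fwd_test = date(2008, 9, 15)
params = dict(iri0=0.9, alpha=2.0, beta=1.5, rho=12.0)

iri_on_fwd_date = iri_at_age(age_in_years(opening, fwd_test), **params)
```

For a section with several construction groups, the start date would be the most recent reconstruction date and the parameters would be those fitted to that group.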
GLM
The model was trained and evaluated using fivefold cross-validation. The RMSE was used as an accuracy metric for finding the best tuning parameters. Regularization is a technique used in GLM to prevent overfitting and improve the generalization of the model. It involves adding a penalty term to the loss function of the model, which penalizes large parameter values. This penalty term helps to shrink the values of the parameters toward zero, which reduces the model’s complexity and prevents it from overfitting the training data. Here, α is the hyperparameter that controls the L1 regularization penalty term, also known as Lasso regularization, and λ is the hyperparameter that controls the L2 regularization penalty term, also known as Ridge regularization.
The optimal values of α and λ are typically chosen through techniques such as cross-validation. The candidate sets α = [0, 0.01, 0.1, 0.5, 1] and λ = [1e-4, 5e-4, 1e-3, 1e-2] were searched, and the combination with the lowest RMSE was selected. The GLM model shows the best performance for α = 0.5 and λ = 1e-4. The optimized model shrinks the coefficients of the temperature, road class, and base type features toward zero, thereby reducing the impact of these features on the model output.
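The shrinkage effect can be illustrated with a small coordinate-descent sketch for a linear model with both penalties. It follows the description in the text, with α weighting the L1 term and λ weighting the L2 term; many libraries parameterize the penalty differently, so this is not the study's implementation. The data are synthetic, and the objective written in the docstring is an assumption of this example.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly zero when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def fit_l1_l2(X, y, alpha, lam, n_iter=200):
    """Coordinate descent for
    (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1 + (lam / 2) * ||w||_2^2.
    """
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with the contribution of feature j removed.
            r = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r / n
            z = X[:, j] @ X[:, j] / n
            w[j] = soft_threshold(rho, alpha) / (z + lam)
    return w


# Synthetic data: the first feature drives the response, the second does not.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)

w = fit_l1_l2(X, y, alpha=0.5, lam=1e-4)
```

With these settings the coefficient of the uninformative feature is driven exactly to zero, which mirrors how the optimized GLM reduces the influence of weak predictors.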
SVM Regression
The SVM can effectively model complex, nonlinear relationships between the dependent and independent variables. The SVM model developed for the prediction of IRI uses a radial kernel to capture the nonlinear relation between roughness and pavement features at high IRI values. In SVM, C and ε are hyperparameters that control the trade-off between the margin width and the training error. A larger C value leads to a smaller margin but fewer errors, while a smaller C value leads to a larger margin but more errors. Here, ε determines the width of the insensitive zone around the margin, where no penalty is given for errors, and larger values of ε result in wider insensitive zones. These parameters need to be determined through hyperparameter optimization. The candidate sets C = [0.01, 0.1, 1, 10, 50] and ε = [0.1, 0.5, 1] were searched, and the combination with the lowest RMSE was selected. The SVM model shows the best performance for C = 10 and ε = 1. The SVM model does not perform feature selection; however, it can help prevent overfitting and improve generalization performance, which is important when working with limited data.
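The roles of ε, C, and the radial kernel can be made concrete with a minimal sketch of the ε-insensitive loss and the radial basis function kernel. The function names and the kernel width `gamma` are illustrative assumptions; the text does not report the kernel width used in the study.

```python
import math


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the tube |y - f(x)| <= epsilon, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def penalized_error(y_true, y_pred, C, epsilon):
    """Error part of the SVR objective: C times the summed tube violations."""
    return C * sum(
        epsilon_insensitive_loss(t, p, epsilon) for t, p in zip(y_true, y_pred)
    )
```

A prediction that misses by less than ε contributes nothing to the objective, while C scales how strongly larger misses are penalized relative to the flatness of the fitted function.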
MARS
MARS is a nonparametric regression technique that uses a series of piecewise linear functions to model the relationships between the dependent and independent variables. MARS models can be prone to overfitting, so it is important to select the optimal model complexity using techniques like cross-validation. A higher number of breakpoints (knots) can capture more complex nonlinear relationships between the predictors and the outcome variable but may also increase the risk of overfitting the training data. Higher polynomial degrees can likewise capture more complex nonlinear relationships at the cost of a higher risk of overfitting. Knot counts from 5 to 20 and polynomial degrees of 1, 2, 3, and 4 were searched using the RMSE metric. The MARS model shows the best performance with 10 knots and a polynomial degree of n = 3.
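The piecewise linear building blocks of MARS are hinge functions placed at the knots. The sketch below shows a degree-1 model with a single knot; terms of higher degree would be products of such hinge functions. The coefficients, the knot location, and the choice of pavement age as the predictor are hypothetical.

```python
def hinge(x, knot, sign=1):
    """Hinge basis: max(0, x - knot) for sign=+1, max(0, knot - x) for sign=-1."""
    return max(0.0, sign * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate a degree-1 MARS model; terms is a list of (coef, knot, sign)."""
    return intercept + sum(c * hinge(x, k, s) for c, k, s in terms)


# Hypothetical model of IRI (m/km) versus age (years) with one knot at 10 years.
intercept = 1.2
terms = [(0.03, 10.0, +1), (-0.01, 10.0, -1)]

iri_young = mars_predict(5.0, intercept, terms)
iri_old = mars_predict(15.0, intercept, terms)
```

Each additional knot adds another pair of hinge terms, which is why the number of knots directly controls model flexibility.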
ANN
Building an ANN model requires several important factors to be considered. ANN models have a complex architecture, and selecting the appropriate number of layers, number of neurons in each layer, and activation functions is crucial to achieve the best performance. Hyperparameters such as the learning rate can significantly affect the ANN model’s performance and should be carefully tuned. After hyperparameter tuning, the optimized model with the lowest RMSE was determined. The optimum model comprises three layers with 9, 12, and 15 neurons. The ReLU function was chosen as the activation function of all layers in the ANN model. A learning rate of 0.01 gave the minimum error during cross-validation.
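The reported architecture can be sketched as a forward pass in NumPy. This is only an illustration of the layer sizes and ReLU activations, not the TensorFlow/Keras model used in the study. The number of inputs (8), the random initialization, and the single linear output neuron for regression are assumptions.

```python
import numpy as np


def relu(x):
    """Rectified linear unit applied elementwise."""
    return np.maximum(x, 0.0)


def init_layers(n_inputs, hidden=(9, 12, 15), seed=0):
    """Create (weights, bias) pairs for the hidden layers plus one output neuron."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs, *hidden, 1]
    return [
        (rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def forward(X, layers):
    """ReLU on the hidden layers, linear output for the regression target."""
    a = X
    for W, b in layers[:-1]:
        a = relu(a @ W + b)
    W, b = layers[-1]
    return (a @ W + b).ravel()


layers = init_layers(n_inputs=8)          # 8 input features is a placeholder
predictions = forward(np.ones((4, 8)), layers)
```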
XGBoost Tree
The hyperparameters of the XGBoost model that should be optimized include the maximum depth of the trees, the total number of trees to grow in each cycle, the minimum number of variables used in the tree model, and the minimum number of samples (observations) in each batch, among others. Maximum tree depths of 3, 5, 7, and 10 and maximum numbers of iterations from 10 to 3,000 were searched using the RMSE metric. The XGBoost model shows the best performance during cross-validation for a maximum depth of 5 and 2,500 iterations.
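The effect of the number of boosting iterations can be illustrated with a stripped-down gradient boosting loop for squared error. This sketch uses depth-1 trees (stumps) on a single feature and omits the regularization and second-order terms that XGBoost adds, so it is a simplified stand-in rather than the XGBoost algorithm itself. The data and the learning rate are hypothetical.

```python
import numpy as np


def fit_stump(x, residual):
    """Best single-split regression stump on one feature (squared error)."""
    best = None
    for threshold in np.unique(x)[:-1]:   # keeps the right side non-empty
        left = residual[x <= threshold]
        right = residual[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    return best[1:]


def boost(x, y, n_rounds, learning_rate=0.1):
    """Fit stumps to the current residuals and record the training RMSE."""
    pred = np.full_like(y, y.mean(), dtype=float)
    history = []
    for _ in range(n_rounds):
        threshold, left_value, right_value = fit_stump(x, y - pred)
        pred += learning_rate * np.where(x <= threshold, left_value, right_value)
        history.append(float(np.sqrt(np.mean((y - pred) ** 2))))
    return pred, history


# Hypothetical smooth roughness trend versus age.
x = np.linspace(0.0, 20.0, 50)
y = 1.0 + 0.002 * x ** 2
pred, history = boost(x, y, n_rounds=200)
```

The training error can only decrease as rounds are added, which is why the number of iterations and the tree depth are tuned on held-out folds rather than on the training error.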
Discussion
Figure 9 shows the prediction results for the training and testing datasets for each of the developed models. The training dataset covers 75% of the data, and the rest was used for testing. On the training dataset, the GLM model shows an RMSE of 0.29 m/km and a coefficient of determination (R2) of 0.74. On the test dataset, it shows an RMSE of 0.3 m/km and an R2 of 0.68. This model performs better than the linear regression models; however, it underpredicts the IRI at higher IRI values. This may be caused by a lack of data at high IRI values: observations with high IRI are far fewer than those with low IRI, which explains the nonuniform error variance of the GLM model.

Comparison between models.
Other machine learning models can be used to reduce the prediction error at higher IRI values. The advantage of the GLM model lies in selecting the important features and reducing the complexity of the prediction. On the training dataset, the SVM model shows an RMSE of 0.16 m/km and a coefficient of determination of 0.91. On the test dataset, it shows an RMSE of 0.27 m/km and an R2 of 0.77. This model works better than the GLM model, especially for higher values of IRI. One advantage of SVM over GLM is that SVM can effectively model complex, nonlinear relationships between the dependent and independent variables, whereas GLMs are limited to linear relationships.
On the training dataset, the MARS model shows an RMSE of 0.27 m/km and a coefficient of determination of 0.78. On the test dataset, it shows an RMSE of 0.28 m/km and an R2 of 0.74. This model does not perform as well as the other machine learning techniques. One of the reasons could be the presence of categorical variables. The MARS model gives low bias but high variance because of its tendency to overfit. Hyperparameter optimization reduced the overfitting, but the model still shows high variance in its predictions on the test dataset.
The ANN model shows RMSE of 0.22 m/km and coefficient of determination R2 of 0.85. The model performance using the test dataset shows RMSE of 0.25 m/km and R2 of 0.81. This model shows good performance compared with previous models and can be used as a prediction model to predict pavement performance during its lifetime; however, the computational resource for model training was much higher than the other models.
The XGBoost model shows RMSE of 0.09 m/km and the coefficient of determination of 0.95. The model’s prediction using the test dataset shows RMSE of 0.2 m/km and R2 of 0.91. XGBoost is a powerful algorithm that can handle complex nonlinear relationships between variables, making it highly accurate in predicting the target variable in regression problems. This model shows outstanding performance and can be used as a prediction model for predicting pavement performance during its lifetime.
The generalization capability of each model, evaluated by the RMSE difference between the training and testing datasets, varies across algorithms. SVM shows a notable difference between training (0.16) and testing (0.27) RMSE, indicating possible overfitting. Both MARS and GLM show minimal RMSE differences (MARS: 0.27 training versus 0.28 testing; GLM: 0.29 training versus 0.3 testing). The ANN model exhibits strong generalization, with a training RMSE of 0.22 and a testing RMSE of 0.25. Although XGBoost shows a larger discrepancy between training (0.093) and testing (0.2) RMSE, suggesting potentially lower generalization compared with ANN, it offers substantial advantages in complex data scenarios. XGBoost’s robust performance, scalability, and ability to model nonlinear relationships make it a compelling choice for applications requiring high predictive accuracy and adaptability, despite ANN’s better generalization in this dataset.
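The generalization gap for each model can be tabulated directly from the training and testing RMSE values reported in the preceding paragraphs. The short sketch below does only this arithmetic; the dictionary layout and variable names are illustrative.

```python
# (training RMSE, testing RMSE) in m/km, as reported for each model.
reported_rmse = {
    "GLM": (0.29, 0.30),
    "SVM": (0.16, 0.27),
    "MARS": (0.27, 0.28),
    "ANN": (0.22, 0.25),
    "XGBoost": (0.09, 0.20),
}

# Gap = testing RMSE minus training RMSE; a larger gap suggests more overfitting.
gaps = {
    name: round(test - train, 2) for name, (train, test) in reported_rmse.items()
}

# Model with the lowest error on unseen data.
best_on_test = min(reported_rmse, key=lambda name: reported_rmse[name][1])
```

Reading the gap together with the testing RMSE separates two questions: how much a model degrades on unseen data, and how accurate it remains after that degradation.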
Figure 10 compares the performance of the IRI prediction models. Most of the newly developed models show acceptable performance. However, the XGBoost model shows the lowest RMSE and the highest R2 among the developed models. This is attributed to the algorithms implemented in this model: tree-based regression, boosting, cross-validation, and regularization of the features help to create a robust, accurate model that can predict the IRI from the available information collected for the pavement section.

Comparison of the performance of the newly developed predictive models on the test dataset.
Table 6 presents the input variables used in the final developed models after regularization and removal of less important and correlated features. Removing these features from the input variables decreases model complexity and increases the accuracy of the prediction. Age, SN, deflection at 1.5 times the pavement thickness (PD1.5), traffic load (AADTT), precipitation, and M&R history were used in all of the models. SIP had a very high correlation with the SN and was removed from the input variables. Asphalt and total pavement thickness were used in the MARS, ANN, and XGBoost models. The SVM and GLM models allocated very low weights to these features and removed them from the prediction model. This could be because of the correlation of these variables with the SN of the pavement. The correlation between the IRI and temperature was weaker than that for the other environmental characteristics, and thus the average temperature was not selected as a predictor in the IRI prediction models.
Selected Features for the Final Developed Models
Note: GLM = generalized linear model; SVM = support vector machine; MARS = multivariate adaptive regression splines; ANN = artificial neural network; XGBoost = extreme gradient boosting; SN = structural number; SIP = structural index of pavement; PD = peak deflection; DAC = asphalt concrete thickness; Dtotal = total pavement thickness; AADTT = annual average daily truck traffic; FI = freezing index; M&R = maintenance and rehabilitation.
The next step is to assess how the input variables contribute to the model. The XGBoost model does not provide an estimated parameter or weight for each feature that could be used to interpret the relationship between a specific feature and the target variable. Instead, Figure 11 shows the feature importance of the developed XGBoost model for the prediction of IRI, ranked by gain. Gain in XGBoost represents the improvement in the model’s performance (specifically, the reduction in the loss function) that results from splitting on a particular feature. It reflects how much a feature contributes to making accurate predictions. By summing the gains across all the trees in the model where a feature is used, we obtain an overall importance score for that feature. The features that contribute most to the XGBoost prediction of IRI are SN and peak deflection at 1.5 times the pavement thickness.

Variable importance derived from the XGBoost model.
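The gain of a single split, and its aggregation into a feature importance score, can be sketched as follows. The split gain uses the standard XGBoost expression based on the sums of first- and second-order gradients (G, H) on each side of the split, with regularization terms λ and γ. The feature names and the gradient sums in the example are hypothetical.

```python
from collections import defaultdict


def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction from a split:
    0.5 * [G_L^2/(H_L+lambda) + G_R^2/(H_R+lambda) - (G_L+G_R)^2/(H_L+H_R+lambda)] - gamma
    """
    def score(g, h):
        return g * g / (h + reg_lambda)

    return 0.5 * (
        score(g_left, h_left)
        + score(g_right, h_right)
        - score(g_left + g_right, h_left + h_right)
    ) - gamma


def total_gain_by_feature(splits):
    """Sum the gain of every split, grouped by the feature that was split on."""
    totals = defaultdict(float)
    for feature, gain in splits:
        totals[feature] += gain
    return dict(totals)


# Hypothetical splits collected from all trees: (feature name, gain of that split).
# For squared error, each sample contributes g = prediction - target and h = 1.
splits = [
    ("SN", split_gain(-2.0, 2.0, -6.0, 2.0)),
    ("PD1.5", split_gain(-1.0, 3.0, -4.0, 1.0)),
    ("SN", split_gain(-3.0, 2.0, -5.0, 2.0)),
]
importance = total_gain_by_feature(splits)
```

A split that separates samples with very different residuals yields a large gain, so features that repeatedly produce such splits accumulate the highest importance.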
Summary of Findings and Future Research
This study aims to provide practitioners with enhanced predictive tools that cater to the complexities of pavement roughness prediction. These methods provide accurate roughness predictions to guide effective pavement maintenance and management strategies, thereby extending the lifespan of pavements and minimizing costs associated with repairs.
Each of the five machine learning models (GLM, SVM regression, MARS, ANN, and XGBoost) that were employed presents unique advantages and considerations for pavement practitioners. The GLM model excels in selecting important features and reducing prediction complexity, offering a practical approach for models with interpretable results. SVM stands out in modeling complex, nonlinear relationships, which is essential for capturing the intricate dynamics inherent in pavement behavior. MARS, while exhibiting some limitations, can still provide valuable insights into feature contributions. Despite its complexity, MARS has a fast and efficient algorithm and is robust to outliers. The limitations of the MARS model include its inability to handle missing data and its susceptibility to overfitting. In addition, it is more difficult to understand and interpret than other analytical models.
The ANN model, while computationally resource-intensive, demonstrates strong predictive performance and can be particularly valuable when a higher degree of accuracy is essential. The ANN model depends greatly on the training data, and its chance of overfitting is higher than that of the other machine learning models; however, as the number of observations increases, the ANN model’s accuracy increases significantly ( 40 ). Lastly, the XGBoost model, leveraging state-of-the-art algorithms, robustly handles nonlinear relationships and emerges as a highly accurate predictor, especially beneficial for pavement practitioners seeking precision in predictions. Moreover, XGBoost trains considerably faster than ANN, making it more practical in pavement management applications where computational resources are limited.
However, it is crucial to recognize the limitations of the models presented. Each model exhibits specific strengths and weaknesses. For example, the GLM might struggle with accurately predicting higher IRI values because of a scarcity of data in that range. The MARS model may face challenges with potential overfitting, affecting its performance. In addition, the computational intensity of the ANN model should be considered in settings with limited resources. These insights are valuable for practitioners in selecting the most appropriate model for their specific needs and guide ongoing research in enhancing the capabilities of these models.
For future predictions, values for parameters such as SN, SIP, and PD1.5 are not directly available. Instead, they can be estimated by back-calculating with regression models that generate incremental changes (delta) based on pavement age and M&R history. By applying regression models, which account for the gradual degradation of pavement attributed to factors like cumulative traffic loads, material fatigue, and environmental influences, we can approximate future values of these structural parameters. These estimated values are then used as inputs in our IRI prediction models to accurately forecast future pavement roughness.
This combined approach—using regression-based estimation of structural parameters together with IRI prediction modeling—allows pavement engineers to utilize our models in planning and decision-making processes within pavement management systems, where predicting future pavement conditions is essential for effective M&R scheduling. It is acknowledged that the accuracy of these projections depends on the quality of the deterioration models and the reliability of input data. Therefore, further research into the development and calibration of structural deterioration models is recommended to enhance the practical applicability of the IRI prediction framework presented in this paper. Accurate IRI forecasts can support not only asset management but also safety-oriented decision-making, complementing guidelines and diagnostic approaches aimed at mitigating roadway encroachment risks ( 50 , 51 ).
Future research could aim to develop state-specific pavement performance models using data from each state’s unique roadway network. Alternatively, incorporating state or region as a categorical variable, or employing hierarchical modeling techniques, could help capture interstate variability. In addition, future efforts could focus on partitioning the data by pavement sections to exclude certain sections across various locations, preserving them for independent testing of the models developed.
Unobserved heterogeneity refers to the presence of influential factors affecting pavement roughness that are not captured by the observed explanatory variables. This can lead to biased and inconsistent estimates in the prediction models. Random-parameters models (also known as mixed-effects models) allow for individual-specific effects by introducing random parameters, thus accounting for unobserved heterogeneity. Future studies could incorporate random-parameters modeling techniques to address this limitation. By allowing certain model parameters to vary randomly across observational units (e.g., pavement sections or states), researchers can better account for unobserved heterogeneity, potentially improving model accuracy and reliability.
In addition, future research should aim to integrate these performance prediction models with original design and construction parameters, enabling a feedback loop that can improve both future design practices and pavement management strategies. Another promising avenue is to expand the use of variable importance analysis (gain) from the XGBoost model, which not only highlights the most influential factors for pavement roughness but could also provide transferable insights for other engineering and even nonengineering domains.
In conclusion, this study serves as a bridge between advanced machine learning techniques and practical pavement management. By outlining the applicability, strengths, and limitations of the proposed models, it provides pavement practitioners with essential insights for more informed decision-making in pavement maintenance and performance assessment.
Conclusions
This study concentrates on comparing the performance of five commonly used regression models in predicting pavement roughness. The analysis involves a dataset of 419 flexible pavement sections from seven states in the United States: Oklahoma, Texas, Arkansas, Missouri, Kansas, Colorado, and New Mexico. Key pavement features such as thickness, age, structural integrity, base and subgrade characteristics, traffic load, road class, climate, and temperature information were utilized as input variables for the machine learning models. The findings indicate that XGBoost outperformed all other models, achieving the lowest test RMSE of 0.2 m/km. ANN also showed competitive performance with an RMSE of 0.25 m/km. SVM and MARS achieved moderate performance with RMSEs of 0.27 and 0.28 m/km, respectively, while GLM had the poorest performance, with an RMSE of 0.3 m/km.
Furthermore, incorporating the SN into IRI prediction models can improve the accuracy of the predictions. This is because the SN can help to account for the impact of different pavement layer configurations on roughness, which can be missed in models that only consider surface roughness data. Therefore, the use of the SN in predicting IRI can make the case study more robust by providing a more comprehensive approach to pavement design and management, improving prediction accuracy, optimizing pavement design, and offering a unique research focus in the field of pavement engineering. The results of this study suggest that XGBoost and ANN are promising models for predicting pavement roughness, as they demonstrated the best performance in relation to accuracy. However, XGBoost is faster in training than ANN. MARS and SVM may also be suitable alternatives, especially when the dataset has many interactions between predictors. However, GLM may not be the best choice for this type of problem. Future studies can focus on further improving the prediction accuracy of the developed models by incorporating more relevant features, using larger and more diverse datasets, and exploring novel techniques for feature selection and engineering.
Footnotes
Acknowledgements
The authors thank the support from colleagues at Texas A&M University and Texas A&M Transportation Institute.
Author Contributions
The authors confirm contribution to the paper as follows: study conception and design: Mahmood Tabesh, Ahmadreza Mahmoudzadeh; data collection: Mahmood Tabesh, Ahmadreza Mahmoudzadeh; analysis and interpretation of results: Mahmood Tabesh, Ahmadreza Mahmoudzadeh, Erfan Hajibandeh; draft manuscript preparation: Mahmood Tabesh, Ahmadreza Mahmoudzadeh, Erfan Hajibandeh. All authors reviewed the results and approved the final version of the manuscript.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
