Abstract
Spatio-temporal land-use change (LUC) modeling provides vital information about land development dynamics. However, accounting for such dynamics faces methodological challenges. This research introduces a Dynamic Spatial Panel Data (DSPD) modeling framework for LUC, incorporating spatial and temporal dependencies. A continuous response variable is introduced to take advantage of traditional spatial regression models. The DSPD model is applied to balanced spatial panel data at the block-group level covering Florida between 2010 and 2019 and incorporating both new and previously used proxy variables. The urban growth impacts of site-specific, proximity, neighborhood, socio-economic, and transportation factors are investigated. This study contributes to the literature by providing extensive insights into spatial autocorrelation, spillover, heterogeneity, and temporal lag effects in urban growth. Also, the study reveals the importance of mobility and mortgage financing in land development. The proposed modeling framework achieves high accuracy. The dynamic structure of this model provides an opportunity to predict future urban growth without the need for a land development scenario. Such predictions provide insights about future land development to practitioners and policymakers.
Introduction
The main goal of land-use change (LUC) modeling is to mimic the complex land dynamics that lead to changes in urban and rural areas. Data-driven empirical approaches require extensive information to develop mathematical representations of urban and rural systems. Based on land economics theory, the current land-use status on a given parcel changes if the expected benefits from changing conditions minus the one-time conversion costs are larger than the benefits under current conditions. However, because the expected benefits cannot be directly estimated with the available information, proxy information is used to model these benefits. In earlier studies, LUC models have mainly included site-specific, proximity, neighborhood, socio-economic, and transportation factors, but there is no consensus regarding the selection of proxy information (Irwin and Geoghegan, 2001; Verburg et al., 2004). Bella and Irwin (2002) and Carrión-Flores and Irwin (2004) highlight the importance of spatially explicit LUC modeling using micro-level data. Carrión-Flores and Irwin (2004) control for spatial autocorrelation in the error term. Nahuelhual et al. (2012) apply an autologistic regression approach to examine temporal dynamics in land development. Recent studies have also incorporated spatial and temporal dependencies because neighborhood and historical effects play critical roles in land dynamics (Deng and Srinivasan, 2016). Tepe and Guldmann (2017, 2020) introduce autologistic regression methods that explicitly account for spatial and temporal dependencies. Including these lags resolves the omitted variable bias problem in OLS regression-based LUC models.
The main goal of this study is to propose and estimate a Dynamic Spatial Panel Data (DSPD) model with both new and previously used proxy variables. This model explicitly quantifies spatial and temporal lagged relationships and incorporates site-specific, proximity, neighborhood, socio-economic, and transportation variables. Iacono et al. (2008) indicate that detailed characteristics of locations and residents should be included in LUC models because transportation impacts LUC through changes in accessibility and congestion. Also, traffic congestion creates negative externalities such as air and noise pollution in adjacent areas and endangers environmental systems. The model better represents human decision-making by using socio-economic indicators such as population density, ethnicity, student and employment ratios, housing finance, and workers’ travel behaviors. Florida is selected as the case study area to evaluate the impacts of this proxy information. A balanced spatial panel data set is derived from three primary data sources, and the details are presented in Section 4.
This study contributes to the existing literature in four ways by: (1) providing detailed insights about spatial and temporal lag effects in urban growth through the dependent variable, covariates, and residuals; (2) introducing a continuous response variable at the block-group (BG) level, in contrast to earlier discrete parcel information, thus enabling the implementation of traditional spatial regression modeling; (3) investigating the impacts of transportation infrastructure and mobility in land development by incorporating the distances to main roads and traffic volumes; (4) incorporating direct information about the demand for new real estate development, including mortgage financing and employment rates.
Literature review
Table 1. Summary of LUC statistical models incorporating spatial and/or temporal components.
Spatial heterogeneity and clustering patterns are important driving factors in LUC (Jokar Arsanjani et al., 2013; Legendre and Legendre, 1998). Spatial dependencies must be controlled to improve model robustness and achieve efficient results. In some studies, spatial sampling is implemented to reduce spatial dependencies (Gaur et al., 2020; Hu and Lo, 2007; Jokar Arsanjani et al., 2013; Munroe and Müller, 2007). Bhat et al. (2015), Deng and Srinivasan (2016), and Hu and Lo (2007) use spatial lag components to control for spatial dependency. Similarly, temporally lagged components are used by Bhat et al. (2015), Huang et al. (2009), and Tepe and Guldmann (2017, 2020). Alternatively, Carrión-Flores and Irwin (2004) specify a simultaneous autoregressive process in the error terms of their residential land conversion model. Liao and Wei (2014) employ a logistic regression approach with spatially expanded coefficients to investigate the spatial complexities of urban growth in Dongguan, China. The EUCS100 model (Lavalle et al., 2011) implements a standard multinomial logit for simulating LUC dynamics in the European region but does not account for the endogeneity of the interactions among the land-use variables across spatial units.
Temporal dependencies are also crucial in LUC dynamics, but few studies have incorporated temporal dynamics in their models. Huang et al. (2009) apply a logistic regression for urban expansion and implement exponentially decreasing weights for older observations to integrate bi-temporal land-use models over three periods between the 1980s and 2000s. Carrión-Flores and Irwin (2004) apply a temporal lag to neighborhood-level land-use composition variables to resolve the potential endogeneity between surrounding development and open spaces in a probit model of residential land conversion in Ohio.
Finally, some studies incorporate spatial and temporal dynamics in their LUC models. Tepe and Guldmann (2017, 2020) include spatial-temporal autoregressive structures to account for the parcel dynamics in Delaware County, Ohio. Ferdous and Bhat (2013) account for both spatial and temporal dependency by applying a spatial lag specification on latent variables of decision-makers and including a temporal autoregressive error term in the spatial panel ordered response variable to uncover changes in land development intensity in Austin, Texas.
Some models presented in Table 1 are further explored by focusing on their proxy variables. Table S1 in the supplementary file (SF) summarizes the categories of predictor variables. Site-specific and proximity variables are included in most models, followed by socio-economic, neighborhood, and transportation variables. Site-specific characteristics, such as parcel size, frontage, and soil type, play an essential role, and so does the proximity to points of interest. Accessibility to various services impacts travel demand, as do the neighborhood characteristics, such as surrounding land uses. These variables provide insights into spatial clustering or dispersion patterns.
Socio-economic factors are also among the main drivers of LUC. Table S1 in the SF shows that many studies incorporate socio-economic indicators. Household characteristics are influential in shaping the urban landscape (Schirmer et al., 2014) and are commonly included in LUC models (Waddell et al., 2003), as they impact new land developments and land-use allocation (Verburg et al., 2004; Waddell, 2011). They typically include population characteristics, employment rates, student enrollment, educational attainment, and access to job opportunities. However, socio-economic factors such as real estate financing, which directly affect land development, are rarely considered.
Transportation and land use mutually influence each other through feedback mechanisms (Kelly, 1994). The transportation network is believed to influence land use via improved accessibility, which increases land investment attractiveness. Accessibility, often measured by travel time (Iacono et al., 2008), is influenced by the conditions of the transportation network and the traffic volumes on transportation links. The literature on transportation demand models points out the effects of congestion – as more traffic uses a transportation link, travel speed declines, and travel time increases (Donnelly et al., 2010). Thus, congested travel time directly impacts locational accessibility and indirectly affects locational choices of activities and land investments. However, existing LUC models only consider proximity to transportation infrastructures, such as roads, railways, and airports, and rarely traffic volumes.
Methodology
This study uses balanced spatial panel data of annual characteristics at the block-group (BG) level for Florida from 2010 to 2019. BGs are statistical divisions of census tracts, as defined by the U.S. Bureau of the Census, and generally include 600 to 3,000 people. Spatial panel data models are used to control for spatial heterogeneity and correlation (Baltagi et al., 2003). Equations (1) and (2) represent such a model, including a spatial lag of the dependent variable and a spatial autoregressive disturbance (Millo and Piras, 2012). The dependent variable is the ratio of the number of developed parcels at time t to the total number of parcels in each BG. Figure S1 in the SF shows a sample BG with parcels and the computation method for the development ratio. Discrete-response regression models are commonly preferred in LUC modeling because of the nature of land use categories. However, the continuous response variable allows for the implementation of conventional spatial regression models.
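The development ratio described above can be illustrated with a minimal sketch. The record layout (a block-group identifier paired with a construction year, with `None` for undeveloped parcels) and the function name are assumptions made for illustration, not the structure of the Florida parcel database.

```python
from collections import defaultdict

def development_ratio(parcels, year):
    """Share of parcels in each block group (BG) developed by `year`.

    `parcels` is an iterable of (bg_id, built_year) pairs; built_year is
    None for parcels that are still undeveloped (hypothetical layout).
    """
    total = defaultdict(int)
    developed = defaultdict(int)
    for bg_id, built_year in parcels:
        total[bg_id] += 1
        if built_year is not None and built_year <= year:
            developed[bg_id] += 1
    return {bg: developed[bg] / total[bg] for bg in total}

# Toy data: BG1 has four parcels, BG2 has two.
parcels = [("BG1", 2005), ("BG1", 2012), ("BG1", None), ("BG1", 2018),
           ("BG2", 1999), ("BG2", None)]
ratios_2010 = development_ratio(parcels, 2010)
ratios_2019 = development_ratio(parcels, 2019)
```

Repeating the computation for each year yields one continuous value per BG and year, which is the form a balanced panel requires.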
The disturbance vector consists of two error terms: (1)
The static panel model in equations (1) and (2) can be modified to account for autoregressive temporal dynamics, whereby previous developments impact future land developments. Temporal lags of the dependent variable can be included in the model to obtain a dynamic model (also called the Autoregressive Regression model), which can be used to forecast future land development. Equations (2) and (3) represent such a DSPD model.
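For orientation, a generic model of this kind can be sketched in LaTeX as follows. The notation is an assumption chosen for illustration and may differ from the numbered equations of this study: $y_t$ is the vector of development ratios at time $t$, $W$ the spatial weight matrix, $X_t$ the covariate matrix, $\lambda$ the spatial autoregressive coefficient, $\rho$ the spatial error coefficient, and $\tau$ the coefficient of the first temporal lag.

```latex
% Static form: spatial lag of the dependent variable and a
% spatially autoregressive disturbance (assumed notation)
\begin{align*}
  y_t &= \lambda W y_t + X_t \beta + u_t, \\
  u_t &= \rho W u_t + \varepsilon_t .
\end{align*}
% Dynamic form: one temporal lag of the dependent variable is added
\begin{equation*}
  y_t = \tau\, y_{t-1} + \lambda W y_t + X_t \beta + u_t .
\end{equation*}
```

In this sketch the dynamic form differs from the static one only by the term $\tau\, y_{t-1}$, which is what allows the model to be rolled forward for forecasting.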
Similarly, a temporal lag of the covariates (
The spatial autoregressive coefficient
Equations (5) and (2) can be combined with
Conceptualization of the spatial relationships
The DSPD model uses the spatial weight matrix W to conceptualize spatial relationships. Three main methods are used to construct spatial weight matrices: fixed distance, K-nearest neighbors, and contiguity. This study uses the K-nearest neighbors rule to test various spatial conceptualizations (Figure S2 in the SF). The inverse-distance-weight (IDW) method is used to account for potential distance-related spatial spillover effects. The row-standardization method is implemented for all spatial weight matrices to avoid the identification problem in regression models.
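A minimal sketch of a K-nearest-neighbors weight matrix with inverse-distance weights and row-standardization is given below. The function name and the toy coordinates are hypothetical, and the sketch assumes that all centroids are distinct so that no distance is zero.

```python
import math

def knn_idw_weights(coords, k):
    """Row-standardized inverse-distance weights over the k nearest neighbors."""
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        # Distances from unit i to every other unit, nearest first.
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords) if j != i
        )
        # Inverse-distance weight for each of the k nearest neighbors
        # (assumes distinct centroids, so d > 0).
        inv = [(1.0 / d, j) for d, j in dists[:k]]
        row_total = sum(v for v, _ in inv)
        for v, j in inv:
            W[i][j] = v / row_total  # row-standardization: each row sums to 1
    return W

# Four hypothetical centroids on a line.
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
W = knn_idw_weights(coords, k=2)
```

With row-standardization, the spatial lag of a variable is a weighted average of its values in the neighboring units, with closer neighbors weighted more heavily.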
Computation of direct and indirect effects
Equation (6) presents the DSPD model incorporating both the spatial interactions of exogenous variables and spatial and temporal lags. Marginal effects based on the spatial structure are constant over time due to the non-varying spatial weight matrix. The marginal effects can be obtained by using the partial derivative of the dependent variable with respect to a given explanatory variable and temporal lags of the dependent variable at time t
Using Equations (7) and (8), average direct, total, and indirect effects can be computed for a given year (LeSage and Pace, 2009; Park et al., 2021). Equations (9), (10), and (12) represent the average direct (ADE), total (ATE), and indirect (ANE) effects of the temporal lags of the dependent variable, respectively
Marginal effects for the independent variable can also be computed using the following equations
The elasticities of the direct and indirect effects can be further computed by multiplying both sides of equation (6) by
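The summary measures of LeSage and Pace (2009) can be sketched for a generic spatial lag specification as follows. This is an illustration under assumed notation (`lam` for the spatial autoregressive coefficient, `beta` for a covariate coefficient, `theta` for the coefficient of its spatial lag), not a reproduction of the numbered equations of this study.

```python
import numpy as np

def average_effects(W, lam, beta, theta=0.0):
    """Average direct, indirect, and total effects of one covariate.

    Uses the n x n matrix of partial derivatives
    S = (I - lam * W)^(-1) (beta * I + theta * W).
    """
    n = W.shape[0]
    I = np.eye(n)
    # Solve the linear system instead of forming the inverse explicitly.
    S = np.linalg.solve(I - lam * W, beta * I + theta * W)
    direct = np.trace(S) / n   # mean own-unit effect (diagonal)
    total = S.sum() / n        # mean row sum
    return direct, total - direct, total

# Hypothetical row-standardized weights for three units on a line.
W = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
direct, indirect, total = average_effects(W, lam=0.4, beta=2.0)
```

For a row-standardized W and no spatially lagged covariate, the average total effect reduces to beta / (1 - lam), which provides a quick check on the computation. The direct effect exceeds beta because of feedback through neighboring units.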
Data and study area
Florida is one of the fastest-growing U.S. states, with a population of 20.9 million in 2019. In this study, three primary data sources are used: (1) site-specific, proximity, and neighborhood characteristics are derived from the 2019 Florida Parcel Database, published by UF GeoPlan Center (“Florida Parcel Data Statewide,” 2019), where the database consists of 8,995,663 individual parcel records; (2) transportation characteristics are derived from the Florida Department of Transportation (FDOT) traffic records; (3) socio-economic characteristics at the BG level are obtained from the American Community Survey (ACS). These data sets are further discussed below.
Land development characteristics
Land development characteristics are derived from the actual construction year and land use information provided in the statewide parcel-level database. Figure S3 in the SF depicts the number of developed parcels in 2019 and their changes between 2010 and 2019 (an increase of 10.06%). Figure S4 in the SF illustrates the annual changes in the number of developed parcels in BGs by population density category between 2010 and 2019. BGs are grouped into 10 equally sized groups based on population density, where the least dense BGs are assigned to Category 1 and the densest BGs to Category 10. Land development rates increased over these 10 years. Less dense BGs experienced higher land development intensity than denser BGs. The average lot size in Florida is 1.51 ha, the median is 0.09 ha, and the standard deviation is 31.62 ha. Because the mean far exceeds the median, the distribution of lot sizes is heavily skewed to the right.
Proximity and neighborhood characteristics
Using the parcel data, Euclidean distances from parcel centroids to the nearest points of interest and amenities are calculated and averaged at the BG level to capture proximity characteristics (Figure S5 in the SF). In addition, the shares of various land uses within a 3.2 km (2-mile) circular buffer centered at each parcel centroid are also calculated and averaged at the BG level as proxies for neighborhood characteristics (Figure S6 in the SF).
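The proximity measure can be sketched as follows, assuming parcel centroids and facility locations are given in the same projected coordinate system. The tuple layout and the function name are hypothetical.

```python
import math
from collections import defaultdict

def mean_nearest_distance(parcels, facilities):
    """Average, per block group, of each parcel's distance to its nearest facility.

    parcels: iterable of (bg_id, x, y) centroids; facilities: list of (x, y).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for bg_id, x, y in parcels:
        nearest = min(math.hypot(x - fx, y - fy) for fx, fy in facilities)
        sums[bg_id] += nearest
        counts[bg_id] += 1
    return {bg: sums[bg] / counts[bg] for bg in sums}

# Toy example with coordinates in kilometers.
parcels = [("BG1", 0.0, 0.0), ("BG1", 0.0, 4.0), ("BG2", 10.0, 0.0)]
facilities = [(3.0, 0.0), (0.0, 5.0)]
proximity = mean_nearest_distance(parcels, facilities)
```

The brute-force search over all facilities is adequate for a toy example; a statewide data set of several million parcels would call for a spatial index.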
The average distance to the nearest industrial facility from parcels was 1.37 km in 2010, decreasing by 2.47% over the 10 years. The average distance to the nearest educational building was 1.43 miles in 2010, decreasing by 1.89% over the decade. The average agricultural land share within a 3.2-km radius increased from 2.49% in 2010 to 2.63% in 2019. Similarly, the average share of recreational land rose from 1.9% to 1.98%. The average share of single-family residential land was 31.5% in 2019, increasing by 3.8%. Finally, the average percentage of mixed-use land rose from 0.28% in 2010 to 0.29% in 2019.
Traffic volume
Historical annual average daily traffic data between 2010 and 2019 were obtained from the Transportation Data and Analytics Office of the Florida Department of Transportation (FDOT). The perpendicular distance to the closest roadways and the average of Annual Average Daily Traffic (AADT) on the 100 closest road sections from a given BG centroid are computed using the annual data (Figure S7 in the SF). The average perpendicular distance to the nearest main road section was 0.87 km in 2010, and it declined by 36.5% over 10 years. The average AADT increased by almost 20% over the same period.
Socio-economic characteristics
The ACS 5-year estimates provide socio-economic information at the BG level. Socio-economic characteristics for a given year are retrieved from the 5-year ACS survey ending with that year. The average population density increased from 16.76 to 18.10 persons per hectare. The ratio of minority populations, including African-Americans, Asians, and people of two or more races, also increased. The number of occupied housing units rose by 6.79%. The average student ratio decreased by 6.6%, from 22.2% to 20.1%. The average employment ratio increased by 4.3%. The average share of employees commuting by car declined by 1.4%. Finally, the share of housing units financed with mortgage loans decreased from 34.9% to 30.7%.
Descriptive statistics
Table S2 in the SF presents descriptive statistics for the variables used in the regression models over 2010–2019. The table reports mean and standard deviation statistics for each year. The only time-invariant variables are lot area and variance of lot areas. Therefore, these variables are not included in the table. The mean and standard deviation of the average lot area at the BG level are 8.73 and 369.8 ha, respectively. The average lot area variance is 4370.5 with a standard deviation of 190262.4.
Results
This section presents the statistical analysis results based on data covering 11,394 BGs in Florida, after removing BGs that are located in coastal areas and contain no parcels, as well as 21 BGs with missing information. All statistical analyses are conducted using the R packages spdep and splm. Spatially lagged covariates and temporally lagged dependent variables are prepared in R before being passed to the spdep and splm packages. Before conducting regression analyses, pair-wise Pearson’s correlations are computed to identify moderate and strong linear associations between independent variables. Strong correlations among independent variables lead to multicollinearity and violate the independence assumption. Figure S8 in the SF presents the statistically significant pair-wise Pearson’s correlations for the independent variables. The home occupancy ratio, distance to industrial complexes, and distance to education facilities are moderately correlated with many variables. Therefore, these three variables are excluded from further statistical analysis. The only exception is made for the variance of parcel areas, which is retained even though it is strongly correlated with the average parcel area.
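The correlation screening step can be sketched as follows. The cutoff of 0.5 for a "moderate" correlation, the variable names, and the toy values are assumptions for illustration; the study does not state a threshold here.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flag_correlated(columns, threshold=0.5):
    """Return variable pairs whose |r| meets the (assumed) threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical covariates for five block groups.
columns = {
    "lot_area": [1.0, 2.0, 3.0, 4.0, 5.0],
    "lot_area_var": [2.0, 4.0, 6.0, 8.0, 10.5],
    "aadt": [3.0, 1.0, 4.0, 1.0, 5.0],
}
flagged = flag_correlated(columns)
```

Only the pair of lot-area variables is flagged in this toy example, which mirrors the kind of pair that would then be reviewed before estimation.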
Testing different spatial weight matrix rules
After excluding the independent variables identified above as moderately correlated with many others, different spatial weight matrix rules based on the K-nearest neighbors method are tested using the DSPD model with a temporal lag to find the optimal spatial weights for further analysis. The spatial autoregressive coefficient (
Exploratory regressions
After identifying the appropriate spatial weight matrix, multiple model specifications are tested to determine the best-fit model for the data. Using the set of explanatory variables from the previous section, different temporal and spatial configurations are tested during this exploratory regression analysis phase. Table S4 in the SF presents statistics from the tested models to evaluate the goodness-of-fit. Nonspatial and spatial models without temporal lag are tested to obtain a baseline for spatio-temporal models. Model combinations with contemporaneous or temporally lagged covariates, spatially lagged covariates, the number of temporal lags (up to 2), and spatial autocorrelation in the error term are tested.
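Goodness-of-fit comparisons of this kind typically rely on information criteria. The sketch below uses the standard definitions of AIC and BIC; the log-likelihoods, parameter counts, sample size, and model labels are hypothetical and are not taken from Table S4.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * loglik

# Hypothetical candidates: (log-likelihood, number of parameters).
candidates = {"static": (1000.0, 12), "dynamic": (1500.0, 16)}
n_obs = 5000  # hypothetical number of observations
scores = {name: (aic(ll, k), bic(ll, k, n_obs))
          for name, (ll, k) in candidates.items()}
best_by_aic = min(scores, key=lambda name: scores[name][0])
```

Lower values indicate a better trade-off between fit and complexity, and BIC penalizes additional parameters more heavily than AIC once n exceeds about 7.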
Model 8 results excluding the insignificant variable.
Model fit: AIC = −461171.9; BIC = −461023.3; Adj. R2 = 0.9909.
Significance codes: “***” p < 0.001; “**” p < 0.01; “*” p < 0.05; “.” p < 0.1
Since the development ratio ranges between 0 and 1, this limited response variable may not be suitable for the DSPD model. To evaluate the impacts of this approach on parameter estimations and model statistics, the dependent variable is transformed using the logit function (
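A logit transformation of a ratio bounded between 0 and 1 can be sketched as follows. Because the logit is undefined at exactly 0 and 1, the sketch clips values by a small `eps`; this clipping rule is an assumption for illustration and is not stated in the text.

```python
import math

def logit(p, eps=1e-6):
    """Map a ratio in [0, 1] to the real line; eps-clipping is an assumption."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Map a real value back to the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-z))

transformed = [logit(r) for r in (0.0, 0.25, 0.5, 0.8, 1.0)]
```

Predictions made on the transformed scale can be mapped back with `inv_logit`, which guarantees that fitted development ratios stay inside the unit interval.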
Direct and indirect effects
Average elasticities of the direct, indirect, and total impacts based on Model 8 estimates between 2012 and 2018.
“***” p < 0.001; “**” p < 0.01
The model also captures other essential dynamics. Among the socio-economic factors, the lagged employment ratio most strongly affects the land development potential. A 1% rise in the previous year’s employment ratio increases the expected land development ratio by 0.0017%. At the same time, higher employment rates in neighboring BGs are also positively associated with land development but with a much weaker effect. Similarly, a 1% increase in the previous year’s student population ratio in each BG results in a 0.0006% rise in the expected development. In contrast, the expected development decreases slightly as the ratios of student population increase in neighboring BGs. A higher proportion of the employed and student populations supports an active real estate market. A 1% increase in the lagged non-white population ratio decreases the expected percentage of land development by 0.0006%. Because the non-white population ratio is generally related to income, areas with a larger non-white population are likely to have lower development potential. A 1% rise in the lagged population density results in a 0.0006% decline in the expected development ratio, but the same increase in neighboring BGs raises the development potential. This suggests that relatively less developed areas surrounded by denser populations are attractive locations for further development. A 1% increase in the lagged ratio of financed homes results in a 0.0004% rise in development, and there is also a positive indirect association between neighboring financed home ratios and land development. High rates of financed units are signs of active buyers in the market. Finally, a 1% increase in the lagged percentage of workers commuting by car decreases the expected ratio of developed land by 0.001%. Higher commuting rates by car may indicate potential traffic congestion, which may discourage new urban growth.
The model also captures the effects of specific neighborhood characteristics. A 1% increase in the lagged share of single-family residential land in a BG decreases the expected development by 0.0004%, and the same increase in the lagged share of agricultural land decreases it by 0.0002%. In contrast, a 1% increase in the lagged share of recreational land increases the expected development by 0.00008%. The corresponding indirect effects are much smaller than the direct effects. Over the analyzed period, the existing agricultural lands are in peripheral regions, and such locations require longer commutes and are less desirable for new land developments. An increase in the current single-family residential area of a BG reduces the potential for urban growth because fewer vacant lots remain available. Finally, recreational lands offer benefits to nearby areas, including landscape views, clean air, and activity spaces. Therefore, these areas are attractive places for new urban developments.
Finally, the model results underline the importance of transportation infrastructure and traffic volumes. A 1% decrease in the lagged distance from a given BG to the nearest road section increases the expected development by 0.0011%. This result confirms the importance of road accessibility for real estate development. However, a 1% increase in average traffic volume results in a 0.0004% decline in growth, suggesting that the adverse effects of high traffic volumes reduce development potential.
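The elasticities reported above translate into changes of the development ratio as in the following worked sketch. The baseline ratio of 0.60 and the 10% increase are hypothetical; the elasticity of 0.0017 is the lagged employment-ratio value reported in the text.

```python
def apply_elasticity(baseline, elasticity, pct_change_x):
    """Expected value of y after a percentage change in x, given an elasticity."""
    pct_change_y = elasticity * pct_change_x  # in percent
    return baseline * (1.0 + pct_change_y / 100.0)

# A 10% rise in the lagged employment ratio with elasticity 0.0017
# implies a 0.017% rise in the expected development ratio.
new_ratio = apply_elasticity(baseline=0.60, elasticity=0.0017, pct_change_x=10.0)
```

The small magnitudes reflect that the development ratio changes slowly from year to year, so most of its variation is absorbed by the temporal and spatial lags.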
Predictive ability of the model
In this study, the data for 2019 are reserved for model validation and are therefore not included in the prior regression analyses. As LUC models can be used for predicting future land developments, Model 8 is used to predict growth in 2019 based on the situation in 2018. Equation (17) estimates the expected land development at time t+1 (2019) using the existing (2018) covariate matrix. Since the DSPD is a regression model, the R2 statistic can be used to measure prediction accuracy (equation (18)). The prediction accuracy is computed as 99.1%. Figure 1 presents (a) the difference between actual and estimated values and (b) estimated changes in the developed ratios in 2019. The model predicts fewer developments in dense urban areas like Jacksonville, Miami, Tampa, Orlando, and Tallahassee (Figure 1(b)). However, the model expects urban growth potential near environmentally sensitive areas south of Orlando and west of Miami.
Figure 1. (a) Differences between actual and estimated values and (b) changes in developed ratio from 2018 to 2019.
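An R2-based accuracy measure for out-of-sample predictions can be sketched as follows, using the standard definition of one minus the ratio of the residual to the total sum of squares. The observed and predicted values are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted development ratios for four BGs.
actual = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]
accuracy = r_squared(actual, predicted)
```

Because development ratios are highly persistent from one year to the next, a one-year-ahead R2 close to 1 is largely driven by the temporal lag, so mapped differences between actual and estimated values remain a useful complement.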

Discussion
In this section, the findings of this study are compared with those of previous studies to further evaluate the appropriateness of the proposed framework for urban growth modeling. Previous studies (see Table S9 in the SF) do not include the ratios of minorities and students, financed housing units, workers commuting by car, and average AADT (or any type of traffic variable). The impact of parcel size is estimated to be negative in this paper and in the studies by Carrión-Flores and Irwin (2004) and Tepe and Guldmann (2020), while Nahuelhual et al. (2012) and Tepe and Guldmann (2017) indicate a positive association between urban growth and parcel size. These contradictory results suggest non-linear relationships between parcel size and urban development. According to several previous studies, population density positively affects growth (Gaur et al., 2020; Huang et al., 2009; Jokar Arsanjani et al., 2013; Tepe and Guldmann, 2017). However, Carrión-Flores and Irwin (2004) and Hu and Lo (2007) indicate negative associations, as does this study. The employment ratio is consistently estimated to have a positive effect (Deng and Srinivasan, 2016; Hu and Lo, 2007).
The share of single-family residential areas is negatively associated with urban growth in this study and in the regression results of Jokar Arsanjani et al. (2013) and Tepe and Guldmann (2017), in contrast to the positive effects in Carrión-Flores and Irwin (2004) and Tepe and Guldmann (2020). Tepe and Guldmann (2020) provide insight into the impacts of the share of residential areas, where the percentage of residential areas is positively associated with single-family residential parcels and other residential units. However, the relationship is negative with other land uses. The impacts of recreational areas on land development are always positive (Jokar Arsanjani et al., 2013). In this study, the effect of the agricultural land share is assessed as negative, consistent with the results in Deng and Srinivasan (2016) and Tepe and Guldmann (2020). However, other studies display positive associations (Carrión-Flores and Irwin, 2004; Irwin et al., 2003; Liao and Wei, 2014). Irwin et al. (2003) state that the positive sign is unexpected. There is a consensus on the sign of distance to major roads among previous studies (Carrión-Flores and Irwin, 2004; Hu and Lo, 2007; Huang et al., 2009; Jokar Arsanjani et al., 2013; Lavalle et al., 2011; Liao and Wei, 2014; Nahuelhual et al., 2012; Tepe and Guldmann, 2017, 2020). Spatial autocorrelation and the first temporal lag coefficient are also positively associated with land developments in several studies (Bhat et al., 2015; Deng and Srinivasan, 2016; Ferdous and Bhat, 2013; Huang et al., 2009; Nahuelhual et al., 2012; Tepe and Guldmann, 2017, 2020).
In this study, spatial weight matrices are constructed based on a predefined rule conceptualizing spatial relationships, similar to the previous studies discussed in this section. There is no theoretical basis for creating spatial relationships in land-use change modeling. Therefore, the conceptualization rule is identified based on an empirical analysis that minimizes the spatial autocorrelation in the residuals. The presence of spatial dependency in the residuals violates the statistical independence of observations (LeSage and Pace, 2009).
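Residual spatial autocorrelation under a candidate weight matrix can be measured, for example, with Moran's I. The text does not name the statistic used, so the choice of Moran's I is an assumption; the weights and residuals below are hypothetical.

```python
def morans_i(values, W):
    """Moran's I of `values` under spatial weights W (list of rows)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in W)  # sum of all weights
    num = sum(W[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den

# Row-standardized contiguity weights for four units on a line.
W = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]
clustered = morans_i([1.0, 1.0, -1.0, -1.0], W)     # similar neighbors
alternating = morans_i([1.0, -1.0, 1.0, -1.0], W)   # dissimilar neighbors
```

Values near zero indicate little remaining spatial structure, so a weight-matrix rule could be compared across candidates by how close the residual statistic is to zero.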
Conclusions
A DSPD model has been proposed and applied, including site-specific, proximity, neighborhood, socio-economic, and transportation variables. This model explicitly accounts for spatial and temporal dependencies. The overall model accuracy is almost 99%, and the prediction accuracy is 99.3%. Using historical data and temporally lagged covariates allows this model to predict future urban growth. A robust land development model can provide essential information to practitioners and policymakers about potential urban expansions. Such information will enable them to mitigate unexpected outcomes and manage resources effectively. For instance, the proposed modeling framework can predict land developments near environmentally sensitive areas.
Computations of marginal effects provide insight into the commonly used proxy information in LUC models. Proximities to major road infrastructure and employment ratio are the most impactful factors in urban growth after historical conditions in neighboring BGs. The results further emphasize the importance of incorporating temporal dimensions in LUC models.
There are four significant limitations to the proposed model: (1) the inability to control for more complex non-linear relationships in land development dynamics; (2) the use of only continuous response variables, while LUC models often contain discrete-response variables; (3) the limited scalability of the modeling framework because of computational challenges in computing the inverse of the spatial weight matrix, as required for parameter estimation and for predicting future urban growth; (4) the need to account for random and fixed effects in the DSPD modeling framework. These limitations provide future research opportunities to contribute to the existing literature.
Supplemental Material
Supplemental Material for History, neighborhood, and proximity as factors of land-use change: A dynamic spatial regression model by Emre Tepe in Environment and Planning B: Urban Analytics and City Science.
Footnotes
Acknowledgements
I would like to thank Dr. Jean-Michel Guldmann at Ohio State University for his valuable comments on this work. I also would like to thank anonymous reviewers for helpful comments.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
References
