Comparing connected vehicle and mobile phone data for urban mobility analysis: Examining mobility diversity and spatial interaction across income groups

Abstract

Connected vehicle (CV) data are increasingly available and widely used in transportation engineering for traffic monitoring, safety analysis, and infrastructure planning. However, the representativeness of CV data in general urban mobility analysis remains underexplored, raising concerns about potential biases between observed mobility patterns in CV data and actual travel behaviors, particularly across different demographic and socioeconomic groups. This study compares Wejo CV data with SafeGraph mobile phone records, which cover a broader population, to assess the representativeness of connected vehicle data in the context of urban mobility analysis. Using entropy-based measures of destination income diversity and normalized inter-neighborhood interaction strength, we examine how each dataset reflects mobility structures across income groups in San Antonio, Texas. Results show that SafeGraph data capture more multimodal and socially integrative travel behaviors, particularly among low-income communities, while Wejo data primarily reflect routine, vehicle-based movements concentrated among higher-income users. Interaction patterns in the CV data are more spatially clustered, with stronger flows observed within affluent neighborhoods. These differences underscore the behavioral and demographic selectivity embedded in CV data and point to important limitations when using such data to analyze mobility-based segregation. The findings contribute to ongoing efforts to evaluate the representativeness of emerging mobility datasets and their implications for urban spatial analysis.

Keywords

connected vehicle data, mobile phone data, urban mobility, income diversity, spatial interaction

Introduction

In recent years, with the advancement of Internet of Things technology, connected vehicles (CVs) have been widely adopted by households (Schoettle and Sivak, 2014; Shin et al., 2015). Equipped with built-in telematic systems, connected vehicles can transmit and receive real-time data about their location, performance, and surroundings, which enhances driver safety and mobility efficiency on roadways (Olia et al., 2016). More and more drivers are choosing connected cars as their daily mobility tools to reduce the risk of accidents and improve the driving experience (Bourne, 2024). The massive amount of connected vehicle data now being collected has opened new possibilities for understanding travel behavior and mobility patterns in urban transportation systems. Based on high-frequency recordings of vehicle trajectories and precise GPS-based tracking, CV datasets provide valuable insights for driving behavior analysis, driving safety estimation, and traffic signal timing optimization (Ali et al., 2020; Goodall et al., 2013). CV datasets also offer the potential to support broader investigations into urban mobility, such as identifying travel demand patterns, evaluating accessibility, and measuring spatial connectivity among neighborhoods (Lin et al., 2019). As CVs become increasingly integrated into everyday transportation, the data they generate provide a novel lens through which to observe and analyze how people utilize urban space.

Although CVs have provided a source of high temporal and spatial resolution data for many studies, the question of the representativeness of CV data in overall urban mobility remains uncertain. Connected vehicles are typically owned by individuals with newer and more expensive vehicles, which could be concentrated in higher-income neighborhoods or some low and moderate income families who can afford it (Shin et al., 2015). Factors such as vehicle cost, insurance, and household income lead to an uneven distribution of CV users across the population. Thus, the observed CV data may capture only a subset of urban travel behaviors, particularly individuals associated with private automobile ownership. This limitation raises important questions about the extent to which mobility patterns derived from CV data can reflect the travel characteristics of the overall urban population.

Since the trip records documented by CV data tend to come from individuals with specific socioeconomic backgrounds, the captured mobility patterns may reflect the behavior of a selective segment of the urban population. Although this introduces limitations for generalizing to the broader public, it also offers a valuable opportunity to investigate the travel behaviors of this distinct subgroup. By systematically comparing CV-based mobility patterns with those derived from more inclusive data sources, such as mobile phone records, it becomes possible to identify behavioral differences, assess the spatial and social limitations of CV users’ movements, and explore whether they exhibit unique or structurally constrained mobility patterns within the urban context. This comparison also creates an opportunity to examine the mobility segregation and spatial interaction preferences of the CV user group. By measuring the diversity of their travel destinations and the intensity of inter-neighborhood interactions, it is possible to assess whether CV users tend to move within socioeconomically homogeneous environments or maintain diverse mobility connections. Integrating neighborhood-level socioeconomic and built environment attributes allows us to identify the factors that may enhance or limit the spatial connectivity and social exposure of CV users. Such analysis not only contributes to a deeper understanding of the behavioral boundaries of connected vehicle users in urban space but also provides an empirical foundation for evaluating the applicability of CV data in transportation equity studies, mobility pattern analysis, and smart mobility planning.

This study aims to address the representativeness of connected vehicle data in the context of urban mobility analysis through the lens of destination-based diversity and spatial interaction. In this study, mobility refers to the aggregated intra-urban movements of residents, encompassing all trip purposes and times of day. Rather than focusing solely on daily commuting or specific trip types, we interpret mobility as a comprehensive reflection of how people traverse the city in their everyday lives. This broad definition allows both connected-vehicle and mobile-phone datasets to be analyzed under a consistent conceptual framework that emphasizes spatial representativeness over temporal or purpose-specific variations.

Through comparison with a more inclusive dataset of mobile phone-based location records, this study reveals the specific interaction patterns of the connected vehicle user group and the socioeconomic factors that drive the group’s interactions. Specifically, we examine connected vehicle data provided by Wejo and mobile phone data collected from SafeGraph in the city of San Antonio, Texas. We adopt the diversity of destinations by income group, measured by the Shannon entropy, and the normalized interaction strength between origin-destination pairs at the Census Block Group (CBG) level. These metrics are used to assess the extent to which connected vehicle users interact across socioeconomic boundaries or remain spatially and socially constrained in their travel behavior. Statistical models are employed to investigate the explanatory power of neighborhood-level socioeconomic attributes and built environment characteristics in shaping these patterns, allowing for direct comparison across data sources. This study provides empirical insights into the selective visibility embedded in connected vehicle datasets and highlights the implications of such bias for the interpretation of urban mobility structures.

Literature review

Urban mobility research has increasingly turned to mobile phone records (e.g., SafeGraph) to investigate travel flows and social interaction across cities. Although sampling bias in mobile phone data may result in activity characteristics not being extracted for certain segments of the urban population, their broad coverage and multimodal representation make them valuable for analyzing urban mobility patterns (Z. Li et al., 2024). Numerous studies have used origin-destination matrices derived from mobile phone data to reveal commuting patterns, accessibility, and population exposure to different social and demographic environments (Hu et al., 2021; Jiang et al., 2025; Sparks et al., 2022; Toole et al., 2012; Yabe et al., 2023). In particular, a growing number of studies have introduced entropy-based metrics to estimate destination diversity, quantifying the degree to which residents from a given neighborhood travel to areas with varying socioeconomic conditions (Abbasi et al., 2021; Huang et al., 2022; Iyer et al., 2024; Marin et al., 2022). For instance, Moro et al. (2021) analyzed mobility data from multiple U.S. cities and showed that people from neighborhoods with higher median income, higher education levels, and fewer Black residents are more likely to explore more places and to have more diverse destination preferences. Some studies have also investigated interaction strength, the intensity of mobility flows between neighborhoods, to understand spatial connectivity, potential social integration, and the effects of travel distance on interaction strength (Xu et al., 2022). These metrics are effective in revealing structural social integration or segregation embedded in urban mobility and have been widely used in studies addressing mobility inequality across demographic groups.
Overall, mobile phone data offer a relatively comprehensive description of urban travel behavior and spatial interaction, serving as an important benchmark for evaluating the representativeness of other mobility datasets (e.g., connected vehicle data), particularly in terms of sampling bias and behavioral selectivity (Jiang et al., 2025).

Connected vehicle data have drawn increasing attention in recent years as a valuable source for analyzing vehicle driving behavior, traffic performance and optimization, and traffic safety-related events. Generated by telematics-equipped vehicles, CV data capture high-frequency and GPS-accurate vehicular trajectories in real-world driving conditions, which provide good opportunities for applications such as real-time traffic monitoring, traffic signal optimization, emissions estimation, and crash or near-crash risk forecasting (Ali et al., 2020; Goodall et al., 2013; Li et al., 2024a; Sultana and Hassan, 2025; Xie et al., 2019). For example, connected vehicles, acting as moving sensors, can detect the traffic speed on road segments in real time, which facilitates dynamic traffic management and further improves traffic efficiency in cities (Gao et al., 2019; Khan et al., 2017). Many studies also leverage detailed heading and speed information from connected vehicle movements to detect near-crash events on roads, providing insights into driving behavior and road safety conditions that may not be captured through conventional crash reports (Islam and Abdel-Aty, 2023; Zhang and Abdel-Aty, 2022). However, the use of CV data in broader urban mobility research remains relatively limited, especially in relation to human travel behavior and socio-spatial interaction patterns.

Given its spatiotemporal granularity and large-scale availability, CV data offer significant potential for advancing urban mobility research. Based on the expectations of connected vehicle markets in the US, the connected vehicle user group will surpass 180 million by 2028, accounting for 70.9% of all licensed drivers across the country (Bourne, 2024). As adoption increases, the continuous and precise tracking of vehicles enables detailed reconstruction of origin–destination (OD) flows, providing an opportunity to investigate how mobility is structured across neighborhoods and social groups. Compared to traditional travel surveys, CV data can provide more temporally dense and route-specific information. However, despite these advantages, questions remain regarding the types of travelers CV data represent and how they compare with the general population in terms of socioeconomic characteristics and behavioral diversity.

Understanding the representativeness of CV data is particularly important when such data are used to estimate social segregation, accessibility, or policy impacts. Without a clear assessment of who is visible in CV datasets and how their mobility behaviors differ from the general population, there is a risk of drawing incomplete or skewed conclusions about mobility structures. This study contributes to addressing this gap by directly comparing mobility measures derived from CV and mobile phone data. In doing so, it highlights the demographic and behavioral specificity of CV users and evaluates the implications for using connected vehicle data in broader urban mobility research.

Study area, mobility datasets, and methodology

Study area

This study focuses on the city of San Antonio, the second-largest city in Texas and one of the fastest-growing urban regions in the United States. According to the 2020 U.S. Census, San Antonio’s population exceeded 1.4 million, of which more than 64% are Hispanic (Bureau, 2023b). The city exhibits a strong reliance on automobile travel, with over 70.5% of commuters using private vehicles and driving alone as their primary mode of transportation (Bureau, 2023a). Meanwhile, as one of the pilot cities for smart transportation, San Antonio’s transportation department has been expanding the application of telematics technology in the urban transportation system, attracting a large number of connected vehicle users (Antonio, 2019). Thus, the city of San Antonio is an ideal study area to address the representativeness of connected vehicle data in urban mobility analysis.

In this study, we concentrate on analyzing neighborhood-level interaction characteristics. Thus, we adopt the census block group (CBG) as the spatial analysis unit for aggregating mobility data, which is commonly used in mobility pattern analysis (Z. Li et al., 2024; Yang et al., 2023; Zhang et al., 2023). The CBG level provides a consistent spatial framework that allows integration of mobility records with census-based socioeconomic variables, such as median income, educational attainment, and demographic information. These variables represent the specific social and economic characteristics that could influence mobility diversity and interaction intensity, and they are used as explanatory factors in this study. Among the 1,047 CBGs, this study focuses on 940 CBGs that have valid observations in both the SafeGraph and Wejo datasets. In order to investigate the interaction patterns between neighborhoods with different income levels, we categorize CBGs into low-, middle-, and high-income communities. Following a categorization of income levels commonly used in existing studies (Clark and Fossett, 2008; Spiegel et al., 2025; Yip et al., 2016), we use the 30th ($51,376.60) and 60th ($78,019.60) percentiles of annual median household income as thresholds to classify the CBGs into three income levels. Specifically, CBGs with annual median household income below $51,376.60 are considered low-income communities, CBGs with median income higher than $78,019.60 are classified as high-income communities, and the remaining CBGs are middle-income communities. As shown in Figure 1, the majority of high-income neighborhoods are located in the northern suburban areas of the city, while low-income communities are mainly concentrated in the downtown and southern areas. This spatial differentiation offers a useful foundation for assessing the extent to which mobility diversity and interaction strength vary across income groups and urban contexts.

Figure 1.

Income-level categorization of census block groups in San Antonio.
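The three-level income classification described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the two thresholds quoted in the text and treats values exactly equal to a threshold as middle income; the CBG identifiers and income values are hypothetical.

```python
# Thresholds quoted in the text (annual median household income, USD).
LOW_MAX = 51_376.60
HIGH_MIN = 78_019.60


def classify_income(median_income: float) -> str:
    """Return 'low', 'middle', or 'high' for a CBG's median household income."""
    if median_income < LOW_MAX:
        return "low"
    if median_income > HIGH_MIN:
        return "high"
    return "middle"


# Hypothetical CBG identifiers and incomes, for illustration only.
example_cbgs = {"cbg_a": 32_000.0, "cbg_b": 68_750.0, "cbg_c": 120_000.0}
income_level = {cbg: classify_income(v) for cbg, v in example_cbgs.items()}
print(income_level)
```

The resulting mapping from CBG to income level is the lookup that the entropy and interaction measures rely on later.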

Mobility datasets: SafeGraph and Wejo connected vehicle datasets

To provide a broader context for understanding travel behavior, this study adopts the SafeGraph (SG) dataset as a background reference. Unlike connected vehicle data, SG offers insights into population movement across all transportation modes, allowing us to better reveal the specific preferences or biases in connected vehicle users’ travel patterns. We collected SG data in San Antonio from November 1, 2021, to January 3, 2022, yielding over 6 million records. Based on the raw SG records, we aggregate the phone users’ home places and their visited Points of Interest (POIs) at the census block group level to build an origin-destination (OD) matrix. This OD matrix is used to represent the overall population’s movement in San Antonio and serves as a baseline for comparison with the connected vehicle user group.
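The aggregation step can be illustrated as follows. This is only a sketch: the record layout (home CBG, destination CBG, visit count) is an assumption made for the example and is not the SafeGraph schema, and the function name `build_od_matrix` is hypothetical.

```python
from collections import Counter


def build_od_matrix(records):
    """Count trips per (home CBG, destination CBG) pair.

    `records` is an iterable of (home_cbg, dest_cbg, n_visits) tuples.
    This layout is an assumption for illustration, not the SafeGraph schema.
    """
    od = Counter()
    for home_cbg, dest_cbg, n_visits in records:
        od[(home_cbg, dest_cbg)] += n_visits
    return od


# Toy records: the same OD pair can appear several times (e.g., different POIs).
toy_records = [("A", "B", 3), ("A", "B", 2), ("A", "C", 1), ("B", "A", 4)]
od_matrix = build_od_matrix(toy_records)
print(dict(od_matrix))
```

Storing the matrix sparsely, keyed by OD pair, keeps only the pairs that were actually observed, which matters when most of the roughly 940 × 940 possible pairs carry no trips.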

Alongside the general-purpose mobile phone data, we use anonymized connected vehicle (CV) trajectory data collected by Wejo Group Ltd. The Wejo dataset mainly captures privately owned connected passenger vehicles equipped with in-vehicle telematics systems. It does not include on-demand or fleet-based connected vehicles, such as ride-hailing, taxi, or delivery services. Therefore, the connected vehicle data in this study reflect the travel behavior of private vehicle users, which serves as a consistent basis for comparison with the population-level mobility patterns derived from SG data. The Wejo dataset contains high-resolution trajectory records collected via telematics devices, providing detailed spatial-temporal information. It covers the same period as the SG dataset (November 1, 2021, to January 3, 2022) and includes over 14 million raw trajectories. To ensure the analysis focuses on intra-urban mobility of San Antonio residents, we retain only trajectories that start from residential areas and end within the study boundary, resulting in approximately 5.6 million trips. We use the USA Structures dataset, released by the Federal Emergency Management Agency (FEMA), to map the starting locations of Wejo trajectories and confirm that the selected trajectories start from residential places.

While the two datasets differ in population coverage and sensing mechanisms, a consistent spatial boundary is critical for meaningful comparison in the following analysis. In this study, the focus is placed on intra-urban mobility behaviors of residents in San Antonio, rather than regional commuting across the metropolitan area. Therefore, both SG and Wejo trips are filtered to include only movements that originate from residential areas and terminate within the study area boundary. This design excludes the mobility patterns of external commuters or pass-through travelers whose trips could otherwise overestimate cross-income interaction strength or distort entropy-based diversity measures. With this consistent boundary setting, both datasets represent comparable resident-based travel patterns within the city. In addition to spatial consistency, both datasets are temporally aggregated over the same study period, which covers regular weekdays, weekends, and several public holidays. This aggregation provides a stable representation of urban mobility by smoothing short-term variations and highlighting the broader spatial patterns of residents’ travel behaviors. Accordingly, the datasets include trips for all purposes and times of day, instead of distinguishing specific daily or commuting patterns, in order to capture the overall structure of residents’ intra-urban movements.

It is possible that a small portion of users appear in both datasets; for example, some connected vehicle drivers may also carry mobile phones captured by SG. However, this potential overlap does not significantly affect the results of this study. SG data are used as a background reference to represent overall population mobility across all travel modes, while Wejo data describe the mobility of private connected vehicle users. The comparison between the two datasets is designed to reveal how connected vehicle data represent only a subset of the broader human mobility captured by mobile phone data, rather than to test their independence. Because both datasets are aggregated and anonymized at the census block group level, any individual-level overlap would have only a minor effect on the spatial patterns and statistical relationships discussed in this paper.

Methods

Destination diversity of income level

Based on the three income level categories, we calculate an entropy for each origin block group to measure the destination diversity of income levels. We adopt Shannon Entropy to represent the destination diversity, which is widely used in many studies (Marin et al., 2022; Zachary and Dobson, 2021). The entropy is computed as:

E_{i} = - \sum_{c = 1}^{C} p_{i c} \log (p_{i c})

where $E_{i}$ is the destination entropy of income level from origin block group $i$, $p_{i c}$ is the proportion of trips from block group $i$ to destination block groups belonging to income category $c$, and $C$ is the total number of income categories.

Higher entropy values suggest that trips from a block group are distributed evenly across low-, middle-, and high-income areas, reflecting diverse mobility patterns and greater socioeconomic exposure. Lower entropy values indicate that trips are concentrated within specific income categories, potentially showing limited mobility diversity or income-based segregation in travel behavior. This metric is calculated separately for the Wejo connected vehicle data and the SG mobile phone data, allowing us to compare how the populations observed in each data source interact with income-diverse areas.
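As a minimal sketch of this entropy calculation, the Python function below computes the measure for each origin from a sparse OD dictionary. It assumes the natural logarithm and treats categories with no trips as contributing zero; the function name, the input layout, and the toy data are illustrative assumptions rather than the study's implementation.

```python
import math
from collections import defaultdict


def destination_entropy(od, income_level):
    """Shannon entropy (natural log) of destination income categories per origin.

    od: dict mapping (origin, destination) -> trip count
    income_level: dict mapping block group -> income category label
    """
    trips_by_cat = defaultdict(lambda: defaultdict(float))
    for (origin, dest), trips in od.items():
        trips_by_cat[origin][income_level[dest]] += trips

    entropy = {}
    for origin, by_cat in trips_by_cat.items():
        total = sum(by_cat.values())
        if total <= 0:
            continue  # no trips recorded, entropy is undefined for this origin
        # Categories with zero trips contribute nothing (0 * log 0 is taken as 0).
        entropy[origin] = -sum(
            (n / total) * math.log(n / total) for n in by_cat.values() if n > 0
        )
    return entropy


# Toy data: origin A splits trips evenly over three income levels,
# origin B sends every trip to a low-income destination.
levels = {"L1": "low", "M1": "middle", "H1": "high"}
toy_od = {("A", "L1"): 10, ("A", "M1"): 10, ("A", "H1"): 10, ("B", "L1"): 7}
E = destination_entropy(toy_od, levels)
print(E)
```

In the toy data, origin A reaches the maximum possible value for three categories, ln 3, while origin B scores zero because all of its trips end in a single income category.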

Normalized interaction strength

In addition to analyzing destination diversity, we quantify the strength of mobility interactions between different census block groups through a normalized interaction strength metric (Xu et al., 2022). This measure captures the relative strength of connections between origin-destination pairs, while accounting for differences in the total travel volume of each block group.

Given any two block groups $B_{i}$ and $B_{j}$, the normalized interaction strength ($S_{i, j}$) between them is defined as the total number of OD trips between $B_{i}$ and $B_{j}$, normalized by the product of the total trips generated from these two block groups:

S_{i, j} = \frac{\mathrm{Trips} (B_{i}, B_{j})}{N (B_{i}) \cdot N (B_{j})}

Here, $\mathrm{Trips} (B_{i}, B_{j})$ denotes the total number of trips between block groups $B_{i}$ and $B_{j}$. Trips that start and end in the same block group are not counted in this calculation. $N (B_{i})$ and $N (B_{j})$ refer to the total number of trips starting from residential areas in $B_{i}$ and $B_{j}$, respectively. In other words, $N (B_{i}) \cdot N (B_{j})$ represents the potential volume of trips between the two block groups, and $S_{i, j}$ measures the relative likelihood of interaction between them.

This normalization controls the activity level differences between block groups, allowing us to compare the relative strength of interactions across block pairs on an equal footing. By focusing on normalized values, we can better isolate the effects of socioeconomic and spatial attributes on mobility connectivity, independent of overall travel volume. By analyzing the interaction strength between block groups, we investigate how socioeconomic differences, geographic distance, and built environment characteristics shape the intensity of mobility flows.
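The normalization can be sketched as below. Two details are assumptions made for this illustration, since the text does not pin them down: trips in both directions are summed in the numerator, and the trip totals in the denominator count every trip starting in a block group, including trips that stay inside it. The function name is hypothetical.

```python
from collections import defaultdict


def normalized_interaction_strength(od):
    """Return {(i, j): S_ij} for unordered block group pairs.

    od: dict mapping (origin, destination) -> trip count.
    Assumptions: the numerator sums both directions; N(B) counts every trip
    starting in B, including trips that stay inside B.
    """
    outgoing = defaultdict(float)
    pair_trips = defaultdict(float)
    for (origin, dest), trips in od.items():
        outgoing[origin] += trips
        if origin != dest:  # same-block-group trips are excluded from the numerator
            pair_trips[tuple(sorted((origin, dest)))] += trips

    strength = {}
    for (i, j), trips in pair_trips.items():
        denom = outgoing[i] * outgoing[j]
        if denom > 0:  # skip pairs where one side generated no trips
            strength[(i, j)] = trips / denom
    return strength


toy_od = {("A", "B"): 6, ("B", "A"): 4, ("A", "A"): 4, ("B", "C"): 6}
S = normalized_interaction_strength(toy_od)
print(S)
```

In the toy data, A and B each generate 10 trips and exchange 10 trips in total, so their strength is 10 / (10 × 10) = 0.1. The pair (B, C) is dropped because C generates no trips and the denominator would be zero.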

Regression models

We employ two independent models to identify how local socioeconomic, demographic, and built environment factors relate to mobility diversity and interaction intensity. For each model, we run separate regressions on the SafeGraph and Wejo datasets, using the same CBG-level explanatory variables. This design enables a direct comparison of the coefficient directions and significance across the two datasets, revealing whether connected vehicle data capture similar spatial relationships as population-wide mobility. The diversity model examines how neighborhood characteristics explain variations in the income heterogeneity of travel destinations, whereas the interaction model focuses on how differences in these factors influence the strength of inter-CBG connections.

The first model focuses on mobility diversity, represented by the income-based entropy of travel destinations at each origin CBG. This model is designed to capture how neighborhood socioeconomic and built-environment characteristics shape the heterogeneity of residents’ travel destinations, serving as an indicator of the spatial mixing across income groups. We regress this entropy value on a set of socioeconomic, demographic, and built environment characteristics, specified as:

E_{i} = β_{0} + \sum_{k = 1}^{K} β_{k} X_{i k} + ϵ_{i}

where $E_{i}$ denotes the destination income-level entropy of block group $i$, $X_{i k}$ represents the $k$-th attribute of block group $i$, and $ϵ_{i}$ is the error term. This model enables us to assess which neighborhood attributes are associated with destination diversity of income levels. Given that ordinary least squares (OLS) models do not account for spatial dependencies, we additionally employed spatial lag models to more accurately capture the influence of neighboring areas and better explain the spatial patterns of travel diversity. In spatial lag models, spatial dependence is represented through a spatial weight matrix constructed using the eight nearest neighbors based on centroid distances between CBGs. This k-nearest-neighbor structure ensures that each unit is spatially connected while maintaining a balanced neighborhood size across the study area (Guo et al., 2012; Xian et al., 2022). Model parameters are estimated using a maximum-likelihood approach, and residual spatial autocorrelation is further examined with Moran’s I statistics to confirm the adequacy of the specification.
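To make the weight construction and the residual check concrete, the sketch below builds a row-standardized k-nearest-neighbor weight structure from centroid coordinates and computes global Moran's I. It does not estimate the spatial lag model itself, which would normally be done with a dedicated spatial econometrics library. Row standardization, the function names, and the toy coordinates are assumptions made for illustration.

```python
import math


def knn_weights(centroids, k=8):
    """Row-standardized k-nearest-neighbor weights from centroid coordinates.

    centroids: list of (x, y) tuples in a projected coordinate system.
    Returns a list of dicts where weights[i][j] is the weight of j for unit i.
    """
    weights = []
    for i, (xi, yi) in enumerate(centroids):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(centroids)
            if j != i
        )
        neighbors = [j for _, j in dists[:k]]
        weights.append({j: 1.0 / len(neighbors) for j in neighbors})
    return weights


def morans_i(values, weights):
    """Global Moran's I of `values` (e.g., regression residuals)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(w for row in weights for w in row.values())
    num = sum(
        w * dev[i] * dev[j] for i, row in enumerate(weights) for j, w in row.items()
    )
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


# Toy example: ten units on a line, two neighbors each (the study uses k = 8).
points = [(float(i), 0.0) for i in range(10)]
W = knn_weights(points, k=2)
smooth = [float(i) for i in range(10)]                     # spatially clustered
alternating = [1.0 if i % 2 == 0 else -1.0 for i in range(10)]  # checkerboard
print(morans_i(smooth, W), morans_i(alternating, W))
```

A clearly positive value, as for the smooth gradient, signals that similar values cluster in space; a value near zero in the model residuals is what one would hope to see after the spatial lag term has absorbed the dependence.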

The second model examines the determinants of mobility interaction strength between pairs of block groups. For each origin-destination pair, the normalized interaction strength is regressed on differences in socioeconomic and built environment attributes, as well as geographic distance, formulated as:

S_{i j} = α_{0} + \sum_{m = 1}^{M} α_{m} Δ X_{i j m} + \sum_{n = 1}^{N} γ_{n} Z_{i j n} + μ_{i j}

where $S_{i j}$ represents the normalized interaction strength between block groups $i$ and $j$, $Δ X_{i j m}$ captures the differences in socioeconomic attributes (e.g., income difference and education difference) between the two block groups, $Z_{i j n}$ includes control variables such as the geographic distance between $i$ and $j$, and $μ_{i j}$ is the error term. This model allows us to evaluate how socioeconomic disparities and spatial separation affect the intensity of mobility connections.

Both regression models use continuous dependent variables that summarize relative variations in mobility behavior rather than raw trip frequencies. The income entropy index captures the diversity of income groups among destinations and ranges from 0 to ln(C), where C is the number of income categories, while the interaction strength index measures the normalized connection intensity between CBG pairs on a 0–1 scale. Because these indicators are derived from proportional and aggregated measures, the relationships between them and neighborhood characteristics are estimated through a linear regression framework. This approach allows the coefficients to be interpreted as changes in mobility diversity or interaction intensity associated with socioeconomic and built environment factors across neighborhoods.

To provide a structured analytical framework, all explanatory variables are grouped into three major dimensions: socioeconomic characteristics, built environment attributes, and spatial context factors. The socioeconomic dimension captures neighborhood-level demographic and economic attributes that may influence mobility behavior. For each CBG, eight socioeconomic factors are adopted in this study: median household income, median housing price, percentage of low-education residents (those without a high school diploma), percentage of Hispanic population, percentage of Black population, percentage of working-age population (aged 18–64 years), percentage of households owning a vehicle, and population density. The built environment dimension represents physical and functional characteristics of neighborhoods. Two key indicators are calculated: the density of commercial areas and land-use diversity (measured by Shannon entropy). Finally, the spatial context dimension is reflected in the inter-CBG distance variable, which captures potential spatial restrictions between neighborhoods. These variables are closely related to human mobility patterns in cities (Liao et al., 2025; Van Ham et al., 2016). This classification clarifies each variable’s conceptual role and supports comparison across datasets in relation to socioeconomic, environmental, and locational factors. Table 1 shows the general distribution of explanatory variables in the study area.

Table 1.

Distribution of demographic, socioeconomic, and built environment variables at CBG level.

Variable	Data source	Dimension of analysis	Min.	Max.	Mean	Median	Std.
Distance (KM)	Calculated	Spatial context	0.0019	53.6640	11.5314	9.6288	8.1659
Median income (USD)	ACS	Socioeconomic	11,250.00	250,001.00	75,920.25	68,750.00	39,374.61
House price (USD)	ACS	Socioeconomic	9,999.00	1,200,000.00	226,617.00	190,950.00	142,662.34
Ownership of vehicles (%)	ACS	Socioeconomic	0.0000	1.0000	0.9436	0.9787	0.0867
Pop. density (person/KM²)	ACS	Socioeconomic	0.0244	8.4433	1.9840	1.7882	1.2512
Working age (%)	ACS	Socioeconomic	0.0000	1.0000	0.6151	0.6078	0.1149
Low education (%)	ACS	Socioeconomic	0.0000	0.7700	0.1600	0.1100	0.1500
Hispanic (%)	ACS	Socioeconomic	0.0000	1.0000	0.6316	0.6315	0.2417
Black (%)	ACS	Socioeconomic	0.0000	0.7434	0.0678	0.0299	0.0952
Commercial density	FEMA	Built environment	0.0000	412.7200	31.6835	19.7156	31.7453
Diversity of land use	FEMA	Built environment	0.0000	1.6024	0.3202	0.2584	0.2562

Results

Descriptive analysis

The first step in our analysis is to examine whether the two mobility datasets capture comparable patterns of urban travel behavior. Understanding how closely these datasets align is important not only to validate subsequent comparative analyses but also to highlight potential differences in their representativeness. To ensure a consistent basis for comparison, the correlation between SG and Wejo OD flows is calculated using only the overlapping origin–destination (OD) pairs that appear in both datasets. OD pairs observed in only one dataset are excluded from the computation to avoid bias caused by differences in spatial coverage or sampling density. This intersection-based approach allows the comparison to focus on mobility structures present in both datasets rather than areas where one dataset lacks observations. Figure 2 illustrates the correlation of the number of interactions between the SG and Wejo datasets at the CBG level. Figure 2(a) presents the overall correlation between OD pair interactions captured by Wejo and SG. The analysis reveals a strong positive correlation (R² = 0.7569), indicating that the general structure of mobility flows is consistent between the two datasets. This result suggests that the overall number of interactions between neighborhoods captured by connected vehicle data is similar to that observed in mobile phone data. However, when focusing on the number of interactions between different income levels, significant variation is observed in the correlation between the two datasets.

Figure 2.

Correlation of trip counts between SG and Wejo datasets.
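The intersection-based comparison described above can be sketched as follows. The example keeps only OD pairs present in both datasets and reports the squared Pearson correlation of their trip counts, which equals the R² of a simple linear fit; whether the study applied any transformation (for example, a log scale) before fitting is not stated, so this is an assumption. The function name and toy flows are hypothetical.

```python
def r_squared_on_shared_pairs(od_a, od_b):
    """Squared Pearson correlation of trip counts over OD pairs present in both.

    od_a, od_b: dicts mapping (origin, destination) -> trip count.
    Returns (r_squared, list_of_shared_pairs).
    """
    shared = sorted(set(od_a) & set(od_b))
    xs = [od_a[p] for p in shared]
    ys = [od_b[p] for p in shared]
    n = len(shared)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy), shared


# Toy flows: three pairs are shared; one pair is unique to each dataset.
od_sg = {("A", "B"): 10, ("A", "C"): 20, ("B", "C"): 30, ("C", "A"): 5}
od_wejo = {("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 6, ("D", "A"): 9}
r2, shared_pairs = r_squared_on_shared_pairs(od_sg, od_wejo)
print(r2, shared_pairs)
```

Repeating the same calculation on subsets of pairs grouped by origin and destination income level would give one value per income-group combination.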

The nine panels of Figure 2(b) illustrate correlations for each income group pair. Generally, the correlations remain robust, with R² ranging from approximately 0.52 to 0.78. Notably, the highest alignment appears in interactions involving high-income block groups, such as the high-income to high-income (R² = 0.7823) and high-income to middle-income (R² = 0.7759) flows. This suggests that mobility patterns in higher-income areas are particularly well-represented in the connected vehicle dataset. One possible explanation is that high-income residents are more likely to own private vehicles and actively use connected vehicle technologies, contributing to a stronger correlation. In contrast, lower correlation values are observed for interactions involving low-income block groups, for instance, high-income to low-income pairs (R² = 0.5165) and low-income to high-income pairs (R² = 0.6916). A possible reason is that low-income residents are less likely to own newer private vehicles with telematic technologies, leading to their underrepresentation in the connected vehicle dataset. Additionally, low-income individuals may rely more on alternative travel modes such as public transportation or carpooling, which are not reflected in the Wejo dataset but are more comprehensively captured through mobile phone data. Such correlation differences across income groups point to the limitations of connected vehicle data in mobility analysis of lower-income groups. We also investigate the travel distance distributions of the two datasets. We find that both datasets show the same distance-based pattern in which high-income origins travel farther and low-income origins make shorter, localized trips, with SafeGraph capturing slightly longer and more variable distances due to its broader modal coverage. Full statistical results and distance distribution comparisons are provided in Supplemental S1.

These findings confirm that both datasets reveal consistent patterns of distance-based inequality in urban mobility. Higher-income communities tend to travel farther, on average, while lower-income neighborhoods exhibit more localized movement, similar to the findings of Liao et al. (2025). This alignment is further supported by the strong correlation in OD flows across the two datasets, particularly within high-income groups, implying that both SG and Wejo capture a similar structure of urban travel behavior. Even in low-income areas, where data representation is often a challenge, the observed distance distributions of the two datasets remain close, indicating that both sources reflect key socioeconomic mobility dynamics.

Factors related to diversity of destination at income levels

To examine how mobility behavior reflects socioeconomic exposure across space, we calculate the entropy of destination at income levels for each CBG, interpreted as the diversity of income categories visited from each origin block group. A higher entropy value indicates a more balanced distribution of trips across low-, middle-, and high-income destinations, suggesting broader socioeconomic connectivity in travel behavior. A lower value, in contrast, reflects a more segregated or restricted mobility pattern toward specific income groups.
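As a worked illustration of this measure, the sketch below computes the Shannon entropy H = −Σ p_k ln p_k over the shares p_k of trips from one origin CBG to low-, middle-, and high-income destinations. The function name, the toy trip counts, the use of the natural logarithm, and the convention of returning 0 for an origin with no trips are assumptions made for illustration; the study's exact formulation may differ.

```python
from math import log

INCOME_LEVELS = ("low", "middle", "high")


def destination_income_entropy(trips_by_level):
    """Shannon entropy (natural log) of trip shares across destination income levels."""
    total = sum(trips_by_level.get(level, 0) for level in INCOME_LEVELS)
    if total == 0:
        return 0.0  # assumed convention: no trips means no measurable diversity
    entropy = 0.0
    for level in INCOME_LEVELS:
        share = trips_by_level.get(level, 0) / total
        if share > 0:  # the term 0 * ln(0) is treated as 0
            entropy -= share * log(share)
    return entropy


# Hypothetical origin CBGs: one with balanced destinations, one concentrated.
balanced = destination_income_entropy({"low": 100, "middle": 100, "high": 100})
concentrated = destination_income_entropy({"low": 5, "middle": 10, "high": 285})

print(round(balanced, 4))      # equals ln(3), the maximum for three categories
print(round(concentrated, 4))  # much lower: most trips go to one income level
```

With three income categories the measure is bounded between 0 (all trips to a single income level) and ln 3 (trips split evenly), which makes values directly comparable across origin block groups.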

Figures 3(a) and 3(c) present the spatial distribution of destination income diversity calculated from the SG and Wejo datasets, respectively. Despite differences in data sources, both datasets exhibit a broadly consistent spatial pattern: entropy values are generally higher in areas close to downtown San Antonio and decline toward the suburban areas. This suggests that central neighborhoods tend to have more varied travel destinations in terms of income levels, while trips originating in outer areas are more homogeneous, likely reflecting mono-functional land use or socioeconomically isolated travel behavior. Notably, Figure 3(a) shows a more spatially widespread distribution of moderate-to-high entropy values in the SG dataset, while Figure 3(c) shows greater heterogeneity in the CV dataset, with more low-entropy patches appearing in peripheral and northern parts of the city. This may reflect the differing population coverage of the two datasets: SG captures a broader cross-section of travelers, while Wejo primarily reflects connected vehicle users whose travel tends to be more spatially and socioeconomically structured.

Figure 3.

Spatial distribution and cluster analysis of destination diversity across San Antonio.

To explore the spatial dependencies of destination diversity in income levels, we apply Local Moran’s I to identify statistically significant clusters of high and low entropy values. As shown in Figure 3(b), the high-entropy clusters in the SG dataset are mainly concentrated in the mid-northern neighborhoods of San Antonio, and the low-entropy clusters are located in the southern and parts of the northern neighborhoods of the city. In contrast, the cluster results for the CV dataset show a sharper north-south divide in Figure 3(d). Low-entropy clusters dominate the northern and suburban areas (where high-income communities are primarily located), and high-entropy clusters are more compact and central, largely concentrated around the downtown area. These differences in cluster results between the two datasets indicate that, compared to a dataset capturing a broader range of travel modes and user groups, the CV data may overestimate the income diversity of destinations for lower-income groups and may overemphasize the homogeneity of travel in high-income groups. These differences also imply that it is important to consider the representativeness of mobility data when interpreting the spatial structure of mobility-based social exposure.
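For readers unfamiliar with the statistic, the following sketch computes Local Moran’s I and the associated quadrant labels (HH, LL, HL, LH) with NumPy. It is an illustration under stated assumptions rather than the procedure used in the study: the six-unit chain of neighbors and the entropy values are hypothetical, the function name `local_morans_i` is ours, and the permutation test normally used to judge statistical significance is omitted.

```python
import numpy as np


def local_morans_i(values, weights):
    """Local Moran's I and quadrant labels, using row-standardized weights."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Row-standardize so that each unit's neighbor weights sum to 1.
    row_sums = w.sum(axis=1, keepdims=True)
    w = np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)
    z = x - x.mean()               # deviation from the global mean
    m2 = (z ** 2).sum() / len(x)   # second moment of the deviations
    lag = w @ z                    # weighted mean deviation of the neighbors
    local_i = z * lag / m2
    # Quadrant: own value (High/Low) followed by neighbors' average (High/Low).
    labels = np.where(
        z >= 0,
        np.where(lag >= 0, "HH", "HL"),
        np.where(lag >= 0, "LH", "LL"),
    )
    return local_i, labels


# Hypothetical entropy values for six CBGs arranged in a chain (0-1-2-3-4-5).
entropy_values = [1.0, 0.9, 0.8, 0.3, 0.2, 0.1]
adjacency = np.zeros((6, 6))
for i in range(5):
    adjacency[i, i + 1] = adjacency[i + 1, i] = 1.0

local_i, labels = local_morans_i(entropy_values, adjacency)
print(labels.tolist())
print(local_i.round(3).tolist())
```

With row-standardized weights, the mean of the local values equals the global Moran’s I, so the local statistic can be read as a decomposition of overall spatial autocorrelation into contributions from individual block groups.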

We further build four regression models to investigate how this diversity is shaped by neighborhood-level socioeconomic and built environment characteristics in the two datasets, both for all CBGs combined and for each income group. The regression results are summarized in Table 2. In the overall models, several key predictors emerge consistently across the two datasets. Median income shows a negative association with entropy in both datasets, with a noticeably larger effect in the Wejo dataset (−0.0106) than in SG (−0.0037). This stronger negative association suggests that in connected vehicle data, individuals from higher-income neighborhoods are even more likely to engage in income-segregated travel patterns, preferring destinations similar to their own income level. Lower educational attainment is strongly associated with lower entropy in both the SG and CV datasets, suggesting that communities with greater educational disadvantage tend to engage in more socioeconomically homogeneous travel. The percentage of working-age population (aged 18–64) is positively associated with destination entropy in both datasets, indicating that neighborhoods with more residents in the labor force exhibit more diverse travel behavior, possibly due to a broader range of work-related destinations. In the CV data, higher shares of Hispanic and Black residents are associated with greater destination income diversity. In contrast, higher rates of multi-vehicle ownership correspond to lower diversity, reflecting the distinct, vehicle-dependent mobility patterns of CV users compared with the broader population. We provide a more detailed discussion of the regression results in Supplemental S2.

Table 2.

Regression results for income diversity and explanatory variables.

Census block groups	Overall		Low-income		Middle-income		High-income
Census block groups	SafeGraph (R² = 0.315)	Wejo (R² = 0.373)	SafeGraph (R² = 0.411)	Wejo (R² = 0.145)	SafeGraph (R² = 0.223)	Wejo (R² = 0.124)	SafeGraph (R² = 0.107)	Wejo (R² = 0.342)
Const	1.0737***	0.8591***	0.7281**	1.1878***	1.5379***	0.7619***	1.1343***	0.8245***
Median income	−0.0037*	−0.0106***
Hispanic		0.1924***	−0.2759***	−0.1138*		0.2335***		0.1831**
Low education	−0.2337***	−0.1161**	−0.1282**	−0.0837*	−0.097*		−0.1471*
Aging (18–64)	0.0957**	0.094*				0.2001*
Black		0.2513***				0.2521**		0.2489*
Median house price	0.12***		0.1334***	0.0601*	0.1178**			−0.1652*
Population density		0.0804***	−0.0655**				0.0558**	0.1578***
Multi. vehicles		−0.0704*						−0.2526***
Land use diversity		0.0688*	−0.1042**					0.212**
Commercial area density			0.0005**

* indicates 0.01 < p < 0.05, ** indicates 0.001 < p < 0.01, *** indicates p < 0.001, and blank cells indicate the results are not statistically significant.

However, residual analysis of the ordinary least squares (OLS) models revealed statistically significant spatial autocorrelation according to Moran’s I test (SG: I = 0.80, p < 0.01; Wejo: I = 0.79, p < 0.01). This indicates that unobserved spatial processes or omitted spatial structure may not be well captured by the standard linear models, suggesting the need for spatial econometric approaches. Thus, we further adopt a spatial lag model, which explicitly incorporates spatial interactions between adjacent CBGs through a spatially lagged dependent variable. We use the eight nearest neighbors of each CBG to construct a row-standardized spatial weights matrix. As shown in Figure 4, compared to the OLS model, the spatial lag model substantially improves model fit (SG: pseudo-R² from 0.315 to 0.815; Wejo: from 0.373 to 0.758), suggesting that accounting for spatial dependence is crucial in modeling urban travel diversity. The spatial lag coefficient (ρ) is large and statistically significant in both models (SG: 0.874, p < 0.001; Wejo: 0.853, p < 0.001), indicating strong spatial spillover: destination diversity in one CBG is influenced by that in neighboring CBGs. While some coefficients appear small in magnitude, the total effects (including spatial feedback) are substantially amplified. For instance, median income in the SG model has a total effect of −0.0132, compared with an OLS coefficient of −0.0037. The difference between the SG and Wejo models further reinforces the differing representativeness of the two datasets. SG’s total effects are stronger for variables such as multi-vehicle ownership and education, whereas Wejo reflects more spatial clustering and attenuation of built environment effects.
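To make the amplification through spatial feedback concrete, the sketch below builds a row-standardized weights matrix from the eight nearest neighbors and evaluates the average total effect of a covariate in a spatial lag model y = ρWy + Xβ + ε. In the standard spatial econometrics formulation, the total effect of covariate k averages the entries of (I − ρW)⁻¹β_k, which reduces to β_k / (1 − ρ) when W is row-standardized. We assume, without confirmation from the text, that the reported total effects follow this definition. The coordinates are randomly generated, the function names are ours, and the value of `beta` is hypothetical and is not a coefficient from the study.

```python
import numpy as np


def knn_row_standardized_weights(coords, k):
    """Row-standardized spatial weights from each unit's k nearest neighbors."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a unit is never its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    w = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    w[rows, neighbors.ravel()] = 1.0 / k  # each row sums to 1
    return w


def average_total_effect(beta, rho, w):
    """Average row sum of (I - rho*W)^-1 multiplied by beta."""
    n = w.shape[0]
    multiplier = np.linalg.inv(np.eye(n) - rho * w)
    return beta * multiplier.sum() / n


rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(50, 2))  # hypothetical CBG centroids
w = knn_row_standardized_weights(coords, k=8)

rho = 0.874    # SG spatial lag coefficient reported in the text
beta = -0.001  # hypothetical coefficient, NOT a value from the study
total = average_total_effect(beta, rho, w)

print(total, beta / (1 - rho))  # the two values agree up to rounding
```

With ρ close to 0.87, the multiplier 1 / (1 − ρ) is roughly 8, which illustrates how a coefficient that looks small in the fitted model can correspond to a much larger total effect once feedback through neighboring CBGs is included.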

Figure 4.

General regression results of the spatial lag model with total effects of explanatory variables on income diversity.

In summary, the regression models confirm and enrich the spatial clustering results while also revealing differences in data representativeness. SG data, which captures a broader and more general population through mobile phone signals, shows stronger associations with demographic factors in low-income neighborhoods. In contrast, CV data, derived from connected vehicle trajectories, better captures mobility patterns in high-income, vehicle-dependent communities. These differences reflect the inherent sampling biases and user profiles of each dataset and emphasize the value of using both sources to understand mobility diversity across socioeconomic contexts.

Impact of socioeconomic, demographic, and built environment on interaction strength across datasets

In this section, we examine how socioeconomic and spatial disparities shape interaction patterns between neighborhoods in the SG and CV datasets. First, we divide the normalized interaction strength into three categories: overall interactions, interactions within the same income level, and interactions between different income levels. Then, we fit a regression model for each category to reveal how socioeconomic and built environment differences affect interaction strength. Based on the regression results for the two datasets (Table 3), we can further assess the representativeness of the SG and CV datasets for mobility analysis.

Table 3.

Regression results for interaction strength and explanatory variables (overall and intra-group interactions).

Income group	Overall		LL		MM		HH
Income group	SG (R² = 0.469)	Wejo (R² = 0.417)	SG (R² = 0.283)	Wejo (R² = 0.301)	SG (R² = 0.451)	Wejo (R² = 0.445)	SG (R² = 0.397)	Wejo (R² = 0.451)
Const	−4.9454***	−4.8868***	−4.4952***	−4.0437***	−3.9946***	−3.9585***	−4.5225***	−3.9349***
Distance	−1.1063***	−1.2257***	−0.7979***	−0.9968***	−1.074***	−1.2733***	−1.1921***	−1.377***
$Δ$ Median income	−0.0244***		−0.0408***	−0.0505***	−0.0327**		0.0228*	−0.0356***
$Δ$ Median house price	−0.0146***	0.0197***	−0.027***	0.0106***		−0.0175***	−0.0827***	0.0128**
$Δ$ Low education		0.0086***		−0.0199***	0.0341***	0.0394***	−0.0244**	0.0291***
$Δ$ Aging (18–64)	0.0364***	−0.0267***			0.0736***	−0.0367***	0.0327**	−0.0276***
$Δ$ Population density	−0.065***	0.0633***	−0.0262**	0.0649***	−0.0275*	0.1006***	−0.1058***	0.0878***
$Δ$ Black	−0.0169***	−0.0335***	0.0158***	0.0217***	−0.0803***	−0.0566***	−0.0497***	−0.0205***
$Δ$ Hispanic	0.0358***	−0.0343***	0.0512***		0.035**		0.1307***	0.0228***
$Δ$ Commercial area density	0.0837***	−0.0704***	0.0493***	−0.0358***	0.0825***	−0.0476***	0.189***	−0.0795***
$Δ$ Land use diversity	0.0923***	−0.0574***	0.0891***	−0.0442***	0.1021***	−0.0777***	0.1513***	0.0292***

* indicates 0.01 < p < 0.05, ** indicates 0.001 < p < 0.01, *** indicates p < 0.001, and blank cells indicate the results are not statistically significant.

Based on the regression results, SG data reveal greater interaction among neighborhoods that are socioeconomically similar and diverse in their built environment, indicating broader and more socially integrative mobility. In contrast, Wejo interactions are more segmented. Specifically, CV users exhibit stronger ties within affluent, vehicle-oriented areas and weaker connections in socially or spatially diverse neighborhoods. Spatial distance remains the strongest negative factor across both datasets. Full variable-level results for overall, within-group, and cross-group interactions are provided in Supplemental S3.

Moreover, we estimate regression models for inter-group interactions to understand how socioeconomic disparity shapes mobility across social boundaries, and how these relationships differ between the SG and Wejo data sources. Full regression results are provided in Supplemental S4. Based on the regression results shown in Table 1 of Supplemental S4, spatial distance remains a strong and negative predictor of interaction strength, consistent with prior findings. Beyond this, several socioeconomic disparities show meaningful contrasts. In both datasets, larger differences in median income and educational attainment are associated with stronger cross-income interactions, suggesting the presence of functional linkages such as commuting from lower- to higher-income neighborhoods. However, the effects of differences in demographic composition and built environment characteristics diverge between SG and Wejo. SG generally captures stronger cross-income connections in neighborhoods with varied land-use patterns or greater social diversity, while Wejo shows weaker and more selective interaction patterns, reflecting the mode- and population-specific nature of connected-vehicle data.

Conclusion, discussion, and limitations

This study compares connected vehicle and mobile phone datasets to evaluate their effectiveness in capturing urban mobility patterns, with particular attention to destination diversity and spatial interaction across income groups. While both data sources reveal broadly consistent patterns in aggregate flows and travel distance stratified by income, deeper analysis uncovers important divergences rooted in data representativeness and user behavior. These differences carry meaningful implications for how each dataset reflects urban social structures and mobility inequalities (Wang et al., 2018).

First, our findings highlight that connected vehicle data reflect a more behaviorally specialized and demographically selective subset of the urban population. CV users are disproportionately drawn from higher-income households with access to newer, telematics-equipped vehicles, which contributes to strong correlation in high-income OD flows and longer average trip distances for affluent communities. In contrast, the SafeGraph mobile phone dataset offers a more inclusive representation of travel behavior across income levels, capturing a broader diversity of travel modes and user demographics. However, as Shelton et al. argue, the apparent inclusiveness of digital trace data should be critically examined in light of structural biases in who generates and who is visible within these datasets (Shelton et al., 2015). This distinction becomes particularly evident in the modeling of destination income diversity, where SafeGraph explains significantly more variation among low-income communities, whereas CV data exhibit greater explanatory power in high-income areas. This reflects ongoing concerns in the mobility data literature about how data infrastructures systematically privilege certain populations while obscuring others (Zook and Graham, 2007).

Second, the spatial clustering of destination diversity reveals fundamental differences in the geographic distribution of social exposure. While both datasets identify higher diversity in central urban areas, the CV data display sharper spatial segmentation, with lower entropy concentrated in suburban, high-income zones. These findings are consistent with the spatial mismatch hypothesis (Kain, 1968; Ong and Blumenberg, 1998), which suggests that physical distance and infrastructure limit cross-group access to opportunities. These patterns suggest that CV users may be less likely to engage in cross-income travel than the general population. Such spatial homogeneity is further reinforced by regression results showing that multi-vehicle ownership and high median house prices are associated with lower diversity of destinations in the CV data, an indication of routine, car-dependent, and potentially socially insulated travel behavior.

Third, the analysis of interaction strength further differentiates the datasets. Although spatial distance consistently serves as a key mobility constraint in both, other socioeconomic and demographic factors exhibit contrasting effects. SafeGraph data point to higher interaction between demographically and socioeconomically similar areas, consistent with mobility bounded by residential segregation. Yet, they also capture stronger intergroup interaction where built environment diversity is high, suggesting that urban complexity fosters cross-boundary mobility. This supports recent conceptualizations of mobility as a form of social exposure and integration (Liao et al., 2025; Wang et al., 2018), where movement through diverse urban spaces contributes to contact across group boundaries. Conversely, CV data show a greater degree of segmentation, with negative associations between interaction strength and land use diversity, Hispanic population differences, and commercial activity differences. This may reflect not only behavioral specialization but also biases in data coverage, which are known to affect mobility estimates and have implications for downstream modeling (Schlosser et al., 2021). While some of this segmentation may arise from users’ more destination-specific or commuting-driven patterns, the possibility of underrepresentation in diverse or marginalized communities cannot be ruled out.

Finally, our disaggregated regression models provide additional insights into the behavioral structure of mobility across income groups. Cross-income mobility is positively associated with income and educational disparities, especially in the CV dataset, which may reflect commuting flows from lower- to higher-income areas. However, demographic differences, particularly in the share of Black residents, consistently reduce cross-income interaction in both datasets, pointing to the persistent role of racial homophily in shaping spatial exposure and reinforcing urban segregation (Athey et al., 2019; Vachuska, 2023). Notably, the SafeGraph data more consistently capture the integrative effects of urban form and social diversity, while the CV data highlight the influence of travel mode and user selectivity in shaping mobility patterns.

Several of the observed patterns, such as the concentration of connected-vehicle trips in higher-income neighborhoods, align with expectations; however, their empirical confirmation offers valuable evidence for evaluating the representativeness of emerging mobility data sources. The comparison between SG and Wejo further quantifies how socioeconomic disparities manifest spatially and exposes systematic differences in data coverage that mirror the broader digital and vehicular divide. By comparing mobility patterns extracted from both datasets, the study also reveals neighborhood-level variations in travel mode preferences and mobility intensity, providing critical insight into how large-scale mobility datasets represent, or fail to represent, real-world travel behavior. To sum up, these findings emphasize the importance of interpreting connected-vehicle–based mobility analysis within a wider context of data accessibility and equity. Moreover, the comparative framework established in this study can be extended to other cities or data sources to assess how different sensing technologies capture urban mobility and social diversity.

Despite the comparative advantages of mobile phone data in capturing diverse travel behaviors, potential representativeness biases in the SafeGraph dataset could affect the reliability of inferences. Its representativeness depends on the underlying panel of mobile devices, which may under-sample certain population groups such as children, elderly individuals, or those without smartphones (Jardel and Delamater, 2024; Z. Li et al., 2024). Additionally, device-level anonymization and location sampling strategies can affect the spatial precision and completeness of OD flows. While SafeGraph appears to capture greater social diversity in our analysis, potential biases in data coverage and signal quality should be acknowledged when interpreting its apparent inclusivity. Future research should consider validation using multiple mobile data sources to assess consistency and coverage.

While SafeGraph and Wejo datasets both capture large-scale mobility patterns, they differ in their modal coverage. SG data represent movements across all transportation modes, including walking, public transit, and driving, whereas Wejo data reflect only motorized trips generated by connected private vehicles. In this study, SG serves as a background representation of overall population mobility rather than a mode-specific comparison. This design highlights how connected-vehicle data represent a distinct subset of urban travel behaviors, particularly those associated with private vehicle use. Although SafeGraph does not contain explicit information about transportation mode, prior studies have shown that its aggregated mobility patterns tend to align more closely with car-based movements than with walking or transit activity (Jiang et al., 2025). This characteristic suggests that SG captures a substantial portion of vehicular travel, supporting its use as a meaningful background dataset for comparison with Wejo. However, identifying the subset of SG trips that are more likely associated with driving would allow for a more mode-consistent comparison with Wejo. Such a targeted comparison could reveal how well connected-vehicle data capture the spatial patterns of vehicular mobility represented in aggregated population data, providing a clearer understanding of the representativeness and potential biases of connected-vehicle–based mobility analysis.

Moreover, we acknowledge that the connected vehicle data used in this study represent only a portion of privately owned vehicles and do not include on-demand or fleet-based connected vehicles. As a next step, we will incorporate multiple sources of connected vehicle data in this comparison to better capture the diversity of the connected vehicle ecosystem. In addition, future analyses will compare SafeGraph and Wejo data across different periods of the day and day types. Such differentiation will help identify when connected-vehicle data align with or diverge from population-level mobility. For example, stronger agreement during weekday commuting peaks and larger gaps on weekends or holidays could be observed. This temporal extension will help evaluate the stability of connected-vehicle representativeness within broader human mobility patterns.

Lastly, public transportation accessibility represents an important structural factor shaping urban mobility and its socioeconomic patterns. Areas with dense transit coverage often exhibit different travel behaviors and income compositions than those primarily dependent on private vehicles. While this study centers on resident-based vehicular and general mobility captured by connected vehicle and mobile phone data, the role of public transit remains crucial for understanding mobility equity and segregation in U.S. cities. In the future, we plan to incorporate measures of transit coverage and service availability to extend this comparative framework, enabling a more comprehensive examination of how public and private mobility systems jointly shape urban accessibility and social interaction.

Despite these limitations, the comparative framework developed in this study provides a transferable methodology for evaluating the representativeness and spatial bias of emerging mobility datasets. By jointly analyzing connected-vehicle and mobile-phone data within a unified spatial and socioeconomic structure, the framework enables researchers and planners to identify where data sources over- or under-represent certain populations and travel behaviors. This approach can support evidence-based mobility planning, improve the design of data-driven transportation models, and guide more equitable integration of multi-source mobility information in smart-city initiatives. Furthermore, it offers a foundation for future comparative studies across different cities and data types, helping to establish standardized benchmarks for assessing mobility data quality and inclusivity.

In summary, these findings emphasize the importance of evaluating data source characteristics when interpreting mobility-based social structures. While connected vehicle data offer detailed insights into private vehicular travel, they are limited in their representation of transit-dependent, lower-income, and racially marginalized populations. Mobile phone data, despite certain limitations in temporal granularity, provide a more comprehensive view of urban mobility, especially in capturing socially integrative and spatially complex behaviors. These distinctions underline the need for careful consideration in the application of emerging mobility datasets in transportation equity research, urban planning, and policy design.

Supplemental material

Supplemental material for Comparing connected vehicle and mobile phone data for urban mobility analysis: Examining mobility diversity and spatial interaction across income groups by Xinyu Li, Shih-Lung Shaw, Dayong Wu, Xinyue Ye, Xiao Huang, Zhenlong Li, Qi Wang, and Chunwu Zhu in Environment and Planning B: Urban Analytics and City Science

Footnotes

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Science Foundation (2323419, 2401860, 2430700, 2526487) and the Texas Department of Transportation (TxDOT) (693JJ22330000Y560TX0511224).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The research team previously accessed Wejo connected vehicle data through TxDOT RTI Project 0-7200 under a statewide data agreement. This agreement has since expired, and the data are no longer available.

Supplemental material

Supplemental material for this article is available online.

Author biographies

Xinyu Li is a Lecturer in the Department of Geography, Planning, and Sustainability, University at Albany, State University of New York. His work integrates deep learning, connected vehicle data and spatial analysis to address urban transportation and human mobility challenges.

Shih-Lung Shaw is Chancellor’s Professor and Alvin and Sally Beaman Professor of Geography at the University of Tennessee, Knoxville. His research interests cover transportation geography, geographic information science (GIScience), space-time analytics, and human dynamics.

Dayong Wu is Associate Research Scientist at Texas A&M Transportation Institute. His research focuses on Traffic Safety, Intelligent Transportation Systems, Machine Learning, GIS in Transportation, and Traffic Mobility and Operation Analysis.

Xinyue Ye is Endowed Shelby Distinguished Professor of Artificial Intelligence in the Department of Geography and the Environment and Founding Director of Alabama Geospatial AI & Information Science Hub at the University of Alabama. His research focuses on GeoAI, urban informatics, and digital twins.

Xiao Huang is Assistant Professor in the Department of Environmental Sciences at Emory University. His research focuses on human-environment interaction, computational social sciences, vulnerability and resilience, urban informatics, disaster mapping and mitigation, GeoAI and deep learning, and disaster remote sensing.

Zhenlong Li is Associate Professor in the Department of Geography at the Pennsylvania State University, where he leads the Geoinformation and Big Data Research Laboratory. His research focuses on geospatial big data analytics, spatial computing, and GeoAI.

Qi Wang is Associate Professor and the Vice Chair for Research in the Department of Civil and Environmental Engineering at Northeastern University. His research focuses on utilizing urban informatics to enhance the understanding, sustainability, and resilience of urban and social systems.

Chunwu Zhu is Assistant Research Scientist at Texas A&M Transportation Institute. His research focuses on urban intelligence and urban data science, traffic safety, transportation policy evaluation, environmental justice, urban digital twin, and environmental criminology.

References

Abbasi

Min

(2021) Measuring destination-based segregation through mobility patterns: application of transport card data. Journal of Transport Geography 92: 103025. https://doi.org/10.1016/j.jtrangeo.2021.103025

Ali

Sharma

Haque

, et al. (2020) The impact of the connected environment on driving behavior and safety: a driving simulator study. Accident Analysis & Prevention 144: 105643. https://doi.org/10.1016/j.aap.2020.105643

Antonio

Co. S

(2019) Electric Vehicle Fleet Conversion and City-wide Electric Vehicle Infrastructure Study C. o. S. Antonio. https://www.sanantonio.gov/Portals/0/Files/Sustainability/ElectricVehicles/EVFleetConversion-EVInfrastructureStudy.pdf

Athey

Ferguson

Gentzkow

, et al (2019) Experienced Segregation . Working Paper 3785. https://gsb.stanford.edu/faculty-research/working-papers/experienced-segregation

Bourne

(2024) US Connected Cars 2024 Data-Driven Vehicles are the New Frontier of Mobile Marketing. https://www.emarketer.com/content/us-connected-cars-2024?utm_source=chatgpt.com#page-report

Bureau

USC

(2023a) B08006: Means of Transportation to Work by Sex for San Antonio City, Texas. https://data.census.gov/table/ACSDT5Y2023.B08006?q=B08006+San+antonio+

Bureau

USC

(2023b) Quickfacts: San Antonio City, Texas. U.S. Census Bureau. Retrieved 2025 April 14 from. https://www.census.gov/quickfacts/fact/table/sanantoniocitytexas/POP010220#POP010220

Clark

Fossett

(2008) Understanding the social context of the Schelling segregation model. Proceedings of the National Academy of Sciences 105(11): 4109–4114. https://doi.org/10.1073/pnas.0708155105

Gao

Han

Dong

, et al. (2019) Connected vehicle as a mobile sensor for real time queue length at signalized intersections. Sensors 19(9): 2059. https://doi.org/10.3390/s19092059

10.

Goodall

Smith

Park

(2013) Traffic signal control with connected vehicles. Transportation Research Record 2381(1): 65–72. https://doi.org/10.3141/2381-08

11.

Guo

Zhu

Jin

, et al. (2012) Discovering spatial patterns in origin‐destination mobility data. Transactions in GIS 16(3): 411–429. https://doi.org/10.1111/j.1467-9671.2012.01344.x

12.

Wang

She

, et al. (2021) Human mobility data in the COVID-19 pandemic: characteristics, applications, and challenges. International Journal of Digital Earth 14(9): 1126–1147. https://doi.org/10.1080/17538947.2021.1952324

13.

Huang

Zhao

Wang

, et al. (2022) Unfolding community homophily in US metropolitans via human mobility. Cities 129: 103929. https://doi.org/10.1016/j.cities.2022.103929

14.

Islam

Abdel-Aty

(2023) Traffic conflict prediction using connected vehicle data. Analytic methods in accident research 39: 100275. https://doi.org/10.1016/j.amar.2023.100275

15.

Iyer

Menezes

Barbosa

(2024) Mobility and transit segregation in urban spaces. Environment and Planning B: Urban Analytics and City Science 51(7): 1496–1512. https://doi.org/10.1177/23998083231219294

16.

Jardel

Delamater

(2024) Uncovering representation bias in large‐scale cellular phone‐based data: a case study in North Carolina. Geographical Analysis 56(4): 723–745. https://doi.org/10.1111/gean.12399

17.

Jiang

Han

, et al. (2025) Comparative analysis of human mobility patterns: utilizing taxi and mobile (SafeGraph) data to investigate neighborhood-scale mobility in New York city. Annals of GIS 31: 387–411.

18. Jiang, Han, et al. (2025) Comparative analysis of human mobility patterns: utilizing taxi and mobile (SafeGraph) data to investigate neighbourhood-scale mobility in New York City. Annals of GIS 31: 1–25. https://doi.org/10.1080/19475683.2025.2487984

19. Kain (1968) Housing segregation, Negro employment, and metropolitan decentralization. Quarterly Journal of Economics 82(2): 175–197. https://doi.org/10.2307/1885893

20. Khan, Dey, Chowdhury (2017) Real-time traffic state estimation with connected vehicles. IEEE Transactions on Intelligent Transportation Systems 18(7): 1687–1699. https://doi.org/10.1109/tits.2017.2658664

21. Dayong (Jason) Wu, Sun (2024a) Leveraging connected vehicle data for near-crash detection and analysis in urban environments. arXiv preprint arXiv:2409.11341.

22. Ning, Jing, et al. (2024b) Understanding the bias of mobile location data across spatial scales and over time: a comprehensive analysis of SafeGraph data in the United States. PLoS One 19(1): e0294430. https://doi.org/10.1371/journal.pone.0294430

23. Liao, Gil, Yeh, et al. (2025) Socio-spatial segregation and human mobility: a review of empirical evidence. Computers, Environment and Urban Systems 117: 102250. https://doi.org/10.1016/j.compenvurbsys.2025.102250

24. Lin, Peeta (2019) Efficient data collection and accurate travel time estimation in a connected vehicle environment via real-time compressive sensing. Journal of Big Data Analytics in Transportation 1(2): 95–107. https://doi.org/10.1007/s42421-019-00009-5

25. Marin, Molinero, Arcaute (2022) Uncovering structural diversity in commuting networks: global and local entropy. Scientific Reports 12(1): 1684. https://doi.org/10.1038/s41598-022-05556-6

26. Moro, Calacci, Dong, et al. (2021) Mobility patterns are associated with experienced income segregation in large US cities. Nature Communications 12(1): 4633. https://doi.org/10.1038/s41467-021-24899-8

27. Olia, Abdelgawad, Abdulhai, et al. (2016) Assessing the potential impacts of connected vehicles: mobility, environmental, and safety perspectives. Journal of Intelligent Transportation Systems 20(3): 229–243. https://doi.org/10.1080/15472450.2015.1062728

28. Ong, Blumenberg (1998) Job access, commute and travel burden among welfare recipients. Urban Studies 35(1): 77–93. https://doi.org/10.1080/0042098985087

29. Schlosser, Sekara, Brockmann, et al. (2021) Biases in human mobility data impact epidemic modeling. arXiv preprint arXiv:2112.12521.

30. Schoettle, Sivak (2014) A survey of public opinion about connected vehicles in the U.S., the U.K., and Australia. 2014 International Conference on Connected Vehicles and Expo (ICCVE). Vienna: IEEE, 687–692.

31. Shelton, Poorthuis, Zook (2015) Social media and the city: rethinking urban socio-spatial inequality using user-generated geographic information. Landscape and Urban Planning 142: 198–211. https://doi.org/10.1016/j.landurbplan.2015.02.020

32. Shin H-S, Callow, Dadvar, et al. (2015) User acceptance and willingness to pay for connected vehicle technologies: adaptive choice-based conjoint analysis. Transportation Research Record 2531(1): 54–62. https://doi.org/10.3141/2531-07

33. Sparks, Moehl, Weber, et al. (2022) Shifting temporal dynamics of human mobility in the United States. Journal of Transport Geography 99: 103295. https://doi.org/10.1016/j.jtrangeo.2022.103295

34. Spiegel, Clark, Domina, et al. (2025) Peer income exposure across the income distribution. Proceedings of the National Academy of Sciences 122(7): e2410349122. https://doi.org/10.1073/pnas.2410349122

35. Sultana, Hassan (2025) Effects of connected vehicle (CV) technology on improving driver behavior in the presence of connected and automated vehicle (CAV) platoons. Transportation Research Part F: Traffic Psychology and Behaviour 109: 1419–1436. https://doi.org/10.1016/j.trf.2025.01.046

36. Toole, Ulm, González, et al. (2012) Inferring land use from mobile phone activity. Proceedings of the ACM SIGKDD International Workshop on Urban Computing, 1–8.

37. Vachuska (2023) Racial segregation in everyday mobility patterns: disentangling the effect of travel time. Socius 9: 23780231231169261. https://doi.org/10.1177/23780231231169261

38. van Ham, Tammaru, de Vuijst, et al. (2016) Spatial Segregation and Socio-Economic Mobility in European Cities. Bonn: IZA Institute for the Study of Labor.

39. Wang, Phillips, Small, et al. (2018) Urban mobility and neighborhood isolation in America’s 50 largest cities. Proceedings of the National Academy of Sciences 115(30): 7735–7740. https://doi.org/10.1073/pnas.1802537115

40. Xian, Yip N-m (2022) Beyond home neighborhood: mobility, activity and temporal variation of socio-spatial segregation. Journal of Transport Geography 99: 103304. https://doi.org/10.1016/j.jtrangeo.2022.103304

41. Xie, Yang, Ozbay, et al. (2019) Use of real-world connected vehicle data in identifying high-risk locations based on a new surrogate safety measure. Accident Analysis & Prevention 125: 311–319. https://doi.org/10.1016/j.aap.2018.07.002

42. Santi, Ratti (2022) Beyond distance decay: discover homophily in spatially embedded social networks. Annals of the Association of American Geographers 112(2): 505–521. https://doi.org/10.1080/24694452.2021.1935208

43. Yabe, Bueno BGB, Dong, et al. (2023) Behavioral changes during the COVID-19 pandemic decreased income diversity of urban encounters. Nature Communications 14(1): 2310. https://doi.org/10.1038/s41467-023-37913-y

44. Yang, Samaranayake, Dogan (2023) Assessing impacts of the built environment on mobility: a joint choice model of travel mode and duration. Environment and Planning B: Urban Analytics and City Science 50(9): 2359–2375. https://doi.org/10.1177/23998083231154263

45. Yip, Forrest, Xian (2016) Exploring segregation and mobilities: application of an activity tracking app on mobile phone. Cities 59: 156–163. https://doi.org/10.1016/j.cities.2016.02.003

46. Zachary, Dobson (2021) Urban development and complexity: Shannon entropy as a measure of diversity. Planning Practice and Research 36(2): 157–173. https://doi.org/10.1080/02697459.2020.1852664

47. Zhang, Abdel-Aty (2022) Real-time crash potential prediction on freeways using connected vehicle data. Analytic Methods in Accident Research 36: 100239. https://doi.org/10.1016/j.amar.2022.100239

48. Zhang, Cheng, et al. (2023) Human mobility patterns are associated with experienced partisan segregation in US metropolitan areas. Scientific Reports 13(1): 9768. https://doi.org/10.1038/s41598-023-36946-z

49. Zook, Graham (2007) The creative reconstruction of the internet: Google and the privatization of cyberspace and DigiPlace. Geoforum 38(6): 1322–1343. https://doi.org/10.1016/j.geoforum.2007.05.004
