Abstract
Urban planners need up-to-date, global, and consistent street network models and indicators to measure resilience and performance, model accessibility, and target local quality-of-life interventions. This article presents up-to-date street network models and indicators for every urban area in the world. It uses 2025 urban area boundaries from the Global Human Settlement Layer, allowing users to join these data to hundreds of other urban attributes. Its workflow ingests 180 million OpenStreetMap nodes and 360 million OpenStreetMap edges across 10,351 urban areas in 189 countries. The code, models, and indicators are publicly available for reuse. These resources unlock worldwide urban street network science beyond samples as well as local analyses in under-resourced regions where models and indicators are otherwise less accessible.
Introduction
Scholars and practitioners use spatial graphs to model street networks to understand or predict urban phenomena including traffic dynamics, accessibility to daily living needs, and the resilience and sustainability of the urban form (Barthelemy, 2022). These networks are defined by both their topology—for example, connections and configuration—and geometry—for example, positions, lengths, areas, and angles (Barthelemy, 2011). Accordingly, various topological and geometric indicators exist throughout the literature to measure important street network characteristics. For instance, node degrees reveal streets’ connectedness (Barrington-Leigh and Millard-Ball, 2020), weighted betweenness centralities identify relatively important parts of the network (Barthelemy and Boeing, 2025), and circuity indicates its efficiency or lack thereof (Giacomin and Levinson, 2015). Such indicators can then inform downstream urban analytics to target planning interventions or benchmark and monitor cities’ progress toward stated sustainability goals (Higgs et al., 2025).
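To make two of these indicators concrete, the sketch below computes average node degree and circuity on a toy undirected graph. It assumes the common definition of circuity as the sum of edge lengths divided by the sum of great-circle distances between edge endpoints, and an assumed mean Earth radius of 6,371,009 m; the input structures and function names are hypothetical illustrations, not this workflow's code.

```python
import math

EARTH_RADIUS_M = 6_371_009  # assumed mean Earth radius, in meters


def great_circle_m(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points, in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def circuity(nodes, edges):
    """Total edge length divided by total straight-line endpoint distance.

    nodes: {node_id: (lat, lon)}; edges: [(u, v, length_m)] of an undirected graph.
    Self-loops are skipped because their straight-line distance is zero.
    """
    total_length = total_straight = 0.0
    for u, v, length in edges:
        if u == v:
            continue
        total_length += length
        total_straight += great_circle_m(*nodes[u], *nodes[v])
    return total_length / total_straight


def mean_degree(nodes, edges):
    """Average number of edge ends incident to each node."""
    degree = dict.fromkeys(nodes, 0)
    for u, v, _ in edges:
        degree[u] += 1
        degree[v] += 1  # a self-loop therefore adds 2 to its node
    return sum(degree.values()) / len(degree)


# Toy graph: two 1-degree segments; the second street is 50% longer than the crow flies.
nodes = {1: (0.0, 0.0), 2: (0.0, 1.0), 3: (1.0, 1.0)}
d = great_circle_m(0.0, 0.0, 0.0, 1.0)
edges = [(1, 2, d), (2, 3, 1.5 * d)]
print(circuity(nodes, edges))     # ≈ 1.25
print(mean_degree(nodes, edges))  # 4 edge ends / 3 nodes
```

A circuity of 1 would mean every street runs in a perfectly straight line between its endpoints; larger values indicate more winding, less efficient routes.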
Up-to-date, global, consistent urban street network models and indicators are needed now more than ever as planners face intertwined sustainability and public health crises in cities around the world (Giles-Corti et al., 2022). Meanwhile, urban science seeks to expand beyond the limits of traditional sampling to build universal theory and better understand under-studied regions, such as the Global South (Lobo et al., 2020). Yet the limitations of traditional data sources and methods present headwinds to these efforts. Data on urban streets are often digitized inconsistently from place to place, thwarting apples-to-apples global comparisons and making analyses particularly difficult in under-resourced regions (Liu et al., 2022). Popular data sources such as OpenStreetMap offer reasonably high-quality data around the world but neither package them in graph-theoretic form nor provide indicators (Boeing, 2025a). Tools like OSMnx help, but they still require coding knowledge and, for anyone attempting global urban science, potentially extensive computational resources.
This article presents a resource to fill this gap: street network models and indicators worldwide—plus reproducible code to regenerate them—for scholars and practitioners to easily reuse without reinventing the wheel. Using data from OpenStreetMap and boundaries from the 2025 Global Human Settlement Layer (GHSL), this study models and analyzes the street networks of every urban area in the world, ingesting 180 million OpenStreetMap nodes and 360 million OpenStreetMap edges across 10,351 urban areas in 189 countries. The rest of this article describes this open data repository of street network models and indicators, as well as the open-source software repository containing the code to generate them.
Reproducible methods
The following computational workflow, written in the Python programming language, generates the models and calculates the indicators.
Urban boundaries
The workflow begins by extracting the boundary polygons of each urban area in the world from the 2025 GHSL Urban Centre Database (UCD), which contains 11,422 entities. Mari Rivero et al. (2025) describe this input dataset in detail; in summary, the GHSL integrates a vast array of census data, remote sensing data, and volunteered geographic information to delineate the boundaries of the world’s urbanized areas and attach corresponding attribute data. We retain urban areas with more than 1 km² of built-up area and a “high” GHSL quality control score, resulting in 10,351 urban areas. This basic filtering ensures that we model true urbanized areas rather than false positives or small villages.
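The retention rule above amounts to a two-condition filter. The sketch below assumes a hypothetical record layout (`ucd_id`, `built_up_km2`, `qc_score`); the actual GHSL UCD field names differ and should be taken from its documentation.

```python
from dataclasses import dataclass


@dataclass
class UrbanArea:
    # Hypothetical record layout; the real GHSL UCD schema uses its own field names.
    ucd_id: int
    name: str
    built_up_km2: float
    qc_score: str  # assumed labels such as "high", "medium", "low"


def retain_urban_areas(areas, min_built_up_km2=1.0, required_qc="high"):
    """Keep areas with more than the minimum built-up area and the required QC score."""
    return [
        a for a in areas
        if a.built_up_km2 > min_built_up_km2 and a.qc_score == required_qc
    ]


areas = [
    UrbanArea(1, "A", 35.2, "high"),    # kept
    UrbanArea(2, "B", 0.8, "high"),     # dropped: too little built-up area
    UrbanArea(3, "C", 12.0, "medium"),  # dropped: QC score not "high"
]
print([a.ucd_id for a in retain_urban_areas(areas)])  # [1]
```

The comparison is strict, so an area with exactly 1 km² of built-up area is excluded, matching "more than 1 km²".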
Network modeling
We use OSMnx v2.0.2 to download raw OpenStreetMap data in February 2025 and then construct a spatial graph model of the street network within each urban area. Boeing (2025b) details the models’ theoretical framework; here, we define a street as a public drivable thoroughfare in an urban built environment. This definition includes cyclist and pedestrian infrastructure that is part of the street network itself but excludes other paths such as passageways through buildings, footpaths or equestrian trails through nature preserves, and cycleways through parks, as these are not “streets” by common definition. However, anyone may freely reuse and adapt this code workflow, for example by adjusting its filters and re-running it to model any other network type.
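To make the street definition concrete, the following sketch classifies an OpenStreetMap way from its tags. It is a simplified, hypothetical filter written for illustration, not the filter OSMnx applies; OSMnx instead selects ways through preset `network_type` values or Overpass-style query strings passed via its `custom_filter` parameter. The example shows how a bike lane tagged on a roadway (`cycleway=lane`) stays in the model while a standalone `highway=cycleway` does not.

```python
# Simplified, hypothetical tag filter for illustration only; it is not the
# filter OSMnx applies for network_type="drive".
EXCLUDED_HIGHWAY = {
    "abandoned", "bridleway", "construction", "corridor", "cycleway",
    "footway", "path", "pedestrian", "proposed", "steps", "track",
}
EXCLUDED_SERVICE = {"driveway", "emergency_access", "parking_aisle", "private"}


def is_street(tags):
    """Return True if an OSM way's tags describe a public drivable street."""
    highway = tags.get("highway")
    if highway is None or highway in EXCLUDED_HIGHWAY:
        return False  # not a highway, or a standalone path, trail, or corridor
    if tags.get("area") == "yes":
        return False  # plazas and other areas, not linear streets
    if tags.get("access") == "private" or tags.get("motor_vehicle") == "no":
        return False  # not public, or not drivable
    if tags.get("service") in EXCLUDED_SERVICE:
        return False  # driveways, parking aisles, and similar
    return True


# A residential street with a painted bike lane stays in the model;
# a standalone cycleway through a park does not.
print(is_street({"highway": "residential", "cycleway": "lane"}))  # True
print(is_street({"highway": "cycleway"}))                         # False
```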
The resulting models are nonplanar directed multigraphs with possible self-loops (Boeing, 2025a). They have node/edge attribute data from OpenStreetMap, plus geographic coordinates and geometries, and we parameterize OSMnx to retain all graph components and run its edge simplification algorithm (Boeing, 2025b). Each urban area’s graph is saved as a GraphML file, a standard graph serialization format.
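GraphML represents directed multigraphs directly: each parallel edge is its own `<edge>` element, and a self-loop is an edge whose source equals its target. The sketch below writes such a graph with Python's standard library to show that structure; the attribute keys and the `u-v-key` edge ID scheme are assumptions, and in practice OSMnx's `save_graphml` function handles serialization.

```python
import xml.etree.ElementTree as ET

NS = "http://graphml.graphdrawing.org/xmlns"
ET.register_namespace("", NS)  # serialize GraphML as the default namespace


def q(tag):
    """Qualify a tag name with the GraphML namespace."""
    return f"{{{NS}}}{tag}"


def to_graphml(nodes, edges):
    """Serialize a directed multigraph to a GraphML string.

    nodes: {node_id: {"x": lon, "y": lat}}
    edges: [(u, v, key, {"length": meters})]; parallel edges differ by key, and
    u == v denotes a self-loop. Keys and edge IDs here are illustrative assumptions.
    """
    root = ET.Element(q("graphml"))
    # Declare each attribute once, before the graph element, as GraphML expects.
    for name, domain in (("x", "node"), ("y", "node"), ("length", "edge")):
        ET.SubElement(root, q("key"), id=name,
                      **{"for": domain, "attr.name": name, "attr.type": "double"})
    graph = ET.SubElement(root, q("graph"), edgedefault="directed")
    for node_id, attrs in nodes.items():
        node = ET.SubElement(graph, q("node"), id=str(node_id))
        for name, value in attrs.items():
            ET.SubElement(node, q("data"), key=name).text = str(value)
    for u, v, key, attrs in edges:
        edge = ET.SubElement(graph, q("edge"), id=f"{u}-{v}-{key}",
                             source=str(u), target=str(v))
        for name, value in attrs.items():
            ET.SubElement(edge, q("data"), key=name).text = str(value)
    return ET.tostring(root, encoding="unicode")


nodes = {1: {"x": -118.2500, "y": 34.0500}, 2: {"x": -118.2495, "y": 34.0504}}
edges = [
    (1, 2, 0, {"length": 61.2}),  # two parallel edges from 1 to 2,
    (1, 2, 1, {"length": 75.8}),  # distinguished by their key
    (2, 2, 0, {"length": 40.0}),  # a self-loop
]
xml_text = to_graphml(nodes, edges)
print(xml_text)
```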
Elevation
We attach elevation, in meters above sea level, to each node in each urban area’s graph using two global digital elevation models (GDEMs): the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) v3 GDEM, and the Shuttle Radar Topography Mission (SRTM) version 3.0 GDEM with voids filled. Both have 1-arcsecond (approximately 30 m) resolution. We download all the GDEM raster files for ASTER (45,824 tiles) and SRTM (14,297 tiles) from NASA Earthdata, build a virtual raster for each, and then use OSMnx to load each GraphML file and attach the elevation from ASTER and SRTM to each graph node.
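Attaching raster elevations to nodes reduces to locating the pixel that contains each node's coordinates. The sketch below assumes a north-up raster described by a GDAL-style six-element geotransform with no rotation, and samples the containing cell; OSMnx provides `add_node_elevations_raster` for this step on real raster files, while the tile and node coordinates here are hypothetical.

```python
import math

ARCSECOND_DEG = 1 / 3600  # 1 arcsecond in degrees (roughly 30 m at the equator)


def sample_dem(grid, geotransform, lon, lat):
    """Return the value of the raster cell containing (lon, lat).

    grid: rows of elevations in meters, ordered north to south.
    geotransform: (x_origin, pixel_width, 0, y_origin, 0, pixel_height), GDAL-style,
    with pixel_height negative for a north-up raster; rotation terms are ignored.
    """
    x_origin, pixel_width, _, y_origin, _, pixel_height = geotransform
    col = math.floor((lon - x_origin) / pixel_width)
    row = math.floor((lat - y_origin) / pixel_height)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise ValueError("point falls outside the raster")
    return grid[row][col]


def attach_elevations(node_coords, grid, geotransform):
    """node_coords: {node_id: (lon, lat)} -> {node_id: elevation_m}."""
    return {n: sample_dem(grid, geotransform, lon, lat)
            for n, (lon, lat) in node_coords.items()}


# Hypothetical 3x3 tile whose northwest corner is at (-118.0, 34.0).
geotransform = (-118.0, ARCSECOND_DEG, 0.0, 34.0, 0.0, -ARCSECOND_DEG)
grid = [[100, 101, 102],
        [110, 111, 112],
        [120, 121, 122]]
nodes = {"a": (-118.0 + 1.5 * ARCSECOND_DEG, 34.0 - 0.5 * ARCSECOND_DEG),
         "b": (-118.0 + 2.5 * ARCSECOND_DEG, 34.0 - 2.5 * ARCSECOND_DEG)}
print(attach_elevations(nodes, grid, geotransform))  # {'a': 101, 'b': 122}
```

Building a virtual raster over all tiles, as described above, lets a single lookup like this span tile boundaries instead of first finding which tile holds each node.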
We choose one elevation value to use as the “official” node elevation by comparing the ASTER and SRTM values to a “tie-breaker” value from Google. To do so, we download each node’s elevation from the Google Maps Elevation API and then choose whichever value is nearer to Google’s. Finally, we calculate edge grades and re-save each GraphML file with these node/edge attributes.
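The tie-breaking rule and the grade calculation can be sketched as two small functions. Grade here is rise over run (elevation change divided by edge length), so in a directed graph the two opposing edges between the same pair of nodes have grades of opposite sign. Breaking exact ties in favor of ASTER is an assumption; the text does not specify tie handling, and all values below are hypothetical.

```python
def choose_elevation(aster_m, srtm_m, google_m):
    """Return whichever GDEM value lies nearer the Google tie-breaker value.

    Exact ties go to ASTER here; that choice is an assumption.
    """
    if abs(aster_m - google_m) <= abs(srtm_m - google_m):
        return aster_m
    return srtm_m


def edge_grade(elev_u_m, elev_v_m, length_m):
    """Rise over run along a directed edge u -> v; None if the edge has zero length."""
    if length_m == 0:
        return None
    return (elev_v_m - elev_u_m) / length_m


# Node "a": ASTER 100 m, SRTM 104 m, Google 103 m, so SRTM is chosen.
elevations = {
    "a": choose_elevation(100.0, 104.0, 103.0),
    "b": choose_elevation(109.0, 112.0, 108.5),
}
print(elevations)                                          # {'a': 104.0, 'b': 109.0}
print(edge_grade(elevations["a"], elevations["b"], 50.0))  # 0.1
print(edge_grade(elevations["b"], elevations["a"], 50.0))  # -0.1
```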
Indicator calculation
The indicators dataset contents. Variables carried over from GHSL are noted.
Data repository preparation
We convert each GraphML file to a GeoPackage and node/edge list files. The former allows users to work with these spatial networks in any GIS software. The latter provides a minimal, lightweight, highly compressible version of the models. Then we perform a series of file verification checks and create metadata files for the graphs’ node and edge attributes and all of the indicators. Finally, we compress and upload all model files (GeoPackages, GraphML, and node/edge lists), indicators, and metadata to the Harvard Dataverse.
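The node/edge list format can be sketched as two plain CSV tables, one row per node and one row per edge, compressed with gzip. The column names and the in-memory approach below are illustrative assumptions, not the repository's actual schema or code.

```python
import csv
import gzip
import io


def to_compressed_lists(nodes, edges):
    """Serialize a graph to gzip-compressed CSV node and edge lists.

    nodes: {osmid: {"x": lon, "y": lat, "elevation": meters}}
    edges: [(u, v, key, {"length": meters, "grade": rise_over_run})]
    The column names are illustrative assumptions, not the repository's schema.
    """
    node_buf, edge_buf = io.StringIO(), io.StringIO()

    node_writer = csv.writer(node_buf)
    node_writer.writerow(["osmid", "x", "y", "elevation"])
    for osmid, attrs in nodes.items():
        node_writer.writerow([osmid, attrs["x"], attrs["y"], attrs["elevation"]])

    # (u, v, key) uniquely identifies each edge in a directed multigraph.
    edge_writer = csv.writer(edge_buf)
    edge_writer.writerow(["u", "v", "key", "length", "grade"])
    for u, v, key, attrs in edges:
        edge_writer.writerow([u, v, key, attrs["length"], attrs["grade"]])

    return (
        gzip.compress(node_buf.getvalue().encode("utf-8")),
        gzip.compress(edge_buf.getvalue().encode("utf-8")),
    )


nodes = {
    101: {"x": -118.2500, "y": 34.0500, "elevation": 104.0},
    102: {"x": -118.2495, "y": 34.0504, "elevation": 109.0},
}
edges = [(101, 102, 0, {"length": 50.0, "grade": 0.1}),
         (102, 101, 0, {"length": 50.0, "grade": -0.1})]
node_gz, edge_gz = to_compressed_lists(nodes, edges)
```

Unlike GraphML or GeoPackage, these tables carry no schema or geometry beyond node coordinates, which is what keeps them minimal; any graph library can rebuild the network from the `u`, `v`, and `key` columns.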
Code and data products
Code repository
The preceding methods are fully reproducible by running the modeling and analytics workflow made publicly available in the source code repository on GitHub under the terms of the MIT license. A well-equipped personal computer can execute this workflow, but given its resource requirements it may be better (and faster) to run it on a high-performance computing cluster, where available. The code repository documents the input data, dependencies, and resources required to run the workflow.
Data repository
The data repository comprises five datasets nested within a top-level Dataverse repository: the GeoPackages, the GraphML files, the node/edge lists, the indicators, and the metadata.
The model files are zipped at the country level, and each is identified by its urban area name and UCD ID. The latter allows users to join these indicators to GHSL attribute data.
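Because every indicator row carries its UCD ID, linking to GHSL attributes is a key-based left join. The sketch below uses plain dictionaries and hypothetical column names (`ucd_id`, `circuity`, `pop_2025`); with pandas, a roughly equivalent call is `indicators.merge(ghsl, on="ucd_id", how="left")`, although pandas adds suffixes to overlapping column names rather than overwriting them.

```python
def join_on_ucd_id(indicators, ghsl_rows, key="ucd_id"):
    """Left-join GHSL attribute rows onto indicator rows via their shared UCD ID.

    Indicator rows without a GHSL match are kept unchanged. If both sides share a
    column name other than the key, the GHSL value overwrites the indicator value.
    """
    lookup = {row[key]: row for row in ghsl_rows}
    joined = []
    for row in indicators:
        merged = dict(row)  # copy so the input rows are not mutated
        merged.update(lookup.get(row[key], {}))
        joined.append(merged)
    return joined


# Hypothetical rows and column names, for illustration only.
indicators = [{"ucd_id": 1, "circuity": 1.07}, {"ucd_id": 2, "circuity": 1.12}]
ghsl = [{"ucd_id": 1, "pop_2025": 250_000}]
for row in join_on_ucd_id(indicators, ghsl):
    print(row)
```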
Practical applications
In terms of downstream applications, these models and indicators can be used to estimate relationships between network topology and travel patterns or land use (Wang et al., 2018, 2020), identify relatively important parts of a network via empirical distributions of within-city betweenness centrality (Barthelemy and Boeing, 2025), measure sprawl (Gervasoni et al., 2017), simulate disasters’ effects on street network resilience around the world (Boeing and Ha, 2024), improve logistics’ last-mile trip predictions (Merchán et al., 2020), and estimate relationships between greenhouse gas emissions and street network design (Boeing et al., 2024). To date, this repository’s models and indicator data have been downloaded over 70,000 times.
Dyer et al. (2024) argue that urban scholars and practitioners need models and indicators that keep up with the pace of transformational change in an era of rapid urbanization. This project seeks to address this need by building on prior work, initially conducted in 2019–2020, that generated a preliminary version of the data repository (Boeing, 2022). That initial version was based on the 2015 version of the GHSL UCD and 2020 OpenStreetMap data. This new version uses the 2025 GHSL UCD and 2025 OpenStreetMap data to make several new contributions.
First, it includes over 1,400 more urban areas and 11 more countries than the earlier version, providing substantially broader worldwide coverage in an era of rapid urban expansion. That is, these new models allow us to study more cities, and more recently urbanized ones, than before.
Second, these new models incorporate 10 years of recent urbanization into their updated urban area boundaries and 5 years of new community additions to OpenStreetMap. As such, this workflow’s modeling includes approximately 20 million more street network nodes and 40 million more edges than the earlier version. The new urban boundaries allow users to link these street network models and indicators to hundreds of new, up-to-date GHSL attributes on urban climate, land use, economic conditions, etc. For instance, this allows us to simulate urban travel and access in the context of extreme heat or cold, generate integrated models of urban networks and land uses, or estimate relationships between economic activity and underlying transport infrastructure.
Third, it adds new attributes and indicators to the repository (most consequentially the betweenness centrality of every node in every urban area’s street network, which is time- and resource-intensive to calculate, yet unlocks useful analyses of network structure and resilience). For example, betweenness centrality can provide us with indicators of chokepoints when simulating network disruptions and resilience (Boeing and Ha, 2024) and indicators of the relative importance of network elements (Barthelemy and Boeing, 2025). Fourth, it uses finer-grained SRTM data (30 m instead of the previous 90 m resolution) for more precise elevation attribute values. This is especially useful when modeling active travel behavior, which is sensitive to arduous elevation changes (Barthelemy et al., 2024).
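Betweenness centrality is costly because it requires a shortest-path search from every node. A standard exact method is Brandes's algorithm, which with length-weighted (Dijkstra) searches runs in O(nm + n² log n) time for n nodes and m edges. The sketch below is a plain-Python version of that algorithm, not this workflow's implementation; it assumes parallel edges have already been collapsed to their shortest length, and networkx's `betweenness_centrality(G, weight="length")` is a production alternative.

```python
import heapq
from itertools import count


def _shortest_path_dag(adj, nodes, source):
    """Dijkstra from source, recording shortest-path counts and predecessors."""
    dist = {source: 0.0}
    sigma = dict.fromkeys(nodes, 0)  # number of shortest paths from source
    sigma[source] = 1
    preds = {v: [] for v in nodes}
    order, done = [], set()  # order: nodes by nondecreasing distance
    tie = count()  # heap tie-breaker so node labels are never compared
    heap = [(0.0, next(tie), source)]
    while heap:
        d, _, v = heapq.heappop(heap)
        if v in done:
            continue  # stale heap entry
        done.add(v)
        order.append(v)
        for w, length in adj.get(v, {}).items():
            new_dist = d + length
            if w not in dist or new_dist < dist[w]:
                dist[w] = new_dist
                sigma[w] = sigma[v]
                preds[w] = [v]
                heapq.heappush(heap, (new_dist, next(tie), w))
            elif new_dist == dist[w]:
                sigma[w] += sigma[v]  # another equally short route to w
                preds[w].append(v)
    return order, preds, sigma


def betweenness(adj, normalized=True):
    """Brandes-style betweenness centrality for a length-weighted directed graph.

    adj: {node: {neighbor: positive_length}}. Parallel edges are assumed to have
    been collapsed to their shortest length beforehand.
    """
    nodes = set(adj) | {w for nbrs in adj.values() for w in nbrs}
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        order, preds, sigma = _shortest_path_dag(adj, nodes, s)
        delta = dict.fromkeys(nodes, 0.0)
        # Accumulate dependencies from the farthest nodes back toward s.
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(nodes)
    if normalized and n > 2:
        scale = 1 / ((n - 1) * (n - 2))  # ordered (s, t) pairs that exclude the node
        bc = {v: c * scale for v, c in bc.items()}
    return bc


# Two-way segments 1 <-> 2 <-> 3: node 2 lies on every route between 1 and 3.
line = {1: {2: 100.0}, 2: {1: 100.0, 3: 80.0}, 3: {2: 80.0}}
print(betweenness(line))  # node 2 scores 1.0; nodes 1 and 3 score 0.0
```

A node with high betweenness carries many shortest routes, which is why such nodes serve as candidate chokepoints in disruption simulations.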
Fifth, from a code product perspective, the workflow’s code base has been wholly refactored and rewritten from the ground up to significantly reduce its cyclomatic complexity, memory use, and runtime. This makes the workflow more maintainable and sustainable, and easier to re-run in the future to periodically update the data repository whenever new GHSL data are released. It also makes it easier for downstream users to re-run the code to model any network types they wish (including pedestrian networks or cycling networks).
Finally, these models and indicators themselves unlock other researchers’ work, such as the example applications discussed above. This project provides a global dataset for conducting both worldwide urban street network science beyond samples and local analyses, particularly in less-resourced regions where such models and indicators are most needed yet most scarce (Giles-Corti et al., 2022).
Footnotes
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Users may freely download the models or indicators directly from the aforementioned Dataverse, or access the source code and documentation at the aforementioned GitHub source code repository.
