Abstract
This study addresses the challenges of monitoring urban development where sprawl, infill and soft densification are prevalent. To accurately identify the spatial distribution and characteristics of different development types, a multi-modal deep learning model was developed to predict parcel-level development types using very high-resolution land cover maps and tabular data representing parcel attributes of surface cover, built form and dwelling counts. This model was compared to a range of supervised and unsupervised classification models and evaluated in two contexts: (i) within Perth, where training data was abundant, and (ii) when generalising models to other cities across Australia. To undertake this analysis, a ground truth dataset of almost 70,000 labelled parcels was sampled from Perth, Western Australia, with fine-tuning and test datasets produced for other Australian cities (Melbourne, Sydney, Brisbane and Adelaide). The multi-modal deep learning model achieved an overall accuracy of 94% when evaluated in Perth and overall accuracies ranging from 91% to 94% when evaluated in other cities. The model was more accurate in classifying visually distinct development types that are relevant to monitoring infill and densification and performed well in transferring to other cities. These results indicate that deep learning models and transfer learning workflows can be leveraged to develop generalisable urban development type mapping models. This approach opens opportunities for multi-city analyses to further understand the spatial and temporal dynamics of urban growth and densification, and provides data to guide and evaluate the effects of policy that seeks to balance the need for housing our growing populations with maintaining healthy natural environments.
Keywords
Introduction
Development is constantly reshaping cities (Loorbach and Shiroyama, 2016). Much of this occurs at the property parcel-level, for example, single-unit lots may be subdivided into multiple residences, or adjacent parcels in inner-city areas may be agglomerated to accommodate higher-density developments such as apartments or townhouses. Here, a parcel is defined as an urban land unit with clearly defined boundaries that typically reflect land ownership. This may range from a single homeowner to a strata-managed property comprising multiple single dwellings or stacked multi-storey apartments, as well as non-residential parcels. At the parcel-level, development impacts built form, intensity and land use, altering the function and character of neighbourhoods (Dovey et al., 2017; Huang et al., 2023; Polidoro et al., 2012). Understanding the nature of change at a fine scale, and the cumulative impact these transformations have across the urban landscape, is critical for supporting desired governance outcomes.
To this end, identifying and monitoring different types of development provides the evidence needed to link changes to the built form with impacts on local environments, micro-climates, economic activity and quality of life (Lungman et al., 2024; Stewart and Oke, 2012; Zhang et al., 2023). Such monitoring requires multi-temporal maps of development type at the parcel-level to enable tracking of fine-scale changes over time. Often, studies characterising urban patterns classify spatial units at coarser scales than the property parcel or use information from multiple adjacent parcels (Bobkova et al., 2021; Fleischmann and Arribas-Bel, 2024; Fleischmann et al., 2022; Vanderhaegen and Canters, 2017). However, there is also a need for approaches that classify a parcel's development type using only information from within the parcel's boundary. This is necessary to identify the fine-scale and localised changes required for monitoring policies that seek to promote increased land utilisation or evaluate infill targets and rezoning initiatives.
Urban classification studies often use metrics describing the composition, shape or configuration of urban elements that are derived from vector-based GIS data (Fleischmann, 2019; Schirmer and Axhausen, 2015). These metrics are subsequently organised in a tabular form and used as predictor variables in classification tasks (Berghauser Pont et al., 2019a; Fleischmann et al., 2022). However, as noted by Chen et al. (2021) within the context of mapping urban vitality, deep learning vision models can extract task-relevant predictive features from geospatial data that align with a human’s visual perception of images. This has relevance to parcel development type classification, as by visually examining schematic plans, or land cover maps, humans can discriminate between development types.
With supervision from human-generated labels, it is plausible that deep learning vision models could learn to discriminate between development types based on perceptual spatial features. Such deep learning-based extraction of visual elements from images could advance or augment urban classification approaches that primarily use vector geometry-derived metrics. This could manifest through the detection of spatial features not captured by a priori computation of spatial metrics, thus providing more information for classification tasks. Deep learning systems are trained to extract task-relevant salient features, which can help address the challenge of identifying spatial metrics best suited for classification. The use of deep learning vision models to extract visual features from geospatial data is achievable using imagery datasets such as those generated from satellite, aerial or street view captures (Chen et al., 2021).
However, existing studies of urban morphometry frequently classify urban form based on vector data reflecting property parcels, building footprints or street networks (Berghauser Pont et al., 2019a, 2019b). This is partly motivated by the paucity of information describing other elements of the built form (Schirmer and Axhausen, 2015). Yet over recent years, city-scale geospatial data capturing intra-parcel detail for a range of built environment features, such as impervious and vegetation covers, has become widely available. These extra thematic details have the potential to augment traditional variables used in urban morphology studies and facilitate discrimination between types of parcel development. As these richer urban datasets become available, it is necessary to consider how they can be incorporated into classification approaches that advance mapping of urban development types.
To date, myriad approaches have been used to classify development across multiple scales, using different urban features and input datasets, for a range of applications and use cases, and using different analytic techniques and class definitions. Often, classification approaches are developed for one or a small number of geographic contexts, and it is unclear how such approaches generalise to new locations. These differences preclude comparison across multiple studies and/or multiple cities, and hamper understanding of urban development dynamics (Zhang et al., 2023). Deep learning models have proven amenable to transfer learning workflows where models trained in one (source) domain can be fine-tuned for application in another. In geospatial machine learning, this involves training a model in one location where there is abundant labelled data and fine-tuning it for application in a label-scarce environment (Ma et al., 2024). There is a need to assess whether deep learning models and transfer learning approaches are suitable for generalisable and scalable urban development mapping approaches.
Building on these research gaps, a deep learning model is presented that can learn from different modalities of parcel-level geospatial data to classify development types and is amenable to transfer learning workflows. These data modalities include (i) spatial features extracted from representations of a parcel's configuration of built and natural cover and (ii) a suite of tabular metrics representing built form, surface covers and dwelling counts. This multi-modal model was compared to a range of classification approaches, including unsupervised classification of plot and building footprint morphometrics, and unsupervised and supervised classification of parcel surface cover and built form tabular metrics. Given the varied analytic approaches and datasets used to classify urban areas and the different class definitions and typologies used to characterise development (Berghauser Pont et al. 2019a; Bobkova et al., 2021; Fleischmann and Arribas-Bel, 2022, 2024; Hecht et al., 2015; Kim et al., 2024; Vanderhaegen and Canters, 2017; Zhang et al., 2023), a comparative analysis of predictive mapping approaches is warranted to guide urban planning practitioners and future research efforts (Fleischmann et al., 2021; Zhang et al., 2023).
The comparative analysis presented in this paper focuses on (i) the ability of different models to classify a range of development type classes necessary to capture varying profiles and intensities of densification and sprawl and (ii) the ability of models to generalise for multi-city analyses. Each of the classification models considered here was assessed in terms of their performance in classifying a range of parcel-level development type classes relevant to urban planning applications. Using representative city-scale evaluation datasets, analyses were undertaken comparing each model’s capacity to accurately classify parcel-level development types for a focal city where there is abundant training data (Perth, Western Australia) and their ability to generalise to different cities naively and via transfer learning workflows (Melbourne, Sydney, Brisbane and Adelaide). Section A of the Supplemental Material describes the urban development characteristics of Perth and provides the geographic location of all Australian cities included in this study. Finally, city-scale deployments of the multi-modal deep learning model are demonstrated across a range of parcel development type mapping applications.
Datasets
Surface cover and built form data
Spatial metrics used to describe each parcel.
Property parcel layers
For each city, and for time points matching the acquisitions of Nearmap AI data, property parcel data was obtained by processing the relevant state’s cadastral datasets (see Section B of the Supplemental Material for the cadastral databases’ sources) to extract residential properties, vacant lots with development potential or built-up parcels with non-residential uses. Additionally, the number of addresses in each parcel was computed from a temporally matched Geoscape Geocoded National Addresses File (G-NAF) database (Geoscape Australia, 2024) and included in the tabular metrics (Table 1).
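For illustration, counting address points per parcel with a spatial join could be implemented roughly as follows. This is a hedged sketch rather than the authors' processing code; file names, column names and layer structure are assumptions.

```python
# Illustrative sketch (not the authors' workflow) of counting G-NAF address
# points per property parcel with a spatial join. File paths, column names and
# coordinate reference systems are assumptions for the example.
import geopandas as gpd

parcels = gpd.read_file("perth_parcels.gpkg")        # cadastral parcel polygons (assumed file)
addresses = gpd.read_file("gnaf_addresses.gpkg")     # G-NAF address points (assumed file)
addresses = addresses.to_crs(parcels.crs)            # align coordinate systems

# Join each address point to the parcel that contains it, then count per parcel
joined = gpd.sjoin(addresses, parcels, predicate="within", how="inner")
address_counts = joined.groupby("index_right").size()

# Attach the count as a tabular metric; parcels with no addresses get zero
parcels["n_addresses"] = parcels.index.map(address_counts).fillna(0).astype(int)
```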
Development type classes
Description of each development type class.
a. Western Australian Planning Commission design codes (Western Australian Planning Commission, 2024).
b. Infill typologies catalogue by the Cooperative Research Centre for Water Sensitive Cities (London et al., 2020).

Examples of Nearmap aerial RGB images (left) and Nearmap AI vector surface cover and built form images (right) for each development type: (a) single house; (b) multi-unit occupancy; (c) cluster; (d) low-rise apartment; (e) high-rise apartment; (f) rural; (g) other urban; (h) vegetation and (i) bare earth. Parcel boundaries are indicated by red outlines. Nearmap RGB aerial images and Nearmap AI vector data are date-matched, both from 18th January 2023.
Ground truth datasets
A ground truth dataset of 69,403 parcels sampled from Perth was created and split into training, validation and test datasets, which contained 43,656, 12,256 and 13,491 parcels, respectively, for Perth model training and evaluation. To evaluate the model's transferability to localities outside of the training domain, a total of 14,578, 10,135, 12,888 and 15,803 parcels were sampled from Melbourne, Sydney, Brisbane and Adelaide, respectively, to form the transfer cities' test datasets. Further, a smaller sample of parcels that were out-of-distribution relative to the Perth training data was selected from each transfer city, following the approach of Meyer and Pebesma (2021), to form the fine-tuning dataset; this comprised a total of 3340 parcels and was used to fine-tune the Perth model. A multi-step sampling approach was developed to ensure parcels across different land-use types and dwelling densities were selected (see Sections B–D in the Supplemental Material for more details on the sampling approach and summary tables for each dataset). All sampled parcels were manually labelled via visual inspection of temporally matched land cover maps (Nearmap AI) and high-resolution aerial images (Nearmap RGB aerial images). If a parcel's label was ambiguous, it was resolved through review by multiple analysts.
Methods
Multi-modal deep learning model
Using the Perth training dataset and the transfer cities' fine-tuning dataset, a range of machine learning models were trained to classify a property parcel's development type. Specifically, each model was trained on the Perth training dataset and then its parcel development type classification performance was evaluated using test datasets for each of the five cities considered here. This allowed for the assessment of how well different models perform in a focal city where training data is abundant (i.e. the Perth example here) and how well models trained in one city generalise to others (i.e. evaluating the Perth model in different Australian cities). Next, each model was re-trained or fine-tuned using a combination of the Perth dataset and the other cities' fine-tuning dataset. This exercise assessed how well a model generalised to a new city via transfer learning workflows or when a small amount of data from the target city was available to retrain the model.
The main model under consideration here was a custom multi-modal deep learning model that classifies a parcel’s development type by learning to extract (i) visual predictive features from geospatial data representing surface cover and built form elements within a parcel and (ii) tabular predictive features from metrics summarising parcel attributes. This model was benchmarked and compared against a range of other machine learning models that align with approaches presented in the literature for mapping urban development. These other machine learning models include a deep neural network, random forest and unsupervised classifier (k-means) trained using built form and surface cover tabular metrics, and unsupervised classifiers trained using urban morphometry metrics. The implementation details of these models are described in Section E of the Supplemental Material.
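As a rough illustration only (the actual implementations are described in Section E of the Supplemental Material), the tabular-only baselines could be set up along the following lines; hyperparameters, file names and feature columns are assumptions rather than the study's settings.

```python
# Hedged sketch of the tabular-only baselines: a supervised random forest and an
# unsupervised k-means clustering over parcel surface cover and built form metrics.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

train = pd.read_csv("perth_train_metrics.csv")   # tabular metrics plus a 'dev_type' label (assumed)
feature_cols = [c for c in train.columns if c != "dev_type"]

# Supervised baseline: random forest over the tabular parcel metrics
rf = RandomForestClassifier(n_estimators=500, class_weight="balanced", random_state=0)
rf.fit(train[feature_cols], train["dev_type"])

# Unsupervised baseline: k-means on standardised metrics; the resulting clusters
# are subsequently compared against the development type classes
scaled = StandardScaler().fit_transform(train[feature_cols])
kmeans = KMeans(n_clusters=9, n_init=10, random_state=0).fit(scaled)
cluster_labels = kmeans.labels_
```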
A multi-modal deep learning architecture (DL) was developed that combined a Swin vision transformer to learn predictive visual spatial features from colourised representations of built and natural elements within a parcel and a fully connected neural network (FCNN) to learn from tabular inputs of metrics describing parcel surface cover, built form and dwelling attributes. This study used a Swin Transformer V2 architecture for the vision component of the model (Liu et al., 2022). A schematic of the multi-modal model architecture is presented in Figure 2.
Multi-modal model architecture combining images and tabular data to output a probability distribution across the development classes.
The multi-modal model (Figure 2) was trained to classify the development type of parcels using the Perth training and validation datasets. The Swin Transformer V2 was initialised with ImageNet weights and fine-tuned using Low-Rank Adaptation (LoRA). All weights in the FCNN layers that process the tabular metric inputs and all weights in the final classification FCNN layers were randomly initialised and learnt from scratch during model training. The Adam optimiser (Kingma and Ba, 2017) was used with an initial learning rate of 0.001, followed by a gradual decay scheduled by cosine annealing. A weighted cross-entropy loss function was used, with losses weighted by the inverse of the class sample frequency relative to the total number of samples, due to the imbalance between classes in the training dataset. The model was trained for 20 epochs with a batch size of 32. During training, random horizontal and vertical flips were applied to images to enhance the model's robustness to images with different orientations. The model weights from the epoch with the lowest validation loss were saved and used for evaluation. The Perth model was adapted for use in other cities via transfer learning by updating its weights via LoRA using the other cities' fine-tuning dataset and training for 5 epochs with an initial learning rate of 0.001 and a cosine annealing scheduler.
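To make the architecture and training configuration concrete, the following is a minimal PyTorch sketch rather than the authors' implementation: the Swin V2 variant, layer sizes, tabular feature count, class counts and the omission of the LoRA adaptation step are all simplifying assumptions.

```python
# Minimal sketch (assumptions throughout) of a multi-modal classifier combining a
# Swin Transformer V2 image branch with an FCNN over tabular parcel metrics,
# trained with class-weighted cross-entropy, Adam and cosine annealing.
import timm
import torch
import torch.nn as nn
from torchvision import transforms


class ParcelDevelopmentClassifier(nn.Module):
    def __init__(self, n_tabular_features: int, n_classes: int = 9):
        super().__init__()
        # Vision branch: ImageNet-pretrained Swin V2, classification head removed
        self.vision = timm.create_model(
            "swinv2_tiny_window8_256", pretrained=True, num_classes=0
        )
        # Tabular branch: FCNN over surface cover, built form and dwelling metrics
        self.tabular = nn.Sequential(
            nn.Linear(n_tabular_features, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
        )
        # Fusion head: concatenate both embeddings and output class logits
        self.head = nn.Sequential(
            nn.Linear(self.vision.num_features + 64, 256), nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, image: torch.Tensor, metrics: torch.Tensor) -> torch.Tensor:
        v = self.vision(image)        # (B, vision_dim) visual spatial features
        t = self.tabular(metrics)     # (B, 64) tabular features
        return self.head(torch.cat([v, t], dim=1))


model = ParcelDevelopmentClassifier(n_tabular_features=24)  # feature count assumed

# Class-weighted cross-entropy: weights are the inverse of class sample frequency
# relative to the total number of samples (class counts below are placeholders)
class_counts = torch.tensor([30000., 5000., 3000., 1500., 500., 1000., 1500., 800., 356.])
class_weights = class_counts.sum() / class_counts
criterion = nn.CrossEntropyLoss(weight=class_weights)

# Adam with an initial learning rate of 0.001 and cosine annealing over 20 epochs
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)

# Random horizontal and vertical flips applied to parcel images during training
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
])
```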
Model evaluation
The performance of the models was assessed using a range of metrics. The Perth test dataset was used to evaluate models trained with Perth training data. The other cities' test datasets were used to evaluate both the original models and the fine-tuned/re-trained models to assess how well the models transfer to locations outside the training data. For each class, the recall, precision and F1-score were calculated using equations (1)–(3), respectively. These provided an indication of the model’s performance for individual development type classes. To summarise the model’s overall performance, the accuracy and macro average F1-score were computed using equations (4) and (5), respectively. Additionally, overall and per-class top-2 accuracy were computed to assess if the correct development type label was in the model’s top two class probability scores.
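Equations (1)–(5) are not reproduced in this extract; assuming they follow the standard definitions of these metrics, they take the form:

\[
\mathrm{Recall}_c = \frac{TP_c}{TP_c + FN_c}, \qquad
\mathrm{Precision}_c = \frac{TP_c}{TP_c + FP_c}, \qquad
F1_c = \frac{2\,\mathrm{Precision}_c\,\mathrm{Recall}_c}{\mathrm{Precision}_c + \mathrm{Recall}_c}
\]

\[
\mathrm{Accuracy} = \frac{\sum_{c=1}^{C} TP_c}{N}, \qquad
\text{macro-}F1 = \frac{1}{C}\sum_{c=1}^{C} F1_c
\]

where \(TP_c\), \(FP_c\) and \(FN_c\) are the true positives, false positives and false negatives for class \(c\), \(N\) is the total number of test parcels and \(C\) is the number of development type classes.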
Results
Evaluation results – focus city: Perth
Evaluation of models on Perth test dataset using F1-scores and accuracy metrics (overall, macro-average accuracy and top-2 accuracy), with the number of samples per class shown in parentheses.
Bold font denotes the most performant model for a given performance metric.
The multi-modal deep learning model was more accurate in classifying development types with distinct visual characteristics, such as clusters of dwellings, multi-units, low-rise and high-rise apartments, compared to the random forest and tabular deep neural network model (Table 3). This indicates that the vision component of the multi-modal model, which can extract visual spatial features for prediction, aids in the classification of certain development types.
The groupings generated by the k-means algorithms did not align well with the development type classes (Table 3). These performance metrics do not reflect the k-means algorithm's ability to cluster the dataset into groups; rather, they indicate that the resulting cluster groups do not align with the pre-determined development type classes. This drop in classification performance relative to the supervised models is to be expected, as the supervised models were trained to directly classify the development type classes in the test set.
Evaluation results – transfer to other cities
Evaluation of models on Melbourne, Sydney, Brisbane and Adelaide test datasets using overall accuracy, macro average accuracy and top-2 accuracy.
Bold font denotes the most performant model for a given performance metric. ft denotes the Perth model fine-tuned for other cities, and rt denotes a model re-trained with data from Perth and other cities.
Fine-tuning with out-of-distribution parcels sampled from the other cities improved the predictive performance of the multi-modal deep learning model relative to the Perth model (Table 4). This improvement through fine-tuning was observed for the overall accuracy, macro average accuracy and in the precision, recall and F1-scores for most development type classes (Table 4 and Section F of the Supplemental Material).
Development type mapping applications
The multi-modal deep learning model achieved the highest overall accuracy for Perth and the fine-tuned multi-modal deep learning model had the highest accuracy for the other cities (Tables 3 and 4). These models were used to classify every property parcel across each city to evaluate their performance in applied mapping settings. Figure 3 demonstrates the Perth model's capacity to capture fine-scale variability in development types in Perth (Figure 3(a)). The fine-tuned model's ability to transfer to other Australian cities is also showcased with case studies from the transfer cities: mixed-density developments in Melbourne (Figure 3(b)), a large-scale high-rise apartment complex in Sydney (Figure 3(c)), rows of low- and high-rise apartment buildings in Brisbane (Figure 3(d)) and blocks of large-scale clusters interspersed with non-residential parcels in Adelaide (Figure 3(e)). The multi-modal deep learning model can be deployed for entire cities to create wall-to-wall maps with parcel-level detail that captures broader-scale patterns of development (Figure 4).
Examples of development type classification by the Perth model in (a) Balcatta, Perth, and by the fine-tuned model in (b) Carnegie, Melbourne; (c) Forest Lodge, Sydney; (d) St. Lucia, Brisbane; and (e) Unley-Parkside, Adelaide, using input data from February 2023. The basemap images are Nearmap RGB aerial images date-matched to the Nearmap AI vector data used to classify each parcel's development type.
Model deployment for (a) Perth; (b) Melbourne; (c) Sydney; (d) Brisbane; and (e) Adelaide using input data from February 2023. Black line shows the SA3 boundary.

With input data spanning two time points, the multi-modal model can effectively detect changes in development type over time. Figure 5 presents an example of a city-level change analysis illustrating how development in the form of multi-unit occupancy has progressed throughout the city between 2019 and 2023. The model's raw outputs provide a queryable database of development type parcel points (Figure 5(a) and (c)), which can be aggregated by Statistical Area 1 (SA1) to produce heat maps (Figure 5(b) and (d)). This enables a detailed investigation into the spatial pattern of a specific development type over time (Figure 5(e)).
Maps of Perth suburbs in 2019 and 2023 showing different types of development: (a) high-rise apartment and (b) low-rise apartment at the parcel level; (c) multi-unit occupancy and (d) single house at the SA1 level, with heat maps showing counts and the difference between the 2 years. The basemap images are Nearmap RGB aerial images date-matched to the Nearmap AI vector data used to classify each parcel's development type. Black solid line indicates the SA3 boundary.
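As an illustration of the aggregation step described above, the following is a minimal geopandas sketch, not the authors' pipeline: file names, the 'dev_type' and 'SA1_CODE' columns and the class label are assumptions.

```python
# Hedged sketch of aggregating parcel-level development type predictions to SA1
# counts and computing change between two years. Inputs and column names assumed.
import geopandas as gpd

parcels_2019 = gpd.read_file("predictions_2019.gpkg")   # parcel points with a 'dev_type' column
parcels_2023 = gpd.read_file("predictions_2023.gpkg")
sa1 = gpd.read_file("sa1_boundaries.gpkg")               # SA1 polygons with an 'SA1_CODE' column

def count_by_sa1(points, dev_type):
    """Count parcels of a given development type within each SA1."""
    subset = points[points["dev_type"] == dev_type]
    joined = gpd.sjoin(subset, sa1, predicate="within", how="inner")
    return joined.groupby("SA1_CODE").size()

counts_2019 = count_by_sa1(parcels_2019.to_crs(sa1.crs), "multi_unit")
counts_2023 = count_by_sa1(parcels_2023.to_crs(sa1.crs), "multi_unit")

# Difference in multi-unit occupancy counts per SA1 between the two years
change = counts_2023.subtract(counts_2019, fill_value=0).rename("multi_unit_change")
sa1 = sa1.merge(change, left_on="SA1_CODE", right_index=True, how="left")
sa1["multi_unit_change"] = sa1["multi_unit_change"].fillna(0)
```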
Discussion
The results presented here demonstrate that multi-modal deep learning models, primarily trained in one location with abundant training data, can generalise to other cities to map parcel-level development types (Table 4 and Figure 4). These maps can characterise fine-scale variability in development types between parcels (Figure 3) and capture urban patterns at the city-level (Figure 4). Figure 5 demonstrates how static development type classes can translate into dynamic urban processes, including high-density infill through apartments (Figure 5(a) and (b)), low-density infill through multi-unit developments (Figure 5(c)), and greenfield expansion through new single houses (Figure 5(d)). This highlights the model’s ability to monitor various forms of urban development over time. This data is relevant to a range of urban applications such as evaluating the effect of planning and policy on on-the-ground development, monitoring city expansion and infill targets, providing input layers to urban growth forecasting, and attributing changes in surface cover and built form to different development types. As cities increasingly contend with challenges of growing populations and meeting subsequent housing demand while being cognizant of environmental and sustainability goals, these maps contribute rich data to guide the navigation of these trade-offs (Ehrhardt et al., 2023, 2025; Wellmann et al., 2020; Zhang et al., 2023).
The multi-modal deep learning model trained only on data from Perth had high accuracies when evaluated in other cities and was more accurate than tabular-only random forest and deep neural network models that were developed with training data from other cities (Table 4). This implies that the multi-modal deep learning model, with its capacity to extract visual spatial features and leverage multiple sources of data, learnt predictive features that generalise across geographic contexts. Potential explanations for this are that (i) supervised training teaches the model to learn salient, task-relevant spatial features for prediction and (ii) the relationship between visual features and development types generalises better than learnt rules mapping tabular metric values to class labels.
Fine-tuning the Perth multi-modal deep learning model with a smaller sample of data points from other cities further improved the model's performance and capacity to generalise to new locations (Table 4). This indicates that algorithms developed for predictive urban mapping in locations with abundant training data can be applied to other locations via transfer learning. For example, a model trained for a specific project could be reused in different locations where resources for model development are lacking. This use of transfer learning with deep learning models can help address noted challenges regarding the generalisability of models that map urban form from one location to another.
Across most development type classes and different cities, the multi-modal deep learning model tended to be the most accurate, as indicated by the F1-scores, precision and recall metrics (Table 3 and Section F of the Supplemental Material). Of note is the higher accuracy of the multi-modal deep learning model for classes that have visually distinct characteristics, such as clusters, where multiple dwellings are arranged around a shared driveway, and low-rise apartments, which often consist of a two-storey elongated building and an adjacent car park. This indicates that the multi-modal deep learning model, which learns predictive visual spatial features, is suited to discriminating between these development types. In Perth, where the multi-modal deep learning model and the random forest model had similar overall accuracies (Table 3), the multi-modal deep learning model was more accurate in classifying high-rise apartments, low-rise apartments, clusters and multi-units, which all represent multiple dwellings on a parcel via different building configurations. Discriminating between these classes is important for many urban applications, such as characterising patterns of infill and densification, which highlights the suitability of multi-modal deep learning models for predictive mapping of parcel development types.
The poor correspondence between groupings in the data generated via the k-means algorithm and development type classes illustrates a well-known challenge with using unsupervised algorithms to generate specific classes for an application (Tables 3 and 4). This highlights the importance of labelled training data to develop application-relevant predictive mapping algorithms and the benefits of transfer learning approaches, such as the example presented here (Tables 3 and 4). However, recent advances in large deep neural network image-to-text models and geospatial foundation models suggest possibilities for leveraging remotely sensed and geospatial data for urban mapping applications when training data is scarce. Such models are trained without task-specific labels but can synthesise geospatial data into useful variables for downstream analysis tasks. For example, image-to-text models can be provided with a remote sensing image and a text prompt to perform a range of analysis tasks including counting or segmenting urban features or describing a scene (Yao et al., 2025; Yuan et al., 2025). Geospatial foundation models, such as AnySat, are trained in a self-supervised manner to convert multiple modalities of geospatial data into numerical summaries of a location that can be used as predictor variables for a range of analysis tasks (Astruc et al., 2025). Such models have relevance in cases where large volumes of geospatial data are available but where insufficient labelled data to train task-specific deep learning models exists. With advances in large deep learning models and growing availability of city-scale geospatial databases, further research is warranted to understand how such models can be used for predictive urban mapping in label-scarce settings.
This detailed evaluation and model comparison signals opportunities for further developing multi-modal models for development type mapping. The Perth multi-modal deep learning model's classification performance was noticeably lower for the rural class (F1-score of 70.7; Table 3). The random forest model, which did not use visual features, had higher accuracies for the rural class in Perth (Table 3). This could be because the random forest model primarily discriminates rural parcels based on parcel area, whereas the multi-modal deep learning model extracts spatial features to make predictions, and rural vegetative features could be confused with similar features apparent in urban vegetation parcels or on larger single-unit lots. The vision component of the multi-modal deep learning model might not be able to differentiate these features across classes. With the multi-modal deep learning model there were also instances where single-unit or construction parcels were misclassified as other-urban. These cases typically occurred when the majority of the parcel was covered with built features, which appear as a homogeneous surface cover in the Nearmap AI vector data, meaning it is not possible to discern differences between the built and human-made features of dense single-unit housing, non-residential uses and construction material.
In these instances of confusion, incorporating high-resolution RGB imagery (e.g. orthophotos or similar) or including information from neighbouring parcels could aid discrimination (e.g. as in the contextual metrics of Fleischmann et al., 2022). For example, high-resolution RGB imagery could provide more detail on the nature of vegetation or built materials within a parcel, revealing differences between rural and urban vegetation or between built features on non-residential and residential parcels, respectively. Such within-cover detail is masked when only geometry-based features are used. Similarly, including contextual information from a parcel's neighbourhood might help resolve implausible cases, such as an other-urban parcel occurring within a dense grouping of single-unit parcels. With multi-modal models, it is possible to include RGB imagery or contextual predictor variables. However, incorporating RGB imagery alongside vector representations of surface cover would substantially increase the computational load, especially when training and deploying models for multi-city applications. There are also risks to including a parcel's contextual information as predictors, as fine-scale change and variability in parcel development types could be 'smoothed' out. For many urban mapping applications, such as monitoring infill and densification, sensitivity to fine-scale variation is important.
Conclusion
This paper presents a multi-modal deep learning model that classifies a property parcel's development type. This model learnt to extract predictive features from colourised representations of a parcel's built and natural features and from tabular data representing a parcel's surface cover, built form and dwelling count attributes. The multi-modal deep learning model's predictive performance was compared to a range of commonly used supervised and unsupervised models. These comparative analyses were undertaken for Perth, a case where there is abundant training data, and in cases where models were generalised to other cities across Australia. To undertake this analysis, a ground truth dataset comprising over 50,000 labelled property parcels from Perth was created for training and validation, alongside geographically distinct test sets of over 10,000 parcels per city for model evaluation. The ground truth datasets were derived from a geospatial database of Nearmap AI and address point data representing Perth, Melbourne, Sydney, Brisbane and Adelaide, covering the years 2019 and 2023.
The multi-modal deep learning model broadly displayed higher F1, precision and recall scores across the different development types. Of note, the multi-modal deep learning model performed better in classifying classes that are visually distinctive and characterised by the arrangement of built features, such as cluster dwellings or apartments. These classes are often most relevant for mapping and monitoring different types of infill and densification. This highlights the benefit of using multi-modal models, which can integrate a range of data sources such as predictive features extracted from visually represented geospatial data and tabular data, to discriminate between parcel-level development types. The utility of the multi-modal deep learning model for generating predictive maps to monitor different types and patterns of infill and sprawl is demonstrated with examples ranging from fine-scale parcel-to-parcel variability in development types to city-scale change analyses.
The multi-modal deep learning model outperformed other models when tested with the Perth dataset, achieving an overall accuracy of 94% and a top-2 accuracy of 99%. The multi-modal deep learning model was also more accurate when generalising to other Australian cities, whether naively applying the Perth model to another city or via transfer learning with a small fine-tuning dataset. The fine-tuned multi-modal deep learning model had overall accuracies between 91% and 94% when evaluated on test sets from Melbourne, Sydney, Adelaide and Brisbane. These results indicate that deep learning models and transfer learning workflows can be leveraged to develop generalisable urban development type mapping models. This opens opportunities for multi-city applications and comparative studies that further understanding of the spatial and temporal dynamics of urban growth, better understand the links between urban development and sustainability and liveability outcomes, and evaluate the effects of policy that seeks to balance the need for housing our growing populations with maintaining healthy natural environments.
Footnotes
Acknowledgements
The authors would like to formally acknowledge Nearmap for preparing data and providing technical and best practices support under a Research License for Nearmap AI data. Finally, we acknowledge the use of location information data licensed from Western Australian Land Information Authority (WALIA) trading as Landgate, © 2023 Western Australian Land Information Authority.
Author contributions
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported in full by the Australian Government through the Australian Research Council’s Discovery Projects funding scheme (project DP230103095).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data that support the findings of this study are available from © Nearmap. Restrictions apply to the availability of these data, which were used under license for this study.
Supplemental Material
Supplemental material for this article is available online.
Author biographies
References
