Developing a two-level machine-learning approach for classifying urban form for an East Asian mega-city

Abstract

Having experienced the most rapid urbanization in the world since the 1990s, mega-cities in East Asia feature highly compact and atomized modernist architecture. With densely built modernist architecture and relatively permissive building regulations, it is challenging to trace the actual development of a whole city. Compared to European cities, their overall urban landscapes are much denser, higher, and more functionally mixed. To identify urban forms in mega-cities more quickly and accurately, this study proposes a two-level machine-learning approach. At the building level, we extracted features from topographic maps and building licenses to automatically classify building types. Four state-of-the-art multi-class classification models were compared. At the block level, we used building types as input data and compared two methods for block clustering. In total, 61,426 buildings from Taipei were classified and grouped into 10 block types. Unlike in Western cities, many of the block types in Taipei were mixtures of different types of buildings. This approach is efficient in exploring new urban form types, especially for emerging mega-cities where block types are previously unknown. The result not only sheds light on the features of East Asian urban landscapes but also serves as an important basis for type-based strategic plans addressing contemporary urban issues.

Keywords

Machine learning, urban form, building classification, clustering, East Asia

Introduction

Recent studies have recognized the relevance of urban forms to sustainable urban development (Arundel and Ronald, 2017; Beermann et al., 2014; Hermand and Quesada, 2019; Yosef, 2006). Urban forms are crucial not only for socio-economic aspects (Conzen, 2004), but they are also a key to many contemporary urban issues (Hackenbruch, 2018; Pauleit, 2016; Su et al., 2021). Within the nested structure of urban landscapes (buildings, blocks, quarters, and towns; Reicher, 2017), street blocks are the smallest elements that still possess urban character. Block types not only provide information about construction periods and building styles but also suggest surface-to-volume ratios (Ratti et al., 2003), degrees of open spaces, energy consumption, and land-use intensity (Beck et al., 2020; Li et al., 2018; Salat, 2009; Steadman et al., 2014). However, the analysis of urban forms at the building and block levels is a time-intensive task that relies heavily on expert knowledge (cf. Caniggia and Maffei, 2001; Whitehand, 2001). While existing literature mainly focuses on Western cities or historic towns, only a few studies have covered the urban forms of contemporary mega-cities in the East (Levy, 1999; Whitehand, 2011). Having had the fastest urbanization rate in the world since the 1990s (ESCAP, 2013), East Asian cities feature highly compact and atomized modernist architecture, which is dominated by free-standing tower buildings that have little connection to the surrounding environment. Compared to European cities, the overall urban landscapes are much denser, higher, and more functionally mixed (Hamnett and Forbes, 2011; Schneider et al., 2015; Wolff et al., 2018). As one of the fastest-growing mega-cities, Taipei has been greatly shaped by rapid urbanization since the post-war era (Bai et al., 2011). With densely built modernist architecture and relatively permissive building regulations, it is challenging to trace the actual block development of the whole city.
To make urban form analysis more efficient, we propose a data-driven approach for classifying street blocks at the mega-city scale.

Machine learning is a rising field of study in urban form classification. This approach uses algorithms to automatically optimize the classifier by minimizing errors between training and testing data, which is efficient in providing consistent results over a large area. With given training data, the same process is widely applicable in other cities. Supervised machine-learning models (classification), which are trained against known target classes, are much faster and easier to build, because they do not require predefined rules as in rule-based classification (Belgiu et al., 2014; Orford and Radcliffe, 2007). Depending on the scale and objectives of classification, different datasets are available, for example, (vector or raster-based) topographic maps, 3D-city models, and satellite images. Among them, 3D-city models and vector-based topographic maps with a minimum resolution of 1:25,000 were found to have the best results (Hecht et al., 2015). For a smaller number of classes, binary classification models such as the Support Vector Machine (SVM) have been found useful in building style classification in Germany (Henn et al., 2012) and in the classification of urban tissues (at the quarter level) across European cities (Steiniger et al., 2008). For a larger number of classes, multi-class classification models such as Random Forest (RF), Artificial Neural Networks (ANNs), and Light Gradient Boosted Machine (LightGBM) are reported to perform better than binary classifiers (Hecht, 2014; Hecht et al., 2015; McCarty et al., 2020). Random Forest is an ensemble classifier based on decision trees that is often used for feature selection, such as examining the effectiveness of new spatial metrics for street blocks (Vanderhaegen and Canters, 2017).
While ANNs are a useful tool for pattern recognition in the cartography field (Yan et al., 2019), a single-hidden-layer neural network was found to perform slightly worse than RF in the building classification by Hecht (2014). LightGBM is an emerging algorithm in the remote-sensing field for land cover classification (Candido et al., 2021; Chen et al., 2021; McCarty et al., 2020). LightGBM and XGBoost are both tree-based models similar to RF, but they use boosting instead of bagging. While bagging builds multiple trees independently on resampled data and averages their results at the end, boosting builds one tree at a time, with each new tree fitted to correct the errors of the trees before it.
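The difference between bagging and boosting can be illustrated with a minimal sketch. It is not the implementation used by RF, LightGBM, or XGBoost: it assumes one-split regression trees (stumps) on a single made-up feature, and all function names (`fit_stump`, `bagging`, `boosting`) and the toy data are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) on 1-D inputs; return a predictor."""
    overall = sum(ys) / len(ys)
    best = None
    for threshold in sorted(set(xs))[:-1]:  # drop the maximum so both sides are non-empty
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all inputs identical: fall back to the overall mean
        return lambda x: overall
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def bagging(xs, ys, n_trees=25, seed=0):
    """Bagging: fit each stump independently on a bootstrap sample, then average."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(s(x) for s in stumps) / len(stumps)


def boosting(xs, ys, n_trees=25, learning_rate=0.5):
    """Boosting: fit each stump to the residual errors left by the previous ones."""
    stumps, residuals = [], list(ys)
    for _ in range(n_trees):
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        residuals = [r - learning_rate * stump(x) for x, r in zip(xs, residuals)]
    return lambda x: learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    """Mean squared error of a predictor on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Made-up step-shaped data with three levels
xs = list(range(10))
ys = [0, 0, 0, 5, 5, 5, 5, 9, 9, 9]
single, bagged, boosted = fit_stump(xs, ys), bagging(xs, ys), boosting(xs, ys)
```

On this toy data a single stump can represent only two of the three levels, whereas the boosted sequence of stumps recovers all three because each round targets what is still unexplained.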

On the other hand, unsupervised machine-learning algorithms such as k-means clustering and the Gaussian mixture model (GMM) are frequently used for clustering urban forms on larger scales. K-means has been one of the most popular clustering methods since it was first published in 1955 (Jain, 2010). This method has been used in various urban form studies covering neighborhoods (Francisco et al., 2017; Song and Knaap, 2007), building groups (Schirmer and Axhausen, 2013), and street blocks (Gil et al., 2012). Besides k-means, recent studies have started to explore GMM in clustering block and neighborhood types (Fleischmann et al., 2021; Ma et al., 2021). As an advanced version of k-means, GMM considers the data density and distribution, and it has been applied in the classification of more complicated urban forms (Huang et al., 2007; Li and Quan, 2023; Samuelsson et al., 2019; Su et al., 2021). While many of these studies are raster-based and focus on large urban scales, other studies developed block-level spatial metrics (Araldi and Fusco, 2019; Fleischmann et al., 2021; Gil et al., 2012; Hermosilla et al., 2014; Ma et al., 2021; Vanderhaegen and Canters, 2017) to classify street blocks. Among them, Fleischmann et al. (2021), Gil et al. (2012), and Ma et al. (2021) used GMM to cluster blocks based on spatial metrics that described block geometries and the positions of buildings inside. However, these studies focused on European or American cities. The systematic review by Chen et al. (2020) revealed that urban areas in Taiwan and South Korea and the cities of Singapore and Hong Kong, in particular, are characterized by high population density and by diverse urban forms and land uses.
This is a result of planning systems with few construction restrictions as well as of cultural background, social norms, and economic development, whereas European cities such as Berlin, with a highly controlled planning regime, have a more continuous urban fabric (Burdett and Rode, 2018: 529; Chen et al., 2020). This makes block types in European cities more homogeneous and easier to identify. For the complex urban landscapes of East Asian mega-cities, applying these clustering methods directly at the block level could be very complicated.

This study sets out to explore the underlying block types in East Asian mega-cities. Considering that the street blocks in Taipei are highly heterogeneous due to the juxtaposition of free-standing buildings, we use an approach similar to that of Meinel et al. (2008) to derive block types from the buildings within. We propose a two-level approach that incorporates both supervised and unsupervised machine-learning methods. At the building level, multi-class classification algorithms are used to classify building types. At the block level, clustering methods based on the building-type results are used to explore street-block types. This approach utilizes big data to provide detailed results for a very large spatial coverage and unveils block types that were previously too complicated for qualitative methods.

Methods

Figure 1 shows the analytical framework of the analysis. First, the topographic map and building license were prepared as input data for the (1) multi-class classification for building types, which used supervised machine-learning algorithms to train four classifiers. The result of the best classifier was prepared as input data for the (2) clustering of street-blocks, which used two unsupervised machine-learning models to derive block types in Taipei.

Figure 1.

Analytical framework of the two-level machine-learning approach for classifying urban forms. The upper part shows the workflow of the (1) multi-class classification for building types, and the lower part is the process of (2) clustering of street-blocks.

Building classification

Data preparation is an important step before the building classification. First, we defined the building types in Taipei according to architectural features. Then, we extracted features from the topographic map and building license and collected sample data for training.

Building type in Taipei

Since there is no official definition of building types in Taipei, this paper used building shapes and building heights as indicators to describe the residential and commercial buildings. For building shapes, we used the definition of generic types by Ratti et al. (2003) and separated building footprints into solidary, linear, and block. Because there are various kinds of tall buildings (higher than 10 floors) in Taipei, we further divided buildings into three height categories according to the building technical regulations in Taiwan: under 10 floors, between 10 and 15 floors, and above 15 floors. The combination of these criteria yielded a total of 12 building classes (Table 1, Table S1).

Table 1.

Building types in Taipei.

	Solidary	Linear	Block/Cluster
<10F	Single-family house	Shophouse	Courtyard building
	Apartment	Row building	Inner-city block
		Curvilinear apartment	Industrial building
10–15F	High tower	High linear apartment	—
>15F	Super-high tower	Super-high linear apartment	—

Feature extraction and data preprocessing

At the time of this research, only part of the 3D-city models of Taipei was publicly available. While 3D-city models contain much information about building heights and shapes, topographic maps were found to have similar performance in prior studies (Hecht et al., 2015) and are widely available in most places. Using vector-based topographic maps also enables an increased level of detail and avoids the problem of boundary definition found in most raster-based analyses (Clifton et al., 2008; Horner, 2007; Huang et al., 2007; Ye and Van Nes, 2014). In this study, we used the 1:2500 topographic map of Taipei for the analysis, which was produced by the National Land Surveying and Mapping Center (2016). The map contains building footprints and street blocks, but it does not include information such as building heights, land uses, plots, and street widths. Therefore, we extracted information from building usage licenses from the years 1951 to 2016 from the open data platform of Taipei City (Taipei City Construction Management Office, 2019) and joined them to the topographic map based on plot addresses. Finally, the cadastral map, land-use survey, and street width data from the open data platform were joined spatially in ArcGIS to provide additional feature data.

In ArcGIS, the buildings on the topographic map were generalized, where building edges that are shorter than 0.25 m were simplified and buildings with area <46 m² were omitted. In total, 94 features were extracted from 61,426 effective buildings using geoprocessing tools from ArcGIS, QGIS, and Python scripts (Bard, 2004; Beyhan et al., 2020; Maceachren, 1985; Schumm, 1956). Among the input features, 38 of them were building features, 27 were block- or urban-level features, 17 were information from building licenses, and 12 features were from streets, plots, and land-use data (Table S2).

Following the feature extraction, categorical data was encoded into numeric data. We filled the missing values of building-license features with the mean value and the missing values of all other features with zero. For models such as Artificial Neural Networks (ANNs), we normalized the data according to the data range. Finally, we built a building dataset of 10,000 samples (reference data), which was collected between March and May 2021 by one trained expert to ensure consistency. The data was collected through Google Maps and street views, with the samples evenly distributed across the 12 districts of Taipei City and each district having similar ratios of building classes.
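The imputation and range normalization described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes that missing values are stored as `None`, that normalization "according to the data range" means min-max scaling to [0, 1], and the function names and toy values are hypothetical.

```python
from statistics import mean


def impute(rows, mean_cols):
    """Fill None with the column mean for columns in mean_cols, and with 0.0 elsewhere."""
    n_cols = len(rows[0])
    fill = []
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] is not None]
        fill.append(mean(observed) if c in mean_cols and observed else 0.0)
    return [[fill[c] if r[c] is None else r[c] for c in range(n_cols)] for r in rows]


def min_max_scale(rows):
    """Scale every column to [0, 1] using its observed range (constant columns map to 0.0)."""
    n_cols = len(rows[0])
    lo = [min(r[c] for r in rows) for c in range(n_cols)]
    hi = [max(r[c] for r in rows) for c in range(n_cols)]
    return [
        [(r[c] - lo[c]) / (hi[c] - lo[c]) if hi[c] > lo[c] else 0.0 for c in range(n_cols)]
        for r in rows
    ]


# Made-up rows: column 0 stands for a building-license feature (mean imputation),
# column 1 for any other feature (zero imputation).
raw = [[4.0, None], [None, 2.0], [8.0, 6.0]]
imputed = impute(raw, mean_cols={0})
scaled = min_max_scale(imputed)
```

In practice the fill values and the column ranges would be computed on the training data only and then reused for the data to be predicted, so that no information leaks from the test set.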

Training classification model

We trained four state-of-the-art classification models and compared their results: Random Forest (RF), Light Gradient Boosted Machine (LightGBM), eXtreme Gradient Boosting (XGBoost), and Artificial Neural Networks (ANNs) (Table S3). In the model training step, we first split the 10,000 samples into 70% training and validation data and 30% testing data to tune the hyperparameters (Table S3), using 5-fold cross-validation in the grid search. The results were then evaluated using 10-fold cross-validation across all sample data to check the variance of accuracy among different training/testing data splits. We evaluated the models using the mean accuracy scores and their standard deviation. We chose the best classifier according to the mean accuracy on testing data, which reflects the model’s ability to predict unseen data. Because the building-class distribution is imbalanced, we also included Cohen's kappa coefficient (κ) in the evaluation to ensure that the accuracy is not skewed toward some classes. Finally, the best model with the best hyperparameters was trained again with 90% training data to predict building types for the whole city.
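The evaluation step (k-fold cross-validation with mean accuracy, its standard deviation, and Cohen's kappa) can be sketched in plain Python. This is an illustration under stated assumptions rather than the study's pipeline: folds are contiguous and unshuffled, the `fit` interface is hypothetical, and a toy nearest-class-mean rule on made-up floor counts stands in for the four classifiers.

```python
from collections import Counter
from statistics import mean, stdev


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds (no shuffling)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        yield list(range(0, start)) + list(range(stop, n_samples)), list(range(start, stop))
        start = stop


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), agreement corrected for chance."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:  # only one class present on both sides: kappa is undefined
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)


def cross_validate(fit, X, y, k=10):
    """Return mean accuracy, accuracy standard deviation, and mean kappa over k folds."""
    accuracies, kappas = [], []
    for train, test in k_fold_indices(len(y), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        y_test = [y[i] for i in test]
        y_pred = [predict(X[i]) for i in test]
        accuracies.append(sum(t == p for t, p in zip(y_test, y_pred)) / len(test))
        kappas.append(cohen_kappa(y_test, y_pred))
    return mean(accuracies), stdev(accuracies), mean(kappas)


def fit_nearest_mean(X_train, y_train):
    """Toy stand-in classifier: assign the class whose mean first feature is nearest."""
    sums, counts = Counter(), Counter()
    for x, label in zip(X_train, y_train):
        sums[label] += x[0]
        counts[label] += 1
    centers = {label: sums[label] / counts[label] for label in counts}
    return lambda x: min(centers, key=lambda label: abs(x[0] - centers[label]))


# Made-up floor counts, alternating low-rise and tall buildings
floors = [3, 12, 5, 20, 2, 16, 4, 14, 6, 18, 3, 11, 5, 19, 2, 15, 4, 13, 6, 17]
X = [[f] for f in floors]
y = ["tall" if f >= 10 else "low" for f in floors]
acc_mean, acc_std, kappa_mean = cross_validate(fit_nearest_mean, X, y, k=5)
```

Kappa is reported alongside accuracy because a classifier that always predicts the majority class can reach a high accuracy on imbalanced data while its kappa stays near zero.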

Block clustering model

Aspects of block typology

The block forms in Taipei are varied, because the legal plan only specifies land-use zoning at the street-block level and there are rarely master plans for residential areas. Given the lack of prior knowledge, clustering is a good way to derive block typologies. Block typologies are usually the ensemble of building types (Hecht et al., 2015); therefore, the combination of building types serves as the basis of block typologies, which implies not only building styles but also construction periods and building functions. Moreover, block types should also inform about density, degrees of open spaces, and land uses, which are criteria related to urban sustainability. These aspects therefore serve as the criteria for the choice of block types.

Data preparation and clustering

To find street-block types in Taipei, we first aggregated the building-type results at the block level by calculating the area percentage of each building type in the blocks. Here, street-blocks with area >100,000 m² and built density <0.05 (Ma et al., 2021), such as rivers or parks, were filtered out. For the remaining effective blocks, the area ratios of each building type, the unbuilt area, and the block area were prepared as input data for clustering (Table S4). For the analysis of block types, we compared the clustering results from k-means and the Gaussian mixture model (GMM).
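The block-level aggregation can be sketched as below. The sketch rests on two assumptions that the text does not settle: area ratios are taken relative to the block area (so that the unbuilt share is the remainder), and a block is dropped only when both filter conditions hold. The function name, argument layout, and toy values are hypothetical.

```python
from collections import defaultdict


def block_features(buildings, block_areas, max_area=100_000.0, min_density=0.05):
    """Aggregate classified buildings into per-block clustering features.

    buildings: iterable of (block_id, building_type, footprint_area_m2)
    block_areas: dict mapping block_id to block area in m2
    """
    built = defaultdict(lambda: defaultdict(float))
    for block_id, building_type, area in buildings:
        built[block_id][building_type] += area

    features = {}
    for block_id, block_area in block_areas.items():
        type_areas = built.get(block_id, {})
        density = sum(type_areas.values()) / block_area
        # Assumed reading of the filter: very large AND nearly empty blocks are dropped
        if block_area > max_area and density < min_density:
            continue
        row = {t: a / block_area for t, a in type_areas.items()}
        row["unbuilt"] = 1.0 - density
        row["block_area"] = block_area
        features[block_id] = row
    return features


# Made-up example: one ordinary block and one very large, almost empty block
buildings = [
    ("b1", "apartment", 300.0),
    ("b1", "row building", 100.0),
    ("b2", "apartment", 50.0),
]
block_areas = {"b1": 1000.0, "b2": 200000.0}
features = block_features(buildings, block_areas)
```

Because the inputs are shares of the block area rather than averaged building attributes, a block that mixes several building types keeps one entry per type instead of being collapsed into a single mean value.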

K-means assigns observations to k clusters so that each cluster has maximum within-group homogeneity and maximum between-group heterogeneity. This method optimizes the result by minimizing the sum of squared Euclidean distances between points and their cluster centroids. It therefore favors compact, roughly spherical clusters, and the resulting partition of the feature space corresponds to a Voronoi diagram of the centroids. K-means clustering requires a user-defined number of clusters k. In this study, we used the silhouette coefficient ( $s$ ) to determine the value of k. The silhouette coefficient ranges from −1 to 1; a score close to 1 indicates that point $i$ is well matched to its own cluster, and a score close to −1 indicates that it is likely assigned to the wrong cluster. As a soft version of k-means, the Gaussian mixture model (GMM) assumes that each cluster corresponds to a multivariate Gaussian distribution. Unlike k-means, the model does not force each point into a single cluster; instead, it outputs the probabilities of the point belonging to each cluster. Like k-means, the Gaussian mixture model also needs a user-defined number of components k. In this study, the value of k is evaluated by using the Bayesian information criterion (BIC). A lower BIC score indicates a better balance between model likelihood and model complexity and therefore a better fit.
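The two selection criteria can be made concrete with a short sketch. It uses the standard definitions, $s(i) = (b(i) - a(i)) / \max(a(i), b(i))$ for the silhouette coefficient and $\mathrm{BIC} = p \ln n - 2 \ln \hat{L}$ for the Bayesian information criterion; the function names, the singleton convention, and the toy points are assumptions made for illustration.

```python
from math import dist, log


def silhouette_samples(points, labels):
    """Silhouette coefficient of every point; needs at least two clusters.

    a = mean distance to the other points of the own cluster
    b = smallest mean distance to the points of any other cluster
    """
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: score set to 0 by convention
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for q, l in zip(points, labels) if l == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower values are preferred."""
    return n_params * log(n_samples) - 2.0 * log_likelihood


# Made-up points forming two well-separated pairs
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_samples(points, [0, 0, 1, 1])
bad = silhouette_samples(points, [0, 1, 0, 1])
mean_good = sum(good) / len(good)
mean_bad = sum(bad) / len(bad)
```

To choose k, one would average the silhouette scores of a k-means solution for each candidate k and prefer the highest mean, while for GMM one would compute the BIC for each candidate k and look for the lowest value or for the point where its decrease levels off.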

Result

Accuracy evaluation

The accuracy scores were evaluated using the averaged results of 10-fold cross-validation (Table 2). For the given dataset, LightGBM model had the best performance with an overall accuracy of 0.81 on the test dataset and a kappa value of 0.78. The RF model was fast to train, but had the lowest test accuracy of 0.74. The training accuracy of the ANNs model increased from 0.77 to 0.83 by adding hidden layers, increasing batch sizes, and raising the number of nodes in each layer, but the testing scores only had a marginal increase by 0.3, which indicated that the model was prone to overfitting. The parameters used in this study were summarized in Table S5.

Table 2.

Accuracy rates of classification models on the Taipei dataset.

		LightGBM	XGBoost	RF	ANNs
Train overall accuracy	Mean	0.87	0.86	0.80	0.83
Train overall accuracy	Std.	0.0016	0.0013	0.0019	0.0031
Test overall accuracy	Mean	0.81	0.80	0.74	0.78
Test overall accuracy	Std.	0.015	0.017	0.013	0.011
Mean Kappa ( $κ$ )		0.78	0.77	0.70	0.74
Run time (sec)		57.99	105.84	42.96	119.12

The between-class accuracy is visualized in the confusion matrices in Figure S1. In comparison to RF and ANNs, our results showed that LightGBM and XGBoost had much higher accuracy and more coherent results. Most of the classes had accuracy scores higher than 0.8. RF and ANNs also had very high accuracy scores for classes such as single-family houses and apartments, but they performed worse on tall buildings. RF had very low accuracy rates for shophouses, industrial buildings, and courtyard buildings. Overall, gradient boosting models were able to identify more complicated buildings such as super-high towers, super-high linear apartments, and courtyard buildings.

Feature importance

This study used a large number of features. Therefore, we used permutation feature importance to examine the features’ relevance to the classification results. Permutation importance observes the decrease in model scores after randomly shuffling the features. A large decrease, or a higher permutation score, implies the feature is more important. Figure 2 shows the 20 most important features for each model. Among the 94 features, the number of floors above ground, length-width ratio of the minimum bounding rectangle (MBR), and land value are the most important features. Many of these features were extracted from the building license. In comparison, LightGBM and XGBoost depend on similar features such as shape index, width of MBR, and number of plots. All models except RF depend on urban features such as the ratio of building to block area, road id, and distances to the city center, while for ANNs, features from the building license such as construction types and structure types are more important.
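Permutation feature importance can be sketched in a few lines. The sketch is model-agnostic and is not the routine used in the study: it assumes accuracy as the score, averages the score drop over a fixed number of shuffles, and uses a hypothetical rule-based `toy_model` with made-up values in place of the trained classifiers.

```python
import random


def accuracy(model, X, y):
    """Share of rows for which the model predicts the reference label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling one feature column at a time."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [
                list(row[:col]) + [value] + list(row[col + 1:])
                for row, value in zip(X, shuffled)
            ]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical classifier that uses only the first feature (number of floors)."""
    return "tall" if row[0] >= 10 else "low"


# Made-up rows: [number of floors, an unrelated shape index]
X = [[3, 0.5], [12, 0.1], [5, 0.9], [20, 0.3], [2, 0.7], [16, 0.2]]
y = [toy_model(row) for row in X]
importances = permutation_importance(toy_model, X, y)
```

Shuffling the floor count breaks the link between that feature and the label, so accuracy drops, whereas shuffling the unused shape index leaves the score unchanged and its importance is exactly zero.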

Figure 2.

Feature importance (top 20) of building classification models on the Taipei dataset using (a) LightGBM, (b) XGBoost, (c) RF, and (d) ANNs.

Building types in Taipei

We used the LightGBM model to predict building types throughout the whole city. The result for all 61,426 buildings is shown in Figure 3. In Taipei, the most common building type is the apartment, which accounts for 29.8% of all buildings. It is followed by single-family houses (18%), row buildings (11%), curvilinear apartments (6%), and inner-city blocks (5%). Free-standing apartment buildings under 10 floors are especially concentrated in some neighborhoods in the southern center, but they can also be found throughout the city, especially on street corners. As the second most frequent type in Taipei, single-family houses usually appear in gated communities in the peripheral areas. Only a few of them are located inside the city center. Row buildings are among the most typical buildings in Taipei as a result of a series of housing projects in the 1960s and 70s. These buildings usually come in groups and are mostly found in the eastern part of the city. Curvilinear buildings are also a very common type as a result of the housing projects in the 1970s. Inner-city blocks are densely built street blocks that are mostly located near the riverbank in the west. This area was once the old city center and therefore has a much higher built density and fewer open spaces.

Figure 3.

Building types in Taipei classified by the LightGBM model.

Moreover, tall buildings are also dominant in the urban landscape of Taipei and account for around 13.5% of all buildings. Among them, 11.8% are high buildings between 10 and 15 floors while 1.7% are super-high towers above 15 floors. These tall buildings are mostly located along main streets in the central and eastern parts of Taipei. In general, traditional buildings such as inner-city blocks and shophouses are mostly located in the western part of Taipei. Other modernist or post-modernist buildings often appear in the eastern part of the city, including row buildings, apartments, and various tall buildings. On the urban periphery, there are groups of single-family houses surrounding the city area in the north, the east, and the south.

Clustering results

For block clustering, it is important to determine the optimal number of clusters. Figure S2 shows that the k-means model had the highest silhouette score with k = 7, closely followed by k = 13 and k = 10. For the GMM model, the change in the BIC score first levelled off at k = 5 and then slowed down again around k = 10. In order to compare both models, we chose to cluster the street-blocks into 10 groups.

Figure 4 summarizes the clustering results. The k-means model produced one block type of mainly inner-city blocks (Figure 4(a), group 1), two types of tall buildings (groups 4 and 5), two types of apartments (groups 6 and 7), two types of row buildings (groups 8 and 9), and three other less frequent block types (groups 2, 3, and 10). The GMM model produced some similar results, such as street-blocks of inner-city blocks (Figure 4(b), group 1) and public buildings (Figure 4(b), group 10). However, their building composition is more complex and the characteristic building types of each cluster are different. For instance, instead of having separate apartment blocks and row-building blocks, the GMM model assigned both types to the same group (Figure 4(b), group 8). Rather than grouping all tall buildings into one group, the GMM model separated them into three different clusters (groups 5, 6, and 9).

Figure 4.

Composition of the block types in Taipei as results of clustering from (a) the k-means model and (b) the GMM model. Both models had a value of k = 10. The results for other values of k are shown in Figure S3 in the supplementary material.

We further compared both models with box plots (Figure 5). Among them, group 2 of the k-means model has the highest built density and the least open space, which represents the narrow and densely built historic shopping streets (Figure 6). In contrast, groups 4 and 5 are street-blocks with modern tall buildings, which have the lowest density and more open spaces. Apartments and row buildings are both common residential constructions in Taipei, with the former having slightly lower built density and more open spaces. Overall, clusters from the k-means model are better differentiated than those from the GMM model, as each cluster has its own value ranges and fewer outliers. This result is in accordance with the building composition in Figure 4, in which clusters from the GMM model are more complex and therefore more heterogeneous within groups. Instead of finding the simpler and more “classic” block types, the GMM model was robust against street-blocks with mixed composition and therefore set the cluster centers on those kinds of blocks. Concerning the aspects related to land-use policies and sustainability, the k-means result for the given dataset and methods may be more suitable for urban planning practice.

Figure 5.

Density, degree of open space, and block area of the block types in Taipei. All numbers are scaled from 0 to 1.

Figure 6.

Visualization of the block types in Taipei using the k-means model (k = 10). Full extent of the visualization is shown in Figure S4 and Figure S5.

Discussion

This article compared four state-of-the-art machine-learning algorithms for automatic building classification. For the given dataset and methodology, after hyperparameter tuning, we found that gradient boosting models substantially increased the overall accuracy, and their performance greatly exceeded other conventional models such as RF and ANNs. In particular, our study found that LightGBM produced the highest overall accuracy in both training and testing datasets within a very short run time.

However, the accuracy rates of the same algorithms, such as RF and ANNs, were found to be approx. 20% lower in Taipei than in the study by Hecht (2014). Possible reasons are differences in building types and in the information on the topographic map. Building types in Taipei with higher accuracy are mostly similar to those in Western cities, for example, single-family houses, apartments, and row buildings, whereas East Asian-specific types such as shophouses and tall buildings tend to have lower accuracy. Many of these buildings have similar footprints but very different vertical dimensions and functions. Moreover, the models in this study relied on different features for building classification than Hecht’s study (2014) of European cities. In Taipei, building heights and more complicated geometric features such as length-width ratios of the MBR and shape indexes are more important in the decision-making. This implies that building types in Taipei vary considerably in the vertical dimension. Potential ways to improve the results include using more detailed datasets such as 3D-city models or using deep-learning algorithms for image segmentation.

At the block level, we used building-type ratios of the blocks as input features, which differs from prior studies (Fleischmann et al., 2021; Gil et al., 2012; Hermosilla et al., 2014; Ma et al., 2021) that used the average values of building height, building coverage, and building distance as input data. However, in East Asian mega-cities where street blocks are highly heterogeneous, averaging building features may produce misleading results. Building-type ratios keep information about the geometric and functional features of individual buildings, and they also efficiently reduced the number of input features for the clustering models and made the result easy to interpret. For the given dataset and methods, the results of block clustering showed that the k-means model accurately identified homogeneous blocks such as inner-city blocks, row-building blocks, and apartment blocks, whereas the GMM model focused more on the various mixed blocks and different tall buildings. For practical purposes, we chose the clustering result from the k-means model because its form features are more distinguishable.

In comparison to Western cities, Taipei has a high proportion of tall-building blocks. The city is primarily made up of apartments, row buildings, curvilinear apartments, and high-rise linear buildings, which differs from the courtyard buildings common in Western cities. The old city center in Taipei is made up of inner-city blocks and shophouses, and its boundary is diffuse and blends into the rest of the city. Because the buildings are well mixed, many street-blocks in Taipei are identified as mixtures of several building types. For example, row-building blocks (Figure 4(a), groups 8 and 9) usually contain some apartments at street corners. In fact, this is a very common phenomenon in contemporary mega-cities as a result of land-use zoning, where building-coverage ratios and floor-area ratios are regulated at the block level, which gives architects much freedom in deciding building sizes and volumes within a street-block. With the use of big data and machine-learning methods, we were able to shed light on this phenomenon in detail.

Conclusion

This paper discusses the seldom analyzed urban forms in East Asian mega-cities by using a two-level approach to automatically generate block typologies. At the building level, we used the topographic map, building licenses, streets, and cadastral maps to extract input features for the building classification. The result showed that even without state-of-the-art 3D-city models, this method reached an accuracy rate of 80% by using supplementary sources and gradient boosting models. At the block level, we used clustering methods based on building-type ratios to explore block typologies. The result showed that the street-blocks in Taipei may be categorized in different ways depending on the choice of clustering algorithm. We evaluated the clusters based on building types, density, open spaces, and block areas, which are the aspects related to land-use policies and sustainable urban development.

As most East Asian cities still lack sufficient urban-form data, this approach would be an efficient and low-cost alternative for analyzing urban landscapes. Especially for cities such as Taipei, which have fewer building regulations, the method can help to keep track of changes in urban development. The result also provides detailed information at the mega-city scale, which serves as a crucial spatial reference for sustainable development strategies. The block typologies, which imply construction periods, density, and functions, support a type-based strategic plan for urban issues such as urban regeneration, mitigation of urban heat island effects, and allocation of infrastructure. The two-level information makes it possible to localize potential urban planning interventions such as areas of re-densification, climate adaptation measures, or areas which require new zoning regulations. Although urban forms have regional and size differences (Hecht, 2014; Steiniger et al., 2008), this approach may be applied in other contexts with pre-collected training data. The sampling process may be time-intensive if no existing data is available. However, the amount of work can be reduced for future studies by building an open-source library that provides pretrained building-classification models and sample data, which will not only improve model accuracy and efficiency but also increase the applicability in other cities.

Supplemental Material

Supplemental Material for Developing a two-level machine-learning approach for classifying urban form for an East Asian mega-city by Chih-Yu Chen, Florian Koch and Christa Reicher in Environment and Planning B: Urban Analytics and City Science

Footnotes

Author’s note

We declare that this manuscript is original, has not been published before, and is not under consideration for another journal. As Corresponding Author, I confirm that the manuscript has been read and approved for submission by all authors.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by a PhD grant from the Ministry of Education, Taiwan.

Supplemental Material

Supplemental material for this article is available online.

Ms. Chih-Yu Chen is a PhD candidate at the Faculty of Architecture of RWTH Aachen University, Germany. She holds a scholarship from the Ministry of Education in Taiwan for her PhD studies. Ms. Chen obtained a master’s degree in urban planning from National Cheng Kung University, Taiwan, in 2019. She has presented at several international conferences and received best-paper awards, including at the Asian Conference on Environment-Behaviour Studies and the International Seminar on Urban Form. Her research interests include methods of GIS and computer science in urban form studies, urban formation in East Asia, and strategies of sustainable urban development.

Prof. Dr. Florian Koch received a PhD degree from the Department of Social Sciences of the Humboldt University of Berlin and the Faculty of Architecture of the Bauhaus University, Weimar, in 2008. He has been professor for Real Estate Development, Urban Development, Smart Cities at the University of Applied Sciences HTW Berlin since 2018. Prior to his current position, he was a researcher at the Helmholtz-Centre for Environmental Research, Leipzig, and professor for European Studies and Regional Development at the University of the North, Colombia. His research activities include smart cities, digitalization, and sustainable development.

Prof. Dipl.-Ing. Christa Reicher has been chair of Urban Design and professor for Urban Design and European Urbanism at the Faculty of Architecture of RWTH Aachen University since 2018. Prior to her current position, she was professor for Spatial Planning at the Technical University of Dortmund. She is a practicing urban planner, has published several books, and gives invited keynotes on urban development planning, urban renewal, and district development. Her current research activities include technical innovations and mobility, resources and energy in district development, and campus development in Aachen.

References

Abadi, Agarwal, Barham, et al. (2015) TensorFlow: large-scale machine learning on heterogeneous systems (version 2.5.0). Available at: https://www.tensorflow.org/

Araldi and Fusco (2019) From the street to the metropolitan region: pedestrian perspective in urban fabric analysis. Environment and Planning B: Urban Analytics and City Science 46(7): 1243–1263.

Arundel and Ronald (2017) The role of urban form in sustainability of community: the case of Amsterdam. Environment and Planning B: Urban Analytics and City Science 44(1): 33–53.

Bai, Juang J-Y and Kondoh (2011) Urban warming and urban heat islands in Taipei, Taiwan. In: Taniguchi (ed) Groundwater and Subsurface Environments. Tokyo: Springer Japan, 231–246.

Bard (2004) Quality assessment of cartographic generalisation. Transactions in GIS 8(1): 63–81.

Beck, Long, Boyd, et al. (2020) Automated classification metrics for energy modelling of residential buildings in the UK with open algorithms. Environment and Planning B: Urban Analytics and City Science 47(1): 45–64.

Beermann, Berchtold, Baumüller, et al. (2014) Städtebaulicher Rahmenplan Klimaanpassung für die Stadt Karlsruhe (Teil II). Karlsruhe: LUBW.

Belgiu, Tomljenovic, Lampoltshammer, et al. (2014) Ontology-based classification of building types detected from airborne laser scanning data. Remote Sensing 6(2): 1347–1366.

Beyhan, Güler and Tağa (2020) An algorithm for maximum inscribed circle based on Voronoi diagrams and geometrical properties. Journal of Geographical Systems 22(3): 391–418.

Burdett and Rode (eds) (2018) Shaping Cities in an Urban Age. London: Phaidon Press.

Candido, Blanco, Medina, et al. (2021) Improving the consistency of multi-temporal land cover mapping of Laguna lake watershed using light gradient boosting machine (LightGBM) approach, change detection analysis, and Markov chain. Remote Sensing Applications: Society and Environment 23: 100565.

Caniggia and Maffei (2001) Architectural Composition and Building Typology: Interpreting Basic Building. Firenze: Alinea.

Chen and Guestrin (2016) XGBoost: a scalable tree boosting system (version 1.5.0). Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY: ACM, 785–794.

Chen T-L, Chiu H-W and Lin Y-F (2020) How do east and southeast Asian cities differ from western cities? A systematic review of the urban form characteristics. Sustainability 12(6): 2423.

Chen and Biljecki (2021) Classification of urban morphology with deep learning: application on urban vitality. Computers, Environment and Urban Systems 90: 101706.

Chollet and others (2015) Keras (version 2.5.0). Available at: https://keras.io

Clifton, Ewing, Knaap G-J, et al. (2008) Quantitative analysis of urban form: a multidisciplinary review. Journal of Urbanism: International Research on Placemaking and Urban Sustainability 1(1): 17–45.

Conzen (2004) Thinking about Urban Form: Papers on Urban Morphology, 1932–1988. Oxford: Peter Lang.

ESCAP (2013) Urbanization trends in Asia and the Pacific. Available at: https://www.unescap.org/resources/urbanization-trends-asia-and-pacific# (accessed 13 April 2021).

Fleischmann, Feliciotti, Romice, et al. (2021) Methodological foundation of a numerical taxonomy of urban form. Environment and Planning B: Urban Analytics and City Science 49: 1283–1299. DOI: 10.1177/23998083211059835.

Francisco, Gisbert, Cantarino Martí, et al. (2017) Clustering cities through urban metrics analysis. Journal of Urban Design 22(5): 689–708.

Gil, Beirão, Montenegro, et al. (2012) On the discovery of urban typologies: data mining the many dimensions of urban form. Urban Morphology 16(1): 27–40.

Google Maps (2019) “Street View,” Digital Images, Google Maps: Photograph of Taipei, Taiwan. http://maps.google.com

Hackenbruch (2018) Anpassungsrelevante Klimaänderungen für Städtische Baustrukturen und Wohnquartiere. Karlsruhe: KIT Scientific Publishing.

Hamnett and Forbes (2011) Planning Asian Cities: Risks and Resilience. London and New York, NY: Routledge.

Hecht (2014) Automatische Klassifizierung von Gebäudegrundrissen: Ein Beitrag zur kleinräumigen Beschreibung der Siedlungsstruktur. Berlin: Rhombos-Verl. Zugl.: Dresden, Techn. Univ., Diss., 2013.

Hecht, Meinel and Buchroithner (2015) Automatic identification of building types based on topographic databases – a comparison of different data sources. International Journal of Cartography 1(1): 18–31.

Henn, Römer, Gröger, et al. (2012) Automatic classification of building types in 3D city models. GeoInformatica 16(2): 281–306.

Hermand and Quesada (2019) Rethinking the impact of urban form in sustainable urban planning policy. European Journal of Sustainable Development 8(2): 325.

Hermosilla, Palomar-Vázquez, Balaguer-Beser, et al. (2014) Using street based metrics to characterize urban typologies. Computers, Environment and Urban Systems 44: 68–79.

Horner (2007) A multi-scale analysis of urban form and commuting change in a small metropolitan area (1990–2000). The Annals of Regional Science 41(2): 315–332.

Huang, Lu and Sellers (2007) A global comparative analysis of urban form: applying spatial metrics and remote sensing. Landscape and Urban Planning 82(4): 184–197.

Jain (2010) Data clustering: 50 years beyond K-means. Pattern Recognition Letters 31(8): 651–666.

Ke, Meng, Finley, et al. (2017) LightGBM: a highly efficient gradient boosting decision tree (version 3.1.1). Advances in Neural Information Processing Systems 30: 3146–3154.

Levy (1999) Urban morphology and the problem of the modern urban fabric: some questions for research. Urban Morphology 3(2): 79–85.

Quan (2023) Identifying urban form typologies in Seoul using a new Gaussian mixture model-based clustering framework. Environment and Planning B: Urban Analytics and City Science 0(0): 1–17. Available at: https://doi.org/10.1177/23998083231151688

Li, Song and Kaza (2018) Urban form and household electricity consumption: a multilevel study. Energy and Buildings 158: 181–193.

Chen (2021) An elastic urban morpho-blocks (EUM) modeling method for urban building morphological analysis and feature clustering. Building and Environment 192: 107646.

Maceachren (1985) Compactness of geographic shape: comparison and evaluation of measures. Geografiska Annaler - Series B: Human Geography 67(1): 53–67.

McCarty, Kim and Lee (2020) Evaluation of light gradient boosted machine learning technique in large scale land use and land cover classification. Environments 7(10): 84.

Meinel, Hecht, Herold, et al. (2008) Automatische Ableitung von stadtstrukturellen Grundlagendaten und Integration in Einem Geographischen Informationssystem. Bonn: Bundesamt für Bauwesen und Raumordnung.

National Land Surveying and Mapping Center (2016) 1:2500 Topographic Map of Taipei. Taiwan: National Land Surveying and Mapping Center.

Orford and Radcliffe (2007) Modelling UK residential dwelling types using OS Mastermap data: a comparison to the 2001 census. Computers, Environment and Urban Systems 31(2): 206–227.

Pauleit (2016) Welche Beziehungen bestehen zwischen der räumlichen Stadtstruktur und den ökologischen Eigenschaften der Stadt? In: Breuste, Pauleit, Haase, et al. (eds) Stadtökosysteme: Funktion, Management und Entwicklung. Berlin, Heidelberg: Springer Berlin Heidelberg, 31–60.

Pedregosa, Varoquaux, Gramfort, et al. (2011) Scikit-learn: machine learning in Python (version 0.24.2). Journal of Machine Learning Research 12: 2825–2830.

Ratti, Raydan and Steemers (2003) Building form and environmental performance: archetypes, analysis and an arid climate. Energy and Buildings 35(1): 49–59.

Reicher (2017) Städtebauliches Entwerfen. Wiesbaden: Springer Fachmedien Wiesbaden.

Salat (2009) Energy loads, CO2 emissions and building stocks: morphologies, typologies, energy systems and behaviour. Building Research & Information 37(5–6): 598–609.

Samuelsson, Colding and Barthel (2019) Urban resilience at eye level: spatial analysis of empirically defined experiential landscapes. Landscape and Urban Planning 187: 70–80.

Schirmer and Axhausen (2013) A multiscale classification of urban morphology. Journal of Transport and Land Use 9: 101–130. DOI: 10.5198/jtlu.2015.667

Schneider, Mertes, Tatem, et al. (2015) A new urban landscape in East–Southeast Asia, 2000–2010. Environmental Research Letters 10(3): 34002.

Schumm (1956) Evolution of drainage systems and slopes in badland at Perth Amboy, New Jersey. Geological Society of America Bulletin 67(5): 597–646.

Song and Knaap G-J (2007) Quantitative classification of neighbourhoods: the neighbourhoods of new single-family homes in the Portland metropolitan area. Journal of Urban Design 12(1): 1–24.

Steadman, Hamilton and Evans (2014) Energy and urban built form: an empirical and statistical approach. Building Research & Information 42(1): 17–31.

Steiniger, Lange, Burghardt, et al. (2008) An approach for the classification of urban building structures based on discriminant analysis techniques. Transactions in GIS 12(1): 31–59.

Su, Chen and Liao (2021) The impact of land use change on disaster risk from the perspective of efficiency. Sustainability 13(6): 3151.

Taipei City Construction Management Office (2019) Summary of building usage licenses. Available at: https://data.taipei/dataset/detail?id=c876ff02-af2e-4eb8-bd33-d444f5052733 (accessed 03 November 2020).

Vanderhaegen and Canters (2017) Mapping urban form and function at city block level using spatial metrics. Landscape and Urban Planning 167: 399–409.

Whitehand JWR (2001) British urban morphology: the Conzenian tradition. Urban Morphology 5(2): 103–109.

Whitehand JWR (2011) Issues in urban morphology. Urban Morphology 16: 55–65.

Wolff and Haase (2018) Compact or spread? A quantitative spatial model of urban areas in Europe since 1990. PLoS One 13(2): e0192326.

Yan, Ai, Yang, et al. (2019) A graph convolutional neural network for classification of building patterns using spatial vector data. ISPRS Journal of Photogrammetry and Remote Sensing 150: 259–273.

Ye and van Nes (2014) Quantitative tools in urban morphology: combining space syntax, spacematrix and mixed-use index in a GIS framework. Urban Morphology 18(2): 97–118.

Yosef (2006) Sustainable urban forms: their typologies, models, and concepts. Journal of Planning Education and Research 26(1): 38–52.
