Categorizing urban space based on visitor density and diversity: A view through social media data

Abstract

Analyses of urban spaces have often stressed the importance of both the density and diversity of the people they attract. However, the diversity of people is a challenging concept to operationalize within the context of urban spaces, which is why many evaluations of urban space have relied primarily on density-based measures. We argue that a focus on only one of the two aspects misses important aspects of the variety of urban spaces in our cities. To address this, we design a methodology that evaluates both the density and diversity of human behavior in urban spaces based on geosocial media data. We operationalize density as the frequency of tweets from visitors to a particular location and diversity as the variety of the home neighborhoods of those visitors. Taking Singapore as a test case, we identify networks between the home neighborhoods of 28k Twitter users based on 2.2 million geolocated tweets collected between 2012 and 2016. Based on these data, we categorize the urban landscape of Singapore into four “performance” categories, namely High-Density/High-Diversity, High-Density/Low-Diversity, Low-Density/High-Diversity, and Low-Density/Low-Diversity. Our findings illustrate that this combined indicator provides useful nuance compared to differentiation between well and less performing spaces based on density alone. By enabling a categorization of urban spaces that fits closer to the diversity of human behavior in these spaces, human mobility data sets, such as the social media data we use, open the door to a practical evaluation of the design and planning of our heterogeneous urban environment.

Keywords

Urban spaces categorization, human behavior, density-diversity, visitors, social media data

Introduction

Density and diversity are key aspects of the urban environment, each with their own characteristics. Density of people and the built environment is often considered as a prerequisite for a vital and sustainable city (Jacobs, 1961: 205), whereas diversity, too, plays a key role in fostering a dynamic and active city life (Montgomery, 1998). However, rapid urban growth—as we have seen over the last decades—can also be at odds with the ostensible heterogeneity of a vital city. Modern cities can run the risk of providing a relatively dense yet segregated or even homogeneous urban landscape (Mumford, 1961; Trancik, 1986). On that account, urban thinkers and designers as far back as the 1960s have advocated for the role of diversity in reviving or safeguarding this essential quality of urban life (Cullen, 1961; Jacobs, 1961).

However, actually measuring or empirically identifying diversity, whether it relates to the physical characteristics or the social dimensions of urban spaces, is a formidable challenge. Venturi et al. (1997) pointed out that the design of our built environments relies mostly on “decorative shades” of standard boxes. Building facades may appear in a mix of shapes and sizes, but their physical form does not necessarily reflect their uses and programs. Therefore, the physical diversity of the urban landscape is often approached in conjunction with the variety of services and activities that the built environment offers (Carmona, 2001, 2019). Similarly, social diversity can be challenging to measure directly but can, for example, be proxied by the diversity of uses in an urban environment (Jacobs, 1961; Talen, 2012). This is why studies of urban diversity have adopted neighborhood-level land use (Fuentes et al., 2022; Talen, 2005), housing typology mix (Talen & Lee, 2018), and policy controls (Kleinhans, 2004) as indicators of social diversity in urban areas. More recent studies have also started relying on digital technologies and big data resources to analyze, for example, the diversity of building footprints (Scepanovic et al., 2021) and changes in neighborhood characteristics from street view imagery (Naik et al., 2017) to evaluate social diversity at large.

In this research, instead of quantifying diversity from the physical environment or “use” perspective, we focus on the “user” dimension, analyzing spaces from the perspective of different visitor groups. This perspective of the visitor, which constitutes one critical aspect of social diversity, can help capture meaningful insights around diversity and the formation of social capital at both neighborhood and urban levels (Glaeser et al., 2002). Thus, we explore more directly how this visitor-based approach can complement our understanding of the different characteristics of urban spaces. Diversity in our approach is the co-location of people from various neighborhoods. Therefore, instead of viewing urban spaces through the functional lens of housing, work, recreation, and transportation (Cullen, 1961; Mumford, 1961; Živković, 2019) that construct the physical form of the city, we analyze the social dimension of urban spaces through the daily mobility of people.

During the last two decades, geolocated social media have become a prolific data source for measuring different aspects of urban spaces (Martí et al., 2019). This type of data allows us to conduct analysis at a more granular scale than was previously possible with traditional, on-site data collection methods. Moreover, the detailed nature of these datasets and the ability to connect data over longer time periods create opportunities to measure more intangible aspects of urban spaces, such as the diversity of participants in urban spaces, as we do in this article.

However, research utilizing geosocial media data often relies on metrics derived from the frequency of posts, essentially measuring density, as a proxy for the activity or “vitality” of a specific geographic location (Chen et al., 2019; Kim, 2018). Although a certain density of people might be an important contributor to an active social environment, there does not need to be a direct, linear correlation with diversity (Talen, 2012). Diversity differs from density in the urban design context because diversity might contribute to the potential for economic and social exchange and, ultimately, the vitality of a place (Greenberg, 1995). Furthermore, diversity in urban environments also links to social equity as it might engender access to the city for all social groups and generate opportunities for mixing and increased understanding between population groups (Talen, 2012). From this point of view, we argue studies of urban vitality can be significantly enhanced by not only using density-based metrics but also including a specific diversity dimension.

Here, we leverage geosocial media, Twitter specifically, to quantify this relatively intangible aspect of urban spaces. We first derive home locations from geotagged Twitter data and use this as a basis to construct networks of flows between urban locations. Then we infer diversity based on the variety of origins that visitors of a certain location have and analyze density according to the frequency of visitors within that certain location. Subsequently, we combine the two (i.e., diversity and density) metrics into a 2-dimensional categorization of the activity-level in urban spaces (see Fig. S1 in the Supplementary Material). In this way, we construct a bottom-up activity-based classification of urban spaces, rather than one that is based on land-use or policy-prescribed urban functions.

To illustrate this potential, we select Singapore as a case study and combine the inferred 2-dimensional categorization of each location’s activity level with land-use data to contextualize our findings. As such, the research sketches out the overall socio-spatial landscape of Singapore, with patterns of “vitality” (high density and high diversity) and “low profile” (low density and low diversity) places clearly identified throughout the city. Beyond that, and most interestingly, our approach also identifies locations that might easily be overlooked if diversity were not considered (i.e., low density and high diversity). For instance, neighborhood parks that are loved by local residents can attract diverse visiting groups but do not draw the same crowds that popular tourist destinations do. Although these locations are valued urban amenities in spatial planning frameworks, the extent to which specific places contribute to this type of visitor-based diversity might not always be known or fully appreciated. As such, diversity metrics based on social media can offer an additional perspective on the study of urban spaces that helps urban planners and designers better understand and appreciate the variety of urban spaces in the city.

Related work

In urban studies, the concept of diversity has been used in a wide variety of contexts and has taken on many different definitions, ranging from a mixture of physical characteristics (Jacobs and Appleyard, 1987; Montgomery, 1998) and urban functions (Pendall, 2000) to social activities (Montgomery, 1998). Moreover, social diversity is seen as a normative goal in urban design and planning in order to promote social cohesiveness (Talen, 2012), to “bridge” social capital (Putnam, 2000), to capitalize on “collective efficacy” (Sampson, 2019), and to enhance urban vitality (Kang et al., 2021).

It is, therefore, no surprise that urban designers are constantly searching for ways to connect diversity with the design of the built environment. For example, various comprehensive models attempt to evaluate and measure urban spaces from these aspects. The Star-Model proposed by Varna (2014) evaluates the attributes of ownership, control, and physical configuration. The Public Space Index (PSI) model empirically assesses inclusiveness, meaningfulness, safety, and comfort through 43 variables that range from the variety of activities to the articulation of architectural features in surrounding buildings (Mehta, 2014). The Tool for Urban Space Analysis (TUSA) evaluates all aspects of urban spaces in the categories of Software (social dimension), Hardware (physical and spatial value), and Orgware (operation and management aspects) (Cho et al., 2016). These models all include the diversity of physical configuration and activity in a space, based on visual observation. However, only the PSI model embeds a measure of social diversity, which relies on observations to infer people’s age, gender, and ethnicity but does not cover other aspects of social diversity (e.g., income and education).

Consequently, in empirical research on urban design and spaces, social diversity itself is often implied rather than directly observed or calculated. This is in contrast to analyses of social diversity in urban studies, which often use metrics derived from census or survey data at the neighborhood level (Altman, 2018; Smith, 1998), as well as interviews to collect detailed demographic information. However, the use of such data in analyses of small-scale urban spaces can be both resource-intensive and time-consuming and still only allows for a relatively coarse spatio-temporal understanding of social diversity.

Recently, new digital data sources on human mobility patterns have quickly become a valuable mainstay in urban research, as people’s spatial activities can be derived from GPS logs (Kong et al., 2018), mobile phone data (Li et al., 2019; Zhao et al., 2019), as well as geosocial media platforms (Psyllidis et al., 2018; Song et al., 2020). Geosocial media data have enabled new and creative approaches to studying the urban landscape due to their relatively accessible nature and detailed spatio-temporal resolution. Such data have gained significant popularity as a way to investigate the spatial, social, and temporal characteristics of urban spaces (Li et al., 2019; Martí et al., 2019; Shaw and Sui, 2020). In addition, researchers have used these data to infer socio-geographical relations between people and places and have used these relations to shine light on a number of urban issues such as individual activity spaces (Ayala-Azcárraga et al., 2019; Jurdak et al., 2015), social interaction (Prasetyo et al., 2016), gentrification (Poorthuis et al., 2021), and social inequality (Pendall and Hedman, 2016; Shelton et al., 2015).

Although geosocial media presents outstanding potential, much of this research is based on an analysis of the “density” of social media activity and does not necessarily take into account the characteristics of who is creating this activity. As such, the diversity of urban space is understudied in this regard, with some notable exceptions. For example, Hristova et al. (2016) explore the potential of geosocial networks (Twitter and Foursquare) to capture the social diversity of urban locations through the mobility patterns of their visitors. Similarly, Wu et al. (2019) use mobile phone application data to study people’s exposure to racial diversity by considering their entire activity space, while Dong et al. (2020) and Moro et al. (2021) use mobile phone and Twitter data to analyze (income) segregation. These studies rely on a proxy for the socio-economic characteristics of individual users in mobility datasets. Often little is known about each user in this regard, but a home location can be inferred from each mobility pattern. This home location can be tied to a specific census tract, whose socio-economic characteristics can be used as an imperfect proxy for the characteristics of people from that tract. In this paper, we use a similar approach, taking a person’s home location as a proxy for the diversity of their socio-economic characteristics, but instead of taking residents as our ultimate unit of analysis, we apply this technique to analyze the density and diversity of small-scale urban spaces.

Although geotagged Twitter data have proven to offer valuable affordances for urban research, it is worth noting that such data have clear limitations as well. These limitations range from the more technical, such as spatial accuracy and precision (e.g., Mittal et al., 2019), to the representational (uneven data “shadows,” potentially biased representation, and an overall lack of socio-demographic information; Longley et al., 2015) to the ethical (Boyd and Crawford, 2012; Zook, 2017). Different efforts have been made to partially address these challenges, for example, through aggregating mobility patterns or inferring larger patterns based on statistical models (e.g., Alexander et al., 2015; Phithakkitnukoon et al., 2010; Ye et al., 2009), through contextualizing data with nearby land use information and points of interest (POIs) (Horozov et al., 2006), and through specifically evaluating the bias in these types of data (e.g., Longley et al., 2015; Luo et al., 2016). It should be noted that in most applications of Twitter data any such efforts can only partially address these limitations, for example, because a “ground truth” is simply not known. In this regard, such data are not a replacement for censuses and surveys based on a representative sampling strategy.

In this research, we address this challenge by using a dataset provided by Chen and Poorthuis (2021) as our foundation. It is based on all Twitter users posting from Singapore between 2012 and 2016 and carefully filters out both sporadic users (because not enough data points exist to reliably infer meaningful locations) as well as “power” users (as these are likely bots and other automatic accounts). For the remaining users, a home location is inferred using both a set of derived variables that weigh spatio-temporal “home-like” behavior as well as a set of stringent filters designed to prevent false positives (see Table S1). Using this strategy, we filter out 93% of all users but the remaining 7% (∼28k users) follow the distribution of the population quite closely. Chen and Poorthuis (2021) report a precision of 77% based on a manual verification and a correlation coefficient between the number of detected users and actual residents per planning zone of 0.96. The results in this paper are further contextualized through and verified by the local knowledge of the authors. We will continue to discuss these issues in the following section and the conclusion.

Research methodology

As people move around the city, they create a web, or network, of connections between the different places they visit. In this work, we infer this network of connections from Twitter data. When users opt in to location sharing, tweets contain explicit coordinates and timestamps that reflect the spatio-temporal path of their creator. The more often a person tweets from a place, the more likely that place is meaningful to this person and thus the stronger the connection to that location (Hossain et al., 2016; McGee et al., 2013). We use these networks to ultimately understand the specific make-up of the visitors to a particular place and, by extension, the social diversity of a location. As such, we are able to operationalize both the density (i.e., frequency of visits) and the diversity of visitors (i.e., where individuals are visiting from).

To illustrate our approach, we begin by evaluating the visitor density, followed by analyzing diversity based on the variety of home locations of these visitors. We then combine these two concepts into a 2x2 matrix (see Figure S1) that allows for the categorization of activity levels in urban spaces. We take Singapore as the case study and use an anonymized location-based social media (LBS) dataset that contains approximately 22 million tweets sent by around 405.5k users from July 2012 to June 2016 (Chen and Poorthuis, 2021) for the analyses. It is important to note that the precision of the geographical attributes of social media varies with both user settings (automatic or self-entered) and the accuracy of the smartphone devices used (Shelton, 2017). To circumvent this issue, we aggregate data points to hexagonal grid cells with a 300-meter resolution. In doing so, our analysis includes a total of 10,910 grid cells, 71% of which contain tweeting activity.
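The aggregation step can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: it assumes tweet coordinates have already been projected to metres, uses pointy-top hexagons indexed by axial coordinates, and treats the hexagon circumradius (`CELL_SIZE_M`) as a hypothetical parameter, since the text does not state which hexagon dimension the 300-meter resolution refers to.

```python
import math
from collections import Counter

# Assumption: circumradius of 150 m, i.e. 300 m from vertex to opposite vertex.
CELL_SIZE_M = 150.0


def hex_cell(x, y, size=CELL_SIZE_M):
    """Return the axial (q, r) index of the pointy-top hexagon containing (x, y)."""
    # Fractional axial coordinates of the point.
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Cube rounding: round all three cube coordinates, then repair the one
    # with the largest rounding error so that they still sum to zero.
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz


def hex_center(q, r, size=CELL_SIZE_M):
    """Return the (x, y) centre of the hexagon with axial index (q, r)."""
    return size * math.sqrt(3) * (q + r / 2), size * 1.5 * r


def count_by_cell(points, size=CELL_SIZE_M):
    """Count how many (x, y) points fall into each hexagonal cell."""
    return Counter(hex_cell(x, y, size) for x, y in points)
```

Counting distinct users per cell instead of tweets would only require de-duplicating (user, cell) pairs before counting.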

Density analysis

We define density based on the frequency of visitor tweets that can be geographically linked to a location. However, simply aggregating data points can result in the mere mapping of a tweet population distribution without yielding any meaningful insights (Poorthuis et al., 2014). For instance, in the Singapore context, this will only identify the “global” (i.e., island-wide) high-density places, like well-known tourist spots (e.g., Marina Bay Area, Chinatown, and Orchard Road), but easily dismiss other significant neighborhood spaces with much less footfall relative to these city-wide hotspots. This can yield a skewed understanding of the overall social landscape of a city. Therefore, we instead opt to calculate a “local” version of a density metric that expresses the density in a location relative to the density in its vicinity. Specifically, we use a 1km buffer, which approximates a community with a 15-minute pedestrian shed, and compare the density value of each grid cell to that of the neighboring area within 1km. The local density (D) of a grid cell is therefore calculated as follows:

D_{i} = \frac{N_{i}}{\sum_{j = 1}^{k} N_{j}}

(1)

where N_{i} refers to the number of visitors observed in the i-th grid cell and k is the total number of grid cells within a 1km buffer of the i-th grid cell, not including the i-th grid cell itself.
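As a minimal sketch of Equation (1), the function below computes the local density for a small set of cells. It assumes that a cell counts as a neighbor when its centroid lies within 1 km of the focal cell's centroid and that a cell without any neighboring visitors receives an undefined (NaN) value; both choices, and all names used, are illustrative assumptions rather than details given in the text.

```python
import math


def local_density(visitors, centers, radius_m=1000.0):
    """Compute D_i = N_i / sum_j N_j over neighbours j within radius_m of cell i.

    visitors: dict mapping cell id -> number of visitors N_i
    centers:  dict mapping cell id -> (x, y) centroid in metres
    """
    result = {}
    for i, (xi, yi) in centers.items():
        # Sum visitors over all other cells within the buffer (cell i excluded).
        neighbour_total = sum(
            visitors.get(j, 0)
            for j, (xj, yj) in centers.items()
            if j != i and math.hypot(xi - xj, yi - yj) <= radius_m
        )
        if neighbour_total > 0:
            result[i] = visitors.get(i, 0) / neighbour_total
        else:
            result[i] = float("nan")  # assumption: undefined without neighbours
    return result


centers = {"a": (0, 0), "b": (300, 0), "c": (600, 0), "far": (5000, 0)}
visitors = {"a": 10, "b": 30, "c": 10, "far": 7}
density = local_density(visitors, centers)
print(density["a"], density["b"])  # 0.25 1.5
```

In this toy example, cell "b" has three times as many visitors as each of its neighbors, so its local density exceeds 1 even though its absolute count is small by city-wide standards.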

To ensure grid cells have sufficient observations for the analyses, all grid cells with fewer than 100 tweets and fewer than 5 different visitors during the entire study period are eliminated. As such, we retain a total of 4588 (60%) grid cells for subsequent analysis.

Diversity analysis

We define diversity as the variety of home locations among the visitors to each urban space (i.e., each 300m grid cell). To operationalize diversity, we first infer the home locations of visitors using the “homelocator” package developed by Chen and Poorthuis (2021). The embedded HMLC algorithm weights data points sent by a user across multiple time frames and extracts the location with the highest accumulated weight as the user’s home location. The detailed parameters used in the algorithm are listed in Table S1 in the Supplementary Material. In doing so, we are able to identify the home locations of 27,892 (7%) users (see Fig. S3(a)). As a validity test of this approach, we compare the spatial distribution of inferred home locations with the locations of Singapore’s public housing, which houses 80% of Singaporeans, derived from Singapore Land Authority data. The home location pattern correlates strongly with the public housing clusters, indicating that the identified home locations can sensibly be used as a proxy for the subsequent diversity analysis (see Figure S3(b)).

Next, we apply the concept of biological diversity from ecology by adopting the Shannon diversity index (H)¹. It evaluates both the richness and evenness of the distribution of all species within an ecological environment. This concept resonates with the social variables used to analyze the composition of diversity in visitor groups at the neighborhood level (Talen, 2012). Borrowing the concept of “species” from ecology, we construct dynamic buffers for each grid cell based on different radius distances and directions (citation omitted for blind review). Specifically, we construct buffers with 1km, 3km, 5km, and 10km radii around each grid cell and further divide these circular buffers by the four intercardinal directions (i.e., Northeast, Southeast, Northwest, and Southwest), yielding a total of 24 different zones (see Fig. S4). These dynamic zones reflect the different distances as well as the related purposes and motivations that underlie people’s travel behavior (Spinney and Millward, 2013). For example, people visiting from within the 1km buffer are more likely to be local residents who use the space on a daily basis. As the distance increases, visitors are likely to visit for different reasons and with different frequencies, and are thus construed as a different “species” in the diversity calculation. In doing so, we are able to compute a single quantitative metric that represents the different groups of visitors arriving from different geographical zones. The Shannon index (H) is calculated as follows:

H = - \sum_{i = 1}^{S} p_{i} \ln p_{i}

(2)

where p_{i} is the proportion of individuals that are allied to species i and S refers to the number of species (i.e., number of zones in this case).
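To make Equation (2) concrete, the short sketch below computes the Shannon index from a list of per-zone visitor counts. The function name and the convention of skipping empty zones (treating 0 ln 0 as 0) are our own illustrative choices.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over zones with visitors."""
    total = sum(counts)
    if total <= 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:  # empty zones contribute nothing (0 * ln 0 is taken as 0)
            p = c / total
            h -= p * math.log(p)
    return h


# All visitors from one zone: no diversity.
single_zone = shannon_index([10, 0, 0, 0])
# Visitors spread evenly over four zones: maximum diversity for four zones, ln(4).
even_zones = shannon_index([5, 5, 5, 5])
```

The index is zero when every visitor comes from the same zone and reaches its maximum, ln S, when visitors are spread evenly across all S zones.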

However, it is important to note that the size of the zones increases at larger distances, which may result in a bias of the diversity findings. To alleviate the issue, we use visitor density instead of actual counts of visitors by normalizing by the zone area.
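The effect of this normalization can be illustrated with a small sketch. It assumes zones are quarter-annuli defined by the ring radii mentioned above and by the sign of the east-west and north-south offsets of a visitor's home from the grid cell; the exact zone layout is given in Fig. S4, so the ring edges here are a parameter, and visitors living beyond the outermost ring are simply skipped. All function names are hypothetical.

```python
import math
from collections import Counter

# Outer ring radii in km, taken from the text; passed as a parameter so that
# a different zone layout can be substituted.
RING_EDGES_KM = (1.0, 3.0, 5.0, 10.0)


def zone_of(dx_km, dy_km, ring_edges=RING_EDGES_KM):
    """Return (ring index, quadrant) for a home offset, or None if out of range."""
    dist = math.hypot(dx_km, dy_km)
    for ring, outer in enumerate(ring_edges):
        if dist <= outer:
            quadrant = ("N" if dy_km >= 0 else "S") + ("E" if dx_km >= 0 else "W")
            return ring, quadrant
    return None  # assumption: homes beyond the last ring are ignored


def zone_area_km2(ring, ring_edges=RING_EDGES_KM):
    """Area of one quadrant of the annulus between consecutive ring radii."""
    inner = ring_edges[ring - 1] if ring > 0 else 0.0
    outer = ring_edges[ring]
    return math.pi * (outer ** 2 - inner ** 2) / 4.0


def area_normalised_shannon(home_offsets_km, ring_edges=RING_EDGES_KM):
    """Shannon index computed on visitors per km^2 of zone rather than raw counts."""
    counts = Counter()
    for dx, dy in home_offsets_km:
        zone = zone_of(dx, dy, ring_edges)
        if zone is not None:
            counts[zone] += 1
    densities = [n / zone_area_km2(ring, ring_edges) for (ring, _), n in counts.items()]
    total = sum(densities)
    if total == 0:
        return 0.0
    return -sum((d / total) * math.log(d / total) for d in densities)


# One visitor from the innermost NE zone and eight from the next NE ring,
# which is eight times larger in area: equal densities, so H = ln(2).
offsets = [(0.5, 0.5)] + [(2.0, 0.5)] * 8
h = area_normalised_shannon(offsets)
```

With raw counts, the eight visitors from the larger zone would dominate and lower the index; after dividing by zone area, both zones contribute equally.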

Spatial classification

The concepts of “density” and “diversity” are then combined to form a 2x2 categorization matrix for classifying the activity level in an urban space. Due to the different calculation methods described above, the distribution of density values is highly skewed (skewness = 34.21), ranging from 0.00074 to 5.07246, while diversity values range from 0 to 2.5. To put the density and diversity metrics on the same scale, we first apply a log transformation to the density values and then rescale the diversity values to the minimum and maximum of the logged density. The scaled density and diversity are used as the x-axis and y-axis of the matrix, respectively (see Figure S2). Subsequently, locations with both density and diversity values above the median (50% benchmark) are classified as High-Density and High-Diversity (H-density/H-diversity). These places are the high “vitality” locations in the city that attract many visitors from different parts of the city. The same logic applies to the remaining three categories, namely High-Density and Low-Diversity (H-density/L-diversity), Low-Density and High-Diversity (L-density/H-diversity), and Low-Density and Low-Diversity (L-density/L-diversity), each reflecting a different combination of density and diversity levels.
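The classification can be sketched as below. This is an illustrative reading of the procedure, assuming a natural logarithm (the base is not stated), strictly positive density values, and "High" meaning strictly above the median; the function names are our own.

```python
import math
from statistics import median


def rescale(values, new_min, new_max):
    """Linearly rescale values to the interval [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


def classify(density, diversity):
    """Assign each location to one of the four density/diversity categories."""
    log_d = [math.log(d) for d in density]  # assumption: natural log, d > 0
    scaled_h = rescale(diversity, min(log_d), max(log_d))
    d_med, h_med = median(log_d), median(scaled_h)
    labels = []
    for d, h in zip(log_d, scaled_h):
        d_label = "H" if d > d_med else "L"
        h_label = "H" if h > h_med else "L"
        labels.append(f"{d_label}-density/{h_label}-diversity")
    return labels


labels = classify(
    density=[0.01, 0.1, 1.0, 5.0],
    diversity=[0.2, 2.0, 0.5, 2.4],
)
```

Because both the log transformation and the linear rescaling preserve rank order, the median split itself is unaffected by them; the transformations matter mainly for plotting the two metrics on comparable axes.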

Furthermore, within each category, we identify the top 30 places, representing examples of an “ideal” type in that category. In the empirical analysis, we visualize these geographical locations and contextualize them so that they can be used as an informed indicator for understanding the socio-spatial characteristics of Singapore’s urban spaces.

Results and discussion

In this section, we first discuss the results of our analysis of density and diversity in Singapore separately, before combining the two measures in the categorization of urban spaces. As discussed previously, density is a localized measure, relative to a 1km buffer around each area. Figure 1(a) shows the overall spatial distribution of density. To contextualize the resulting patterns, we zoom in from the city scale to the neighborhood scale. Specifically, we take the neighborhood of Tampines as an example (see Figure 1(b)), which is one of the four regional centers in Singapore and houses an array of facilities and urban programs to support residents’ needs. The places with the highest local density in Tampines are the transportation interchange, the IKEA shopping complex, the ITE college campus, and the exposition center. This makes sense, as urban transportation interchanges, large commercial complexes, educational institutions, and event spaces are bound to attract large crowds. The second category of relatively dense (relatively “active”) places is identified at the center of the large public housing complex developed by the Singapore Housing and Development Board (HDB) in the area. HDBs, as they are colloquially called, are high-density residential clusters that house ∼80% of Singaporeans. Again, these results by themselves do not reveal anything unexpected about urban vitality in this area of Singapore, where a much higher population density and relatively high social activity are to be expected.

Figure 1.

Spatial distribution of density and diversity. (a) Logged local density of Singapore; (b) Logged local density of Tampines; (c) Normalized diversity of Singapore; (d) Normalized diversity of Tampines.

The mapping of the diversity metric in Figure 1(c) highlights the downtown area and marks the major expressways, reflecting that both roads and the downtown area generally attract a wider diversity of people than the residential neighborhoods in the peripheral ring around Singapore (colloquially referred to as “the heartlands”). However, in the context of Tampines (see Figure 1(d)), the diversity mapping reveals a pattern that contrasts sharply with the density mapping in Figure 1(b). The most distinct difference is that the high-density HDB areas have very low diversity measures. In this sense, residential areas are predominantly a local or private domain, with fewer public incentives to attract visitors from different neighborhoods. This is especially the case in the Singapore context, where family members often live in the same district of the city. Some of the most diverse places also appear among the denser places discussed above. Importantly, however, there are some clear discrepancies as well. For example, Tampines Eco Green, an ecological park focused on nature trails, does not rank as a very dense place, but it does seem to attract visitors from a wide range of neighborhoods.

As demonstrated from plotting “density” and “diversity” measures (see Figure S2), there is a minimal association between the two (Pearson’s correlation coefficient = 0.01). In other words, the two metrics are independent of each other in the sense that high-density places can certainly have a low diversity in the origin of visitors (e.g., HDB housing), or high-diversity places can have a limited number of visitors (e.g., Tampines Eco Green). Based on either variable individually, evaluating urban space can result in a biased assumption of what a “well-performing” (high-density and high-diversity) urban space is. Therefore, we argue that both variables should be equally considered to obtain a balanced and comprehensive view of our urban spaces.

To illustrate how the combined density and diversity measure can be used to classify urban space, we take a few representative locations from the top-30 places mentioned earlier in each category and discuss the underlying reasons that may contribute to specific activity levels in each place. Figure 2 outlines a general typology for each spatial category: H-Density/H-Diversity places tend to be high-profile tourist destinations, H-Density/L-Diversity places are located mostly in high-density residential neighborhoods, L-Density/H-Diversity places are associated with leisure programs, and L-Density/L-Diversity places tend to be low-density areas with unitary urban functions.

Figure 2.

Spatial categories and representative spots. 1) Ion Orchard MRT Station, photography by Fabio Achilli; 2) Old Airport Road, photography by One More Bite Blog; 3) Vivo City, photography by Wojtek Gurak; 4) Holland Village, photography by Nagono; 5) Causeway Point Mall, photography by Cmglee; 6) Punggol Plaza, photography by Deoma12; 7) Pioneer Mall, photography by Waynema; 8) Faber Park, photography by Jordan Rockerbie; 9) Istana Park, photography by Choo Yut Shing; 10) Sports Stadium, photography by Rodrigo Soldon Souza; 11) Upper Thomson, photography by Nethaniel; 12) Serangoon Terrace, photography by Project Manhattan.

More specifically, in the H-density/H-diversity category, HarbourFront Vivo City, Holland Village, the Orchard MRT station vicinity, and the Old Airport Road Food Centre are four well-known destinations. It is expected that these high-profile places would have both a high density and a high diversity of visitors. For instance, the Orchard MRT station area is a prominent transportation hub that interconnects four different subway lines and is the heart of Singapore, with vibrant commercial and social activities. HarbourFront is a landmark destination that houses a range of leisure and recreation facilities for both international and local visitors. Holland Village is a well-known expatriate social enclave that houses services for foreigners and is also home to many famous and characteristic pubs and restaurants that draw visitors from all over the city. Most interestingly, places like the Old Airport Road Food Centre, a local favorite with classic dishes and a long history that attracts locals from all areas of Singapore but is less well known among tourists, can be identified using the localized version of density we use here. This illustrates the capability of the methodology to provide insights that are contextualized to both global and local conditions.

As demonstrated earlier, a high-density place is not necessarily a high-diversity place and vice versa. Spaces that fall in the H-density/L-diversity category are primarily dense residential community centers or local shopping centers, such as Woodlands Causeway Point Mall, Jurong West Pioneer Mall, and Punggol Plaza. These neighborhoods are Singapore’s “heartlands.” They are planned to be relatively self-contained and cater to residents’ daily needs with markets, shopping malls, cinema complexes, restaurants, and schools. In this sense, there are relatively few reasons to travel from one residential community to a community or shopping center in another neighborhood. From this point of view, these heartland areas are highly populated with residents and cater primarily to locals, which is in line with our expectation of high-density but low-diversity urban space.

Among the four categories, L-density/H-diversity spaces make up the most interesting quadrant. It is here that the addition of diversity to the study of urban activity makes the most difference. Research that focuses only on density can easily overlook locations with a diverse visitor presence. The underlying characteristics of locations that attract highly diverse groups of visitors, even if they do not draw large crowds, can inform more inclusive urban space design in the future. Based on our findings, the representative locations are the Istana Domain and green recreation spaces (e.g., Faber Park and the parks around the Singapore sports stadium). The Istana is the official residence of the President of Singapore, and its grounds are often used for state functions and ceremonial occasions. The results show that the Istana attracts diverse but less dense visitor groups from different areas across the city, which is in keeping with the nature of its program. In addition, the categorization also identifies green recreational spaces. For example, Faber Park is one of the oldest parks, with a lookout point that offers panoramic views and a wide range of activities to visitors. Although Faber Park is a well-loved recreational park, its frequency of visitors is distinctly lower than that of high-profile tourist destinations such as Marina Bay. If we focused solely on the density index, we could easily overlook such a successful urban precedent. This example again demonstrates the capacity of the method to capture hidden, well-performing urban spaces.

The last category is the L-density/L-diversity urban spaces. These are places in landed property neighborhoods, where housing density is relatively low, or places along the country’s periphery adjacent to open green space. Examples include the landed property neighborhoods at Seletar Hill and the Thompson community. These locations are expected to have a lower density and diversity of visitors. Although they are less interesting from an urban design point of view, this category does provide a useful sanity check for the categorization we employ here.

Conclusion and future research

In this article, we have applied a quantitative analysis of the density and diversity of urban spaces through geosocial media data. Combining density and diversity into a single categorization allows us to classify urban spaces based on human behavior. Instead of analyzing urban landscapes through the more common functional lens of land use or urban programming, this analysis foregrounds the social dimension of activity in urban spaces. The resulting 2×2 activity-based matrix categorizes urban spaces into four types, namely H-density/H-diversity, H-density/L-diversity, L-density/H-diversity, and L-density/L-diversity. This matrix offers a scan of the overall socio-spatial landscape of Singapore that identifies “vital” (H-density/H-diversity) urban spaces as well as “low profile” (L-density/L-diversity) locations. Most interesting, however, is the capability to highlight locations that score high on only one metric, whose distinct character would likely be missed by analyses that consider only density or only diversity.
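As an illustration only, the 2×2 categorization could be sketched in Python as follows. The median split used as the high/low threshold, the function name `categorize`, and the example scores for the hypothetical sites are assumptions made for this sketch; they are not the thresholds or values used in the analysis above.

```python
from statistics import median


def categorize(scores, density_cut=None, diversity_cut=None):
    """Assign each location to one of four density/diversity categories.

    scores: dict mapping a location name to a (density, diversity) tuple.
    The cut-offs default to the median of each metric, which is an
    assumption of this sketch, not necessarily the study's threshold.
    """
    if not scores:
        return {}
    if density_cut is None:
        density_cut = median(d for d, _ in scores.values())
    if diversity_cut is None:
        diversity_cut = median(v for _, v in scores.values())

    labels = {}
    for name, (density, diversity) in scores.items():
        d_label = "H" if density > density_cut else "L"
        v_label = "H" if diversity > diversity_cut else "L"
        labels[name] = f"{d_label}-density/{v_label}-diversity"
    return labels


# Hypothetical scores for four made-up sites (not data from the study).
example_scores = {
    "site_a": (900, 2.8),
    "site_b": (850, 0.6),
    "site_c": (40, 2.5),
    "site_d": (30, 0.4),
}
example_labels = categorize(example_scores)
```

With these made-up scores, each site falls into a different quadrant, so a location like `site_c` (few visits, varied origins) is separated from `site_d` even though a density-only ranking would place them side by side.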

As such, we argue that, although both the density and diversity of participants in urban spaces play a key role in the vitality of the city, the two concepts should be treated as independent, uncorrelated variables and be taken into account together. In conventional urban research, the intangible quality of diversity poses significant challenges for the operationalization and measurement of this concept. To address this challenge, we design a systematic approach to tabulate the “diversity” of visitor groups based on social media data (or other similar datasets) so that it can be evaluated together with “density.” We operationalize density as the frequency of tweets from visitors to a particular location and diversity as the variety of the home neighborhoods of those visitors. Even this relatively simple proxy for diversity helps to create a better understanding and evaluation of urban spaces and their performance. Subsequent analysis with a more complete definition of diversity, including socio-economic characteristics, is likely to yield a more comprehensive view.
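A minimal sketch of how these two measures might be computed from a list of tweets, assuming each tweet has already been assigned a visited location and an inferred home neighborhood. Shannon entropy is used here as one possible measure of the “variety” of home neighborhoods; this choice, the function name `density_and_diversity`, and the sample records are assumptions for illustration rather than the exact procedure of this study.

```python
import math
from collections import Counter, defaultdict


def density_and_diversity(tweets):
    """Compute (density, diversity) per location.

    tweets: iterable of (location, home_neighborhood) pairs, one per tweet.
    density   = number of tweets observed at the location.
    diversity = Shannon entropy of the home neighborhoods behind those
                tweets (an assumed stand-in for "variety").
    """
    homes_by_location = defaultdict(Counter)
    for location, home in tweets:
        homes_by_location[location][home] += 1

    result = {}
    for location, homes in homes_by_location.items():
        total = sum(homes.values())
        entropy = sum(
            -(n / total) * math.log(n / total) for n in homes.values()
        )
        result[location] = (total, entropy)
    return result


# Hypothetical records: a mall visited only by "north" residents and a
# park visited equally by residents of four neighborhoods.
sample_tweets = [("mall", "north")] * 4 + [
    ("park", "north"),
    ("park", "south"),
    ("park", "east"),
    ("park", "west"),
]
sample_scores = density_and_diversity(sample_tweets)
```

In this toy input both places have the same density, but the mall's diversity is zero while the park's is the maximum possible for four neighborhoods, which is the kind of difference a density-only measure cannot show.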

The Singapore case study demonstrates that the concept of diversity is an important determinant for identifying “well-performing” urban spaces. Specifically, through contextualization with our own local knowledge, we show that adding diversity to density allows for an understanding of how people use urban spaces that is closer to the ground truth of our qualitative understanding of many spaces. The analysis has successfully identified places that were expected to be in specific categories, such as top tourist destinations like Marina Bay and the Orchard Road area in the H-density/H-diversity category, which provides an important validity check for this research method. However, the most fruitful insight is the capability to locate the “in-between” categories (i.e., L-density/H-diversity and H-density/L-diversity) and also the “outliers” within each performance category. These findings open the opportunity to examine further underlying factors that contribute to a specific categorization or a high value on either metric. For instance, Tampines Eco-Green is a park and nature reserve that falls in the L-density/H-diversity category. During a physical visit, it is often quiet and not many visitors can be seen. However, the analysis here suggests that the visitors it does attract are very diverse: they come from many different neighborhoods. We would argue that such a place, with low-density visits but sufficient incentives (e.g., comfort, identity, accessibility, spatial quality, and programs) to draw diverse groups of visitors, provides solid ground to be further studied and learned from to inform the future design of inclusive (diverse) urban spaces.

This activity-based perspective on urban space in the city creates many potential research opportunities, such as studies focusing on the functionality of space (e.g., do we find specific uses in different density/diversity categories?) or on their physical qualities (e.g., can certain design aspects help explain why places have a high density/diversity score?). The research can also be expanded by taking into account other geo-located datasets, for example, mobile application data, which potentially offer a more finely grained spatio-temporal resolution, or by incorporating other ancillary datasets (e.g., standard census data), which can provide additional explanatory value in terms of users’ socio-demographic characteristics. This would also help alleviate the clear limitations and bias of Twitter and other social media datasets. We would like to stress here again that our Twitter-based analysis only provides a partial perspective on the diversity and density of urban spaces in Singapore. It is used here as an illustration of the value of combining diversity and density for analyzing urban spaces with mobility data. In addition, as discussed previously, the specific definition of diversity can significantly impact the resulting view on urban spaces. As such, exploring other diversity definitions (e.g., socio-economic, language, text, temporal, and travel distance) can contribute to a more comprehensive view of urban space.

Nonetheless, this research further widens the use of LBS data for urban analysis by adding the dimension of diversity to existing (density-based) approaches to understanding urban spaces. It opens the door to more informed evaluations of spatial performance and quick scans of social hotspots or underutilized spaces, and it provides a foundation for informed decisions on future urban design and development.

Supplemental Material

Supplemental Material for Categorizing urban space based on visitor density and diversity: A view through social media data by I-Ting Chuang, Qingqing Chen, Ate Poorthuis in Environment and Planning B: Urban Analytics and City Science.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Data availability

All code used to reproduce the analysis and figures within this paper is available at:

All data used are available upon request due to the large file size.

ORCID iDs

I-Ting Chuang

Qingqing Chen

Ate Poorthuis

Author Biographies

I-Ting Chuang is a Lecturer at the School of Architecture and Planning of the University of Auckland. Her current research interests focus on design informatics in urban geography, emphasizing data analytics and the spatial quality of public spaces. Her research leverages the potential of large geolocated datasets to understand the complexity of our contemporary urban environment. Dr. Chuang spent seven years in New York practicing Architecture and Urban Design in renowned firms before embarking on her academic career. She received her Ph.D. from the Singapore University of Technology and Design (ASD) and her M.Des. from Harvard GSD.

Qingqing Chen is a Ph.D. candidate in the Department of Geography at the University at Buffalo – SUNY. Her research focuses on critically understanding urban space by leveraging big data, combined with data science and machine learning techniques. She is interested in urban data science, geocomputation, non-visual sensory measuring and monitoring, and social media and big data. She received an M.S. in Physics from the National University of Singapore and a B.S. in Physics from Minjiang University of China. Prior to starting her Ph.D., she worked at the Singapore University of Technology and Design and the Singapore-MIT Alliance for Research and Technology Centre.

Ate Poorthuis is an Assistant Professor of Big Data and Human-Environment Systems in the Department of Earth and Environmental Sciences at KU Leuven, Belgium. His research focuses on exploring the possibilities and limitations of big data, through quantitative analysis and visualization, to better understand how our cities work. He is particularly interested in applying these academic insights within urban planning and policy.

References

Alexander, Jiang, Murga, et al. (2015) Origin–destination trips by purpose and time of day inferred from mobile phone data. Transportation Research Part C: Emerging Technologies 58: 240–250. Big Data in Transportation and Traffic Engineering. DOI: 10.1016/j.trc.2015.02.018.

Altman (2018) Beyond Closing the Gap: Valuing Diversity in Indigenous Australia. Canberra, ACT: Centre for Aboriginal Economic Policy Research (CAEPR).

Ayala-Azcárraga, Diaz, Zambrano (2019) Characteristics of urban parks and their relation to user well-being. Landscape and Urban Planning 189: 27–35. DOI: 10.1016/j.landurbplan.2019.04.005.

Boyd, Crawford (2012) Critical questions for big data: provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society 15(5): 662–679. DOI: 10.1080/1369118X.2012.678878.

Carmona (2001) The Value of Urban Design: A Research Project Commissioned by CABE and DETR to Examine the Value Added by Good Urban Design. Thomas Telford.

Carmona (2019) Principles for public space design, planning to do better. Urban Design International 24(1): 47–59. DOI: 10.1057/s41289-018-0070-3.

Chen, Hui, et al. (2019) Identifying urban spatial structure and urban vibrancy in highly dense cities using georeferenced social media data. Habitat International 89: 102005. DOI: 10.1016/j.habitatint.2019.102005.

Chen, Poorthuis (2021) Identifying Home Locations in Human Mobility Data: An Open-Source R Package for Comparison and Reproducibility.

Cho, Heng, Trivic (2016) Re-framing Urban Space: Urban Design for Emerging Hybrid and High-Density Conditions. New York: Routledge, Taylor & Francis Group.

Glaeser, Laibson, Sacerdote (2002) An economic approach to social capital. The Economic Journal 112(483): F437–F458.

Cullen (1961) The Concise Townscape. Van Nostrand Reinhold Company.

Dong, Morales, Jahani, et al. (2020) Segregated interactions in urban and online space. EPJ Data Science 9(1): 20. DOI: 10.1140/EPJDS/S13688-020-00238-7.

Fuentes, Truffello, Flores (2022) Impact of land use diversity on daytime social segregation patterns in Santiago de Chile. Buildings 12(2): 149.

Greenberg (1995) The Poetics of Cities: Designing Neighborhoods that Work. The Ohio State University Press.

Horozov, Narasimhan, Vasudevan (2006) Using Location for Personalized POI Recommendations in Mobile Environments. Citeseer, pp. 124–129.

Hossain, Feizi, et al. (2016) Inferring Fine-Grained Details on User Activities and Home Location from Social Media: Detecting Drinking-While-Tweeting Patterns in Communities.

Hristova, Williams, Musolesi, et al. (2016) Measuring urban social diversity using interconnected geo-social networks. In: The 25th International Conference, April, 2016.

Jacobs, Appleyard (1987) Toward an urban design manifesto. Journal of the American Planning Association 53(1): 112–120. DOI: 10.1080/01944368708976642.

Jacobs (1961) Death and Life of Great American Cities.

Jurdak, Zhao, Liu, et al. (2015) Understanding human mobility from Twitter. PLOS ONE 10(7): e0131469. DOI: 10.1371/journal.pone.0131469.

Kang, Fan, Jiao (2021) Validating activity, time, and space diversity as essential components of urban vitality. Environment and Planning B: Urban Analytics and City Science 48(5): 1180–1197. DOI: 10.1177/2399808320919771.

Kim (2018) Seoul's Wi-Fi hotspots: Wi-Fi access points as an indicator of urban vitality. Computers, Environment and Urban Systems 72: 13–24. DOI: 10.1016/j.compenvurbsys.2018.06.004.

Kleinhans (2004) Social implications of housing diversification in urban renewal: a review of recent literature. Journal of Housing and the Built Environment 19(4): 367–390.

Kong, Song, Xia, et al. (2018) LoTAD: long-term traffic anomaly detection based on crowdsourced bus trajectory data. World Wide Web 21(3): 825–847. DOI: 10.1007/s11280-017-0487-4.

Gao, et al. (2019) Reconstruction of human movement trajectories from large-scale low-frequency mobile phone data. Computers, Environment and Urban Systems 77: 101346. DOI: 10.1016/j.compenvurbsys.2019.101346.

Longley, Adnan, Lansley (2015) The geotemporal demographics of Twitter usage. Environment and Planning A: Economy and Space 47(2): 465–484. DOI: 10.1068/a130122p.

Luo, Cao, Mulligan, et al. (2016) Explore spatiotemporal and demographic characteristics of human mobility via Twitter: a case study of Chicago. Applied Geography 70: 11–25. DOI: 10.1016/j.apgeog.2016.03.001.

Martí, Serrano-Estrada, Nolasco-Cirugeda (2019) Social media data: challenges, opportunities and limitations in urban studies. Computers, Environment and Urban Systems 74: 161–174. DOI: 10.1016/j.compenvurbsys.2018.11.001.

McGee, Caverlee, Cheng (2013) Location prediction in social media based on tie strength. In: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. New York, NY, USA, 2013.

Mehta (2014) Evaluating public space. Journal of Urban Design 19(1): 53–88. DOI: 10.1080/13574809.2013.854698.

Mittal, Siriaraya, Lee, et al. (2019) Accurate spatial mapping of social media data with physical locations. In: 2019 IEEE International Conference on Big Data. December, 2019.

Montgomery (1998) Making a city: urbanity, vitality and urban design. Journal of Urban Design 3(1): 93–116. DOI: 10.1080/13574809808724418.

Moro, Calacci, Dong, et al. (2021) Mobility patterns are associated with experienced income segregation in large US cities. Nature Communications 12(1): 1–10.

Mumford (1961) The City in History: Its Origins, its Transformations, and its Prospects. Houghton Mifflin Harcourt.

Naik, Kominers, Raskar, et al. (2017) Computer vision uncovers predictors of physical urban change. Proceedings of the National Academy of Sciences of the United States of America 114(29): 7571–7576.

Pendall (2000) Local land use regulation and the chain of exclusion. Journal of the American Planning Association 66(2): 125–142. DOI: 10.1080/01944360008976094.

Pendall, Hedman (2016) Worlds Apart: Inequality between America's Most and Least Affluent Neighborhoods.

Phithakkitnukoon, Horanont, Di Lorenzo, et al. (2010) Activity-aware map: identifying human daily activity pattern using mobile phone data. In: Salah, Gevers, Sebe, et al. (eds), Human Behavior Understanding. Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 14–25.

Poorthuis, Zook, Shelton, et al. (2014) Using Geotagged Digital Social Data in Geographic Research. Rochester, NY: Social Science Research Network. ID 2513938, SSRN Scholarly Paper, 23 October.

Poorthuis, Shelton, Zook (2021) Changing neighborhoods, shifting connections: mapping relational geographies of gentrification using social media data. Urban Geography 1(1): 960–983. DOI: 10.1080/02723638.2021.1888016.

Prasetyo, Achananuparp, Lim (2016) On analyzing geotagged tweets for location-based patterns. In: Proceedings of the 17th International Conference on Distributed Computing and Networking - ICDCN, Singapore, 2016.

Psyllidis, Yang, Bozzon (2018) Regionalization of social interactions and points-of-interest location prediction with geosocial data. IEEE Access 6: 34334–34353. DOI: 10.1109/ACCESS.2018.2850062.

Putnam (2000) Bowling Alone: The Collapse and Revival of American Community. New York: Simon & Schuster.

Sampson (2019) Neighborhood effects and beyond: explaining the paradoxes of inequality in the changing American metropolis. Urban Studies 56(1): 3–32. DOI: 10.1177/0042098018795363.

Scepanovic, Joglekar, Law, et al. (2021) Jane Jacobs in the sky: predicting urban vitality with open satellite data. Proceedings of the ACM on Human-Computer Interaction 5(CSCW1): 1–25.

Shaw, Sui (2020) Understanding the new human dynamics in smart spaces and places: toward a spatial framework. Annals of the American Association of Geographers 110(2): 339–348. DOI: 10.1080/24694452.2019.1631145.

Shelton (2017) Spatialities of data: mapping social media 'beyond the geotag'. GeoJournal 82(4): 721–734. DOI: 10.1007/s10708-016-9713-3.

Shelton, Poorthuis, Zook (2015) Social media and the city: rethinking urban socio-spatial inequality using user-generated geographic information. Landscape and Urban Planning 142: 198–211. DOI: 10.1016/j.landurbplan.2015.02.020.

Smith (1998) Discovering stable racial integration. Journal of Urban Affairs 20(1): 1–25. DOI: 10.1111/j.1467-9906.1998.tb00407.x.

Song, Richards, Tan (2020) Using social media user attributes to understand human–environment interactions at urban parks. Scientific Reports 10(1): 808. DOI: 10.1038/s41598-020-57864-4.

Spinney, Millward (2013) Investigating travel thresholds for sports and recreation activities. Environment and Planning B: Planning and Design 40(3): 474–488. DOI: 10.1068/b37161.

Talen (2005) Land use zoning and human diversity: exploring the connection. Journal of Urban Planning and Development 131(4): 214–232.

Talen (2012) Design for Diversity. Routledge.

Talen, Lee (2018) Mix design for Social Diversity, pp. 101–124.

Trancik (1986) Finding Lost Space: Theories of Urban Design. New York: J. Wiley.

Varna (2014) Measuring Public Space - the Star Model. UK: Routledge.

Venturi, Izenour, Brown (1977) Learning from Las Vegas - Revised Edition: The Forgotten Symbolism of Architectural Form. Revised edition. Cambridge: The MIT Press.

Yang, Huang, et al. (2019) Inferring demographics from human trajectories and geographical context. Computers, Environment and Urban Systems 77: 101368.

Zheng, Chen, et al. (2009) Mining individual life pattern based on location history. Tenth International Conference on Mobile Data Management: Systems. May 2009.

Zhao, Shaw, Yin, et al. (2019) The effect of temporal sampling intervals on typical human mobility indicators obtained from mobile phone location data. International Journal of Geographical Information Science 33(7): 1471–1495. DOI: 10.1080/13658816.2019.1584805.

Živković (2019) Urban form and function. In: Leal Filho, Azeiteiro, Azul, et al. (eds), Climate Action. Cham: Springer International Publishing, pp. 1–10. DOI: 10.1007/978-3-319-71063-1_78-1.

Zook (2017) Crowd-sourcing the smart city: using big geosocial media metrics in urban governance. Big Data & Society 4(1): 2053951717694384. DOI: 10.1177/2053951717694384.
