
Abstract

Urban perception is fundamental to understanding the built environment and has been increasingly observed through social sensing, yet most studies overlook differences between population groups. This limitation becomes especially consequential in urban regeneration contexts, where tourists and residents often experience and represent space differently. This study proposes a group-sensitive multimodal framework to compare how tourists and residents express urban environments across text, image, and emotion. Using 14,300 geo-tagged posts and 78,632 images from Beijing, we quantify narrative divergence and analyze its relationship with built-environment (BE) factors. Results show a clear modality difference: visual divergence remains low (Mean JSD_image = 0.140), textual narratives differ greatly (Mean JSD_text = 0.435), and emotional divergence is moderate (Mean ED = 0.146). These differences are associated with BE factors including functional mix, spatial visibility, and amenity context. In urban regeneration contexts, these divergences help identify tensions and offer insights for planning strategies and design decisions. More broadly, the study reveals that conventional aggregated social sensing may produce a filter effect by amplifying dominant narratives while neglecting everyday experiences of different user groups. It highlights the need for differentiated user perspectives in multimodal social sensing to support more inclusive urban analysis and planning.

Keywords

social sensing, urban perception, tourist–resident divergence, urban regeneration, Beijing

Introduction

Urban perception, meaning how people sense and understand the city, is an important dimension for studying the urban environment. To capture this dimension at scale, social sensing offers an observational lens via the digital traces of human behavior (Liu et al., 2015). Traces such as mobility records, search behaviors, and social media content provide a bottom-up view of how urban space is experienced and represented (Shen et al., 2024). Among them, social media data have attracted attention for their real-time, multimodal, and user-generated expressions (Chen et al., 2023; Chuang et al., 2023). They reflect how people actively narrate and feel about places, and have become an important information source for research on urban perception (Chuang et al., 2022; Lang et al., 2022). Recent studies have used them to identify urban hotspots, assess place attachment, and model collective perceptions across spatial contexts (Fan and Zhang, 2023; Molinillo et al., 2019).

Despite these advances, most social sensing research continues to treat urban populations as homogeneous. Recent work has started to address this limitation by distinguishing between population groups in urban analysis (Stylidis et al., 2014; Yang and Liu, 2022), highlighting that urban perception is not uniform across groups. Among these differentiated views, a particularly salient contrast lies between tourists and residents. They move through the city at different paces, yet often share the same historic and mixed-use spaces (Fan and Zhang, 2023). This issue becomes especially significant in urban regeneration contexts, where tourism-oriented development is frequently used to enhance cultural visibility and economic vitality (Chen and Chen, 2025; Li, 2025). Such processes can blur the boundary between everyday life and visitor-oriented consumption, making tensions in spatial use, perception, and representation more visible (Zhao et al., 2025). However, these differences in perception have rarely been examined in a systematic and multimodal way, particularly with respect to urban regeneration and the underlying spatial mechanisms.

To fill this gap, we explore tourist–resident divergence in multimodal social media expressions of the city. We focus on divergences in terms of text, image, and emotion, which reveal underlying spatial tensions in the context of urban regeneration. The central research question is: How can we use multimodal social media expressions to identify narrative divergence, and how can the findings be translated into urban regeneration strategies?

The remainder of this paper is structured as follows. First, we build an analytical framework that integrates textual, visual, and emotional expressions from social media to capture differentiated perceptions of the city. Using geo-tagged posts and images, we then quantify narrative divergence across three modalities and examine its spatial distribution. Next, we analyze how these divergences relate to BE factors. Finally, we discuss how multimodal divergence between tourists and residents can inform urban regeneration strategies and outline future research directions.

Literature review

Social sensing for spatial analysis

Compared with traditional methods such as survey or mapping (Chen et al., 2022; Huang et al., 2021), social sensing serves as a real-time and multimodal approach for understanding urban perception. Social media platforms such as Instagram, Weibo, and Xiaohongshu have large amounts of texts and images that reflect how users interpret urban environments (Fan and Zhang, 2023; Su et al., 2023; Zhang et al., 2020). Previous studies have used text mining, image recognition, and topic modeling to extract different dimensions of city image (Bai et al., 2024; Zhan et al., 2024).

Among social sensing sources, visual platforms are particularly important. Boy and Uitermark (2017) showed that Instagram content shapes the stories people tell about cities, which not only highlights famous attractions but also reveals lesser-known spaces. Other works found that tourist attention on social media tends to cluster at specific spots, which reinforces digital inequality (Indaco and Manovich, 2016; Paul i Agustí, 2021). In China, Xiaohongshu has played a significant role in understanding visual perception (Bai et al., 2024; Jin et al., 2025), and Weibo is used to track public sentiment as it happens (Zhu et al., 2024a, 2024b). More recent studies also attempt to combine text and image for deeper analysis (Huang et al., 2024; Kang et al., 2021). Wang (2024) and Hou et al. (2024) integrated street view photos and social media stories, which offers a fuller picture of user perception. Other researchers have also looked at how these public narratives change over time, which has been especially relevant during major disruptions like the COVID-19 pandemic (Kim and Yang, 2026; Song et al., 2025).

Overall, existing social sensing research has shown strong potential for capturing urban perception at scale. However, much of this work still relies on aggregated representations of users, paying insufficient attention to how different population groups may perceive and narrate the same urban space in different ways.

Divergent expression on social media

When using social media data to study how different groups perceive cities, research conclusions largely depend on how locals and non-locals are distinguished in the data. Chuang et al. linked geo-tagged social media posts to users’ home areas and classified urban spaces based on how many visitors they attract and how diverse those visitors are. Their results show that separating regular users from short-term visitors can reveal differences that are hidden when all users are analyzed together (Chuang et al., 2023). Other studies move beyond simple residence-based labels and use mobile phone location data to capture non-residential presence and how exposure to urban space changes over time (Hanaoka, 2018). More recent work has also treated specific groups, such as night-shift workers, as a distinct urban population, and mapped where they work, move, and spend time using spatial classification methods (Mavrogeni and Cheshire, 2025).

Previous research has confirmed that tourists and residents experience urban spaces differently. Residents pay more attention to daily life and local culture, while tourists are usually drawn to famous landmarks (Aranburu et al., 2016; Stylidis et al., 2017). This divergence has been confirmed via both traditional and digital methods (Molinillo et al., 2019; Vu et al., 2021). In recent years, social media data have made it easier to examine these differences in detail (Derdouri and Osaragi, 2021; Yubero et al., 2021). Peng et al. analyzed Weibo posts in Beijing and found tourists mostly talk about well-known landmarks, whereas residents mention a wider range of everyday places (Peng et al., 2020). Cross-cultural research further demonstrates how cultural background shapes digital representation. For example, research in Tokyo found that tourists from Western and Asian countries pay attention to different locations and discuss different topics (Kang et al., 2021).

Recent work has started to unpack the implications of tourist–resident divergences. Research has shown that the two groups have different consumption habits (Cerdan Schwitzguebel and Romero Bartomeus, 2019) and that visually attractive places tend to receive much more attention from tourists than from residents (Fan and Zhang, 2023). However, few works explore spaces where resident and tourist experiences overlap or conflict (Gomez et al., 2018; Luo et al., 2025). This gap matters more as local-experience tourism grows and tourists seek to join everyday urban life. Instead of visiting famous landmarks, they often spend time in ordinary places such as alleyways, street markets, or neighborhood parks (Zheng et al., 2024). For instance, hybrid urban areas such as Beijing hutong or Shanghai lilong act as both tourist spots and local neighborhoods. In these spaces, different lifestyles constantly encounter each other. What people share on social media highlights the gap or the connection between daily life and curated experience (Kontokosta et al., 2024). However, existing research still provides limited understanding of how such overlap spaces generate different textual, visual, and emotional narratives, and how these divergences relate to the spatial conditions of urban regeneration.

Urban regeneration of hybrid spaces

Historic and mixed-use neighborhoods are hybrid spaces where residents and tourists coexist (Zhao et al., 2025). Previous research has indicated that the two groups frequently hold different narratives of the same physical space. For tourists, hutongs may be seen as cultural attractions or visually attractive destinations, while residents often regard them as crowded and as losing their neighborhood character (Yu, 2025). Some research also shows the strong influence of tourism on local businesses, spatial conflicts, and meanings of place (Sebrek et al., 2025; Wang and Li, 2024). In response to these problems, recent research has called for more inclusive and context-aware governance approaches. These approaches combine public participation, small-scale spatial adjustments, and data-based tools to better connect planning decisions with people’s day-to-day lives (Li et al., 2024; Lin et al., 2025; Omidipoor et al., 2019). In China, this has supported a move toward community micro-regeneration strategies in response to specific user needs (Li, 2025; Zhao et al., 2025). Case studies of neighborhood regeneration show that tools such as interface design and time-based flexible management help to reduce conflicts (Xiong et al., 2022).

However, tourist–resident relations in hybrid spaces remain underexplored. While many studies point out differences between the two groups, few examine how their everyday activities and expressions interact and shape each other (Liu et al., 2019). In historic areas such as Nanluoguxiang and Shichahai in Beijing, daily encounters between the two groups strongly influence the use and understanding of space (Shi and Bian, 2016). Research on the quality of interaction suggests that regeneration outcomes cannot be explained without accounting for these mutual perceptions (Ye et al., 2021; Zhou et al., 2021). This indicates that attention should be paid not only to difference itself, but also to how multimodal tourist–resident divergence emerges within shared urban settings. Such an understanding is essential for identifying the sources of tension in hybrid spaces and for developing regeneration strategies that respond to the differentiated experiences of the same urban environment.

Methodology

Analytical framework

This study constructs a two-stage analytical framework (Figure 1). First, we assess how tourists and residents express urban space differently across modalities. We use Jensen–Shannon Divergence (JSD) for text and image themes, and emotion difference (ED) for sentiment variation. Second, we examine how divergence patterns relate to built environment (BE) factors. Furthermore, we consider how such findings can be translated into urban regeneration strategies that are sensitive to group-specific perceptions.

Figure 1.

Analytical framework.

These two stages correspond to the following hypotheses:

H1 (narrative divergence): Tourists and residents exhibit systematic divergence in how they represent urban space, reflected in textual, visual, and emotional modalities.

H2 (BE factors): Narrative divergence is associated with BE factors such as urban function structure, amenities, spatial visibility, and accessibility.

Study area and data pre-processing

This study focuses on the area within the Fifth Ring Road of Beijing, which includes a large number of historical and cultural landmarks, commercial zones, and neighborhoods. Our dataset consists of social media data and urban spatial data. The social media dataset covers summer 2025 (June to September), since the summer period features substantial temporal and spatial overlap between tourist activities and residents’ daily routines, making it easier to examine their interactions and behavioral differences. Social media data include user ID, check-in location, posting time, and content (text and image). A total of 18,265 geo-tagged posts were initially collected from Weibo and Xiaohongshu. To ensure data reliability, we removed posts generated by automated or suspicious accounts, leaving a dataset of 14,300 posts. Since Xiaohongshu lacks explicit location tagging, we built a reference list from Weibo check-in points and obtained location information for Xiaohongshu posts by searching their text for matching place names.

To classify tourists and residents, we first constructed a manually labeled dataset of 500 posts, based on explicit annotation rules and multi-stage human review. We then fine-tuned a BERT-based model and applied it to the full dataset. The model achieved an accuracy of 84%, which was further validated through comparison with other LLMs such as QWEN2.5 and GPT-5. To reduce the potential risk of circular reasoning and misclassification caused by performative expression, we further conducted robustness checks under alternative confidence thresholds and introduced an auxiliary validation based on user-level Beijing posting patterns. Detailed annotation rules, model performance, and validation results are reported in Supplementary Material A. The final dataset has 7,400 tourist and 6,900 resident check-ins.

The urban spatial dataset includes AOI layers, POI layers, and transportation infrastructure such as subway stations and road networks. To ensure consistency with our social media dataset, we removed AOIs for purely functional or low-perceptual environments such as schools, hospitals, and administrative facilities. The remaining AOIs represent urban spaces that are more likely to appear in users’ posts, including places for leisure, consumption, cultural activities, and everyday social interaction.

Data analysis

Identifying hybrid AOIs

We spatially joined all posts with Beijing’s AOI polygons in ArcGIS Pro. AOIs with fewer than 50 posts were excluded for data reliability. For each remaining AOI, we calculated the tourist–resident ratio to represent the presence of each group:

tr_ratio = \frac{N_{tourist}}{N_{resident}}

where N_tourist and N_resident are the numbers of posts identified as originating from tourists and residents, respectively. AOIs with extreme tr_ratio values (top and bottom 10%) were removed.

We defined hybrid AOIs as those with a tr_ratio between 0.3 and 3.0. This threshold was determined by analyzing the frequency distribution of tr_ratio, excluding extreme cases where one group dominates. This range was chosen so that the minority group accounts for approximately a quarter of the posts or more (about 23% at a ratio of 0.3 and 25% at a ratio of 3.0), guaranteeing sufficient visibility for potential tourist–resident interaction.
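
The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' ArcGIS Pro workflow: the function names, the input format (one `(aoi_id, group)` pair per post), and the inclusive treatment of the 0.3 and 3.0 bounds are all hypothetical, and the percentile trimming step is left out.

```python
from collections import Counter

MIN_POSTS = 50             # AOIs with fewer posts are excluded (from the text)
RATIO_RANGE = (0.3, 3.0)   # hybrid band for tr_ratio (bounds assumed inclusive)


def hybrid_aois(posts):
    """Return {aoi_id: tr_ratio} for AOIs that pass both filters.

    posts: iterable of (aoi_id, group) pairs, group in {"tourist", "resident"}.
    The input format is a hypothetical stand-in for the spatial join output.
    """
    tourists, residents = Counter(), Counter()
    for aoi_id, group in posts:
        (tourists if group == "tourist" else residents)[aoi_id] += 1

    result = {}
    for aoi_id in set(tourists) | set(residents):
        n_t, n_r = tourists[aoi_id], residents[aoi_id]
        if n_t + n_r < MIN_POSTS or n_r == 0:
            continue  # too few posts, or ratio undefined
        ratio = n_t / n_r
        if RATIO_RANGE[0] <= ratio <= RATIO_RANGE[1]:
            result[aoi_id] = ratio
    return result


def minority_share(ratio):
    """Share of posts contributed by the smaller group at a given tr_ratio."""
    return min(ratio, 1.0) / (1.0 + ratio)
```

Evaluating `minority_share` at the two bounds shows what the band implies for the smaller group: its share is 0.3 / 1.3 at the lower bound and 1 / 4 at the upper bound.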

Measuring narrative divergence

We extracted textual, visual, and emotional features from the multimodal data. Textual theme extraction was conducted via ChatGPT-5 to classify thematic keywords (Ma et al., 2026). Based on a structured prompt, the top three ranked keywords were extracted to summarize the dominant meaning of each post. Following a pre-scan, 36 high-frequency keywords were identified, and the highest-ranked keyword for each post was further aggregated into six broader theme categories according to existing frameworks: Art & Culture, Citywalk & Experience, Food & Lifestyle, History & Heritage, Nature & Leisure, and Social & Participation (Table S2). The extraction quality was manually reviewed on a random subsample of 100 posts, and the keyword-to-theme mapping was independently checked by two researchers to improve coding consistency (see Supplementary Material B).

Image theme extraction was conducted via the Places365 scene recognition model (Zhu et al., 2026). We mapped the outputs onto six visual thematic categories aligned with the textual framework (Table S3). However, the two modalities operate at different ontological levels: textual themes reflect behavioral and affective intent, whereas Places365 captures physical scene types. The image-to-theme mapping should therefore be understood as an interpretive aggregation, and cross-modal JSD comparisons are used here only to indicate relative divergence patterns rather than strict semantic equivalence.

For sentiment classification, the emotional orientation of each post (positive, neutral, or negative) was determined using ChatGPT-5. To check validity, we manually audited a random subsample of 200 posts, on which the classification reached 92% accuracy, supporting the reliability of LLM-based sentiment classification.

Based on these extracted features, we analyzed narrative divergence at AOI level using two metrics: JSD and ED. The JSD value was calculated to quantify differences in thematic distributions of text and image between the two groups using the following formula:

JSD(P \| Q) = \frac{1}{2} \left( D_{KL}(P \| M) + D_{KL}(Q \| M) \right)

where P and Q are the probability distributions of tourists and residents for different themes. M is the average distribution of P and Q. D_KL is the Kullback–Leibler divergence.
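
As a minimal sketch of this metric, the snippet below computes JSD for two discrete theme distributions using only the Python standard library. The function names are illustrative, and the base-2 logarithm is an assumption: the text does not state the base, and base 2 is chosen here because it bounds JSD between 0 and 1.

```python
import math


def normalize(counts):
    """Turn per-theme post counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """D_KL(p || q) in bits; terms with p_i = 0 contribute zero."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture M = (P + Q) / 2
    return 0.5 * (kl_divergence(p, m) + kl_divergence(q, m))
```

For one AOI, `p` and `q` would be obtained by passing the tourist and resident post counts over the six theme categories through `normalize`. Because M is positive wherever P or Q is positive, a theme used by only one group does not cause a division by zero, which is one practical reason to prefer JSD over raw KL divergence for sparse AOI-level counts.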

Emotional features were coded numerically (1, 0, −1), and ED was calculated as the difference in mean scores:

ED = Mean_{tourists} - Mean_{residents}

Together, these metrics provide a multimodal quantification of narrative divergence that supports Hypothesis 1.
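
The ED definition can be illustrated the same way. In this hedged sketch the label strings and function names are hypothetical; only the numeric coding (1, 0, −1) and the tourists-minus-residents direction come from the text.

```python
# Numeric coding of sentiment labels, as described in the text.
SCORES = {"positive": 1, "neutral": 0, "negative": -1}


def mean_score(labels):
    """Average sentiment score of a non-empty list of labels."""
    return sum(SCORES[label] for label in labels) / len(labels)


def emotion_difference(tourist_labels, resident_labels):
    """ED = mean tourist score minus mean resident score for one AOI."""
    return mean_score(tourist_labels) - mean_score(resident_labels)
```

Under this sign convention a positive ED means tourists are, on average, more positive than residents in that AOI, and a negative ED means the reverse.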

Modeling BE factors of divergence

OLS regression was employed to assess the relationships between BE factors and three forms of narrative divergence (JSD_text, JSD_image, and ED). Independent variables were organized into four BE dimensions: urban function structure, amenities, spatial visibility, and accessibility. Control variables include platform activity and the densities of residential and life-service POIs (all variables are listed in Table S4).
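
For readers less familiar with the estimator, the following is a minimal OLS sketch using NumPy. It is not the authors' code or software: the function name is illustrative, and any data passed to it in examples would be synthetic stand-ins rather than the study's AOI-level variables. It returns the quantities typically reported for such a model, namely coefficients, standard errors, R-squared, and adjusted R-squared.

```python
import numpy as np


def ols(X, y):
    """Fit y = b0 + X @ b by ordinary least squares.

    X: (n, k) array of predictors without an intercept column.
    Returns (beta, se, r2, adj_r2); beta[0] is the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])      # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = float(resid @ resid)                  # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())    # total sum of squares
    sigma2 = ss_res / (n - k - 1)                  # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(design.T @ design)))
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return beta, se, r2, adj_r2
```

In this setting the model would be fitted three times, once per dependent variable, with one row per hybrid AOI. With a small sample and many predictors, the gap between R-squared and adjusted R-squared can be substantial, which is why both are worth reporting.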

Results and findings

Distribution of hybrid AOIs

Hybrid AOIs are concentrated in specific functional environments. As visualized in Figure 2, a total of 79 hybrid AOIs are identified. Most of them cluster in Beijing’s inner city, particularly within the Third Ring Road. In terms of function, commercial areas appear the most, followed by tourist scenic spots, residential areas, and urban parks. This spatial distribution sets the context for our following analysis.

Figure 2.

Distribution of hybrid AOIs in Beijing. The bar heights represent tr_ratio. Selected AOIs are color-coded according to their original five primary categories.

Structured narrative divergence across text, image, and emotion

Within hybrid AOIs, there is a structured narrative divergence across modalities. While visual divergence remains relatively low, textual narratives differ greatly, revealing different ways of perceiving the city. Textual narratives show the highest level of divergence (Mean JSD_text = 0.435). Results indicate that tourists see the city as aesthetic and performative, focusing more on symbolic themes such as Citywalk & Experience and Art & Culture. In contrast, residents emphasize everyday and socially embedded activities such as Food & Lifestyle and Social & Participation. Visual divergence is the lowest of all the metrics (Mean JSD_image = 0.140). Although tourists post slightly more History & Heritage images and residents post more Social & Participation images, overall visual themes show greater overlap between groups. This suggests that the visual dominance of the physical scene may, in some cases, outweigh the functional differences captured in text. However, the relatively low visual divergence should be interpreted cautiously, as it may partly reflect the scene-based ontology of Places365 and its lower sensitivity to experiential differences compared with text-based classification. The overall emotional divergence is moderate (Mean ED = 0.146), and 64% of the posts are positive. Residents show a 32% higher proportion of negative emotions than tourists, as they more often express fatigue and spatial disruption caused by over-tourism.

In addition, divergences vary between functions (Table S5). For instance, residential areas show the highest JSD_text and JSD_image, as tourists view these neighborhoods as cultural scenery while residents view them as functional places. Parks show the highest ED, as residents tend to express more positive emotions including joy and relaxation, while tourists sometimes express disappointment or frustration. Figure 3 provides an integrated visualization of these contrasts, showing how textual, visual, and emotional layers interact. These findings point to underlying BE factors, which we formally examine in the next part.

Figure 3.

Multimodal narrative patterns and divergence across hybrid AOIs. The figure includes tourist, resident, and divergence layers. Representative AOIs are annotated with thematic or emotional icons, with radar graphs illustrating differences between groups.

BE factors related to narrative divergence

The regression results (Table 1) reveal that narrative divergence is associated with BE factors, specifically urban functional structure, spatial visibility, and specific amenities.

Table 1.

Regression results.

Variables	JSD_text	JSD_image	ED
Urban function structure
function_mix	0.087 ** (0.051)	−0.060 (0.042)	0.145 ** (0.085)
function_density	−0.001 ** (0.000)	0.001 *** (0.000)	0.000 (0.000)
Amenities
shopping_mall_N	0.019 ** (0.007)	0.006 (0.006)	0.005 (0.012)
park_N	0.006 (0.008)	0.019 ** (0.008)	−0.018 (0.014)
catering_N	−0.001 (0.001)	0.000 (0.001)	0.002 (0.002)
cafe_N	−0.006 (0.003)	−0.003 (0.003)	−0.005 (0.006)
Spatial visibility
tourist_spot_N	−0.001 ** (0.000)	−0.001 (0.001)	0.002 *** (0.001)
Accessibility
subway_N	0.014 (0.019)	0.028 (0.017)	0.016 (0.032)
junction_density	0.000 (0.001)	0.000 (0.001)	−0.002 (0.002)
Control variables
Rednote_N	0.001 (0.004)	0.000 (0.003)	0.011 ** (0.006)
residential_N	0.000 (0.000)	0.000 (0.000)	0.001 (0.001)
life_service_N	0.000 (0.001)	0.000 (0.000)	−0.001 (0.001)
Constant	0.507 *** (0.064)	0.131 ** (0.058)	−0.210 * (0.106)
F-statistic	2.289	3.142	2.396
R-squared	0.314	0.386	0.324
Adj. R-squared	0.177	0.263	0.189
N	79	79	79

*p < 0.1, **p < 0.05, ***p < 0.01.

In terms of urban function structure, functional mix is associated with higher textual and emotional divergence, raising JSD_text (b = 0.087, p < 0.05) and ED (b = 0.145, p < 0.05). This may be because mixed-use areas bring together tourist-oriented activities and residential services, producing competing themes and emotional responses. The influence of functional density varies across modalities, decreasing JSD_text (b = −0.001, p < 0.05) while increasing JSD_image (b = 0.001, p < 0.01). A possible explanation is that functional density reflects the number of functions rather than their diversity: similar functions lead to similar textual themes, but the complexity of the urban environment still leads to visual divergence.

Spatial visibility, measured by the density of tourist spots, aligns topic themes but differentiates emotions. It decreases JSD_text (b = −0.001, p < 0.05), as tourists and residents describe well-known spaces and landmarks with similar high-frequency themes. However, ED increases with spatial visibility (b = 0.002, p < 0.01), where tourists express excitement, while residents associate these spaces with congestion and noise.

Finally, specific amenities and platform mechanisms shape divergence in modality-specific ways. Shopping malls increase JSD_text (b = 0.019, p < 0.05). Tourists tend to use keywords related to pop-up exhibitions or trendy check-in spots, whereas residents discuss practical details like parking availability, supermarket discounts, or family dining. The number of parks increases JSD_image (b = 0.019, p < 0.05). Tourists focus chiefly on diverse scenery, while residents mostly take casual photos of social activities like square dancing or of jogging paths. Platform activity (Rednote_N) further intensifies ED (b = 0.011, p < 0.05). This suggests that the platform’s algorithm may reward intense feelings, pushing users to express extreme excitement or strong complaints to gain more visibility.

Discussions

Re-imaging cities in the social media age

Classic urban image theories have long examined how people perceive and interpret the city through cognitive mapping, everyday routines, and symbolic meaning. Lynch’s framework of the “Image of the city” (Lynch, 1960) remains central in this tradition, where he described how residents form mental maps structured by paths, edges, districts, nodes, and landmarks, and how these images are built through long-term familiarity. However, this view reflects a context in which spatial perception was primarily internal and residents were the main interpreters of urban form. In today’s digital environment, spatial experience is increasingly expressed through posts, images, and emotions (Huang et al., 2021), creating a public and continuously accumulating layer of representation (Zhu et al., 2026). Tourists and residents also participate jointly in this process, and their different expectations (Zheng et al., 2024) and uses of space produce multiple narratives. The contemporary image of the city is therefore plural and multimodal. Figure 4 summarizes this conceptual shift by comparing Lynch’s single-layer mental map with the multi-layer narrative structure observed in tourist–resident social media expressions. Our findings confirm that Lynch’s elements continue to shape spatial discourse, but they now operate through differentiated group dynamics rather than a unified public perception. Landmarks act as shared anchors. People talk about the same famous sites as visibility theory suggests (Boy and Uitermark, 2017). But these places also generate strong emotional conflicts, showing that shared topics and themes do not necessarily imply shared experiences. Besides, functional nodes and mixed-use districts reveal divergent perception rather than coherent district images. For instance, functional mix amplifies textual divergence due to different requirements of tourists and residents, and nodes like parks and shopping malls increase textual and visual divergence.

Figure 4.

From a single-layer mental map to multi-layer digital expression of space.

Taken together, these patterns indicate that although spatial elements continue to shape how we see and use the city, they do not exclusively determine urban image (Filomena et al., 2019). Online representations are also influenced by group-specific uses and expressive modalities. Analyzing social media data without distinguishing between groups may create representational bias and produce a filter effect that distorts how planners understand urban perception. This suggests that social sensing should not be treated as a neutral aggregate reflection of urban perception, but as a layered field of representation shaped by different user groups and modalities of expression.

Urban regeneration strategy through narrative divergence

Urban regeneration efforts in historic districts frequently expose deep narrative tensions between tourists and residents (Li, 2025). These areas include spaces of mixed functions and multi-subject negotiation (Sebrek et al., 2025), where everyday routines and tourist expectations intersect (Yu, 2025). However, existing regeneration practices face the challenge of identifying where specific tensions appear (Zhao et al., 2025).

Our findings demonstrate that multimodal divergences are not random but follow a structured pattern aligned with specific BE factors, thus allowing us to translate social sensing data into a framework for informing urban regeneration strategies (Figure 5). Multimodal divergence can serve as a diagnostic tool for identifying different forms of tourist–resident tension and linking them to particular spatial elements. In this way, the regeneration of historic districts can be transformed into a process that responds to both physical conditions and the differentiated perceptions of tourists and residents.

Figure 5.

Urban regeneration of historic areas through social media. Using three representative AOIs in Beijing, the figure illustrates how different narrative divergences inform urban regeneration strategies for mixed-use nodes, iconic landmarks, and symbolic districts.

Mixed-use nodes often show functional tension with sharp textual contrasts, and managing this is therefore a priority. Regeneration strategies should avoid uniform touristification that displaces local functions (Chen and Chen, 2025). Instead, planners need to focus more on spatial interface management, such as creating buffer zones and separating visitor flows from resident paths (Li, 2025). In iconic landmarks, visual divergence is the primary source of conflict, making it crucial to balance representational tension in these highly visible areas. Tourists tend to take standardized photos of iconic scenes due to platform trends. However, residents’ perspectives on social activities or neighborhood life are rarely promoted by platform algorithms (Zhu et al., 2024a, 2024b). To correct this, planners can set up different sightseeing routes, install signs that share resident stories, and create new viewing points to encourage the change of photo-taking positions (Li et al., 2024). Symbolic districts face mainly experiential tension, as their large scale and high tourist density lead to emotional overload for residents. To solve this problem, urban regeneration strategies should reduce emotional pressure while maintaining urban vitality (Ye et al., 2021). Planners could establish quiet zones adjacent to residential areas (Yu, 2025), regulate opening hours of bars and entertainment spaces (Sebrek et al., 2025), and install buffers such as pocket parks to reduce noise (Shi and Bian, 2016).

Contributions, limitations, and future directions

This study moves beyond aggregate social sensing to examine the narrative divergence of tourists and residents. By proposing a group-sensitive multimodal framework, it shows that city image is a layered perception of space in which textual, visual, and emotional dimensions diverge across user groups. Theoretically, the study revisits classic city image theory in the social media age, showing how Lynch’s spatial elements continue to structure urban discourse while giving rise to multi-layered digital expressions. Methodologically, it shows that analyzing tourists and residents together can hide important differences between groups and lead to a biased understanding of urban perception. Practically, it illustrates the implications of multimodal tourist–resident divergence for urban regeneration, showing how group-differentiated perceptions can inform more context-sensitive planning and governance in hybrid urban environments.

Several limitations remain. First, social media data contain inherent representational biases shaped by user demographics and platform algorithms (Boy and Uitermark, 2017). Older people who rarely use social media are largely absent from digital platforms, so their voices remain unheard. Future work could combine social media data with qualitative methods such as interviews to obtain a more holistic understanding of diverse voices. Second, although our fine-tuned BERT model performs well in classifying users, it relies on semantic features in user posts, which can sometimes be performative (Cheng et al., 2025). Residents may intentionally craft their posts to resemble tourist content in order to gain visibility, potentially leading to misclassification. For this reason, the tourist–resident distinction in this study should be understood as a social-sensing-based grouping supported by both semantic and behavioral evidence. Future research should further strengthen this distinction by incorporating longer-term posting trajectories or other independent mobility-related evidence where available. Third, our data were collected during the summer and focus on Beijing’s central districts, meaning that seasonal variations and other urban contexts are not yet considered. Accordingly, the findings are most applicable to large metropolitan contexts characterized by high tourism intensity, strong functional mix, and active social-media-based place expression, where tourists and residents are spatially intertwined in historic and mixed-use areas. Their applicability to smaller cities, places with weaker tourism economies, or contexts with less tourist–resident overlap should therefore be interpreted more cautiously. Future research could extend the analysis across both time and space: longitudinal studies across seasons and years could reveal how tourist–resident relationships evolve, and comparative studies across different cities would test the stability of the identified spatial patterns.

Conclusion

This study develops a group-sensitive multimodal framework to examine how tourists and residents express urban space on social media. The results show that urban narratives diverge across text, image, and emotion, and that these differences are systematically related to BE factors. More importantly, the study indicates that conventional aggregated social sensing may hide important differences between groups and lead to a biased understanding of urban perception. By revealing these hidden divergences, the study highlights the importance of differentiated user perspectives for understanding hybrid urban environments and informing more context-sensitive urban regeneration practice.

Supplemental Material

sj-docx-1-tus-10.1177_27541231261445467 – Supplemental material for Tourist-resident divergence in multimodal social sensing and its implications for urban regeneration

Supplemental material, sj-docx-1-tus-10.1177_27541231261445467 for Tourist-resident divergence in multimodal social sensing and its implications for urban regeneration by Yating Wang, Shiqi Dai, Jianshi Li and Yuan Lai in Transactions in Urban Data, Science, and Technology

Footnotes

Acknowledgements

This research is sponsored by the National Natural Science Foundation of China (#72274101).

ORCID iDs

Yating Wang

Shiqi Dai

Jianshi Li

Yuan Lai

Ethical considerations

No ethics approval was required for this research as it relies solely on secondary data.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the National Natural Science Foundation of China (#72274101).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Supplemental material

Supplemental material for this article is available online.

Author biographies

Yating Wang is a Masters student in the School of Architecture, Tsinghua University. Her research focuses on urban data science, social sensing, and urban planning.

Shiqi Dai is a Masters student in the School of Architecture, Tsinghua University. Her research focuses on urban informatics, urban systems, and spatial intelligence.

Jianshi Li is a Masters student in the Department of Urban Planning and Design, Harvard Graduate School of Design. His research focuses on housing, urban regeneration, and community development.

Yuan Lai, PhD, is Associate Professor and Special Research Fellow in the School of Architecture, Tsinghua University. His research focuses on urban informatics, applied urban data science, and urban systems.

References

Aranburu, Plaza, Esteban (2016) Tourism destination competitiveness from demand and supply sides: What do we learn from experts and consumers? Tourism Management Perspectives 20: 131–140. https://doi.org/10.1016/j.tmp.2016.09.005

Bai, Jiao, Zhang, et al. (2024) Public perception of city image hotspots based on social media: A case study of Nanjing, China. IEEE Access. Epub ahead of print 19 August 2024. https://doi.org/10.1109/ACCESS.2024.3445995

Boy, Uitermark (2017) Reassembling the city through Instagram. Transactions of the Institute of British Geographers 42(4): 612–624. https://doi.org/10.1111/tran.12185

Cerdan Schwitzguebel, Romero Bartomeus (2019) Location-based social network data for exploring spatial and functional urban tourists and residents consumption patterns. Ara: Revista De Investigación En Turismo 8(2): 32–52. https://doi.org/10.1344/ara.v8i2.27103

Chen (2025) Community micro-renewal tourism experience and tourists’ behavioral intention: A mixed-methods study based on the creative micro-renewal space of Huangjueping Community, China. Tourism Planning & Development. Epub ahead of print 25 September 2025. https://doi.org/10.1080/21568316.2025.2558675

Chen, Han, et al. (2022) Urban tourism destination image perception based on LDA integrating social network and emotion analysis: The example of Wuhan. Sustainability 14(1): 12. https://doi.org/10.3390/su14010012

Chen, Niu, Silva (2023) The road to recovery: Sensing public opinion towards reopening measures with social media data in post-lockdown cities. Cities 132: 104054. https://doi.org/10.1016/j.cities.2022.104054

Cheng, Jian, et al. (2025) Spatial concentration of intra-urban tourist activities and inter-group differences between Asian, European and North American travelers in Korean Cities. Tourism Management 107: 105064. https://doi.org/10.1016/j.tourman.2024.105064

Chuang, Benita, Tunçer (2022) Effects of urban park spatial characteristics on visitor density and diversity: A geolocated social media approach. Landscape and Urban Planning 226: 104514. https://doi.org/10.1016/j.landurbplan.2022.104514

Chuang, Chen, Poorthuis (2023) Categorizing urban space based on visitor density and diversity: A view through social media data. Environment and Planning B: Urban Analytics and City Science 50(6): 1471–1485. https://doi.org/10.1177/23998083221139848

Derdouri, Osaragi (2021) Exploring the differences between tourists and locals in urban settings through multi-labeled geotagged photos: The case of Tokyo. Proceedings of the ICA 4: 1–8. https://doi.org/10.5194/ica-proc-4-26-2021

Fan, Zhang (2023) Study on the hotspots of urban tourism spaces based on Instagram-worthy locations data: Taking Beijing as an example. Environment and Planning B: Urban Analytics and City Science 50(7): 1822–1837. https://doi.org/10.1177/23998083221146542

Filomena, Verstegen, Manley (2019) A computational approach to ‘The Image of the City’. Cities 89: 14–25. https://doi.org/10.1016/j.cities.2019.01.006

Gomez, Gibert, et al. (2018) Learning from Barcelona Instagram data what locals and tourists post about its Neighbourhoods. arXiv:1808.06369. https://doi.org/10.48550/arXiv.1808.06369

Hanaoka (2018) New insights on relationships between street crimes and ambient population: Use of hourly population data estimated from mobile phone users’ locations. Environment and Planning B: Urban Analytics and City Science 45(2): 295–311. https://doi.org/10.1177/0265813516672454

Hou, Quintana, Khomiakov, et al. (2024) Global streetscapes - a comprehensive dataset of 10 million street-level images across 688 cities for urban science and analytics. ISPRS Journal of Photogrammetry and Remote Sensing 215: 216–238. https://doi.org/10.1016/j.isprsjprs.2024.06.023

Huang, Obracht-Prondzynska, Kamrowska-Zaluska, et al. (2021) The image of the City on social media: A comparative study using “Big Data” and “Small Data” methods in the Tri-City Region in Poland. Landscape and Urban Planning 206: 103977. https://doi.org/10.1016/j.landurbplan.2020.103977

Huang, Wang, Cong (2024) Zero-shot urban function inference with street view images through prompting a pretrained vision-language model. International Journal of Geographical Information Science 38(7): 1414–1442. https://doi.org/10.1080/13658816.2024.2347322

Indaco, Manovich (2016) Urban social media inequality: Definition, measurements, and application (version 2). arXiv. https://doi.org/10.48550/ARXIV.1607.01845

Jin, Zhang, Yang, et al. (2025) Understanding the internet-famous tourist city: Interaction within digital technology in an accelerated society. Cities 167: 106294. https://doi.org/10.1016/j.cities.2025.106294

Kang, Cho, Yoon, et al. (2021) Transfer learning of a deep learning model for exploring tourists’ urban image using geotagged photos. ISPRS International Journal of Geo-Information 10(3): 137. https://doi.org/10.3390/ijgi10030137

Kim, Yang (2026) GeoAI-driven analysis of urban activity shifts using geotagged social media data across the COVID-19 pandemic. Environment and Planning B: Urban Analytics and City Science. Epub ahead of print 30 December 2025. https://doi.org/10.1177/23998083251413306

Kontokosta, Freeman, Lai (2021) Up-and-coming or down-and-out? Social media popularity as an indicator of neighborhood change. Journal of Planning Education and Research 44(2): 662–673. https://doi.org/10.1177/0739456x21998445

Lang, Hui, et al. (2022) Measuring urban vibrancy of neighborhood performance using social media data in Oslo, Norway. Cities 131: 103908. https://doi.org/10.1016/j.cities.2022.103908

(2025) Urban renewal of historic districts: The renovation of Lihuangpi Road neighborhood in Wuhan. Journal of Urban Management 15(1): 375–390. https://doi.org/10.1016/j.jum.2025.07.008

Jiang (2024) State-led versus market-led: How institutional arrangements impact collaborative governance in participatory urban regeneration in China. Habitat International 150: 103134. https://doi.org/10.1016/j.habitatint.2024.103134

Lin, Liu, Yuan, et al. (2025) Harmonizing stakeholder interests in urban renewal: A novel planning approach using explainable machine learning and spatial optimization. Land Use Policy 155: 107588. https://doi.org/10.1016/j.landusepol.2025.107588

Liu, Gao, et al. (2015) Social sensing: A new approach to understanding our socioeconomic environments. Annals of the Association of American Geographers 105(3): 512–530. https://doi.org/10.1080/00045608.2015.1018773

Liu, Wang (2019) Isolated or integrated? Planning and management of urban renewal for historic areas in Old Beijing city, based on the association network system. Habitat International 93: 102049. https://doi.org/10.1016/j.habitatint.2019.102049

Luo, Wang, et al. (2025) Mapping the resilience of tourist city: Spatial correlation and elasticity of tourism resource value. Humanities and Social Sciences Communications 12(1): 1–18. https://doi.org/10.1057/s41599-025-05567-4

Lynch (1960) The Image of the City. MIT Press.

Enomoto, et al. (2026) Revealing the amenity-perception connection: Integrating social sensing with generative AI. Environment and Planning B: Urban Analytics and City Science 3(1): 32–48. https://doi.org/10.1177/23998083251348746

Mavrogeni, Cheshire (2025) Creating the London night-worker geodemographic classification. Environment and Planning B: Urban Analytics and City Science. Epub ahead of print 26 December 2025. https://doi.org/10.1177/23998083251410392

Molinillo, Anaya-Sánchez, Morrison, et al. (2019) Smart city communication via social media: Analysing residents’ and visitors’ engagement. Cities 94: 247–255. https://doi.org/10.1016/j.cities.2019.06.003

Omidipoor, Jelokhani-Niaraki, Moeinmehr, et al. (2019) A GIS-based decision support system for facilitating participatory urban renewal process. Land Use Policy 88: 104150. https://doi.org/10.1016/j.landusepol.2019.104150

Paul i Agustí (2021) The clustering of city images on Instagram: A comparison between projected and perceived images. Journal of Destination Marketing & Management 20: 100608. https://doi.org/10.1016/j.jdmm.2021.100608

Peng, Bao, Huang (2020) Perceiving Beijing’s “city image” across different groups based on geotagged social media data. IEEE Access 8: 93868–93881. https://doi.org/10.1109/ACCESS.2020.2995066

Sebrek, Fotiadis, Michalk, et al. (2025) From conflict to coexistence: A tourism-oriented ecosystem framework for urban nightlife management. Cities 169: 106604. https://doi.org/10.1016/j.cities.2025.106604

Shen, Ding, Kong, et al. (2024) From physical space to cyberspace: Recessive gender biases in social media mirror the real world. Cities 152: 105149. https://doi.org/10.1016/j.cities.2024.105149

Shi, Bian (2016) Regeneration of historic area with social orientation: Case study on Nanluoguxiang area in Beijing. International Review for Spatial Planning and Sustainable Development 4(1): 91–104. https://doi.org/10.14246/irspsd.4.191

Song, Hsu CHC, Pan, et al. (2025) How covid-19 has changed tourists’ behaviour. Nature Human Behaviour 9(1): 43–52. https://doi.org/10.1038/s41562-024-02037-w

Stylidis, Sit, Biran (2014) An exploratory study of residents’ perception of place image. Journal of Travel Research 55(5): 659–674. https://doi.org/10.1177/0047287514563163

Chen, Zhou, Fan (2023) Exploring city image perception in social media big data through deep learning: A case study of Zhongshan City. Sustainability 15(4): 3311. https://doi.org/10.3390/su15043311

Muskat, et al. (2021) Improving the resident–tourist relationship in urban hotspots. Journal of Sustainable Tourism 29(4): 595–615. https://doi.org/10.1080/09669582.2020.1818087

Wang (2024) Social conflicts and their resolution paths in the commercialized renewal of old urban communities in China under the perspective of public value. Journal of Urban Management 14(2): 402–417. https://doi.org/10.1016/j.jum.2024.11.007

Wang (2024) Fusing multi-source social media data and street view imagery to inform urban space quality: A study of user perceptions at Kampong Glam and Haji Lane. Urban Informatics 3(1): 21. https://doi.org/10.1007/s44212-024-00052-w

Xiong, Zheng, Zhang (2022) Optimization strategy of commercial space in Xianyukou hutong based on kernel density and space syntax. Journal of World Architecture 6(6): 40–48. https://doi.org/10.26689/jwa.v6i6.4510

Yang, Liu (2022) Social media data in urban design and landscape research: A comprehensive literature review. Land 11(10): 1796. https://doi.org/10.3390/land11101796

Peng, Aniche, et al. (2021) Urban renewal as policy innovation in China: From growth stimulation to sustainable development. Public Administration and Development 41(1): 23–33. https://doi.org/10.1002/pad.1903

(2025) Renewal design of public space in old alleys based on the theory of spatial conflict: A case study of cherry oblique street. Lecture Notes in Education Psychology and Public Media 81(1): 119–125. https://doi.org/10.54254/2753-7048/2025.20783

Yubero, Condeço-Melhorado, García-Hernández, et al. (2021) Comparing spatial and content analysis of residents and tourists using geotagged social media data. The historic neighbourhood of Alfama (Lisbon), a case study. Investigaciones Turísticas 22: 95. https://doi.org/10.14198/INTURI2021.22.5

Zhan, Cheng, Zhu (2024) Progress on image analytics: Implications for tourism and hospitality research. Tourism Management 100: 104798. https://doi.org/10.1016/j.tourman.2023.104798

Zhang, Chen, Lin (2020) Mapping destination images and behavioral patterns from user-generated photos: A computer vision approach. Asia Pacific Journal of Tourism Research 25(11): 1199–1214. https://doi.org/10.1080/10941665.2020.1838586

Zhao, Chen, et al. (2025) Unraveling the renewal priority of urban heritage communities via macro-micro dimensional assessment - A case study of Nanjing City, China. Sustainable Cities and Society 124: 106317. https://doi.org/10.1016/j.scs.2025.106317

Zheng, Zhang, Mou, et al. (2024) Selection biases in crowdsourced big data applied to tourism research: An interpretive framework. Tourism Management 102: 104874. https://doi.org/10.1016/j.tourman.2023.104874

Zhou, Sun, et al. (2021) Dynamic and drivers of spatial change in rapid urban renewal within Beijing inner city. Habitat International 111: 102349. https://doi.org/10.1016/j.habitatint.2021.102349

Zhu (2024a) The spatial injustice in tourism-led historic urban area renewal: An analytical framework from stakeholder analysis. Current Issues in Tourism 27(8): 1229–1248. https://doi.org/10.1080/13683500.2023.2203849

Zhu, Cheng, Wang (2024b) Measuring Chinese mobility behaviour during covid-19 using geotagged social media data. Humanities and Social Sciences Communications 1(1): 1–12. https://doi.org/10.1057/s41599-024-03050-0

Zhu, Brenken, Biljecki, et al. (2026) Unveiling the meaning in the image of the city: A novel approach using place reviews and large language models. Environment and Planning B: Urban Analytics and City Science 53(1): 49–67. https://doi.org/10.1177/23998083251369143
