Abstract
Urban perception is fundamental to understanding the built environment and has been increasingly observed through social sensing, yet most studies overlook differences between population groups. This limitation becomes especially consequential in urban regeneration contexts, where tourists and residents often experience and represent space differently. This study proposes a group-sensitive multimodal framework to compare how tourists and residents express urban environments across text, image, and emotion. Using 14,300 geo-tagged posts and 78,632 images from Beijing, we quantify narrative divergence and analyze its relationship with built-environment (BE) factors. Results show a clear modality difference: visual divergence remains low (Mean JSDimage = 0.140), textual narratives differ greatly (Mean JSDtext = 0.435), and emotional divergence is moderate (Mean ED = 0.146). These differences are associated with BE factors including functional mix, spatial visibility, and amenity context. In urban regeneration contexts, these divergences help identify tensions and offer insights for planning strategies and design decisions. More broadly, the study reveals that conventional aggregated social sensing may produce a filter effect by amplifying dominant narratives while neglecting everyday experiences of different user groups. It highlights the need for differentiated user perspectives in multimodal social sensing to support more inclusive urban analysis and planning.
Introduction
Urban perception, that is, how people sense and understand the city, is an important dimension for studying the urban environment. To capture this dimension at scale, social sensing offers an observational lens through the digital traces of human behavior (Liu et al., 2015). Traces such as mobility records, search behaviors, and social media content provide a bottom-up view of how urban space is experienced and represented (Shen et al., 2024). Among them, social media data have received growing attention for their real-time, multimodal, and user-generated character (Chen et al., 2023; Chuang et al., 2023). They reflect how people actively narrate and feel about places, and have become an important information source for research on urban perception (Chuang et al., 2022; Lang et al., 2022). Recent studies have used these data to identify urban hotspots, assess place attachment, and model collective perceptions across spatial contexts (Fan and Zhang, 2023; Molinillo et al., 2019).
Despite these advances, most social sensing research continues to regard urban populations as homogeneous. Recent work has started to address this limitation by distinguishing between population groups in urban analysis (Stylidis et al., 2014; Yang and Liu, 2022), highlighting that urban perception is not uniform across groups. Among these differentiated views, a particularly salient contrast lies between tourists and residents. They move through the city at different paces, yet often share the same historic and mixed-use spaces (Fan and Zhang, 2023). This issue becomes especially significant in urban regeneration contexts, where tourism-oriented development is frequently used to enhance cultural visibility and economic vitality (Chen and Chen, 2025; Li, 2025). Such processes can blur the boundary between everyday life and visitor-oriented consumption, making tensions in spatial use, perception, and representation more visible (Zhao et al., 2025). However, these differences in perception have rarely been examined in a systematic and multimodal way, particularly in relation to urban regeneration and the underlying spatial mechanisms.
To fill this gap, we explore tourist–resident divergence in multimodal social media expressions of the city. We focus on divergences in terms of text, image, and emotion, which reveal underlying spatial tensions in the context of urban regeneration. The central research question is: How can we use multimodal social media expressions to identify narrative divergence, and how can the findings be translated into urban regeneration strategies?
The remainder of this paper is structured as follows. First, we build an analytical framework that integrates textual, visual, and emotional expressions from social media to capture differentiated perceptions of the city. Using geo-tagged posts and images, we then quantify narrative divergence across three modalities and examine its spatial distribution. Next, we analyze how these divergences relate to BE factors. Finally, we discuss how multimodal divergence between tourists and residents can inform urban regeneration strategies and outline future research directions.
Literature review
Social sensing for spatial analysis
Compared with traditional methods such as surveys or mapping (Chen et al., 2022; Huang et al., 2021), social sensing serves as a real-time and multimodal approach for understanding urban perception. Social media platforms such as Instagram, Weibo, and Xiaohongshu host large volumes of text and images that reflect how users interpret urban environments (Fan and Zhang, 2023; Su et al., 2023; Zhang et al., 2020). Previous studies have used text mining, image recognition, and topic modeling to extract different dimensions of city image (Bai et al., 2024; Zhan et al., 2024).
Among social sensing sources, visual platforms are particularly important. Boy and Uitermark (2017) showed that Instagram content shapes the stories people tell about cities, not only highlighting famous attractions but also revealing lesser-known spaces. Other work found that tourist attention on social media tends to cluster at specific spots, which reinforces digital inequality (Indaco and Manovich, 2016; Paul i Agustí, 2021). In China, Xiaohongshu has played a significant role in understanding visual perception (Bai et al., 2024; Jin et al., 2025), and Weibo is used to track public sentiment in real time (Zhu et al., 2024a, 2024b). More recent studies also combine text and image for deeper analysis (Huang et al., 2024; Kang et al., 2021). Wang (2024) and Hou et al. (2024) integrated street view photos with social media stories, offering a fuller picture of user perception. Other researchers have examined how these public narratives change over time, which has been especially relevant during major disruptions such as the COVID-19 pandemic (Kim and Yang, 2026; Song et al., 2025).
Overall, existing social sensing research has shown strong potential for capturing urban perception at scale. However, much of this work still relies on aggregated representations of users, paying insufficient attention to how different population groups may perceive and narrate the same urban space in different ways.
Divergent expression on social media
When using social media data to study how different groups perceive cities, research conclusions largely depend on how locals and non-locals are distinguished in the data. Chuang et al. linked geo-tagged social media posts to users’ home areas and classified urban spaces based on how many visitors they attract and how diverse those visitors are. Their results show that separating regular users from short-term visitors can reveal differences that are hidden when all users are analyzed together (Chuang et al., 2023). Other studies move beyond simple residence-based labels and use mobile phone location data to capture non-residential presence and how exposure to urban space changes over time (Hanaoka, 2018). More recent work has also treated specific groups, such as night-shift workers, as a distinct urban population, and mapped where they work, move, and spend time using spatial classification methods (Mavrogeni and Cheshire, 2025).
Previous research has confirmed that tourists and residents experience urban spaces differently. Residents pay more attention to daily life and local culture, while tourists are usually drawn to famous landmarks (Aranburu et al., 2016; Stylidis et al., 2017). This divergence has been confirmed via both traditional and digital methods (Molinillo et al., 2019; Vu et al., 2021). In recent years, social media data have made it easier to examine these differences in detail (Derdouri and Osaragi, 2021; Yubero et al., 2021). Peng et al. analyzed Weibo posts in Beijing and found tourists mostly talk about well-known landmarks, whereas residents mention a wider range of everyday places (Peng et al., 2020). Cross-cultural research further demonstrates how cultural background shapes digital representation. For example, research in Tokyo found that tourists from Western and Asian countries pay attention to different locations and discuss different topics (Kang et al., 2021).
Recent work has started to unpack the implications of tourist–resident divergences. Research has shown that the two groups have different consumption habits (Cerdan Schwitzguebel and Romero Bartomeus, 2019) and that visually attractive places tend to receive much more attention from tourists than from residents (Fan and Zhang, 2023). However, few works explore spaces where resident and tourist experiences overlap or conflict (Gomez et al., 2018; Luo et al., 2025). This gap is becoming more consequential with the growth of local-experience tourism, in which tourists want to join everyday urban life. Instead of visiting famous landmarks, they often spend time in ordinary places such as alleyways, street markets, or neighborhood parks (Zheng et al., 2024). For instance, hybrid urban areas such as Beijing hutong or Shanghai lilong act as both tourist spots and local neighborhoods. In these spaces, different lifestyles constantly encounter each other. What people share on social media highlights the gap or the connection between daily life and curated experience (Kontokosta et al., 2024). However, existing research still provides limited understanding of how such overlap spaces generate different textual, visual, and emotional narratives, and how these divergences relate to the spatial conditions of urban regeneration.
Urban regeneration of hybrid spaces
Historic and mixed-use neighborhoods are hybrid spaces where residents and tourists coexist (Zhao et al., 2025). Previous research has indicated that the two groups frequently attach different narratives to the same physical space. For tourists, hutongs may be seen as cultural attractions or visually attractive destinations, while residents often regard them as crowded and as losing their neighborhood character (Yu, 2025). Some research also shows the strong influence of tourism on local businesses, spatial conflicts, and meanings of place (Sebrek et al., 2025; Wang and Li, 2024). In response to these problems, recent research has called for more inclusive and context-aware governance approaches. These approaches combine public participation, small-scale spatial adjustments, and data-based tools to better connect planning decisions with people’s day-to-day lives (Li et al., 2024; Lin et al., 2025; Omidipoor et al., 2019). In China, this has supported a move toward community micro-regeneration strategies that respond to specific user needs (Li, 2025; Zhao et al., 2025). Case studies of neighborhood regeneration show that tools such as interface design and time-based flexible management help to reduce conflicts (Xiong et al., 2022).
However, tourist–resident relations in hybrid spaces remain underexplored. While many studies point out differences between the two groups, few examine how their everyday activities and expressions interact and shape each other (Liu et al., 2019). In historic areas such as Nanluoguxiang and Shichahai in Beijing, daily encounters between the two groups strongly influence the use and understanding of space (Shi and Bian, 2016). Research on the quality of interaction suggests that regeneration outcomes cannot be explained without these mutual perceptions (Ye et al., 2021; Zhou et al., 2021). This indicates that attention should be paid not only to difference itself, but also to how multimodal tourist–resident divergence emerges within shared urban settings. Such an understanding is essential for identifying the sources of tension in hybrid spaces and for developing regeneration strategies that respond to the differentiated experiences of the same urban environment.
Methodology
Analytical framework
This study constructs a two-stage analytical framework (Figure 1). First, we assess how tourists and residents express urban space differently across modalities. We use Jensen–Shannon Divergence (JSD) for text and image themes, and emotion difference (ED) for sentiment variation. Second, we examine how divergence patterns relate to built environment (BE) factors. Furthermore, we consider how such findings can be translated into urban regeneration strategies that are sensitive to group-specific perceptions.

Figure 1. Analytical framework.
These two stages correspond to the following hypotheses:
Study area and data pre-processing
This study focuses on the area within the Fifth Ring Road of Beijing, which includes a large number of historical and cultural landmarks, commercial zones, and neighborhoods. Our dataset consists of social media data and urban spatial data. The social media dataset covers summer 2025 (June to September), a period with substantial temporal and spatial overlap between tourist activities and residents’ daily routines, which makes it easier to examine their interactions and behavioral differences. Each record includes user ID, check-in location, posting time, and content (text and image). A total of 18,265 geo-tagged posts were initially collected from Weibo and Xiaohongshu. To ensure data reliability, we removed posts generated by automated or suspicious accounts, leaving a dataset of 14,300 posts. Since Xiaohongshu lacks explicit location tagging, we built a reference list from Weibo check-in points and obtained location information for Xiaohongshu posts by searching for and matching these place names in the post text.
To classify tourists and residents, we first constructed a manually labeled dataset of 500 posts, based on explicit annotation rules and multi-stage human review. We then fine-tuned a BERT-based model and applied it to the full dataset. The model achieved an accuracy of 84%, which was further validated through comparison with other LLMs such as Qwen2.5 and GPT-5. To reduce the potential risk of circular reasoning and misclassification caused by performative expression, we further conducted robustness checks under alternative confidence thresholds and introduced an auxiliary validation based on user-level Beijing posting patterns. Detailed annotation rules, model performance, and validation results are reported in Supplementary Material A. The final dataset has 7,400 tourist and 6,900 resident check-ins.
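As an illustration of what a robustness check under alternative confidence thresholds can look like, the following minimal sketch reports accuracy and coverage on a labeled subset when only predictions above a given confidence are kept. The function name, the record layout, and the sample values are hypothetical; the procedure actually used is the one reported in Supplementary Material A.

```python
def accuracy_at_threshold(records, threshold):
    """Accuracy and coverage when only confident predictions are kept.

    records: list of (true_label, predicted_label, confidence) triples
             (a hypothetical layout, not the study's data format).
    Returns (accuracy, coverage); accuracy is None if nothing is kept.
    """
    kept = [(true, pred) for true, pred, conf in records if conf >= threshold]
    if not kept:
        return None, 0.0
    accuracy = sum(true == pred for true, pred in kept) / len(kept)
    coverage = len(kept) / len(records)
    return accuracy, coverage


# Made-up records for illustration only.
sample = [
    ("tourist", "tourist", 0.95),
    ("resident", "tourist", 0.55),
    ("resident", "resident", 0.80),
    ("tourist", "resident", 0.60),
]

for t in (0.5, 0.7, 0.9):
    print(t, accuracy_at_threshold(sample, t))
```

Raising the threshold trades coverage for accuracy, so stable divergence estimates across thresholds would indicate that results are not driven by low-confidence labels.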
The urban spatial dataset includes area of interest (AOI) layers, point of interest (POI) layers, and transportation infrastructure such as subway stations and road networks. To ensure consistency with our social media dataset, we removed AOIs for purely functional or low-perceptual environments such as schools, hospitals, and administrative facilities. The remaining AOIs represent urban spaces that are more likely to appear in users’ posts, including places for leisure, consumption, cultural activities, and everyday social interaction.
Data analysis
Identifying hybrid AOIs
We spatially joined all posts with Beijing’s AOI polygons in ArcGIS Pro. AOIs with fewer than 50 posts were excluded for data reliability. For each remaining AOI, we calculated the tourist–resident ratio to represent the presence of each group:
\[ \mathrm{tr\_ratio} = \frac{N_{tourist}}{N_{resident}} \]
where Ntourist and Nresident are the number of posts identified as originating from tourists and residents, respectively. Extreme values (top and bottom 10%) were removed.
We defined hybrid AOIs as those with a tr ratio between 0.3 and 3.0. This threshold was determined by analyzing the frequency distribution of the tr ratio and excludes cases where one group dominates. The range was chosen so that the minority group accounts for at least approximately 25% of the posts in an AOI, ensuring sufficient visibility for potential tourist–resident interaction.
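The selection rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the tr ratio is Ntourist divided by Nresident and that the 0.3 to 3.0 range is inclusive; the function names and sample counts are hypothetical, and the trimming of the top and bottom 10% of ratios is omitted for brevity.

```python
from collections import Counter

MIN_POSTS = 50               # AOIs with fewer posts are excluded (from the text)
TR_LOW, TR_HIGH = 0.3, 3.0   # hybrid range of the tr ratio (from the text)


def tr_ratio(n_tourist, n_resident):
    """Tourist-resident ratio, assumed here to be N_tourist / N_resident."""
    if n_resident == 0:
        return float("inf")
    return n_tourist / n_resident


def minority_share(ratio):
    """Share of an AOI's posts contributed by the smaller group."""
    tourist_share = ratio / (1.0 + ratio)
    return min(tourist_share, 1.0 - tourist_share)


def hybrid_aois(posts):
    """Return {aoi_id: tr_ratio} for AOIs that pass both filters.

    posts: iterable of (aoi_id, group) pairs, group in {"tourist", "resident"}.
    """
    counts = Counter(posts)
    result = {}
    for aoi in sorted({aoi_id for aoi_id, _ in counts}):
        n_t = counts[(aoi, "tourist")]
        n_r = counts[(aoi, "resident")]
        if n_t + n_r < MIN_POSTS:
            continue  # too few posts for a reliable ratio
        ratio = tr_ratio(n_t, n_r)
        if TR_LOW <= ratio <= TR_HIGH:  # inclusive bounds are an assumption
            result[aoi] = ratio
    return result


# Made-up counts: A is balanced, B is tourist-dominated, C is too sparse.
posts = (
    [("A", "tourist")] * 30 + [("A", "resident")] * 30
    + [("B", "tourist")] * 90 + [("B", "resident")] * 10
    + [("C", "tourist")] * 5 + [("C", "resident")] * 5
)
print(hybrid_aois(posts))
print(minority_share(0.3), minority_share(3.0))
```

Under this reading of the ratio, the minority share is about 23% at the lower bound and 25% at the upper bound, which matches the stated target of roughly a quarter of posts.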
Measuring narrative divergence
We extracted textual, visual, and emotional features from the multimodal data. Textual theme extraction was conducted via ChatGPT-5 to classify thematic keywords (Ma et al., 2026). Based on a structured prompt, the top three ranked keywords were extracted to summarize the dominant meaning of each post. Following a pre-scan, 36 high-frequency keywords were identified, and the highest-ranked keyword for each post was further aggregated into six broader theme categories according to existing frameworks: Art & Culture, Citywalk & Experience, Food & Lifestyle, History & Heritage, Nature & Leisure, and Social & Participation (Table S2). The extraction quality was manually reviewed on a random subsample of 100 posts, and the keyword-to-theme mapping was independently checked by two researchers to improve coding consistency (see Supplementary Material B).
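The aggregation step, from one top-ranked keyword per post to a six-category theme distribution for a group, can be sketched as follows. The six theme names come from the text; the keyword entries are hypothetical placeholders, since the actual 36-keyword mapping is given in Table S2.

```python
from collections import Counter

THEMES = [
    "Art & Culture",
    "Citywalk & Experience",
    "Food & Lifestyle",
    "History & Heritage",
    "Nature & Leisure",
    "Social & Participation",
]

# Hypothetical keyword-to-theme entries for illustration only;
# the mapping actually used is the one in Table S2.
KEYWORD_TO_THEME = {
    "exhibition": "Art & Culture",
    "citywalk": "Citywalk & Experience",
    "coffee": "Food & Lifestyle",
    "hutong": "History & Heritage",
    "park": "Nature & Leisure",
    "market": "Social & Participation",
}


def theme_distribution(top_keywords):
    """Map each post's top keyword to a theme and return theme proportions.

    top_keywords: one keyword per post for a single group within an AOI.
    Keywords missing from the mapping are skipped.
    """
    counts = Counter(
        KEYWORD_TO_THEME[k] for k in top_keywords if k in KEYWORD_TO_THEME
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(THEMES)
    return [counts[theme] / total for theme in THEMES]


dist = theme_distribution(["coffee", "coffee", "park", "unmapped-keyword"])
print(dict(zip(THEMES, dist)))
```

Computing one such vector for tourists and one for residents in each AOI yields the pair of distributions that a divergence measure then compares.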
Image theme extraction was conducted using the Places365 scene recognition model (Zhu et al., 2026). We mapped the outputs onto six visual thematic categories aligned with the textual framework (Table S3). However, the two modalities operate at different ontological levels: textual themes reflect behavioral and affective intent, whereas Places365 captures physical scene types. The image-to-theme mapping should therefore be understood as an interpretive aggregation, and cross-modal JSD comparisons are used here only to indicate relative divergence patterns rather than strict semantic equivalence.
For sentiment classification, the emotional orientation of each post (positive, neutral, or negative) was determined using ChatGPT-5. In a manual audit of a random subsample of 200 posts, the classification reached 92% accuracy, supporting the reliability of LLM-based sentiment classification.
Based on these extracted features, we analyzed narrative divergence at AOI level using two metrics: JSD and ED. The JSD value was calculated to quantify differences in thematic distributions of text and image between the two groups using the following formula:
\[ \mathrm{JSD}(P \parallel Q) = \frac{1}{2} D_{KL}(P \parallel M) + \frac{1}{2} D_{KL}(Q \parallel M), \qquad M = \frac{1}{2}(P + Q) \]
where P and Q are the probability distributions of tourists and residents over the themes, M is the average distribution of P and Q, and DKL is the Kullback–Leibler divergence.
Emotional features were coded numerically (1, 0, −1), and ED was calculated as the difference in mean scores:
Together, these metrics provide a multimodal quantification of narrative divergence that supports Hypothesis 1.
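Both metrics can be computed with the standard library alone. The sketch below follows the definitions given in the text, with two labeled assumptions: the logarithm base is taken as 2, which bounds JSD between 0 and 1, and ED is taken as the absolute difference between group means. Neither choice is confirmed by the text, and the sample distributions and labels are made up.

```python
import math


def kl_divergence(p, q, base=2.0):
    """Kullback-Leibler divergence D_KL(p || q); zero-probability terms are skipped."""
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q, base=2.0):
    """Jensen-Shannon divergence between two theme distributions.

    The log base is an assumption; base 2 keeps the value within [0, 1].
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m, base) + 0.5 * kl_divergence(q, m, base)


# Numeric coding of sentiment labels as described in the text.
SCORE = {"positive": 1, "neutral": 0, "negative": -1}


def emotion_difference(tourist_labels, resident_labels):
    """Difference in mean sentiment scores between the two groups.

    Taking the absolute value is an assumption; the text only states
    that ED is a difference in mean scores.
    """
    mean_t = sum(SCORE[x] for x in tourist_labels) / len(tourist_labels)
    mean_r = sum(SCORE[x] for x in resident_labels) / len(resident_labels)
    return abs(mean_t - mean_r)


# Made-up six-theme distributions for one AOI.
tourists = [0.30, 0.35, 0.10, 0.15, 0.05, 0.05]
residents = [0.10, 0.10, 0.35, 0.05, 0.10, 0.30]
print(round(jsd(tourists, residents), 3))

ed = emotion_difference(
    ["positive", "positive", "neutral", "negative"],
    ["positive", "neutral", "negative", "negative"],
)
print(ed)
```

Because M is positive wherever P or Q is positive, the divergence stays finite even when one group never mentions a theme, which is useful for AOIs with sparse posts.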
Modeling BE factors of divergence
OLS regression was employed to assess the relationships between BE factors and three forms of narrative divergence (JSDtext, JSDimage, and ED). Independent variables were organized into four BE dimensions: urban function structure, amenities, spatial visibility, and accessibility. Control variables include platform activity and the densities of residential and life-service POIs (all variables are listed in Table S4).
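For readers less familiar with OLS, the estimation step can be sketched without external libraries by solving the normal equations. The predictor names and values below are hypothetical and noise-free; the full variable set is in Table S4, and standard errors and p-values, which a statistics package would report, are omitted here.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular or collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ols(features, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    features: one row of predictor values per AOI, without the intercept.
    Returns [intercept, b1, b2, ...].
    """
    x = [[1.0] + list(row) for row in features]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical predictors per AOI: (functional_mix, tourist_spot_density).
features = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
# Synthetic outcome generated exactly from known coefficients.
y = [0.1 + 0.5 * mix - 0.2 * dens for mix, dens in features]
coef = ols(features, y)
print([round(c, 6) for c in coef])
```

In this setup one such model would be fitted per outcome (JSDtext, JSDimage, and ED), each with the same predictors and controls.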
Results and findings
Distribution of hybrid AOIs
Hybrid AOIs are concentrated in specific functional environments. As visualized in Figure 2, a total of 79 hybrid AOIs are identified. Most of them cluster in Beijing’s inner city, particularly within the Third Ring Road. In terms of function, commercial areas appear the most, followed by tourist scenic spots, residential areas, and urban parks. This spatial distribution sets the context for our following analysis.

Figure 2. Distribution of hybrid AOIs in Beijing. The bar heights represent tr_ratio. Selected AOIs are color-coded according to their original five primary categories.
Structured narrative divergence across text, image, and emotion
Within hybrid AOIs, there is a structured narrative divergence across modalities. While visual divergence remains relatively low, textual narratives differ greatly, revealing different ways of perceiving the city. Textual narratives show the highest level of divergence (Mean JSDtext = 0.435). Results indicate that tourists see the city as aesthetic and performative, focusing more on symbolic themes such as Citywalk & Experience and Art & Culture. In contrast, residents emphasize everyday and socially embedded activities such as Food & Lifestyle and Social & Participation. Visual divergence is the lowest of all the metrics (Mean JSDimage = 0.140). Although tourists post slightly more History & Heritage images and residents post more Social & Participation images, overall visual themes show greater overlap between groups. This suggests that the visual dominance of the physical scene may, in some cases, outweigh the functional differences captured in text. However, the relatively low visual divergence should be interpreted cautiously, as it may partly reflect the scene-based ontology of Places365 and its lower sensitivity to experiential differences compared with text-based classification. The overall emotional divergence is moderate (Mean ED = 0.146), and 64% of the posts are positive. Residents show a 32% higher proportion of negative emotions than tourists, as they more often express fatigue and spatial disruption caused by over-tourism.
In addition, divergences vary between functions (Table S5). For instance, residential areas show the highest JSDtext and JSDimage, as tourists view these neighborhoods as cultural scenery while residents view them as functional places. Parks show the highest ED, as residents tend to express more positive emotions including joy and relaxation, while tourists sometimes express disappointment or frustration. Figure 3 provides an integrated visualization of these contrasts, showing how textual, visual, and emotional layers interact. These findings point to underlying BE factors, which we formally examine in the next part.

Figure 3. Multimodal narrative patterns and divergence across hybrid AOIs. The figure includes tourist, resident, and divergence layers. Representative AOIs are annotated with thematic or emotional icons, with radar graphs illustrating differences between groups.
BE factors related to narrative divergence
The regression results (Table 1) show that BE factors are associated with narrative divergence through urban functional structure, spatial visibility, and specific amenities.
Table 1. Regression results.
*p < 0.1, **p < 0.05, ***p < 0.01.
In terms of urban function structure, functional mix increases both textual and emotional divergence, raising JSDtext (b = 0.087, p < 0.05) and ED (b = 0.145, p < 0.05). This may be because mixed-use areas bring together tourist-oriented activities and residential services, producing competing themes and emotional responses. The influence of functional density varies across modalities: it decreases JSDtext (b = −0.001, p < 0.05) while increasing JSDimage (b = 0.001, p < 0.01). A possible explanation is that functional density reflects the number of functions rather than their diversity. Similar functions lead to similar textual themes, but the complexity of the urban environment still produces visual divergence.
Spatial visibility, measured by the density of tourist spots, aligns textual themes but differentiates emotions. It decreases JSDtext (b = −0.001, p < 0.05), as tourists and residents describe well-known spaces and landmarks with similar high-frequency themes. However, ED increases with spatial visibility (b = 0.002, p < 0.01), as tourists express excitement while residents associate these spaces with congestion and noise.
Finally, specific amenities and platform mechanisms shape divergence in modality-specific ways. Shopping malls increase JSDtext (b = 0.019, p < 0.05). Tourists tend to use keywords related to pop-up exhibitions or trendy check-in spots, whereas residents discuss practical details such as parking availability, supermarket discounts, or family dining. The number of parks increases JSDimage (b = 0.019, p < 0.05). Tourists focus chiefly on diverse scenery, while residents mostly take casual photos of everyday settings and activities such as square dancing or jogging paths. Platform activity (Rednote_N) further intensifies ED (b = 0.011, p < 0.05). This may suggest that the platform’s recommendation algorithm rewards intense feelings, encouraging users to express extreme excitement or strong complaints to gain more visibility.
Discussions
Re-imaging cities in the social media age
Classic urban image theories have long examined how people perceive and interpret the city through cognitive mapping, everyday routines, and symbolic meaning. Lynch’s framework of the “Image of the city” (Lynch, 1960) remains central in this tradition. It describes how residents form mental maps structured by paths, edges, districts, nodes, and landmarks, and how these images are built through long-term familiarity. However, this view reflects a context in which spatial perception was primarily internal and residents were the main interpreters of urban form. In today’s digital environment, spatial experience is increasingly expressed through posts, images, and emotions (Huang et al., 2021), creating a public and continuously accumulating layer of representation (Zhu et al., 2026). Tourists and residents also participate jointly in this process, and their different expectations (Zheng et al., 2024) and uses of space produce multiple narratives. The contemporary image of the city is therefore plural and multimodal. Figure 4 summarizes this conceptual shift by comparing Lynch’s single-layer mental map with the multi-layer narrative structure observed in tourist–resident social media expressions.
Our findings confirm that Lynch’s elements continue to shape spatial discourse, but they now operate through differentiated group dynamics rather than a unified public perception. Landmarks act as shared anchors: people talk about the same famous sites, as visibility theory suggests (Boy and Uitermark, 2017). Yet these places also generate strong emotional conflicts, showing that shared topics and themes do not necessarily imply shared experiences. In addition, functional nodes and mixed-use districts reveal divergent perceptions rather than coherent district images. For instance, functional mix amplifies textual divergence because tourists and residents have different needs, and nodes such as parks and shopping malls increase textual and visual divergence.

Figure 4. From a single-layer mental map to multi-layer digital expression of space.
Taken together, these patterns indicate that although spatial elements continue to shape how we see and use the city, they do not exclusively determine urban image (Filomena et al., 2019). Online representations are also influenced by group-specific uses and expressive modalities. Analyzing social media data without distinguishing between groups may create representational bias and produce a filter effect that distorts how planners understand urban perception. This suggests that social sensing should not be treated as a neutral aggregate reflection of urban perception, but as a layered field of representation shaped by different user groups and modalities of expression.
Urban regeneration strategy through narrative divergence
Urban regeneration efforts in historic districts frequently expose deep narrative tensions between tourists and residents (Li, 2025). These areas include spaces of mixed functions and multi-subject negotiation (Sebrek et al., 2025), where everyday routines and tourist expectations intersect (Yu, 2025). However, existing regeneration practices face the challenge of identifying where specific tensions appear (Zhao et al., 2025).
Our findings demonstrate that multimodal divergences are not random but follow a structured pattern aligned with specific BE factors, thus allowing us to translate social sensing data into a framework for informing urban regeneration strategies (Figure 5). Multimodal divergence can serve as a diagnostic tool for identifying different forms of tourist–resident tension and linking them to particular spatial elements. In this way, the regeneration of historic districts can be transformed into a process that responds to both physical conditions and the differentiated perceptions of tourists and residents.

Figure 5. Urban regeneration of historic areas through social media. Using three representative AOIs in Beijing, the figure illustrates how different narrative divergences inform urban regeneration strategies for mixed-use nodes, iconic landmarks, and symbolic districts.
Mixed-use nodes often show functional tension with sharp textual contrasts, so managing this tension is a priority. Regeneration strategies should avoid uniform touristification that displaces local functions (Chen and Chen, 2025). Instead, planners need to focus more on spatial interface management, such as creating buffer zones and separating visitor flows from resident paths (Li, 2025). In iconic landmarks, visual divergence is the primary source of conflict, making it crucial to balance representational tension in these highly visible areas. Tourists tend to take standardized photos of iconic scenes due to platform trends, whereas residents’ perspectives on social activities or neighborhood life are rarely promoted by platform algorithms (Zhu et al., 2024a, 2024b). To correct this, planners can set up different sightseeing routes, install signs that share resident stories, and create new viewing points that encourage more varied photo-taking positions (Li et al., 2024). Symbolic districts face mainly experiential tension, as their large scale and high tourist density lead to emotional overload for residents. To address this, urban regeneration strategies should reduce emotional pressure while maintaining urban vitality (Ye et al., 2021). Planners could establish quiet zones adjacent to residential areas (Yu, 2025), regulate opening hours of bars and entertainment spaces (Sebrek et al., 2025), and install buffers such as pocket parks to reduce noise (Shi and Bian, 2016).
Contributions, limitations, and future directions
This study moves beyond aggregate social sensing to examine the narrative divergence of tourists and residents. By proposing a group-sensitive multimodal framework, it shows that city image is a layered perception of space in which textual, visual, and emotional dimensions diverge across user groups. Theoretically, the study revisits classic city image theory in the social media age, showing how Lynch’s spatial elements continue to structure urban discourse while giving rise to multi-layered digital expressions. Methodologically, it shows that analyzing tourists and residents together can hide important differences between groups and lead to a biased understanding of urban perception. Practically, it illustrates the implications of multimodal tourist–resident divergence for urban regeneration, showing how group-differentiated perceptions can inform more context-sensitive planning and governance in hybrid urban environments.
However, several limitations remain. First, social media data contain inherent representational biases shaped by user demographics and platform algorithms (Boy and Uitermark, 2017). Groups that rarely use social media, such as older people, are underrepresented on digital platforms, and their voices remain unheard in this kind of analysis. Future work could combine social media data with qualitative methods such as interviews to obtain a more holistic understanding of diverse voices. Second, although our fine-tuned BERT model performs well in classifying users, it relies on semantic features in user posts, which can sometimes be performative (Cheng et al., 2025). Residents may intentionally craft their posts in a way that resembles tourist content to gain visibility, potentially leading to misclassification. For this reason, the tourist–resident distinction in this study should be understood as a social-sensing-based grouping supported by both semantic and behavioral evidence. Future research should further strengthen this distinction by incorporating longer-term posting trajectories or other independent mobility-related evidence where available. Third, our data were collected during the summer and focus on the area within Beijing’s Fifth Ring Road, meaning that seasonal variations and other urban contexts are not yet considered. Accordingly, the findings are most applicable to large metropolitan contexts characterized by high tourism intensity, strong functional mix, and active social-media-based place expression, where tourists and residents are spatially intertwined in historic and mixed-use areas. Their applicability to smaller cities, places with weaker tourism economies, or contexts with less tourist–resident overlap should therefore be interpreted more cautiously.
Future research could extend the analysis across both time and space, since longitudinal studies across seasons and years could reveal dynamic evolutions in tourist–resident relationships, and comparative studies across different cities would validate the stability of the identified spatial patterns.
Conclusion
This study develops a group-sensitive multimodal framework to examine how tourists and residents express urban space on social media. The results show that urban narratives diverge across text, image, and emotion, and that these differences are systematically related to BE factors. More importantly, the study indicates that conventional aggregated social sensing may hide important differences between groups and lead to a biased understanding of urban perception. By revealing these hidden divergences, the study highlights the importance of differentiated user perspectives for understanding hybrid urban environments and informing more context-sensitive urban regeneration practice.
Supplemental Material
Supplemental material, sj-docx-1-tus-10.1177_27541231261445467 for Tourist-resident divergence in multimodal social sensing and its implications for urban regeneration by Yating Wang, Shiqi Dai, Jianshi Li and Yuan Lai in Transactions in Urban Data, Science, and Technology
Footnotes
Acknowledgements
This research is sponsored by the National Natural Science Foundation of China (#72274101).
Ethical considerations
No ethics approval was required for this research as it relies solely on secondary data.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the National Natural Science Foundation of China (#72274101).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Supplemental material
Supplemental material for this article is available online.
Author biographies
References
