Bridging perceived and lived space in urban perception: Multimodal LLM approaches in evaluating urban parks

Abstract

In this study, we develop and test a quantitative comparative framework that leverages Multimodal Large Language Models (MLLMs) to examine and partially bridge the gap between Street View Imagery (SVI) and User-Generated Content (UGC), enabling quantitative and spatially explicit urban perception mining in public parks. Understanding how individuals perceive urban spaces is central to inclusive urban planning. Building on Lefebvre’s spatial triad, our approach frames SVI as perceived space and UGC as lived space, integrating geo-computation and large-scale visual–textual analysis to address the modality gaps inherent to SVI. We compare SVI and UGC and assess whether MLLMs, specifically GPT-4o, can bridge the gaps between them. Empirical analysis shows that SVI captures functional, infrastructure-based attributes, while UGC highlights aesthetic and experiential dimensions of urban form and livability. These findings underscore the complementary nature of both data sources and the limitations of relying solely on SVI-based indices. By leveraging MLLMs to interpret visual and textual data within a unified framework, we demonstrate a scalable method for computational modelling of urban perception that integrates both perceived and lived space, offering new insights for data-driven planning and the optimization of urban environments.

Keywords

urban perception, multimodal large language model, computer vision, street view image, user-generated contents


