Abstract
Although several large-scale intelligent models have been proposed recently, the diversity and complexity of urban planning tasks mean that no domain-specific large model yet supports the field's varied tasks. To address this gap, this paper presents the Semantic Multimodal Analysis and Retrieval approach for Planning (SMART-Plan), a multimodal large model that supports complex and diverse urban planning tasks such as domain knowledge question answering, multimodal data retrieval, and image-based plan generation. A further key innovation lies in the automated construction of a domain-specific knowledge graph that combines textual and visual data to comprehensively represent urban planning entities and their relationships. By leveraging the constructed knowledge graph and a three-phase domain fine-tuning strategy, the model's performance improves significantly across multiple urban planning tasks, addressing the challenges of fragmented data and specialized terminology in the field. Extensive experiments demonstrate that SMART-Plan significantly outperforms existing models in accuracy, logic, and professionalism, with average improvements of 6.25% in knowledge Q&A and 7.2% in image–text Q&A over state-of-the-art methods.
Keywords
