Abstract
Although several large-scale intelligent models have been proposed recently, the diversity and complexity of urban planning tasks mean that no domain-specific large model yet supports the field's varied tasks. To address this gap, this paper presents the Semantic Multimodal Analysis and Retrieval approach for Planning (SMART-Plan), a multimodal large model that supports complex and diverse urban planning tasks such as domain knowledge question answering, multimodal data retrieval, and image-based plan generation. A further key innovation lies in the automated construction of a domain-specific knowledge graph that combines textual and visual data to comprehensively represent urban planning entities and their relationships. By leveraging the constructed knowledge graph and a three-phase domain fine-tuning strategy, the model's performance improves significantly across multiple urban planning tasks, addressing the challenges of fragmented data and specialized terminology in the field. Extensive experiments demonstrate that SMART-Plan significantly outperforms existing models in accuracy, logic, and professionalism, with average improvements of 6.25% in knowledge Q&A and 7.2% in image–text Q&A over state-of-the-art methods.
Keywords
