Abstract
With the rapid advancement of new technologies, the number of academic papers across fields has grown exponentially, and traditional keyword-based search can no longer capture their semantic content comprehensively. This article presents SETM, a method that combines the text-embedding-ada-002 embedding model, BERTopic and PageRank to perform semantic extraction, topic modelling and ranking on large-scale collections of academic papers. UMAP is used for dimensionality reduction and visualisation, revealing semantic relationships between papers. The system employs a tree-like layout with multi-level interactive features, so that users can explore and retrieve papers at any scale, from a coarse-grained ‘paper galaxy’ view down to fine-grained individual papers. Experimental results demonstrate that the proposed approach achieves high accuracy in semantic mining and clustering, providing an effective tool for dynamic visualisation and analysis of large-scale academic papers.
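The ranking step can be illustrated with a minimal sketch. The abstract does not state how the graph for PageRank is built, so the version below assumes, purely for illustration, that papers are linked when the cosine similarity of their embeddings exceeds a threshold. The function names (`cosine`, `similarity_graph`, `pagerank`), the threshold and the toy two-dimensional vectors are all hypothetical; real text-embedding-ada-002 vectors have 1536 dimensions, and the published system may construct its graph differently.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_graph(embeddings, threshold=0.5):
    """Symmetric weight matrix; an edge exists where similarity >= threshold.

    Assumption: a thresholded similarity graph, not necessarily the
    graph used by the article.
    """
    n = len(embeddings)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine(embeddings[i], embeddings[j])
            if s >= threshold:
                weights[i][j] = weights[j][i] = s
    return weights


def pagerank(weights, damping=0.85, iterations=100, tol=1e-10):
    """Weighted PageRank by power iteration.

    Rank held by nodes without edges (dangling nodes) is spread
    uniformly over all nodes, so the scores always sum to 1.
    """
    n = len(weights)
    out_sum = [sum(row) for row in weights]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        dangling = sum(rank[i] for i in range(n) if out_sum[i] == 0.0)
        new_rank = []
        for j in range(n):
            incoming = sum(
                rank[i] * weights[i][j] / out_sum[i]
                for i in range(n)
                if out_sum[i] > 0.0
            )
            new_rank.append((1 - damping) / n + damping * (incoming + dangling / n))
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy stand-ins for paper embeddings: three related papers and one outlier.
papers = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.0, 1.0]]
scores = pagerank(similarity_graph(papers))
```

In this toy input the fourth paper shares no edge with the others, so it receives the lowest score, while the three mutually similar papers divide most of the rank between them.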
