Abstract
Retrieving and analyzing vulnerability reports remains a critical challenge in cybersecurity, exacerbated by the exponential growth in disclosed vulnerabilities and the increasing complexity of Proof-of-Concept (PoC) reports. Given the sheer volume of these reports, automated tools and models are urgently needed to help security professionals understand and analyze vulnerability and PoC reports, filter out similar vulnerabilities, and extract critical vulnerability attributes. Existing methods for measuring vulnerability semantic similarity rely either on rule-based keyword matching, which fails to capture contextual nuances, or on generic pre-trained language models; both lead to suboptimal retrieval performance. To address these gaps, this work applies semantic representation learning to the similarity analysis of vulnerability descriptions. To fully exploit the semantic information embedded in vulnerability reports, we propose a Sentence Transformer-based model, fine-tuned specifically for this task, that computes vulnerability semantic similarity. Our approach incorporates domain-specific knowledge of vulnerability reports into model training, enabling the model to capture nuanced semantic relationships unique to cybersecurity. Building on this model, we construct an end-to-end vulnerability retrieval system that combines the fine-tuned similarity model with Elasticsearch vector indexing to enable intelligent retrieval of both vulnerability and PoC reports. Experimental results demonstrate that the proposed model captures domain knowledge more effectively and that the enriched semantic information significantly improves the effectiveness of vulnerability report retrieval.
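The retrieval step described above can be pictured as nearest-neighbour search over report embeddings. The sketch below is a minimal, illustrative stand-in, not the authors' system: it assumes each report has already been encoded into a vector by a fine-tuned encoder, uses cosine similarity as the scoring function (an assumption; the abstract does not name the metric), and replaces the Elasticsearch vector index with a brute-force scan. The report IDs, the helper names `cosine_similarity` and `rank_reports`, and all vector values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_reports(
    query_vec: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Brute-force search: score every indexed report against the query, return the top_k."""
    scored = [(report_id, cosine_similarity(query_vec, vec)) for report_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical 4-dimensional embeddings; a real sentence encoder emits hundreds of dimensions.
    index = {
        "REPORT-A": [0.9, 0.1, 0.0, 0.2],
        "REPORT-B": [0.1, 0.8, 0.3, 0.0],
        "REPORT-C": [0.8, 0.2, 0.1, 0.3],
    }
    query = [1.0, 0.1, 0.0, 0.25]  # hypothetical embedding of a new report description
    for report_id, score in rank_reports(query, index, top_k=2):
        print(f"{report_id}: {score:.3f}")
```

With these toy vectors, `REPORT-A` and `REPORT-C` rank above `REPORT-B` because they point in nearly the same direction as the query. In a deployed system the linear scan would be replaced by a k-nearest-neighbour query against a vector index, but the scoring principle is the same.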
