Abstract
The Direct Segment Anything Model (DirectSAM), pre-trained on SA-1B (the training set of the Segment Anything Model), demonstrates exceptional performance in class-agnostic edge detection. This work explores its application to remote sensing imagery, where semantic edge detection, extracting structures such as buildings, roadways, and coastlines, has clear practical value. These applications are currently handled by separately training specialized models on individual datasets in each specific domain. We present DirectSAM-RS, a semantic edge detection foundation model for remote sensing built upon DirectSAM. It retains the powerful segmentation ability acquired from natural images while leveraging a large-scale dataset curated for semantic edge detection in remote sensing. The dataset contains over 34k image-text-edge triplets, making it more than 30 times larger than any individual existing dataset. DirectSAM-RS augments the DirectSAM architecture with a prompter module, consisting of a text encoder and cross-attention layers, enabling flexible conditioning on target class labels or referring expressions. We evaluate DirectSAM-RS in both zero-shot and fine-tuning settings and show that it achieves state-of-the-art performance on various downstream benchmarks.
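The prompter mechanism described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the dimensions, weight matrices, and single-head formulation are illustrative assumptions. It shows only the core idea: queries come from image features while keys and values come from the encoded text prompt, so the fused features are conditioned on the class label or referring expression.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_prompter(image_feats, text_feats, Wq, Wk, Wv):
    """Single-head cross-attention (illustrative, not the paper's exact module).

    image_feats: (N_img, d) flattened visual patch features
    text_feats:  (N_txt, d) token embeddings of the text prompt
    """
    Q = image_feats @ Wq                 # queries from the image
    K = text_feats @ Wk                  # keys from the text prompt
    V = text_feats @ Wv                  # values from the text prompt
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]), axis=-1)
    return image_feats + attn @ V        # residual text-conditioned features

# Toy sizes, chosen arbitrarily for the sketch
rng = np.random.default_rng(0)
d = 16
img = rng.normal(size=(64, d))           # e.g. an 8x8 patch grid, flattened
txt = rng.normal(size=(4, d))            # e.g. tokens for "building edges"
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = cross_attention_prompter(img, txt, Wq, Wk, Wv)
print(out.shape)  # (64, 16): same shape as the input image features
```

Because the conditioning is injected through attention rather than a fixed class head, the same backbone can serve different target classes simply by changing the prompt text.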
