Abstract
Representation learning is critical for multimodal methods. Traditional consistency-based multimodal methods typically penalize disagreements among the embeddings or predictions of different modalities as an extra regularization term. However, these methods can degrade performance in open environments. The degradation is mainly attributed to the interference of asymmetric information: the information carried by different modalities diverges, whereas consistency regularization simply minimizes that divergence rather than learning optimal classifiers. It is therefore unsafe to apply consistency regularization directly. To this end, we propose modality-specific subspace learning (MSSL), which learns modality-specific subspace representations by treating modality divergence and consistency separately. In particular, MSSL is a semi-supervised framework that maps the feature embeddings of different modalities into shared and independent subspaces. The shared subspace applies reliable consistency regularization by measuring intermodality structural similarities. The independent subspace uses a discriminative modality-separation network to emphasize the complementary information of each modality. Finally, labeled instances from different modalities are classified with weighted predictions over the concatenated embeddings. Consequently, MSSL improves both the single-modality and ensemble classification results and yields a more robust mapping among modalities. Empirical studies on real-world datasets demonstrate the superior performance of MSSL.
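The data flow described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes two modalities, random linear maps in place of trained networks, cosine similarity as the structural measure, and fixed ensemble weights. All names (`w_shared`, `w_private`, `w_clf`, `structural_consistency`) are hypothetical, and the discriminative modality-separation network is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def pairwise_cosine(z):
    """Cosine-similarity matrix between all rows of z."""
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)
    return z @ z.T

def structural_consistency(z_a, z_b):
    """Mean squared gap between two modalities' similarity structures.

    Assumption: compares instance-to-instance similarity patterns rather
    than forcing the embeddings themselves to be equal.
    """
    return float(np.mean((pairwise_cosine(z_a) - pairwise_cosine(z_b)) ** 2))

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy batch: 8 instances observed in two modalities of different dimension.
n, d_a, d_b, k, n_classes = 8, 16, 12, 4, 3
x = {"a": rng.normal(size=(n, d_a)), "b": rng.normal(size=(n, d_b))}

# Hypothetical linear maps standing in for trained networks.
dims = {"a": d_a, "b": d_b}
w_shared = {m: rng.normal(size=(dims[m], k)) for m in dims}
w_private = {m: rng.normal(size=(dims[m], k)) for m in dims}
w_clf = {m: rng.normal(size=(2 * k, n_classes)) for m in dims}

# Map each modality into a shared and an independent (private) subspace.
shared = {m: x[m] @ w_shared[m] for m in dims}
private = {m: x[m] @ w_private[m] for m in dims}

# The consistency term is applied to the shared subspace only.
consistency = structural_consistency(shared["a"], shared["b"])

# Each modality is classified from its concatenated embedding.
probs = {
    m: softmax(np.concatenate([shared[m], private[m]], axis=1) @ w_clf[m])
    for m in dims
}

# Weighted ensemble prediction; the weights here are fixed assumptions.
weights = {"a": 0.6, "b": 0.4}
ensemble = sum(weights[m] * probs[m] for m in dims)
predicted = ensemble.argmax(axis=1)
```

In this sketch the consistency term touches only the shared embeddings, so the private embeddings remain free to carry modality-specific information, which is the separation the abstract argues for.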
