Abstract
MobileBert is a general-purpose lightweight model, but it still suffers from a large network depth and parameter count. This paper therefore proposes a secondary lightweight model, named LightMobileBert, which retains the bottom 12 Transformer layers of the pre-trained MobileBert and applies a tensor decomposition technique to the model so that pre-training can be skipped and the parameters further reduced. In addition, a joint loss function is constructed from an improved Supervised Contrastive Learning loss and the Cross-Entropy loss to improve performance and training stability. Finally, the model is optimized with LMBert_Adam, an improved version of the Bert_Adam optimizer. Experimental results show that LightMobileBert achieves comparatively higher performance than MobileBert and other popular models while requiring 57% fewer network parameters than MobileBert, confirming that LightMobileBert remains lightweight while retaining high performance.
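As a rough illustration of the joint objective summarized above, the following PyTorch sketch combines a standard supervised contrastive loss with cross-entropy; the abstract does not specify the exact form of the improved Supervised Contrastive Learning loss or its weighting, so the function names, the mixing weight `alpha`, and the `temperature` value are illustrative assumptions rather than the paper's actual settings.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Standard supervised contrastive loss over one batch.

    features: (batch, dim) sentence embeddings; labels: (batch,) class ids.
    """
    z = F.normalize(features, dim=1)                      # unit-norm embeddings
    sim = z @ z.T / temperature                           # pairwise similarities
    not_self = ~torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & not_self
    # Log-softmax over each row, with the anchor itself excluded from the denominator.
    sim_max = sim.max(dim=1, keepdim=True).values.detach()
    exp_sim = torch.exp(sim - sim_max) * not_self
    log_prob = (sim - sim_max) - torch.log(exp_sim.sum(dim=1, keepdim=True))
    # Average log-probability over each anchor's positives (skip anchors with none).
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    return -((log_prob * pos_mask.float()).sum(dim=1) / pos_count).mean()

def joint_loss(logits, features, labels, alpha=0.5, temperature=0.1):
    """Hypothetical weighted combination of cross-entropy and contrastive terms."""
    ce = F.cross_entropy(logits, labels)
    scl = supervised_contrastive_loss(features, labels, temperature)
    return (1.0 - alpha) * ce + alpha * scl
```

In this sketch the classifier logits drive the cross-entropy term while the encoder's pooled embeddings drive the contrastive term; how the two are weighted in LightMobileBert is left to the paper itself.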
