Abstract
MobileBert is a general-purpose lightweight model, but it still suffers from a large network depth and parameter count. This paper therefore proposes a secondary lightweight model, named LightMobileBert, which retains the bottom 12 Transformer layers of the pre-trained MobileBert and applies a tensor decomposition technique to the model so that pre-training can be skipped and the parameters further reduced. In addition, a joint loss function is constructed from an improved Supervised Contrastive Learning loss and the Cross-Entropy loss to improve performance and training stability. Finally, the model is optimized with LMBert_Adam, an improved version of the Bert_Adam optimizer. Experimental results show that LightMobileBert achieves comparatively higher performance than MobileBert and other popular models while requiring 57% fewer network parameters than MobileBert, confirming that LightMobileBert remains lightweight while retaining high performance.
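As a rough illustration of the joint objective summarized above, the following PyTorch sketch combines a standard supervised contrastive loss with cross-entropy; the abstract does not specify the exact form of the improved Supervised Contrastive Learning loss or its weighting, so the function names, the mixing weight `alpha`, and the `temperature` value are illustrative assumptions rather than the paper's actual settings.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Standard supervised contrastive loss over one batch.

    features: (batch, dim) sentence embeddings; labels: (batch,) class ids.
    """
    z = F.normalize(features, dim=1)                      # unit-norm embeddings
    sim = z @ z.T / temperature                           # pairwise similarities
    not_self = ~torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & not_self
    # Log-softmax over each row, with the anchor itself excluded from the denominator.
    sim_max = sim.max(dim=1, keepdim=True).values.detach()
    exp_sim = torch.exp(sim - sim_max) * not_self
    log_prob = (sim - sim_max) - torch.log(exp_sim.sum(dim=1, keepdim=True))
    # Average log-probability over each anchor's positives (skip anchors with none).
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    return -((log_prob * pos_mask.float()).sum(dim=1) / pos_count).mean()

def joint_loss(logits, features, labels, alpha=0.5, temperature=0.1):
    """Hypothetical weighted combination of cross-entropy and contrastive terms."""
    ce = F.cross_entropy(logits, labels)
    scl = supervised_contrastive_loss(features, labels, temperature)
    return (1.0 - alpha) * ce + alpha * scl
```

In this sketch the classifier logits drive the cross-entropy term while the encoder's pooled embeddings drive the contrastive term; how the two are weighted in LightMobileBert is left to the paper itself.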
