Abstract
Pre-trained language models have become a critical natural language processing component in many E-commerce applications. As businesses continue to evolve, pre-trained models should be able to incorporate new domain knowledge and new tasks. This paper proposes a novel sequential multi-task pre-trained language model framework, ICL-BERT (In-loop Continual Learning BERT), which enables the current model to evolve with new knowledge and new tasks. The contributions of ICL-BERT are: (1) vocabularies and entities are optimized on an E-commerce corpus; (2) a new glyph embedding is introduced to learn glyph information for vocabularies and entities; (3) specific and general tasks are designed to encode E-commerce knowledge for pre-training ICL-BERT; and (4) a new task-gating mechanism, called ICL (In-loop Continual Learning), is proposed for sequential multi-task learning, which evolves the current model effectively and efficiently. Our evaluation results demonstrate that ICL-BERT outperforms existing models on both CLUE and E-commerce tasks, with average accuracy improvements of 1.73% and 3.5%, respectively. Furthermore, ICL-BERT serves as a fundamental pre-trained language model that runs online in JingDong's daily business.
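The abstract does not describe the internals of the ICL task-gating mechanism. For orientation only, below is a minimal, hypothetical PyTorch sketch of one common formulation of task gating for sequential multi-task learning, a per-task sigmoid gate modulating a shared hidden state; the class name TaskGatedLayer, the parameter num_tasks, and the overall design are illustrative assumptions, not the paper's method.

    # Hypothetical sketch of a task-gating layer for sequential multi-task
    # learning. This is NOT the paper's ICL mechanism, whose details are not
    # given in the abstract; all names here are illustrative assumptions.
    import torch
    import torch.nn as nn

    class TaskGatedLayer(nn.Module):
        """Applies a learned, per-task sigmoid gate to a shared hidden state."""
        def __init__(self, hidden_size: int, num_tasks: int):
            super().__init__()
            self.shared = nn.Linear(hidden_size, hidden_size)  # shared across tasks
            # One gate vector per task; a sigmoid turns it into a soft mask.
            self.task_gates = nn.Embedding(num_tasks, hidden_size)

        def forward(self, hidden: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
            # hidden: (batch, seq_len, hidden_size); task_id: (batch,) long tensor
            gate = torch.sigmoid(self.task_gates(task_id)).unsqueeze(1)  # (batch, 1, hidden)
            return gate * self.shared(hidden)  # task-specific soft masking

Under this kind of design, adding a new task in the sequence trains only a new gate row (and task head) while earlier gates stay intact, which limits interference with previously learned tasks; whether ICL-BERT works this way is not stated in the abstract.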
