Abstract
This article combines BERT (Bidirectional Encoder Representations from Transformers), Bi-LSTM (Bidirectional Long Short-Term Memory), and CRF (Conditional Random Field) models to transform unstructured legal text into structured data through information extraction, improving the effectiveness of legal information extraction. The BERT model performs deep semantic embedding of legal texts, generating a context-sensitive representation for each word. The Bi-LSTM network captures long-distance dependencies in the text and extracts sequence features, while the CRF layer globally optimizes the label sequence to ensure accurate annotation of entity boundaries and relationships. On the prostitution-related legal entity-relationship extraction dataset constructed in this article, entity classification achieved an accuracy, precision, recall, and F1 score of 93.6%, 92.7%, 92.1%, and 92.4%, respectively, and all 153 samples of the Engage_in_prostitution relationship were correctly classified. To analyze the stability of legal information extraction and classification, the proposed model was tested on five datasets: CAIL2019, CJRC (Chinese Judicial Reading Comprehension), LexGLUE (Legal General Language Understanding Evaluation), COLIEE (Competition on Legal Information Extraction/Entailment), and ECHR (European Court of Human Rights). Across these datasets, the model's accuracy fluctuated by only 1.2%, its precision remained stable, and its recall fluctuated by 0.7%. By combining BERT, Bi-LSTM, and CRF to accurately extract and classify legal information, this article provides reliable technical support for legal intelligence research.
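The BERT-embedding, Bi-LSTM-encoding, CRF-decoding pipeline described above can be sketched as follows. This is a minimal PyTorch illustration, not the authors' released code: the class name BertBiLstmCrf, the checkpoint bert-base-chinese, the hidden size of 256, and the use of the pytorch-crf package are assumptions made for illustration.

```python
# Minimal sketch of a BERT + Bi-LSTM + CRF sequence labeler.
# Assumes: transformers, pytorch-crf (pip install pytorch-crf).
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF


class BertBiLstmCrf(nn.Module):
    def __init__(self, num_tags: int, lstm_hidden: int = 256,
                 bert_name: str = "bert-base-chinese"):
        super().__init__()
        # BERT yields a context-sensitive embedding for each token.
        self.bert = BertModel.from_pretrained(bert_name)
        # Bi-LSTM captures long-distance dependencies in both directions.
        self.lstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                            batch_first=True, bidirectional=True)
        # Linear layer maps Bi-LSTM features to per-tag emission scores.
        self.emissions = nn.Linear(2 * lstm_hidden, num_tags)
        # CRF layer globally optimizes the whole label sequence.
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        features, _ = self.lstm(hidden)
        scores = self.emissions(features)
        mask = attention_mask.bool()
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence.
            return -self.crf(scores, tags, mask=mask, reduction="mean")
        # Inference: Viterbi decoding of the best tag sequence.
        return self.crf.decode(scores, mask=mask)
```

In this arrangement the CRF learns transition scores between tags, so illegal label sequences (e.g., an entity continuation without a matching beginning) are penalized globally rather than predicted token by token.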
