Abstract
This article combines BERT (Bidirectional Encoder Representations from Transformers), Bi-LSTM (Bidirectional Long Short-Term Memory), and CRF (Conditional Random Field) models to transform unstructured legal text into structured data through information extraction, improving the effectiveness of legal information extraction. The BERT model performs deep semantic embedding of legal texts, generating a context-sensitive representation for each word. The Bi-LSTM network captures long-distance dependencies in the text and extracts sequence features, while the CRF layer globally optimizes the label sequence to ensure accurate annotation of entity boundaries and relationships. On the prostitution-related legal entity-relationship extraction dataset constructed in this article, entity classification achieved an accuracy, precision, recall, and F1 score of 93.6%, 92.7%, 92.1%, and 92.4%, respectively, and all 153 samples of the Engage_in_prostitution relationship were correctly classified. To analyze the stability of legal information extraction and classification, the proposed model was tested on five datasets: CAIL2019, CJRC (Chinese Judicial Reading Comprehension), LexGLUE (Legal General Language Understanding Evaluation), COLIEE (Competition on Legal Information Extraction/Entailment), and ECHR (European Court of Human Rights). Across these datasets, the model's accuracy fluctuated by only 1.2%, its precision remained stable, and its recall fluctuated by 0.7%. By combining BERT, Bi-LSTM, and CRF to accurately extract and classify legal information, this article provides reliable technical support for legal intelligence research.
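The BERT-embedding, Bi-LSTM-encoding, CRF-decoding pipeline described above can be sketched as follows. This is a minimal PyTorch illustration, not the authors' released code: the class name BertBiLstmCrf, the checkpoint bert-base-chinese, the hidden size of 256, and the use of the pytorch-crf package are assumptions made for illustration.

```python
# Minimal sketch of a BERT + Bi-LSTM + CRF sequence labeler.
# Assumes: transformers, pytorch-crf (pip install pytorch-crf).
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF


class BertBiLstmCrf(nn.Module):
    def __init__(self, num_tags: int, lstm_hidden: int = 256,
                 bert_name: str = "bert-base-chinese"):
        super().__init__()
        # BERT yields a context-sensitive embedding for each token.
        self.bert = BertModel.from_pretrained(bert_name)
        # Bi-LSTM captures long-distance dependencies in both directions.
        self.lstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                            batch_first=True, bidirectional=True)
        # Linear layer maps Bi-LSTM features to per-tag emission scores.
        self.emissions = nn.Linear(2 * lstm_hidden, num_tags)
        # CRF layer globally optimizes the whole label sequence.
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        features, _ = self.lstm(hidden)
        scores = self.emissions(features)
        mask = attention_mask.bool()
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence.
            return -self.crf(scores, tags, mask=mask, reduction="mean")
        # Inference: Viterbi decoding of the best tag sequence.
        return self.crf.decode(scores, mask=mask)
```

In this arrangement the CRF learns transition scores between tags, so illegal label sequences (e.g., an entity continuation without a matching beginning) are penalized globally rather than predicted token by token.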
