Abstract
Academic libraries in Sub-Saharan Africa and China face persistent challenges in cross-lingual retrieval due to linguistic fragmentation and uneven metadata infrastructures. This study constructed and evaluated a multilingual academic corpus designed to enhance semantic retrieval, metadata interoperability, and inclusive access across 13 languages. The core innovation lies in the Multilingual Adaptive Corpus for Retrieval Equity (MACRE), a modular architecture that integrates language-specific adapters, ontology-driven metadata harmonisation, and an intent disambiguation engine features that collectively surpass existing retrieval frameworks. The project aligns with core LIS objectives by advancing user-centred discovery, cross-language cataloguing, and metadata standardisation in institutional repositories. A 41,129,582-token corpus was compiled from 39,725 academic records drawn from university repositories across both regions. The corpus incorporated Mandarin, English, and 11 African languages selected to reflect regional LIS priorities. Metadata was harmonised to SKOS and Schema.org standards. The proposed MACRE retrieval model was benchmarked against ColBERT-X, SwahiliDocBERT, and CrossLingual2Vec using cosine similarity, MRR, MAP, and NDCG. Evaluation included ablation and post hoc analysis. Mandarin and English accounted for 64.6% of all tokens; Swahili reached 16.9%, while nine African languages contributed under 1.8% each. MACRE significantly outperformed all baselines (MRR = 0.864; MAP = 0.812; p < .001), particularly in LIS-aligned fields such as metadata accuracy (98.6%) and entity completion (94.9%). Adapter performance exceeded 90% in dominant languages but revealed key gaps in under-annotated African records. These findings illustrate that retrieval accuracy is not just a technical challenge, but also reflects underlying LIS concerns, such as language equity, cataloguing depth, and metadata policy enforcement. This study contributes a scalable LIS infrastructure for multilingual academic retrieval, advancing both technological and policy innovations for cross-lingual access in library systems.
Keywords
Get full access to this article
View all access options for this article.
