Abstract
This paper explores strategies for developing a system that uses deep learning and natural language processing to automate part of the record coding performed for statistical data products. The work focuses on the Mexican Statistical Institute (INEGI), where the proposed AI model aims to reduce the volume of records that require manual coding. The experiments show that these methods can partially replace traditional manual coding. Specifically, a new phase was introduced into INEGI’s coding workflow: an AI model selects the subset of records for which it has high confidence in assigning the correct codes, leaving the remaining records for manual coding. To evaluate the proposal, a production line was implemented that mirrored the existing process but included the AI-based phase. The results show that deep learning algorithms can achieve a 50% reduction in manual coding tasks without compromising output quality.
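The confidence-based selection phase can be illustrated with a minimal sketch. It assumes the model emits a predicted code and a confidence score per record; the `Prediction` type, the `route_records` function, and the 0.95 threshold are hypothetical names and values chosen for illustration, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    """Hypothetical model output for one record (not the paper's schema)."""
    record_id: str
    code: str          # code proposed by the model
    confidence: float  # model confidence in [0, 1]


def route_records(
    predictions: List[Prediction], threshold: float = 0.95
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into auto-coded and manually coded groups.

    Records at or above the threshold keep the model's code; the rest
    go to human coders. The threshold value is an assumption.
    """
    auto_coded: List[Prediction] = []
    manual_queue: List[Prediction] = []
    for pred in predictions:
        if pred.confidence >= threshold:
            auto_coded.append(pred)
        else:
            manual_queue.append(pred)
    return auto_coded, manual_queue


def manual_reduction(auto_coded: List[Prediction], total: int) -> float:
    """Fraction of records that no longer need manual coding."""
    return len(auto_coded) / total if total else 0.0
```

In such a setup the threshold would presumably be tuned on held-out, manually coded records, so that the accuracy of the auto-coded subset matches the quality expected from manual coding.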
