Abstract
In this paper we explore the potential of large language models (LLMs) for COICOP classification in the household budget survey (HBS). The main goal is to reduce or even eliminate manual coding in the production process. We describe Norway's most recent survey, HBS 2022, where the use of machine learning yielded large savings even though significant manual coding remained necessary. Initial experiments with a commercial LLM were very promising: the latest free model of ChatGPT achieved accuracy comparable to that of a human coder. We developed a prototype classification pipeline using self-hosted LLMs that applies retrieval-augmented generation (RAG) to retrieve information about relevant codes and insert it into the prompt. Performance remains limited with the smaller LLMs that our computing setup at the time could handle. However, support for larger models is expanding quickly at Statistics Norway, and further development using higher-quality embeddings and larger LLMs is ongoing. In addition, we report the performance of several other classification methods, such as a BERT-based classifier and hierarchical prompting. We also discuss approaches to issues such as adapting models to a less widely spoken language like Norwegian and extracting a measure of an LLM's confidence in individual predictions. Finally, we give some recommendations for using LLMs in conjunction with machine learning and human-in-the-loop coding.
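The RAG step described above can be sketched in miniature: retrieve the COICOP code descriptions most similar to a purchase text and insert them into the classification prompt. This is only an illustrative sketch, not the paper's pipeline — the codes and descriptions below are a hypothetical subset, and a simple bag-of-words cosine similarity stands in for the learned embeddings the prototype would use.

```python
# Illustrative sketch of retrieval-augmented prompt construction for
# COICOP classification. The code list, descriptions, and similarity
# measure are placeholders, not the actual production setup.
from collections import Counter
import math

COICOP = {  # hypothetical subset of COICOP codes with short descriptions
    "01.1.1": "bread and cereals",
    "01.1.4": "milk cheese and eggs",
    "03.1.2": "garments clothing",
}

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(item: str, k: int = 2):
    """Return the k code descriptions most similar to the item text."""
    query = Counter(item.lower().split())
    scored = [(cosine(query, Counter(desc.split())), code, desc)
              for code, desc in COICOP.items()]
    return sorted(scored, reverse=True)[:k]

def build_prompt(item: str) -> str:
    """Insert the retrieved candidate codes into a classification prompt."""
    candidates = "\n".join(f"{code}: {desc}"
                           for _, code, desc in retrieve(item))
    return (f"Classify the purchase '{item}' into one COICOP code.\n"
            f"Candidate codes:\n{candidates}\n"
            f"Answer with the code only.")

print(build_prompt("cheese and milk"))
```

In the prototype, the resulting prompt would be sent to a self-hosted LLM, which picks one code from the retrieved candidates; restricting the prompt to a few relevant codes keeps it short enough for smaller models.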
