Abstract
In this paper we explore the potential of large language models (LLMs) for COICOP classification in the household budget survey (HBS). The main goal is to reduce or even eliminate manual coding in the production process. We describe Norway's most recent survey, HBS 2022, where the use of machine learning yielded large savings even though significant manual coding remained necessary. Initial experiments with a commercial LLM were very promising: the latest free model of ChatGPT achieved accuracy comparable to that of a human coder. We developed a prototype classification pipeline using self-hosted LLMs that applies retrieval-augmented generation (RAG) to retrieve information about relevant codes and insert it into the prompt. Performance remains limited with the smaller LLMs that our computing setup at the time could handle. However, support for larger models is expanding quickly at Statistics Norway, and further development using higher-quality embeddings and larger LLMs is ongoing. In addition, we report the performance of several other classification methods, such as a BERT-based classifier and hierarchical prompting. We also discuss approaches to issues such as adapting models to a less widely spoken language like Norwegian and extracting a measure of an LLM's confidence in individual predictions. Finally, we give some recommendations for using LLMs in conjunction with machine learning and human-in-the-loop coding.
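The RAG step described above can be sketched in miniature: retrieve the COICOP code descriptions most similar to a purchase text and insert them into the classification prompt. This is only an illustrative sketch, not the paper's pipeline — the codes and descriptions below are a hypothetical subset, and a simple bag-of-words cosine similarity stands in for the learned embeddings the prototype would use.

```python
# Illustrative sketch of retrieval-augmented prompt construction for
# COICOP classification. The code list, descriptions, and similarity
# measure are placeholders, not the actual production setup.
from collections import Counter
import math

COICOP = {  # hypothetical subset of COICOP codes with short descriptions
    "01.1.1": "bread and cereals",
    "01.1.4": "milk cheese and eggs",
    "03.1.2": "garments clothing",
}

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(item: str, k: int = 2):
    """Return the k code descriptions most similar to the item text."""
    query = Counter(item.lower().split())
    scored = [(cosine(query, Counter(desc.split())), code, desc)
              for code, desc in COICOP.items()]
    return sorted(scored, reverse=True)[:k]

def build_prompt(item: str) -> str:
    """Insert the retrieved candidate codes into a classification prompt."""
    candidates = "\n".join(f"{code}: {desc}"
                           for _, code, desc in retrieve(item))
    return (f"Classify the purchase '{item}' into one COICOP code.\n"
            f"Candidate codes:\n{candidates}\n"
            f"Answer with the code only.")

print(build_prompt("cheese and milk"))
```

In the prototype, the resulting prompt would be sent to a self-hosted LLM, which picks one code from the retrieved candidates; restricting the prompt to a few relevant codes keeps it short enough for smaller models.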
