Abstract
While natural language documents, such as intervention transcripts and participant writing samples, can provide highly nuanced insights into educational and psychological constructs, researchers often find these materials difficult and expensive to analyze. Recent developments in machine learning, however, have allowed social scientists to harness the power of artificial intelligence for complex data categorization tasks. One approach, supervised learning, supports high-performance categorization yet still requires a large, hand-labeled training corpus, which can be costly to produce. A second approach—zero- and few-shot classification with pretrained large language models—offers a cheaper yet compelling option. This article considers the application of zero-shot and few-shot classification in educational research. We provide an overview of large language models, a step-by-step tutorial on using the Python openai package for zero-shot and few-shot classification, and a discussion of relevant research considerations for social scientists.
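To make the zero-shot setup concrete, the sketch below builds a classification prompt with the Python openai package's chat interface. The label set, example text, model name, and helper function are illustrative assumptions, not the article's own materials; running the commented API call also requires an OpenAI API key.

```python
# Minimal zero-shot classification sketch using the openai package.
# The labels and example text are hypothetical placeholders.

LABELS = ["growth mindset", "fixed mindset", "neither"]

def build_messages(text, labels):
    """Build a chat prompt asking the model to assign exactly one label."""
    system = (
        "You are a text classifier. Reply with exactly one of these labels: "
        + ", ".join(labels)
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": text},
    ]

# The actual call needs network access and the OPENAI_API_KEY variable set:
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(
#     model="gpt-4o-mini",  # model name is an assumption
#     messages=build_messages("I get smarter when I keep practicing.", LABELS),
# )
# print(response.choices[0].message.content)
```

A few-shot variant would simply insert labeled example pairs as additional user/assistant turns before the final text to classify.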