Abstract
The field of machine learning has recently made substantial progress in reducing the amount of labeled training data required to build new models. These “cheaper” learning techniques hold considerable potential for the social sciences, where the development of large labeled training datasets is often a major practical impediment. In this article we review three “cheap” techniques developed in recent years: weak supervision, transfer learning, and prompt engineering. For the latter, we also review the particular case of zero-shot prompting of large language models. For each technique, we provide a guide to how it works, and we demonstrate its application and assess its systematic biases across two different, realistic social science tasks paired with three different dataset compositions. We show good performance for all techniques, and we demonstrate that prompting of large language models can achieve high accuracy at very low cost, although its biases must be taken into account.
