Introduction to Neural Transfer Learning With Transformers for Social Science Text Analysis

Abstract

Transformer-based models for transfer learning have the potential to achieve high prediction accuracies on text-based supervised learning tasks with relatively few training data instances. These models are thus likely to benefit social scientists who seek text-based measures that are as accurate as possible but have only limited resources for annotating training data. To enable social scientists to leverage these potential benefits for their research, this article explains how these methods work, why they might be advantageous, and what their limitations are. Additionally, three Transformer-based models for transfer learning (BERT, RoBERTa, and the Longformer) are compared to conventional machine learning algorithms on three applications. Across all evaluated tasks, textual styles, and training data set sizes, transfer learning with Transformers consistently outperforms the conventional models, demonstrating the benefits these models can bring to text-based social science research.

Keywords

natural language processing, deep learning, neural networks, transfer learning, Transformer, BERT


129.

Mohammad

Saif M.

Sobhani

Parinaz

Kiritchenko

Svetlana

. 2017. “Stance and Sentiment in Tweets.” ACM Transactions on Internet Technology 17 (3): 26:1-22.

130.

Molnar

Christoph

. 2022. Interpretable Machine Learning. https://christophm.github.io/interpretable-ml-book/.

131.

Muchlinski

David

Yang

Xiao

Birch

Sarah

Macdonald

Craig

Ounis

Iadh

. 2021. “We Need to Go Deeper: Measuring Electoral Violence Using Convolutional Neural Networks and Social Media.” Political Science Research and Methods 9:122-39.

132.

Münchener Digitalisierungszentrum der Bayerischen Staatsbibliothek (dbmdz). 2021. Model Card for bert-base-german-uncased from dbmdz. Retrieved May 19, 2021 (https://huggingface.co/dbmdz/bert-base-german-uncased).

133.

Nair

Vinod

Hinton

Geoffrey E.

2010. “Rectified Linear Units Improve Restricted Boltzmann Machines.” Pp. 807-14 in Proceedings of the 27th International Conference on Machine Learning, edited by Johannes Fürnkranz and Thorsten Joachims. Madison, WI, USA: Omnipress.

134.

Nelson

Laura K.

Burk

Derek

Knudsen

Marcel

McCall

Leslie

. 2021. “The Future of Coding: A Comparison of Hand-Coding and Three Types of Computer-Assisted Text Analysis Methods.” Sociological Methods & Research 50 (1): 202-37.

135.

Nguyen

Dat Quoc

Tuan Nguyen

Anh

. 2020. “PhoBERT: Pre-trained Language Models for Vietnamese.” Pp. 1037-42 in Findings of the Association for Computational Linguistics: EMNLP 2020, edited by Trevor Cohn, Yulan He, and Yang Liu. Stroudsburg, PA, USA: Association for Computational Linguistics.

136.

Nozza

Debora

Bianchi

Federico

Hovy

Dirk

. 2020. “What the [MASK]? Making Sense of Language-Specific BERT Models.” arXiv Preprint. arXiv:2003.02912.

137.

NVIDIA. 2021. Reproducibility Issue with Transformers (BERT) and TF2.2. Retrieved January 20, 2022 (https://github.com/NVIDIA/framework-determinism/issues/19).

138.

Oliphant

Travis E

. 2015. A Guide to NumPy. Austin, TX: Continuum Press.

139.

Osnabrügge

Moritz

Ash

Elliott

Morelli

Massimo

. 2021. “Cross-Domain Topic Classification for Political Texts.” Political Analysis. Pp. 1-22.

140.

Pan

Sinno Jialin

Yang

Qiang

. 2010. “A Survey on Transfer Learning.” IEEE Transactions on Knowledge and Data Engineering 22 (10): 1345-59.

141.

Pang

Lee

Lillian

Vaithyanathan

Shivakumar

. 2002. “Thumbs Up? Sentiment Classification Using Machine Learning Techniques.” Pp. 79-86 in Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, PA, USA: Association for Computational Linguistics.

142.

Park

Baekkwan

Greene

Kevin

Colaresi

Michael

. 2020. “Human Rights are (Increasingly) Plural: Learning the Changing Taxonomy of Human Rights from Large-Scale Text Reveals Information Effects.” American Political Science Review 114 (3): 888-910.

143.

Paszke

Adam

Gross

Sam

Massa

Francisco

Lerer

Adam

Bradbury

James

Chanan

Gregory

Killeen

Trevor

Lin

Zeming

Gimelshein

Natalia

Antiga

Luca

Desmaison

Alban

Kopf

Andreas

Yang

Edward

DeVito

Zachary

Raison

Martin

Tejani

Alykhan

Chilamkurthy

Sasank

Steiner

Benoit

Fang

Bai

Junjie

Chintala

Soumith

. 2019. “PyTorch: An Imperative Style, High-Performance Deep Learning Library.” Pp. 8024-35 in Advances in Neural Information Processing Systems 32, edited by Hanna Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d. Alché-Buc, Emily Fox, and Roman Garnett. Red Hook, NY, USA: Curran Associates, Inc.

144.

Pedregosa

Varoquaux

Gramfort

Michel

Thirion

Grisel

Blondel

Prettenhofer

Weiss

Dubourg

Vanderplas

Passos

Cournapeau

Brucher

Perrot

Duchesnay

2011. “Scikit-learn: Machine Learning in Python.” Journal of Machine Learning Research 12:2825-30.

145.

Pennington

Jeffrey

Socher

Richard

Manning

Christopher

. 2014. “GloVe: Global Vectors for Word Representation.” Pp. 1532-43 in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), edited by Alessandro Moschitti, Bo Pang, and Walter Daelemans. Stroudsburg, PA, USA: Association for Computational Linguistics.

146.

Perry

Patrick O.

Benoit

Kenneth

. 2017. “Scaling Text with the Class Affinity Model.” arXiv Preprint. arXiv:1710.08963.

147.

Peters

Matthew

Neumann

Mark

Iyyer

Mohit

Gardner

Matt

Clark

Christopher

Lee

Kenton

Zettlemoyer

Luke

. 2018a. “Deep Contextualized Word Representations.” Pp. 2227-37 in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, edited by Marilyn Walker, Heng Ji, and Amanda Stent. Stroudsburg, PA, USA: Association for Computational Linguistics.

148.

Peters

Matthew E.

Neumann

Mark

Zettlemoyer

Luke

Yih

Wen-tau

. 2018b. “Dissecting Contextual Word Embeddings: Architecture and Representation.” Pp. 1499-1509 in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, edited by Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun’ichi Tsujii. Stroudsburg, PA, USA: Association for Computational Linguistics.

149.

Peters

Matthew E.

Ruder

Sebastian

Smith

Noah A.

2019. “To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks.” Pp. 7-14 in Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), edited by Isabelle Augenstein, Spandana Gella, Sebastian Ruder, Katharina Kann, Burcu Can, Johannes Welbl, Alexis Conneau, Xiang Ren, and Marek Rei. Stroudsburg, PA, USA: Association for Computational Linguistics.

150.

Pilehvar

Mohammad Taher

Camacho-Collados

Jose

. 2020. Embeddings in Natural Language Processing: Theory and Advances in Vector Representations of Meaning. San Rafael, CA, USA: Morgan & Claypool Publishers.

151.

Pilny

Andrew

McAninch

Kelly

Slone

Amanda

Moore

Kelsey

. 2019. “Using Supervised Machine Learning in Automated Content Analysis: An Example Using Relational Uncertainty.” Communication Methods and Measures 13 (4): 287-304.

152.

Porter

Ethan

Velez

Yamil R.

2021. “Placebo Selection in Survey Experiments: An Agnostic Approach.” Political Analysis. Pp. 1-14.

153.

Princeton University. 2010. “About WordNet.” Retrieved December 3, 2021 (https://wordnet.princeton.edu/).

154.

R Core Team. 2020. “R: A Language and Environment for Statistical Computing.” Computer Software. Vienna, Austria: R Foundation for Statistical Computing.

155.

Radford

Alec

Kim

Jong Wook

Hallacy

Chris

Ramesh

Aditya

Goh

Gabriel

Agarwal

Sandhini

Sastry

Girish

Askell

Amanda

Mishkin

Pamela

Clark

Jack

Krueger

Gretchen

Sutskever

Ilya

. 2021. “Learning Transferable Visual Models From Natural Language Supervision.” Pp. 8748-63 in Proceedings of the 38th International Conference on Machine Learning, edited by Marina Meila and Tong Zhang. PMLR.

156.

Radford

Alec

Narasimhan

Karthik

Salimans

Tim

Sutskever

Ilya

. 2018. “Improving Language Understanding by Generative Pre-Training.” Manuscript. OpenAI. https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf.

157.

Radford

Alec

Wu

Jeff

Child

Rewon

Luan

David

Amodei

Dario

Sutskever

Ilya

. 2019. “Language Models are Unsupervised Multitask Learners.” Manuscript. OpenAI. https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.

158.

Raffel

Colin

Shazeer

Noam

Roberts

Adam

Lee

Katherine

Narang

Sharan

Matena

Michael

Zhou

Yanqi

Li

Wei

Liu

Peter J.

2020. “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.” Journal of Machine Learning Research 21 (140): 1-67.

159.

Ramesh

Aditya

Pavlov

Mikhail

Goh

Gabriel

Gray

Scott

Voss

Chelsea

Radford

Alec

Chen

Mark

Sutskever

Ilya

. 2021. “Zero-Shot Text-to-Image Generation.” Proceedings of Machine Learning Research 139:8821-31.

160.

Ramey

Adam J.

Klingler

Jonathan D.

Hollibaugh

Gary E.

2019. “Measuring Elite Personality Using Speech.” Political Science Research and Methods 7 (1): 163-84.

161.

Raschka

Sebastian

. 2020. “watermark.” Computer Software. https://github.com/rasbt/watermark.

162.

Rehbein

Ines

Ponzetto

Simone Paolo

Adendorf

Anna

Bahnsen

Oke

Stoetzer

Lukas

Stuckenschmidt

Heiner

. 2021a. “Come Hither or Go Away? Recognising Pre-Electoral Coalition Signals in the News.” Pp. 7798-7810 in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), edited by Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih. Stroudsburg, PA, USA: Association for Computational Linguistics.

163.

Rehbein

Ines

Ruppenhofer

Josef

Bernauer

Julian

. 2021b. “Who is We? Disambiguating the Referents of First Person Plural Pronouns in Parliamentary Debates.” Pp. 147-58 in Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021), edited by Kilian Evang, Laura Kallmeyer, Rainer Osswald, Jakub Waszczuk, and Torsten Zesch. Düsseldorf, Germany: KONVENS 2021 Organizers.

164.

Reimers

Nils

Gurevych

Iryna

. 2019. “Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.” Pp. 3982-92 in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), edited by Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan. Stroudsburg, PA, USA: Association for Computational Linguistics.

165.

Rheault

Beelen

Cochrane

Hirst

2016. “Measuring Emotion in Parliamentary Debates with Automated Textual Analysis.” PLoS ONE 11 (12): e0168843.

166.

Rheault

Ludovic

Cochrane

Christopher

. 2020. “Word Embeddings for the Analysis of Ideological Placement in Parliamentary Corpora.” Political Analysis 28 (1): 112-33.

167.

Ribeiro

Marco Tulio

Wu

Tongshuang

Guestrin

Carlos

Singh

Sameer

. 2020. “Beyond Accuracy: Behavioral Testing of NLP Models with CheckList.” Pp. 4902-12 in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, edited by Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault. Stroudsburg, PA, USA: Association for Computational Linguistics.

168.

Riedl

Mark

. 2020. AI Democratization in the Era of GPT-3. The Gradient. Retrieved December 17, 2020 (https://thegradient.pub/ai-democratization-in-the-era-of-gpt-3/).

169.

Rodman

Emma

. 2020. “A Timely Intervention: Tracking the Changing Meanings of Political Concepts With Word Vectors.” Political Analysis 28 (1): 87-111.

170.

Rona-Tas

Akos

Cornuéjols

Antoine

Blanchemanche

Sandrine

Duroy

Antonin

Martin

Christine

. 2019. “Enlisting Supervised Machine Learning in Mapping Scientific Uncertainty Expressed in Food Risk Analysis.” Sociological Methods & Research 48 (3): 608-41.

171.

Rothe

Sascha

Schütze

Hinrich

. 2015. “AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes.” Pp. 1793-1803 in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), edited by Chengqing Zong and Michael Strube. Stroudsburg, PA, USA: Association for Computational Linguistics.

172.

Ruder

Sebastian

. 2018. “NLP’s ImageNet Moment Has Arrived.” Retrieved July 15, 2020 (https://ruder.io/nlp-imagenet/).

173.

Ruder

Sebastian

. 2019a. Neural Transfer Learning for Natural Language Processing. Ph.D. thesis, National University of Ireland, Galway.

174.

Ruder

Sebastian

. 2019b. “Unsupervised Cross-Lingual Representation Learning.” Retrieved December 31, 2021 (https://ruder.io/unsupervised-cross-lingual-learning/index.html).

175.

Ruder

Sebastian

. 2020. “NLP-Progress.” Retrieved August 04, 2020 (https://nlpprogress.com/).

176.

Ruder

Sebastian

. 2021. Recent Advances in Language Model Fine-Tuning. Retrieved December 21, 2021 (https://ruder.io/recent-advances-lm-fine-tuning/).

177.

Rudkowsky

Elena

Haselmayer

Martin

Wastian

Matthias

Jenny

Marcelo

Emrich

Stefan

Sedlmair

Michael

. 2018. “More than Bags of Words: Sentiment Analysis with Word Embeddings.” Communication Methods and Measures 12 (2-3): 140-57.

178.

Rumelhart

D. E.

Hinton

G. E.

Williams

R. J

. 1986. “Learning Representations by Back-Propagating Errors.” Nature 323:533-6.

179.

Rust

Phillip

Pfeiffer

Jonas

Vulić

Ivan

Ruder

Sebastian

Gurevych

Iryna

. 2021. “How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models.” Pp. 3118-35 in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), edited by Chengqing Zong, Fei Xia, Wenjie Li, and Roberto Navigli. Stroudsburg, PA, USA: Association for Computational Linguistics.

180.

Schuster

Nakajima

2012. “Japanese and Korean Voice Search.” Pp. 5149-52 in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). New York, NY, USA: Institute of Electrical and Electronics Engineers (IEEE).

181.

Schütze

Hinrich

. 1998. “Automatic Word Sense Discrimination.” Computational Linguistics 24 (1): 97-123.

182.

scikit-learn Developers. 2020a. “1.4. Support Vector Machines.” Retrieved November 23, 2020 (https://scikit-learn.org/stable/modules/svm.html).

183.

scikit-learn Developers. 2020b. “Classification Metrics.” Retrieved November 03, 2020 (https://scikit-learn.org/stable/modules/model_evaluation.html).

184.

scikit-learn Developers. 2020c. “RBF SVM Parameters.” Retrieved November 23, 2020 (https://scikit-learn.org/stable/auto_examples/svm/plot_rbf_parameters.html).

185.

Sebők

Miklós

Kacsuk

Zoltán

. 2021. “The Multiclass Classification of Newspaper Articles with Machine Learning: The Hybrid Binary Snowball Approach.” Political Analysis 29 (2): 236-49.

186.

Selivanov

Dimitry

Bickel

Manuel

Wang

Qing

. 2020. “text2vec: Modern Text Mining Framework for R.” Computer Software. http://text2vec.org/.

187.

Sennrich

Rico

Haddow

Barry

Birch

Alexandra

. 2016. “Neural Machine Translation of Rare Words with Subword Units.” Pp. 1715-25 in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), edited by Katrin Erk and Noah A. Smith. Stroudsburg, PA, USA: Association for Computational Linguistics.

188.

Slapin

Jonathan B.

Kirkland

Justin H.

2020. “The Sound of Rebellion: Voting Dissent and Legislative Speech in the UK House of Commons.” Legislative Studies Quarterly 45 (2): 153-76.

189.

Smith

Noah A

. 2011. Linguistic Structure Prediction. San Rafael, CA, USA: Morgan & Claypool Publishers.

190.

Socher

Richard

Perelygin

Alex

Wu

Jean

Chuang

Jason

Manning

Christopher D.

Ng

Andrew

Potts

Christopher

. 2013. “Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank.” Pp. 1631-42 in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, edited by David Yarowsky, Timothy Baldwin, Anna Korhonen, Karen Livescu, and Steven Bethard. Stroudsburg, PA, USA: Association for Computational Linguistics.

191.

Song

Hyunjin

Tolochko

Petro

Eberl

Jakob-Moritz

Eisele

Olga

Greussing

Esther

Heidenreich

Tobias

Lind

Fabienne

Galyga

Sebastian

Boomgaarden

Hajo G.

2020. “In Validations We Trust? The Impact of Imperfect Human Annotations as a Gold Standard on the Quality of Validation of Automated Content Analysis.” Political Communication 37 (4): 550-72.

192.

Spirling

Arthur

Rodriguez

Pedro L.

2020. “Word Embeddings: What Works, What Doesn’t, and How to Tell the Difference for Applied Research.” Manuscript. http://arthurspirling.org/documents/embed.pdf.

193.

Srivastava

Nitish

Hinton

Geoffrey

Krizhevsky

Alex

Sutskever

Ilya

Salakhutdinov

Ruslan

. 2014. “Dropout: A Simple Way to Prevent Neural Networks From Overfitting.” Journal of Machine Learning Research 15 (56): 1929-58.

194.

Sun

Chen

Myers

Austin

Vondrick

Carl

Murphy

Kevin

Schmid

Cordelia

. 2019a. “VideoBERT: A Joint Model for Video and Language Representation Learning.” in 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

195.

Sun

Chi

Qiu

Xipeng

Xu

Yige

Huang

Xuanjing

. 2019b. “How to Fine-Tune BERT for Text Classification?” arXiv Preprint. arXiv:1905.05583.

196.

Sutskever

Ilya

Vinyals

Oriol

Le

Quoc V.

2014. “Sequence to Sequence Learning with Neural Networks.” Pp. 3104-12 in Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, edited by Zoubin Ghahramani, Max Welling, Corinna Cortes, Neil D. Lawrence, and Kilian Q. Weinberger. Cambridge, MA, USA: MIT Press.

197.

Tay

Yi

Dehghani

Mostafa

Abnar

Samira

Shen

Yikang

Bahri

Dara

Pham

Philip

Rao

Jinfeng

Yang

Liu

Ruder

Sebastian

Metzler

Donald

. 2021. “Long Range Arena: A Benchmark for Efficient Transformers.” in 9th International Conference on Learning Representations (ICLR 2021), edited by Shakir Mohamed.

198.

Tenney

Ian

Das

Dipanjan

Pavlick

Ellie

. 2019a. “BERT Rediscovers the Classical NLP Pipeline.” Pp. 4593-601 in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, edited by Anna Korhonen, David Traum, and Lluís Màrquez. Stroudsburg, PA, USA: Association for Computational Linguistics.

199.

Tenney

Ian

Xia

Patrick

Chen

Berlin

Wang

Alex

Poliak

Adam

McCoy

R. Thomas

Kim

Najoung

Van Durme

Benjamin

Bowman

Sam

Das

Dipanjan

Pavlick

Ellie

. 2019b. “What Do You Learn From Context? Probing for Sentence Structure in Contextualized Word Representations.” in 7th International Conference on Learning Representations, ICLR 2019, edited by Tara Sainath.

200.

The Hugging Face Team. 2018. “PyTorch BERT Model.” Retrieved December 26, 2021 (https://github.com/huggingface/transformers/blob/v4.15.0/src/transformers/models/bert/modeling_bert.py).

201.

The Hugging Face Team. 2020a. “Preprocess.” Retrieved November 11, 2020 (https://huggingface.co/transformers/preprocessing.html).

202.

The Hugging Face Team. 2020b. “Summary of the Models.” Retrieved November 13, 2020 (https://huggingface.co/transformers/model_summary.html).

203.

The Hugging Face Team. 2020c. Tokenizer Summary. Retrieved November 19, 2020 (https://huggingface.co/transformers/tokenizer_summary.html).

204.

Theocharis

Yannis

Barberá

Pablo

Fazekas

Zoltán

Popa

Sebastian Adrian

Parnet

Olivier

. 2016. “A Bad Workman Blames His Tweets: The Consequences of Citizens’ Uncivil Twitter Use When Interacting With Party Candidates.” Journal of Communication 66 (6): 1007-31.

205.

Torch Contributors. 2021. Reproducibility. Retrieved December 31, 2021 (https://pytorch.org/docs/stable/notes/randomness.html).

206.

Turney

Peter D.

Pantel

Patrick

. 2010. “From Frequency to Meaning: Vector Space Models of Semantics.” Journal of Artificial Intelligence Research 37:141-88.

207.

Ushey

Kevin

Allaire

Joseph J.

Wickham

Hadley

Ritchie

Gary

. 2020. “rstudioapi: Safely Access the RStudio API.” Computer Software.

208.

Van Rossum

Guido

Drake

Fred L.

2009. Python 3 Reference Manual. Scotts Valley, CA: CreateSpace.

209.

Vapnik

Vladimir

. 1991. “Principles of Risk Minimization for Learning Theory.” Pp. 831-38 in Advances in Neural Information Processing Systems 4, edited by J. Moody, S. Hanson, and R. P. Lippmann. Morgan-Kaufmann.

210.

Vaswani

Ashish

Shazeer

Noam

Parmar

Niki

Uszkoreit

Jakob

Jones

Llion

Gomez

Aidan N

Kaiser

Lukasz

Polosukhin

Illia

. 2017. “Attention is All You Need.” Pp. 5998-6008 in Advances in Neural Information Processing Systems 30, edited by Isabelle Guyon, Ulrike v. Luxburg, Samy Bengio, Hanna Wallach, Rob Fergus, S. Vishwanathan, and Roman Garnett. Red Hook, NY, USA: Curran Associates, Inc.

211.

Wang

Alex

Pruksachatkun

Yada

Nangia

Nikita

Singh

Amanpreet

Michael

Julian

Hill

Felix

Levy

Omer

Bowman

Samuel

. 2019. “SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems.” Pp. 3266-80 in Advances in Neural Information Processing Systems 32, edited by Hanna Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d. Alché-Buc, Emily Fox, and Roman Garnett. Red Hook, NY, USA: Curran Associates, Inc.

212.

Wang

Sinong

Li

Belinda Z.

Khabsa

Madian

Fang

Han

Ma

Hao

. 2020. “Linformer: Self-Attention With Linear Complexity.” arXiv Preprint. arXiv:2006.04768.

213.

Wankmüller

Sandra

. 2022. “A Comparison of Approaches for Imbalanced Classification Problems in the Context of Retrieving Relevant Documents for an Analysis.” arXiv Preprint. arXiv:2205.01600.

214.

Wankmüller

Sandra

Heumann

Christian

. 2021. “How to Estimate Continuous Sentiments From Texts Using Binary Training Data.” Pp. 182-92 in Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021), edited by Kilian Evang, Laura Kallmeyer, Rainer Osswald, Jakub Waszczuk, and Torsten Zesch. Düsseldorf, Germany: KONVENS 2021 Organizers.

215.

Waskom, Michael and Team. 2020. “Seaborn.” Computer Software. https://zenodo.org/record/4379347.

216.

Watanabe

Kohei

. 2021. “Latent Semantic Scaling: A Semisupervised Text Analysis Technique for New Domains and Languages.” Communication Methods and Measures 15 (2): 81-102.

217.

Wei

Jason

Bosma

Maarten

Zhao

Vincent

Guu

Kelvin

Yu

Adams Wei

Lester

Brian

Du

Nan

Dai

Andrew M.

Le

Quoc V

. 2022. “Finetuned Language Models are Zero-Shot Learners.” in International Conference on Learning Representations (ICLR 2022), edited by Katja Hofmann and Alexander Rush.

218.

Welbers

Kasper

Van Atteveldt

Wouter

Benoit

Kenneth

. 2017. “Text Analysis in R.” Communication Methods and Measures 11 (4): 245-65.

219.

Wickham

Hadley

. 2019. “stringr: Simple, Consistent Wrappers for Common String Operations.” Computer Software.

220.

Widmann

Tobias

Wich

Maximilian

. 2022. “Creating and Comparing Dictionary, Word Embedding, and Transformer-Based Models to Measure Discrete Emotions in German Political Text.” Political Analysis: 1-16.

221.

Williams

Adina

Nangia

Nikita

Bowman

Samuel

. 2018. “A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference.” Pp. 1112-22 in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), edited by Marilyn Walker, Heng Ji, and Amanda Stent. Stroudsburg, PA, USA: Association for Computational Linguistics.

222.

Wolf

Thomas

Debut

Lysandre

Sanh

Victor

Chaumond

Julien

Delangue

Clement

Moi

Anthony

Cistac

Pierric

Rault

Tim

Louf

Rémi

Funtowicz

Morgan

Davison

Joe

Shleifer

Sam

von Platen

Patrick

Ma

Clara

Jernite

Yacine

Plu

Julien

Xu

Canwen

Scao

Teven Le

Gugger

Sylvain

Drame

Mariama

Lhoest

Quentin

Rush

Alexander M.

2020. “Hugging Face’s Transformers: State-of-The-Art Natural Language Processing.” arXiv Preprint. arXiv:1910.03771.

223.

Wu

Patrick Y.

Mebane

Walter R.

2021. “MARMOT: A Deep Learning Framework for Constructing Multimodal Representations for Vision-and-Language Tasks.” arXiv Preprint. arXiv:2109.11526.

224.

Wu

Yonghui

Schuster

Mike

Chen

Zhifeng

Le

Quoc V.

Norouzi

Mohammad

Macherey

Wolfgang

Krikun

Maxim

Cao

Yuan

Gao

Qin

Macherey

Klaus

Klingner

Jeff

Shah

Apurva

Johnson

Melvin

Liu

Xiaobing

Kaiser

Łukasz

Gouws

Stephan

Kato

Yoshikiyo

Kudo

Taku

Kazawa

Hideto

Stevens

Keith

Kurian

George

Patil

Nishant

Wang

Wei

Young

Cliff

Smith

Jason

Riesa

Jason

Rudnick

Alex

Vinyals

Oriol

Corrado

Greg

Hughes

Macduff

Dean

Jeffrey

. 2016. “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation.” arXiv Preprint. arXiv:1609.08144.

225.

xgboost Developers. 2020. “xgboost.XGBClassifier.” Retrieved November 23, 2020 (https://xgboost.readthedocs.io/en/latest/python/python_api.html).

226.

Xue

Linting

Constant

Noah

Roberts

Adam

Kale

Mihir

Al-Rfou

Rami

Siddhant

Aditya

Barua

Aditya

Raffel

Colin

. 2021. “mT5: A Massively Multilingual Pre-Trained Text-to-Text Transformer.” Pp. 483-98 in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, edited by Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, and Yichao Zhou. Stroudsburg, PA, USA: Association for Computational Linguistics.

227.

Yang

Zhilin

Dai

Zihang

Yang

Yiming

Carbonell

Jaime

Salakhutdinov

Ruslan

Le

Quoc V

. 2019. “XLNet: Generalized Autoregressive Pretraining for Language Understanding.” Pp. 5753-63 in Advances in Neural Information Processing Systems 32, edited by Hanna Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d. Alché-Buc, Emily Fox, and Roman Garnett. Red Hook, NY, USA: Curran Associates, Inc.

228.

Yin

Pengcheng

Neubig

Graham

Yih

Wen-tau

Riedel

Sebastian

. 2020. “TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data.” Pp. 8413-26 in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, edited by Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault. Online: Association for Computational Linguistics.

229.

Yin

Wenpeng

Hay

Jamaal

Roth

Dan

. 2019. “Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach.” Pp. 3914-23 in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), edited by Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan. Stroudsburg, PA, USA: Association for Computational Linguistics.

230.

Yosinski

Jason

Clune

Jeff

Bengio

Yoshua

Lipson

Hod

. 2014. “How Transferable are Features in Deep Neural Networks?” Pp. 3320-28 in Advances in Neural Information Processing Systems 27, edited by Zoubin Ghahramani, Max Welling, Corinna Cortes, Neil D. Lawrence, and Kilian Q. Weinberger. Red Hook, NY, USA: Curran Associates, Inc.

231.

Zaheer

Manzil

Guruganesh

Guru

Dubey

Kumar Avinava

Ainslie

Joshua

Alberti

Chris

Ontanon

Santiago

Pham

Philip

Ravula

Anirudh

Wang

Qifan

Yang

Li

Ahmed

Amr

. 2020. “Big Bird: Transformers for Longer Sequences.” Pp. 17283-297 in Advances in Neural Information Processing Systems 33, edited by H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin. Curran Associates, Inc.

232.

Zarrella

Guido

Marsh

Amy

. 2016. “MITRE at SemEval-2016 Task 6: Transfer Learning for Stance Detection.” Pp. 458-63 in Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), edited by Steven Bethard, Marine Carpuat, Daniel Cer, David Jurgens, Preslav Nakov, and Torsten Zesch. Stroudsburg, PA, USA: Association for Computational Linguistics.

233.

Zhang

Han

Pan

Jennifer

. 2019. “CASM: A Deep-Learning Approach for Identifying Collective Action Events With Text and Image Data from Social Media.” Sociological Methodology 49 (1): 1-57.

234.

Zhu

Yukun

Kiros

Ryan

Zemel

Rich

Salakhutdinov

Ruslan

Urtasun

Raquel

Torralba

Antonio

Fidler

Sanja

. 2015. “Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books.” Pp. 19-27 in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). New York, NY, USA: Institute of Electrical and Electronics Engineers (IEEE).

