Abstract
Performance appraisal narratives are qualitative descriptions of employee job performance. This data source has seen increased research attention because natural language processing (NLP) allows insights to be derived from it efficiently. The current study details the development of NLP scoring for performance dimensions from narrative text and then investigates validity and generalizability evidence for those scores. Specifically, narrative valence scores were created to measure a priori performance dimensions. These scores were derived using bag-of-words and word-embedding features and then modeled using modern prediction algorithms. Construct validity evidence was investigated across three samples, revealing that the scores converged with independent human ratings of the text, aligned with numerical performance ratings made during the appraisal, and demonstrated some degree of discriminant validity. However, construct validity evidence differed depending on which NLP algorithm was used to derive the scores. In addition, valence scores generalized to both downward and upward rating contexts. Finally, the performance valence algorithms generalized better in contexts that used the same qualitative survey design than in contexts where different instructions were given to elicit the narrative text.
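The kind of pipeline the abstract describes, in which bag-of-words features from narrative text are modeled against numerical performance ratings, can be sketched as follows. This is a minimal, hypothetical illustration: the narratives, vocabulary, and the plain least-squares model are stand-ins, not the study's actual data, features (which also include word embeddings), or prediction algorithms.

```python
# Hypothetical sketch: bag-of-words features from appraisal narratives,
# fit with a simple linear model against numerical ratings. All data and
# names are illustrative; the study's actual pipeline also used word
# embeddings and modern prediction algorithms.
from collections import Counter

def bag_of_words(text, vocab):
    """Map a narrative to a count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

# Toy training data: (narrative, numerical performance rating) pairs.
narratives = [
    ("consistently exceeds goals and supports the team", 5.0),
    ("meets expectations but needs clearer communication", 3.0),
    ("missed deadlines and needs improvement", 1.5),
]
vocab = sorted({w for text, _ in narratives for w in text.lower().split()})
X = [bag_of_words(text, vocab) for text, _ in narratives]
y = [rating for _, rating in narratives]

# Plain least-squares fit via stochastic gradient descent (a stand-in
# for the prediction algorithms used in the study).
w = [0.0] * len(vocab)
lr = 0.01
for _ in range(2000):
    for xi, yi in zip(X, y):
        err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
        w = [wj - lr * err * xj for wj, xj in zip(w, xi)]

def valence_score(text):
    """Predicted valence for a new narrative under the fitted model."""
    x = bag_of_words(text, vocab)
    return sum(wj * xj for wj, xj in zip(w, x))
```

In practice each performance dimension would get its own trained model, and validity would be assessed by comparing `valence_score` outputs against independent human ratings, as the abstract describes.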