Abstract
The present study developed and validated a multimodal self-esteem recognition method based on a self-introduction task, with the goal of automating self-esteem assessment. We recruited two independent samples of undergraduate students (N = 211 and N = 63) and collected 40-second self-introduction videos along with Rosenberg Self-Esteem Scale (RSES) scores. Features were extracted from three modalities (visual, audio, and text), and three-class models were trained on the dataset of 211 participants. Results indicated that the late-fusion multimodal model achieved the highest performance (Accuracy, ACC = 0.447 ± 0.019; Macro-averaged F1, Macro-F1 = 0.438 ± 0.020) and demonstrated cross-sample generalizability when validated on the independent sample of 63 participants (ACC = 0.381, Macro-F1 = 0.379). Reliability testing showed good interrater consistency (Fleiss’ κ = 0.723; Intraclass Correlation Coefficient, ICC = 0.745). Criterion-related validity analyses indicated that scores from the proposed method were significantly correlated with life satisfaction, subjective happiness, positive and negative affect, depression, anxiety, stress, relational self-esteem, and collective self-esteem. Moreover, incremental validity analyses indicated that the multimodal model provided additional predictive value for positive affect beyond the RSES. Taken together, these findings provide preliminary evidence that multimodal behavioral features can support automated self-esteem evaluation, offering a feasible, low-burden complement to traditional self-report measures.
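As context for the late-fusion result and the reported metrics, the sketch below illustrates one common form of late fusion: per-modality classifiers output class probabilities that are averaged before the argmax, and predictions are scored with accuracy and macro-averaged F1. This is a minimal Python/scikit-learn sketch under those assumptions, not the authors' implementation; the equal-weight averaging, the function names, and the `predict_proba` interface are illustrative.

```python
# Illustrative late-fusion sketch (not the authors' code).
# Assumes each modality-specific classifier exposes predict_proba(),
# as scikit-learn estimators do; probabilities are averaged across modalities.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def late_fusion_predict(modality_models, modality_features):
    """Average class probabilities from per-modality classifiers."""
    probs = [model.predict_proba(X)          # shape: (n_samples, 3 classes)
             for model, X in zip(modality_models, modality_features)]
    fused = np.mean(probs, axis=0)           # equal-weight fusion (assumption)
    return fused.argmax(axis=1)              # three-class predictions

def evaluate(y_true, y_pred):
    """Metrics matching those reported in the abstract (ACC, Macro-F1)."""
    return {
        "ACC": accuracy_score(y_true, y_pred),
        "Macro-F1": f1_score(y_true, y_pred, average="macro"),
    }
```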
