Abstract
This study investigates the potential of custom generative AI models for evaluating product usability, comparing their assessments with those of human experts. A custom GPT model, trained on social robot usability feedback and Nielsen’s heuristics, demonstrated moderate agreement with human experts in quantitative evaluations. The AI model exhibited good test-retest reliability and provided clear, logical explanations for its scores, correctly identifying usability issues and offering suitable recommendations in many cases. While the results indicate AI’s promise as a tool for efficient usability assessment, limitations were noted in handling ambiguous feedback, suggesting a continued need for human oversight in complex scenarios. The study highlights the potential of AI to reduce resource demands in product development while maintaining adequate assessment quality.