Abstract
This study investigates the potential of custom generative AI models for evaluating product usability, comparing their assessments with those of human experts. A custom GPT model, trained on social robot usability feedback and Nielsen’s heuristics, demonstrated moderate agreement with human experts in quantitative evaluations. The AI model exhibited good test-retest reliability and provided clear, logical explanations for its scores, correctly identifying usability issues and offering suitable recommendations in many cases. While the results indicate AI’s promise as a tool for efficient usability assessment, limitations were noted in handling ambiguous feedback, suggesting a continued need for human oversight in complex scenarios. The study highlights the potential of AI to reduce resource demands in product development while maintaining adequate assessment quality.