Abstract
Many lines of human factors inquiry rely on dialogue data to examine team dynamics, coordination, and trust in automation. While embeddings enable the transformation of such qualitative information into mathematical representations, they can be challenging to explore—a vector of hundreds or thousands of numbers lacks simple interpretation. To address this, we introduce an interactive interface using RShiny, facilitating the exploration of large datasets containing rich affective context, including lexical and acoustic information linked to experimental outcomes. Motivated by research quantifying trust in automation during the NASA HERA Study, we aim to pinpoint crucial conversation moments to better understand how social dynamics influence individuals’ development of trust in automation. We anticipate that our method will enhance construct transparency for researchers studying dialogue data.
Keywords
Objectives
Microsoft Research’s Future of Work Study (Microsoft, 2023) highlights that digital knowledge is moving from documents to dialogues. “Knowledge is no longer only embedded in documents, spreadsheets, and text—it is now embedded in conversation and can be served up dynamically through that same medium.” Information that was formerly “lost” in ephemeral conversations has transitioned to knowledge that can be referenced and shared.
The sheer volume of information contained in naturalistic dialogue data pushes the boundaries of traditional quantitative analysis. This demonstration, using a novel data exploration tool, tackles the challenge of cleaning massive amounts of naturalistic data to retrieve lexical and acoustic information.
Our research goal is to identify key points in conversations to better understand team dynamics facilitating trust in automation. By developing a reproducible, interactive interface in RShiny, we present a method for exploring massive datasets containing affective indicators linked to dialogue and outcome data.
Quantitative sentiment analysis utilizes text or audio embeddings. The brilliance of embeddings lies in the ability to map qualitative information to a vector space. In a vector space, one can perform standard mathematical operations and derive quantifiable information. Yet, like other machine learning techniques, embeddings trade the easy interpretation of dialogue for the power of quantitative analysis.
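As an illustration of what operating in this vector space looks like, the following minimal R sketch computes the cosine similarity between two utterance embeddings; the vectors here are random placeholders rather than embeddings from any real encoder or from our data.

```r
# Minimal sketch: two placeholder "utterance embeddings" as numeric vectors.
# Real embeddings would come from a text or audio encoder.
set.seed(42)
emb_a <- rnorm(384)   # e.g., a 384-dimensional sentence embedding
emb_b <- rnorm(384)

# Cosine similarity: a standard vector-space operation on embeddings
cosine_similarity <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

cosine_similarity(emb_a, emb_b)  # values closer to 1 indicate similar meaning
```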
Moving from numerically measured data to naturalistic conversational databases pushes researchers (hereafter used interchangeably with the word “analysts”) to develop new skills and curiosities in the data processing stage. Existing data analysis techniques, such as summary statistics, are challenged on data where the mapping from number to qualitative meaning is not in the practitioner’s immediate reference. The vector of hundreds or thousands of numbers that constitutes an embedding lacks an easy interpretation, but embeddings make it possible to represent many hours of dialogue concisely.
Many human factors research questions can be answered using dialogue data, especially with respect to team and system dynamics, shared knowledge, trust in automation, and organizational coordination. In such contexts, affective insights such as tone of voice and sentiment are key factors that drive outcomes of interest. Determining whether a topic was discussed traditionally requires keyword or sequence indicators in the data, or laborious manual coding. Both require knowing, ex ante, what to look for. For qualitative researchers, developing the codebook can require months of effort and discussion.
Motivated by overarching work studying trust in teams with the NASA HERA Mission, we develop a reproducible, open-source tool that allows a practitioner to quickly sift through naturalistic dialogue data to identify critical events in conversation. We hope that such a tool can be adopted by others studying affect in dialogue, to increase construct transparency between researchers and data.
Approach
Continuous data were recorded via microphone from teams of participants placed in a 45-day simulation of long-duration space exploration. Various tasks were assigned with the aid of a virtual assistant.
During the study, participants wore microphones to continuously capture audio data. Subsequently, the audio recordings were transcribed using whisperX software (Bain et al., 2023). Acoustic features, such as formants and fundamental frequency contours, were extracted using a variety of R’s open-source packages such as tuneR (Ligges et al., 2023) and seewave (Sueur et al., 2008).
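The snippet below is a minimal sketch of this kind of acoustic extraction with tuneR and seewave; the file name and parameter values are illustrative rather than the study's actual pipeline, and formant tracking would require an additional package (e.g., wrassp or phonTools).

```r
library(tuneR)
library(seewave)

# Hypothetical single-utterance clip; in practice, clips come from the
# transcribed, time-aligned recordings.
wav <- readWave("utterance_001.wav")
sr  <- wav@samp.rate

# Fundamental frequency (F0) contour across the utterance
f0 <- fund(wav, f = sr, fmax = 400, threshold = 5, ovlp = 50, plot = FALSE)

# Summary spectral properties (centroid, skewness, etc.) from the mean spectrum
spec  <- meanspec(wav, f = sr, plot = FALSE)
props <- specprop(spec, f = sr)

head(f0)   # time (s) and F0 (kHz) per analysis window
```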
Using RShiny, we create an interface showing a 2-dimensional Uniform Manifold Approximation and Projection (UMAP) visualization of utterances exchanged in pairwise interactions. UMAP constructs a high-dimensional graph representation of the data, where every data point is connected to its nearest neighbors (McInnes et al., 2018). It then optimizes a low-dimensional embedding of the data points such that the low-dimensional representation preserves local and global structures of the high-dimensional space. UMAP minimizes the distance between connected data points in the low-dimensional space, rendering it useful for tasks like visualizing data that are similar in meaning (Dorrity et al., 2020).
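A minimal sketch of this projection step, assuming the uwot package for UMAP; the embedding matrix is a random placeholder standing in for utterance-level embeddings.

```r
library(uwot)
library(ggplot2)

set.seed(1)
# Placeholder: 500 utterances, each with a 384-dimensional embedding
emb_matrix <- matrix(rnorm(500 * 384), nrow = 500)

# Project to two dimensions with UMAP
proj <- umap(emb_matrix, n_neighbors = 15, min_dist = 0.1)

df <- data.frame(UMAP1 = proj[, 1], UMAP2 = proj[, 2])
ggplot(df, aes(UMAP1, UMAP2)) +
  geom_point(alpha = 0.5) +
  labs(title = "Utterance embeddings projected with UMAP")
```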
Findings
This interface shows how tens of thousands of hours of audio data can be represented and plotted to reveal topics. In the lower-dimensional space, patterns and relationships are easily discernible. In our approach, this showcases how utterance-level text embeddings compare while simultaneously surfacing qualitative information from the utterances.
By incorporating clusters and sina plots to represent acoustic and lexical details, analysts can quickly peruse points on the graph. Mousing over a point displays the corresponding text in the interface, giving the analyst access to the underlying transcription and microphone information at a given time. We link this to a display of the mission, role, and experimental conditions experienced by individuals during the selected period.
A timeline selector allows us to view windows of lexical and acoustic information at any point in the conversation, across time. This feature enables us to identify critical points in a conversation that may drive or deter trust formation, whether through social dynamics or direct discussion. It offers the ability to summarize the entirety of many conversations within a single space, while still maintaining resolution if an in-depth investigation is necessary. This reduces the cognitive demand associated with switching between multiple interfaces and eliminates the need to manually sift through transcriptions.
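A minimal sketch of these two interface elements, assuming shiny and plotly: a time-window slider filters utterances, and hovering a point reveals its transcript. The data frame is a toy placeholder for the linked UMAP coordinates, timestamps, and transcriptions described above.

```r
library(shiny)
library(plotly)

# Placeholder data: UMAP coordinates, timestamps, and transcripts per utterance
utterances <- data.frame(
  UMAP1      = rnorm(200),
  UMAP2      = rnorm(200),
  time_min   = sort(runif(200, 0, 120)),         # minutes into the session
  transcript = paste("utterance", seq_len(200))  # placeholder text
)

ui <- fluidPage(
  sliderInput("window", "Time window (minutes):",
              min = 0, max = 120, value = c(0, 120)),
  plotlyOutput("umap_plot")
)

server <- function(input, output) {
  output$umap_plot <- renderPlotly({
    shown <- subset(utterances,
                    time_min >= input$window[1] & time_min <= input$window[2])
    # Hovering a point shows its transcript
    plot_ly(shown, x = ~UMAP1, y = ~UMAP2,
            type = "scatter", mode = "markers",
            text = ~transcript, hoverinfo = "text")
  })
}

shinyApp(ui, server)
```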
Takeaways
Our findings highlight a user-friendly, reproducible tool for making sense of granular naturalistic datasets. This way of presenting dialogue data notably improves how researchers understand and evaluate qualitative data. Our interface addresses the challenge of interpreting embeddings, thereby facilitating the visualization of affective data and aiding in the identification of critical events within conversational databases.
We developed this approach to support a larger agenda: to understand the data and determine which data were relevant as we studied lexical and acoustic information describing environments that foster or deter trust formation. Abstracting beyond our research agenda, the ability to immediately combine, decompose, access, and interpret dialogue data through lexical and acoustic embeddings is widely useful. Critically, it is easily adaptable to other datasets. The fundamental advantage of this approach is that it circumvents the information loss associated with reducing language to numbers. This saves time and promotes familiarity between the analyst and the data.
By making exploration easy, the tool lets researchers efficiently work through large volumes of lexical and acoustic data. Moreover, the researcher can focus on the constructs underlying the data, leading to richer insights from dialogue.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by NASA Human Research Program No.80NSSC19K0654.
