Abstract
This paper examines the potential benefits and limitations of using Artificial Intelligence (AI) technology in qualitative research. Avidnote, an AI platform designed for research, is compared with manual analysis methods for theme generation. The findings show that Avidnote generates themes rapidly, offering significant savings in time and cost. However, both similarities and differences between researcher- and Avidnote-generated themes were observed, raising concerns about the internal validity of AI-generated themes compared with traditional manual analysis. Avidnote can be a valuable supplementary tool in the analysis phase of research, as it has the potential to increase efficiency, reduce bias, reveal subtle themes, and help researchers compare and contrast themes. However, fundamental concerns persist regarding the robustness, generalisability, credibility, reliability and trustworthiness of qualitative research conducted with AI technologies. Ethical considerations, such as data security and privacy, also need to be addressed in research settings. AI platforms should not be considered a substitute for critical thinking and personal interpretation, as these are uniquely human skills. Researchers must retain their fundamental role in determining research objectives and interpreting qualitative data to ensure methodological rigour. Future iterations of Avidnote are likely to address current challenges as AI continues to advance. Further research is recommended to assess AI tools tailored for research purposes.
Introduction
Technology has rapidly transformed many aspects of our lives in the digital age, and one recent development is the growing number of generative AI programs across many domains. The potential impact of AI technology on different sectors of the labour force has been extensively debated (Gordon, 2023). Many experts have predicted that low-skilled industries, including manufacturing, retail and services, would be significantly affected (Harari, 2019). However, others (Hamilton et al., 2023) argue that AI, especially ChatGPT, has brought a fresh perspective to these conversations, and now question whether even highly skilled positions involving intricate tasks could eventually be replaced.
AI is also gaining traction as a valuable instrument in other domains, such as research (Longo, 2020). While quantitative research is useful for understanding the frequency or quantity of certain behaviours and lends itself to AI analysis, qualitative research is preferred when seeking to understand the origins and dynamics of our social environment (Burrell & Gross, 2017). Qualitative research uses rich, detailed data obtained from in-depth interviews, focus groups and similar methods. These data must be carefully analysed to identify patterns and emerging themes, a task traditionally carried out manually. This involves a team of researchers reading, re-reading and coding interview transcripts line by line; the codes are then compared and grouped to create themes. This process is slow, repetitive and susceptible to claims of researcher bias (Galdas, 2017), so it may be desirable to speed up certain aspects of the analysis using artificial intelligence.
In qualitative research, ‘bias’ refers to any factor that distorts the findings of a study (Florczak, 2021). Qualitative research is a methodical approach to understanding perceptions, experiences and behaviours through the collection of non-numerical data, often via interviews, focus groups or observations. While it provides rich, detailed insights, it can also be influenced by biases that skew the results (Florczak, 2021).
AI technologies present potential solutions for streamlining and expediting research processes. For example, the public availability of programs such as ChatGPT, a sophisticated chatbot developed by OpenAI and based on a Large Language Model (LLM)¹, has prompted significant interest in the implications of automated data analysis (ChatGPT, 2023). Although ChatGPT has significantly influenced various domains, including research, academia, media, journalism and health science (Hamilton et al., 2023), it is important to acknowledge that the information it generates through machine learning algorithms is sometimes unreliable, inaccurate or even fabricated (Goto & Katanoda, 2023). UNESCO (2023) provides global guidance on generative AI (GenAI) in education and research to help countries develop policies that protect data security and privacy and enable the ethical use of AI; nevertheless, ChatGPT raises a broad range of ethical issues that need to be systematically addressed (Stahl & Eke, 2024). The European Commission (2024) has also issued guidelines on the use of generative AI in academic and scientific publishing (Directorate-General for Research and Innovation, 2024). These emphasise the need for transparency when researchers use AI tools, recommending that they disclose the specific tools used, how they were applied, and any limitations or biases associated with them. Researchers are also urged to protect sensitive information and intellectual property when interacting with AI systems, ensuring that unpublished work is not exposed to potential misuse. This is a particular concern with ChatGPT: it is a publicly accessible platform and, depending on user settings, submitted content may be used to train its models, placing unpublished material at risk of misuse. Despite these limitations, for many people ChatGPT has become synonymous with artificial intelligence, serving as the primary reference point for AI tools in contemporary discourse.
However, AI platforms such as Avidnote, which have robust data privacy and security features, are available but less well known. Avidnote is a sophisticated AI-powered research tool developed by researchers and regulated by the EU General Data Protection Regulation (GDPR) in its use of Large Language Models (LLMs). In contrast to ChatGPT, data uploaded to the platform remain private: they are not used to train the AI algorithm, and the platform does not take ownership of any data. Moreover, it is designed specifically to analyse, write and organise research work (Chalmers, 2021), with features including note-taking, summarising relevant literature, proofreading, editing, transcribing, coding, research paper structuring and theme generation. It brings the steps of the research process together in a single environment, using different AI templates to expedite various research tasks and produce research outcomes (Stapleton, 2023). The interface is user-friendly and includes systematic step-by-step guidelines (Eldin, 2023), and the platform has gained popularity and endorsement from academics and internationally prestigious institutions (Chalmers, 2021). However, like other AI apps, Avidnote may not align with the unique requirements of all research types, so manual adjustments may still be necessary for research projects that are unique and context-specific (Eldin, 2023).
This paper aims to evaluate the effectiveness and potential utility of using AI technology in qualitative research. To achieve this, we conducted a comparative analysis of themes generated by traditional manual methods and those generated by Avidnote. While we hypothesise that manual methods may allow for a more nuanced understanding of complex qualitative material, especially in relation to culture-specific issues, we also acknowledge the possibility that AI could offer alternative insights that are less susceptible to researcher bias.
Methods
Study Design
This study aims to compare qualitative themes developed by members of the research team with those generated by Avidnote. The focus of this paper is not on reporting participants’ experiences but rather on an examination and comparison of the methodologies used to assess these. For this study, a premium version of Avidnote was used (Chalmers, 2021).
Context
The research team had been working on a qualitative study examining dual relationship challenges faced by Muslim professionals providing support in their own community following a terror attack (Sulaiman-Hill et al., 2021). In-depth interviews were conducted with 20 participants, selected using purposive sampling from a population of Muslim professionals working in government agencies, support providers and academic institutions. Participants were provided with detailed information about the study and their written consent was obtained prior to data collection. The study received ethics approval from the University of Otago Human Ethics Committee (Health), 22/153.
Prior to commencing interviews with the main participants, preliminary interviews were conducted with two members of the research team, designated DR01 and DR03. This pilot phase allowed us to refine our interview process and address any potential issues. These individuals had differing experiences in the aftermath of the attacks and voluntarily consented to be interviewed about their perspectives. Conducting these interviews before recruiting other participants was a strategic decision aimed at preserving the integrity of their insights and thereby minimising the risk of contamination from subsequent interviews. The data collected from these initial interviews were included in the main analysis, contributing valuable insights to our findings.
During the analysis phase, the team discussed the potential applications of AI technology in research. These two participants expressed keen interest in using their own interview transcripts for a comparative analysis of manual and AI-based theme generation. Both willingly provided additional written consent for their anonymised transcripts to be used for this purpose.
Data Collection
Interviews were conducted individually, in English, following a semi-structured interview guide (Appendix A). Questions were formulated to be concise and straightforward, giving participants the opportunity to share reflective anecdotes. Interviews typically lasted 30–40 minutes.
Data Analysis - Manual
The qualitative analysis in this study followed a two-part procedure. In the first phase, a traditional analysis was conducted using Reflexive Thematic Analysis (RTA) (Figure 1).
Figure 1: A six-phase process for Reflexive Thematic Analysis, adapted from Braun and Clarke (2022).
RTA is a readily accessible and theoretically adaptable interpretive methodology for analysing qualitative data. It enables researchers to discern and examine recurring patterns or themes within a specific dataset (Braun & Clarke, 2019). The research team, consisting of two research fellows, an assistant research fellow/PhD student and a research assistant, are all members of the Muslim community and have been working on studies related to the mosque terrorist attacks. They conducted the in-depth interviews with participants, and each team member independently listened to the audio files. The recordings were then transcribed manually by an independent audio-typist. The team read and re-read the transcripts and took preliminary notes, then collectively discussed them to identify relevant and significant quotes. Specific codes were used to categorise the data, helping to streamline and encapsulate the core themes and shared interpretations. Codes that were ambiguous or lacked relevance owing to their infrequent occurrence within the data were removed. After extensive review, the remaining codes evolved into distinct themes representing the participants’ lived experiences.
To ensure the credibility, dependability, confirmability and transferability of our data analysis and interpretation, the following measures were implemented. Trustworthiness was established through member checking (where participants review findings for accuracy) and prolonged engagement (sufficient time spent within the research setting). To establish credibility, the four members of the research team collaboratively reviewed the coded data extracts, ensuring that themes and subthemes were derived from the data in a meaningful and coherent manner. All themes were then scrutinised and validated during team meetings before being finalised (Braun & Clarke, 2006). This manual process was very time-consuming and therefore expensive. Transcription, for example, took the typist several days per interview; initial reading and coding required at least 1–2 hours per interview; and team meetings to discuss coding and theme development took up to 4 hours each.
Data Analysis – Artificial Intelligence
In the next phase, Avidnote was used to generate themes from the interview transcripts of the two participants. The raw transcripts of interviews DR01 and DR03 were used for each analysis. To examine the internal validity of the themes generated from the transcripts, the Avidnote analysis was conducted twice: initially in February and subsequently in October 2024. This repetition allowed a more robust examination of the consistency and reliability of the identified themes. The ‘Code interview – you define how – Reflexive thematic analysis’ option was used for this.
Recent software updates to Avidnote also provide further analysis options. These include Analyse data, Analyse interview, Code interview (grounded theory), Code interview – you define how, Narrative analysis, Analyse data with framework, and Analyse interviews (multiple). The raw transcript of DR03 was used to assess the consistency of results across several of these options:
• Code interview – you define how – Reflexive thematic analysis
• Analyse data with framework – Reflexive thematic analysis
• Insights – using the Analyse data option
• Analyse interview
• Narrative analysis
Results
Comparison of Researcher and Avidnote-Generated Themes for DR01.
Similarly, nine thematic categories were identified from the transcript of DR03. Avidnote was then prompted to derive pertinent themes using RTA. The initial analysis produced seven themes and the subsequent analysis five (Table 2).
Although there were similarities between researcher- and AI-generated themes, significant differences were also noted. Supervision, for example, was a key theme identified by the team for DR01; however, it was not identified by Avidnote.
Comparison of Researcher and Avidnote-Generated Themes for DR03.
A recent software update now allows multiple interviews to be analysed concurrently, as illustrated in Figure 2, which presents the comparative analytical outcomes for DR01 and DR03. While Avidnote successfully generated relevant themes, its theme titles did not fully capture the richness and uniqueness of the interview data. Nevertheless, the combined analysis provided a more comprehensive description of some key issues, and Avidnote also identified additional themes present in the data.
Text box: Combined analysis provided by Avidnote for DR01 and DR03.
Comparison of different options for Avidnote data analysis for DR03.
Discussion
Avidnote demonstrated the ability to rapidly generate themes, offering potential time and cost savings in qualitative data analysis. This is a significant advantage, allowing researchers to analyse large amounts of data within a shorter timeframe. AI models of this kind generate themes quickly by using algorithms to explore interconnected data and identify similar themes. However, AI-generated themes may lack internal validity compared with those produced through the rigorous manual process. The use of Avidnote for qualitative theme generation raises critical questions about the consistency and reliability of the thematic analysis process. In the initial analysis conducted in February, seven distinct themes were identified for each interview transcript. However, when the analysis was repeated in October, the algorithm yielded only five themes per transcript. This discrepancy raises questions about the underlying algorithms used by Avidnote, specifically whether they are calibrated to produce a predetermined number of themes, which could limit the richness and depth of the qualitative analysis. It therefore remains unclear whether the AI’s thematic output is inherently constrained by its design, potentially compromising the internal validity of qualitative findings derived from this tool.
Previous research (Phillips et al., 2017) suggests that posing higher-level questions is essential for critical thinking and for obtaining a comprehensive understanding of a topic. Unlike manual analysis, in which themes are derived directly from the interviews, the quality of results obtained from some AI tools depends primarily on the specific questions asked of them. This is less of an issue for Avidnote, which uses an algorithm-driven approach that removes the need for users to write complex prompts, making it more consistent, user-friendly and effective. As mentioned, recent software updates have expanded the analysis options, providing more nuanced and responsive results. The results shown in Table 3 demonstrate some consistency in theme generation, with some analysis options also providing relevant quotations and additional commentary. However, one concern is potential data overload and a lack of overall consistency, as each new iteration of the analysis produces an overwhelming choice of similar but slightly different results.
Researchers now have the option of incorporating AI as a supplementary tool in their analysis. However, although AI can rapidly generate themes, caution is needed to prevent over-reliance on it, as critical thinking and personal interpretation are indispensable skills possessed only by human researchers. AI offers significant benefits in speed and cost-effectiveness compared with manual processes. It can also supplement human analysis by comparing and contrasting themes or identifying sub-themes that may have been overlooked. In our analysis, Avidnote identified qualitative themes that differed from those generated through manual coding. For example, Avidnote highlighted a need for better training and emotional support systems, exposing a critical gap in resources for professionals working in such sensitive community contexts. Although we classified ‘supervision’ as a theme, Avidnote identified more specific and targeted components under this theme, prompting us to revisit our initial analysis. However, the narrower focus of the Avidnote theme also failed to capture other important aspects of supervision, including the culturally informed guidance highlighted by the manual process. Thus, the two methods can be considered complementary when finalising themes.
However, concerns remain regarding essential aspects of qualitative research, such as rigour, transferability, credibility, ethical procedures and trustworthiness, which are needed to preserve the foundational ontological value of qualitative research. Because social science research is based on data from lived experiences, AI platforms such as Avidnote may fail to grasp the nuances of a given context, leading to potentially inappropriate responses or advice (Gibson & Beattie, 2024). Artificial intelligence is a machine-based system devoid of human qualities, which means it inherently lacks personal values and beliefs; consequently, it may not fully capture the deeply held values of individual research participants. Nevertheless, it is also plausible that AI tools may exhibit greater cultural awareness than some researchers, particularly those unaware of nuanced cultural differences.
To incorporate AI effectively as a supplementary research tool, it is important to establish boundaries and provide a rationale for its inclusion in the research methodology. This rationale should explain how use of the AI model contributes to achieving the research aims and objectives and to the theoretical understanding of the chosen topic or phenomenon.
Despite the insights gained in this study, several limitations need to be acknowledged. Firstly, the comparison of theme generation between human coders and AI was based on an analysis of qualitative data from only two interviews. While this allowed for a focused comparison, it limits the generalisability of findings. Future research could expand this comparison using a larger sample to explore the consistency of themes generated and provide a more comprehensive evaluation of the similarities and differences.
In this study, only Avidnote was assessed. Evaluating the effectiveness and suitability of a range of AI tools specifically tailored for research would provide a more comprehensive understanding of the strengths and limitations of different platforms for theme generation. Moreover, our research had a narrow, specific focus, so future research could explore the potential of AI to generate themes in diverse contexts and research fields. AI applications are advancing rapidly, and ongoing technological developments are likely to address some of the challenges identified in this study.
Conclusion
The rapid proliferation of AI technology has transformed many aspects of life, although its use in research settings is still in its early stages. This study aimed to evaluate the effectiveness, potential utility and viability of using AI in qualitative research by comparing themes generated through manual methods with those generated by Avidnote. AI tools such as Avidnote have some potential to streamline qualitative data analysis by rapidly generating themes, potentially saving substantial time. However, they should be considered a complementary component of the research process. AI-generated themes should be cross-checked against manually derived themes to ensure that the basic criteria of rigour, credibility and trustworthiness are maintained. This highlights the continued importance of human involvement in the research process. Ethical issues, including the limitations and potential risks of these tools, must also be considered, and researchers should carefully select and justify their use of particular AI systems, taking data security and privacy concerns into account.
As a supplementary tool, AI can effectively complement the expertise and critical thinking skills of human researchers across various academic domains. By increasing efficiency, reducing human bias, processing large volumes of data and uncovering subtle themes, it has the potential to enhance qualitative research. However, it is crucial to acknowledge that further research and continued technological advancements are necessary to fully leverage its potential in qualitative research. While it is useful as an adjunct in academic research, researcher input currently remains essential for ensuring accurate and reliable outcomes.
Footnotes
Acknowledgements
We sincerely thank Prof. Richard Porter, Prof. Caroline Bell, Prof. Philip Schluter and Dr. Ben Beaglehole for their invaluable support and critical feedback.
Ethical Approval
Ethical approval for that study was received from the University of Otago Human Ethics Committee (Health), 22/153.
Author contributions
S M Akramul Kabir and Ruqayya Sulaiman-Hill drafted the paper. All authors modified the paper as a team. Interviews for data collection were conducted by Ruqayya Sulaiman-Hill, Fareeha Ali, S M Akramul Kabir and Rana Lotfy Ahmed. All authors contributed to the analysis of interview data through discussions in meetings.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research project received no additional funding. The Dual relationship study from which the two interview transcripts were drawn was funded by a grant from the Canterbury Medical Research Foundation (Sulaiman-Hill) MPG 2022.
Conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Note
Appendix A
Semi-structured Question Prompts for Interviews.
