Abstract
Effective consumer centred healthcare incorporates consumer and clinician perspectives into decision making, in addition to traditional quantitative measures. This information is usually captured in qualitative data that requires manual analysis. Healthcare systems often lack resources to systematically incorporate qualitative feedback into decision making. Semi-automated content analysis tools, such as Leximancer, provide an efficient and objective alternative to time consuming manual content analysis (MCA). Literature on the validity of Leximancer in healthcare is sparse. This study seeks to validate Leximancer against MCA on a broad emotive conversational dataset gathered in a healthcare setting. At the outset of the COVID-19 pandemic, a large Australian hospital and health service conducted interactive webcasts with staff to provide updates and answer questions. A manual thematic analysis and a Leximancer content analysis were conducted independently on 20 webcast transcripts. The findings were compared, along with the time required to the complete each analysis. The Leximancer analysis identified nine concepts, while the manual analysis identified 12 concepts. The Leximancer concepts mapped to five of the concepts identified in the manual analysis, which accounted for 74% of mentions tagged in the text through the manual analysis. Leximancer missed concepts which required an emotional or contextual interpretation. The Leximancer analysis took 21 hours (excluding time to learn the program), compared to 73 hours for the manual analysis. Semi-automated content analysis provides an efficient alternative to manual qualitative data analysis, shifting it from a small-scale research activity to a more routine operational activity, albeit with some limitations. This is critical to be able to utilise at scale the rich narratives from consumers and clinicians in healthcare decision making.
Introduction
Healthcare has traditionally been monitored using quantitative data. This has been effective to date, however this bias to quantitative data did result in the opinions, feelings and perspectives of consumers and clinicians, often recorded as rich qualitative data, being excluded from healthcare decision making (Rusinová et al., 2009). As a result, healthcare decision making has traditionally been centred around easy to measure quantitative data such as activity and finance.
This approach is no longer fit for purpose, and the rise of consumer centred healthcare, has forced the collection and analysis of qualitative data in order to capture the consumer and clinician perspective (Pope et al., 2000; Al-Busaidi, 2008). There has been a rapid rise in enterprise applications for healthcare systems to record narrative comments from consumers, with the expectation being that staff at the healthcare provider will manually read all the narrative, synthesise these data manually and respond in a timely manner (Rozenblum et al., 2013). Although collection of qualitative data is certainly occurring at scale, this is a recent development, and analysis has been limited to individual research projects. Few healthcare providers are currently resourced to promptly manually read and code the qualitative data as an operational activity at scale to develop actionable insights (Gleeson et al., 2016).
This rise in the volume of qualitative data being collected and the resultant increased popularity of qualitative data analysis methods, combined with advances in artificial intelligence have resulted in increased utilisation of computational tools in qualitative analysis that automatically or semi-automatically analyse and categorise large bodies of text (Krippendorff, 2019; Nelson, 2020). These semi-automated content analysis (SACA) tools can analyse larger volumes of data more efficiently than manual content analysis (MCA), reduce the human bias and subjectivity inherent in MCA and improve reproducibility (Grimmer & Stewart, 2013). SACA tools generally follow three steps, similar to MCA, the first of which is identifying initial terms which will be used to classify the text. Next, SACA tools define concepts, by creating a thesaurus for each initial term. Finally, the text is classified according to the concepts defined (Nunez-Mir et al., 2016).
Leximancer is one SACA tool which has been increasingly used in research. Leximancer uses machine learning to quickly analyse large bodies of text data and identify the salient points. Leximancer uses takes a completely unsupervised approach, meaning that it does not utilise any pre-defined dictionaries or datasets to understand the meaning of words and which words are similar (Smith & Humphreys, 2006). Rather it infers similarity of meaning and context by detecting patterns of how words travel together throughout each piece of text and provided by the user (Smith & Humphreys, 2006). This approach of being completely grounded in the text means that Leximancer can be used to analyse text in any language (Smith & Humphreys, 2006). Leximancer provides a graphical user interface which allows users to upload a file containing a large amount of text. Leximancer’s machine learning algorithms will analyse the data, and identify concepts from words that travel together throughout a body of text and group them into related themes (Smith & Humphreys, 2006). The software also allows users the option to specify concepts of interest to perform a deductive analysis (Smith & Humphreys, 2006). The output can be refined by tweaking other user settings such as combining concepts, removing filler words, etc. (Smith & Humphreys, 2006). The combination of objective approach, speed and customisability offered by Leximancer makes it a strong SACA tool.
There is a paucity of studies examining the validity of SACA, especially in healthcare. We identified only six studies that investigated the validity of Leximancer findings by comparing its output to findings from a MCA on the same dataset (Laura & Jameson, 2020; Harwood et al., 2015; Wilk et al., 2019; Sotiriadou et al., 2014; Penn-Edwards, 2010; Maccarthy & Shan, 2021). Four of those studies analysed non-emotive topics such as risk management (Harwood et al., 2015), brand advocacy (Wilk et al., 2019), sports management (Sotiriadou et al., 2014) and literacy (Penn-Edwards, 2010) and one study analysed the semi-emotive topic of employee engagement (Laura & Jameson, 2020). Of these five studies, one study found that all concepts identified in the manual analysis had an equivalent in the Leximancer analysis and vice-versa (Penn-Edwards, 2010). Four studies found more than half of the MCA themes were mirrored through Leximancer; two of which reported Leximancer found additional concepts or details that added to the MCA (Laura & Jameson, 2020; Sotiriadou et al., 2014). One manual phenomenography study also verified the relationships between concepts presented in Leximancer’s visual output (Penn-Edwards, 2010). Two studies highlighted that Leximancer is not able to incorporate the researchers’ insight to identify concepts which require more contextual knowledge (Harwood et al., 2015; Wilk et al., 2019) and two suggested that Leximancer and MCA should be used together to complement each other (Wilk et al., 2019; Sotiriadou et al., 2014). The remaining study analysed emotive data, using Trip Advisor reviews from visitors to Australia and New Zealand Army Corps (ANZAC) commemorative sites, which are of significant sentimental value (Maccarthy & Shan, 2021). This study found that Leximancer was not able to identify the underlying emotional concepts that were found in the MCA, such as community and gravitas (Maccarthy & Shan, 2021). Further, the existing validation studies have been limited in the types of data they have been applied to, using transcripts from one-on-one structured interviews (Laura & Jameson, 2020; Harwood et al., 2015; Sotiriadou et al., 2014), written survey responses (Penn-Edwards, 2010) or social media data (Maccarthy & Shan, 2021; Wilk et al., 2019). Language is used differently across mediums – for example formal written text is very different to online chats, and interviews vary from language used conversationally – so it is important that the SACA validation research spans the breadth of how we communicate.
This study will explore whether qualitative analysis using Leximancer produces the same findings as a MCA when applied to a broad conversational dataset which includes both narrative and emotional communication in an interactive webcast healthcare setting. The primary aim of this study is to validate (1) the concepts identified and (2) the relative concept prevalence through Leximancer as compared to the MCA findings (a novel quantitative method to compare findings). Second, we will compare the time required, usability and reproducibility of each method of analysis. This study will add to the literature examining strengths and limitations of SACA, specifically using Leximancer, identifying issues that researchers should be aware of when considering SACA for datasets which include both factual and emotional concepts.
Methods
Setting
A large Australian hospital and health service (HHS) conducted daily interactive webcasts at the outset of the COVID-19 pandemic in March and April 2020. The healthcare service has 21,000 staff and a budget of approximately $3.5 billion AUD. The webcasts served as an information conduit between the executive team and staff in a rapidly changing environment and provided opportunity for staff to ask questions directly of the HHS executives. Uncertainty and fear were high in the early stages of the pandemic and staff sought information in these sessions on all aspects of COVID-19 and the potential impact on the hospital, themselves, and their families.
Data Collection
Example of Verbatim Content From Third VIDCAST.
Leximancer Analysis
Leximancer version 4.5 was used to analyse the transcripts (Leximancer Pty Ltd, 2018). Leximancer creates concepts from words that often occur together throughout the text (Leximancer Pty Ltd, 2018). Leximancer then groups related concepts together into themes (Leximancer Pty Ltd, 2018). The terms concepts and themes will be used here after.
One researcher (TE) loaded the transcripts, which included the executive team members introduction and update, staff questions read aloud and answered, as text files into Leximancer. An initial ‘concept map’ was created without altering any settings. The concept map is a visual representation of the concepts and themes identified. Each concept is displayed as a small dot, with the size of the dot representing the degree of connectivity to other concepts – larger dots are more connected. Lines join concepts which are highly connected. Themes are represented by larger bubbles. The bubbles are heat mapped to represent their relative importance/prevalent – warmer colours (red, orange, yellow) represent more prevalent themes and cooler colours (green, blue, purple) represent less prevalent themes.
The concepts identified in the initial concept map were reviewed in conjunction with their associated text segments through the Leximancer query interface. The query interface allows users to search for and read text segments which were tagged with a particular concept. Concepts which offered little Lexical value (e.g. filler words, patterns of speech) in this context were identified and added as stop words, meaning Leximancer ignores those words when building concepts; this was an iterative process. Concepts were merged where they were considered sufficiently similar (e.g. a word and its plural). Compound concepts were created where two or more words formed a commonly used phrase. The concept map was initially observed at a summary level through ‘zooming out’, by moving Leximancer’s ‘Theme Size’ toggle to 50% which summarises the concepts into approximately 6 themes. Then individual themes and concepts were investigated in more detail by ‘zooming in’ to a smaller ‘Theme Size’ to see the concepts in more detail, as described by (Haynes et al., 2019). Note that zooming in and out doesn’t change the underlying relationships but provides different visualisations which is useful in the process of understanding the concepts in more detail.
Multiple theme sizes were trialled to arrive at the final concept map. Using Leximancer’s interface and query function, a sample of text segments coded with each concept across all transcripts were read until the concept was well understood and no new information emerged. The researcher then summarised the insights into main concepts and themes, regrouping where necessary based on their contextual knowledge and interpretation of the data, in conjunction with the proximity of themes and concepts in the Leximancer concept map, Leximancer counts the number of text segments coded with each concept, known as the ‘number of hits’; the number of hits for a specific concept was divided by the total number of hits across all concepts in order to calculate the relative prevalence of each concept.
Manual Analysis
A MCA of this data which was completed independently was used as the gold standard for comparison; this paper has been published with the full methodological detail for the analysis described (Strong et al., 2021). To briefly summarise the manual method, one researcher (JS) immersed herself in the data, reading and re-reading the Word files of each of the 20 transcripts. An inductive approach was used, where categories were derived from the data and A manual thematic analysis was used (Braun & Clarke, 2006). This researcher used open coding, writing down headings as she read through the first transcript. The headings were then grouped into broader themes (Elo & Kyngäs, 2008). For example, the headings of PPE (“At the point where we trigger everybody is wearing PPE inside ED, any clinicians visiting or consulting into the ED will be provided and expected to use the same standard of PPE”); Infection control (“One of the things we are doing is providing or improving capacity and access so that people who do want to shower and change at work, then they do that”); and Workforce issues (“Some of those things we’ve had time to prepare around: accommodation for frontline workers; accommodation for patients; all our processes of how we do redeployment”) were grouped into the theme of Accurate Information. A second researcher independently coded the first transcript, and then the two compared their coding framework. The first researcher then analysed the remaining 19 transcripts using the coding framework. A further three transcripts were double coded by another researcher to ensure inter-coder reliability (O’Connor & Joffe, 2020). The broader research team then read through the findings. The number of times each concept appeared in the text was counted manually and used to calculate the relative prevalence of each concept.
Comparison
After each researcher (TE and JS) independently completed their analysis using Leximancer and MCA respectively, they met to compare findings. They agreed on which concepts from their analysis were similar and which were different. The proportion of concepts which overlapped were calculated, using the relative prevalence percentages from each method. The time require for each method of analysis was estimated in hours and compared.
Results
Leximancer Analysis
The transcripts from the 20 interactive webcasts were analysed in Leximancer, with the default English list of stop words, providing the initial default concept map in Figure 1. The concepts identified by Leximancer were reviewed and 33 words with little lexical meaning were added as stop words (Supplementary Materials 1). This included speech patterns of the hosts (e.g. starting sentences with “in fact”) and frequently used words in the context of the webcast (e.g. question, answer),. The concepts mask and masks were merged (referring to the Personal Protective Equipment [PPE]). The concept map was re-created based on these settings, shown in Figure 2. The text associated with each concept was reviewed by one researcher and was ultimately formed into nine concepts that were placed into three main themes: (1) Information about COVID-19, (2) Keeping staff safe, and (3) Work policies (Description in Table 2; Illustrative quotes in Table 3). Leximancer initial concept map. Leximancer final concept map after removing stop words and merging concepts. Summary of Concepts From Leximancer Analysis. Illustrative Quotes for Concepts From Leximancer Analysis.

‘Information about COVID-19’ was the most prevalent theme with 49% of mentions (2066/4225). This included concepts ‘Current HHS situation and future capacity’ (23% of mentions), ‘COVID-19 epidemiology, transmission, risk and treatment’ (13%), ‘COVID-19 measures’ (9%), and ‘COVID-19 testing’ (4%). Participants were particularly interested in how COVID-19 cases were being managed at this HHS and the capacity for future cases. Further, questions about the epidemiology and testing of COVID-19, as well as how the disease was being measured in the community and the country were of interest.
The second most prevalent theme concerned ‘Work policies’ with 29% of mentions (1236/4225). This included concepts ‘Staff ways of working’ (19% of mentions) and ‘Working from home’ (11%) Participants asked questions on working from home and how policies were being updated to enable this during the pandemic, especially for staff considered vulnerable or living with vulnerable people. Staff were also interested in how other work practices would change through the pandemic.
The final theme concerned ‘Keeping staff safe’ with 22% of mentions (923/4225). This included concepts of ‘Infection control practices’ (15% of mentions), ‘Personal protective equipment’ (4%) and ‘Flu vaccine’ (2%). These safety concerns included conversations about the availability of PPE and how and when it should be worn. There was also discussion of new infection control practices that could or should be implemented in the hospitals.
Manual Analysis
The MCA identified 12 concepts which were grouped into three main themes: Accurate information, Reassurance and support, and Innovation. The full results have been published, but the results are briefly summarised here to enable comparisons and contrasts (Strong et al., 2021).
‘Accurate information’ was the most prevalent theme with 72% of mentions (1216/1683). The concepts were ‘COVID-19’ (29% of mentions), ‘Personal protective equipment’ (19%), ‘Workforce issues’ (10%) ‘Infection Control’ (9%), and ‘Hospital business’ (5%). Most conversations related to information on COVID-19, specifically case numbers, care plans, transmission, epidemiology and testing criteria. Information about PPE supply was equally prevalent. Staff were also interested in infection control practices, hospital business and workforce issues such as job security and leave arrangements.
‘Reassurance and support’ was the second most prevalent theme identified in the MCA with 18% of mentions (306/1683). The concepts were ‘Promoting staff well-being’ (7% of mentions), ‘Adapting to fast changing situations’ (5%) ‘Managing emotions of fear and anxiety’ (4%) and ‘Building connectedness’ (3%). This theme described how the executive staff were able to manage emotions of fear, anxiety and uncertainty and build connectedness in the way they responded to questions. Staff well-being was promoted, and the executive highlighted how the HHS was adapting to the fast-changing situation.
‘Innovation’ was the final theme of the MCA with 10% of mentions (161/1683). The included concepts were ‘New ways of working’ (5% of mentions), ‘Communication’ (2%) and ‘HHS digital agency’ (2%). This theme focused on how the Digital Agency at this HHS was introducing new means of communication, apps and virtual models of care to enable working in this changing environment.
Comparison
A visual comparison of the concepts identified in the two analysis methods is displayed in Figure 3. Of the 12 concepts found in the MCA, five had corresponding concepts in Leximancer; these five concepts were the most prevalent concepts derived from the MCA, representing 74% of all mentions tagged in the text. Leximancer analysis identified one additional concept that wasn’t explicitly identified in the MCA, flu vaccines, which was the least prevalent theme of all Leximancer concepts, accounting for only 2% of all mentions tagged in the text by Leximancer. However, the MCA coded flu vaccines into multiple different concepts – staff well-being, infection control and hospital business depending on the context. Comparison of concepts from manual content analysis and Leximancer content analysis.
The manual analysis identified seven concepts that the Leximancer analysis did not. Only one of the five concepts identified in the Accurate Information theme was not found in the Leximancer analysis, being ‘Hospital Business’, which represents 5% of all mentions tagged manually. The Leximancer analysis did not identify three of the concepts from the ‘Reassurance and support’ theme, with these being ‘Managing emotions of fear and anxiety’, ‘Building connectedness’ and ‘Adapting to fast changing situations’, which account for 12% of all mentions tagged manually.
The ‘Innovation’ theme concepts, ‘HHS Digital Agency’, ‘New ways of working’ and ‘Communication’ were identified by the MCA (9% of all mentions tagged) but not through Leximancer.
The Leximancer analysis took approximately 42 hours to complete. This included the time for one researcher (TE), proficient with using data and analytical software programs, to learn how to use Leximancer. The learning and analysis occurred concurrently; however, it is estimated that to complete this analysis with existing knowledge of how to use Leximancer would take approximately 21 hours. The MCA analysis took approximately 73 hours accounting for the time from both researchers with experience in conducting content analysis. This represents a time savings for the SACA of more than 42% over the MCA, including time to learn Leximancer; or a 71% reduction in time required for the MCA if the user has experience with Leximancer.
Strengths and Limitations of Manual and Leximancer Content Analysis.
Discussion
Qualitative data analysis is rapidly evolving from a research method to an operational activity for healthcare providers who are serious about including consumer and clinician perspective in their data-driven decision making (Lee et al., 2018). How to shift qualitative data analysis from a bespoke research activity to an operational activity which is achievable within constrained healthcare resourcing profiles remains unclear (Sanders et al., 2020). This paper pioneers this concept using semi-automated methods to analyse complex qualitative data from a large Australian healthcare service.
This study of hospital staff COVID-19 webcast conversation transcripts found Leximancer identified most of the concepts that were found through a (MCA) with only 29% of the time (29%–58% depending on Leximancer experience). However, the semi-automated analysis missed concepts which required an emotional or contextual interpretation. The concepts that Leximancer did identify accounted for 74% of mentions tagged manually throughout the text. Leximancer identified one additional concept, flu vaccines, which was not identified as a separate concept in the MCA but was incorporated into multiple other concepts. Leximancer’s concept map provided a high-level overview to orient the user to the main topics which were being discussed in the interactive webcasts, as well as how they were related to each other. It also provided an interactive interface which made delving into the underlying text easy to do. As with MCA, the role of the researcher remains important in interpreting the Leximancer output. This study adds support to the ability for Leximancer to be used for qualitative analysis, while highlighting the limitations that researchers should be aware of when using it.
Validity
Validity is a key criterion for evaluation of SACA (Grimmer & Stewart, 2013; Smith & Humphreys, 2006; Müller-Hansen et al., 2020); our study contributes to the shared knowledge regarding the validity and limitations of SACA tools. To our knowledge this is the first study to compare Leximancer output to a MCA which identified emotive themes from healthcare conversational data such as ‘Building connectedness’, and ‘Managing emotions of fear and anxiety’. An example of text that was identified as ‘Building connectedness’ is a staff member asking: “Do you have family or children at home, if so what measures or precautions are you taking to protect them, after seeing COVID patients?” and the executive team host responding: “Really good question and something that has kept me awake a bit while we were planning. So yes, I do have a little boy who is 15 months old, and I worry about this.“. Leximancer extracted the concepts ‘patients’, ‘COVID’ and ‘home’ from this exchange. One of the text passages labelled in the MCA as ‘Managing fear and emotions’ is the following: “This is a time we’ve got to support each other’s resilience. I look at this and I think about how many stressors we have outside of work. Friends, family, parents. Our fear for them. And all of those play out and come to our work environment. We are more than a sum of just treating patients. We all have complex circumstances outside of work and so I ask you all to be generous with each other. Be kind.“. In this case Leximancer identified the concepts ‘support’, ‘patients’ and ‘work’.
Meaning is derived from more than just individual words; the way in which language is arranged can elicit different interpretations and feelings in the reader. These MCA concepts are based on the researcher’s’ interpretation of how the executives were trying to make their audience feel through how they answered the questions, rather than through words in the text. Hence it is not unexpected that analysing the same data using Leximancer – a tool grounded in the text – did not identify these emotive concepts. This is a similar finding to that of MacCarthy, who concluded that Leximancer was not able to identify emotional concepts that were found through MCA in emotive online reviews of commemorative war sites (Maccarthy & Shan, 2021). Recent research has explored how the interpretation of emotions can differ across cultures and contexts, making it even more difficult for SACA and artificial intelligence technologies to identify such concepts (Van Berkel et al., 2020). Previous studies of social media data have also reported that Leximancer was not able to capture tone of voice (Wilk et al., 2019). This finding suggests that Leximancer alone should not be used in analyses where researchers are trying to identify emotive concepts, and that researchers should be aware of this limitation in the case of trying to identify both information based and emotive concepts.
Concepts identified in the manual analysis, such as ‘hospital business’ and ‘innovation’ were not discovered through Leximancer. The ‘hospital business’ theme is described in the MCA as including logistics, equipment, workflows and patient care. These areas were grouped together into a single concept by the researcher based on her contextual knowledge of what constitutes hospital business; this accounted for 5% of concepts coded in the text. The ‘innovation’ theme included a number of activities the HHS had begun in order to continue operations in the delivery of healthcare in the pandemic; it accounted for 9%. While Leximancer goes beyond word counting and creates a thesaurus for each concept of words that occur together throughout the text and are used in similar ways, it did not bring these terms together to form a concept of sufficient size to be included in the concept map. Given the contextual knowledge required to identify these concepts and their relatively low frequency, it is not unexpected that these concepts were not matched by Leximancer. Leximancer is grounded in the text provided and hence it is unable to incorporate the contextual knowledge of the researchers; this has been reported in other studies comparing Leximancer to MCA (Harwood et al., 2015; Wilk et al., 2019).
We have demonstrated a novel method of quantifying the validity of SACA findings relative to a gold standard MCA. The six SACA validation studies we identified all provided a narrative description of the differences compared to MCA; three of them also gave a simple count of the number of concepts or themes which were and were not matched in Leximancer (Laura & Jameson, 2020; Maccarthy & Shan, 2021; Penn-Edwards, 2010). In addition to this comparison of the number of concepts, we calculated the relative prevalence of each concept mentions tagged throughout the text and used the proportion of those mentions which were matched by Leximancer as the primary measure to validate the completeness of the SACA findings. This added level of detail is crucial to consider, as not all concepts are of equal prevalence or importance. This method of comparison has provided stakeholders comfort with SACA using Leximancer. We believe this measure is useful in addition to a qualitative narrative summary and would encourage researchers to adopt and expand on this method in future SACA validation studies.
Time Required, Usability, Subjectivity, and Reproducibility
Qualitative data analysis is a time-consuming task, and a well-documented benefit of SACA is that can incorporate large volumes of text much more quickly than MCA (Nunez-Mir et al., 2016; Penn-Edwards, 2010; Smith & Humphreys, 2006). This study confirmed this, with the Leximancer analysis taking 71% less time than required to complete the MCA (excluding the time required to learn how to use Leximancer). The Leximancer program reads all the text provided, extracting only the portions tagged with each concept into the interface for the researcher to read. It also removes the need for text to be double-coded, which is best practice in MCA (O’Connor & Joffe, 2020).
The Leximancer software is easy to use, the interface is straight forward and producing initial results can be achieved in a matter of minutes. However, a key step to refining the results is for the researcher to adjust the settings, such as adding stop words and merging concepts. Leximancer’s concept map and the ability to adjust the level of detail provides a useful overview of the results. The query window makes it easy for researchers to read text segments coded with each concept. However, the researcher’s role in interpreting the results is key. There are inherent similarities in what SACA is doing to the MCA methodology which focuses on pattern recognition (Haynes et al., 2019; Janasik et al., 2009; Yu et al., 2011). Hence it is advisable that those using Leximancer have some familiarity with MCA methods such that they understand what Leximancer is doing and how to interpret the output. The utility of SACA output is likely to be increased if the user has more knowledge of the topic and context of the text (Nunez-Mir et al., 2016).
SACA offers a more objective alternative to MCA, by reducing the human bias involved (Smith & Humphreys, 2006). This is important in order to be able to analyse qualitative data at scale, such as in the healthcare context where thousands of qualitative patient responses and physician notes are captured each day. If this was to be done using MCA many people would be required to analyse the data, each introducing their own biases, and reducing consistency of the analysis. Conscious or unconscious human bias may also feed into the analysis by highlighting topics which are known to be important in the health service, or their key performance indicators. A limitation of performing SACA at scale is that issues raised by a small number of people are unlikely to be highlighted, which potentially increases theris of not including feedback from underrepresented populations.
Another important criterion in evaluating SACA is reproducibility (Smith & Humphreys, 2006; Müller-Hansen et al., 2020); this has long been a key criterion in traditional quantitative scientific research, and has been considered to have been a weakness of qualitative research (Noble & Smith, 2015). Use of SACA tools, and the ability to publish the code and settings that were used in the analysis helps to improve reproducibility in this field (Müller-Hansen et al., 2020; Thompson et al., 2014). Leximancer facilitates this by allowing stop words and other settings to be downloaded and saved for reuse. Further SACA methods minimise the impact of human bias in interpreting a text compared to MCA (Nunez-Mir et al., 2016; Penn-Edwards, 2010; Smith & Humphreys, 2006), aiding reproducibility.
SACA offers a more efficient, reproduceable and objective method of pattern recognition, albeit with some limitations around identifying emotive and contextual concepts that researchers should be aware of when applying it.
Study limitations
This study has some limitations which should be acknowledged. In any comparison of qualitative analyses, it is highly unlikely that two independent researchers would arrive at the same results, even if they were using the same method (Harwood et al., 2015), so some differences in the results are to be expected. The researchers involved in this study both participated in the live interactive webcasts sessions as part of their employment, and so may have had some preconceived ideas of their content. However, they did not discuss their findings on this data until both analyses were finalised to maintain independence. This study aimed to validate Leximancer results through comparison with MCA; however some research warns against using manual analysis as the gold standard (Song et al., 2020). The MCA being utilised in this study has followed a well-established process, including double-coding, and has undergone peer review as part of the publication process (Strong et al., 2021), however it may still reflect some author bias. Some, albeit less explicit, human bias is likely to be reflected in the Leximancer analysis as human input and interaction is required in defining the ‘stop words’ and interpreting the output (Nunez-Mir et al., 2016).
Conclusion
There is a growing body of literature regarding the role of SACA tools in qualitative research. This study found that a SACA tool, Leximancer, had 74% concordance with a MCA of the same data. However, this SACA tool did not identify emotive concepts and those that required more contextual knowledge. Some studies in this area have noted that the ability of AI in qualitative analysis is exaggerated (Maccarthy & Shan, 2021) or that it should only be used to complement MCA (Harwood et al., 2015; Laura & Jameson, 2020; Singleton et al., 2018). While SACA has some limitations, we believe in the value of SACA tools like Leximancer which allow us to analyse more data more frequently; the growth in data far exceeds the available time and resources for MCA, so in most cases the alternative to SACA is not MCA, it is doing nothing with the data. SACA is important in shifting qualitative data analysis from a small-scale laborious and highly skilled research activity to an operational activity for healthcare which can be delivered at scale. This is critical to be able to include rich narratives from consumers and clinicians in healthcare decision making.
Supplemental Material
Supplemental Material - A Comparison of Leximancer Semi-automated Content Analysis to Manual Content Analysis: A Healthcare Exemplar Using Emotive Transcripts of COVID-19 Hospital Staff Interactive Webcasts
Supplemental Material for A Comparison of Leximancer Semi-automated Content Analysis to Manual Content Analysis: A Healthcare Exemplar Using Emotive Transcripts of COVID-19 Hospital Staff Interactive Webcasts by Teyl Engstrom, Jenny Strong, Clair Sullivan, and Jason D. Pole in International Journal of Qualitative Methods
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
