
Abstract

Avatar Therapy (AT) is a modern therapeutic alternative for patients with schizophrenia suffering from persistent auditory verbal hallucinations. Its intrinsic therapeutic process is currently analyzed qualitatively by human coders who annotate session transcripts, a time- and resource-demanding process. This creates a need for algorithms that can operate on small datasets and perform such annotations. The first objective of this study is to conduct the automated text classification of interactions in AT, and the second is to assess whether this classification is comparable to the classification done by human coders. A Linear Support Vector Classifier was implemented to perform automated theme classification on a limited dataset of AT session transcripts, reaching an overall weighted predictive score of 66.02% and a substantial agreement with human coders (Scott’s Pi of 0.647). These results open the door to additional research, such as predicting the outcome of a therapy.

Keywords

Avatar therapy, artificial intelligence, machine learning, psychological treatment, schizophrenia

Introduction

Psychotherapies involve complex social interactions that require both the patient and the therapist to mobilize several cognitive and communication skills.¹ Qualitative analysis of psychotherapy transcripts is a methodology often used to assess psychotherapies.² However, this type of analysis relies on human resources and remains rather time-consuming.³ Furthermore, qualitative approaches do not generate quantitative data to assess specific components of the intrinsic process of the psychotherapy.⁴ A growing body of researchers is attempting to use mixed methods to address this problem. Consistency and coherence with the qualitative methodology employed are crucial to the process, but they are often compromised by the limits of subjectivity, notably when the analysis is conducted by novice researchers.⁵ Researchers’ inherent subjectivity biases can also undermine the validity and reliability of the qualitative assessment of psychotherapy transcripts.⁶ These issues arise when annotating in-person therapies, which is time-consuming, and when identifying the different interactions, which can be even more complex. Annotations conducted using machine learning could address these issues by reducing this labor-intensive work and by providing a systematic method that accounts for the potential inherent subjectivity biases of human annotators.^7–8

Classification of textual entities is currently performed in many different areas of medicine.^9–11 Automated text classification consists of analyzing a textual entity and assigning it a specific label. This can be done either by supervised learning (i.e. an algorithm is trained with pre-existing labeled data to conduct the classification) or by unsupervised learning (i.e. labels are generated from the data).¹² Text classification usually assigns text to two or more categories, which are also known as labels, features or themes.¹³ The classification of therapeutic interactions may be a complex task, as therapy sessions vary in length and content, and sessions depend on the intrinsic and extrinsic characteristics of both therapist and patient.¹⁴ Few studies have attempted to classify therapeutic interactions, because complex machine learning algorithms require large datasets of human-annotated transcripts, such as those seen in the field of internet-enabled cognitive behavioral therapy (IECBT), to adequately learn and classify new information.¹⁵ However, in-person therapies yield smaller databases than IECBT because they require human-driven annotations, which are time- and resource-demanding. This creates a need for algorithms that can operate on small datasets. A recent systematic review that identified seven studies with small datasets in a psychotherapeutic context highlighted the support vector machine classifier as the best-performing algorithm under these constraints.¹⁶ This opens the path for further studies on novel psychotherapies for which limited data are available for analysis.

Avatar therapy (AT) is a type of virtual reality therapy. It is a modern therapeutic alternative for patients with schizophrenia suffering from persistent auditory verbal hallucinations (AVH) despite pharmacological treatment.^17–19 Studies on AT at our institution are currently analyzing its use for patients diagnosed with schizophrenia with persistent auditory hallucinations and other mental illnesses. Patients currently enrolled in AT undergo nine weekly 45-min sessions (one session to create the Avatar and eight immersive sessions). An Avatar representing the patient’s most distressing voice is animated by the therapist to re-enact the voice in a secure therapeutic environment. The effects of AT on AVH are evaluated via the Psychotic Symptoms Rating Scale (PSYRATS total and PSYRATS-distress scores) and the Beliefs About Voices Questionnaire-Revised (BAVQ-R score), which are commonly used in the field to evaluate the effects of psychotherapy on patients with schizophrenia. Other research teams, such as Leff’s and Craig’s teams in England, also use the PSYRATS and BAVQ-R to assess AT.^17,20 Current results demonstrate significant therapeutic effects of AT on the distress associated with the voices, as indicated by a net improvement in PSYRATS-distress scores.^21–22 In AT, the therapeutic process as a variable of effectiveness is of the utmost importance, because the inclusion of an avatar adds a level of complexity to the therapeutic dyad between the patient and the therapist. Some changes at the psychological level are not captured by standardized measures such as the PSYRATS and BAVQ-R. Traditional qualitative analyses consider these elements but have their own methodological limitations. The use of machine learning via text mining can complement these analyses.
Current attempts to evaluate the therapeutic processes of AT by annotating interactions by theme have been conducted entirely by human evaluators.^17,22,23 Furthermore, in AT, the complexity of interactions between three parties (the avatar, the therapist and the patient) and the fact that the therapy is less readily available to the public limit the amount of usable data for constructing a dataset. The present study is therefore a first attempt at automated text annotation from a small dataset of AT transcripts.

The first objective of this study is to conduct the automated text classification of interactions in AT. The second objective is to assess whether this classification is comparable to the classification done by human coders. This would provide a solution for automated therapy annotation and could generate further data to evaluate the AT process in relation to its effectiveness.

Methods

Dataset

A dataset was built from 162 manually typed therapy transcripts of 18 randomly selected patients who underwent AT between 2017 and 2020 at our institution, which accounts for up to 10 therapy sessions per patient.²³ The transcripts were in Canadian French. Transcripts were manually annotated using the 28 themes described in Beaudoin et al. (2021), a prior qualitative analysis of AT; Figure 1 of that study presents the classification of the themes.²³ Two research assistants coded each interaction independently, and the robustness of the coding grid was cross-validated by the same two research assistants. All annotations were performed using QDA Miner version 5 (Provalis Research), a qualitative data analysis software. To improve the automated classification, annotations were then exported from QDA Miner as text files (each containing 1 to 40 interactions of the same theme) and classified under three conceptual databases: Avatar, Patient and Therapist. The conceptual datasets were designed as represented in Figure 1.

Figure 1.

Conceptual datasets design.
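As a minimal sketch of how the exported text files could be read back into labeled examples, the snippet below assumes a hypothetical folder layout in which each conceptual database (e.g. `avatar/`) contains one sub-folder per theme holding that theme’s UTF-8 text files, so that the folder name serves as the label. The function `load_conceptual_dataset` is illustrative and not part of the study’s code; Scikit-Learn’s `sklearn.datasets.load_files` reads the same one-folder-per-category layout.

```python
import tempfile
from pathlib import Path


def load_conceptual_dataset(root):
    """Hypothetical loader: read <root>/<theme>/*.txt, using the folder name as the label."""
    texts, labels = [], []
    for theme_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for text_file in sorted(theme_dir.glob("*.txt")):
            texts.append(text_file.read_text(encoding="utf-8"))
            labels.append(theme_dir.name)
    return texts, labels


# Build a tiny throwaway "avatar" database with two themes to exercise the loader.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "avatar"
    toy_files = {"Threats": ["I will hurt you"], "Reinforcement": ["You did well", "Well done"]}
    for theme, contents in toy_files.items():
        (root / theme).mkdir(parents=True)
        for i, content in enumerate(contents):
            (root / theme / f"{i:03d}.txt").write_text(content, encoding="utf-8")
    texts, labels = load_conceptual_dataset(root)

print(labels)  # themes are visited in sorted folder order
```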

The distribution of text files per theme obtained from the qualitative analyses is given in Table 1.

Table 1.

Distribution of text files per theme in the database.

Avatar themes	Number of text files	Patient themes	Number of text files	Therapist theme	Number of text files
Accusations	132	Approbation	67	Therapeutic intervention	106
Omnipotence	72	Self-deprecation	60
Beliefs	89	Self-appraisal	87
Active listening, empathy	82	Other beliefs	88
Incitements, orders	48	Counterattack	86
Coping mechanisms	82	Maliciousness of the voice	59
Threats	31	Negative	129
Negative emotions	49	Negation	90
Self-perceptions	69	Omnipotence	67
Positive emotions	43	Disappearance of the voice	80
Provocation	87	Positive	81
Reconciliation	60	Prevention	101
Reinforcement	78	Reconciliation of the voice	41
		Self-affirmation	104
Total number of text files	922	Total number of text files	1140	Total number of text files	106
Average of text files per theme	71	Average of text files per theme	81	Average of text files per theme	106

After reading from the database, the training sets for the Avatar, Patient and Therapist themes consisted of 691, 855 and 74 documents, and the testing sets of 231, 285 and 32 documents, respectively.
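The reported set sizes can be checked by hand. The sketch below mirrors the rounding rule used in Scikit-Learn’s `train_test_split` implementation for a fractional `test_size` (the test count is rounded up and the training set receives the remainder); `split_sizes` is an illustrative helper, not a library function. Under this rule, a 30% hold-out reproduces the Therapist counts (74/32), whereas the Avatar (691/231) and Patient (855/285) counts correspond to a 25% hold-out, which is the `train_test_split` default when no size is given.

```python
import math


def split_sizes(n_documents, test_fraction):
    """Illustrative helper: round the test set up, give the rest to training."""
    n_test = math.ceil(test_fraction * n_documents)
    return n_documents - n_test, n_test


# Text files per conceptual database (Table 1).
databases = {"Avatar": 922, "Patient": 1140, "Therapist": 106}
for name, n in databases.items():
    print(f"{name}: 70/30 -> {split_sizes(n, 0.30)}, 75/25 -> {split_sizes(n, 0.25)}")
```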

Machine learning algorithm

A support vector machine (SVM) classifier was implemented to conduct the automated text classification (i.e. to classify the different interactions under themes). Support vector machines encompass multiple algorithms that are often used in conjunction with tokenizers to represent the textual entities being classified. Tokenization breaks text into tokens (words or sequences of words) so that each token can be weighted and compared with other words or sentences.²⁴ A member of the SVM family is the linear support vector classifier (LSVC). LSVCs have consistently been more successful in text classification for small databases such as ours,²⁵ and a prior review of algorithms for small datasets indicated that LSVC was the algorithm of choice for this study. LSVC was implemented using Python version 3.6.7 and the open-source Scikit-Learn library.²⁶ Python was selected as the main programming language because of its wide use in artificial intelligence, its flexibility compared with other programming languages for scientific purposes, and its support for many operating systems.²⁷ Combined with term frequency-inverse document frequency (TF-IDF) weighting, LSVC has been reported to perform better in text classification than other combinations of SVM with a tokenizer.²⁸ For TF-IDF vectorization, the TfidfVectorizer offered in the Scikit-Learn library was selected because it converts the raw text of the interactions extracted from the transcript to be annotated into numerical vectors. Vectorizers can be customized to account for stop-words. Because the classification categories were designed so that text entities would be separated according to their intrinsic characteristics as defined in Beaudoin et al. (2021), which are fundamentally different, the features were assumed to be linearly separable.²³
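The TF-IDF + LSVC combination described above can be sketched as a Scikit-Learn `Pipeline`. This is a minimal illustration assuming default parameters and a handful of invented English stand-ins for translated interactions; the study itself used Canadian French transcripts and tuned parameters. Chaining both steps in one pipeline means the vectorizer is refitted on the training portion only whenever the model is cross-validated.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Invented toy interactions standing in for annotated transcript excerpts.
train_texts = [
    "I will hurt you", "I will destroy you", "you will pay for this",
    "you did well", "well done today", "you handled that well",
]
train_labels = ["Threats"] * 3 + ["Reinforcement"] * 3

model = Pipeline([
    ("tfidf", TfidfVectorizer()),  # tokenize and weight terms by TF-IDF
    ("clf", LinearSVC()),          # linear SVM separating themes in that vector space
])
model.fit(train_texts, train_labels)

predicted = model.predict(["I will hurt your family", "that was well done"])
print(list(predicted))
```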

To optimize the performance of the LSVC algorithm and search the parameter space systematically, a GridSearchCV (GSCV) was used. A GSCV tests different combinations of hyper-parameters and cross-validates the classification made by the LSVC to determine the best combination of LSVC and TfidfVectorizer parameters.²⁹
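A grid search over such a pipeline could look like the sketch below. The candidate values are hypothetical, chosen so that the final values reported in the Results (min_df=2, max_df=100, tol=0.001, dual=False, fit_intercept=True) are among them; the `step__parameter` names are how `GridSearchCV` routes each value to the `TfidfVectorizer` or `LinearSVC` step. The toy corpus is invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

texts = [
    "I will hurt you", "I will hurt you soon", "I will destroy you",
    "you will pay, I will hurt you", "I will hurt your family", "I will make you pay",
    "you did well today", "well done, you did it", "you handled that well",
    "you did very well", "that was well done", "well done again",
]
labels = ["Threats"] * 6 + ["Reinforcement"] * 6

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LinearSVC(dual=False, fit_intercept=True)),
])

# Hypothetical search space; "tfidf__" and "clf__" prefixes target each pipeline step.
param_grid = {
    "tfidf__min_df": [1, 2],
    "tfidf__max_df": [1.0, 100],
    "clf__tol": [1e-3, 1e-4],
}

# An integer cv with a classifier gives stratified 3-fold cross-validation.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="f1_weighted")
search.fit(texts, labels)
print(search.best_params_, round(search.best_score_, 3))
```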

For each of the conceptual databases, the LSVC was trained on 70% of the available annotated documents, and the remaining 30% was used for testing, to establish a statistical probability (predictive score) that an interaction would be adequately classified. The training and testing sets did not overlap, as per design recommendations.³⁰ The predictive score refers to the mean F1-score across the themes being tested. A 70%/30% split between training and testing sets is common practice for text classification.³¹ This is illustrated in Figure 2. A tenfold cross-validation was performed using the KFold class from the Scikit-Learn suite.

Figure 2.

Implementation of LSVC on conceptual databases to derive a Predictive score and a Scott’s Pi.
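Tenfold cross-validation with Scikit-Learn’s `KFold` can be sketched as follows, assuming a TF-IDF + LinearSVC pipeline and an invented, balanced toy corpus; the `f1_weighted` scorer averages per-theme F1-scores weighted by each theme’s support. Because `KFold` ignores the labels, `StratifiedKFold` may be preferable when theme sizes are as uneven as in Table 1, since it keeps theme proportions similar across folds.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

suffixes = ["now", "soon", "later", "tonight", "again",
            "today", "tomorrow", "slowly", "badly", "forever"]
texts = [f"I will hurt you {s}" for s in suffixes] + [f"you did well {s}" for s in suffixes]
labels = ["Threats"] * 10 + ["Reinforcement"] * 10

model = make_pipeline(TfidfVectorizer(), LinearSVC())
folds = KFold(n_splits=10, shuffle=True, random_state=0)  # 10 disjoint test folds of 2 documents

scores = cross_val_score(model, texts, labels, cv=folds, scoring="f1_weighted")
print(f"mean F1 over 10 folds: {scores.mean():.3f}")
```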

The annotation process is shown in Figure 3. Each sentence in the transcript was regarded as an interaction.

Figure 3.

Annotation process overview.
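Treating each sentence as an interaction, the annotation step can be sketched as splitting a new transcript into sentences and passing each one to a fitted pipeline. The punctuation-based splitter `split_interactions` is a hypothetical simplification (the study does not specify how sentences were segmented), and the training data are invented.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def split_interactions(transcript):
    """Naive segmentation: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", transcript.strip())
    return [part for part in parts if part]


model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(
    ["I will hurt you", "I will destroy you", "you did well", "well done today"],
    ["Threats", "Threats", "Reinforcement", "Reinforcement"],
)

transcript = "I will hurt you tonight. Well done! You did well today."
interactions = split_interactions(transcript)
themes = model.predict(interactions)
for sentence, theme in zip(interactions, themes):
    print(f"{theme:>13}: {sentence}")
```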

Performance analysis and inter-rater agreement

Information about the classification (precision, recall and F1-score) for each theme was collected using the classification_report function of the Scikit-Learn metrics module. Precision refers to the positive predictive value, recall to the sensitivity of the prediction, and the F1-score to the harmonic mean of precision and recall. The F1-score is the most widely used measure in text classification; it summarizes the accuracy of theme classification as a balance between precision and recall.³²
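The three measures can be written out explicitly for one theme treated as the positive class. The sketch below is a plain-Python illustration rather than the `classification_report` implementation, with invented labels chosen to reproduce the Coping mechanisms row of Table 2, where a precision of 1.00 and a recall of 0.75 give an F1-score of 0.86.

```python
def theme_scores(y_true, y_pred, theme):
    """Precision, recall and F1 for one theme, treating all other themes as negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == theme and p == theme for t, p in pairs)  # correctly labeled as the theme
    fp = sum(t != theme and p == theme for t, p in pairs)  # wrongly labeled as the theme
    fn = sum(t == theme and p != theme for t, p in pairs)  # instances of the theme missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented example: 4 coping interactions, 3 found, and no false alarms.
y_true = ["Coping"] * 4 + ["Threats"] * 2
y_pred = ["Coping"] * 3 + ["Threats"] * 3
precision, recall, f1 = theme_scores(y_true, y_pred, "Coping")
print(precision, recall, round(f1, 2))
```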

While the F1-score reflects the accuracy of theme classification, it does not account for agreement expected by chance. A Scott’s Pi measure was therefore used to compare the degree of agreement between the LSVC’s automatic annotations and the annotations previously agreed upon as “correct” by the human referees.³³ Benchmarks for interpreting Scott’s Pi vary. We used the benchmark provided by SAGE Research Methods, in which a Scott’s Pi of 0.81–1.00 indicates almost perfect agreement, 0.61–0.80 substantial agreement, 0.41–0.60 moderate agreement, 0.21–0.40 fair agreement, 0.00–0.20 slight agreement and less than 0 poor agreement. This was compared with the Scott’s Pi agreement of 0.588 obtained between the human annotators for our database.³³
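Scott’s Pi compares the observed agreement Po with the agreement expected by chance Pe, where Pe is computed from the category proportions pooled across both coders: π = (Po − Pe) / (1 − Pe). A plain-Python sketch follows, on invented labels, with an illustrative `interpret` helper that applies the SAGE bands listed above; values falling between two bands (e.g. 0.605) are assigned to the lower band, which is an assumption of this sketch.

```python
from collections import Counter


def scotts_pi(coder_a, coder_b):
    """Scott's Pi for two coders who labeled the same items."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    pooled = Counter(coder_a) + Counter(coder_b)  # category counts over both coders
    expected = sum((count / (2 * n)) ** 2 for count in pooled.values())
    return (observed - expected) / (1 - expected)  # undefined if both coders use one category only


def interpret(pi):
    """Map a Scott's Pi value to the SAGE Research Methods bands."""
    bands = [(0.81, "almost perfect"), (0.61, "substantial"), (0.41, "moderate"),
             (0.21, "fair"), (0.0, "slight")]
    for lower_bound, label in bands:
        if pi >= lower_bound:
            return label
    return "poor"


human = ["Threats", "Threats", "Beliefs", "Beliefs"]
machine = ["Threats", "Beliefs", "Beliefs", "Beliefs"]
pi = scotts_pi(human, machine)
print(round(pi, 3), interpret(pi))
```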

Results

The LSVC in combination with TF-IDF was implemented and tested, and an un-annotated transcript of an AT immersive session was automatically annotated. Training and testing sets were divided between Avatar themes (interactions in which the therapist animates the Avatar), Patient themes (the patient’s interactions) and the Therapist theme (interactions in which the therapist talks directly to the patient).

The GSCV selection of the best parameters for our study and dataset indicated that the document-frequency and tolerance parameters mattered more than the others for our vectorizer and LSVC classifier. For the vectorizer, a minimum document frequency (min_df) of 2 and a maximum document frequency (max_df) of 100 were applied: a term had to appear in at least 2 documents to be retained as a feature, and terms appearing in more than 100 documents were discarded as too common to be discriminative. The classifier tolerance was set to 0.001, the dual parameter to false and the intercept parameter (fit_intercept) to true. The mean squared error (MSE) training result was 0.88 and the MSE testing result was 0.96.
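The effect of the two document-frequency thresholds can be seen on a toy corpus. In `TfidfVectorizer`, an integer `min_df` drops terms found in fewer documents than the threshold and an integer `max_df` drops terms found in more documents than the threshold; the corpus and the small `max_df=3` used for contrast are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["you will fail", "you are weak", "you will lose", "you did well", "go away now"]

# Settings reported in the Results: keep terms present in 2 to 100 documents.
reported = TfidfVectorizer(min_df=2, max_df=100).fit(docs)
# Contrast: an integer max_df of 3 also drops "you", which occurs in 4 documents.
strict = TfidfVectorizer(min_df=2, max_df=3).fit(docs)

print(sorted(reported.vocabulary_), sorted(strict.vocabulary_))
```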

The predictive scores for the Avatar, Patient and Therapist theme classifications reached 70.6%, 61.8% and 100.0%, respectively, on average over 10 iterations. Because the Therapist dataset consists of a single category, it was excluded from the overall weighted score. Therefore, an overall weighted score of 66.02% was obtained from the Avatar and Patient classifications. Classification reports in terms of precision, recall and F1-score for the Avatar, Patient and Therapist themes are listed in Table 2.

Table 2.

Precision, recall and F1-score for Avatar, Patient and Therapist themes.

Avatar theme	Example (translated from French to English)	Precision (PPV)	Recall (sensitivity)	F1-score	Test sample size
Accusations	“You did this”	0.67	0.53	0.59	30
Omnipotence	“I am the strongest”	0.53	0.73	0.62	11
Beliefs	“I believe that…”	0.76	0.59	0.67	32
Active listening, empathy	“Take your time”	0.76	0.8	0.78	20
Incitements, orders	“Kill yourself”	0.67	0.91	0.77	11
Coping mechanisms	“I am not happy when you say this”	1	0.75	0.86	16
Threats	“I will hurt you”	1	0.91	0.95	11
Negative emotions	“It’s difficult”	0.72	0.87	0.79	15
Self-perceptions	“The way I see myself is…”	0.65	0.65	0.65	23
Positive emotions	“I’m fine”	0.9	0.6	0.72	15
Provocation	“What are you waiting for?”	0.43	0.71	0.54	14
Reconciliation	“Should we make peace?”	0.73	0.73	0.73	15
Reinforcement	“You did well”	0.7	0.78	0.74	18
Average scores		0.73	0.71	0.706	231
Patient theme	Example (translated from French to English)	Precision (PPV)	Recall (sensitivity)	F1-score	Test sample size
Approbation	“I agree with you”	0.15	0.14	0.15	14
Self-deprecation	“I could never be confident”	0.32	0.75	0.44	8
Self-appraisal	“I am kind”	0.65	0.6	0.63	25
Other beliefs	“You are controlling me”	0.62	0.58	0.6	26
Counterattack	“I think you are wrong”	0.5	0.62	0.56	16
Maliciousness of the voice	“You are spreading misfortune to all”	0.5	0.42	0.45	12
Negative	“It’s difficult”	0.6	0.58	0.59	31
Negation	“I do not recognize this”	0.95	0.56	0.7	34
Omnipotence	“I am the best”	0.54	0.58	0.56	12
Disappearance of the voice	“Go away”	0.83	0.76	0.79	25
Positive	“I’m fine”	0.71	0.88	0.79	17
Prevention	“I will try to ignore you”	0.75	0.75	0.75	32
Reconciliation of the voice	“I want to learn to live with you”	0.55	0.75	0.63	8
Self-affirmation	“I do not think so”	0.58	0.6	0.59	25
Average scores		0.65	0.65	0.62	285
Therapist theme	Example	Precision (PPV)	Recall (sensitivity)	F1-score	Test sample size
Therapeutic intervention	Any intervention by the therapist	1	1	1	32
Average scores		1	1	1	32

As shown in Table 2, F1-scores for the Avatar themes were on average better than for the Patient themes (0.706 vs 0.62). Provocation performed the worst among the Avatar themes (F1-score of 0.54), whereas Approbation performed the worst among the Patient themes (F1-score of 0.15).

Agreement between human referees and the classifier reached a Scott’s Pi of 0.647, which is ranked as substantial as per the SAGE Research Methods benchmark for Scott’s Pi interpretation.

Discussion

The objective of this study was to conduct the automated text classification of interactions held during sessions of AT. This was conducted by implementing an LSVC algorithm.

It was possible to obtain a fully automated annotation of an un-annotated AT transcript. The weighted F1 predictive score for the Avatar themes (70.6%) outperformed that of the Patient themes by 8.8 percentage points. The F1-scores of the Avatar themes ranged from 54% to 95%. Among the Patient themes, Approbation (interactions in which patients completely or partially approved what their avatar was saying in response to a verbal attack) scored worse than all other themes, with an F1-score of 15%. Since this theme contained 67 text files but scored lower than themes with far fewer text files (e.g. Reconciliation of the voice, with 41), our conception of Approbation may not be sufficiently distinct and may overlap with other themes. A therapist evaluating the therapeutic process with the same set of descriptive themes, such as the 28 themes used in this study, could therefore revise the requirements for an interaction to be classified as Approbation. Considering that Approbation is a coping mechanism distinct from the patient’s other interactions, we decided to keep it in order to explore whether this theme needs to be re-evaluated or would become more homogeneous with additional data. In a similar study in which psychiatric evaluation notes were classified by symptom severity, notes were often confused between the severe and moderate categories because elements common to both overlapped in the definitions of these two categories.³⁴ This supports the need for more homogeneous individual themes.
Other Patient themes with overall poor accuracy included Self-deprecation (F1-score of 44%, with 19 trained items and 8 tested items), Maliciousness of the voice (45%, with 28 trained items and 12 tested items) and Omnipotence (56%, with 28 trained items and 12 tested items). This may be explained by the lack of data for these themes compared with the more accurately classified themes; classifiers also tend to respond better when each theme has a similar amount of data in the database.³⁵ Fortunately, this gave us initial insight into the therapeutic process, as it outlined which interactions potentially occurred less often during therapy sessions.

The secondary objective was to assess if this classification is comparable to the classification done by human coders.

Agreement between the human referees and the classifier reached a Scott’s Pi ranked as substantial. This is comparable to the Scott’s Pi of 0.588 between the two human referees who used QDA Miner, indicating that the classifier’s agreement with the human annotation is in the same range as the agreement between human annotators. Studies with datasets of a size similar to ours (i.e. fewer than 10 000 items), such as those of Balakrishnan, Zolnoori and Singh, reached moderate to substantial agreement.^36–38 Although these agreements appear higher than ours, they were computed with a different base calculation than Scott’s Pi. The pairwise agreement formula in Zolnoori’s study is an improvement on Cohen’s kappa, calculated at the level of entities rather than sentences to improve consistency, and their result is also rated as substantial. This is also the case in Ewbank et al.’s study, which had 24 features and reached a kappa considered moderate.¹⁵ For comparison, a small study with 100 document samples reached a kappa of 0.6 with a naïve Bayes algorithm, which was considered acceptable.³⁹
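The difference in base calculation matters because Cohen’s kappa estimates chance agreement from each coder’s own category proportions, whereas Scott’s Pi uses the proportions pooled across both coders. The pooled estimate of chance agreement is never smaller, so Scott’s Pi never exceeds Cohen’s kappa on the same pair of annotations. The sketch below computes both on invented labels; Scikit-Learn also provides `sklearn.metrics.cohen_kappa_score` for kappa.

```python
from collections import Counter


def agreement_statistics(coder_a, coder_b):
    """Return (Cohen's kappa, Scott's Pi) for two coders labeling the same items."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    count_a, count_b = Counter(coder_a), Counter(coder_b)
    categories = set(count_a) | set(count_b)
    # Cohen: chance agreement from each coder's own marginal proportions.
    expected_kappa = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    # Scott: chance agreement from proportions pooled over both coders.
    expected_pi = sum(((count_a[c] + count_b[c]) / (2 * n)) ** 2 for c in categories)
    kappa = (observed - expected_kappa) / (1 - expected_kappa)
    pi = (observed - expected_pi) / (1 - expected_pi)
    return kappa, pi


kappa, pi = agreement_statistics(["Threats", "Threats", "Beliefs", "Beliefs"],
                                 ["Threats", "Beliefs", "Beliefs", "Beliefs"])
print(round(kappa, 3), round(pi, 3))
```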

A limitation of this study is that the F1-score for each theme may be underestimated because of the 70%/30% split between training and testing sets, since such random splits can produce class imbalances and sample representativeness issues.³⁵ This division is nevertheless common for text classification, which is why we opted for it. As our dataset grows, it will be interesting to test different training set sizes.³¹ Furthermore, the transcripts analyzed in our study were typed in Canadian French, and we did not find vectorizers that included stop-words (words that are not weighted during tokenization, as described above) for Canadian French. This may lower accuracy, as insignificant words can be weighted as part of a word vector.⁴⁰
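Scikit-Learn’s built-in stop-word list (`stop_words="english"`) covers English only, but `TfidfVectorizer` also accepts a custom list, which is one way a French list could be supplied. The short list below is a hypothetical example for illustration, not a validated Canadian French stop-word list.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical, deliberately short French stop-word list.
french_stop_words = ["je", "ne", "te", "tu", "es", "le", "suis"]

docs = ["je ne te crois pas", "tu es le plus fort", "je suis bien"]
vectorizer = TfidfVectorizer(stop_words=french_stop_words).fit(docs)

print(sorted(vectorizer.vocabulary_))  # listed function words no longer receive weights
```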

Conclusion

Machine learning can be beneficial to the field of psychiatry. Automated text classification for AT is a promising avenue for efficiently generating quantitative and qualitative data that are readily available for analysis. Our study automatically annotated an un-annotated transcript based solely on a database derived from the transcripts of 18 patients. By reaching agreement in the same range as human agreement, this study highlights that the task of annotation can be performed by a machine, saving resources that can be refocused on patients’ needs. Automated text analysis could also sharpen the therapeutic process by reviewing what went wrong and what went well during AT. To our knowledge, this is the first study to outline the possibility of automated annotation for AT, and it highlights the need for further development in this field. These results open the door to additional research, such as predicting the outcome of a therapy.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Fondation Pinel, Chaire Eli Lilly Canada de recherche en schizophrénie, Services et recherches psychiatriques AD, Otsuka Canada Pharmaceutical and Le Fonds de recherche du Québec – Santé (FRQS).

Ethical approval

This study was approved by the institutional ethical committee, and written informed consent was obtained from all patients.

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

ORCID iD

Alexandre Hudon

References

1. Papageorgiou, Loke, Fromage. Communication skills training for mental health professionals working with people with severe mental illness. Cochrane Database Systematic Reviews 2017; 6(6): CD010006.

2. Perepletchikova. On the topic of treatment integrity. Clin Psychol Sci Pract 2011; 18(2): 148–153.

3. Anderson. Presenting and Evaluating Qualitative Research. Am J Pharm Edu 2010; 74(8): 141.

4. Szymańska, Dobrenko, Grzesiuk. Characteristics and experience of the patient in psychotherapy and the psychotherapy's effectiveness. A structural approach. Psychiatria Polska 2017; 51(4): 619–631.

5. Ranjbar, Khankeh, Khorasani-Zavareh, et al. Challenges in conducting qualitative research in health: A conceptual paper. Iranian J Nurs Midwifery Res 2015; 20(6): 635.

6. Noble, Smith. Issues of validity and reliability in qualitative research. Evid Based Nurs 2015; 18: 34–35.

7. Sebastiani. Machine learning in automated text categorization. ACM Comput Surv 2002; 34(1): 1–47.

8. Ewbank, Cummins, Tablan, et al. Understanding the relationship between patient language and outcomes in internet-enabled cognitive behavioural therapy: A deep learning approach to automatic coding of session transcripts. Psychotherapy Res 2020; 31(3): 300–312.

9. Wang, Sohn, Liu, et al. A clinical text classification paradigm using weak supervision and deep representation. BMC Med Inform Decis Making 2019; 19(1): 1.

10. García Adeva, Pikatza Atxa, Ubeda Carrillo, et al. Automatic text classification to support systematic reviews in medicine. Expert Syst Appl 2014; 41(4): 1498–1508.

11. Venkataraman, Pineda, Bear Don’t Walk IV O, et al. FasTag: Automatic text classification of unstructured medical narratives. PLOS ONE 2020; 15(6): e0234647.

12. Spasic, Nenadic. Clinical Text Data in Machine Learning: Systematic Review. JMIR Med Inform 2020; 8(3): e17984.

13. Yao, Mao, Luo. Clinical text classification with rule-based features and knowledge-guided convolutional neural networks. BMC Med Inform Decis Making 2019; 19(S3): 71.

14. Smink, Sools, van der Zwaan, et al. Towards text mining therapeutic change: A systematic review of text-based methods for Therapeutic Change Process Research. PLOS ONE 2019; 14(12): e0225703.

15. Ewbank, Cummins, Tablan, et al. Quantifying the Association Between Psychotherapy Content and Clinical Outcomes Using Deep Learning. JAMA Psychiatry 2020; 77(1): 35.

16. Hudon, Beaudoin, Phraxayavong, et al. Use of Automated Thematic Annotations for Small Data Sets in a Psychotherapeutic Context: Systematic Review of Machine Learning Algorithms. JMIR Ment Health 2021; 8(10): e22651.

17. Craig, Rus-Calafell, Ward, et al. AVATAR therapy for auditory verbal hallucinations in people with psychosis: a single-blind, randomised controlled trial. The Lancet Psychiatry 2018; 5(1): 31–40.

18. du Sert, Potvin, Lipp, et al. Virtual reality therapy for refractory auditory verbal hallucinations in schizophrenia: A pilot clinical trial. Schizophrenia Res 2018; 197: 176–181.

19. Leff, Williams, Huckvale, et al. Computer-assisted therapy for medication-resistant auditory hallucinations: proof-of-concept study. Br J Psychiatry 2013; 202(6): 428–433.

20. Leff, Williams, Huckvale, et al. Avatar therapy for persecutory auditory hallucinations: What is it and how does it work? Psychosis 2013; 6(2): 166–176.

21. du Sert, Potvin, Lipp, et al. Virtual reality therapy for refractory auditory verbal hallucinations in schizophrenia: A pilot clinical trial. Schizophrenia Res 2018; 197: 176–181.

22. Dellazizzo, Percie du Sert, Phraxayavong, et al. Exploration of the dialogue components in Avatar Therapy for schizophrenia patients with refractory auditory hallucinations: A content analysis. Clin Psychol Psychotherapy 2018; 25(6): 878–885.

23. Beaudoin, Potvin, Machalani, et al. The Therapeutic Processes of Avatar Therapy: A Content Analysis of the Dialogue between Treatment-resistant Patients with Schizophrenia and Their Avatar. Clin Psychol Psychotherapy 2021.

24. Ozaydin, Zengul, Oner, et al. Text-mining analysis of mHealth research. mHealth 2017; 3: 53.

25. Shridhar, Dash, Sahu, et al. Subword Semantic Hashing for Intent Classification on Small Datasets. In: International Joint Conference on Neural Networks (IJCNN), 2019.

26. Hao. Machine Learning Made Easy: A Review of Scikit-learn Package in Python Programming Language. J Educ Behav Stat 2019; 44(3): 348–361.

27. Oliphant. Python for Scientific Computing. Comput Sci Eng 2007; 9(3): 10–20.

28. Busagala, Ohyama, Wakabayashi, et al. Multiple Feature-Classifier Combination in Automated Text Classification. In: 2012 10th IAPR International Workshop on Document Analysis Systems, 2012.

29. Bisong. More Supervised Machine Learning Techniques with Scikit-learn. In: Building Machine Learning and Deep Learning Models on Google Cloud Platform, 2019, pp. 287–308.

30. Veronese, Castellani, Peruzzo, et al. Machine Learning Approaches: From Theory to Application in Schizophrenia. Comput Math Methods Med 2013; 2013: 1–12.

31. Gholamy, Kreinovich, Kosheleva. Why 70/30 or 80/20 Relation between Training and Testing Sets: A Pedagogical Explanation. ScholarWorks@UTEP, 2022. [Internet] [cited 2022 Jan 10]. Available at: https://scholarworks.utep.edu/cs_techrep/1209/

32. Zhang, Wang, Zhao. Estimating the Uncertainty of Average F1 Scores. In: Proceedings of the 2015 International Conference on the Theory of Information Retrieval, 2015.

33. Allen. Intercoder Reliability Techniques: Scott’s Pi. SAGE Encyclopedia Commun Res Methods 2017; 1: 753–755.

34. Karystianis, Nevado, Kim, et al. Automatic mining of symptom severity from psychiatric evaluation notes. Int J Methods Psychiatr Res 2017; 27(1): e1602.

35. Liu, Cocea. Semi-random partitioning of data into training and test sets in granular computing context. Granular Comput 2017; 2(4): 357–386.

36. Balakrishnan, Khan, Arabnia. Improving cyberbullying detection using Twitter users' psychological features and machine learning. Comput Security 2020; 90: 101710.

37. Zolnoori, Fung, Patrick, et al. A systematic approach for developing a corpus of patient reported adverse drug events: A case study for SSRI and SNRI medications. J Biomed Inform 2019; 90: 103091.

38. Singh, Shrivastava, Bouayad, et al. Machine learning for psychiatric patient triaging: an investigation of cascading classifiers. J Am Med Inform Assoc 2018; 25(11): 1481–1487.

39. de Ávila Berni, Rabelo-da-Ponte, Librenza-Garcia, et al. Potential use of text classification tools as signatures of suicidal behavior: A proof-of-concept study using Virginia Woolf’s personal writings. PLOS ONE 2018; 13(10): e0204820.

40. Venkatasubramanian, Veilumuthu, Krishnamurthy, et al. A non-syntactic approach for text sentiment classification with stopwords. In: Proceedings of the 20th International Conference Companion on World Wide Web (WWW), Vol. 11, 2011.

Implementation of a machine learning algorithm for automated thematic annotations in avatar: A linear support vector classifier approach
