Abstract
Epidemiological studies suggest that bipolar disorder has a prevalence of about 1% in European countries, making it one of the most disabling illnesses in working-age adults; it is often long-term and persistent, with complex management and treatment. The capacity for home monitoring of patients with this disorder is therefore crucial for their quality of life. The current paper introduces the use of speech-based information as an easy-to-record, ubiquitous and non-intrusive health sensor suitable for home monitoring, and its application in the framework of the NYMPHA-MD project. Preliminary results also show the potential of acoustic and prosodic features to detect and classify bipolar disorder by predicting the values of the Hamilton Depression Rating Scale (HDRS) and the Young Mania Rating Scale (YMRS) from speech.
Introduction
Bipolar disorders are a common and complex form of mental disorder, ranking among the most disabling illnesses, with a prevalence of about 1% in European countries and up to 2.4% worldwide. Although this may seem a small percentage, it is one of the most complex mental conditions. Patients with bipolar disorder experience episodes of abrupt mood changes, alternating between mania (euphoria) and depression phases. Beyond these complex mental episodes, the disorder is also dynamic, with a typically relapsing-remitting course, and it often becomes a long-term and persistent illness.1 Moreover, patients with bipolar disorder have been shown to be at high risk of premature death due to comorbid cardiovascular diseases.2 In fact, several studies have shown that bipolar disorder is the sixth leading cause of disability worldwide, with a rate of death by suicide of up to 15% among the most severe cases.1,3–5
Since bipolar disorder alternates depressive and manic phases, it is generally assessed by the research community by means of several standard clinical scales, which account for the severity of both depression and mania. The most relevant ones include the Hamilton Depression Rating Scale (HDRS)6 and the Young Mania Rating Scale (YMRS).7 These rating scales are commonly administered by clinicians and take about 20–30 min to complete. The YMRS rates 11 items related to mania (elevated mood, increased motor activity-energy, sexual interest, sleep, etc.) and is based on the patient's report over the previous two days and upon observations made during a clinical interview. The scale ranges from 0 to 60, where higher scores indicate more severe mania, and its final aim is to evaluate the manic state at baseline and over time. Similarly, the HDRS rates 21 items related to depression (depressed mood, feelings of guilt, suicide, insomnia, etc.), although only the first 17 count toward the final score, which indicates the degree of depression at baseline and over time. The final score ranges from 0 to 50, where higher scores indicate more severe depression.
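As a simple illustration of how the two totals are computed (the item ratings below are made up for the example; this is not a clinical implementation):

```python
def hdrs_score(items):
    """Total HDRS score from the 21 item ratings; only items 1-17 count."""
    assert len(items) == 21
    return sum(items[:17])

def ymrs_score(items):
    """Total YMRS score, summing all 11 item ratings."""
    assert len(items) == 11
    return sum(items)

# Hypothetical item ratings for one clinical interview:
hdrs_items = [2, 1, 0, 1, 1, 0, 2, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0]
print(hdrs_score(hdrs_items))  # 12 (items 18-21 are rated but not summed)
```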
Automatic home monitoring of mood instability can allow for early intervention on prodromal symptoms and potentially influence the course of the illness. In recent years, several electronic self-monitoring platforms for regular computers and smartphones have been developed.8,9 However, these systems usually do not have the capacity to collect objective data on patient behaviour, and most of them do not include a feedback loop between patients and mental healthcare providers. Recent projects such as PSYCHE [www.psyche-project.org] and MONARCA [www.monarca-project.eu]10 use information and communication technologies for the treatment of bipolar disorder. However, they are either intrusive for patients, due to the number of sensors needed, or they do not explore the role of the caregiver. Instead, the NYMPHA-MD project11 defined a framework for continuous patient monitoring to identify early warning signs of deviations in mood and attitudes suggesting the onset of a depressive or manic episode, which, in turn, allows for early intervention.12 PULSO Ediciones S.L. won the NYMPHA-MD Pre-Commercial Procurement bid with the MoodRecord project, in which the Universitat Pompeu Fabra (Barcelona) developed the speech analysis module.
In recent literature, speech has been shown to be a potential indicator for bipolar disorder detection,13,14 apart from being a ubiquitous and non-intrusive identifier. In general, systems relying on the speech signal to detect or assess mental disorders can be classified into those using acoustic-dependent features and those using context-dependent features.15 Systems using context-dependent features require word transcriptions to infer linguistic features (see, for instance, the use of semantic information16 or lexical features17 to identify psychosis, schizophrenia and bipolar disorder). In contrast, acoustic-dependent systems rely mainly on extracting speech-based features regardless of the linguistic content, such as spectral characteristics, voice quality features or speech prosody. Although context-dependent linguistic features can be very informative, generally achieving better performance, they are highly dependent on the language and on the corresponding transcriptions from speech. The aim of the present work is to explore the usability of speech in its acoustic form as a language-independent system for home monitoring of patients with bipolar disorder, by implementing a machine learning regressor capable of predicting the values of the HDRS and YMRS scales from speech, thus providing a user-friendly tool and helpful information to patients and clinicians. Moreover, we show its integration and implementation in a smart daily-life system, specifically in the framework of the NYMPHA-MD project.
The structure of the current paper unfolds as follows. Section ‘Speech as a ubiquitous and non-invasive health sensor’ briefly overviews the use of speech as a ubiquitous and non-intrusive health sensor. Section ‘Bipolar mood status detection from acoustic and prosodic information’ presents some preliminary experiments on the prediction of mania and depression scales through speech-based information. Section ‘Patient continuous supervision through mobile app’ describes the MoodRecord mobile application created for patient continuous supervision in the framework of NYMPHA-MD; and finally, sections ‘Discussion’ and ‘Conclusion’ sketch the discussion and conclusions of the work, respectively.
Speech as a ubiquitous and non-invasive health sensor
Among all biometric identifiers, voice has special characteristics that make it an exceptional health indicator. Speech is a non-invasive signal, ubiquitous and easy to record with inexpensive equipment,18 which makes it especially suitable for home monitoring applications. Despite its high variability between speakers, speech is highly dependent on the speaker's physical and emotional conditions,19,20 making it suitable for detecting changes in these conditions.
Speech can provide two different types of information: (i) the content of the message, composed of the words and their meanings as a result of a cognitive process, and (ii) the acoustic information extracted from the voice sound, produced by the coordinated physical activity of several organs. Variations of specific acoustic features (F0, intensity and duration) over time comprise what is known as prosodic information, conveyed through intonation, stress and rhythm, respectively, which can reflect the emotional state of the individual. In this project we focus on the second type, whose features can be extracted in a language-independent manner, thus allowing broad application to countries with many different languages (as was the case in the NYMPHA-MD project). In the following subsections, we introduce the potential use of speech for medical diagnosis, specifically the use of prosodic information for the analysis of mood status and bipolar disorder.
Acoustic and prosodic information
Voice has been widely used to detect several physical and mental pathologies, with acoustic parameters related to voice quality (e.g. jitter, shimmer and harmonics-to-noise ratio) being among the most used for these purposes.21 By way of example, jitter and shimmer have recently been used for the detection of Parkinson's disease,22–25 Alzheimer's disease,26,27 post-traumatic stress,28 multiple sclerosis and dysarthria,29,30 thyroid disorders31 and coronary heart disease,32 among many others.
Speech prosody consists of the following elements: intonation (perceived by listeners as variation over time of the fundamental frequency), stress (variation of loudness) and rhythm (variation of sound duration).33 Prosody is crucial in oral communication34,35 and in expressing different emotional states. Furthermore, prosody has been shown to be more robust to channel noise than other spectral-based speech features,36 and it can be extracted in its acoustic form without the need for the corresponding text transcription.
Acoustic and prosodic parameters for bipolar disorder detection
Several works in the literature have reported the usefulness of features based only on voice characteristics (from now on, acoustic features) to detect emotional states such as depression and mania. In Shinohara et al.,37 for instance, several voice quality features such as pitch rate, jitter, shimmer and harmonics-to-noise ratio indices, measured over spontaneous speech recordings by means of a voice disability index, were shown to distinguish patients with such disorders from healthy people. In Vicsi et al.,38 jitter, shimmer and the first and second formant frequencies differed significantly in depressed speech. In Pan et al.,39 fundamental frequency, formants and cepstral coefficients were explored for bipolar disorder detection. Similarly, Shimizu et al.40 analysed the chaotic behaviour of vocal sounds in patients with depression, and Zhou et al.41 proposed a new feature based on the nonlinear Teager energy operator to classify speech under stressed conditions. Other voice quality characteristics were further studied in Scherer et al.42 and Hargreaves et al.43 to detect depression and post-traumatic stress disorder, and it has also been shown that pressure of speech is a powerful indicator of manic states.44
The use of prosodic features, also in their acoustic form (from now on, prosodic features), has been shown to be highly relevant for identifying emotions,45,46 which suggests their usefulness in the detection of bipolar disorder, although the literature in this field is less exhaustive. See, for instance, the use of intonation (pitch contour) in Guidi et al.47 and rhythm features in Gideon et al.48
Bipolar mood status detection from acoustic and prosodic information
The current section presents some preliminary experiments on the use of speech-based features for the detection of bipolar disorder. Since the data collected were not exhaustive, the aim of these experiments is to show the potential of speech features to map individuals' speech onto the depression and mania scales. The following subsections describe data gathering and feature extraction in the MoodRecord application, the setup for bipolar status detection and the corresponding preliminary results.
Data recording and feature extraction
The original recordings were acquired by two of the EU procurers working on the project, namely Consorci Sanitari Parc Taulí (CSPT, Barcelona) and Provincia Autonoma di Trento (PAT, Trento). The recordings were made by bipolar patients recruited at the two institutions, using the MoodRecord application, and were further processed with Praat, a free computer software package for speech analysis.49 The recordings had an average duration of 26.0 s and an average effective duration (excluding silences) of 18.5 s.
We developed a Praat-based module to extract acoustic and prosodic features from the sound files. These features were then used to train machine learning models to detect manic, depressive and normal states by means of the YMRS and HDRS scales. The selection of the acoustic features was mainly based on the existing literature: several works, as seen in the previous sections, have reported the use of fundamental frequency (F0), formants and voice quality features, and a few other works have also reported the use of prosodic features based on intonation and rhythm. We used a similar set of acoustic features, which led to the following nine features:
Fundamental frequency (one feature): mean value of F0.
Formants (two features): first formant frequency (F1) and second formant frequency (F2).
Voice quality (six features): relative value of jitter, absolute value of jitter, relative value of shimmer, absolute value of shimmer,50 noise-to-harmonics ratio (NHR) and harmonics-to-noise ratio (HNR).
Moreover, we also extracted nine prosodic features, based on the following three prosody elements:
Intonation (four features): maximum value of F0, minimum value of F0, range of F0, slope of F0.
Stress (one feature): mean value of intensity.
Rhythm (four features): ratio of pauses, speech rate, articulation rate and average syllable duration. 51
In total, we extracted 18 features from each patient's speech. For the extraction of F0 and its related features, we used the autocorrelation method in Praat with an interval of 10 ms and a Hanning window of length 40 ms. Although rhythm features could have been more accurately extracted from the corresponding transcripts, they were extracted by adapting the Praat script found in De Jong and Wempe50 to keep the system language independent. The mean F0 and intensity values were used to normalise the F0- and intensity-based features, respectively, to avoid speaker dependence. Thus, F0-based features were computed as distances in semitones with respect to the mean value of the individual. The voice quality parameters jitter and shimmer were also normalised by means of F0 and intensity, respectively. After normalisation, F0 and intensity were left out and the remaining 16 features (eight voice quality and formant features plus eight prosodic features) were used for the detection experiments.
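The semitone normalisation described above amounts to expressing each F0 value as 12·log2(f0/f0_mean), so that pitch level differences between speakers are factored out. A minimal sketch with NumPy (the function name and toy values are our own illustration, not the project's Praat script):

```python
import numpy as np

def semitone_distance(f0_hz, f0_mean_hz):
    """Express F0 values as semitone distances from the speaker's mean F0:
    st = 12 * log2(f0 / f0_mean). Removes speaker-dependent pitch level."""
    return 12.0 * np.log2(np.asarray(f0_hz, dtype=float) / f0_mean_hz)

# A speaker with a mean F0 of 200 Hz: an octave above (400 Hz) is +12 st,
# an octave below (100 Hz) is -12 st.
st = semitone_distance([100.0, 200.0, 400.0], 200.0)
print(st)  # [-12.   0.  12.]

# Intonation features such as the F0 range are then derived on this scale:
f0_range_st = st.max() - st.min()  # 24 semitones in this toy example
```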
Bipolar status detection
The detection of mood status in the MoodRecord application was performed by means of regression algorithms using the extracted speech features. To this end, doctors annotated the speech recordings according to the YMRS and HDRS scales, so that the system could learn from these scores to point out the occurrence of manic or depressive episodes. Specifically, the detection of depressive and manic states was based on the ranges of the scales and the mania/depression assessments52 indicated in Table 1, considering also the cases of 'no mania' and 'no depression' (euthymic state). Scale values were assigned to those audios recorded within 3 days before or after the date of the doctor's assessment.
Ranges of HDRS and YMRS scales in different mania and depression states.
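The ±3-day assignment rule can be illustrated with a short sketch (the function and data below are hypothetical, not the project code):

```python
from datetime import date, timedelta

def assign_scales(audio_date, assessments, window_days=3):
    """Attach HDRS/YMRS scores to a recording only if a clinical assessment
    exists within +/- window_days of the recording date (3 in our setup)."""
    window = timedelta(days=window_days)
    for assess_date, scores in assessments:
        if abs(audio_date - assess_date) <= window:
            return scores
    return None  # recording stays unannotated

# One hypothetical assessment, two recordings:
assessments = [(date(2020, 3, 10), {"HDRS": 8, "YMRS": 2})]
print(assign_scales(date(2020, 3, 12), assessments))  # {'HDRS': 8, 'YMRS': 2}
print(assign_scales(date(2020, 3, 20), assessments))  # None (outside window)
```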
Preliminary experiments
For an initial test of the system, some preliminary regression experiments were performed within the framework of the project. An initial small database consisting of 65 recordings from 19 different real users and patients aged 23–69 years was collected in the framework of the NYMPHA-MD project (see more details on patient recruitment in NYMPHA53). After a careful check of audio quality, only 49 of them, corresponding to 13 different users, were found to be recorded with sufficient quality; the remaining ones were too short (less than 2 s), empty or contained only noise. Moreover, 15 recordings were shorter than 5 s; these could be used to extract acoustic features but were not suitable for computing reliable prosodic features over time, so they were not annotated with HDRS and YMRS scores by the clinicians. For other recordings, clinicians were not available at the time of recording. Since assessments had to be done in situ and within a short time range of the recording to be reliable, only 30 of the valid audios could be annotated. Furthermore, the variability of the scores was rather low: the great majority were associated with 'no depression' and 'no mania' states. Table 2 summarises these statistics.
Statistics of the speech recordings within the NYMPHA-MD project.
The regression experiments were performed in Weka,54 using leave-one-out (LOO) cross-validation to mitigate the lack of data. To predict the HDRS and YMRS values, three different regression algorithms were tested: linear regression, random forest and support vector regression with a radial kernel. Tables 3 and 4 show the results obtained in the prediction of the HDRS and YMRS scales, respectively, in terms of root mean square error (RMSE), using the following sets of features: (1) voice quality and formants, (2) prosody and (3) all features. Bold numbers represent the best-performing set of features for each regression algorithm. Moreover, Table 5 compares the best LOO results for both HDRS and YMRS scales with 10-fold and 5-fold cross-validation. The results clearly show that the larger the number of folds, the lower the RMSE achieved in the regression experiments.
Root mean square errors obtained in the prediction of HDRS values (LOO cross-validation).
Bold numbers refer to the best result for each regression algorithm.
Root mean square errors obtained in the prediction of YMRS values (LOO cross-validation).
Bold numbers refer to the best result for each regression algorithm.
RMSE values obtained using 5-, 10-fold and LOO cross-validation.
Bold numbers refer to the best result for each regression algorithm.
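The experiments above were run in Weka; an equivalent sketch of the LOO evaluation with scikit-learn, on synthetic stand-in data (the real features and scores are not reproduced here), would be:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 16))               # 30 recordings x 16 speech features
y = np.abs(rng.normal(5.0, 4.0, size=30))   # stand-in HDRS scores

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "svr_rbf": SVR(kernel="rbf"),           # radial-kernel support vector regression
}
rmses = {}
for name, model in models.items():
    # Each sample is held out once and predicted by a model trained on the rest.
    pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    pred = np.clip(pred, 0, 50)  # negative predictions -> 0, capped at HDRS max
    rmses[name] = np.sqrt(mean_squared_error(y, pred))
    print(f"{name}: RMSE = {rmses[name]:.2f}")
```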
On the one hand, Table 3 shows that the best HDRS prediction is obtained using only the voice quality plus formants feature set. In addition, among all the regression methods tested, the support vector regression algorithm outperformed the others. On the other hand, Table 4 shows that the best YMRS prediction is obtained with the whole range of speech features. Unlike for HDRS prediction, prosody here provides useful information for YMRS detection, which could be explained by the fact that manic speech is conveyed to a much higher degree by prosody, with greater variations in intonation and rhythm than depressed speech. Figures 1 and 2 plot the actual and predicted values for HDRS and YMRS, respectively, for the 30 speech samples, corresponding to the lowest RMSE obtained for each scale. The plots show that the most extreme values are more difficult to predict due to the lack of representative training data. These results should be interpreted with caution due to the small number of valid recordings obtained. Moreover, most of the valid recordings were very close to euthymic states, which limits the variability of the data used in the experiments. Note that, in the final application, since negative values make no sense in a real setting, negative predicted values are set to 0. Likewise, large values are capped at the maximum HDRS and YMRS values (50 and 60, respectively).

Actual HDRS values compared to the HDRS values predicted using voice quality and formant features with a support vector regression algorithm.

Actual YMRS values compared to the YMRS values predicted using all speech features with a linear regression algorithm.
Patient continuous supervision through mobile app
The application developed within NYMPHA-MD aimed to provide a new way to manage patients diagnosed with bipolar disorder through the MoodRecord system, which allows the estimation of the patient's mood and continuous patient monitoring.
MoodRecord system
The MoodRecord system was designed to be used by patients (app functionalities) and by healthcare professionals and caregivers (website functionalities, www.moodrecord.com). Patients register a set of parameters related to their mood using the app. The website provides clinicians with all the user data registered by the app, allowing them to manage and track their patients' disorders (Figure 3).

Diagram of the MoodRecord system architecture.
The flow starts when the clinician registers new patients from the web interface and sends them their credentials. Patients are then able to access the application on Android or iOS smartphones and start registering data related to their mood or health state, following the weekly guideline defined by their case manager (see the diagram of the MoodRecord system architecture in Figure 3).
The speech recordings in MoodRecord are made through a module called Story of the day. In this module, patients are asked to record a video explaining their day; the module includes two functionalities: face recognition and speech pattern detection. The system is first calibrated to remove microphone noise. After that, the Story of the day module is ready for the video/audio recording. Figure 4 shows the different steps of this process.

Story of the day registration.
The recorded video is then sent for speech analysis. The audio is separated from the image, and speech features are extracted for analysis and the subsequent prediction of the mania and depression scales, based on the manually annotated recordings previously used to train the system. The system is initially built using the normalised speech features to develop a generic model. As training data are collected, the system can move towards a personalised model for each user.
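The fallback from a generic to a personalised model can be sketched as follows (the threshold, naming and string placeholders are illustrative assumptions, not the deployed logic):

```python
def select_model(user_id, annotated_counts, models, min_recordings=20):
    """Use the generic model until enough annotated recordings exist to
    justify a personalised model (the threshold is a made-up example)."""
    if annotated_counts.get(user_id, 0) >= min_recordings and user_id in models:
        return models[user_id]
    return models["generic"]

# Placeholder strings stand in for trained regressors:
models = {"generic": "generic-regressor", "u42": "personalised-regressor"}
print(select_model("u42", {"u42": 35}, models))  # personalised-regressor
print(select_model("u07", {"u07": 3}, models))   # generic-regressor
```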
Medical supervision
The clinician sets up initial and follow-up visits, in which the patient's status is reviewed and assessed again. Between such visits, patients use the MoodRecord app to record a video through the Story of the day module so that they can be monitored at home. Apart from the audio and facial recognition algorithms, the system includes other patient parameters, such as sleep quality and personal questionnaires, which are checked online by the clinician. The system includes alarms that are raised when one of the indicators reaches a critical value.
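Such an alarm check can be sketched as follows (the threshold values here are purely illustrative, not clinically validated):

```python
def check_alarms(indicators, thresholds):
    """Return the indicators whose latest value reaches its critical
    threshold; indicators without a threshold are ignored."""
    return [name for name, value in indicators.items()
            if name in thresholds and value >= thresholds[name]]

thresholds = {"YMRS": 20, "HDRS": 17}      # hypothetical critical values
readings = {"YMRS": 23, "HDRS": 9, "sleep_hours": 6}
print(check_alarms(readings, thresholds))  # ['YMRS']
```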
Discussion
The preliminary experiments presented in the previous sections have shown the feasibility of using voice features to detect bipolar status, since audio features can reflect both manic and depressive states. The results are comparable to those of other existing works in demonstrating the usefulness of speech for bipolar status detection. Muaremi et al.,14 for instance, show that voice features are objective markers of emotional states in bipolar disorder (improved when combined with other data extracted from smartphones) and that they are more effective in the detection of mania than in the detection of depression. Maxhuni et al.,13 on the other hand, test both prosodic and spectral speech features and find that the two types achieve similar accuracy whether tested together or in isolation. Our analysis goes beyond these works by testing the specific contributions of prosodic and other acoustic features, and finds that, while prosody provides useful information for YMRS detection (manic state), HDRS detection (depressive state) relies more on the non-prosodic acoustic features.
Our experiments could be improved by collecting more data to produce better machine learning models. Moreover, the short time available to test the application with patients means that patients remained stable during the recorded period, without significant changes in their mood states; mood status often changes with the seasons, so a collection period of over six months would be needed for better performance of the prediction algorithm. The low variability of the data is a limitation for training the models. To overcome it, data collected in the future will automatically be used to improve both the individual and the generic models.
Conclusion
In this work, we have presented the use of acoustic and prosodic information as a health indicator; concretely, as an identifying factor for bipolar disorders. Speech is ubiquitous and easy to record, which makes it a suitable identifier for home monitoring systems. Preliminary experiments on the use of acoustic and prosodic features in the framework of the NYMPHA-MD project have shown promising capabilities for detecting different mood states from speech, improving on systems based only on voice characteristics, without extra burden for the patient. Moreover, the MoodRecord application presented in the current work is a practical tool for further medical supervision of the patient, reducing the need for regular in-person visits between patients and clinicians.
Although preliminary, the results show that, within the range of the available data, our speech-based algorithm is a potential tool for predicting different mood states in patients with bipolar disorder. The use of speech, and of additional linguistic-prosodic information, should thus be further explored in home monitoring health systems.
Acknowledgements
The authors would like to thank Ivan Latorre for his technical support and Giorgia Cistola for her help on the data preparation.
Author’s note
Mireia Farrús is now affiliated with Universitat de Barcelona, Barcelona, Catalonia.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is part of the NYMPHA-MD project, which has been funded by the European Union under Grant Agreement No. 610462. The first author has been funded by the Agencia Estatal de Investigación (AEI), Ministerio de Ciencia, Innovación y Universidades and the Fondo Social Europeo (FSE) under grant RYC-2015-17239 (AEI/FSE, UE).
