Abstract
Objective:
Currently, the diagnosis of bipolar disorder relies on patient information and careful clinical evaluation and judgement, as objective tests are lacking. Core clinical features of bipolar disorder include changes in behaviour. We aimed to investigate whether objective smartphone data reflecting behavioural activities could classify patients with bipolar disorder compared with healthy individuals.
Methods:
Objective smartphone data were automatically collected from 29 patients with bipolar disorder and 37 healthy individuals. Repeated measurements of objective smartphone data were performed during different affective states in patients with bipolar disorder over 12 weeks and compared with healthy individuals.
Results:
Overall, the sensitivity of objective smartphone data in patients with bipolar disorder versus healthy individuals was 0.92, specificity 0.39, positive predictive value 0.88 and negative predictive value 0.52. In euthymic patients versus healthy individuals, the sensitivity was 0.90, specificity 0.56, positive predictive value 0.85 and negative predictive value 0.67. In mixed models, automatically generated objective smartphone data (the number of text messages/day, the duration of phone calls/day) were increased in patients with bipolar disorder (during euthymia, depressive and manic or mixed states, and overall) compared with healthy individuals. The amount of time the smartphone screen was ‘on’ per day was decreased in patients with bipolar disorder (during euthymia, depressive state and overall) compared with healthy individuals.
Conclusion:
Objective smartphone data may represent a potential diagnostic behavioural marker in bipolar disorder and may be a candidate supplementary method to the diagnostic process in the future. Further studies including larger samples, first-degree relatives and patients with other psychiatric disorders are needed.
Introduction
Bipolar disorder (BD) is characterized by changes in mood with episodes of depression, (hypo)mania and mixed episodes with intervening periods of euthymia (Phillips and Kupfer, 2013). Core clinical features of BD include changes in psychomotor activity and behavioural activities, and episodic shifts in energy, activity, sleep and other behavioural aspects that may be quantified objectively (Beigel and Murphy, 1971; Faurholt-Jepsen et al., 2012; Kuhs and Reschke, 1992; Kupfer et al., 1974; Mitchell et al., 2008; Popescu et al., 1991; Sobin and Sackeim, 1997). More specifically, changes in social activity (Weinstock and Miller, 2008), i.e., engaging in social relations, as well as in physical activity (Faurholt-Jepsen et al., 2012; Kuhs and Reschke, 1992; Kupfer et al., 1974) are central aspects of BD that may be measurable objectively.
Currently, due to the lack of objective tests, the diagnostic process as well as the clinical assessment of the severity of depressive and manic symptoms in BD relies on patient information and information from relatives, clinical observations and evaluations, and clinical rating scales (Phillips and Kupfer, 2013). This reliance on patient information and clinical evaluations raises issues including recall bias, decreased illness insight and differences in assessment experience (Cassidy, 2010). Objective methods for diagnosis and for monitoring illness activity would therefore be a major clinical advantage.
Mobile health (mHealth) refers to health services delivered by mobile devices, such as mobile phones, mobile monitoring devices, personal digital assistants (PDAs) and other wireless electronic devices (World Health Organization (WHO), 2011). mHealth is a relatively new area within health care, and the use of sensors embedded within mobile monitoring devices could provide opportunities for new areas of research, development and treatment. A report by the WHO in 2011 stated ‘the use of mobile and wireless technologies to support the achievement of health objectives (mHealth) has the potential to transform the face of health service delivery across the globe’ (WHO, 2011). Currently, approximately one-third of the world’s adult population owns and uses a smartphone, and it has been estimated that by the year 2020 this proportion will increase to 80% (ChargeItSpot, 2016). Data suggest that more than half of smartphone users seek health-related information on their phone, and more recently the use of sensors embedded within mobile devices to monitor behavioural aspects has provided new areas of research. A digital marker has been defined as consumer-generated physiological and behavioural measures collected from digital tools that can be used to explain, influence and/or predict health-related outcomes (The Medical Futurist, 2018). Smartphones have been suggested as an easy and inexpensive way to monitor daily illness activity in BD including daily data on social and physical activity (Bardram et al., 2013; Faurholt-Jepsen et al., 2015). The use of smartphones to monitor BD provides unique opportunities to collect large amounts of fine-grained data in an unobtrusive, passive and continuous way in the long term and outside of clinical settings and could lead to the identification of new behavioural digital markers and digital phenotyping of BD (Glenn and Monteith, 2014; Hidalgo-Mazzei et al., 2016b; Insel, 2017; Monteith et al., 2015).
Within major depressive disorder and schizophrenia, a few preliminary pilot studies, feasibility studies and case reports including automatically generated objective smartphone data, i.e., information on phone usage, mobility and voice features, have been published (Burns et al., 2011; Dang et al., 2016; Doryab, 2014; Wahle et al., 2016; Zhang et al., 2016). One of these studies investigated differences in voice features between patients with schizophrenia and healthy control individuals (HC) and suggested that several voice features may discriminate between patients with schizophrenia and HC. Findings from studies that collected automatically generated objective smartphone data from patients with major depressive disorder suggest that patients accept the overall concept of digital diagnosis (Burns et al., 2011; Dang et al., 2016; Doryab, 2014). Only one of the pilot studies investigated the use of data on acceleration, WiFi and GPS, combined with self-reported depression survey data, for delivering a tailored cognitive behavioural therapy intervention (Wahle et al., 2016). However, the use of automatically generated smartphone data as an additional diagnostic behavioural marker in these populations has not been investigated. Recently, interest in the use of both self-monitored and automatically generated objective smartphone data within BD has increased.
A few observational studies (Faurholt-Jepsen et al., 2014, 2015, 2016a, 2016c; Gideon et al., 2016), pilot studies (Abdullah et al., 2016; Alvarez-Lozano et al., 2014; Beiwinkel et al., 2016; Grünerbl et al., 2015; Karam et al., 2014; Muaremi et al., 2014; Palmius et al., 2016), feasibility studies, case reports (Guidi et al., 2015; Hidalgo-Mazzei et al., 2016a; Saunders et al., 2017; Vanello et al., 2012) and study protocols (Faurholt-Jepsen et al., 2016b, 2017; Hidalgo-Mazzei et al., 2015; Kessing et al., 2017; Ritter et al., 2016) using automatically generated objective smartphone data have been published. Our previous studies (Faurholt-Jepsen et al., 2014, 2015, 2016a, 2016c) and those of others (Abdullah et al., 2016; Beiwinkel et al., 2016; Guidi et al., 2015) suggest that automatically generated objective smartphone data, such as communication logs, screen activation, location and voice features, may reflect illness activity in BD and discriminate between affective states. However, these studies included rather small samples and no HC; thus, whether automatically generated objective smartphone data can serve as a diagnostic behavioural digital marker discriminating between BD and HC has not been investigated and is unknown.
The aim of this pilot study was to investigate whether objective smartphone data could discriminate between patients with BD and HC, including analyses of sensitivity and specificity, and thus potentially serve as a diagnostic behavioural digital marker in BD. We hypothesized that automatically generated objective smartphone data would be able to discriminate (1) between patients with BD during euthymia and HC, (2) between patients with BD during a depressive state and during a manic or mixed state, respectively, and HC, and (3) between patients with BD overall and HC.
We have previously reported on the association between smartphone data and illness activity among the patients with BD included in this report, without HC and without comparisons between patients with BD and HC. In that study, we found that objective smartphone data reflect illness severity in BD and differ between affective states (Faurholt-Jepsen et al., 2016c).
Materials and methods
This pilot case–control study investigated the use of automatically generated objective smartphone data as an electronic diagnostic behavioural marker in patients with BD compared with HC.
The study was approved by the Committee on Health Research Ethics of the Capital Region of Denmark (H-7-2014-007 & H-2-2011-056) and the Danish Data Protection Agency (2013-41-1710). Smartphone data were stored on a secure server at Concern IT, Capital Region, Denmark (I-suite no. RHP-2011-03). Participants were offered a study smartphone on loan free of charge. Written informed consent was obtained from all participants. The study complied with the Declaration of Helsinki.
Prior to this study, we developed smartphone software ('MONARCA') to monitor self-assessed items and objective activities (Bardram et al., 2013).
Study participants
Patients with BD
The patients were recruited from The Clinic for Affective Disorder, Psychiatric Centre Copenhagen, Denmark from October 2013 to December 2014. The inclusion criterion was a diagnosis of BD according to ICD-10, established using the Schedules for Clinical Assessment in Neuropsychiatry (SCAN; Wing et al., 1990). The exclusion criteria were lack of Danish language skills and pregnancy. In order to collect data during different affective states, the patients participated for a 12-week study period during the very early phase of their course of treatment at the clinic and received various types, doses and combinations of psychopharmacological treatment during the study. The patients were invited to participate in the study following referral to the clinic. Clinical and socio-demographic data were collected at inclusion. Analyses of the smartphone data collected in this study and their association with illness activity in BD have been published elsewhere (Faurholt-Jepsen et al., 2016c).
HC
As part of a study investigating stress in healthy individuals, HC were recruited consecutively from the Blood Bank at Rigshospitalet, Copenhagen University Hospital, Denmark, by approaching blood donors in the waiting room on random occasions from September 2015 to August 2016. The inclusion criteria were as follows: women and men over the age of 18 years, no history of psychiatric illness, no first-degree family history of psychiatric illness, and use of an Android smartphone as their regular mobile phone. The exclusion criteria were lack of Danish language skills and pregnancy. The HC participated in the study as part of a larger cohort study (Kessing et al., 2017). In this study, baseline data from HC were included in the analyses.
Since it was not possible to collect automatically generated objective smartphone data from iPhones at the time of the study, patients with BD and HC not willing to use Android smartphones during the study were excluded.
Settings and assessments
The study was conducted at The Clinic for Affective Disorder, Psychiatric Centre Copenhagen, Denmark.
Clinical assessments
Patients with BD. The BD diagnosis according to ICD-10 was confirmed using SCAN (Wing et al., 1990). The severity of depressive and manic symptoms was clinically rated fortnightly using the Hamilton Depression Rating Scale-17 items (HDRS-17; Hamilton, 1967) and the Young Mania Rating Scale (YMRS; Young et al., 1978) for the 12-week study period.
HC. The absence of any psychiatric diagnoses according to ICD-10 was confirmed using SCAN (Wing et al., 1990). The severity of depressive and manic symptoms was clinically rated at inclusion using the HDRS-17 (Hamilton, 1967) and the YMRS (Young et al., 1978).
All participants were instructed to carry their smartphones with them during the day and to use them for their usual communication. Participants did not receive economic compensation for participating in this study.
Smartphone data
The MONARCA software used to monitor subjective and objective activities in BD was developed in our previous study (Bardram et al., 2013). After inclusion, the participants were instructed to use the MONARCA software for daily self-evaluations during the study period. At the time of the study, the following objective smartphone data were available and automatically collected around the clock: the number of outgoing and incoming calls and text messages/day, the duration of phone calls (min/day), the number of times the smartphone screen was turned 'on/off' per day (reflecting the number of times the participants interacted with the smartphone) and the duration the smartphone screen was 'on' per day. For technical reasons, this report is restricted to the smartphone data that were available in both cohorts (patients with BD and HC) at the time of the study. Data on voice features during phone calls were also collected, but because these data require more advanced statistical modelling, their use for discriminating between patients with BD and HC and between affective states will be presented in future reports.
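As an illustration of how such daily measures can be derived, the sketch below aggregates a raw event log into per-day features. The record layout `(timestamp, kind, duration_seconds)`, the event names and the function `daily_features` are hypothetical and chosen for illustration; they are not the MONARCA data format, which is not described here.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event kinds; the real MONARCA log format is not described in this paper.
COMM_EVENTS = {"call_in", "call_out", "sms_in", "sms_out"}


def daily_features(events):
    """Aggregate (timestamp, kind, duration_seconds) events into per-day features."""
    days = defaultdict(lambda: defaultdict(float))
    screen_on_since = None  # start time of the current screen-on period, if any
    for ts, kind, duration_s in sorted(events):
        if kind in COMM_EVENTS:
            days[ts.date()]["n_" + kind] += 1
            if kind.startswith("call"):
                days[ts.date()]["call_minutes"] += duration_s / 60
        elif kind == "screen_on":
            days[ts.date()]["n_screen_on"] += 1
            screen_on_since = ts
        elif kind == "screen_off" and screen_on_since is not None:
            # Simplification: the whole screen-on period is credited to the day it started.
            minutes = (ts - screen_on_since).total_seconds() / 60
            days[screen_on_since.date()]["screen_on_minutes"] += minutes
            screen_on_since = None
    return {day: dict(features) for day, features in days.items()}


events = [
    (datetime(2014, 3, 1, 9, 0), "call_out", 120),
    (datetime(2014, 3, 1, 10, 0), "screen_on", 0),
    (datetime(2014, 3, 1, 10, 30), "screen_off", 0),
]
print(daily_features(events))
```

Crediting a screen-on period to the day it started is one simple convention; periods spanning midnight could instead be split between days.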
The researchers conducting the clinical assessments did not have access to the objective smartphone data and were therefore blinded to these data at the time of the clinical assessments. An overview of data collection during the study is provided in Table 1.
Data collection during the study including patients with bipolar disorder (BD; n = 29) and healthy control individuals (HC; n = 37).
Statistical analysis
The statistical analyses were defined a priori. Because patients were assessed multiple times during different affective states, differences in smartphone data between BD and HC were investigated using two-level linear mixed-effects regression models, which allow the outcome variables to vary both within subjects (intra-individual variation) and between subjects (inter-individual variation). Level one represented the individual repeated measures of objective smartphone data, and level two represented between-subject variation (variation between BD and HC). Analyses comparing objective smartphone data between patients with BD (including different affective states) and HC were conducted. The models included a random intercept for each participant to accommodate within-individual correlation of the outcome variables over time. First, unadjusted linear mixed-effects regression models with levels of objective smartphone data as the dependent variables were fitted. Second, the same models were adjusted for age and gender, specified as fixed effects, as possible confounding factors. There is no consensus on how to report explained variance in hierarchical linear models; in this paper, we used the method of Snijders and Bosker (Snijders and Bosker, 1994).
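The Snijders and Bosker measure compares the variance components of a fitted model with those of an intercept-only (null) model. The text does not state which of their measures was reported; the sketch below assumes the level-1 version, R² = 1 − (σ²_full + τ²_full)/(σ²_null + τ²_null), where σ² is the residual (within-subject) variance and τ² is the random-intercept variance. The function name and the variance values in the example are made up for illustration.

```python
def snijders_bosker_r2_level1(sigma2_null, tau2_null, sigma2_full, tau2_full):
    """Level-1 explained variance (Snijders and Bosker, 1994) for a random-intercept model.

    sigma2_*: residual (within-subject) variance; tau2_*: random-intercept variance,
    for the intercept-only ("null") model and the fitted ("full") model.
    """
    total_null = sigma2_null + tau2_null
    total_full = sigma2_full + tau2_full
    if total_null <= 0:
        raise ValueError("null-model variance must be positive")
    return 1.0 - total_full / total_null


# Illustrative (made-up) variance components, not values from this study.
r2 = snijders_bosker_r2_level1(sigma2_null=100.0, tau2_null=50.0,
                               sigma2_full=95.0, tau2_full=40.0)
print(round(r2, 3))  # 0.1
```

If the models were refitted in Python rather than Stata, a fitted statsmodels `MixedLM` result exposes the residual variance as `scale` and the random-intercept variance in `cov_re`, which could be passed to this function for a null and a full model; this translation is an assumption, not the authors' code.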
In both patients with BD and HC, the automatically generated objective smartphone data were averaged over the days covered by the outcome measures (HDRS-17 and YMRS), i.e., the day of the rating and the preceding 3 days. Each rating was thus paired with 4 days of smartphone data, and the number of ratings per participant ranged from 1 (HC) to 7 (BD). A depressive state was defined as an HDRS-17 score ⩾ 13 and a YMRS score < 13. A manic or mixed state was defined as a YMRS score ⩾ 13. A euthymic state was defined as an HDRS-17 score < 13 and a YMRS score < 13.
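The state definitions and the 4-day averaging window can be written out directly. In this sketch, the function names are hypothetical, and skipping days without smartphone data is an assumption, since the handling of missing days is not described.

```python
from datetime import date, timedelta
from statistics import mean


def affective_state(hdrs17, ymrs):
    """Map HDRS-17 and YMRS scores to the affective states defined in the text."""
    if ymrs >= 13:
        return "manic_or_mixed"  # YMRS >= 13, regardless of HDRS-17
    if hdrs17 >= 13:
        return "depressive"      # HDRS-17 >= 13 and YMRS < 13
    return "euthymic"            # both scores < 13


def window_average(daily_values, rating_day, n_days=4):
    """Average a daily feature over the rating day and the 3 preceding days.

    Days absent from `daily_values` are skipped (an assumption).
    """
    window = [rating_day - timedelta(days=k) for k in range(n_days)]
    values = [daily_values[d] for d in window if d in daily_values]
    return mean(values) if values else None


calls_per_day = {date(2014, 3, d): float(d) for d in range(1, 6)}
print(affective_state(20, 5), window_average(calls_per_day, date(2014, 3, 5)))
```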
To calculate the classification accuracy of the objective smartphone data, machine learning techniques (the scikit-learn gradient boosting classifier) were used. In many cases, we observed class imbalance: one class was represented by a large number of examples (euthymia and depression), while the other was represented by a few examples (mania). To mitigate this problem, random oversampling (sampling the minority class with replacement) was used to create a balanced training set before training the classifier. The gradient boosting classifier combines an additive sequence of simple decision trees into a single stronger classifier. At each iteration, a tree is fitted to a subsample of the training data using a random subset of features, which increases the independence among the trees and helps prevent overfitting (Breiman, 2001). Models were evaluated by 10-fold cross-validation. The sensitivity, estimating the probability that the test identifies 'disease' among those with 'disease', was calculated as true positives/(true positives + false negatives), and the specificity, estimating the probability that the test identifies 'no disease' among those without 'disease', was calculated as true negatives/(true negatives + false positives). The area under the curve (AUC) was used to assess model performance and the tradeoff between sensitivity and specificity. The positive predictive value (PPV), the proportion of positive results that were true positives, was calculated as true positives/(true positives + false positives), and the negative predictive value (NPV), the proportion of negative results that were true negatives, was calculated as true negatives/(true negatives + false negatives). These metrics were used to describe the performance of the automatically generated objective smartphone data as a diagnostic test.
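A minimal sketch of the evaluation loop follows, using the standard library only. It assumes stratified folds and applies random oversampling to the training part of each fold only, so that duplicated minority examples never appear in a test fold. The fold construction and all function names are our assumptions; the text states only that a balanced training set was created and that 10-fold cross-validation was used. The balanced training indices would then be passed to a classifier such as scikit-learn's `GradientBoostingClassifier`, whose `subsample` and `max_features` parameters provide the row and feature subsampling described above; the hyperparameters actually used are not reported.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Split indices into k folds so that each class is spread evenly across folds."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(j + offset) % k].append(i)
        offset += len(idx)  # continue round-robin where the previous class stopped
    return folds


def random_oversample(indices, labels, rng):
    """Resample each smaller class with replacement up to the size of the largest class."""
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(idx) for idx in by_class.values())
    balanced = []
    for idx in by_class.values():
        balanced.extend(idx)
        balanced.extend(rng.choices(idx, k=target - len(idx)))
    return balanced


def cv_splits(labels, k=10, seed=0):
    """Yield (balanced_train_indices, test_indices); the test fold is never oversampled."""
    rng = random.Random(seed)
    folds = stratified_folds(labels, k, seed)
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield random_oversample(train, labels, rng), test


# Group sizes from this study: 182 BD ratings and 37 HC ratings.
labels = ["BD"] * 182 + ["HC"] * 37
first_train, first_test = next(cv_splits(labels))
```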
As no prior study had compared automatically generated objective smartphone data between patients with BD and HC, the potential effects were unknown, and no statistical power analysis could be performed prior to the study. The hypotheses tested in this pilot study were formulated before the data analyses, based on our prior findings in patients with BD (Faurholt-Jepsen et al., 2016c). Although outcomes and covariates were specified prior to the analyses, the machine learning analyses were conducted post hoc. We therefore accounted for multiple testing in the statistical models using a Bonferroni correction for 10 comparisons (0.05/10), and p-values below 0.005 in individual models were considered statistically significant. Data were entered using Excel and EpiData®; Stata version 12.1 (StataCorp LP, College Station, TX, USA) was used for statistical analyses, and Python with the scikit-learn library was used for classification analyses.
Results
Background characteristics
A total of 51 patients with a diagnosis of BD and 72 HC were invited to participate in the study, of whom 32 (62.7%) patients and 37 (51.4%) HC agreed to participate. The main reasons for declining to participate were that (1) it would be too time-consuming and (2) a preference for iPhone. Three patients dropped out immediately after inclusion (having changed their minds about participating), leaving a total of 29 patients in the study. None of the participants dropped out during the follow-up period. In total, 10.3% of the patients' visits with the researcher were missing, leaving 182 clinical ratings available. The present results are based on 29 patients with BD, clinically evaluated fortnightly during the 12 weeks and across different affective states, and 37 HC, representing a total of 219 clinical ratings (182 from BD and 37 from HC). Apart from three participants, all included participants wished to use their own smartphone during the study. Socio-demographic characteristics, together with the severity of depressive and manic symptoms according to affective state (raw, unadjusted mean HDRS-17 and YMRS scores), are presented in Table 2.
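The recruitment and rating counts can be cross-checked with simple arithmetic. Assuming one rating at inclusion and then one every second week (weeks 0, 2, ..., 12, i.e., 7 planned ratings per patient, matching the maximum of 7 ratings stated in the methods), 10.3% missing visits reproduces the 182 available ratings, and 37 of 72 invited HC corresponds to 51.4%.

```python
def percent(part, whole):
    """Percentage of `part` in `whole`."""
    return 100 * part / whole


print(f"BD participation: {percent(32, 51):.1f}%")
print(f"HC participation: {percent(37, 72):.1f}%")

# Assumed schedule: fortnightly ratings at weeks 0, 2, ..., 12 give 7 ratings per patient.
planned_bd_ratings = 29 * len(range(0, 13, 2))
available_bd_ratings = round(planned_bd_ratings * (1 - 0.103))  # 10.3% of visits missing
print(planned_bd_ratings, available_bd_ratings, available_bd_ratings + 37)
```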
Socio-demographic characteristics of patients with bipolar disorder (BD; n = 29) and healthy control individuals (HC; n = 37). a
HDRS-17: Hamilton Depression Rating Scale 17-item score; YMRS: Young Mania Rating Scale score; SD: standard deviation; IQR: interquartile range.
Data are mean (SD), median [IQR] or proportions (%, (n)) unless otherwise stated.
n represents the total number of clinical assessments with repeated measurements per patient during the study period. Data are mean (SD) and unadjusted values.
Differences in automatically generated objective smartphone data in BD overall and during affective states compared with HC
Table 3 presents the results of linear mixed-effects regression models for differences in objective smartphone data between affective states (euthymia, depressive state, manic or mixed state) in patients with BD and HC. Results from the unadjusted and adjusted analyses were similar, and thus results from the adjusted analyses (adjusted for age and gender) are presented.
Differences in automatically generated objective smartphone data between patients with bipolar disorder (BD; n = 29) and healthy control individuals (HC; n = 37). a
Number of clinical assessments according to affective states: a depressive state – n = 62; a manic or mixed state – n = 21; a euthymic state – n = 99; healthy control individuals – n = 37.
Analyses adjusted for age and gender.
p-values < 0.005 were considered statistically significant (Bonferroni correction).
In the adjusted models, the duration of phone calls was increased during a euthymic state, a depressive state, a manic or mixed state and overall (B = 53.33, 95% confidence interval (CI): [45.60, 61.05], p < 0.001, Snijders and Bosker’s estimate: 0.10), compared with HC.
The number of incoming text messages/day was increased during a manic or mixed state compared with HC. The duration the smartphone screen was 'on'/day was decreased during a euthymic state (B = –72.44, 95% CI: [–112.70, –32.16], p < 0.001) and overall (B = –50.00, 95% CI: [–86.82, –13.19], p = 0.008, Snijders and Bosker's estimate: 0.060; not significant at the Bonferroni-corrected threshold of p < 0.005) compared with HC.
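The reported confidence intervals can be cross-checked against the estimates and p-values. Assuming approximately normal (Wald-type) intervals, which is our assumption since the interval method is not reported, the standard error is the interval width divided by 2 × 1.96, and the estimate should lie at the midpoint. For the overall screen-time estimate, an upper limit of –13.19 (rather than +13.19) is the value for which B = –50.00 lies at the midpoint and which reproduces p ≈ 0.008; that p-value lies above the Bonferroni threshold of 0.05/10. The function name `wald_from_ci` is hypothetical.

```python
import math


def wald_from_ci(estimate, lower, upper, z_crit=1.96):
    """Back-calculate SE, z, a two-sided normal p-value and the midpoint from a 95% CI."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)
    midpoint = (lower + upper) / 2
    return se, z, p, midpoint


BONFERRONI_THRESHOLD = 0.05 / 10  # 10 comparisons, as stated in the methods

estimates = {
    "euthymic": (-72.44, -112.70, -32.16),
    "overall": (-50.00, -86.82, -13.19),
}
for label, (b, lower, upper) in estimates.items():
    se, z, p, mid = wald_from_ci(b, lower, upper)
    print(f"{label}: midpoint={mid:.2f} SE={se:.2f} p={p:.4f} "
          f"significant={p < BONFERRONI_THRESHOLD}")
```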
Models adjusted for employment status were omitted because employment status was highly collinear with group (BD or HC).
Sensitivity, specificity, PPV and NPV of automatically generated objective smartphone data
Table 4 presents the results of gradient boosting classifier models for sensitivity, specificity, PPV and NPV of objective smartphone data. In models classifying BD overall versus HC, the sensitivity was 0.92, the specificity 0.39, the PPV 0.88 and the NPV 0.52. In models classifying patients with BD during euthymia versus HC, the sensitivity was 0.90, the specificity 0.56, the PPV 0.85 and the NPV 0.67.
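Unlike sensitivity and specificity, PPV and NPV depend on the proportion of positives in the sample, which helps explain why a high PPV coexists with a low specificity here. Assuming classification was performed per clinical rating (182 BD and 37 HC ratings overall; 99 euthymic BD ratings versus 37 HC), Bayes' theorem applied to the reported sensitivity and specificity reproduces the PPVs and the euthymic NPV, and gives an overall NPV close to the reported 0.52; the remaining gap is plausibly due to rounding of the inputs. The function names are hypothetical.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV implied by sensitivity, specificity and the proportion of positives."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    return tp / (tp + fp), tn / (tn + fn)


# Assumption: the unit of classification is the clinical rating.
ppv_all, npv_all = predictive_values(0.92, 0.39, 182 / 219)
ppv_eu, npv_eu = predictive_values(0.90, 0.56, 99 / 136)
print(f"BD overall: PPV={ppv_all:.2f}, NPV={npv_all:.2f}")
print(f"Euthymia:   PPV={ppv_eu:.2f}, NPV={npv_eu:.2f}")
```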
Classification of BD versus HC based on automatically generated objective smartphone data.
BD: bipolar disorder; HC: healthy control individuals; SD: standard deviation; PPV: positive predictive value; NPV: negative predictive value; AUC: area under the curve.
Euthymic state: HDRS < 13 and YMRS < 13; depressive state: HDRS ⩾ 13 and YMRS < 13; manic state: YMRS score ⩾ 13.
Sensitivity = true positive/positive.
Specificity = true negative/negative.
In addition, exploratory analyses on the classification of affective states within BD based on combined objective smartphone data were conducted. A depressive state versus a euthymic state was classified with a sensitivity of 0.36, a specificity of 0.68, a PPV of 0.33 and an NPV of 0.65. A manic state versus a euthymic state was classified with a sensitivity of 0.37, a specificity of 0.92, a PPV of 0.47 and an NPV of 0.88.
Discussion
In this pilot study, we investigated differences in automatically generated objective smartphone data as well as the sensitivity, specificity, PPV and NPV between patients with BD and HC. The results indicated that automatically generated objective smartphone data differed between BD and HC, and that these had a rather high sensitivity and PPV but a low specificity and NPV.
Interestingly and in accordance with our a priori hypotheses, we found that the sensitivity and PPV of objective smartphone data were rather high in classifying BD overall and during euthymia versus HC. Furthermore, automatically generated objective smartphone data on communicative activities and smartphone use differed between HC and patients with BD during euthymia, depressive states, manic or mixed states and overall, respectively. However, not all of the automatically generated objective smartphone data differed between patients with BD and HC; thus, the findings partly supported and partly contradicted the hypotheses.
The most intriguing and novel results from this pilot study were that (1) the sensitivity and PPV in classifying BD overall and during euthymia compared with HC were relatively high; (2) several of the automatically generated objective smartphone data differed between patients with BD during a euthymic state and HC; and (3) several of the automatically generated objective smartphone data differed between patients with BD, regardless of the affective state, and HC.
A diagnosis of BD requires careful clinical evaluation and judgement. This is the first report on the use of automatically generated objective smartphone data as a potential diagnostic behavioural digital marker discriminating between BD and HC. The results from this study indicate that alterations in automatically generated objective smartphone data reflecting behavioural activities may represent a diagnostic behavioural digital marker of BD and could serve as a supplementary tool to facilitate the clinical diagnostic process, which, owing to the lack of objective markers, currently relies on patient information and information from relatives, clinical observations and evaluations. However, in this study, the specificity and NPV were quite low. Future studies should therefore consider the tradeoff between sensitivity and specificity, as reflected by the AUC, and whether a high sensitivity is worth the cost of a lower specificity and thereby a risk of false-positive classifications, both of BD versus HC and of affective states. As few studies have yet investigated the use of smartphones for monitoring in BD, more in-depth studies, both clinical and methodological, are needed. Large long-term cohort studies are ongoing (Kessing et al., 2017) that investigate automatically generated objective smartphone data as a diagnostic marker, differentiating BD from HC and from relatives at risk of BD, and as a state marker, differentiating among euthymia, depression and mania.
Surprisingly, the results of this pilot study showed that, regardless of the affective state, patients with BD received and sent more text messages than HC. This may reflect activation of the social network, including relatives and people caring for the patients, by the patients' condition: patients may contact others, and be contacted more, out of worry and a need for help and care. During manic or mixed states, patients with BD did not send more text messages than HC. However, during manic or mixed states, the duration of phone calls was increased, so patients and their social network (including relatives) may have experienced an increased need for verbal communication by phone calls rather than by text messages. Also, during a manic or mixed state, the smartphone screen was turned 'on' (i.e., the participant interacted with the smartphone by turning on its screen) more frequently than in HC, but the duration of time the screen was 'on' was not increased. Patients in a manic or mixed state may have had increased activity levels and restlessness, reflected in turning the screen 'on/off' more often without leaving it 'on' for prolonged periods of time.
Future long-term, large cohort studies investigating composite measures based on automatically generated objective smartphone data reflecting behavioural activities, both to discriminate between patients with BD and HC and to discriminate between affective states, will hopefully provide more insight into the area and identify which combinations of smartphone data give the highest sensitivity, specificity, PPV and NPV, so that automatically generated objective smartphone data can potentially become clinically relevant for monitoring and treating patients with BD.
Overall, since most people carry their phone with them for most of the day, and the proportion of the world's adult population that owns and uses a smartphone has been estimated to increase to 80% by the year 2020 (ChargeItSpot, 2016), smartphones allow collection of data on behavioural aspects that would otherwise be difficult to access. Furthermore, fine-grained data can be collected in real time and in naturalistic settings with a low level of intrusiveness and without the need for people to interact with a software programme, minimizing the risk of missing data and fatigue during long-term monitoring. Accordingly, this type of automatically generated objective smartphone data has clear advantages in the monitoring of BD.
Limitations
Several clinical as well as methodological limitations to this study should be mentioned. Case–control studies carry an inherent risk of bias at different levels, such as selection bias, information bias and confounding, necessitating strict methodological requirements and thorough considerations of the study design and analyses. The patients included in the study had bipolar I or bipolar II disorder and were treated in a highly specialized mood disorder clinic during the very early phase of their course of treatment. This may have contributed to the relatively low number of manic/mixed episodes as well as the relatively low symptom level during affective episodes. However, the patients were recruited at the beginning of their course of illness and presented with rather high levels of depressive and manic symptoms, as assessed using standardized clinical rating scales during the study period. In addition, patients were newly diagnosed with BD, which could have influenced their use of technology for support and information seeking. Thus, the findings may be specific to this group of patients, and future studies including patients at different stages of illness could provide more generalizable findings.
Furthermore, a potential confounding effect of psychopharmacological treatment cannot be ruled out. Defining and recruiting a proper control group in case–control studies is always difficult. The HC included in this study were recruited from the Blood Bank at Rigshospitalet, Copenhagen University Hospital, Denmark, and thus may represent a ‘super-healthy’ comparison group.
Along these lines, a potential confounding effect of employment status cannot be excluded. However, models adjusted for employment status were omitted because employment status was highly collinear with group (BD or HC). Future observational studies could investigate this aspect further, for example by matching groups on employment status. Many factors unrelated to the mental health status of the participants could also influence the results. People may vary considerably in how they interact with their phones, and many people use other platforms, such as WhatsApp and Facebook Messenger, to communicate with others. To account for some of these differences, the statistical analyses were adjusted for age and gender. However, the unknown variability in phone usage should be addressed in future studies by including larger samples and adjusting the statistical analyses for additional confounding factors.
The study included a rather small sample of patients and HC, and no power analysis was performed prior to the study. However, each patient with BD was assessed several times during the follow-up, thereby increasing the statistical power. Future studies should include an a priori power analysis when designing the study. Regarding the statistical analyses, although outcomes and covariates were specified prior to the analyses, the machine learning analyses were conducted post hoc and, consequently, multiple testing was accounted for in the statistical analyses.
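To make the a priori power analysis recommended above more concrete, the following Python sketch computes an approximate number of participants per group for a two-sided comparison of two group means, using the standard normal-approximation formula n = 2(z_(1-α/2) + z_(power))² / d², where d is a standardized effect size (Cohen's d). The function name and the effect sizes are hypothetical choices for illustration; they are not estimates from this study, and this simple between-group formula is not the analysis used here.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants per group for a two-sided, two-sample comparison
    of means: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2 (normal approximation).

    Simplifying assumption: one observation per participant, i.e. repeated
    measurements within participants are ignored.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)  # quantile corresponding to the desired power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Hypothetical standardized effect sizes, chosen only for illustration.
for d in (0.5, 0.8):
    print(f"d = {d}: {n_per_group(d)} participants per group")
```

With α = 0.05 and 80% power, this gives 63 participants per group for d = 0.5 and 25 for d = 0.8. Because the present design relied on repeated measurements analysed with mixed models, a power calculation for such a study would also need to account for within-participant correlation (for example through simulation), so the formula above is only a rough starting point.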
None of the included patients dropped out during the study, but patients who were unwilling to use an Android smartphone were excluded, since it was not possible to collect automatically generated objective smartphone data from iPhones. Thus, the participating patients may represent a sample of particularly motivated patients without difficulties interacting with Android smartphones, which could introduce bias, as described by others (Montes et al., 2012; Spaniel et al., 2008).
Using automatically generated objective smartphone data as diagnostic digital markers highlights a number of challenges with digital health/mHealth, and with this sort of smartphone system in particular. When using this kind of platform, regular updates of the smartphone operating system require constant updating of the software. It is also uncertain whether validity evaluations of older software versions apply to newer versions. This is one of the key problems in digital health and should be one of the key points addressed in future studies. Furthermore, future studies with a priori power calculations, investigating the sensitivity, specificity, PPV and NPV of automatically generated objective smartphone data in larger samples and comparing patients with BD in different affective states, healthy relatives at risk of BD and HC, are needed to evaluate the clinical utility. Including healthy first-degree relatives at risk of developing BD, as we are currently doing (Kessing et al., 2017), could provide further information on the use of objective smartphone data as an early predictive marker of later onset of BD, as well as important knowledge regarding the causality of changes in objective smartphone data in BD. Finally, few studies on the use of smartphone data as a marker of BD have been published, and thus the findings from this study are hypothesis generating and should be investigated further in future studies.
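The four accuracy measures named above can be illustrated with a small Python sketch. Sensitivity and specificity are computed within the BD and HC observations, respectively, whereas PPV and NPV also depend on the proportion of BD observations among all classified observations. The counts below are hypothetical and are not the study's data, the class and function names are illustrative, and the sketch is not the classification procedure used in the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # BD observations classified as BD
    fn: int  # BD observations classified as HC
    tn: int  # HC observations classified as HC
    fp: int  # HC observations classified as BD


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from a 2x2 confusion table."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # proportion of BD observations detected
        "specificity": c.tn / (c.tn + c.fp),  # proportion of HC observations correctly rejected
        "ppv": c.tp / (c.tp + c.fp),  # proportion of BD classifications that are correct
        "npv": c.tn / (c.tn + c.fn),  # proportion of HC classifications that are correct
    }


def ppv_at_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV implied by Bayes' theorem for a given proportion of BD observations."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical counts, chosen only to illustrate high sensitivity with low specificity.
counts = ConfusionCounts(tp=90, fn=10, tn=15, fp=25)
print(diagnostic_metrics(counts))

# Overall sensitivity/specificity from the abstract (0.92, 0.39) at two hypothetical prevalences.
for prev in (0.5, 0.8):
    print(prev, round(ppv_at_prevalence(0.92, 0.39, prev), 2))
```

With sensitivity and specificity held fixed at the overall values reported in the abstract, the PPV rises from about 0.60 when half of the observations come from patients to about 0.86 when 80% do. For this reason, PPV and NPV are best interpreted together with the composition of the sample.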
Conclusion
This pilot study demonstrated rather high sensitivity and PPV, but low specificity and NPV, of objective smartphone data for discriminating between BD and HC. Furthermore, levels of automatically generated objective smartphone data reflecting behavioural activities differed between patients with BD (during euthymia, depressive and manic or mixed states, and overall) and HC. Objective smartphone data may represent a potential diagnostic behavioural digital marker and could potentially supplement, assist and facilitate the diagnostic process in BD, but further studies including larger samples are needed.
Footnotes
Acknowledgements
The authors would like to thank the patients and HC for participating in this study and statistician Christian Ritz, PhD, for consulting on the statistical analyses. M.F.-J., H.Þ. and L.V.K. authored the protocol, conceived the study and performed the statistical analyses. All authors have contributed to and approved the manuscript.
Declaration of Conflicting Interests
M.F.-J., J.B. and H.Þ. have no conflicts of interest. M.F. and J.E.B. are shareholders and co-founders of Monsenso, which provides the MONARCA system. M.V. has within the past 3 years received speaker fees from Lundbeck. L.V.K. has within the past 3 years been a consultant for Lundbeck, Servier and AstraZeneca.
Funding
This study was funded by Lundbeckfonden (R167-2013-16138), the European Union (EU) 7th Framework Programme, the Mental Health Services, Copenhagen, Denmark, the Danish Foundation Trygfonden (ID 109766), the Gert Einar Joergensens foundation, and the A.P. Moeller and the Hustru Chastine Mc-Kinney Moellers foundation for general purposes. The funders had no role in the study design, data collection, analyses or preparation of the manuscript.
