COVID-19: Detecting depression signals during stay-at-home period

Abstract

The new coronavirus outbreak has been officially declared a global pandemic by the World Health Organization. To grapple with the rapid spread of this ongoing pandemic, most countries have banned indoor and outdoor gatherings and ordered their residents to stay home. Given the developing situation with coronavirus, mental health is an important challenge in our society today. In this paper, we discuss the investigation of social media postings to detect signals relevant to depression. To this end, we utilize topic modeling features and a collection of psycholinguistic and mental-well-being attributes to develop statistical models to characterize and facilitate representation of the more subtle aspects of depression. Furthermore, we predict whether signals relevant to depression are likely to grow significantly as time moves forward. Our best classifier yields F-1 scores as high as 0.8 and surpasses the utilized baseline by a considerable margin, 0.173. In closing, we propose several future research avenues.

Keywords

Coronavirus, stay home, depression, overlapping behavior, similarity

Introduction

The ongoing coronavirus outbreak was officially declared a global pandemic by the World Health Organization (WHO) on 11 March 2020. Coronavirus disease 2019 (COVID-19) is an infectious disease caused by a newly discovered coronavirus.¹ COVID-19 causes a respiratory illness characterized by symptoms such as cough, fever, difficulty breathing, and pneumonia in both lungs. These symptoms may take up to 14 days to appear after exposure to COVID-19. COVID-19 spares no one and infects people of all ages. Older people and those with pre-existing medical conditions like cardiovascular disease, diabetes, chronic respiratory disease, and cancer appear to be more vulnerable to becoming severely ill with COVID-19.^1,2

WHO has reported a drastic increase in confirmed cases and deaths all over the world. To mitigate the rapid spread of COVID-19, many countries have forbidden indoor and outdoor gatherings in excess of particular numbers of people; asked non-essential services, nonprofit entities, and retail businesses to close; issued stay-at-home orders for their residents; and advised them to practice social distancing and avoid all non-essential travel abroad. We are living through a pivotal moment in history. The onslaught of the pandemic has severely challenged our economic systems³ and caused substantial changes to people’s daily routine. The current pandemic can affect people both physically and psychologically.⁴ For example, in China, 96.2% of clinically stable COVID-19 patients in the early recovery phase reported significant post-traumatic stress disorder (PTSD) symptoms.⁵ Psychological distress is increasing worldwide and may have long-lasting consequences and repercussions on mental health.^6–9

Given the developing situation with the pandemic, social media allows people to inform themselves and get updates from official sources. People may naturally panic when seeing headlines announcing bad news and numbers of cases. This may affect ways in which individuals express themselves and share opinions, thoughts, and personal experiences with others. The emotion and language in social media postings may potentially indicate feelings such as loneliness,¹⁰ anxiety, anger and stress, among others.^11,12 For instance, a person may express emotional reactions that can be unpleasant, disturbing, and overwhelming. Emotional problems like anxiety and depression manifest themselves as feelings of inner emotional distress. Mental health issues can comprise a wide range of disorders that affect mood, thinking, and behavior. Some examples of mental illness include PTSD, depression, anxiety disorders, addictive behaviors, etc. In this paper, our primary interest is in depression. Depression is a serious condition that can cause a persistent feeling of sadness and loss of interest and can affect a person’s daily life.¹³ Survey research conducted by Mental Health Research Canada found that feelings of depression are rising constantly.¹⁴ Before the pandemic, 7% of Canadians reported high levels of depression. This rate has risen to 16% during the stay-at-home period and 22% predict high levels of depression if social isolation continues for two more months.¹⁴

Recognizing early signs of depression is of critical importance and can aid mental health services in assessing the impact of the pandemic on the population and implementing healthier coping strategies to build personal resilience. In addition, appropriate services can be provided for those in need. In this paper, we leverage social media postings to detect signals relevant to depression due to COVID-19. To this end, we build a corpus of postings shared on Twitter during the stay-at-home period. We make use of a topic modeling approach to generate topics addressed by individuals and evaluate language features from topic words to determine whether they indicate signals for depression. It should be noted that we retain solely depression-indicative topics and collect individuals who engage with these topics to investigate their posting histories since the onset of the stay-at-home order. Specifically, this work makes the following contributions:

1. We demonstrate the effectiveness of our data collection and data pre-processing strategy to gather social media postings containing signals relevant to depression.

2. We capture evidence from a corpus of postings and potential individuals who manifest signals for depression and consider them as an experimental group. We measure the similarity between different topics addressed by individuals in the experimental group to discover their overlapping behavioral characteristics and understand their linguistic idiosyncrasies.

3. We develop models to predict whether signals relevant to depression are likely to grow significantly as time moves forward.

Related work

The role of social media in mental health has been explored by De Choudhury.¹¹ The study suggested a guideline that emphasizes the use of social media postings to gauge what the pertinent mental health literature would predict at the individual and population levels. This could allow the identification of depressed or otherwise at-risk individuals through the large-scale passive monitoring of social media.¹⁵ Recently, research has associated social media with several mental health conditions, including stress,^16–18 post-traumatic stress disorder,^19–21 and depression.^15,22–32

To quantify depression from texts, De Choudhury et al. proposed a social media depression index to identify levels of depression among individuals and predict social network behavior changes related to post-partum depression using several features, including structural properties of social networks.²⁴ While some studies rely exclusively on open-vocabulary analysis and lexicon-based techniques such as Linguistic Inquiry and Word Count (LIWC)³³ to build a classifier, other studies couple LIWC with topic modeling features.^27,34–36 For instance, Coppersmith et al. used LIWC to demonstrate characteristic differences in language use for mental disorders.¹⁹ Their approach utilizes uni-grams and 5-grams to indicate the presence of mental health conditions. Stark et al. combined LIWC and latent Dirichlet allocation (LDA)-based features in the classification of social relationships.³⁴ Resnik et al. explored the value-add of topic modeling in text analysis for depression and showed that topic models can take us beyond the LIWC categories to relevant themes related to depression and neuroticism as a strongly associated personality measure.²⁷ Another work of Resnik et al. investigated the use of supervised topic models in the analysis of linguistic signals for detecting depression.²⁸ Tadesse et al. demonstrated that multiple feature combinations (LIWC+LDA+bi-gram) can yield competitive results.³⁵ In this paper, we take a step forward by combining LDA with bi-gram, LIWC and other psycholinguistic dictionary-based features to identify depression-indicative topics, in order to facilitate the investigation of signals relevant to depression. The rationale behind the incorporation of additional features is to enrich the model to be able to capture depression-related terms and patterns that may escape the LIWC dictionary. We utilize correlation metrics to compare the performance of the proposed features with other alternative feature combinations.

Detection of depression signals

Dataset during the stay-at-home period

All data we obtained is public, posted between 12 March 2020 and 25 May 2020,¹ and made available from Twitter. Specifically, we extracted tweets bearing the words or hashtags: COVID, coronavirus, #StayAtHome, or #StayHome. For privacy and ethical reasons, we avoid displaying personally identifiable information, especially names and pseudonyms. Therefore, we randomly replaced such information to ensure the anonymity and privacy of the data.

To preprocess the data, we limited our set to Canadian users and removed tweets written in a language other than English or French. Additionally, we discarded redundant tweets, retweets without comments, tweets containing only the keywords (i.e., the words or hashtags used for extraction), and multimedia such as images and videos. We removed links in tweets but kept emojis, since research has shown that emotions within a text can be expressed through the use of emojis.³⁷ We used the Python Googletrans² package to translate tweets from French to English. We removed tweets in which the word COVID or coronavirus co-occurs with the term mental health or depression.³ We believe that people reacting emotionally may avoid combining the two terms in a single tweet when it conveys a personal account. Consequently, we assume that these kinds of tweets are more likely to convey information or warnings about mental health. We eliminated stopwords but kept pronouns.⁴ Pronouns reveal information on people’s emotional state, thinking, and personality.³³ Chung and Pennebaker discovered that individuals susceptible to mental illness such as depression use first-person pronouns more frequently, suggesting a higher self-attention focus.³⁸
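The filtering steps above can be illustrated with a minimal sketch. It covers only link removal, deduplication, the keyword-only filter, and the co-occurrence filter; language detection, translation, and stopword handling are omitted. The function names and regular expressions are assumptions made for illustration and are not taken from the original pipeline.

```python
import re

# Illustrative patterns; the exact matching rules of the study are not specified.
URL_RE = re.compile(r"https?://\S+")
PANDEMIC_RE = re.compile(r"\b(covid|coronavirus)\b", re.IGNORECASE)
MENTAL_RE = re.compile(r"\b(mental health|depression)\b", re.IGNORECASE)
EXTRACTION_KEYWORDS = {"covid", "coronavirus", "#stayathome", "#stayhome"}


def strip_links(text):
    """Remove URLs and tidy whitespace; emojis and other characters are kept."""
    return re.sub(r"\s+", " ", URL_RE.sub("", text)).strip()


def is_informational(text):
    """True when a pandemic term co-occurs with 'mental health' or 'depression'."""
    return bool(PANDEMIC_RE.search(text)) and bool(MENTAL_RE.search(text))


def is_keyword_only(text):
    """True when the tweet contains nothing beyond the extraction keywords."""
    return all(tok in EXTRACTION_KEYWORDS for tok in text.lower().split())


def preprocess(tweets):
    """Apply link removal, deduplication, and the two content filters in order."""
    seen, kept = set(), []
    for raw in tweets:
        text = strip_links(raw)
        if not text or text in seen:
            continue  # empty after cleaning, or a redundant tweet
        seen.add(text)
        if is_keyword_only(text) or is_informational(text):
            continue
        kept.append(text)
    return kept
```

In this sketch, deduplication is applied after link removal, so two tweets that differ only in their shortened URL count as redundant; whether the original pipeline made the same choice is not stated.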

To concentrate exclusively on data containing signals relevant to depression, we quantified different aspects of the language usage and patterns of individuals, using automated methods in order to extract features indicative of depression in tweets.

Dataset before the stay-at-home order

We replicated and applied the same logic as above to collect tweets posted before the stay-at-home order, that is, from 1 January 2020 to 11 March 2020. In total, we extracted 1,006,941 tweets and 161,327 distinct users, that is, users who had at least five tweets.

Feature Design

Bi-gram features

We extracted bi-grams from tweets by leveraging the vectors based on the term frequency-inverse document frequency (TF-IDF) approach.^35,39 We used TF-IDF as a statistical measure to evaluate how important a word is to each tweet in the corpus. We converted each tweet into its bag-of-words representation and calculated the TF-IDF value of each word using the standard formula (Equation (1)).

\text{TF-IDF}(w, t) = (1 + \log n_{w,t}) \times \log \frac{T}{T_{w}} \tag{1}

where n_{w,t} is the number of times word w occurs in tweet t, T is the total number of tweets, and T_w is the number of tweets containing word w. The first factor is the log-normalized term frequency and the second factor is the inverse document frequency.
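As a worked illustration of Equation (1), the following sketch computes the weight of every term in a small tokenized corpus. The natural logarithm is an assumption, since the base is not specified, and the function name `tf_idf` is hypothetical. The tokens may be words or bi-grams.

```python
import math
from collections import Counter


def tf_idf(tweets):
    """Return one {term: weight} dict per tokenized tweet, following Equation (1)."""
    total = len(tweets)  # T: number of tweets
    doc_freq = Counter()  # T_w: number of tweets containing each term
    for tokens in tweets:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tweets:
        counts = Counter(tokens)  # n_{w,t}: occurrences of each term in this tweet
        weights.append({
            term: (1 + math.log(n)) * math.log(total / doc_freq[term])
            for term, n in counts.items()
        })
    return weights
```

A term that appears in every tweet receives a weight of zero, because log(T / T_w) = log 1 = 0.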

LIWC features

The Linguistic Inquiry and Word Count (LIWC) dictionary is a widely used psychometrically validated system for psychology-related analysis of language and word classification.³³ LIWC includes word categories that have pre-labeled meanings. For each tweet, we calculated the number of observed words, using the LIWC dictionary and focusing on three LIWC categories: linguistic dimensions, psychological processes, and personal concerns. For the psychological processes and personal concerns categories, we utilized all of their subcategories, while for the linguistic dimensions category, we exclusively measured the proportion of first-person pronouns in the tweet.
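The per-tweet counting described above can be sketched as follows. The LIWC dictionary is licensed and is not reproduced here, so the category word lists and the pronoun list below are small hypothetical stand-ins used only to show the shape of the computation.

```python
# Hypothetical stand-ins; the real LIWC categories and word lists differ.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
TOY_CATEGORIES = {
    "sadness": {"sad", "alone", "hopeless"},
    "health": {"sick", "tired", "sleep"},
}


def liwc_style_features(tokens):
    """Count category hits and compute the proportion of first-person pronouns."""
    lowered = [tok.lower() for tok in tokens]
    features = {
        name: sum(tok in words for tok in lowered)
        for name, words in TOY_CATEGORIES.items()
    }
    n = len(lowered)
    first_person_hits = sum(tok in FIRST_PERSON for tok in lowered)
    features["first_person_ratio"] = first_person_hits / n if n else 0.0
    return features
```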

PLUS features

We extracted depression-related features from the MRC psycholinguistic database,⁴⁰ the WHO glossary of psychiatric and mental health terms,⁴¹ and the NRC emotion lexicons.⁴² The NRC emotion lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). MRC provides information about 26 different linguistic properties and includes more than 150,000 words with linguistic and psycholinguistic features of each word. For each tweet, we identified depression-related words using the WHO glossary and verified whether these words fall into the NRC emotion lexicons. Specifically, we discarded all the words that imply “joy” as the emotional state. Each MRC feature was computed by averaging the scores of all the depression-related words found in the database.
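A minimal sketch of this procedure is given below. It assumes that the three resources have already been loaded into plain Python containers: a set of glossary terms, a set of words associated with joy, and a mapping from words to MRC property scores. The function name, argument names, and data layout are illustrative assumptions.

```python
def plus_features(tokens, glossary, joy_words, mrc_scores):
    """Average MRC property scores over depression-related words in one tweet.

    glossary   -- set of depression-related terms (stand-in for the WHO glossary)
    joy_words  -- set of words associated with joy (stand-in for the NRC lexicon)
    mrc_scores -- {word: {property: score}} (stand-in for the MRC database)
    """
    # Keep glossary words, discarding those that imply joy.
    related = [tok for tok in tokens if tok in glossary and tok not in joy_words]
    # Only words present in the MRC stand-in contribute to the averages.
    scored = [word for word in related if word in mrc_scores]
    if not scored:
        return {}
    properties = mrc_scores[scored[0]].keys()
    return {
        prop: sum(mrc_scores[word][prop] for word in scored) / len(scored)
        for prop in properties
    }
```

The sketch assumes that every scored word carries the same set of properties; a real loader would need to handle missing values.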

LDA features

We utilized LDA⁴³ to learn the topics addressed from the tweets. LDA is a probabilistic model that discovers latent topics in a text corpus and can be trained using collapsed Gibbs sampling. A topic is a distribution over a fixed vocabulary. As the parameters of LDA, we set α and β to 0.01. All extracted topics were used as features.

Experimental Setup and Results

Prediction of depression during the stay-at-home period

We generated 50 topics overall, of which we especially examined topics containing words related to mental health. To this end, we combined PLUS, bi-gram, and LIWC features to identify topics containing depression-related words. The depression-indicative topics were validated by clinical psychologists. Next, we took users who engaged with the 38 depression-indicative topics (see Table 1) and collected all tweets of these users from 12 March 2020 to 25 May 2020. We kept users who had at least five tweets and considered these users as an experimental group. In total, we were left with 87,236 distinct users and 857,294 tweets. We performed linear regression with elastic-net regularization to predict depression signals derived from previous features and evaluated the quality of prediction using the Pearson correlation (r). We stratified the dataset for 10-fold cross-validation to separate our training and testing sets. Table 2 shows that all of the feature sets combined (LIWC+PLUS+bi-gram+LDA) produce much stronger correlations (r = 0.506, p < 0.001) with depression than other alternative combinations or LIWC alone, and perform reliably well at predicting depression. All correlation coefficients are significant at p < 0.05. We observe that adding PLUS features improves on the results yielded by LIWC+bi-gram+LDA by a considerable margin (r = 0.506 versus r = 0.371). It should be noted that Pearson correlations between behavior (such as language use) and psychologically-based features rarely surpass an r of 0.4.⁴⁴

Table 1.

Top fifteen words for the first five of the 38 validated depression-indicative topics.

Topic	Words
1	Limit, alone, bad, I, bored, hard, when, time, wash, hand, tired, isolation, abuse, social, paper
2	Feeling, myself, mask, mood, extremely, time, affect, out, crisis, mind, bad, finish, way, I, worse
3	Friends, sleep, I, life, suffer, miss, shit, always, dull, long, end, back, family, hopeless, change
4	Disgust, hell, freaking, I, enemy, worry, care, moment, invisible, difficult, feel, bad, health, home, sick
5	Time, sad, home, close, depressed, hard, move, limited, boring, unhappy, stay, services, weird, feel, park

Table 2.

Prediction quality for depression, for different feature sets and all combinations, as measured using the Pearson r. For LIWC features, we consider one feature per category and for LDA features, we take one feature per topic.

Feature set	r
LIWC	0.286
LIWC+LDA	0.342
LIWC+bi-gram	0.313
LIWC+bi-gram+LDA	0.371
LIWC+PLUS+bi-gram+LDA	0.506

To make predictions over time for signals relevant to depression, we divided our data (857,294 tweets) into one-week periods. Specifically, we separately derived 50 topics from each subset. We prepared the training set using topics from the first to the penultimate week and took topics from the last week as the test set. We utilized three different classifiers: support vector machine (SVM), logistic regression (LR), and random forest (RF). We trained our classifiers with the three feature sets which achieved the highest Pearson’s (r) results in Table 2: LIWC+LDA, LIWC+bi-gram+LDA, and LIWC+PLUS+bi-gram+LDA. We considered the feature set LIWC itself as a baseline. For SVM, we set the regularization parameter λ = 0.0001 and the value γ of the radial basis function kernel to 0.5 and for RF, we set the number of trees to 500 and the maximum depth and number of features to 3 and 30, respectively. The prediction performances are reported as F-1 scores, i.e., the harmonic mean of precision and recall.
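The temporal split and the F-1 measure described above can be sketched as follows. The helper names are hypothetical, and the sketch assumes binary labels with 1 as the positive class, which is not stated explicitly in the text.

```python
def temporal_split(weekly_batches):
    """Train on the first to the penultimate week and test on the last week."""
    if len(weekly_batches) < 2:
        raise ValueError("need at least two weekly batches")
    train = [item for week in weekly_batches[:-1] for item in week]
    test = list(weekly_batches[-1])
    return train, test


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```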

Table 3 shows the results for depression prediction over time. We see that the F-1 scores achieved with SVM, LR, and RF over the used feature sets are significantly higher than 0.5. We observe that SVM yielded the best performance over LIWC+PLUS+bi-gram+LDA features (0.802), surpassing the baseline (0.629) with a substantial improvement of 0.173. We note that the smallest result achieved with LIWC+PLUS+bi-gram+LDA (0.780) is superior to the best performance of our second-best features, LIWC+bi-gram+LDA (0.718). These results indicate that LIWC+PLUS+bi-gram+LDA can detect signals relevant to depression more effectively than other features. LIWC+bi-gram+LDA features resulted in better results than LIWC features alone (0.629) or the combination of LIWC and LDA (0.654). We note that prediction quality depends heavily on complementary features: the more feature sets a combination includes, the better its results.

\mathrm{KL}(P \parallel Q) = \sum_{i \in [n]} p_{i} \times \log \left( \frac{p_{i}}{q_{i}} \right) \tag{2}

\mathrm{JS}(P \parallel Q) = \frac{1}{2} \mathrm{KL}(P \parallel M) + \frac{1}{2} \mathrm{KL}(Q \parallel M) \tag{3}

Table 3.

Prediction performances over time. Bold font indicates the best result for each feature set.

Feature set	SVM	LR	RF
LIWC	0.629	0.611	0.623
LIWC+LDA	0.652	0.647	0.654
LIWC+bi-gram+LDA	0.706	0.718	0.715
LIWC+PLUS+bi-gram+LDA	0.802	0.800	0.780

Similarity between topics before and during stay-at-home restrictions

To discover overlapping behavioral characteristics of depression-related terms, we experimented with 50 topics on each one-week subset of the data as divided above. Each topic was represented by the top fifteen highest-probability words, out of which we retained solely the top ten depression-related words. We computed topic similarity using measures based on topic word probability distributions⁴⁵ (such as Kullback-Leibler divergence (KL)⁴⁶) and topic word sets⁴⁷ (such as Jaccard similarity (JS)⁴⁸).

Let us look at two discrete probability distributions P = \{p_{i}\}_{i \in [n]} and Q = \{q_{i}\}_{i \in [n]} supported on [n]. KL measures the difference between two probability distributions (Equation (2)). Equation (2) determines how the Q distribution is different from the P distribution. KL is a non-negative, asymmetric distance (i.e., KL(P∥Q) ≠ KL(Q∥P)) which yields zero if the two distributions are identical and can potentially equal infinity.⁴⁹ For JS, we measured the similarity between all possible topic pairs. JS is a symmetrized, smoothed version of KL which measures the total KL divergence from the average mixture distribution, M = (P + Q)/2 (Equation (3)). Some salient features of JS are that it is always defined, bounded and symmetric, and only vanishes when P = Q. When all the top words of a pair of topics are different, JS may result in 0. We found that some topic pairs bear words that include different spellings but are synonyms. To harmonize topic pairs that fall into that situation, we manually replaced synonyms with a single word on either side. We calculated the average JS and KL yielded from different time periods and found that depression-related words were overlapping from one topic to another during the stay-at-home period, and were slightly overlapping before the stay-at-home order (see Table 4).
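Equations (2) and (3) can be transcribed directly, as in the sketch below. It assumes that both distributions are given as equal-length lists of probabilities over the same word indices and uses the natural logarithm; the function names are illustrative.

```python
import math


def kl_divergence(p, q):
    """Equation (2): sum of p_i * log(p_i / q_i) over the shared support."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # a term with p_i = 0 contributes nothing
        if q_i == 0:
            return math.inf  # P puts mass where Q has none
        total += p_i * math.log(p_i / q_i)
    return total


def js_measure(p, q):
    """Equation (3): average KL of P and Q from the mixture M = (P + Q) / 2."""
    m = [(p_i + q_i) / 2 for p_i, q_i in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Because M is positive wherever P or Q is positive, the quantity in Equation (3) stays finite even when the KL value of Equation (2) is infinite.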

Table 4.

Similarity between different depression-related topics addressed by individuals between before and during the stay-at-home period.

Feature set	Similarity	Before	During
LIWC+LDA	JS	0.005	0.327
LIWC+LDA	KL	0.017	0.403
LIWC+bi-gram+LDA	JS	0.022	0.341
LIWC+bi-gram+LDA	KL	0.02	0.335
LIWC+PLUS+bi-gram+LDA	JS	0.025	0.478
LIWC+PLUS+bi-gram+LDA	KL	0.027	0.290

We also report the Spearman correlation (ρ) between the two similarity metrics. We obtain ρ = 0.839 for LIWC+LDA, ρ = 0.873 for LIWC+bi-gram+LDA, and ρ = 0.930 for LIWC+PLUS+bi-gram+LDA during the stay-at-home period; and ρ = 0.011 for LIWC+LDA, ρ = 0.016 for LIWC+bi-gram+LDA, and ρ = 0.02 for LIWC+PLUS+bi-gram+LDA before the stay-at-home order. All correlations are statistically significant (p < 0.001) and greater than 0.820 during the stay-at-home period, whereas none of the correlations is significant before the stay-at-home order (p > 0.05). In Figure 1, we utilize LIWC+PLUS+bi-gram+LDA. It should be recalled that the stay-at-home order was issued on March 12. Consequently, we combine all the data of March to measure the similarity, whereas January and February fall entirely within the period before the stay-at-home order. We obtain a KL of 0.024 and 0.035 in January and February (p > 0.05), respectively; 0.29 and 0.3 in April and May (p < 0.001), respectively; and 0.27 in March (p < 0.05). We get a JS of 0.026 and 0.0249 in January and February (p > 0.05), respectively; 0.48 and 0.5 in April and May (p < 0.001), respectively; and 0.39 in March (p < 0.05).
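The Spearman correlation between two lists of similarity scores can be obtained by ranking each list and computing the Pearson correlation of the ranks. The sketch below follows this standard definition, with tied values receiving their average rank; it is not the analysis code of the study, and the function names are assumptions.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the ranks of xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```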

Figure 1.

Monthly trends of similarity between depression-related topics addressed by individuals. Note that we utilize LIWC+PLUS+bi-gram+LDA.

These results indicate strong and meaningful correlations between depression-indicative topics addressed during the stay-at-home period. The language in these topics appears to be somewhat similar and recurs from one week to another during the stay-at-home period. This suggests that we should pay more attention to this vocabulary when predicting depression at the individual level.

Figure 2 shows the trend of individuals who have participated in depression-related topics. We observe a rise in participants within the second week of March, which marks the onset of lockdown; and we note that the number substantially decreased within the fifth week of May, which corresponds to the date on which COVID-19 lockdown restrictions began slowly being relaxed across the country. We also calculated, for each month, the percentage of all collected individuals who participated in depression-related topics: 6.9%, 7.7%, 28.4%, 36.4%, and 30.1% for January, February, March, April, and May, respectively.

Figure 2.

The number of individuals who have participated in depression-related topics. We make a weekly count of these individuals in the months before and during the stay-at-home order. For instance, the blue bar in Jan (January) is associated with the first week (W1), the red bar with the second week (W2), and so on.

Discussion and Conclusion

The COVID-19 pandemic has upended much of society in unprecedented ways. Due to border closures, lockdowns, social-distancing measures and other restrictions implemented by the government, people often use social media such as Twitter and Facebook for socializing. In this paper, we collected geo-located tweets before and during the first lockdown in Canada in order to detect depression-relevant signals from the content of social media postings.

The primary objective of this research was to measure the similarity between different topics addressed by people to discover overlapping behavioral characteristics of depression-related words. To this end, we computed the language similarity between all possible topic pairs addressed by people. By analyzing the similarity between depression-related topics, we observed that all correlations are statistically significant during stay-at-home restrictions, but not significant before the coronavirus lockdown. Specifically, during the lockdown, the correlations are statistically significant for all the features utilized, and the proposed features achieve the strongest correlation. Our results show that people were engaged more deeply in depression-related topics during lockdown in Canada. Interestingly, our results are consistent with a study conducted to detect community depression dynamics due to COVID-19 in Australia and another study conducted to estimate the prevalence of and risk factors associated with depression symptoms among United States adults before and during the pandemic.^50,51 Our research and the aforementioned studies found a surge of depression-indicative signals from the onset of lockdown (March 2020) to the relaxation of lockdown (May 2020). Our findings highlight the urgent need to develop psychological interventions and preventive strategies that can preserve the mental health of individuals during the COVID-19 lockdown.

The second main aim of the research was to build a predictive model to investigate the evolution of depression-relevant signals during the coronavirus lockdown. Our predictive model provided evidence of pronounced depression patterns in the experimental datasets. Our best classifier achieves F-1 scores as high as 0.8, which is an improvement of 0.173 over the baseline features. The proposed features yield a higher Pearson correlation (r = 0.506) than other alternative feature combinations and the improvement is statistically significant (p < 0.001). Prior work found that Pearson correlations between language use and psychologically-based features rarely exceed a value of r = 0.4,⁴⁴ while our result has surpassed this value by 0.106. Our results provide strong evidence that we can predict depression signals with a correlation exceeding 0.4, a resolution that is likely fine-grained enough for various experimental datasets. While our model achieved good performance, our research is limited by several important factors. We did not take demographic features into account, nor did we predict depression signals at the regional level. In addition, we did not split the experimental datasets into two groups to predict depression based on the Twitter discourse of ordinary users and verified users; doing so could help detect more depressed groups.

In future work, we aim to include socioeconomic and demographic attributes with network and language information to predict depression at the regional level. Furthermore, we would like to investigate affinity relationships between individuals who manifest signs of depression, including their personality types.^52–54

Footnotes

Acknowledgements

The authors would like to thank the anonymous reviewers for their insightful and constructive comments. This work was partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) under Discovery Grant RGPIN-2020-07110 and Discovery Accelerator Supplements Grant RGPAS-2020-00089 to Pr. Wang. Tshimula thanks the Arbour Foundation for awarding him a scholarship.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Jean Marie Tshimula

Notes

References

WHO . Coronavirus disease (COVID-19) pandemic. Geneva, Switzerland: World Health Organization; 2019. https://www.who.int/emergencies/diseases/novel-coronavirus-2019 (accessed on 13 September 2020).

The Government of Canada . Coronavirus disease. Health: Diseases and conditions. Canada: The Government of Canada; 2020. https://www.canada.ca/en/public-health/services/diseases/coronavirus-disease-covid-19.html (accessed on 26 September 2020).

McKibbin

Fernando

. The global macroeconomic impacts of COVID-19: Seven scenarios. CAMA Working Paper No. 19/2020. Rochester, NY: SSRN; 2020. p. 45.

Wang

Pan

Wan

, et al. Immediate psychological responses and associated factors during the initial stage of the 2019 coronavirus disease (COVID-19) epidemic among the general population in China. Int J Environ Res Public Health 2020; 17(5): 1729.

Yang

, et al. Posttraumatic stress symptoms and attitude toward crisis mental health services among clinically stable patients with COVID-19 in China. Psychol Med 2020; 51: 1052–1053.

Brooks

Webster

Smith

, et al. The psychological impact of quarantine and how to reduce it: Rapid review of the evidence. Lancet 2020; 395: 912–920.

Gunnell

Appleby

Arensman

et al. and the COVID-19 Suicide Prevention Research Collaboration. Suicide risk and prevention during the COVID-19 pandemic. Lancet Psychiatry 2020; 7: 468–471.

Wang

Xue

, et al. The impact of COVID-19 epidemic declaration on psychological consequences: A study on active Weibo users. International J Environmental Res Public Health 2020; 17(6): 2032.

Meng

Dai

, et al. Analyze the psychological impact of COVID-19 among the elderly population in China and make corresponding suggestions. Psychiatry Res 2020; 289: 112983.

10.

Guntuku

Schneider

Pelullo

, et al. Studying expressions of loneliness in individuals using twitter: an observational study. BMJ Open 2019; 9: e030355.

11.

De Choudhury

. Role of social media in tackling challenges in mental health. In: Proceedings of the 2nd International Workshop on Socially-Aware Multimedia, Barcelona, Spain, 2013, pp. 49–52.

12.

Mohammad

Elham

Aghamohammadi

, et al. Identifying data elements and key features of a mobile-based self-care application for patients with COVID-19 in Iran. Health Informatics J 2021; 27(4): 1–15.

13.

Kanter

Busch

Weeks

, et al. The nature of clinical depression: Symptoms, syndromes, and behavior analysis. Behav Analyst 2008; 31(1): 1–21.

14.

Mental Health Research Canada . Mental health in crisis: How covid-19 is impacting Canadians. Toronto, ON: Mental Health Research Canada; 2020. https://bit.ly/2UUvyKU (accessed on 15 September 2020).

15.

Guntuku

Yaden

Kern

, et al. Detecting depression and mental illness on social media: an integrative review. Current Opinion in Behavioral Sciences 2017; 18: 43–49.

16.

Guntuku

Buffone

Jaidka

, et al. Understanding and measuring psychological stress using social media. In: Proceedings of the Thirteenth International AAAI Conference on Web and Social Media (ICWSM), Münich, Germany, 2019, pp. 214–225.

17.

Saha

De Choudhury

. Modeling stress with social media around incidents of gun violence on college campuses. Proc ACM Hum-Comput Interact 2017; 1: 1–27.

18.

Thelwal

. TensiStrength: Stress and relaxation magnitude detection for social media texts. Information Processing Management 2017; 53(1): 106–121.

19.

Coppersmith

Dredze

Harman

. Quantifying mental health signals in Twitter. In: Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality; 2014, pp. 51–60.

20.

Coppersmith

Dredze

Harman

. Measuring post traumatic stress disorder in Twitter. In: Proceedings of the 7th International AAAI Conference on Weblogs and Social Media (ICWSM), Atlanta GA, 2014, pp. 23–45.

21.

Veldkamp

De Vries

. Screening for posttraumatic stress disorder using verbal features in self narratives: A text mining approach. Psychiatry Res 2012; 198(3): 441–447.

22.

Cacheda

Fernandez

Novoa

, et al. Early detection of depression: Social network analysis and random forest techniques. J Med Internet Res 2019; 21(6): e12554.

23. Coppersmith, Dredze, Harman, et al. CLPsych 2015 shared task: Depression and PTSD on Twitter. In: Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality; 2015, pp. 31–39.

24. De Choudhury, Counts, Horvitz. Social media as a measurement tool of depression in populations. In: Proceedings of the 5th Annual ACM Web Science Conference, Paris, France, 2013, pp. 47–56.

25. Jamil, Inkpen, Buddhitha, et al. Monitoring tweets for depression to detect at-risk users. In: Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology; 2017, pp. 32–40.

26. Resnik, Armstrong, Claudino, et al. The University of Maryland CLPsych 2015 shared task system. In: Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality; 2015, pp. 54–60.

27. Resnik, Garron, Resnik. Using topic modeling to improve prediction of neuroticism and depression in college students. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, 2013, pp. 1348–1353.

28. Resnik, Armstrong, Claudino, et al. Beyond LDA: Exploring supervised topic modeling for depression-related language in Twitter. In: Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality; 2015, pp. 99–107.

29. Sadeque, Bethard. Measuring the latency of depression detection in social media. In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining; 2018, pp. 495–503.

30. Schwartz, Eichstaedt, Kern, et al. Towards assessing changes in degree of depression through Facebook. In: Proceedings of the ACL Workshop on Computational Linguistics and Clinical Psychology, New Orleans, LA, 2014.

31. Shen, Jia, Nie, et al. Depression detection via harvesting social media: A multimodal dictionary learning solution. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Melbourne, Australia, 2017, pp. 3838–3844.

32. Tsugawa, Kikuchi, Kishino, et al. Recognizing depression from Twitter activity. In: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems - CHI’15, New York, NY, 2015, pp. 3187–3196.

33. Pennebaker, Boyd, Jordan, et al. The development and psychometric properties of LIWC2015. Austin, TX: University of Texas at Austin, 2015.

34. Stark, Shafran, Kaye. Hello, who is calling?: can words reveal the social nature of conversations? In: Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2012, pp. 112–119.

35. Tadesse, Lin, et al. Detection of depression-related posts in Reddit social media forum. IEEE Access 2019; 7: 44883–44893.

36. Zhai, Boyd-Graber, Asadi, et al. Mr. LDA: A flexible large scale topic modeling package using variational inference in MapReduce. In: Proceedings of the 21st International Conference on World Wide Web, New York, NY, 2012, pp. 879–888.

37. Hauthal, Burghardt, Dunkel. Analyzing and visualizing emotional reactions expressed by emojis in location-based social media. ISPRS International J Geo-Information 2019; 8(3): 113.

38. Chung, Pennebaker. The psychological functions of function words. Social Communication 2007: 343–359.

39. Ramos, Eden, Edu. Using TF-IDF to determine word relevance in document queries. Proceedings of the First Instructional Conference on Machine Learning 2003; 242: 133–142.

40. Wilson. The MRC psycholinguistic database: machine readable dictionary. Behavioural Res Methods Instruments Computers 1988; 20: 6–10.

41. Lexicon of psychiatric and mental health terms. 2nd ed. Geneva, Switzerland: World Health Organization; 1994. https://apps.who.int/iris/handle/10665/39342 (accessed on 13 September 2020).

42. Mohammad, Turney. Crowdsourcing a word-emotion association lexicon. Computational Intelligence 2013; 29(3): 436–465.

43. Blei, Ng, Jordan. Latent Dirichlet allocation. J Machine Learning Res 2003; 3: 993–1022.

44. Meyer, Finn, Eyde, et al. Psychological testing and psychological assessment: A review of evidence and issues. American Psychologist 2001; 56(2): 128–165.

45. Aletras, Stevenson. Measuring the similarity between automatically generated topics. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics; 2014, pp. 22–27.

46. Kullback, Leibler. On information and sufficiency. Ann Math Statist 1951; 22(1): 79–86.

47. Mantyla, Claes, Farooq. Measuring LDA topic stability from clusters of replicated runs. In: Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, New York, NY, 2018, pp. 1–4.

48. Jaccard. The distribution of the flora in the alpine zone. New Phytol 1912; 11(2): 37–50.

49. Shlens. Notes on Kullback-Leibler divergence and likelihood theory. arXiv preprint arXiv:1404.2000, 2014.

50. Ettman, Abdalla, Cohen, et al. Prevalence of depression symptoms in US adults before and during the COVID-19 pandemic. JAMA Netw Open 2020; 3(9): e2019686.

51. Zhou, Zogan, Yang, et al. Detecting community depression dynamics due to COVID-19 pandemic in Australia. IEEE Transactions on Computational Social Systems 2021; 8(4): 958–967.

52. Tshimula, Chikhaoui, Wang. HAR-search: A method to discover hidden affinity relationships in online communities. In: Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, New York, NY, 2019, pp. 176–183.

53. Tshimula, Chikhaoui, Wang. A new approach for affinity relationship discovery in online forums. Soc Netw Anal Min 2020; 10: 40.

54. Tshimula, Chikhaoui, Wang. Discovering affinity relationships between personality types. arXiv preprint arXiv:2202.10437, 2022.