
Abstract

A valid mechanism for online suicide detection and intervention across a wider population has not yet been fully established. In light of the increasing suicide rate, we propose an approach that examines temporal patterns of potential suicidal ideation and behavior on Twitter to better understand their risk factors and time-varying features. The approach identifies latent suicide topics and then models suicidal topic–related score time series to quantitatively represent behavior patterns on Twitter. In an evaluation on a collection of suicide-related tweets from 2016, 13 key risk factors were discovered, and temporal patterns of suicidal behavior across the days of the week were identified, highlighting distinct time-varying features related to different risk factors. This study can help public health services and others develop refined prevention strategies and monitor and support high-risk populations at the right moments.

Keywords

behavior, social media, suicide, temporal patterns, time series, Twitter

Introduction

Mental health disorders affect a substantial portion of the population. It is estimated that nearly half of all Americans will experience a mental illness during their lifetime.¹ The economic and social costs associated with mental illness are significant.² Individuals with mental health conditions are more likely to experience suicidality. Suicidality is defined as any suicide-related behavior, including completing or attempting suicide, suicidal ideation or communications.³ Suicide has been a leading cause of death worldwide in recent years. With the advent of open and massive social networking sites, such as Twitter, attention has focused on how these new modes of communication may become a highly interconnected forum for the collective communication of suicidal ideation on a large scale.⁴

Concerns have been raised that social media communication may strongly influence suicidal ideation and cause a contagion effect among people. Suicide has a devastating impact on both families⁵ and communities,⁶ even though many suicide deaths are preventable.⁷ Given the large volume of Twitter data, it is not yet feasible or ethical to directly contact and survey every Twitter user who may be at risk.⁸ Because Twitter is public, it is a potentially valuable source of information about suicide from a wide population, and some studies^9–11 have analyzed communication on Twitter about mental health, particularly suicide. A mechanism for intervening in suicidality at the community level, with valid, reliable and acceptable methods of online detection, has not yet been fully established.¹² Due to the lack of effective detection methods, a significant portion of the US population with mental illness does not get any treatment.¹³ This undeniably results in undesirable social consequences. For instance, negative perceptions of and discrimination toward persons with mental illness are substantial and widespread.¹⁴

Given increasing suicide rates and their large impact on individuals and society, it is important to gain more knowledge to support vulnerable populations and contribute to suicide prevention. Although several risk factors for suicide have already been identified, it is still not easy to predict who is at risk, especially among those who show no sign of suicidality. Time trends in suicide incidence have received broad international attention,^15,16 and it is recommended that mental health care services be planned to be available especially at high-risk moments, such as during the spring and at the beginning of January.¹⁷ However, the data studied for suicide incidence usually come from suicide registers held by statistics institutions. In addition, findings on time trends are limited to seasonal or monthly trends, and seasons or months are too coarse a granularity for effective prevention in an era of sharply increasing suicide rates. Furthermore, some studies that focused on suicide incidence did not reveal detailed suicide factors, which makes them impractical for tailoring suicide prevention to individuals with different risks.

Therefore, a better understanding of the risk factors of suicidality and their time trends is needed to develop more appropriate suicide prevention strategies. Insight into high-risk time frames and suicide factors would contribute to more refined prevention strategies. This article thus aims to provide insights into the temporal patterns with which a large number of people communicate their suicidality on Twitter. To this end, we propose an approach to mine suicidal behavior on Twitter based on quantitative content analysis and time series analysis. The tweets were retrieved through the Twitter streaming API using suicide-related terms as queries. The approach combines semantic analysis with tweet time to extract the temporal patterns of users’ suicidal behavior. Because we measure suicidality through semantic analysis, we used latent topic modeling to reveal latent risk factors of suicide from a large volume of tweets and proposed a quantitative metric, the suicidal topic–related score (ST-Score), to assess the suicide tendency related to each risk factor. We then explored temporal patterns based on the ST-Score using Fourier series analysis, so as to discover suicidal patterns across the days of the week on Twitter. The contributions of this article are summarized as follows:

Identified suicidal risk factors from tweets by content analysis, which can reflect the suicidal risk factors’ trend in time.

Discovered temporal patterns of suicidal behavior from a quantitative measure of suicidality, which reveals that suicidality peaks occur at different time frames for different risk factors.

Proposed an approach for exploring temporal patterns of suicidal behavior on Twitter, which potentially provides means of online detection of a wider population with high risks to better understand their behavior.

Related work

Some individuals communicate their suicidal thoughts and plans to friends and family prior to suicide;¹⁸ however, many do not disclose their intent. In recent years, individuals have broadcast their suicidality on social media sites such as Twitter,⁹ indicating that social media sites have the potential to be used as a suicide prevention tool.¹⁰ Twitter has recognized that individuals express suicidality in their broadcasts. Depression-related chatter on Twitter can offer insight into social networking about mental health.¹⁹ Guntuku et al.²⁰ reviewed recent studies that aimed to predict mental illness using social media and suggested that depression and other mental illnesses are detectable in several online environments, but the generalizability of these studies to broader samples has not been established. However, few studies to date have analyzed communication on Twitter about mental health, particularly suicide.¹¹ Suicide is a serious public health concern and is preventable. Automated detection methods may help to identify at-risk individuals through large-scale passive monitoring of social media. Access to appropriate mental health care, when and where it is needed, is vital for the prevention of suicide.

Christodoulou et al.¹⁶ reviewed the literature on suicide seasonality in articles published between 1979 and 2009 and found that the majority of studies confirm a peak in spring and a secondary peak during autumn. Beauchamp et al.²¹ found day-of-the-week patterns in suicide incidence, showing that the beginning of the week and the spring and fall seasons were associated with higher numbers of suicide attempts. Associations between particular days and holidays and increased attempted and completed suicides have also been studied.¹⁷ Regarding seasonal patterns in suicide incidence, Durkheim et al.²² suggested as early as the late 19th century that suicide incidence shows seasonal variation, and seasonality is now one of the most studied phenomena in suicide research.^21,23,24 However, the results mostly indicated a peak in spring^23,25,26 or autumn.²¹ Furthermore, no studies to date have analyzed seasonality patterns of suicidality on Twitter, which limits the development of more appropriate suicide prevention strategies that reach populations at the right moment.

Materials and methods

Data collection and pre-processing

First, we reviewed many suicide-related tweets. By reading them, we became familiar with expressions of suicidal thoughts or ideation, such as “suicide” and “kill myself,” and we collected the high-frequency terms used to express them. Second, we surveyed suicide-related research papers, which gave us valuable information about suicide expressions. Based on these two steps, we compiled a relatively large list of terms used on Twitter, including suicide-related and self-harm-related keywords. The final list of suicide-related terms was identified iteratively. We used the initial terms list to collect real-time tweets through the Twitter streaming API, manually checked the collected tweets, and updated the terms list by adding, deleting, and modifying terms. We also added a stop words list to remove obviously irrelevant tweets, such as those about suicide attacks. We stopped the process once most of the collected tweets were related to suicide. Generating the list of suicide-related key terms took roughly 2 to 3 weeks. The full list of suicide-related terms and stop words is given in Table 1.

Table 1.

Suicide-related Terms as Queries for Tweets Retrieval and Stop Words for Cleaning.

Suicide-related terms	Stop words
“suicide,” “suicidal,” “suic,” “self-harm,” “self-injury,” “self harm,” “self injury,” “hang myself,” “hung myself,” “kill myself,” “kills myself,” “killed myself,” “take my life,” “takes my life,” “want to die,” “wanted to die,” “wants to die,” “want death,” “wants death,” “wanted death,” “to be dead”	“bomb,” “suicide attack,” “suicide attacks,” “car attack,” “car attacks,” “suicide hotline,” “https://,” “http://,” “_,” “&amp:,” “oh,” “lol”

We collected a data set of 716,899 public tweets from January to November 2016 by searching the Twitter streaming API with the suicide-related terms, which include “suicide,” “want to die,” “to be dead” and so on.

Next, the collected data were cleaned by removing stop words, that is, terms regarded as conveying no significant semantics to the texts or phrases in which they appear and that are consequently discarded. The stop words list also includes special characters (e.g. “_,” “https,” “&amp;”) and meaningless words (e.g. “oh,” “lol”) in the text.
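The retrieval and cleaning steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the term lists are abbreviated subsets of Table 1, and the names `QUERY_TERMS`, `EXCLUDE_PHRASES`, `NOISE_TOKENS`, `is_candidate` and `clean` are hypothetical. The article does not specify whether a stop word causes the whole tweet to be discarded or only the token to be removed, so the sketch treats multi-word phrases as exclusion rules and single tokens as noise.

```python
import re

# Abbreviated, illustrative subsets of Table 1 (not the full lists).
QUERY_TERMS = ["suicide", "kill myself", "want to die"]
EXCLUDE_PHRASES = ["suicide attack", "bomb", "suicide hotline"]  # assumed to drop the tweet
NOISE_TOKENS = {"oh", "lol", "_", "&amp;"}                       # assumed to drop the token
URL_PATTERN = re.compile(r"https?://\S+")


def is_candidate(text):
    """True if the tweet matches a query term and no exclusion phrase."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in EXCLUDE_PHRASES):
        return False
    return any(term in lowered for term in QUERY_TERMS)


def clean(text):
    """Strip URLs and noise tokens, returning lower-cased text."""
    without_urls = URL_PATTERN.sub(" ", text.lower())
    tokens = [tok for tok in without_urls.split() if tok not in NOISE_TOKENS]
    return " ".join(tokens)


raw_tweets = [
    "lol I want to die https://t.co/x",
    "Suicide attack reported in the city",
]
cleaned = [clean(t) for t in raw_tweets if is_candidate(t)]
```

Only the first tweet survives the filter here; the second matches a query term but is excluded by the phrase “suicide attack.”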

Since we focused on content analysis and temporal analysis, a set of main features was extracted and used to model the data sets collected above; these features are defined in Table 2.

Table 2.

Tweets Data Features.

Feature	Description
tweetID	ID of the tweet message posted by user
userID	ID of the user who tweets the message
dateTime	Date and time of each tweet message
tweetText	Content of the tweet message

The large volume of collected tweets made it challenging to extract useful semantic information. Because discussion on Twitter is unconstrained, a large percentage of the tweets do not express suicidality, resulting in data sparseness and diversity. To improve the study of suicidal behavior, we used the convolutional neural network (CNN) model for short text classification proposed by Kim²⁷ to build a binary tweet classifier that selects suicide-related tweets more precisely. We used GloVe Twitter embeddings to initialize the model input. The model was trained on a corpus of 3000 annotated tweets, of which 1985 were annotated as related to suicide. It achieved a precision of 0.78, a recall of 0.88 and an F1 measure of 0.83 on the test corpus. The model was also compared with other machine learning algorithms, including Support Vector Machine, Extra Trees, Random Forest, Logistic Regression, and a bidirectional Long Short-Term Memory model. As shown in Du et al.,²⁸ the CNN model achieved the best performance on the Positive class, the Negative class and overall accuracy.
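As a quick consistency check on the reported classifier metrics, the F1 measure is the harmonic mean of precision and recall. The short sketch below recomputes it from the two reported values; the function name `f1_score` is ours and is not taken from the study’s code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values reported for the CNN classifier on the test corpus.
reported_f1 = f1_score(0.78, 0.88)

# Share of the 3000 annotated tweets that were labeled as suicide-related.
positive_share = 1985 / 3000
```

Rounded to two decimals, `reported_f1` agrees with the stated F1 measure of 0.83, and roughly two thirds of the annotated tweets were labeled as suicide-related.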

The CNN-based classifier assigns each tweet a Positive or Negative label. Positive means the tweet is related to suicide or suicidal ideation of the Twitter user (personal experience or feeling). Negative means the tweet is not related to suicide or suicidal ideation, negates suicide or suicidal ideation, or is otherwise non-positive. We used the trained CNN model to select 191,473 Positive tweets for the subsequent analysis, which significantly reduced the size of the data set.

Identifying suicide-related topics

In this section, we aim to capture the suicide-related semantic themes through topic modeling. Two topic models commonly used for this purpose are (1) Latent Dirichlet Allocation (LDA), a probabilistic generative topic model proposed by Blei et al.,²⁹ and (2) Non-negative Matrix Factorization (NMF), a vector space factorization method for topic modeling.³⁰ Topic modeling is a key tool for discovering latent semantic structure within a variety of document collections.³¹ LDA is a probabilistic model capable of expressing uncertainty about the placement of topics across texts and the assignment of words to topics.³² NMF is a deterministic algorithm that arrives at a single representation of the corpus. For this reason, NMF is often characterized as a machine-learning algorithm. Although LDA has been effectively employed in many text mining fields, it often does not scale to large data sets with millions of documents or tweets.³³

In our research, we identified suicide-related latent topics by NMF topic modeling. First, latent topics from the suicidal tweets data set were inferred. Second, the optimum latent topics’ structure was shown, to shed light on significant semantics of suicide tweets.

Inferring topics by NMF topic model

NMF is a technique for decomposing a non-negative matrix $F \in ℝ_{+}^{m \times n}$ into two matrices $W \in ℝ_{+}^{m \times k}$ and $H \in ℝ_{+}^{k \times n}$ such that $F \approx W H$.³⁴ $F$ is an $m \times n$ term-document matrix; $W$ and $H$ are reduced rank-$k$ factors whose product approximates $F$. This enables a latent subspace representation of much lower dimension, where $W$ contains a set of $k$ topic basis vectors and $H$ provides the coefficients for the additive linear combinations of these basis vectors that generate the corresponding document vectors in $F$. The weights in a topic basis vector of $W$ can be used to generate a topic descriptor consisting of high-ranking terms, while a coefficient vector of $H$ can be interpreted as the $k$ topic membership weights for the corresponding document. Here, we use an algorithm based on alternating least squares with projected gradient descent.³⁵
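To make the factorization concrete, the following is a minimal pure-Python sketch of NMF on a toy term-document matrix. It is not the solver used in this study: for brevity it uses the classical multiplicative update rules rather than alternating least squares with projected gradient descent. The toy matrix, the term labels in the comments, and the helper names `matmul`, `transpose`, `squared_error` and `nmf` are all illustrative assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def squared_error(f, w, h):
    """Squared Frobenius distance between F and the product W H."""
    approx = matmul(w, h)
    return sum((x - y) ** 2 for rf, ra in zip(f, approx) for x, y in zip(rf, ra))


def nmf(f, k, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative m x n matrix F into W (m x k) and H (k x n)."""
    rng = random.Random(seed)
    m, n = len(f), len(f[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T F) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, f), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        # W <- W * (F H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(f, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return w, h


# Toy term-document matrix: rows are terms, columns are tweets.
F = [
    [2.0, 3.0, 0.0, 0.0],  # "school"
    [1.0, 2.0, 0.0, 0.0],  # "semester"
    [0.0, 0.0, 3.0, 2.0],  # "depression"
    [0.0, 0.0, 1.0, 2.0],  # "anxiety"
]
W, H = nmf(F, k=2)
```

Each column of `W` is a topic basis vector over terms, and each column of `H` holds the topic membership weights of one tweet. A production analysis would more plausibly rely on an established library implementation than on this hand-written loop.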

Choosing the optimal latent topics structure

It is critical to determine an appropriate number of topics $k$ to ensure an apposite analysis of the suicidal tweets. If $k$ is too large, the topic structure may be redundant and the topics too trivial. If $k$ is too small, the topic structure may be vague and the topics insignificant. To address this, we used a metric to help choose the optimal latent topic structure.

The metric, proposed by Greene et al.,³⁶ is based on the assumption that a model with an appropriate number of topics is more robust to missing data. Given a value of $k$, the stability of the produced topics is evaluated across several models fitted to sub-samples of the original data. The metric measures stability according to the average weighted Jaccard similarity between the ordered sets of words describing the topics. A value of $k$ with higher stability is therefore a more appropriate choice. We can then evaluate the quality of the topics selected by the metric. Each topic found by NMF is described by the 5 highest-weighted terms in its basis vector in $W$.
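The idea behind the stability metric can be illustrated with a simplified sketch. It assumes, as we understand the approach of Greene et al., that two ranked term lists are compared by averaging the Jaccard similarity of their top-$d$ prefixes, and that topics from two models are matched one-to-one before averaging. The brute-force matching over permutations below is a stand-in that is practical only for small $k$; the function names and the example term lists are hypothetical.

```python
from itertools import permutations


def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def average_jaccard(rank_a, rank_b):
    """Mean Jaccard similarity over the top-d prefixes of two ranked lists."""
    depth = min(len(rank_a), len(rank_b))
    return sum(jaccard(rank_a[:d], rank_b[:d]) for d in range(1, depth + 1)) / depth


def agreement(topics_a, topics_b):
    """Best one-to-one topic matching score (brute force, small k only)."""
    k = len(topics_a)
    best = 0.0
    for perm in permutations(range(k)):
        score = sum(average_jaccard(topics_a[i], topics_b[j])
                    for i, j in enumerate(perm)) / k
        best = max(best, score)
    return best


# Ranked top terms from a reference model and from a model fitted to a sub-sample.
reference = [["school", "work", "week"], ["depression", "pill", "anxiety"]]
subsample = [["depression", "anxiety", "pill"], ["school", "work", "week"]]
score = agreement(reference, subsample)
```

Averaging `agreement` between the reference model and many sub-sample models, for each candidate $k$, would give a stability curve analogous to the one used to select the number of topics.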

Modeling the ST-Score time series

The topics found by NMF topic modeling represent the main themes that Twitter users discuss when expressing suicidal thoughts. Each column vector of $H$ describes one tweet message, indicating the relative weight with which the tweet relates to each of the $k$ topics. Each tweet message $i$ can therefore be represented by the weights $\{w_{i, 1}, \dots, w_{i, j}, \dots, w_{i, k}\}$, $j \in [1, k]$, where $k$ is the total number of topics and $w_{i, j}$ is the weight relating tweet message $i$ to the $j$th topic.

To reflect the user-involved topic strength for a tweet message, we defined ST-Score as

$S_{i, K} = \max_{j} (w_{i, j}), \quad \text{where } K = \underset{j}{\arg\max} (w_{i, j})$

(1)

Next, we partitioned the tweets into $k$ non-overlapping clusters, one per topic. Each tweet message $i$ is assigned to the cluster $K$ that corresponds to $S_{i, K}$. In addition, the dateTime feature of each tweet message $i$ is denoted $D_{i}$. The behavior of the user who posts tweet message $i$ can then be represented by $D_{i}$ and $S_{i, K}$. In this way, $k$ ST-Score time series that describe users’ suicidality behavior are constructed, as formalized in

$C_{K} = (D_{i}, S_{i, K}), \quad K = 1, \dots, k$

(2)
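Equations (1) and (2) translate directly into code. The sketch below assumes each tweet is available as a pair of a timestamp and its topic weight vector (the corresponding column of $H$); the example tweets and the function names `st_score` and `build_series` are hypothetical, and topics are indexed from 0 as is conventional in Python.

```python
from collections import defaultdict
from datetime import datetime


def st_score(weights):
    """Return (K, S): the dominant topic index and its weight, as in equation (1)."""
    top_topic = max(range(len(weights)), key=lambda j: weights[j])
    return top_topic, weights[top_topic]


def build_series(tweets):
    """Group (dateTime, ST-Score) pairs by dominant topic, sorted by time."""
    series = defaultdict(list)
    for date_time, weights in tweets:
        topic, score = st_score(weights)
        series[topic].append((date_time, score))
    return {topic: sorted(points) for topic, points in series.items()}


# Hypothetical tweets: a timestamp plus that tweet's topic weights (k = 3 here).
tweets = [
    (datetime(2016, 3, 7, 9, 15), [0.10, 0.70, 0.05]),
    (datetime(2016, 3, 6, 22, 40), [0.60, 0.20, 0.00]),
    (datetime(2016, 3, 8, 8, 5), [0.02, 0.55, 0.30]),
]
series = build_series(tweets)
```

Because every tweet contributes to exactly one topic, the resulting series are non-overlapping, and their lengths can differ considerably from topic to topic.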

Behavior patterns mining

To obtain a clear description of behavior patterns, the patterns should be as uncorrelated as possible. We therefore evaluate the correlation between each pair of ST-Score time series. When the maximum pairwise correlation is below a given threshold $α$, the series are suitable for behavior pattern mining

$\max (\operatorname{corr} (C_{i}, C_{j})) < α, \quad i, j \in [0, k), \; i \neq j$

(3)
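The check in equation (3) can be sketched as follows. Two assumptions are made here that the article does not state: the Pearson coefficient is used as the correlation measure, and the maximum is taken over absolute values so that strong negative correlation also counts. The series are assumed to be aligned on the same sequence of days, and the threshold `ALPHA` is an illustrative value only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def max_abs_correlation(series):
    """Largest absolute pairwise correlation among distinct series."""
    return max(abs(pearson(series[i], series[j]))
               for i, j in combinations(range(len(series)), 2))


ALPHA = 0.5  # illustrative threshold, not a value taken from the study

# Hypothetical daily ST-Score totals for two topics over the same four days.
daily_scores = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, -1.0, -1.0, 1.0],
]
suitable = max_abs_correlation(daily_scores) < ALPHA
```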

Because ST-Score values vary over time, the ST-Score time series fluctuate and exhibit oscillatory behavior that deserves study. To uncover users’ time-varying behavior patterns, the series should be analyzed with a tool suited to such oscillations. The Fourier transform has been widely used in scientific research, such as signal processing and time series analysis, to quantify underlying signals and repetitive cycles in data, and it is well suited to this study.

We used a Fourier series to model periodic effects.³⁷ Let $U$ be the fundamental period to be analyzed; for example, $U = 7$ for weekly data when time is measured in days. A Fourier series can then fit the fluctuating series

$P (t) = \sum_{q = 1}^{Q} \left( a_{q} \cos \left( \frac{2 π q t}{U} \right) + b_{q} \sin \left( \frac{2 π q t}{U} \right) \right)$

(4)

Fitting the series so that it presents a clear periodic behavior requires estimating the $2 Q$ parameters $ϕ = {[a_{1}, b_{1}, \dots, a_{Q}, b_{Q}]}^{T}$. Equation (4) can be rewritten as

$P (t) = X (t) ϕ$

(5)

where

$X (t) = [\cos (\frac{2 π (1) t}{U}), \dots, \sin (\frac{2 π (Q) t}{U})]$

For example, with weekly seasonality and $Q = 3$

$P (t) = [\cos (\frac{2 π (1) t}{7}), \dots, \sin (\frac{2 π (3) t}{7})] ϕ$

(6)
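The feature vector $X (t)$ and the seasonal value $P (t) = X (t) ϕ$ can be sketched as below. The coefficient values in `phi` are made up for illustration; in the study they are estimated from the data, and that estimation step is not shown here. The function names are ours.

```python
from math import cos, pi, sin


def fourier_features(t, period=7.0, order=3):
    """Row vector X(t) = [cos(2*pi*1*t/U), sin(2*pi*1*t/U), ..., sin(2*pi*Q*t/U)]."""
    row = []
    for q in range(1, order + 1):
        angle = 2.0 * pi * q * t / period
        row.extend([cos(angle), sin(angle)])
    return row


def seasonal_value(t, phi, period=7.0):
    """P(t) = X(t) . phi, with phi ordered as [a_1, b_1, ..., a_Q, b_Q]."""
    order = len(phi) // 2
    features = fourier_features(t, period, order)
    return sum(x * p for x, p in zip(features, phi))


# Hypothetical coefficients for Q = 3 (six parameters).
phi = [0.30, -0.10, 0.05, 0.20, -0.02, 0.01]

# Time is measured in days, so t and t + 7 fall on the same day of the week.
monday_week_1 = seasonal_value(0.0, phi)
monday_week_2 = seasonal_value(7.0, phi)
```

By construction the two values coincide up to floating-point error, which is what makes the fitted curve a weekly pattern rather than a general trend.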

We placed the prior $ϕ \sim \mathrm{Normal} (0, σ^{2})$ on the coefficients, which acts as a smoothing prior on the seasonality in the generative model.

It is valuable to explore user behavior patterns over a weekly period, so we explored temporal patterns across the days of the week. For weekly periodic analysis, we found a Fourier series expansion with $Q = 3$ to work well for most problems. The choice of this parameter could be automated using a model selection procedure such as the Akaike information criterion (AIC).³⁸ The model can be implemented in Stan, a framework for statistical modeling and high-performance statistical computation, through its Python interfaces.³⁹

In this study, the Fourier series expansion with $Q = 3$, that is, its first three terms, fits the problem.

Results

Suicidal topics discovery

Guided by the metric described previously, we varied the number of topics $k$ between 5 and 25. Figure 1 plots the stability value for $k \in [5, 25]$ using NMF topic modeling. The highest stability value was reached at $k = 6$ (stability: 0.9651), and the second highest at $k = 13$ (stability: 0.9470).

Figure 1.

Weighted Jaccard average stability for different number of topics.

Based on the stability values and an inspection of the discovered topics, we found that the topic structure with $k = 6$ is vaguer than that with $k = 13$. We therefore judged that NMF with $k = 13$ gives the most suitable results.

Table 3 lists the most relevant words for each of the 13 topics identified from the tweets with NMF. They reveal that a variety of issues or factors related to suicide are being discussed on Twitter. For example, the tweet “my school actually makes me want to kill myself” is related to the suicidal factor “school,” and the tweet “I have severe depression and I want to kill myself” is related to the suicidal factor “depression.”

Table 3.

Identified topics and labels.

Topic Id	Topic label	Topic proportion	Weighted words	Sample tweets
T0	Life factor	0.169481	bout, penalized, rehab, breakfast, eat	I got penalized in rehab, I was bout to commit suicide. Had coke for breakfast and want to die.
T1	Loss of energy	0.072632	people, sleep, god, tried, stop	I just fell asleep on my break so I’m going to be dead for the remainder of my shift. Sleepy and I want to kill myself.
T2	Caring for	0.034195	day, cares, watch, times, work	I might as well kill myself for all you cares.
T3	Depression	0.017936	depression, fake, notes, pill, anxiety	I have severe depression and I want to kill myself.
T4	Movie-related feeling	0.078009	squad, excited, watch, bad, reviews	I wish I was in the suicide squad this thread make me want die. Suicide squad’s reviews make me want to suicide.
T5	Indifferent or appearance	0.036954	crying, cute, look, god, laughing	All this shit make me want to die crying so hard. Cute girls with sho hair make me want to kill myself.
T6	Fashion	0.006949	belt, Gucci, plane, wearing, lately	I might hang myself with a Gucci belt. I’m wearing a suicidal belt.
T7	Emotion	0.049258	times, people, happy, cute, mad	I mentally killed myself a hundred times. I want to die early so I don’t have to feel the pain of losing people love.
T8	Change	0.006714	cried, lied, times, bad, changed	A lot of shit changed and I’m rocking with it. Nothing has changed, still want to die, I have cried at least once a day.
T9	Work or finance	0.227014	time, work, live, dollar, right	Every time I go to work I want to die a little bit more. Spent 1200 dollars for rent this month and now I want to die.
T10	Fandom	0.012173	twitter, private, public, main, fandom	Cherry denied me from following her private account and now I want to die. There’s one twitter fandom that make me want to die.
T11	Low mood	0.160202	hate, people, everything, bad, everyone	I’m going to kill myself I hate everyone. Everything is constantly overwhelming me and I want to die.
T12	School	0.128484	school, work, week, thought, people	my school actually makes me want to kill myself. This semester ends in like 20 weeks, I want kill myself.

After reviewing the highest-weighted words and the high-probability tweets for each topic, we summarized its meaning and assigned a descriptive label. Since the topics discovered with NMF topic modeling are latent semantic themes of the large tweet corpus, the labels can reflect the risk factors related to users’ suicidality. For instance, Topic T0 covers life factors related to suicidality, and Topic T4 is a special event-based factor reflecting feelings related to the movie “Suicide Squad.” Topics T1, T2, T3, T5, T6, T7, T8, T9, T10, T11 and T12 cover the manifest factors of loss of energy, caring for, depression, indifferent or appearance, fashion, emotion, change, work or finance, fandom, low mood and school, respectively. Table 3 also lists sample tweets to illustrate the risk factors related to users’ suicidality. From the topic proportions in Table 3, we can see that Topics T9, T0 and T11 are the top three factors that emerged from the tweets.

ST-Score time series

According to the topic modeling results, we computed the ST-Score for all tweets in the data set. We then obtained 13 ST-Score time series

$C_{K} = (D_{i}, S_{i, K}), \quad K = 0, \dots, 12, \quad i \in [0, M - 1]$

(7)

where $M$ is the number of tweets in each ST-Score time series. Figure 2 shows $M$ for each time series. Though the time series are imbalanced in size, they show the key risk factors related to suicide.

Figure 2.

Number of each suicidal topic–related score time series.

Behavior patterns discovery

Correlation analysis

To use the discovered topics to highlight temporal behavior patterns, we first evaluated the correlation between each pair of ST-Score time series $(C_{i}, C_{j})$, where $i, j \in [0, k)$. To eliminate the effects of the imbalance in series size, we calculated the total ST-Score of each day for each topic, so that the ST-Score series have the same quantity scale for every topic. To allow a clear comparison of the results, we normalized the ST-Score series to align the values to a normal distribution, after which the ST-Score values lie in $[- 1, 1]$. We then prepared a correlation matrix from the ST-Score series of all topics.
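The daily aggregation step can be sketched as follows. The article does not spell out the normalization procedure, so the min-max rescaling to $[- 1, 1]$ used here is only one plausible stand-in that reproduces the stated value range; it should be read as an assumption, as should the function names and the example points.

```python
from collections import defaultdict
from datetime import date, datetime


def daily_totals(points):
    """Sum the ST-Score of all tweets in one topic for each calendar day."""
    totals = defaultdict(float)
    for date_time, score in points:
        totals[date_time.date()] += score
    return dict(sorted(totals.items()))


def scale_to_unit_range(values):
    """Min-max rescale to [-1, 1] (assumed; the study's exact method is not stated)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [2.0 * (v - low) / (high - low) - 1.0 for v in values]


# Hypothetical (dateTime, ST-Score) points for a single topic.
points = [
    (datetime(2016, 3, 7, 9, 0), 0.5),
    (datetime(2016, 3, 7, 21, 0), 0.25),
    (datetime(2016, 3, 8, 8, 0), 0.25),
]
totals = daily_totals(points)
scaled = scale_to_unit_range(list(totals.values()))
```

Once every topic has one value per day on a common scale, the pairwise correlations can be computed on series of equal length.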

We then plotted the correlation matrix as a heat map. Figure 3 shows the resulting heat map, with a color gradient running from negative to positive correlation.

Figure 3.

Correlational matrix.

Overall, the maximum absolute value of correlation is less than 0.45, so no pair of series is highly correlated. The ST-Score time series are therefore suitable for mining behavior patterns.

Temporal patterns of weekly different day

Figure 4 shows the variation of ST-Score across the days of the week: the y axis is the log value of ST-Score and the x axis is the day of the week. Each topic fluctuates differently, revealing the weekly seasonality of suicidal behavior with respect to different risk factors.
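A simple descriptive counterpart to such a day-of-week curve can be sketched by grouping one topic’s ST-Scores by weekday and taking the logarithm of the mean. This is an illustration only and not the fitted Fourier component; the choice of the mean as the aggregate, the example points, and the function name `weekday_profile` are assumptions. Scores must be positive for the logarithm to be defined.

```python
from collections import defaultdict
from datetime import datetime
from math import log

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def weekday_profile(points):
    """Log of the mean ST-Score for each day of the week present in the data."""
    buckets = defaultdict(list)
    for date_time, score in points:
        buckets[date_time.weekday()].append(score)  # Monday is 0
    return {DAY_NAMES[day]: log(sum(scores) / len(scores))
            for day, scores in sorted(buckets.items())}


# Hypothetical (dateTime, ST-Score) points for a single topic.
points_by_day = [
    (datetime(2016, 3, 7, 9, 0), 0.5),    # a Monday
    (datetime(2016, 3, 14, 9, 0), 1.5),   # the following Monday
    (datetime(2016, 3, 6, 22, 0), 1.0),   # a Sunday
]
profile = weekday_profile(points_by_day)
```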

Figure 4.

Temporal patterns of weekly different day.

Several features stand out. First, higher ST-Scores occur on Saturday and Sunday for T2, T5, T6, T10 and T11. During weekends, people may engage in leisure-related activities with family members or others. For people with mental disorders, what they see and hear may induce suicidality, which results in a higher ST-Score on weekends than on weekdays.

Second, the ST-Score is higher on Monday or Tuesday for T1, T7, T9, T10 and T12. This suggests that people are at high risk on Monday or Tuesday because of the stress of returning to work or school after the weekend.

Third, the ST-Score is higher during weekdays for T3 and T4. This suggests that those who have latent mental disorders are at high risk on weekdays, because they may be lonely or feel stressed by weekday work. As individuals engage more in family visits, visits with friends and other activities on Saturday or Sunday, their mental symptoms may be relieved to some extent.

Though these observations warrant additional investigation, the temporal patterns of suicidal behavior are important for understanding the weekly seasonality of suicidality. The temporal patterns identified in this study highlight fine-grained behavior features related to different risk factors, which is of practical use for refining prevention strategies and for monitoring and supporting a wider population.

Discussion and implications

Social media such as Twitter offer the opportunity to open new frontiers in the behavioral and health sciences, as they provide data where many purpose-designed studies either cannot be launched in time or would be prohibitively difficult to conduct. More generally, mental health researchers often lack data covering longer periods of time with which to systematically assess mental health problems. Because social media data are inherently longitudinal, they could also facilitate the investigation of mental health–related problems. Furthermore, social media data are available in real time, facilitating surveillance and prediction of mental health risk.⁴⁰

Taken together, these points suggest that Twitter research on suicidality could use more advanced content analysis across temporal domains and multiple levels of analysis. In this study, temporal patterns of suicidal behavior on Twitter in 2016 were examined. First, we examined the risk factors of suicide on Twitter. The results indicated that latent suicidal factors can be detected from Twitter by semantic analysis. The identified risk factors not only correlate with ground-truth surveillance and survey data (for example, depression) but also are contemporary and can reflect suicidality trends in a timely way (for example, fashion and fandom).

Second, we examined day-of-the-week patterns of suicidal behavior on Twitter by constructing quantitative ST-Score time series. Suicidal risk peaks occur at different time frames for different suicidal risk factors. The temporal patterns are related to users’ interaction behavior on Twitter. The weekly patterns reflect how people’s everyday life and work relate to suicidality and give fine-grained findings on time-varying suicidality trends. Temporal patterns of this kind offer insight into suicidal behavior and draw attention to vulnerable populations at the right moments.

Third, the proposed approach for exploring temporal patterns of suicidal behavior on Twitter potentially provides a means of online detection for a wider population. It contributes to scientific research on suicidal behavior on social media. It presents a valid, reliable and acceptable method of online suicide detection, which uses a quantitative measure of suicidality to support time trend analysis. It can potentially yield new insights not easily achievable through traditional qualitative methods. More research is needed to better understand the underlying behavior mechanisms in suicide.

The findings help to develop appropriate suicide prevention strategies and can be generalized to a wider population. They create knowledge about high-risk time frames at a fine-grained level. The high-risk time frames indicate when we should pay particular attention to vulnerable populations. We would therefore recommend that mental health care services, communities, friends and family members be available especially at high-risk moments, to give proper support and resources to a broader population. Suicide prevention interventions at the right moments and at multiple levels seem promising. Furthermore, the findings of this study could inform mental health products that improve the treatment of suicide-related mental disorders. For instance, a music player, app or wristband could be designed according to the temporal patterns of suicidal behavior discovered in this article, so as to relieve symptoms and reduce the risk of suicide.

A limitation of this study is data collection and pre-processing. We searched and collected suicide-related tweets according the suicide-related terms in Table 1. Undeniable, some users express their suicidal ideation in other unusual terms or new Internet words. To our knowledge, there is no such full terms list in current published papers. It need more time to make out clear list for the other unusual terms or new Internet words. We look forward to read more related researches and get more information about suicide-related terms in the future. So, we collected the common terms as much as possible in Table 1 and collected data.

The tweets collected by suicide-related terms may include non-suicidal thoughts tweets. So, we cleaned it by CNN-based classifier to get Positive tweets. By using the classifier, the Positive results may contain some tweets which were vague or exaggerated expression. Although some expressions may also show suicide ideation to some extent, it is really a limitation that could not classify a vague expression tweet with full accuracy. So more tweets generated by the user can be considered to make clear whether the vague tweet is Positive or not. The future analysis over a user’s timeline tweets may improve the accuracy for vague expressions recognition. However, in this study, with the good performance of CNN, the tweets like vague expression can be a small proportion in the large data set. So, it will not have obvious influences on the research results.

Another limitation of this study is that the location data collected from Twitter are inaccurate and incomplete, so spatio-temporal analysis was not possible.

Conclusion

Exploring temporal behavior patterns in large-scale data from social media sites offers a potentially valuable channel for new insights into public health concerns. By systematically assessing suicide risk factors and time-varying behavior on Twitter, government and public health institutions may be able to improve suicide prevention and treatment initiatives, such as setting up help lines and making health care services available at the right moments. Future work will exploit more of the available social media information to conduct multiple-level analysis.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the National Natural Science Foundation of China (Grant No. 71501172) and the Zhejiang Provincial Natural Science Foundation of China (Grant No. LY18G020017). This research was partially supported by the National Library of Medicine of the National Institutes of Health under award numbers 2R01LM010681-05 and R01LM011829 and by the Cancer Prevention Research Institute of Texas (CPRIT) Training Grant #RP160015. The authors also acknowledge the scholarship support from the China Scholarship Council.

ORCID iD

Jianhong Luo

References

1. Kessler, Chiu, Demler, et al. Prevalence, severity, and comorbidity of twelve-month DSM-IV disorders in the National Comorbidity Survey Replication (NCS-R). Arch Gen Psychiatry 2005; 62: 617–627.

2. Knapp, McDaid, Parsonage. Mental health promotion and mental illness prevention: the economic case, http://www.pssru.ac.uk/ (2011, accessed 12 April 2018).

3. Institute of Medicine (US) Committee on Pathophysiology and Prevention of Adolescent and Adult Suicide. Reducing suicide: a national imperative. Washington, DC: National Academies Press (US), http://www.ncbi.nlm.nih.gov/books/NBK220939/ (2002, accessed 11 April 2018).

4. Colombo, Burnap, Hodorog, et al. Analysing the connectivity and communication of suicidal users on twitter. Comput Commun 2016; 73(Pt. B): 291–300.

5. Cerel, Jordan, Duberstein PR. The impact of suicide on the family. Crisis 2008; 29: 38–44.

6. Levine. Suicide and its impact on campus. New Dir Stud Serv 2008; 2008(121): 63–76.

7. Bailey, Patel, Avenido, et al. Suicide: current trends. J Natl Med Assoc 2011; 103: 614–617.

8. O’Dea, Wan, Batterham, et al. Detecting suicidality on Twitter. Internet Interv 2015; 2: 183–188.

9. Jashinsky, Burton, Hanson, et al. Tracking suicide risk factors through Twitter in the US. Crisis 2014; 35(1): 51–59.

10. Luxton, June, Fairall. Social media and suicide: a public health perspective. Am J Public Health 2012; 102(Suppl. 2): S195–S200.

11. McClellan, Ali, Mutter, et al. Using social media to monitor mental health discussions – evidence from Twitter. J Am Med Inform Assoc 2017; 24(3): 496–502.

12. Christensen, Batterham, O’Dea. E-health interventions for suicide prevention. Int J Environ Res Public Health 2014; 11(8): 8193–8212.

13. Ali, Teich, Woodward, et al. The implications of the affordable care act for behavioral health services utilization. Adm Policy Ment Health 2016; 43(1): 11–22.

14. McGinty, Goldman, Pescosolido, et al. Portraying mental illness and drug addiction as treatable health conditions: effects of a randomized experiment on stigma and discrimination. Soc Sci Med 2015; 126: 73–85.

15. Sansone LA. The Christmas effect on psychopathology. Innov Clin Neurosci 2011; 8(12): 10–13.

16. Christodoulou, Douzenis, Papadopoulos, et al. Suicide and seasonality. Acta Psychiatr Scand 2012; 125: 127–146.

17. Hofstra, Elfeddali, Bakker, et al. Springtime peaks and Christmas troughs: a National Longitudinal Population-Based Study into suicide incidence time trends in the Netherlands. Front Psychiatry 2018; 9: 45.

18. Wasserman, Tran Thi Thanh, Pham Thi Minh, et al. Suicidal process, suicidal communication and psychosocial situation of young suicide attempters in a rural Vietnamese community. World Psychiatry 2008; 7(1): 47–53.

19. Cavazos-Rehg, Krauss, Sowles, et al. A content analysis of depression-related tweets. Comput Human Behav 2016; 54: 351–357.

20. Guntuku, Yaden, Kern, et al. Detecting depression and mental illness on social media: an integrative review. Curr Opin Behav Sci 2017; 18: 43–49.

21. Beauchamp, Yin. Variation in suicide occurrence by day and during major American holidays. J Emerg Med 2014; 46(6): 776–781.

22. Durkheim, Spaulding, Simpson. Suicide: a study in sociology. Am J Sociol 1951; 57: 100–101.

23. Zonda, Bozsonyi, Veres. Seasonal fluctuation of suicide in Hungary between 1970–2000. Arch Suicide Res 2005; 9(1): 77–85.

24. Preti, Miotto. Seasonality in suicides: the influence of suicide method, gender and age on suicide distribution in Italy. Psychiatry Res 1998; 81(2): 219–231.

25. Postolache, Mortensen, Tonelli, et al. Seasonal spring peaks of suicide in victims with and without prior history of mood disorders. J Affect Disord 2010; 121: 88–93.

26. Preti. The influence of climate on suicidal behaviour in Italy. Psychiatry Res 1998; 78(1–2): 9–19.

27. Kim. Convolutional neural networks for sentence classification, https://arxiv.org/abs/1408.5882 (2014, accessed 10 April 2018).

28. Zhang, Luo, et al. Extracting psychiatric stressors for suicide from social media using deep learning. BMC Med Inform Decis Mak 2018; 18(Suppl. 2): 43.

29. Blei, Jordan MI. Latent Dirichlet Allocation. J Mach Learn Res 2003; 3: 993–1022.

30. Berry, Browne. Email surveillance using non-negative matrix factorization. Comput Math Organ Theory 2005; 11: 249–264.

31. Luo, Pan, Wang, et al. Identifying target audience on enterprise social network. Ind Manage Data Syst 2019; 119: 111–128.

32. Luo, Pan, Zhu XY. Discovery of repost patterns by topic analysis in enterprise social networking. Aslib Journal of Info Mgmt 2017; 69: 158–173.

33. Kuang, Park. Fast rank-2 nonnegative matrix factorization for hierarchical document clustering. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining, Chicago, IL, 11–14 August 2013, pp. 739–747. New York: ACM.

34. Lee, Seung HS. Learning the parts of objects by non-negative matrix factorization. Nature 1999; 401(6755): 788–791.

35. Lin C-J. Projected gradient methods for nonnegative matrix factorization. Neural Comput 2007; 19(10): 2756–2779.

36. Greene, O’Callaghan, Cunningham. How many topics? Stability analysis for topic models. In: Proceedings of the European conference on machine learning and knowledge discovery in databases, volume 8724, Nancy, 15–19 September 2014, pp. 498–513. New York: Springer-Verlag.

37. Harvey, Shephard. Structural time series models. In: Srinivasa Rao ASR, Rao (eds) Handbook of statistics (vol. 11). Amsterdam: Elsevier, 1993, pp. 261–302.

38. Akaike. Information theory and an extension of the maximum likelihood principle. In: Petrov, Csaki (eds) Proceedings of the 2nd international symposium on information theory. Budapest: Akadémiai Kiado, 1973, pp. 267–281.

39. Stan Development Team. PyStan: the Python interface to Stan. http://mc-stan.org

40. Gruebner, Sykora, Lowe, et al. Big data opportunities for social behavioral and mental health research. Soc Sci Med 2017; 189: 167–169.

Exploring temporal suicidal behavior patterns on social media: Insight from Twitter analytics