Abstract
The measurement of investor sentiment in social media remains a challenging and unresolved issue. The lack of transparency in sentiment tracking tools and survey methodologies in financial research complicates the distinction between measurement noise and genuine online sentiment in historical studies. This review aims to provide structured recommendations for improving the reliability and standardization of investor sentiment measurement in social media. The findings contribute to enhancing the reliability, replicability, and comparability of studies on investor sentiment, offering valuable guidance for future research in this domain.
Introduction
Around 2001, sentiment analysis and opinion mining gained recognition as key research areas, laying the foundation for investor sentiment studies on social networks (Das & Chen, 2001; Dave et al., 2003; Dini & Mazzini, 2002; H. Liu et al., 2003; Morinaga et al., 2002; Nasukawa & Yi, 2003). Pang and Lee (2008) attributed this surge in research to three factors: the growing adoption of machine learning in natural language processing, the rapid development of the World Wide Web and social media, and the field’s extensive application potential.
In the era of big data and artificial intelligence, social media sentiment serves as a critical complement to traditional sentiment proxies, such as surveys, market indicators, and search engine results (Antweiler & Frank, 2004; Da et al., 2015; Q. Liu et al., 2022). Indeed, sentiment indicators built from large-scale online data accurately gauge investor sentiment and can predict a variety of socioeconomic phenomena (Bollen, Mao, & Pepe, 2011; Bollen, Mao, & Zeng, 2011). Mining investor sentiment on social media has promising applications and offers a novel solution for understanding and measuring investor sentiment (T. Li et al., 2018).
Sentiment mining on social media is a branch of text mining (Kumar & Ravi, 2016; Onan, 2021). Each piece of social media content represents an individual opinion. Beyond analyzing the sentiment of a single text, it is essential to aggregate samples to reflect public sentiment as a whole (Bollen, Mao, & Pepe, 2011; Bollen, Mao, & Zeng, 2011). Researchers achieve this by summarizing the sentiment polarity or intensity extracted from a large number of texts over a specific time period, thereby capturing the overall sentiment of the public during that period (Gruca et al., 2005; Wu et al., 2014). As shown in Figure 1, sentiment mining from social media texts involves two fundamental steps: sentiment extraction and sentiment aggregation (Pang & Lee, 2008). Sentiment aggregation often relies on specific aggregation formulas (Antweiler & Frank, 2004). Consequently, historical studies have regarded text-based sentiment mining as the most critical component of investor sentiment analysis on social media (Kušen et al., 2017; Q. Li et al., 2018; Loughran & McDonald, 2016).

Figure 1. Methodology for developing an investor sentiment index with social media content.
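To make the aggregation step concrete, the sketch below computes a period-level index from per-message polarity labels using the log-ratio form popularized by Antweiler and Frank (2004); the label names and sample data are hypothetical, and studies vary in the exact formula they adopt.

```python
import math
from collections import Counter

def bullishness_index(labels):
    """Aggregate one period's per-message polarity labels into an index,
    following the log-ratio form popularized by Antweiler and Frank (2004):
    B = ln((1 + n_bullish) / (1 + n_bearish))."""
    n = Counter(labels)
    return math.log((1 + n["bullish"]) / (1 + n["bearish"]))

# Hypothetical classified messages for one trading day.
day = ["bullish", "bullish", "bearish", "neutral", "bullish"]
print(round(bullishness_index(day), 3))  # > 0 signals net bullish sentiment
```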
Building on sentiment extraction from social media texts, researchers can construct investor sentiment measures based on specific sentiment aggregation formulas. Social media-based investor sentiment encapsulates the question, “What are investors thinking?” and holds fascinating application potential. However, due to the lack of standardized metrics or established best practices (Mayr & Weller, 2017), the reliability of sentiment measurement derived from social media remains subject to skepticism (Mao et al., 2011).
Investor sentiment from social media has been used to analyze various stock market dimensions, including stock prices (Q. Liu et al., 2022), returns (Bollen, Mao, & Zeng, 2011; Oliveira et al., 2013), volatility (Sprenger et al., 2014), and market indices (Cheng & Lin, 2013; Zheludev et al., 2014). However, findings have been inconsistent. While some studies suggest that social media sentiment lacks predictive power (Antweiler & Frank, 2004; Tumarkin & Whitelaw, 2001), others highlight its predictive capabilities or significant influence (Q. Liu et al., 2022; Mao et al., 2011). These discrepancies arise from two core issues:
(a)
(b)
Based on the above analysis, we believe that the inconsistency in existing research findings is primarily due to the lack of open and transparent descriptions of sentiment mining methodologies. To address this issue, this study systematically examines key research on sentiment mining from social media texts in financial literature published between 2004 and 2024. Our objectives are as follows:
•
•
•
We hope this study enhances the transparency and standardization of sentiment mining methodologies while establishing a theoretical and practical foundation for financial research based on social media sentiment.
The remainder of this paper is structured as follows: Section 2 outlines our review framework, detailing the core reference screening process and the paper’s organizational logic. Section 3 presents key methods for text sentiment mining, covering supervised learning, unsupervised learning, and validation, along with relevant recommendations. Section 4 discusses additional aspects of financial text mining, including research implications and limitations. Finally, the conclusion summarizes the key findings and contributions of the study.
Review Framework
Core Reference List
The core references for this study were identified through a systematic search on Google Scholar using the primary keywords “social media” and “investor sentiment.” The initial search yielded 500 articles, ranked by relevance and citation metrics. A preliminary screening was conducted to ensure that the selected studies were both impactful and directly aligned with the research focus, resulting in the retention of 300 articles.
Subsequently, we examined the titles and abstracts of these papers, eliminating 121 research studies that did not meet the inclusion criteria. We subjected a total of 179 articles to a comprehensive full-text examination, which led to the selection of 80 studies that met the specified inclusion criteria:
(a) The study constructs sentiment indices using text data from social media or investor forums.
(b) The study investigates the role of investor sentiment in financial markets.
(c) The study provides a detailed description of the dataset, sentiment mining methodology, and sentiment index construction process.
The final reference list added nine additional high-impact studies published in 2023 and 2024 to enhance the timeliness and relevance of the evaluation. Figure 2 visually summarizes the above-described screening process by outlining the systematic flow of article identification, selection, and inclusion. This diagram highlights each stage of the review process, from the initial search to the final compilation of 89 core references, ensuring transparency and replicability in the methodology. The final list of core references derived from this process is provided in Appendix A.

Figure 2. Flow diagram of studies included in the review.
Figure 3 presents the annual distribution and citation statistics of our core reference list. The literature spans from 2004 to 2024, with an average citation count of 460 and a median citation count of 151. Therefore, this is a representative and influential set of important references.

Figure 3. Annual and prevalence distribution of core literature.
Review Process
Text sentiment mining methods can generally be categorized into two main approaches: supervised learning and unsupervised learning (Hastie et al., 2009). The defining characteristic of supervised learning is its reliance on labeled training data, from which a model learns classification rules; after the model's reliability has been validated, these rules are used to extract sentiment information from other texts (Cunningham et al., 2008; Q. Liu et al., 2022). In contrast, unsupervised learning does not depend on manually labeled data, reducing the complexity of data annotation (Oliveira et al., 2016). It is more convenient to use but is generally considered less accurate than supervised learning (Renault, 2017; Wu et al., 2014; Zheludev et al., 2014).
In the core literature we reviewed (see Appendix A), 32.9% of the studies adopted supervised learning methods, 45.1% employed unsupervised learning approaches, and 6.1% combined both methods (see Table 1). Additionally, approximately 15.9% of the studies could not be clearly classified, as they often relied on third-party sentiment extraction results or pre-labeled text (Guégan & Renault, 2021).
Table 1. Categorization of Sentiment Mining Methods in Literature.
In our review, we started with an examination of supervised and unsupervised learning methods, systematically evaluating the key techniques for sentiment mining in social media texts and providing practical recommendations based on their applications. Additionally, the outputs of sentiment mining tasks are typically categorized into two types: semantic orientation (polarity-based) and sentiment intensity (valence-based; Thelwall et al., 2012; Z. Zhang et al., 2012). The former focuses on classification tasks, dividing text sentiment into discrete categories such as positive, negative, or neutral. In contrast, the latter quantifies the strength or degree of sentiment, moving beyond binary or categorical classifications to provide a more nuanced analysis (Z. Zhang et al., 2012). Therefore, as part of our review, we also analyzed the differences between these two output paradigms.
After systematically reviewing methods for extracting financial market sentiment from social media, we focused on the validation of sentiment analysis results. This emphasis arises from the challenge researchers face in determining whether the extracted investor sentiment is sufficiently accurate and applicable to effectively support behavioral finance research without introducing systematic biases. Supervised learning, unsupervised learning, and the importance of validation collectively form the core of our review framework. As illustrated in Figure 4, the shaded areas intuitively depict the logical structure of the core review modules.

Figure 4. Review framework.
Core Methods and Recommendations
Unsupervised Learning
Sentiment in text is closely linked to semantics (Wierzbicka, 1995), making semantic models a central focus in sentiment analysis (Batbaatar et al., 2019; Wierzbicka, 1992). In our survey, over 90% of unsupervised sentiment classification tasks employed lexicon- and rule-based methods (see Appendix A). Pennebaker et al. (2001) described this approach as relying on Linguistic Inquiry and Word Count (LIWC), highlighting its core principles of lexicon use and semantic rules.
Lexicon- and semantic rule-based sentiment analysis methods typically determine the overall sentiment orientation of a document by analyzing the sentiment or semantic direction of individual words within the text (Karalevicius et al., 2018). The fundamental assumption of these methods is that the overall polarity of a text is an aggregation of the sentiment scores of individual words (Karalevicius et al., 2018).
These methods first represent the text as a collection of words, using approaches such as the bag-of-words model (Y. Zhang et al., 2010), word vector models (Bojanowski et al., 2017), or unordered/ordered word stacking representations (X. Zhang et al., 2011). Next, a sentiment lexicon is used to assign a sentiment direction or intensity to each word (Ackert et al., 2016). Finally, the overall sentiment representation of the text is generated by aggregating the sentiment directions or intensities of all the words (Z. Zhang et al., 2012).
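As an illustration of these three steps, the following minimal sketch scores a text with a hypothetical mini-lexicon and a single negation rule; an actual study would substitute a validated domain lexicon (see Table 2) and richer semantic rules.

```python
# Hypothetical mini-lexicon mapping words to sentiment intensities in [-1, 1];
# real studies would substitute a validated domain-specific lexicon.
LEXICON = {"gain": 0.8, "rally": 0.6, "profit": 0.7,
           "loss": -0.8, "crash": -0.9, "fear": -0.6}
NEGATIONS = {"not", "no", "never"}

def lexicon_sentiment(text):
    """Score a text by summing word-level sentiment, with a simple
    negation rule that flips the sign of the following sentiment word."""
    score, negate = 0.0, False
    for tok in text.lower().split():
        if tok in NEGATIONS:
            negate = True
            continue
        if tok in LEXICON:
            score += -LEXICON[tok] if negate else LEXICON[tok]
        negate = False
    return score  # > 0 positive, < 0 negative, 0 neutral

print(lexicon_sentiment("no gain only loss and fear"))  # negative score
```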
Figure 5 illustrates a simplified workflow for lexicon- and semantic rule-based sentiment analysis. It is important to note that different studies may adjust specific steps in the sentiment parsing process according to practical requirements; however, the underlying principles consistently rely on lexicons and semantic rules (Hutto & Gilbert, 2014; Stone et al., 1966). Sentiment parsing represents social media texts as word bags, word vectors, or word collections. Usually, the form of representation aids in the implementation of semantic rules and can even be considered a component of these rules (Pennebaker et al., 2001). Therefore, we can view a text's sentiment as a combinatorial strategy: Sentiment(text) = f(text representation, lexicon, semantic rules).

Figure 5. Simplified diagram of the process of dictionary-based sentiment analysis.
The lexicon and semantic rules used typically determine the performance of textual sentiment parsing.
Lexicon
Sentiment lexicons are an important tool for sentiment mining (Xing et al., 2019). Scholars and linguists have provided many domain-specific sentiment lexicons. Table 2 displays some of the sentiment lexicons relevant to financial research: Panel A lists the relevant literature and links to lexicon sources, and Panel B describes each lexicon. We must emphasize that this table is incomplete; providing a complete one would be both impossible and futile.
Table 2. Description of Some Lexicons.
Although scholars and linguists offer many domain-specific sentiment lexicons, researchers must clear several barriers to establish that a given lexicon is applicable to their study; otherwise, the research may be considered unconvincing. In our literature survey, we found that these barriers span at least three dimensions:
Domain Relevance
Loughran and McDonald (2011) showed that almost three-quarters of the words identified as negative by the widely used Harvard lexicon are words that are not normally considered negative in a financial context. Hutto and Gilbert (2014) highlighted that although lexicons are commonly used for sentiment assessment in the context of social media, they are often applied with little regard for their practical applicability to the field. Renault (2017) argued that domain-specific lexicons are clearly superior to standard public lexicons.
Cultural Applicability
Nofer and Hinz (2015) noted that emotional expressions vary across cultures and languages, limiting the direct applicability of generic sentiment lexicons in cross-cultural studies. For instance, certain Chinese terms carry emotional connotations that a generic English-language lexicon cannot capture.
B. Liu (2020) emphasized that the effectiveness of lexicon-based methods relies heavily on their adaptability to the target language and cultural context. This linguistic and cultural barrier is particularly pronounced in multilingual or cross-regional studies, often undermining the accuracy and credibility of sentiment analysis. To effectively address cultural matching challenges, we recommend that researchers implement a multicultural data calibration process, which includes the following strategies:
•
•
•
By employing these calibration methods, researchers can better address the impact of cultural differences on sentiment analysis. Cultural adaptation not only improves the accuracy of models but also enhances the generalizability and scientific validity of findings in cross-cultural sentiment research.
Timeliness
Y. Sun et al. (2018) emphasized that the rapid evolution of financial markets and the internet has led to the emergence of new financial terms and expressions that older lexicons do not cover.
Moreover, the meanings of words or phrases can shift significantly over time. For example, Jatowt and Duh (2014) demonstrated that the emotional polarity of certain terms may change in response to evolving social or economic contexts. A term like “bear” might be a neutral descriptor during certain periods but could carry a stronger negative sentiment during times of market panic. This phenomenon, known as semantic drift, further complicates the applicability of sentiment lexicons across different time periods.
To address the challenge of timeliness, we propose the following recommendations: First, researchers must recognize the potential impact of outdated lexicons on study outcomes, especially in the rapidly evolving financial landscape. Second, maintaining dynamic corpora and regularly updating sentiment lexicons can improve research reliability and applicability. Finally, researchers should transparently disclose their lexicon update and validation strategies to enhance credibility and provide guidance for future studies.
Table 3 summarizes three key challenges in sentiment lexicon applicability: domain relevance, cultural alignment, and timeliness. Addressing these issues enhances research credibility and interpretability. However, as Xing et al. (2019) noted, constructing a universal sentiment lexicon that fully captures linguistic variations across domains is impractical. Researchers should therefore validate their chosen lexicons and ensure suitability for their specific context. This can be achieved by adopting validated domain-specific lexicons, creating custom ones, or adapting existing lexicons through supplementation and refinement.
Table 3. Key Challenges in Lexicon Relevance for Financial Sentiment Analysis.
For instance, Y. B. Kim et al. (2016) employed the VADER lexicon for sentiment analysis and explicitly stated its parallel applicability to their research domain, which strengthened the persuasiveness of their findings. Additionally, to enhance the reliability of research conclusions, many scholars have opted to redesign sentiment lexicons tailored to their specific studies. The creation of sentiment lexicons has become a significant research topic, developed through two primary approaches: manual and automatic creation.
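For readers unfamiliar with such off-the-shelf lexicon tools, a minimal call to the open-source VADER implementation looks as follows; the example sentence and the common ±0.05 compound-score thresholds are illustrative, and, per the discussion above, domain fit should still be verified before use on financial text.

```python
# pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
scores = analyzer.polarity_scores("The earnings report was surprisingly strong!")
print(scores)  # dict with 'neg', 'neu', 'pos', and a normalized 'compound' score

# A common convention maps the compound score to polarity classes.
label = ("positive" if scores["compound"] >= 0.05
         else "negative" if scores["compound"] <= -0.05
         else "neutral")
print(label)
```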
Manual lexicon creation typically involves linguists or domain experts classifying the sentiment values of each term, making it a high-cost and time-intensive process (Heerschop et al., 2011). Examples of this approach include the MPQA Subjectivity Lexicon (Wilson et al., 2005) and Harvard-IV-4 (X. Li et al., 2014). In contrast, automatic lexicon creation methods require less human labor and can quickly generate larger vocabulary sets, albeit often at the expense of some accuracy (Oliveira et al., 2016). For example, Cheng and Lin (2013) developed four lexicons containing bullish, bearish, degree, and negation terms to analyze sentiment more precisely. Similarly, Homburg et al. (2015) constructed positive and negative word lists based on annotated text.
In conclusion, we propose the following principle for sentiment extraction: (R1) Researchers should verify, and disclose evidence, that the chosen lexicon is applicable to their research domain, language and culture, and time period.
Measurement Rules
In lexicon- and semantic rule-based sentiment analysis, researchers start with a lexicon, a list of words with known emotional meanings, and then use algorithms to infer the emotional meaning of a text from how these words appear in it (Neviarouskaya et al., 2007; Taboada et al., 2011). However, due to the lack of universal and effective parsing standards, the sentiment parsing rules adopted by different researchers often vary significantly (B. Liu & Zhang, 2012; Loughran & McDonald, 2016).
Y. Zhang et al. (2012) used SentiWordNet to encode the sentiment of each word in a sentence. In this lexicon, an English word is assigned two scores, a positive score (p_score) and a negative score (n_score), each ranging from 0 to 1.
Based on the above work, Y. Zhang et al. (2012) obtained the sentiment probability distribution of a text by calculating the proportion of positive, negative, and neutral words among all words in it. The sentiment with the highest probability is considered to be the sentiment polarity of the text.
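A simplified sketch of this proportion-based rule, using NLTK's SentiWordNet interface, is shown below; averaging scores across a word's synsets is our simplification and does not reproduce Y. Zhang et al. (2012)'s exact disambiguation steps.

```python
# pip install nltk ; then run nltk.download('sentiwordnet') and nltk.download('wordnet')
from nltk.corpus import sentiwordnet as swn

def word_polarity(word):
    """Classify a word as positive/negative/neutral by averaging the
    SentiWordNet scores of its synsets (a deliberate simplification)."""
    synsets = list(swn.senti_synsets(word))
    if not synsets:
        return "neutral"
    pos = sum(s.pos_score() for s in synsets) / len(synsets)
    neg = sum(s.neg_score() for s in synsets) / len(synsets)
    return "positive" if pos > neg else "negative" if neg > pos else "neutral"

def text_polarity(tokens):
    """Text polarity = the class with the largest share of words,
    mirroring the proportion-based rule described above."""
    labels = [word_polarity(t) for t in tokens]
    return max(set(labels), key=labels.count)

print(text_polarity(["terrible", "loss", "but", "small", "gain"]))
```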
Tetlock et al. (2008) used the Harvard-IV-4 psychosocial dictionary to quantify negative sentiment in texts. They first represented the text for each period as word counts and measured negative sentiment as the proportion of negative words. Finally, they standardized the negative sentiment for each time period against its historical mean and standard deviation, making the measure comparable across the sample.
Y. Sun et al. (2018) used the bullish word list (GL-Bull) and bearish word list (GL-Bear) from the GubaLex dictionary to divide the words in a text into bullish and bearish groups, from which a text-level sentiment measure was computed.
Leitch and Sherif (2017) used the Loughran-McDonald finance dictionary to count the positive and negative words in all tweets for each company. The company's sentiment index was then constructed by scaling these counts by the "Size of Positive Dictionary" and "Size of Negative Dictionary," that is, the lengths of the positive and negative word lists in the dictionary.
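Since the published formula is not reproduced here, the following sketch shows one plausible reading of this dictionary-scaled construction, in which word counts are normalized by the lengths of the corresponding word lists; the function name, arguments, and the illustrative list lengths are our own.

```python
def scaled_sentiment(n_pos, n_neg, pos_dict_size, neg_dict_size):
    """Hypothetical reconstruction: positive and negative word counts are
    normalized by dictionary lengths, offsetting the imbalance between the
    (much shorter) positive list and the negative list."""
    return n_pos / pos_dict_size - n_neg / neg_dict_size

# Example: 12 positive and 30 negative dictionary hits in one company's tweets;
# list sizes are illustrative, not the actual Loughran-McDonald counts.
print(scaled_sentiment(12, 30, pos_dict_size=350, neg_dict_size=2350))
```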
We also found that some studies used extremely simple semantic rules, which led us to question the results of their sentiment mining. For example, See-To and Yang (2017) simply treated tweets containing the word "bullish" as expressing bullish sentiment and tweets containing the word "bearish" as expressing bearish sentiment. However, many bullish tweets do not contain the word "bullish" at all, while tweets combining a negation word with "bullish" can express bearish sentiment.
In lexicon-based sentiment analysis, the rules of analysis used in various studies are very different. W. Zhang and Skiena (2010) showed that, in the absence of any agreed gold standard for entity-level sentiment analysis, intuitive performance evaluation is not possible. However, clear disclosure of the parsing rules is still important because it enables other academics to replicate the relevant results, and clear and logical parsing rules can increase the article’s persuasiveness.
In summary, we offer our second recommendation: (R2) Researchers should fully and transparently disclose their sentiment parsing rules so that other scholars can assess and replicate the results.
Other Unsupervised Methods
In addition to lexicon-based semantic framework models, some scholars have used other semantic frameworks for sentiment parsing, such as Latent Dirichlet Allocation (LDA) and Social Network Analysis (SNA).
The LDA topic model performs a dimensionality-reduced representation of text in semantic space, modeling the text through vocabulary likelihoods, which alleviates the problem of data sparsity to some extent (Yu & Qiu, 2019; Y. Zhang et al., 2015). LDA is a three-level hierarchical Bayesian model in which each collection item is represented as a finite mixture over an underlying set of topics; each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities (Blei et al., 2003). LDA can divide large amounts of text into a small number of topics using semantic weights, functioning as an unsupervised method.
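As a brief illustration, the scikit-learn sketch below fits a two-topic LDA to toy documents and inspects the per-document topic mixtures and top topic words (all data hypothetical):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["stocks rally on strong earnings", "market crash fears grow",
        "dividends and earnings beat forecasts", "panic selling hits banks"]

# LDA operates on raw term counts (a bag-of-words matrix).
vec = CountVectorizer()
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # per-document topic mixtures
print(doc_topics.round(2))

# Top words per topic hint at what each latent theme captures.
terms = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    print(f"topic {k}:", [terms[i] for i in weights.argsort()[-3:]])
```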
However, traditional LDA models cannot assign sentiment. To address this limitation, Nguyen and Shirai (2015) introduced a novel topic model called Topic Sentiment Latent Dirichlet Allocation (TSLDA), which estimates a separate opinion word distribution for each sentiment category within each topic. TSLDA extends the Latent Dirichlet Allocation model (Blei et al., 2003) by dividing the words in a document into three categories: background words, topic words, and opinion (sentiment) words.
Gloor and Zhao (2006) created Condor, a tool for complex semantic social network analysis. Condor obtains a web of information about a stock through multiple recursive searches of the web for that stock (Gloor et al., 2009). Based on this mesh of information, Condor creates lists of positively and negatively charged words and word pairs using a bag-of-words representation (T. Li et al., 2010). Semantic rules are then used to obtain the thematic sentiment status of a particular stock in the social network on a particular day.
It is straightforward to see that neither Latent Dirichlet Allocation (LDA) nor Social Network Analysis (SNA) moves away from semantic rules; both ultimately construct sentiment indices from the sentiment of words. The effectiveness of LDA and SNA algorithms is, however, even harder to assess than that of lexicon- and semantic-rule-based approaches to social media sentiment extraction.
Supervised Learning
While the introduction of lexicons makes lexicon- and semantic rule-based sentiment parsing methods more interpretable (Oliveira et al., 2016), it also exposes the analysis results to noise from lexicon and keyword selection (Giannini et al., 2018). Considerable evidence suggests that supervised learning algorithms tend to be more accurate than unsupervised ones:
• F. Li (2010) and Huang et al. (2014) showed that Naive Bayes methods outperformed dictionary and word count methods in predicting text sentiment.
• Wu et al. (2014) showed a classification accuracy of 81.82% for supervised learning methods, which was higher than the 75.58% classification accuracy for semantic methods.
• Zheludev et al. (2014) reported that supervised SentiStrength tends to be more accurate than unsupervised SentiStrength.
• Comparing multiple lexicon-based and supervised methods of sentiment extraction, Thavareesan and Mahesan (2019) obtained the highest accuracy, 79%, with supervised machine learning.
• Audrino et al. (2020) showed that the predictive power of sentiment variables decreases when lexicon- and rule-based methods are used.
The availability of annotated training data is the defining feature of supervised learning (Hinz et al., 2011). The term refers to a supervisor who instructs the learning system on which labels to associate with the training data (Cunningham et al., 2008). In classification problems, these labels are class labels. Supervised learning techniques create models from this training data, which may then be used to categorize additional unlabeled data (Hastie et al., 2009). Figure 6 illustrates a brief flow of supervised learning, where obtaining usable, manually annotated text is the first step in the process.

Figure 6. Diagram of supervised learning.
Manual-Based Text Sentiment Annotation
Oliveira et al. (2016) stated that the task of text annotation is arduous. Saif et al. (2013) and Engelson and Dagan (1996) both emphasized that collecting high-quality, manually annotated text is quite expensive. The manual sentiment annotation of texts is an extremely subjective task, which brings the reliability of the annotation results into question (Ipeirotis et al., 2010). For example, Rechenthin et al. (2013) employed three staff members to annotate social media documents with the sentiment "bullish" (positive outlook), "neutral," "bearish" (negative outlook), or "off-topic." A sample of the results showed that approximately 70% of the posts were classified as off-topic. Clearly, accurate and reliable sentiment annotation is not easy.
Hutto and Gilbert (2014) used four quality control measures to ensure the quality of sentiment annotations:
• First, each rater was pre-screened for English reading comprehension.
• Each pre-screened rater had to complete an online sentiment scoring training and orientation course and score 90% or higher in matching the mean sentiment score of known (pre-validated) vocabulary items.
• For each batch of 25 items, 5 were pre-annotated with known ratings. If a rater's annotations were incorrect on three or more of these five items, the entire batch of 25 was discarded.
• Incentives and rewards were given for high-quality work.
Due to the subjective nature of text annotation, this task is usually carried out by several people. The reliability of the annotation is then improved by cross-checking the results with each other:
• In the study by Hutto and Gilbert (2014), two human experts were engaged to scrutinize all 800 tweets separately and independently rate their emotional intensity on a scale from −4 to +4.
• To improve the accuracy of manual annotation, T. Li et al. (2018) and Q. Li et al. (2018) required each tweet message to be manually classified by at least five different trusted contributors. The average agreement between the five contributors on the classification of individual messages was 73%.
• In the study by Yang et al. (2016), each text was classified separately by three people; if the three classifications did not agree, the text was labeled as neutral or noisy.
• In Y. Li et al. (2020), each message was annotated by at least two analysts, and the annotation results were then evaluated and compared. If a text received conflicting labels, a third analyst labeled it again, and the majority view was used as the final label for that message.
• In order to generate accurately labeled texts, Rechenthin et al. (2013) asked each worker to provide an explanation for the labeling results, and they discovered that this enhanced annotation accuracy. Three workers individually annotated each piece of social media text. Three workers agreed on the annotation results in 39.12% of the annotated data, and at least two persons agreed on the annotation results in 84.87% of them; texts that did not agree were eliminated.
The availability of manually annotated text data is a typical feature of sentiment classification based on supervised learning (Ipeirotis et al., 2010). If the quality of manually annotated text cannot be guaranteed, the results of text sentiment mining are bound to be unreliable. However, in our survey, we found that some studies disclose the manual annotation process only superficially (Giannini et al., 2018; Wu et al., 2014), which may reduce the credibility of the study.
In summary, we offer our third recommendation: (R3) Researchers should ensure the reliability of manual sentiment annotations and disclose the annotation process in detail, including annotator screening, cross-checking, and agreement statistics.
Models and Algorithms
Machine learning models acquire the ability to predict the label y from the feature vector x by learning from annotated training examples.
Word Vector Construction
The process of constructing word vectors from raw text comprises text pre-processing and text representation (Wu et al., 2014). During preprocessing, Q. Liu et al. (2022) removed numbers, punctuation marks, extra spaces, and unintelligible special characters such as (, [, {, and &; they then replaced web links with the string "URL" and usernames with the string "USERNAME." Finally, they encoded the text using Word2vec, a two-layer neural network that transforms input text into a set of output vectors (Church, 2017), which supplies additional semantic features for text classification.
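A condensed sketch of this preprocessing-plus-embedding pipeline, using gensim's Word2vec implementation on two hypothetical messages, is shown below; it mirrors the spirit of the cleaning steps described above rather than Q. Liu et al. (2022)'s exact code.

```python
import re
from gensim.models import Word2Vec

def preprocess(text):
    """Mask URLs and usernames, strip special characters, lowercase, tokenize."""
    text = re.sub(r"https?://\S+", "URL", text)
    text = re.sub(r"@\w+", "USERNAME", text)
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    return text.lower().split()

corpus = [preprocess(t) for t in
          ["Buy $AAPL now! https://example.com",
           "@trader selling everything, market fear"]]

# Word2vec: a shallow two-layer network mapping words to dense vectors.
model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, seed=0)
vector = model.wv["market"]  # 50-dimensional embedding for "market"
print(vector.shape)
```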
Word vectors can also be constructed using N-grams, contiguous sequences of n words or characters drawn from the text.
J. Li et al. (2017), for example, defined a document as a set of such N-gram features.
“Bag of words” is another commonly used method for building word vectors. The “bag of words” treats the list of words that make up a text as a sequence of features (Wallach, 2006). In conjunction with the word vector construction method, scholars often use the TF-IDF algorithm to weight lexical feature vectors (Roelleke & Wang, 2008):
The TF (term frequency) is the number of times a word appears within a text, whereas the IDF (inverse document frequency) is the logarithm of the total number of texts divided by the number of texts containing the word (Bafna et al., 2016; Trstenjak et al., 2014).
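The following scikit-learn sketch builds such TF-IDF-weighted feature vectors from toy documents; note that scikit-learn applies a smoothed variant of the IDF formula above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["bullish on tech stocks", "bearish outlook for tech",
        "stocks rally while bears retreat"]

# TF-IDF: term frequency weighted by (a smoothed) log(N / document frequency),
# so words common to every document contribute little to the feature vector.
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)          # sparse (n_docs x n_terms) matrix
print(tfidf.get_feature_names_out())
print(X.toarray().round(2))
```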
Some studies have compared various word vector construction methods in the hope of achieving higher classification performance; for example, Rechenthin et al. (2013) compared prediction performance across a "bag of words" feature set, a meta-data feature set, and their combination. In general, however, we cannot estimate differences in sentiment classification performance from the word vector construction methods alone.
Machine Learning Algorithms
Having constructed the word vectors, the researcher obtains a dataset of feature-label pairs, D = {(x_i, y_i)}, that can be used to train a supervised learning model.
Table 4 provides statistics on the main machine learning algorithms used in the supervised learning-based investor sentiment work we surveyed. It is easy to see that finance researchers mainly use traditional machine learning algorithms such as Naive Bayes, Support Vector Machines, and Maximum Entropy, rather than more recent deep learning architectures.
Table 4. Statistics of Machine Learning Algorithms for Social Media Investor Sentiment Mining.
The Naive Bayes algorithm treats each word in a given text message as a "feature," and the message is represented as a feature vector (S.-H. Kim & Kim, 2014). Each text message has a "label," that is, a sentiment polarity ("buy" or "sell") or a sentiment temperature (Huang et al., 2014). The Naive Bayes rule is then used to compute the probability of a label given the features, P(label | features).
The Naive Bayes classification method assumes that, conditional on the label, feature occurrences are independent of one another. If there are m features f_1, ..., f_m, the likelihood of a label conditional on the features can therefore be stated as: P(label | f_1, ..., f_m) ∝ P(label) × P(f_1 | label) × ... × P(f_m | label).
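A minimal scikit-learn pipeline applying this factorized rule to word-count features is sketched below; the four labeled messages are hypothetical stand-ins for a real annotated corpus.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labeled messages; real studies use thousands of annotations.
texts = ["going to the moon, buy now", "sell everything, crash coming",
         "strong earnings, buy", "bankruptcy fears, sell"]
labels = ["buy", "sell", "buy", "sell"]

# MultinomialNB applies the factorized likelihood above to word counts.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)
print(clf.predict(["earnings look strong"]))           # expected: ['buy']
print(clf.predict_proba(["crash and sell"]).round(2))  # P(label | features)
```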
The general idea behind maximum entropy (ME) classification is that when nothing is known about the distribution, it should be uniform, that is, have maximum entropy. Consider the example of trying to classify documents as positive, negative, or neutral, where we are only told that 50% of documents containing the word "buy" are considered positive. Intuition tells us that if a document has the word "buy" in it, then it has a 50% chance of being a positive post, a 25% chance of being negative, and a 25% chance of being neutral. If we do not have the word "buy" in our document, then we will assume an even 33% probability of belonging to each category.
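In practice, a maximum entropy classifier is equivalent to multinomial logistic regression, so a sketch analogous to the Naive Bayes pipeline above only swaps the classifier (again with hypothetical data):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["buy the dip", "buy and hold", "sell now", "sell on rallies"]
labels = ["positive", "positive", "negative", "negative"]

# Multinomial logistic regression realizes the maximum entropy principle:
# among all distributions consistent with the feature constraints, it
# selects the one with the greatest entropy.
me_clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
me_clf.fit(texts, labels)
# Probabilities are ordered by me_clf.classes_; "buy" should lean positive.
print(me_clf.classes_, me_clf.predict_proba(["thinking to buy"]).round(2))
```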
To extract more reliable sentiment from social media texts, many researchers use a combination of machine learning algorithms for sentiment mining (Giannini et al., 2018; Rechenthin et al., 2013). For example, Y. Zhang et al. (2012) used eight classifiers in their study to evaluate the performance of different text sentiment classification algorithms, including the Probabilistic Indexing Model (Maron & Kuhns, 1960), Expectation Maximization (Dempster et al., 1977), Kullback–Leibler divergence (Kullback & Leibler, 1951), and others.
The construction of word vectors and machine learning algorithms goes hand in hand. Furthermore, Kraaijeveld and De Smedt (2020) showed that different data pre-processing methods have a large impact on the performance of social media sentiment mining. Although we cannot intuitively estimate from the sentiment mining processes described by the researchers (data preprocessing, text representation, and machine learning algorithms) whether the performance of sentiment mining meets the need for research precision, the necessary description of these processes is still very relevant. This allows us to check whether the authors’ work conforms to the basic logic of natural language processing (Nadkarni et al., 2011; Ratnaparkhi, 1997) and makes it possible to reproduce the results of the study.
Sample Size
Q. Liu and Son (2024) conducted a statistical analysis of investor sentiment studies based on social media text and found that the median data size used by researchers was 1,109,500 entries, with an average of 23,643,300 entries. This highlights that sentiment extraction from social media is typically a classic big data application scenario.
Training sample size plays a crucial role in supervised sentiment analysis. Renault (2020) observed that as the dataset size increased from 250,000 to 500,000 messages, the prediction accuracy stabilized; further increasing the size to 1,000,000 messages improved accuracy by only 0.31 percentage points. This indicates that while larger datasets may increase research workload and complexity, existing studies have not reported adverse effects of larger data sizes on prediction outcomes.
However, our investigation revealed cases where the training sample size was evidently insufficient. For example, Sprenger et al. (2014) conducted sentiment classification on 45 million social media messages but used only 2,500 training samples—just 0.01% of the total dataset. In such cases, if researchers fail to demonstrate the applicability of their models under small sample conditions, the classification performance becomes unconvincing.
In the absence of unified guidelines for sample size, we recommend researchers critically consider the following questions (a learning-curve check, sketched after the list, can help answer both):
• Is the chosen training sample size adequate to support the complexity of the predictive task?
• Do the training samples sufficiently represent the relationships between features and sentiment labels within the overall dataset?
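One practical way to probe both questions is a learning curve: train on increasing fractions of the annotated data and observe where held-out accuracy plateaus, echoing Renault's (2020) observation of diminishing returns. A self-contained sketch with synthetic stand-in data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import MultinomialNB

# Synthetic stand-in for a document-term matrix with sentiment labels.
X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
X = np.abs(X)  # MultinomialNB requires non-negative counts/frequencies

sizes, train_scores, val_scores = learning_curve(
    MultinomialNB(), X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5)
for n, acc in zip(sizes, val_scores.mean(axis=1)):
    print(f"n={n:5d}  cv accuracy={acc:.3f}")  # watch where accuracy plateaus
```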
In summary, we offer our fourth recommendation: (R4) Researchers should adopt scientific and standardized experimental setups, disclosing their preprocessing, text representation, algorithms, and training sample sizes, and ensuring the training data are adequate and representative.
Verification
We use supervised or unsupervised learning to parse investor sentiment from social media texts. Following the basic recommendations of R1, R2, R3, and R4 enhances the credibility of the study and makes it possible to replicate its results. However, we are not sure whether the extracted investor sentiment can be used in behavioral finance research without incurring additional systematic errors. As shown in Figure 7, we need to perform an acceptance exercise to demonstrate that we have assigned sufficiently accurate sentiment polarity, or sentiment temperature, to the texts in social media (Giannini et al., 2019; Q. Liu et al., 2022; Xiong et al., 2017).

Figure 7. The role of validation.
Verification of Supervised Learning
In supervised learning, we enable the annotation of unannotated text with sentiment by training a machine learning model on manually annotated social media texts. These annotated texts act as "mentors." The algorithmic part encompasses the implementation process of sentiment analysis, including pre-processing, vector construction, and model selection. Therefore, we can describe supervised learning-based sentiment extraction from social media texts as a function of the text, the sentiment annotation, and the algorithm: Sentiment = f(Text, Annotation, Algorithm).
A scientifically rigorous text annotation and algorithm design procedure can elevate our expectations of supervised learning outcomes and bolster the study’s credibility; however, it does not establish its validity (Hastie et al., 2009; Nasteski, 2017). Sentiment extraction by supervised learning requires a calibration session to validate its efficacy (W. Li et al., 2019; Y. Zhang et al., 2019).
Similar to unsupervised learning, the benchmark for validating the sentiment analysis capability of supervised learning models also relies on manually labeled sentiment texts (ChandraKala & Sindhu, 2012; H. Zhang et al., 2014). However, compared to unsupervised learning, validation in supervised learning is generally more straightforward, as the same labeled data used for “supervision” can also serve as the basis for evaluation.
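Concretely, a study can hold out part of its annotated corpus purely for this acceptance test. The sketch below, on a toy corpus, reports the confusion matrix and per-class metrics of the kind that studies such as Antweiler and Frank (2004) summarize.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy annotated corpus (hypothetical); real studies hold out part of their
# manually labeled data purely for this validation step.
texts = ["buy now", "strong buy", "sell fast", "sell everything",
         "earnings beat, buy", "crash coming, sell"] * 10
labels = ["bullish", "bullish", "bearish", "bearish", "bullish", "bearish"] * 10

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0)

clf = make_pipeline(CountVectorizer(), MultinomialNB()).fit(X_train, y_train)
pred = clf.predict(X_test)

# Off-diagonal counts expose systematic bullish/bearish confusion.
print(confusion_matrix(y_test, pred, labels=["bullish", "bearish"]))
print(classification_report(y_test, pred))
```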
In our review of studies on investor sentiment derived from social media, 63.33% of sentiment analyses based on supervised learning disclosed performance metrics for sentiment extraction (detailed data available in Appendix A). This proportion is significantly higher than the 23.07% observed in studies using unsupervised learning. One possible reason for this disparity is that many studies adopt unsupervised learning sentiment analysis methods specifically to avoid the labor-intensive process of manually labeling investor sentiment data (Oliveira et al., 2016).
Antweiler and Frank (2004) investigated the correlation between the textual sentiment of communities and stock prices. They verified the classification performance of their Naive Bayes algorithm using 1,000 hand-labeled observations, showing that it rarely misclassified "bullish" messages as "bearish" or vice versa, which supports the credibility of their constructed sentiment index.
Q. Liu et al. (2022) employed a deep learning model to mine social media text sentiment in their investigation of the synergy between investor sentiment and stock prices in social media. The combined accuracy of their network model in classifying the sentiment of unlabeled investor messages was 89.14%, higher than the 88.1% of Antweiler and Frank (2004) and the 85.4% of Xiong et al. (2017), indicating high confidence and no evident systematic error.
Go et al. (2009) achieved over 80% accuracy in classifying tweet sentiment with emojis using various algorithms, such as Naive Bayes, Maximum Entropy, and SVM. Rechenthin et al. (2013) and Y. Zhang et al. (2012) used a variety of machine learning algorithms for sentiment analysis. They compared and analyzed the performance of these machine learning algorithms for sentiment classification.
The supervised learning-based behavioral finance studies marked with a "1" in the "Results Checked" column of Appendix A all disclosed sentiment analysis performance to support the study. Following recommendations R3 and R4 boosts a study's credibility but does not, by itself, demonstrate its reliability. Validating the sentiment analysis results is the fundamental way to demonstrate the reliability of measured investor sentiment.
Verification of Unsupervised Learning
Lexicon- and semantic rule-based methods dominate sentiment parsing approaches based on unsupervised learning. These methods regard text sentiment as a function of the text, the lexicon, and the semantic rules: Sentiment = f(Text, Lexicon, Rules).
However, we cannot intuitively assess the reliability of sentiment parsing from the lexicon features and parsing rules described by the researchers (Pang & Lee, 2008), especially when the lexicon or the parsing rules themselves are questionable. For example, Loughran and McDonald (2011) showed that almost three-quarters of the words identified as negative by the widely used Harvard-IV-4 dictionary are not normally considered negative in a financial context, yet a large body of research still uses the Harvard-IV-4 dictionary to parse the sentiment of financial texts (Tetlock et al., 2008). Similarly, Karalevicius et al. (2018) took into account only the first occurrence of dictionary words when parsing texts, whereas other research indicates that repeated occurrences of sentiment words matter for sentiment intensity (Chen et al., 2014; Loughran & McDonald, 2011).
To address concerns about parsing results, it is necessary to calibrate the results of parsing (Geva & Zahavi, 2014). However, our survey revealed that only 17.6% of lexicon and semantic rule-based studies performed this calibration (see Appendix B for detailed data). In the absence of calibration criteria, manual-based sentiment classification becomes an accepted and reliable standard of comparison (Homburg et al., 2015).
Cheng and Lin (2013) verified the accuracy of post sentiment classification using 140 manually labeled "bullish" posts and 140 "bearish" posts. These posts were independently judged by three people; a post was marked "bullish" or "bearish" only if all three judges agreed, and discarded otherwise. It is easy to see that collecting high-quality, manually annotated text is quite expensive (Saif et al., 2013), and the manual tagging process usually needs to be disclosed in detail; otherwise, the accuracy of the manual classification may be questioned (Wilson et al., 2009).
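Given such a manually labeled gold set, calibration reduces to comparing the parser's output against it; alongside raw accuracy, chance-corrected agreement (Cohen's kappa) is a common statistic, both for parser-versus-gold comparisons and between the human annotators themselves. A minimal sketch with hypothetical labels:

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

# gold: majority labels from independent human annotators; auto: output of a
# lexicon- and rule-based parser on the same posts (both hypothetical here).
gold = ["bullish", "bearish", "bullish", "bearish", "bullish", "bullish"]
auto = ["bullish", "bearish", "bearish", "bearish", "bullish", "bullish"]

print("accuracy vs. manual gold set:", round(accuracy_score(gold, auto), 3))
# Kappa corrects raw agreement for chance, making it a stricter statistic.
print("Cohen's kappa:", round(cohen_kappa_score(gold, auto), 3))
```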
In summary, there is no universally accepted golden rule for lexicon- and semantic rule-based sentiment parsing (W. Zhang & Skiena, 2010). Following the guidelines of R1 and R2 can enhance a study's credibility, but it does not demonstrate the reliability of the parsing results. Validating the sentiment parsing results against a manually labeled benchmark is an effective way to demonstrate the reliability of measured investor sentiment.
Validation of Third-Party Tools
As shown in Table 1 and Appendix A, a number of studies used algorithms or tools built by others for investor sentiment parsing, which we refer to collectively as "third-party tools." For example:
• Piñeiro-Chousa et al. (2016) and Deng et al. (2018) used Stanford's CoreNLP toolkit, which offers an extensible pipeline for core natural language analysis (Manning et al., 2014; Rao & Srivastava, 2012). It contains complete, high-quality analysis components and does not require a large number of related packages, allowing machine learning to be applied simply to natural language processing (Kaur & Agarwal, 2018). The software uses the Sentiment Treebank corpus, the first corpus with fully labeled parse trees (Socher et al., 2013).
• Fan et al. (2020) used the TextBlob toolkit, a simple Python library API for common natural language processing tasks (Gujjar & Kumar, 2021), to provide polarity scoring for tweets (Shekhawat, 2019); a minimal usage sketch follows this list.
• The study by Nisar and Yeung (2018) used Umigon for sentiment parsing, a lexicon-based sentiment classifier that provides sentiment detection for tweets. Umigon also provides indications of other semantic features present in tweets, such as temporal indications or subjectivity markers.
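For illustration, the TextBlob call referenced in this list reduces to a few lines (shown below with an invented example sentence); as the next paragraphs argue, such convenience does not exempt researchers from validating the tool on financial text.

```python
# pip install textblob
from textblob import TextBlob

blob = TextBlob("The stock is doing great, fantastic quarter!")
# TextBlob returns polarity in [-1, 1] and subjectivity in [0, 1].
print(blob.sentiment.polarity, blob.sentiment.subjectivity)
```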
Studies that have utilized third-party tools for sentiment parsing frequently cite the disclosed sentiment parsing performance as proof of the validity of their own research (Audrino et al., 2020; Nisar & Yeung, 2018; Zheludev et al., 2014). However, additional evidence suggests that this is not reliable. For example:
• With lexicon and semantic rule-based approaches, the choice of lexicon can have a significant impact on classification performance (Oliveira et al., 2016), and researchers need to consider whether the lexicon employed by a third-party tool and their own study are a sufficient match in terms of the research domain, regional culture, and timeliness.
• X. Sun et al. (2020) show that Tencent's NLP service cannot be applied to the sentiment classification of finance texts due to a lack of finance-specific training data.
• PaddlePaddle is one of the most advanced natural language processing platforms in China, and its website claims that its sentiment classification accuracy can exceed 88%; however, it achieves only about 60% accuracy on finance texts (Wang et al., 2022).
In summary, even when researchers use third-party tools for sentiment measurement, they still need evidence that these tools are effective for their particular sentiment mining task. Drawing from the aforementioned discussion, we present our fifth recommendation: (R5) Researchers should validate sentiment analysis results, including those produced by third-party tools, against manually labeled benchmarks for their specific task.
Summary and Comparison
As shown in Table 1, our core literature review reveals no significant difference in the number of studies utilizing supervised and unsupervised learning methods for extracting financial sentiment from social media. This finding suggests that the academic community has yet to reach a consensus on whether one method outperforms the other in sentiment analysis tasks. Instead, researchers tend to base their choice of methodology on specific research objectives and data characteristics.
We have visualized this comparison in Figure 8, highlighting the applicability of both approaches. Regardless of the chosen method, data collection and preprocessing remain foundational prerequisites for financial sentiment analysis. Similarly, validating the results of sentiment extraction is essential for enhancing the robustness of research findings. These fundamental steps are consistent across both approaches.

Figure 8. Comparison chart of supervised versus unsupervised learning in sentiment analysis.
The primary differences between the two methods lie in their underlying data requirements and their modeling approaches:
Basis
•
•
Models
•
•
Advantages and Disadvantages
Existing studies indicate that both methods have their strengths and limitations.
•
•
In summary, the effectiveness of a study depends more on the researchers’ transparency in disclosing their methodological choices, parsing rules, and validation processes than on the specific method employed. Transparent and standardized research designs are particularly crucial, as they facilitate reproducibility and enhance the reliability and academic contribution of the findings. However, with the rapid advancement of natural language processing technologies, employing cutting-edge techniques not only improves the precision and efficiency of sentiment analysis but also fosters greater trust in the reliability of the research (Onan, 2021). In the next section, we will delve deeper into this perspective.
Discussion
Lags in the Application of Technology
We also found that the latest natural language processing techniques have not been applied to the parsing of text sentiment in social media. We searched Google Scholar for the keyword "text sentiment analysis," restricting the publication years to 2018 to 2024. To obtain a representative cross-section of results, we screened the 100 documents most relevant to the keyword and obtained 61 core papers with a minimum citation count of 20. As shown in Panel A of Table 5, the average citation count in this literature is 174, indicating a high level of impact.
Table 5. Literature Statistics for a Survey of Advances in Sentiment Mining Techniques.
The citation count data come from Google Scholar as of November 30, 2024.
Of the literature we surveyed, 59% is in the field of computing and IT, and 41% is in other fields such as finance, business, and economics. As shown in Panel B of Table 5, more than three-quarters of the research in the computing and IT domain is based on neural networks, and deep learning remains at the forefront of text sentiment mining. In contrast, only 36.0% of the sentiment parsing techniques used in other research areas employ neural networks, a rate well below that of computing and IT research. There is thus a gap between the techniques applied to sentiment mining outside computing and IT and the latest technological advances.
Financial studies based on social media investor sentiment are no exception: the textual sentiment parsing techniques they apply benefit from technological spillover from the computer and IT fields (Pang & Lee, 2008). Our findings indicate that this spillover lags. As shown in Appendix B, the field of finance has shown little interest in absorbing this spillover, preferring readily available, simple, traditional methods for parsing investor sentiment in text.
In summary, we offer our sixth recommendation: (R6) Researchers should track advances in natural language processing and, where appropriate, apply cutting-edge techniques to sentiment mining.
Recommendations
Based on a review of historical literature, we propose key norms and recommendations for financial researchers conducting sentiment analysis using social media text. Table 6 summarizes these recommendations, which include the applicability of lexicons (R1), transparency in parsing rules (R2), reliability of manual annotations (R3), scientific and standardized experimental setups (R4), validation of sentiment analysis results (R5), and the application of cutting-edge natural language processing techniques (R6).
Table 6. The Basic Recommendations for Measuring Investor Sentiment in Social Media.
These recommendations address the critical aspects of extracting financial sentiment from social media text. We emphasize that adherence to these norms can significantly enhance the reliability, reproducibility, and academic contribution of sentiment analysis research. Beyond providing practical guidance for investor sentiment studies based on social media text, these recommendations establish a more robust academic foundation for exploring sentiment in financial markets.
Research Implications
This study contributes to both academia and practice by providing methodological support and practical insights for analyzing investor sentiment on social media.
Academically, it outlines key principles of sentiment analysis, focusing on lexicon use, parsing rules, and effectiveness. By standardizing these elements, it enhances research reliability, reproducibility, and comparability, addressing inconsistencies caused by varying tools and data processing methods. Additionally, it identifies gaps in the literature, particularly in sentiment index validation and the influence of sentiment analysis on financial research.
Practically, this study offers a structured approach to constructing sentiment indices, enabling researchers to derive more reliable indicators for decision-making. It also provides a systematic framework for scholars reviewing related studies. By establishing clear guidelines, it delivers actionable insights for both academia and industry, improving sentiment analysis credibility and application.
More broadly, this research demonstrates the value of social media sentiment analysis in financial markets, shedding light on its relationship with market behavior. Beyond finance, its findings hold relevance for sentiment analysis in other socio-economic contexts and support future cross-lingual and cross-cultural research.
Limitations
This study provides a comprehensive review of key methods for extracting financial sentiment from social media and offers relevant recommendations. However, several limitations must be acknowledged.
Bias in Data Representativeness
While we have compiled a core set of representative literature, it may not fully capture research conducted across diverse linguistic and cultural contexts. Sentiment analysis studies in non-English environments may be underrepresented, potentially limiting the generalizability of our findings. Additionally, some studies may exhibit regional or platform-specific biases, challenging the universality of the conclusions.
Limitations in Method Applicability
This study does not thoroughly examine how different sentiment analysis methods perform under specific financial market conditions. Some techniques may be more effective in particular markets or economic scenarios, yet their contextual suitability is not comprehensively addressed. Additionally, emerging approaches, such as multimodal data analysis (integrating images and videos) and network relationship analysis, are beyond the scope of this review.
Impact of Large Language Models (LLMs)
Advances in natural language processing, particularly generative pre-trained models like GPT-4, are reshaping sentiment analysis. These models enhance semantic understanding and improve sentiment classification. However, this study does not assess LLM-based sentiment analysis methods. Their adoption may not only alter technical methodologies but also redefine the analytical frameworks of financial sentiment research.
Challenges of a Dynamic Data Ecosystem
The evolving nature of social media platforms and shifting user behaviors create highly dynamic data characteristics, complicating sentiment analysis. As platforms update policies and data structures change, the recommendations provided in this study may require adaptation to maintain their effectiveness.
In conclusion, this study systematically reviews key sentiment extraction methods and provides actionable recommendations for financial market research. However, the identified limitations highlight areas for further exploration.
Conclusion
Investor sentiment in social media has fascinating applications. Scholars have studied the relationship between social media sentiment and financial markets from different perspectives, but these studies have not reached a consensus (Nguyen et al., 2015). The diversity of sentiment tracking tools and the lack of transparent elaboration of sentiment mining in the financial literature make replication and comparison studies difficult to carry out. Through a literature survey, this study summarizes the parsing and elaboration norms that future researchers should follow in social media sentiment mining.
In this study, we reviewed a body of key financial literature on investor sentiment derived from social media, aiming to address the aforementioned challenges. We thoroughly examined essential components of sentiment text mining, such as sentiment lexicons, semantic rules, sentiment text annotation, and supervised learning. Based on these reviews, we proposed critical guidelines and recommendations for sentiment analysis in financial studies (see Table 6). We emphasize that adhering to these principles can help reduce noise in constructing investor sentiment metrics and improve the reliability and scientific rigor of related research.
To the best of our knowledge, there has been no systematic review and synthesis focusing specifically on sentiment parsing methods in the construction of investor sentiment. This study fills an important gap by offering significant reference value in this area. Theoretically, this research integrates the strengths and limitations of sentiment analysis methods, establishing a framework tailored to financial studies that can enhance consistency and reproducibility. By emphasizing the importance of transparency and standardization, this study provides a solid theoretical foundation for social media-based investor sentiment research and future financial market analyses.
Practically, this research offers vital guidelines for policymakers, supporting the development of more precise financial regulatory strategies to address market volatility and mitigate risks of manipulation. For investors and financial institutions, it provides practical guidance to optimize the construction of sentiment indicators, thereby improving decision-making efficiency. Moreover, the study offers valuable insights for other socio-economic fields and lays a solid foundation for future research in cross-linguistic and cross-cultural contexts.
Through a systematic review of literature spanning 2004 to 2024, this study thoroughly examined critical aspects of text mining for social media-based sentiment measurement. Despite the limitations of length and resources, we confidently present a comprehensive snapshot of sentiment text mining in the context of investor sentiment, providing valuable references for future research.
Given the limitations of this study, future research should focus on two main areas. First, researchers should examine how new natural language processing technologies, such as generative pre-trained models (e.g., the GPT series), can be applied to sentiment mining, and how well they perform on dynamic data and in multimodal analysis. Second, researchers should further investigate the adaptability of sentiment lexicons and parsing methods in cross-linguistic and cross-cultural contexts, with the aim of developing sentiment measurement tools that are more generalizable and accurate.
Supplemental Material
Supplemental material for this article (sj-docx-1-sgo-10.1177_21582440251328535: Text Sentiment Mining used for Constructing Investor Sentiment in Social Media: Survey and Recommendations, by Qing Liu and Hosung Son) is available online.