Abstract
This paper develops a novel approach to stock selection that integrates natural language processing with machine learning to analyze public opinion data in financial markets. Specifically, we use the Bidirectional Encoder Representations from Transformers (BERT) model to process and classify financial news and social media content, and the Light Gradient Boosting Machine (LightGBM) algorithm to select high-potential stocks within the identified concept sectors. Using over 18 million Chinese financial text records from 2023 to 2024, we construct a framework that captures both market sentiment and stock-specific characteristics. Our strategy consists of three core components: (1) a BERT-based sentiment analyzer that identifies promising concept sectors with strong momentum, (2) a LightGBM-based stock selection mechanism that uses a specially designed “Concept-Momentum-Combined” factor alongside conventional financial indicators, and (3) a risk management system that combines sentiment anomaly detection with multi-stage Average True Range (ATR) trailing stops. Empirical results show significant outperformance over the CSI 800 index across various timeframes, with annualized excess returns of 21.55% over a six-month period and a maximum drawdown of only 11.68%. Performance attribution analysis confirms that concept sector selection based on sentiment analysis is the primary driver of excess returns. Our results add to the growing literature on artificial intelligence in financial markets and offer practical guidance for investors who want to incorporate public opinion data into their investment process.
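To make the third component concrete, the following is a minimal sketch of how a multi-stage ATR trailing stop could work. It is not the paper's implementation: the abstract does not give the ATR period, the number of stages, or the multipliers, so the `STAGES` table, the simple-moving-average ATR, the close-based exit rule, and all function names here are assumptions chosen for illustration.

```python
def true_ranges(highs, lows, closes):
    """True range per bar; the first bar has no previous close, so use high - low."""
    trs = [highs[0] - lows[0]]
    for i in range(1, len(closes)):
        prev = closes[i - 1]
        trs.append(max(highs[i] - lows[i],
                       abs(highs[i] - prev),
                       abs(lows[i] - prev)))
    return trs


def atr(highs, lows, closes, period=14):
    """ATR as a simple moving average of true range (assumption: not Wilder smoothing).
    Entries are None until `period` bars are available."""
    trs = true_ranges(highs, lows, closes)
    out = []
    for i in range(len(trs)):
        if i + 1 < period:
            out.append(None)
        else:
            out.append(sum(trs[i + 1 - period:i + 1]) / period)
    return out


# Hypothetical stages: (minimum unrealized gain, ATR multiplier).
# The stop tightens as the position's peak gain grows.
STAGES = [(0.00, 3.0), (0.10, 2.0), (0.20, 1.5)]


def multiplier_for(gain):
    """Pick the multiplier of the highest stage whose threshold is reached."""
    mult = STAGES[0][1]
    for threshold, stage_mult in STAGES:
        if gain >= threshold:
            mult = stage_mult
    return mult


def trailing_stop_exit(highs, lows, closes, entry_index, period=14):
    """Return the index of the first bar that closes below the trailing stop, or None."""
    atrs = atr(highs, lows, closes, period)
    entry_price = closes[entry_index]
    highest_close = entry_price
    stop = float("-inf")
    for i in range(entry_index + 1, len(closes)):
        # Check today's close against the stop set on earlier bars.
        if closes[i] < stop:
            return i
        if atrs[i] is None:
            continue
        # Update the peak and ratchet the stop upward; it never moves down.
        highest_close = max(highest_close, closes[i])
        gain = highest_close / entry_price - 1.0
        candidate = highest_close - multiplier_for(gain) * atrs[i]
        stop = max(stop, candidate)
    return None
```

Calling `trailing_stop_exit(highs, lows, closes, entry_index=k)` on daily bars returns the bar on which the position would be closed under these assumed rules. Keeping the stop from ever moving down is what makes it a trailing stop; the staged multipliers are one simple way to lock in more of the gain as it accumulates.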
