Abstract
Understanding how information spreads on social media requires identifying the most influential users and predicting which user groups will generate the most activity. This paper presents a method for identifying and predicting active users in social networks that combines hierarchical graph structures with sequence prediction algorithms. The method first transforms the complex structure of a social network into hierarchical trees and then extracts the propagation sequences that occur in them, which reduces structural noise and captures real propagation patterns. Experiments on three datasets (Reddit, Facebook, and Twitter) show that, although each dataset contains many types of user groups, specific user groups play a clearly larger role in spreading information. The experiments also show that the Compact Prediction Tree (CPT+) algorithm accurately predicts future propagation sequences, with predictive accuracy exceeding 99% on the structured Reddit and Facebook datasets, which indicates that CPT+ is very effective in hierarchical and repetitive propagation environments. On the Twitter data, however, the performance of CPT+ degrades: in large-scale, noisy diffusion environments, propagation patterns are less likely to be hierarchical or repetitive. Overall, the results suggest that the proposed framework is well suited to structured information diffusion and provides an interpretable and scalable approach to analyzing and forecasting the spread of information across social networks.
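As a rough illustration of the tree transformation and sequence extraction described above, the following minimal sketch builds a breadth-first spanning tree from a directed interaction graph and reads off its root-to-leaf paths as propagation sequences. The abstract does not specify how the paper constructs its hierarchies, so the breadth-first construction, the function names `bfs_tree` and `root_to_leaf_sequences`, and the toy edge list are all assumptions made for illustration only.

```python
from collections import deque


def bfs_tree(edges, root):
    """Build a BFS spanning tree (node -> list of children) of a directed graph.

    Assumption: a breadth-first tree stands in for the paper's hierarchy.
    """
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    children = {root: []}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in children:  # first visit fixes the parent; later edges are dropped
                children[nxt] = []
                children[node].append(nxt)
                queue.append(nxt)
    return children


def root_to_leaf_sequences(children, root):
    """Return every root-to-leaf path of the tree as a list of node ids."""
    sequences = []
    stack = [(root, [root])]
    while stack:
        node, path = stack.pop()
        if not children[node]:  # leaf: the path is one complete sequence
            sequences.append(path)
        for child in children[node]:
            stack.append((child, path + [child]))
    return sorted(sequences)


# Hypothetical toy graph: user "a" posts, others reshare; "d" -> "a" closes a cycle.
toy_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "a")]
toy_tree = bfs_tree(toy_edges, "a")
toy_sequences = root_to_leaf_sequences(toy_tree, "a")
```

Dropping the edges that would create cycles or second parents is one way the tree view can reduce structural noise; the resulting sequences are the kind of input a sequence predictor such as CPT+ would be trained on.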
