Abstract
Sarcasm detection is becoming increasingly important owing to applications such as online brand management, online market research, human-machine interaction and the detection of cyberbullying. To improve the performance of sarcasm detection on text, a novel model is proposed that applies transformer-style parallel attention heads at multiple levels of granularity over a Bidirectional Gated Recurrent Unit (Bi-GRU). Entropy regularization is applied to overcome attention collapse, distributing attention more evenly and mitigating its tendency to over-emphasize particular tokens. Early stopping and learning-rate scheduling are used at different stages of implementation and training to optimize performance and resource utilization. The model is evaluated on two datasets using four different sets of embeddings, namely Global Vectors (GloVe), FastText, Bidirectional Encoder Representations from Transformers (BERT), and an ensemble embedding obtained by concatenating the aforementioned embeddings and reducing the result with Latent Semantic Analysis (LSA). It records maximum accuracies of 93.64% and 81.61% on News Headlines and SARC respectively, with FastText outperforming the other embeddings. The stronger results with FastText demonstrate the model's ability to learn well from simpler representations compared with the more sophisticated, context-dependent patterns captured by the resource-intensive BERT. Attention heatmaps are plotted to illustrate the interpretability of the model. To validate the model's effectiveness in real-world settings, a robustness analysis on the News Headlines dataset is performed using three adversarial testing techniques, namely synonym replacement, word dropout and character swap, yielding accuracies of 86.12%, 84.16% and 83.21% respectively.
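The core architecture described above can be illustrated with a short sketch. The following is a minimal, illustrative implementation, not the authors' code: transformer-style parallel attention heads over a Bi-GRU encoder, with an entropy term on the attention weights to discourage attention collapse. It assumes PyTorch, and the layer sizes, head count, class count and regularization weight are hypothetical placeholders rather than the paper's reported settings.

```python
# Minimal sketch: parallel attention heads over a Bi-GRU, with an entropy
# term on the attention distributions (all hyperparameters are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiGRUMultiHeadAttention(nn.Module):
    def __init__(self, embed_dim=300, hidden_dim=128, num_heads=4, num_classes=2):
        super().__init__()
        self.bigru = nn.GRU(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # One learned query vector per head, attending over the Bi-GRU states.
        self.queries = nn.Parameter(torch.randn(num_heads, 2 * hidden_dim))
        self.classifier = nn.Linear(num_heads * 2 * hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, embed_dim) pre-computed word embeddings
        # (e.g. GloVe, FastText or BERT vectors, as in the paper).
        states, _ = self.bigru(x)                      # (B, T, 2H)
        scores = torch.einsum('btd,hd->bht', states, self.queries)
        attn = F.softmax(scores, dim=-1)               # (B, heads, T)
        # Entropy of each head's attention distribution; rewarding it in the
        # loss spreads attention instead of collapsing onto a few tokens.
        entropy = -(attn * torch.log(attn + 1e-9)).sum(-1).mean()
        context = torch.einsum('bht,btd->bhd', attn, states)
        logits = self.classifier(context.flatten(1))
        return logits, entropy

# Training objective sketch: cross-entropy minus a small entropy bonus
# (the weight 0.01 is an assumed hyperparameter, not taken from the paper).
# loss = F.cross_entropy(logits, labels) - 0.01 * entropy
```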
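The three adversarial perturbations used in the robustness analysis can likewise be sketched. In this illustrative version, synonym replacement draws candidates from NLTK's WordNet and every perturbation rate is an assumption; the paper's exact procedures and rates may differ.

```python
# Minimal sketches of the three perturbations named in the abstract:
# synonym replacement, word dropout and character swap.
import random
from nltk.corpus import wordnet  # requires: nltk.download('wordnet')

def synonym_replacement(tokens, rate=0.1):
    out = list(tokens)
    for i, tok in enumerate(out):
        syns = {l.name().replace('_', ' ')
                for s in wordnet.synsets(tok) for l in s.lemmas()} - {tok}
        if syns and random.random() < rate:
            out[i] = random.choice(sorted(syns))
    return out

def word_dropout(tokens, rate=0.1):
    kept = [t for t in tokens if random.random() >= rate]
    return kept or list(tokens)  # never return an empty sentence

def character_swap(tokens, rate=0.1):
    out = []
    for tok in tokens:
        if len(tok) > 3 and random.random() < rate:
            i = random.randrange(1, len(tok) - 2)  # swap two inner characters
            tok = tok[:i] + tok[i + 1] + tok[i] + tok[i + 2:]
        out.append(tok)
    return out
```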
