Abstract
Topic modelling is an important technique for extracting meaningful insights from large volumes of unstructured text data. This paper presents a federated technique for topic modelling based on a novel adaptation of the Latent Dirichlet Allocation (LDA) method. The proposed approach enables a topic model to be built in a distributed environment from data continually generated at multiple sources, without sharing the actual data. In the first iteration, unsupervised LDA is run on each device that generates data. The per-device results are aggregated at a central server to produce a set of seed words, which are then used for guided LDA in subsequent iterations of topic modelling. The proposed approach, Federated LDA (F-LDA), has been evaluated on two datasets: a text dataset of dialogues between patients and doctors based on factual conversations, and a dataset of tweets related to depression. Compared with centralized LDA, F-LDA achieves higher coherence and diversity scores, indicating that it produces more interpretable topics covering a wide range of themes without redundancy.
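The server-side aggregation step described in the abstract can be sketched as follows. This is a minimal illustration under assumptions the abstract does not specify: topics from different devices are matched by index, and seed words are chosen by how often a word appears among a topic's top terms across devices. The function and parameter names (`aggregate_seed_words`, `seeds_per_topic`) are hypothetical, not from the paper.

```python
from collections import Counter

def aggregate_seed_words(client_topics, seeds_per_topic=3):
    """Aggregate per-device topic top-words into seed words per topic.

    client_topics: list over devices; each entry is a list over topics,
    where each topic is the list of its top words from that device's
    unsupervised LDA run. Topics are assumed aligned by index, an
    illustrative simplification; the paper's alignment step is not
    described in the abstract.
    """
    num_topics = len(client_topics[0])
    seeds = []
    for t in range(num_topics):
        # Count how many devices surfaced each word for topic t.
        counts = Counter(w for device in client_topics for w in device[t])
        seeds.append([w for w, _ in counts.most_common(seeds_per_topic)])
    return seeds

# Example: three devices, two topics each (toy vocabulary).
device_a = [["pain", "doctor", "sleep"], ["sad", "tired", "alone"]]
device_b = [["doctor", "pain", "fever"], ["sad", "alone", "hopeless"]]
device_c = [["pain", "fever", "doctor"], ["tired", "sad", "anxious"]]

seeds = aggregate_seed_words([device_a, device_b, device_c])
```

The resulting seed-word lists would then be distributed back to the devices as priors for the guided-LDA rounds.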