Unsupervised learning of textual pattern based on Propagation in Bipartite Graph

Abstract

Graph-based algorithms have attracted considerable interest in recent years because they facilitate pattern recognition and learning through the propagation of information across the graph. Here, we propose an unsupervised learning algorithm based on propagation in a bipartite graph, referred to as the Propagation in Bipartite Graph (PBG) algorithm. The contributions of this approach are threefold: 1) we present an iterative graph-based algorithm and a straightforward bipartite representation for textual data, in which vertices represent documents and words, and edges between documents and words represent the occurrences of the words in the documents; 2) we show that PBG is more flexible and easier to adapt to different applications than approaches built on the mathematical formalism of generative models; and 3) we present a comprehensive evaluation and comparison of PBG with other topic extraction techniques. We formulate the strategy employed in PBG as the maximization of similarity between latent vectors assigned to vertices and edges, and demonstrate that this strategy can be improved by assigning good initial values to the vectors. We also note that PBG can be parallelized with a simple adjustment to the algorithm. Finally, we show that the proposed algorithm is competitive with latent Dirichlet allocation (LDA) and non-negative matrix factorization (NMF) both in modelling text collections, where it returns coherent topics, and in dimensionality reduction.
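To make the bipartite representation and the iterative propagation more concrete, the Python sketch below builds a document-word graph from raw strings and runs a simple propagation loop over it. Everything in it is an assumption for illustration: the function names `build_bipartite` and `propagate`, the naive whitespace tokenizer, the smoothing parameters `alpha` and `beta`, and, most importantly, the update rules. These follow the familiar EM-style updates of PLSA/LDA-type models (each edge vector is the normalized elementwise product of its endpoints' latent vectors), not the exact PBG equations, which the abstract does not give.

```python
import numpy as np


def build_bipartite(docs):
    """Return (vocab, edges) for a document-word bipartite graph.

    Document vertices are indexed 0..len(docs)-1; word vertices are indexed
    by `vocab`. Each edge is (doc_index, word_index, count), where count is
    the number of occurrences of the word in the document.
    """
    vocab = {}
    edges = []
    for d, text in enumerate(docs):
        counts = {}
        for token in text.lower().split():  # naive whitespace tokenizer (assumption)
            counts[token] = counts.get(token, 0) + 1
        for token, c in counts.items():
            w = vocab.setdefault(token, len(vocab))
            edges.append((d, w, c))
    return vocab, edges


def propagate(n_docs, n_words, edges, k, n_iter=50, alpha=0.05, beta=0.05, seed=0):
    """Hypothetical propagation loop over the bipartite graph.

    A[d] and B[w] are k-dimensional latent vectors of document and word
    vertices. Each edge vector is the normalized elementwise product of its
    endpoints' vectors (an EM-style rule, assumed here), and each vertex
    vector is rebuilt from the count-weighted vectors of its incident edges.
    """
    rng = np.random.default_rng(seed)
    A = rng.random((n_docs, k)) + 1e-3   # random initialization of document vectors
    B = rng.random((n_words, k)) + 1e-3  # random initialization of word vectors
    B /= B.sum(axis=0)                   # each column: a distribution over words
    for _ in range(n_iter):
        # Synchronous update: new vectors depend only on the previous iteration.
        A_new = np.full((n_docs, k), alpha)
        B_new = np.full((n_words, k), beta)
        for d, w, c in edges:
            f = A[d] * B[w]              # agreement of the two endpoint vectors
            f /= f.sum()                 # edge vector: distribution over k components
            A_new[d] += c * f            # propagate edge information to the document
            B_new[w] += c * f            # ... and to the word
        A = A_new
        B = B_new / B_new.sum(axis=0)
    return A, B


# Usage on a toy corpus of three short documents.
docs = ["graph propagation graph", "topic model topic", "graph topic"]
vocab, edges = build_bipartite(docs)
A, B = propagate(len(docs), len(vocab), edges, k=2)
for j in range(B.shape[1]):
    top = sorted(vocab, key=lambda t: -B[vocab[t], j])[:2]
    print(f"component {j}: {top}")
```

Because each iteration computes the new vectors only from the previous iteration's vectors, the loop over edges has no ordering dependency and could be split across workers whose partial sums are then added; whether this matches the adjustment the authors use to parallelize PBG is not stated in the abstract. The sketch also uses random initialization, whereas the abstract reports that good initial values improve the result without saying here how they are chosen.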

Keywords

Unsupervised learning, topic modelling, bipartite graph representation, dimensionality reduction, text mining

