Evolving voices based on temporal Poisson factorisation

Abstract

The world is evolving, and so is the vocabulary used to discuss topics in speech. Analysing political speech data spanning more than 30 years requires flexible topic models that uncover the latent topics, their change in prevalence over time, and the change in each topic's vocabulary. We propose the temporal Poisson factorisation (TPF) model, an extension of the Poisson factorisation model, for sparse count matrices obtained from time-stamped text documents under the bag-of-words assumption. We discuss and empirically compare different specifications for the time-varying latent variables, which follow either a flexible auto-regressive structure of order one or a random walk. Estimation is based on variational inference, combining coordinate ascent updates with automatic differentiation applied to batches of documents. Suitable variational families are proposed to ease inference. We compare results obtained using independent univariate variational distributions for the time-varying latent variables with those obtained using a multivariate variant. We discuss in detail the results of the TPF model applied to speeches from 18 sessions of the U.S. Senate (1981–2016).
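To make the model structure described above concrete, the following is a minimal simulation sketch, not the authors' specification. It assumes (hypothetically) gamma-distributed document-topic intensities, time-varying log topic-term intensities that follow an auto-regressive process of order one around a topic-specific mean, and Poisson counts whose rate is the product of the two. The function name `simulate_tpf`, all parameter names, and the prior choices are illustrative assumptions; setting `rho=1.0` turns the auto-regressive structure into a random walk.

```python
import numpy as np


def simulate_tpf(n_docs=200, n_terms=50, n_topics=3, n_periods=18,
                 rho=0.9, sigma=0.3, seed=0):
    """Simulate a document-term count matrix from an illustrative
    temporal Poisson factorisation. All priors here are assumptions."""
    rng = np.random.default_rng(seed)

    # Time stamp of each document (e.g. the session it belongs to).
    period = rng.integers(0, n_periods, size=n_docs)

    # Non-negative document-topic intensities (assumed gamma prior).
    theta = rng.gamma(shape=0.3, scale=1.0, size=(n_docs, n_topics))

    # Topic-specific mean of the log topic-term intensities (assumed normal).
    mu = rng.normal(0.0, 1.0, size=(n_topics, n_terms))

    # AR(1) on the log scale: x_t = mu + rho * (x_{t-1} - mu) + noise.
    # With rho = 1 this reduces to a random walk x_t = x_{t-1} + noise.
    log_beta = np.empty((n_periods, n_topics, n_terms))
    # Start from the stationary spread if |rho| < 1, else from sigma.
    sd0 = sigma / np.sqrt(1.0 - rho**2) if abs(rho) < 1 else sigma
    log_beta[0] = mu + sd0 * rng.normal(size=mu.shape)
    for t in range(1, n_periods):
        noise = sigma * rng.normal(size=mu.shape)
        log_beta[t] = mu + rho * (log_beta[t - 1] - mu) + noise

    # Poisson rate of term v in document d: sum_k theta[d, k] * beta[t_d, k, v].
    beta_d = np.exp(log_beta[period])            # shape (docs, topics, terms)
    rate = np.einsum("dk,dkv->dv", theta, beta_d)

    counts = rng.poisson(rate)                   # sparse bag-of-words counts
    return counts, period, theta, log_beta


if __name__ == "__main__":
    counts, period, theta, log_beta = simulate_tpf()
    print(counts.shape, float((counts == 0).mean()))
```

The simulated matrix is sparse because the small gamma shape parameter keeps most document-topic intensities close to zero, which mimics the sparsity that motivates Poisson factorisation for bag-of-words data.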

Keywords

Auto-regressive process, Poisson factorisation, time-varying topic model, variational inference


