Trends in Transportation Research

Abstract

Journal and conference proceedings are rich sources of large-scale textual data for examining research trends across branches of science. Their contents are usually unstructured and require fast machine-learning algorithms to decipher. Exploratory text mining typically describes the contents but does not quantify the topics or their correlations. Topic models are algorithms designed to discover the main themes or trends in massive collections of unstructured documents. Using a structural topic model, an extension of latent Dirichlet allocation, this study developed distinct topic models based on the relative frequencies of the words used in the abstracts of 15,357 Transportation Research Board (TRB) compendium papers. From 7 years (2008 through 2014) of TRB annual meeting compendium papers, the 20 most dominant topics emerged from a bag of 4 million words. The findings contribute to the understanding of topical trends in the complex and evolving field of transportation engineering research.
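
To make the bag-of-words idea concrete, the following is a minimal sketch of plain latent Dirichlet allocation fitted by collapsed Gibbs sampling. It is not the structural topic model used in the study, which additionally conditions on document metadata. The four toy abstracts, the stopword list, the function names, and the hyperparameters (`alpha`, `beta`, `n_iter`) are illustrative assumptions rather than values from the paper.

```python
import random
import re

# Hypothetical toy abstracts; the study itself used 15,357 TRB abstracts.
ABSTRACTS = [
    "Crash data and driver behavior are used to model roadway safety.",
    "Pavement asphalt mixtures are tested for fatigue and cracking.",
    "Safety analysis of crash severity at rural intersections.",
    "Asphalt pavement performance under heavy traffic loading.",
]
STOPWORDS = {"and", "are", "at", "for", "of", "the", "to", "under", "used"}


def tokenize(text, stopwords=STOPWORDS):
    """Lowercase, keep alphabetic tokens, drop stopwords (bag of words)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords]


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA over tokenized documents.

    Returns (vocab, phi, theta): phi[k][v] is the estimated probability of
    word v in topic k, theta[d][k] the estimated share of topic k in doc d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]           # counts per doc, topic
    topic_word = [[0] * n_words for _ in range(n_topics)]  # counts per topic, word
    topic_total = [0] * n_topics                          # tokens per topic
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed point estimates from the final counts.
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + n_words * beta)
         for v in range(n_words)]
        for k in range(n_topics)
    ]
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return vocab, phi, theta


def top_words(vocab, topic_row, n=5):
    """Return the n most probable words of one topic."""
    ranked = sorted(range(len(vocab)), key=lambda v: topic_row[v], reverse=True)
    return [vocab[v] for v in ranked[:n]]


docs = [tokenize(text) for text in ABSTRACTS]
vocab, phi, theta = fit_lda(docs, n_topics=2)
for k, row in enumerate(phi):
    print(f"topic {k}:", ", ".join(top_words(vocab, row)))
```

Each row of `theta` gives one abstract's estimated topic proportions. Averaging those rows by publication year is one simple way to trace how a topic's prevalence changes over time, although the study's own trend estimates come from the structural topic model rather than from this procedure.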
