SmarTxT: A Natural Language Processing Approach for Efficient Vehicle Defect Investigation

Abstract

The investigation of vehicle defects, which in the U.S. is generally led by the National Highway Traffic Safety Administration (NHTSA), is critical to the public's continued trust in vehicle safety. NHTSA routinely receives millions of reports of potential defects, complaints, recalls, and manufacturer communications, any of which may provide evidence of a new vehicle defect. However, the sheer volume of these reports and their text-based form make it difficult for analysts to identify defect trends efficiently. To accelerate the investigation of defect reports, we introduce a natural language processing (NLP) application that identifies key topics and similar defect reports to assist analysts and investigators. The application also provides a web interface through which users interact with the NLP models. Integrating NLP with current NHTSA datasets provides a method for quickly identifying defect trends in large text-based datasets. To demonstrate the effectiveness of our method, we apply it to two publicly available NHTSA datasets: the Technical Service Bulletins dataset and the Recalls dataset.

Keywords

data and data science, machine learning (artificial intelligence), unsupervised learning, safety
