Abstract
The rapid expansion of the materials science literature demands scalable, intelligent systems for extracting, structuring, and using scientific knowledge. Manual construction of inorganic materials databases is labor-intensive and error-prone. In this study, we propose an end-to-end framework that uses instruction-tuned large language models (LLMs) for automated knowledge extraction and discovery in inorganic materials science. Fine-tuned on domain-specific corpora (peer-reviewed articles, patents, and chemical databases), the LLMs accurately extract structured material-property-synthesis relationships from unstructured text. The extracted records are aligned to a schema and stored in a queryable knowledge graph. We further demonstrate inverse design by prompting the LLMs to generate candidate materials that satisfy user-defined targets (e.g., high thermal conductivity). Evaluations on benchmark synthesis corpora show high accuracy in named entity recognition (F1-score > 95%) and low numerical error in temperature and duration extraction. The resulting database supports downstream applications such as Curie temperature prediction (RMSE = 23.6 K, R^2 = 0.927). This work shows how LLMs, combined with schema-aware reasoning and human-in-the-loop curation, can serve as robust tools for accelerating autonomous materials discovery.
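For readers unfamiliar with the three metrics quoted above (F1-score, RMSE, R^2), the following is a minimal Python sketch of how such quantities are commonly computed. It is an illustration only: the abstract does not state the exact evaluation protocol, so the exact-match, span-level definition of F1 and the function names `rmse`, `r_squared`, and `entity_f1` are assumptions made for this example.

```python
import math
from typing import Sequence, Set, Tuple

# Hypothetical entity representation: (start offset, end offset, label).
Entity = Tuple[int, int, str]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error, in the same units as the inputs (e.g. K)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


def entity_f1(gold: Set[Entity], predicted: Set[Entity]) -> float:
    """Exact-match F1 over entity spans (an assumed matching criterion)."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Under these definitions, RMSE carries the units of the predicted property (kelvin for Curie temperature), while R^2 and F1 are dimensionless. Other matching criteria for entities, such as partial-overlap matching, would give different F1 values for the same predictions.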
