Agentic scientific visualization generation and caption semantic alignment

Abstract

In the era of data-intensive scientific discovery, visualization serves as a crucial cognitive tool for researchers, and captions that accurately match the visuals are essential for conveying scientific intent. However, current scientific visualization workflows face significant challenges, including high technical barriers and semantic misalignment among user intent, visual output, and textual descriptions. To address these issues, this paper proposes SciVis-AGE, a visual analytics system based on multi-agent collaboration. Its two core methods are an agent-based task decomposition and operator encapsulation approach for automatic visualization generation, and a multi-agent triangular debate mechanism for semantic alignment and caption optimization. The system reduces the technical burden on domain experts. Through iterative debate among three agents (Intent Guardian, Visual Verifier, and Annotation Checker), it ensures precise alignment of the generated images and captions with user intent, visual content, and highlighted features, thereby improving the rigor and efficiency of scientific communication.
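The abstract does not say how the triangular debate is implemented. The sketch below assumes one plausible control flow: each of the three critics checks the current caption against its own reference (the user's intent, what is actually visible, and the highlighted features) and returns structured feedback; if any critic objects, a reviser merges all objections into a new caption, and the loop repeats until all three approve or a round budget runs out. Everything here is hypothetical: the `Verdict` type, the set-based rules, the toy vocabulary, and the functions `triangular_debate` and `revise` are illustrative stand-ins for LLM-backed agents, not the SciVis-AGE implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

@dataclass
class Verdict:
    """One critic's judgment of a caption, with structured feedback (hypothetical)."""
    approved: bool
    add: Set[str] = field(default_factory=set)     # terms the caption should mention
    remove: Set[str] = field(default_factory=set)  # terms the caption must not claim

Critic = Callable[[str], Verdict]

def terms(caption: str) -> Set[str]:
    """Lowercased words of the caption with trailing punctuation stripped."""
    return {w.strip(".,;:").lower() for w in caption.split()}

def make_intent_guardian(intent_terms: Set[str]) -> Critic:
    # Hypothetical rule: every term of the user's intent must appear in the caption.
    def check(caption: str) -> Verdict:
        missing = intent_terms - terms(caption)
        return Verdict(not missing, add=missing)
    return check

def make_visual_verifier(vocabulary: Set[str], visible: Set[str]) -> Critic:
    # Hypothetical rule: any domain term the caption claims must be visible in the image.
    def check(caption: str) -> Verdict:
        unsupported = (terms(caption) & vocabulary) - visible
        return Verdict(not unsupported, remove=unsupported)
    return check

def make_annotation_checker(highlighted: Set[str]) -> Critic:
    # Hypothetical rule: every highlighted feature must be named in the caption.
    def check(caption: str) -> Verdict:
        missing = highlighted - terms(caption)
        return Verdict(not missing, add=missing)
    return check

def revise(caption: str, verdicts: List[Verdict]) -> str:
    # Stand-in for an LLM reviser: drop unsupported words, append missing terms.
    to_remove = set().union(*(v.remove for v in verdicts))
    to_add = set().union(*(v.add for v in verdicts))
    kept = [w for w in caption.split() if w.strip(".,;:").lower() not in to_remove]
    return " ".join(kept + sorted(to_add))

def triangular_debate(caption: str, critics: Dict[str, Critic], max_rounds: int = 3
                      ) -> Tuple[str, List[Dict[str, bool]], bool]:
    """Revise until all critics approve or max_rounds revisions are used up."""
    history: List[Dict[str, bool]] = []
    for round_idx in range(max_rounds + 1):
        # Each critic judges the same caption independently.
        verdicts = {name: critic(caption) for name, critic in critics.items()}
        history.append({name: v.approved for name, v in verdicts.items()})
        if all(v.approved for v in verdicts.values()):
            return caption, history, True   # consensus reached
        if round_idx == max_rounds:
            break                           # revision budget exhausted
        caption = revise(caption, list(verdicts.values()))
    return caption, history, False

# Toy scene: the image shows a vortex core in a velocity field, but no shock.
vocabulary = {"vortex", "core", "shock", "velocity"}
critics = {
    "Intent Guardian": make_intent_guardian({"vortex"}),
    "Visual Verifier": make_visual_verifier(vocabulary, {"vortex", "core", "velocity"}),
    "Annotation Checker": make_annotation_checker({"core"}),
}
final, history, ok = triangular_debate("Velocity field showing a shock", critics)
print(final, ok)  # Velocity field showing a core vortex True
```

In this toy run all three critics reject the initial caption (the intent term "vortex" is missing, "shock" is not visible, and the highlighted "core" is not mentioned), and a single merged revision satisfies all three. A real system would replace the set-based rules with model calls and free-text feedback; the part the sketch illustrates is the control flow of independent critiques followed by a joint revision.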

Keywords

scientific visualization; image-caption consistency; semantic alignment; multi-agent systems


