Abstract
As Artificial Intelligence (AI) systems and data-driven tools become integral to governmental decision-making, the ability to interpret and reason with visual information emerges as a critical competence for operating effectively in AI-mediated analytical environments. However, empirical evidence on the level of data visualization literacy within public administrations remains limited. To address this gap, the study provides a large-scale, diagnostic, and descriptive analysis of Data Visualization Literacy (DVL) performance in a real public organizational setting, using a standardized assessment instrument. A cross-sectional survey of 1,219 public employees was conducted using a bilingual Spanish–Valencian adaptation of the Mini-VLAT (12 items; 25 seconds per item), evaluating participants’ capacity to interpret, analyze, and reason with graphical representations of data. Mean performance reached 57.8% correct, with 27.1% omissions and 15.1% errors. Tasks involving proportional or relational reasoning—particularly stacked charts—produced the lowest accuracy and the highest nonresponse. Performance patterns were consistent: accuracy declined with age, improved with higher educational attainment, and varied across departments. Omissions under time pressure, rather than misinterpretation, were the predominant failure mode. The findings underscore the importance of treating DVL as part of the institutional infrastructure, through periodic diagnostics, shared graphic-interoperability standards, targeted domain training under time constraints, and longitudinal monitoring to preserve epistemic control while harnessing AI’s speed and scale.
1. Introduction
In an information ecosystem increasingly mediated by data, the ability to correctly interpret visualizations has become indispensable for civic engagement, evidence-based decision-making, and participation in public debates. This competence, known as Data Visualization Literacy (DVL), has been defined as “the ability and skill to read and interpret visually represented data and extract information from them”. 1 Similarly, it can be understood as the capacity to skillfully use data visualizations to formulate visual queries based on analytical questions and, in turn, interpret graphic patterns as meaningful expressions of the represented phenomenon. 2 As visualizations become ubiquitous tools across media, science, business, education, and social networks, their accurate interpretation is no longer an optional technical skill but an essential form of literacy for the twenty-first century.
The proliferation of data visualization technologies has highlighted that many individuals still lack the skills needed to decode even basic representations and that this difficulty increases drastically with complex or interactive visualizations. These limitations not only reduce the effectiveness of data communication itself but also heighten vulnerability to visual manipulation, cognitive biases, and misinformation. 3 This challenge is intensified by the increasingly common contexts of visual misinformation and information overload—phenomena that have been described as part of a contemporary infodemic—in which literacy in graphical data becomes a crucial competence for distinguishing between legitimate and misleading representations. 4
Moreover, DVL cannot be detached from recent sociotechnical changes. The widespread development and accessibility of interactive technologies (from visualizations embedded in digital media to personalized dashboards in professional contexts) have expanded both the reach and scope of visual data representations. However, as explored by Costello, 5 this expansion has not necessarily been accompanied by an adequate understanding of the principles underlying visual interpretation, often giving rise to what could be described as an illusion of graphical comprehension.
Over the past decade, several formal instruments have been developed to assess DVL, such as the Visualization Literacy Assessment Test (VLAT), 1 the Critical Thinking Assessment for Literacy in Visualizations (CALVI) 6 —including their adaptive versions 7 —and the abbreviated Mini-VLAT. 8 Nevertheless, significant gaps remain. Creamer et al. 9 highlight that current definitions of DVL are conceptually fragmented, that existing assessments tend to focus on high-level tasks without decomposing the underlying cognitive processes, and that population diversity in related studies remains limited. Specifically, although DVL has been extensively studied among university students,10–12 knowledge of data visualization skills in public and private organizations and among individual professionals remains limited, despite its importance for artificial intelligence (AI) adoption. This paper contributes to addressing this gap by presenting a large-scale, diagnostic, and descriptive analysis of DVL performance in a real, public organizational setting. The lack of empirical evidence on DVL levels in public administrations and its relevance for AI adoption motivates the diagnostic character of this study.
The deployment of AI systems in organizational environments, both public and private, has further accelerated the integration of visual interfaces into analytical and decision-making processes. Examples include automatically generated charts from AI-based analytics systems, predictive visualizations, and explainability dashboards. 13 In this sense, DVL represents the cognitive link between algorithmic models and human decisions. Assessing this competence is key to evaluating the cognitive readiness of public institutions for AI adoption and to guiding training and visual design strategies that strengthen explainability and trust in intelligent governmental systems.
Within this framework, the present study analyzes the level of DVL competence in a sample of 1,219 public employees from a Spanish regional administration, using a bilingual (Spanish–Valencian) and context-adapted version of the Mini-VLAT (12 items; 25 seconds per item). The resulting diagnosis provides a measure of the current cognitive capacity of public sector staff to interpret visual outputs derived from AI systems increasingly embedded in institutional decision-making processes, and it supports action lines aimed at fostering a more effective, transparent, and accountable public administration. By examining how public-sector employees interpret and reason with visual data, the study provides empirical evidence on current DVL levels, situating its findings within the broader context of the growing adoption of analytical and AI-based systems, without assuming or modeling causal relationships between DVL and AI adoption.
The remainder of the article is structured as follows. Section 2 develops the theoretical framework and reviews instruments for assessing DVL, with attention to AI-mediated visualization. Section 3 details the study design, the bilingual adaptation of the Mini-VLAT, the digital administration, variables, and statistical procedures. Section 4 reports the results: overall performance and stratified analyses, including Tukey HSD post-hoc tests. Section 5 discusses implications for AI adoption in public administration and offers concluding remarks.
2. Theoretical framework
DVL has emerged as an interdisciplinary construct situated at the intersection of information visualization, education, cognitive psychology, and corporate analytics. Although conceptually rooted in broader discussions of information and statistical literacy, its specific formulation responds to the growing need to interpret visual data within contemporary digital contexts.
2.1. From visual literacy to data visualization literacy
Visual literacy has traditionally been defined as the ability to interpret, evaluate, and produce visual messages, 14 grounded in semiotics and a general understanding of visual language. 15 This framework has predominated in fields such as art education, visual communication, and media studies. However, it has proven insufficient to address the cognitive and technical challenges posed by data visualizations, prompting the development of more specialized models.
DVL differs from visual literacy in at least two essential aspects. First, it focuses specifically on data visualizations; second, it requires not only visual interpretation but also an understanding of statistical structures, graphical encodings, and informational contexts. In this sense, Boy et al. 2 introduce and define DVL as a competence distinct from visual literacy. While visual literacy concerns the general interpretation of images or symbols, DVL emphasizes the graphical encoding of structured data and its use in analytical reasoning. Moreover, from a complementary perspective, Börner et al. 16 propose that this literacy involves both the interpretation and production of visualizations, in a relationship analogous to reading and writing. This dual competence highlights the expressive as well as the interpretative dimensions of visualization literacy.
Extending beyond individual skills, Hedayati et al. 17 suggest a three-dimensional structure encompassing specific cognitive abilities, inferential reading and analysis processes, and situated practices of visualization use in real contexts. This approach broadens functionalist conceptions of the construct. Similarly, Firat 18 proposes a hierarchical model ranging from basic perceptual decoding to higher levels of critical inference, consistent with perspectives such as those of Locoro et al. 19 These authors argue that visualization literacy cannot be reduced to a set of isolated technical skills; it must also encompass advanced reasoning models, contextual applications, cognitive structures, and developmental trajectories.
The distinction between DVL and related constructs such as data literacy or information literacy also warrants clarification. While these literacies partially overlap in practices and objectives, DVL specifically focuses on graphical encoding as a transversal competence. It intersects with statistical and digital literacy yet retains a distinct epistemology and pedagogy. As Raffaghelli 20 notes, it should be accompanied by specific educational frameworks that promote graphical reading and its critical and ethical comprehension.
Recently, the construct of visual encoding ability has also emerged, defined as the capacity to select optimal visual encodings when designing representations, thus extending beyond interpretation alone. Ge et al. 21 introduce AVEC, the first psychometrically validated test assessing this specific skill. Integrating this concept completes the “reading–building” continuum required by contemporary educational frameworks.
2.2. Assessing data visualization literacy
Assessing DVL entails designing instruments or methods capable of reliably measuring the extent to which individuals can interpret, use, and, in some cases, critically evaluate data visualizations.
One of the pioneering tools in this domain is the Graph Literacy Scale by Galesic and García-Retamero, 22 designed to measure comprehension of statistical graphs in health-related contexts. Although psychometrically robust, its functional focus limits it to simple data-extraction tasks, without capturing inferential dimensions. The Subjective Graph Literacy Scale (SGL) by García-Retamero et al. 23 is a brief five-item self-report completed in under one minute. It exhibits high reliability, convergent validity with the objective scale, and predictive accuracy in risk-interpretation tasks, while reducing respondent anxiety; it is best regarded as a complement to, rather than a substitute for, the objective scale.
A seminal contribution is offered by Boy et al., 2 who present a set of visualization literacy tasks using standard chart types (lines, bars, scatterplots) and establish methodological foundations for their validation through Item Response Theory (IRT). Their work introduces a key distinction between perceptual efficiency in design and genuine comprehension by users, showing that apparently clear visualizations may induce only superficial understanding. Factors such as stimulus complexity, congruence between question and chart, and distractor presence are also incorporated, and their significant impact on performance acknowledged.
This approach inaugurates a more rigorous, sensitive, and adaptive tradition of assessment, subsequently extended by Lee et al., 1 Ge et al., 6 and Cui et al. 7 While Boy et al.’s 2 proposal is conceptually seminal, it should not be regarded as a standardized test per se, as it is an experimentally grounded preliminary framework with limited items and partial psychometric validation.
A major step toward standardization is the development of the Visualization Literacy Assessment Test (VLAT). 1 VLAT comprises 53 items derived from a cross-matrix of 12 chart types and 8 cognitive tasks, validated using three-parameter logistic IRT models. It remains the first psychometrically validated instrument to assess visualization competence across diverse contexts.
The 21-item preliminary scale developed by Locoro et al. 19 offers a complementary formalization of visual information literacy, grounded in theoretical review and cognitive interviews. Their study employs Rasch analysis to examine construct dimensionality and psychometric properties, focusing on structural validity. Although not a fully standardized or widely applied instrument, it contributes to theoretical consolidation and the precise definition of evaluable dimensions.
The Mini-VLAT, introduced by Pandey and Ottley, 8 is a 12-item short form of the VLAT that demonstrates a high correlation with the original test and acceptable reliability (ω = 0.72). Despite covering fewer chart types and cognitive operations, it has been widely used in large-scale studies.
CALVI (Critical Thinking Assessment for Literacy in Visualizations), designed by Ge et al., 6 is an innovative approach that focuses on users’ analytical ability to identify misleading visualizations through subtle manipulations of scales, proportions, encodings, or labels. This orientation toward critical literacy aligns with growing concerns regarding visual misinformation.
Building on CALVI and VLAT, Cui et al. 7 introduce adaptive computerized versions (A-VLAT and A-CALVI) using algorithms that dynamically adjust item difficulty and reduce test length without compromising reliability—representing a notable methodological advance combining psychometric precision with operational efficiency. In contrast, MAVIL offers a multidimensional evaluation framework integrating factors such as aesthetic perception, familiarity with graphical components, visual criticality, and contextual numeracy—offering a holistic approach, though still empirically limited in application. 24
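To make the adaptive principle concrete, the sketch below illustrates maximum-information item selection under a three-parameter logistic (3PL) model, the general mechanism behind computerized adaptive testing. It is a minimal didactic reconstruction, not the published A-VLAT/A-CALVI implementation; all function names, parameter values, and the item bank are illustrative.

```python
import math

def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    # 3PL probability of a correct response at ability theta,
    # with discrimination a, difficulty b, and guessing parameter c.
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float, c: float) -> float:
    # Fisher information of a 3PL item at ability theta.
    p = p_3pl(theta, a, b, c)
    return (a ** 2) * ((p - c) ** 2 / (1 - c) ** 2) * ((1 - p) / p)

def next_item(theta: float, item_bank: dict, answered: set) -> str:
    # Select the unanswered item that is most informative at the
    # current ability estimate (the core of adaptive test shortening).
    candidates = {item: item_information(theta, *params)
                  for item, params in item_bank.items() if item not in answered}
    return max(candidates, key=candidates.get)

# Illustrative item bank: {id: (a, b, c)}.
bank = {"Q1": (1.2, -0.5, 0.20), "Q2": (0.8, 0.7, 0.25), "Q3": (1.5, 0.1, 0.20)}
print(next_item(theta=0.0, item_bank=bank, answered={"Q1"}))
```

Repeating this selection after each response, and re-estimating θ as answers accumulate, is what allows adaptive versions to shorten the test without a comparable loss of measurement precision.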
A more experimental line combines traditional assessment with neurophysiological techniques. Yim et al. 25 employ the VLAT within an electroencephalography (EEG) study to estimate participants’ cognitive load during visualization tasks. Their convolutional neural network model reveals that physiological metrics capture cognitive effort not reflected by classical indices such as item difficulty or response time. However, as it relies on existing VLAT items, this approach does not constitute an independent test.
From a different perspective, Rodrigues et al. 26 assess not participants’ answers but the questions they formulate when interacting with visualizations. This inductive strategy provides insights into underlying cognitive frameworks and reasoning processes.
Despite these advances, several limitations persist. Samples often remain homogeneous (frequently recruited via platforms such as MTurk or Prolific) and connections between assessment and pedagogy are weak. Moreover, Cabouat et al. 27 question the assumed neutrality of the graphics used in testing, arguing that legibility and perceptual complexity may influence results. Consequently, graphical design itself should be treated as an evaluative variable, not merely a neutral medium.
2.3. Visual production in the era of generative artificial intelligence
The advent of Generative Artificial Intelligence (GenAI) has structurally transformed the creation, communication, and comprehension of visual data. In visualization, this represents an evolution from rule-based algorithmic approaches toward models capable of generating, adapting, and evaluating visualizations from data and linguistic descriptions, learning from examples rather than explicit encoding principles. This shift redefines visualization production from manual programming of charts to automated, context-aware generation mediated by large language–vision models (LLMs and VLMs).
Technically, AI-assisted visual production operates through four main stages: 28 (1) data enhancement, where generative methods complete, synthesize, or disaggregate information; (2) automatic visual mapping, in which models produce graphical specifications from natural-language queries; (3) stylization, adapting visual appearance to aesthetic or communicative criteria; and (4) interaction, introducing conversational and visual interfaces that dynamically adjust or explain visualizations.
These stages mark the transition from visualization as a static product to visualization as an adaptive, multimodal process in which users interact with machines through both natural language and graphical elements.
Recent advances in VLMs (such as GPT-4o, Gemini, and ChartGemma) enable not only code generation but also chart interpretation and reasoning. Dong and Crisan 29 demonstrate that these models perform chart question-answering while exhibiting spatial and semantic reasoning, approaching human visual literacy. GenAI thus internalizes cognitive rules of visual interpretation, expanding beyond mere production capabilities.
Experimental tools such as ChartGPT and VisEval illustrate the ability of LLMs to transform natural-language descriptions into executable visualizations, achieving higher accuracy than traditional NL2VIS (Natural Language to Visualization) methods.30,31 However, these systems still face challenges related to attribute ambiguity, syntactic errors in generated specifications, and issues of legibility and visual coherence. Automation does not inherently guarantee perceptual quality or communicative adequacy.
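The general NL2VIS pattern can be sketched as follows. This is an illustrative skeleton of the approach, not the API of ChartGPT or VisEval: `ask_llm` is a hypothetical callable standing in for any LLM completion endpoint, and the prompt format is an assumption. The error branch reflects the syntactic-failure mode noted above.

```python
import json

def nl2vis(question: str, columns: dict, ask_llm) -> dict:
    # Build a prompt asking the model for a machine-readable chart
    # specification (here, Vega-Lite JSON) from a natural-language query.
    prompt = (
        "Return ONLY a valid Vega-Lite JSON specification.\n"
        f"Dataset columns and types: {json.dumps(columns)}\n"
        f"Analytical question: {question}"
    )
    raw = ask_llm(prompt)
    try:
        spec = json.loads(raw)
    except json.JSONDecodeError as err:
        # Syntactic errors in generated specifications are a known
        # failure mode of current NL2VIS systems.
        raise ValueError("Model returned a malformed specification") from err
    return spec
```

Even when the specification parses, attribute ambiguity (e.g., which column maps to which axis) and perceptual quality remain open problems, which is why the generated chart still requires human review.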
Epistemologically, generative visual production blurs the boundary between reading and authorship. As Li 32 argues, designers and analysts are shifting from technical executors to curators of the dialogue between model and data, where creativity lies in formulating precise prompts, evaluating outputs, and adjusting styles. Consequently, visual competence evolves into an augmented literacy, combining traditional graphic reading skills with the ability to guide generative models, verify their validity, and contextualize their outputs. DVL thus becomes both interpretive and metacognitive: the ability to read and to instruct a generative intelligence.
Recent studies confirm that integrating visualization and GenAI improves decision-making processes. Neri et al. 33 show that AI-assisted decision environments incorporating generative dashboards or conversational visual agents enhance comprehension and reduce cognitive load, particularly when adapted to users’ cognitive styles and supported by explanatory functions. In public administration, AI-generated visualizations may increasingly act as cognitive mediators, translating algorithmic reasoning into representations understandable to decision-makers and analysts, thereby strengthening trust in automated management and decision processes. Nevertheless, research on human–AI collaboration in visual narrative reveals that users favor balanced co-production over full automation, valuing interpretive control and transparency in generative processes. 34
Finally, the rise of generative visual production poses significant ethical and epistemological challenges. Scholars emphasize the urgent need for regulatory frameworks ensuring transparency, traceability, and fairness in AI-generated visualizations, particularly when used in governance or accountability contexts. 35
3. Methodology
3.1. Design and participants
A cross-sectional observational study was conducted using an online survey administered to employees of the Valencian regional government (Generalitat Valenciana), with a total of 1,219 responses collected from all regional departments (conselleries). Participation was voluntary and anonymous, and informed consent was presented on the first screen. Data were processed in compliance with Regulation (EU) 2016/679 (General Data Protection Regulation; GDPR) and national data-protection law, and no personal identifiers were recorded. The questionnaire and procedures were reviewed by an ethics committee to ensure scientific integrity and data confidentiality. A favorable report was obtained from the Human Research Ethics Committee of our University.
Collected variables included year of birth, gender, highest educational attainment, civil service level, employing department (conselleria), self-reported familiarity with visualizations, and declared color-vision deficiency. All items allowed a “prefer not to answer” option. Stratifications for analysis were derived from age, educational attainment, civil service level, and department.
3.2. Instrument
3.2.1. Conceptual basis
A bilingual (Spanish–Valencian) adaptation of the Mini-VLAT, the abbreviated and psychometrically validated version of the Visualization Literacy Assessment Test (VLAT), was implemented. The instrument comprises 12 multiple-choice items (one correct answer plus an “I don’t know” option), assessing the ability to read and interpret canonical visualizations and perform basic graphical inferences.
3.2.2. Adaptation process
Items were forward-translated into Spanish and Valencian, ensuring semantic and functional equivalence. Culture-specific references were replaced with neutral, local alternatives (e.g., a map of Spain by province instead of the US by state).
3.3. Digital administration environment
The questionnaire was implemented in LimeSurvey (institutional license) and distributed via corporate email. The interface included an initial instruction screen specifying the per-item time limit, the diagnostic purpose of the test, and the anonymity of participation, followed by a block of socio-demographic and work-related questions. Respondents then completed the 12 Mini-VLAT items, each with a maximum of 25 seconds; when the time limit expired, the system advanced automatically to the next item. The system recorded response time, selected option, and omissions after the time limit, making the time limit a structural element of the research design.
3.4. Variables, coding, and analysis
The primary outcome is each participant’s “percent correct”, computed as the proportion of correct responses out of 12 items. Complementary indicators are “percent incorrect” and “percent omitted”, computed at both the individual and item levels to examine response patterns and relative item difficulty.
Records with irregular connection times were excluded. A minimum participation threshold of ≥2 answered items (out of 12) was set to preserve the validity of comparisons; records below this threshold were treated as early withdrawals or invalid sessions. Nonresponses, whether due to time expiration or explicit selection of the “No response” option, were coded as omissions. Because each item was administered under a fixed 25-second limit, omissions and accuracy should be interpreted as performance under temporal constraint rather than as untimed visualization competence per se.
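As a concrete illustration of this coding scheme, the following sketch computes the individual-level percentages and the item-level response pattern. It assumes a long-format pandas DataFrame with hypothetical columns `participant`, `item`, and `status` (coded "correct", "incorrect", or "omitted", the latter covering both time expiration and explicit "No response"); it is a minimal reconstruction, not the authors' actual pipeline.

```python
import pandas as pd

N_ITEMS = 12
MIN_ANSWERED = 2  # participation threshold: >=2 answered items

def score_participants(responses: pd.DataFrame) -> pd.DataFrame:
    # Count each participant's correct, incorrect, and omitted items.
    counts = (responses.pivot_table(index="participant", columns="status",
                                    aggfunc="size", fill_value=0)
                       .reindex(columns=["correct", "incorrect", "omitted"],
                                fill_value=0))
    # Percentages are computed over all 12 administered items.
    scores = counts.div(N_ITEMS).mul(100)
    scores.columns = ["pct_correct", "pct_incorrect", "pct_omitted"]
    # Records below the threshold are treated as invalid sessions.
    answered = counts["correct"] + counts["incorrect"]
    return scores[answered >= MIN_ANSWERED]

def item_difficulty(responses: pd.DataFrame) -> pd.DataFrame:
    # Share of correct, incorrect, and omitted responses per item (%),
    # used to examine relative item difficulty.
    return (responses.groupby("item")["status"]
                     .value_counts(normalize=True)
                     .unstack(fill_value=0).mul(100))
```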
Data analysis relies on one-way analysis of variance (ANOVA), applied separately to each key categorical variable. A complete-case approach is applied, including only participants with valid data for both the stratifying variable and the Mini-VLAT score in each contrast. As several socio-demographic questions were optional, effective sample sizes vary across analyses.
Given the diagnostic and exploratory purpose of the study, this approach is intended to provide a transparent descriptive map of group differences across institutional strata rather than to estimate mutually adjusted effects. Accordingly, the reported contrasts should be interpreted as bivariate patterns that may reflect correlated socio-demographic and organizational factors; future work is needed to model interactions and potential confounding explicitly. When the omnibus ANOVA test is significant, post-hoc multiple comparisons (Tukey’s HSD) are used to identify pairs of groups with statistically significant differences. Homoscedasticity and normality of residuals were assessed prior to inference, confirming the suitability of ANOVA tests given their robustness to mild deviations in large samples. Statistical significance was set at p < 0.05.
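A minimal sketch of this inferential sequence is shown below, building on the participant-level scores from the previous sketch. Function and column names are illustrative, not the study's actual code.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

def anova_with_posthoc(df: pd.DataFrame, outcome: str, factor: str,
                       alpha: float = 0.05):
    # Complete-case approach: keep rows with valid outcome and stratifier.
    data = df[[outcome, factor]].dropna()
    groups = [g[outcome].to_numpy() for _, g in data.groupby(factor)]

    # Assumption checks prior to inference: homoscedasticity (Levene)
    # and normality of residuals (Shapiro-Wilk on group-centered values).
    _, p_levene = stats.levene(*groups)
    resid = np.concatenate([g - g.mean() for g in groups])
    _, p_shapiro = stats.shapiro(resid)

    # Omnibus one-way ANOVA across the categorical variable.
    f_stat, p_anova = stats.f_oneway(*groups)

    # Tukey HSD pairwise comparisons only when the omnibus test is significant.
    tukey = None
    if p_anova < alpha:
        tukey = pairwise_tukeyhsd(data[outcome], data[factor], alpha=alpha)
    return f_stat, p_anova, p_levene, p_shapiro, tukey

# Example call on hypothetical columns:
# f, p, p_lev, p_sw, tukey = anova_with_posthoc(scores, "pct_correct", "age_group")
```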
For graphical reporting, heatmaps are produced to display significant differences from post-hoc contrasts, using a diverging color scale (blue for higher performance, red for lower), with intensity proportional to the difference in percentage points of correct responses (pp).
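The sketch below illustrates one way such a heatmap could be produced with matplotlib. For simplicity it plots all pairwise differences rather than masking non-significant contrasts, and the group labels and means are illustrative, not observed values; the "RdBu" diverging colormap maps positive differences (higher performance) to blue and negative ones to red, as in the paper's figures.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_difference_heatmap(groups: list, mean_scores: np.ndarray):
    # Matrix of row-minus-column differences in percent correct (pp).
    diffs = np.subtract.outer(mean_scores, mean_scores)
    lim = np.abs(diffs).max()  # symmetric scale centered at zero
    fig, ax = plt.subplots()
    im = ax.imshow(diffs, cmap="RdBu", vmin=-lim, vmax=lim)
    ax.set_xticks(range(len(groups)), labels=groups, rotation=45, ha="right")
    ax.set_yticks(range(len(groups)), labels=groups)
    fig.colorbar(im, ax=ax, label="Difference in % correct (pp)")
    return fig

# Illustrative (not observed) group means:
fig = plot_difference_heatmap(["18-35", "36-50", "51-65"],
                              np.array([62.0, 58.5, 51.0]))
```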
4. Results
4.1. Overall performance and item difficulty pattern
Table 1. Item-level results, visualization type, and response format in the adapted Mini-VLAT. Source: Compiled by the authors.
Accuracy levels vary substantially depending on cognitive task type and graphical format. A clear asymmetry emerges between the reading of explicit magnitudes and proportional inference (see Table 1). Items focused on literal reading or point-value extraction (Q8, Q9, and Q4) yield the highest accuracy rates (81.79%, 77.19%, and 72.60%, respectively), showing strong competence in direct value identification. In contrast, items requiring proportional comparison or relational reasoning (Q7 and Q11, both stacked charts) exhibit extremely low accuracy (5.17% and 16.98%) and high omission rates (53.16% and 66.12%).
4.2. Performance distribution by structural variables
Performance on the Mini-VLAT reveals a stable architecture of differences structured along three axes: educational attainment, occupational level, and practical experience with visualizations. Additionally, departmental heterogeneity reflects differentiated data-use cultures, and the balance between errors and omissions provides diagnostic insight in itself. Across all main comparisons (age, education, administrative level, and department), one-way ANOVA tests are significant (p < 0.05), and Tukey HSD post-hoc tests confirm the expected gaps.
4.2.1 Age group
Table 2. Mean accuracy scores by age group. Source: Compiled by the authors.
These results contrast with recent comparative studies in which tasks have no time limit and similar accuracy is observed across age groups, underscoring the relevance of the time limit in our research design. In a study comprising ten low-level analysis tasks and five basic visualizations, While and Sarvghad 36 find that, although accuracy levels were similar, adults aged 60+ required, on average, more time than younger adults to complete visual tasks, particularly those involving distributions and correlations. Similarly, Felberbaum et al. 37 report that even among individuals aged 70–85, accuracy in visual reading and inference tasks equals or exceeds that of younger adults (20–40), although response times are significantly longer.
In the context of the Mini-VLAT, where each item has a strict 25-second limit, this temporal asymmetry is critical. Older participants may not have sufficient time to apply slower yet systematic processing strategies. The elevated omission rate, rather than error rate, should therefore be interpreted as the combined effect of slower visual exploration and a time restriction that penalizes such cognitive latency, rather than as a genuine deficit in literacy.
The one-way ANOVA on accuracy across age groups is significant (p < 0.05), with Tukey HSD post-hoc tests identifying the pairwise gaps shown in Figure 1.
Figure 1. Significant differences across age groups (Tukey HSD post-hoc tests). Mean differences in accuracy rates (percentage points); color intensity (with blue for higher performance and red for lower) encodes effect size and direction.
However, this gradient should not be attributed solely to educational background. It reflects the interaction between cohort and cognitive pacing effects. Individuals who entered digital visual culture later in life may retain strong interpretative skills but operate with different temporal economies. In rapid-response assessments such as the Mini-VLAT, these slower, more deliberate strategies are disproportionately penalized.
4.2.2 Educational attainment
Table 3. Mean accuracy scores by educational attainment. Source: Compiled by the authors.
The one-way ANOVA by education level is also significant (p < 0.05), with pairwise contrasts shown in Figure 2.
Figure 2. Significant differences across education levels (Tukey HSD post-hoc test), with acronyms in Table 3. Mean differences in accuracy rates (percentage points); color intensity (with blue for higher performance and red for lower) encodes effect size and direction.
This educational gradient aligns with large-scale evidence on quantitative literacy. In a cross-national analysis of more than 20 education systems, Park and Kyei 38 show that educational attainment is the most consistent predictor of adults’ ability to understand and use numerical and graphical information. Specifically, based on microdata from the Adult Literacy and Life Skills Survey and the International Adult Literacy Survey, they find that gaps between higher-education graduates and those without secondary education reach several tens of percentage points, even after controlling for socioeconomic factors.
4.2.3 Administrative level
Administrative levels reflect the hierarchical structure of the regional civil service. Levels A1 and A2 correspond to higher-qualification, higher-responsibility tasks (technical–managerial functions), while C1 and C2 correspond to roles with lower responsibilities and primarily administrative support functions. Education access thresholds, established by national legislation, determine entry to the different levels. 39 Using the education-level nomenclature introduced above, A1 and A2 require ≥ E6; C1 requires at least E3; C2 requires a minimum of E2, with E1 falling below the entry threshold.
Table 4. Mean accuracy scores by administrative level. Source: Compiled by the authors.
The one-way ANOVA by administrative level is significant (p < 0.05), with pairwise contrasts shown in Figure 3.
Figure 3. Significant differences across administrative levels (Tukey HSD post-hoc test), with acronyms in Table 4. Mean differences in accuracy rates (percentage points); color intensity (with blue for higher performance and red for lower) encodes effect size and direction.
These results can be interpreted within the framework of organizational data literacy. As Ongena 40 conceptualizes, data literacy operates as a third-order competence encompassing five domains—identification, comprehension, use, communication, and reflexivity—whose main effects manifest in organizational performance (efficiency, effectiveness, and equity).
Accordingly, these differences reflect varying levels of organizational data maturity. Higher-ranking bodies (A1/A2), with greater decision-making responsibility and exposure to analytical environments, emphasize data understanding and data communication. Intermediate and support groups (C1/C2) exhibit a more instrumental and reactive use of data. Self-reported familiarity with visualizations supports this pattern: participants who had previously created visualizations (n = 97) achieved a mean accuracy of 61.86%, compared with 54.28% among those somewhat familiar (n = 384) and 51.91% among those with no prior experience (n = 97).
4.2.4 Departmental domain
Table 5. Mean accuracy scores by administrative department. Source: Compiled by the authors.
ᵃSample not interpretable for comparative purposes.
The one-way ANOVA on departmental means is also significant (p < 0.05), with Tukey HSD tests (see Figure 4) confirming that the observed gaps correspond to distinct departmental data cultures rather than random individual dispersion. This finding aligns with the data readiness model proposed by Klievink et al., 41 which demonstrates substantial differences among public organizations in their preparedness to work with data, depending on administrative function and internal culture.
Figure 4. Significant differences across departments (Tukey HSD post-hoc test), with acronyms in Table 5. Mean differences in accuracy rates (percentage points); color intensity (with blue for higher performance and red for lower) encodes effect size and direction.
Departments oriented toward policy planning and analysis (characterized by technical structures and evidence-based decision processes) achieve high levels of readiness across technical, organizational, and human dimensions. Conversely, units focused on administrative or normative functions remain in initial stages, characterized by limited systematization in data use and communication.
Each conselleria thus appears to have developed its own internal cognitive ecosystem. In those where visual analysis forms part of routine decision-making, interpretative competence has become naturally embedded. In others, more oriented toward normative or welfare management, data retains a peripheral role. This functional architecture of DVL ultimately reflects the diverse, stratified, and uneven information cultures that characterize public administration.
5. Discussion and conclusions
AI-mediated visual analysis environments impose a regime of immediacy in interpretation and action. In this context, contemporary Data Visualization Literacy (DVL) must operate under the accelerated pace of intelligent systems acting in milliseconds.
The present study highlights the relevance of this chronometric dimension. Under a 25-second limit per item, proportional–relational tasks in the Mini-VLAT produced the highest omission rates. This pattern should not be interpreted necessarily as a lack of visualization knowledge, but as a temporal mismatch between sequential cognitive processing habits and the rapid tempo imposed by digital interfaces. Generations trained in more analytic and linear modes of reasoning tend to suspend responses where the interface demands immediacy.
This dynamic may have operational implications. When uncertainty translates into nonresponse under time constraints, part of the interpretive work can be deferred to automated outputs that are visually coherent and readily actionable. 42 In such settings, visual polish is no longer merely a design virtue; it may be mistaken for evidential strength. Among less educated subgroups and text-oriented departments, the combination of high omission rates and time constraints increases the risk of passive acceptance of algorithmic results, as interpretive work is more easily deferred to already synthesized visual outputs. The empirical results also point to a specific vulnerability when graphics impose a high proportional-reasoning load under time pressure, manifested in high rates of nonresponse.
Accordingly, DVL training in AI-augmented public administrations would benefit from an explicit emphasis on graphical skepticism. Reading an algorithmically generated visualization involves questioning the underlying model, aggregation loss, color and ordering criteria, and sources of invisible uncertainty. Within time-constrained analytical environments, short, domain-specific training formats using real datasets and explicit precision–time targets appear particularly well suited to strengthening rapid-reading skills without undermining interpretive control. From this perspective, a minimal discipline of verification emerges as a key condition for preserving analytical agency amid increasingly persuasive interfaces.
Rather than framing the observed dynamic as an argument for slowing analytical systems, the findings highlight the importance of reinforcing human expertise within AI-assisted decision cycles, particularly through improved reading speed, the establishment of systematic validation routines, and the development of a shared culture of visual critique. From this perspective, these orientations are intended to support organizational responsiveness while preserving interpretive oversight under conditions of temporal constraint.
Differences across departments reveal distinct cognitive profiles. In data-intensive areas where dashboards and indicators are routine, graphical reading has become a professional habit; in text-based domains, visual evidence remains peripheral. In AI-mediated work, these contrasts would acquire heightened relevance: DVL operates as an organizational property, shaped by the availability of visual tools, dashboard frequency, and internal data-communication culture.
From an organizational perspective, these results foreground the relevance of cognitive interoperability. In interdepartmental decision-making contexts, a shared visual language can help ensure that alerts, thresholds, and trends carry comparable meaning across domains such as Finance and Social Services. A minimal framework—consistent scales and units, uncertainty coding, accessible color palettes, and standard annotation syntax—may reduce interpretive friction and coordination costs in time-compressed decision cycles. Without such alignment, differences in data interpretation risk accumulating across organizational boundaries.
DVL is also a question of cognitive equity. Groups or departments exhibiting higher error or omission rates in relational tasks may face disadvantages in participating fully in AI-supported analytic processes. Within the European DigComp 2.2 framework, graphical interpretation forms part of the core “Information and Data Literacy” area. 43 High-level comprehension of visualizations is therefore not a scholarly luxury but a prerequisite for transparency, accountability, and operational efficacy in public policy.
Ultimately, DVL represents a form of institutional cognitive capital. In an AI-enhanced administration, it underpins what the Organization for Economic Co-operation and Development (OECD) terms a digitally enabled state, one that learns, adapts, and communicates through data in real time. 44 Treating DVL as an element of institutional infrastructure may involve regular diagnostics of staff skills, the adoption of shared graphic-interoperability standards, function-specific training, and longitudinal monitoring of accuracy and omission patterns. AI contributes speed and scale; DVL provides the cognitive foundation and epistemic oversight that helps ensure formal clarity is not mistaken for truth.
Future research could extend this diagnostic baseline in three complementary directions. First, given the marked asymmetry between literal reading and proportional–relational inference under time constraints, subsequent work could test whether Mini-VLAT performance reflects a largely unidimensional skill gradient or a differentiated latent structure across item families and cognitive operations. Second, the prominence of omissions motivates treating nonresponse as an outcome in its own right—distinct from accuracy among responders—and exploring models that explicitly incorporate the per-item time limit as a performance constraint. Third, longitudinal or intervention-based designs could examine whether short, domain-specific training shifts the omission–accuracy balance in high-load items and whether complementary assessments of critical visualization judgment converge with the present diagnostic profile in AI-supported public administration contexts.
Acknowledgements
The authors wish to thank Institut Valencià d’Administració Pública (IVAP), and particularly the General Subdirector of the Dirección General de Función Pública, Javier Cuenca, for their invaluable support. They also thank Marie Hodkinson for her careful linguistic revision of the manuscript, and two anonymous reviewers for their valuable suggestions. The usual disclaimer applies.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research has been supported by Generalitat Valenciana, Conselleria de Educación, Cultura y Universidades [grant number CIAICO/2023/031], Generalitat Valenciana, Conselleria de Economía, Hacienda y Administración Pública [grant number HIECPU/2023/2], and the European Commission, Digital Europe Programme [grant number 101226207]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
