Abstract
This study presents a novel multi-modal approach for analyzing public sentiment towards interactive art across Jiangsu Province using social media data. By integrating computer vision and NLP techniques (fine-tuned Qwen-VL for image captioning and prompt-based LLMs for sentiment analysis), we capture nuanced digital representations of artworks and public reactions. Findings reveal complex spatial patterns that challenge traditional urban-rural dichotomies and highlight Jiangsu’s polycentric cultural innovation. Urban centers focus on technological aspects and critical discourse, while peripheral areas emphasize thematic content and audience engagement. Correlation analysis reveals relationships between socioeconomic factors and digital art engagement, reflecting cultural capital and digital divide theories. These patterns invite re-examination of China’s cultural development through post-reform urbanization theories, suggesting place-based policies that recognize diverse strengths across the urban-rural continuum. This research contributes to cultural democratization debates and offers a replicable framework for data-driven cultural policy-making.
Keywords
Introduction
China’s rapid urbanization and technological advancement have catalyzed a profound transformation in the cultural landscape, particularly in interactive art. This dynamic environment necessitates innovative approaches to understanding public engagement with and perception of interactive artworks, positioning Chinese cities at the forefront of global cultural studies and policy innovation. As urban populations grow and digital technologies proliferate, understanding public sentiment towards interactive art has become increasingly critical for sustainable cultural development and urban placemaking. Social media platforms, with their vast user bases and rich multi-modal content, offer a unique window into the collective experiences and opinions of art audiences (Yu, 2009). However, leveraging this data to gain meaningful insights into perceptions of interactive art across diverse urban and rural contexts remains a significant challenge (Wan & Li, 2024).
Previous research has explored social media data for cultural analysis, focusing on audience engagement patterns (Xia et al., 2025; Zheng et al., 2014), artistic trends (Cranshaw et al., 2012), and perceived cultural value (Dubey et al., 2016). While these studies have provided valuable insights, they often rely on single-modality data or focus on metropolitan-level analysis, overlooking the nuanced variations in perception across different urban and rural contexts. Moreover, the integration of visual and textual data from social media to comprehensively capture perceptions of interactive art remains underexplored, particularly in Chinese provinces with unique urban-rural dynamics.
To systematically analyze public sentiment towards interactive art, we develop a novel computational framework that integrates multi-modal deep learning with spatial econometrics. Our approach combines fine-tuned vision-language models for contextual image understanding, prompt-engineered Large Language Models (LLMs) for complex sentiment extraction, and advanced spatial statistical methods for socioeconomic pattern discovery. This methodological synthesis enables granular analysis of both visual and textual social media content while accounting for spatial autocorrelation and demographic heterogeneity. The framework incorporates robust validation protocols through expert-annotated ground truth data and cross-modal consistency checks, ensuring reliable insights into cultural perception patterns.
Our regional analysis reveals distinctive spatial gradients in interactive art engagement that challenge conventional urban-rural dichotomies. Core cities demonstrate sophisticated technological discourse but show unexpectedly low community engagement scores, while peripheral regions excel in participatory experiences despite limited infrastructure. These patterns suggest a more nuanced relationship between urbanization and cultural innovation than previously theorized, with implications for cultural policy design.
The socioeconomic determinants analysis uncovers complex non-linear relationships between development indicators and artistic engagement. Educational attainment shows strong threshold effects in technological appreciation (β = .82) until reaching tertiary enrollment rates of 68%, beyond which institutional factors dominate. Cultural facility density demonstrates unexpected negative correlations with certain engagement metrics, suggesting potential oversaturation effects. These findings indicate that cultural capital formation follows distinct pathways in rapidly developing regions (Lamont & Lareau, 1988), necessitating tailored policy approaches that recognize local strengths and limitations. The key contributions of this work are threefold:
A novel multi-modal framework that integrates computer vision, natural language processing, and spatial statistics to extract nuanced insights from social media representations of interactive art.
Evidence of “peripheral advantage” in community-oriented art discourse, revealing inverse relationships between urbanization and participatory cultural innovation.
Quantitative demonstration of cultural capital threshold effects across China’s urban-rural continuum, suggesting the need for differentiated policy approaches in transitional economies.
The remainder of this paper is structured as follows. Section “Related Works” examines emergent approaches in social media analytics and interactive art research. Section “Weibo Interactive Arts Dataset” details our Weibo-based dataset construction and validation protocols. Section “Methods” presents our methodological framework, integrating vision-language models with spatial econometrics. Sections “Regional Results,” “Art Perception Across Social Stratas,” and “Socioeconomic Determinants of Art Engagement” analyze the spatial distribution of interactive art engagement, examining regional variations, socioeconomic correlates, and cultural capital formation mechanisms.
Related Works
Computational Social Media Analysis
The analysis of social media data has emerged as a transformative paradigm for understanding cultural phenomena, enabling researchers to examine public engagement patterns at unprecedented scales (Gandhi et al., 2023). Early computational approaches focused primarily on text-based sentiment analysis, but recent advances have highlighted the need for multi-modal frameworks that can capture the rich interplay between visual and textual expressions in social media discourse.
The evolution of social media analytics has been marked by significant methodological innovations in demographic inference and representativeness. Wang et al. (2019) pioneered multilingual approaches for demographic attribute inference, demonstrating how post-stratification techniques can mitigate sampling biases in social media data. This work was further extended by Kumar and Singh (2022), who developed deep neural architectures for extracting geographical references from bilingual social media content. These advances have been particularly crucial for studying cultural phenomena across diverse linguistic and social contexts. Contemporary research has increasingly focused on the challenges of location inference and contextual understanding, with Lamsal et al. (2022) introducing frameworks for origin location identification that achieve notable accuracy across different geographical granularities. These developments have been complemented by innovations in personalized content moderation and the integration of social explanations into explainable AI systems, collectively advancing our ability to analyze complex social media phenomena while addressing crucial issues of bias and representativeness (Gong et al., 2024; Jhaver et al., 2023).
The field has recently witnessed a paradigm shift toward more nuanced approaches for analyzing information diffusion and user engagement patterns. Zhang et al. (2019) demonstrated the critical role of social media in disaster communication, highlighting the importance of understanding network dynamics and information flow patterns. This has been further elaborated by Meel and Vishwakarma (2020), who developed comprehensive frameworks for analyzing information pollution and content authenticity in social networks. The integration of these approaches with visual media analysis techniques, as demonstrated by Rogers (2021), has enabled researchers to develop a more sophisticated understanding of how visual content shapes online discourse and cultural perception. These developments have been particularly significant for studying ephemeral content and temporal dynamics in social media engagement, as exemplified by Villaespesa and Wowkowych’s (2020) analysis of museum-related social media stories.
Interactive Arts and Generative Systems
Interactive art research has evolved through three distinct phases: technological experimentation, participatory design, and socio-spatial analysis. Early studies focused on establishing fundamental frameworks for understanding embodied interaction and audience participation levels (Kohtala et al., 2020). Contemporary research examines the integration of generative systems within interactive installations, revealing complex relationships between algorithmic creativity and human engagement (Epstein et al., 2023). These developments have catalyzed new theoretical frameworks for understanding how interactive systems mediate urban experiences and reshape cultural spaces (Brinkmann et al., 2023), particularly in the context of rapidly evolving digital landscapes.
Recent advances in generative AI have fundamentally transformed the interactive art landscape, introducing novel paradigms for creative expression and audience engagement. Studies have documented enhanced individual creativity through AI-augmented interactive installations (Doshi & Hauser, 2024), while simultaneously raising questions about collective novelty and cultural homogenization. This tension manifests particularly in public reception, where bias against AI-generated elements can paradoxically enhance perceived human creativity (Horton et al., 2023). The emergence of “machine culture” has introduced new modes of cultural transmission and evolution (Brinkmann et al., 2023), leading to sophisticated theoretical frameworks for understanding human-AI creative collaboration (Hertzmann, 2025). These developments suggest a fundamental shift in how interactive art systems function as mediators of cultural experience (Hermann & Puntoni, 2024), necessitating new approaches to studying their impact on public space and collective meaning-making.
Weibo Interactive Arts Dataset
The dataset comprises 15,201 Weibo posts specifically focused on interactive art installations across Jiangsu Province, collected through systematic API queries and manual verification from January to December 2023. The data collection protocol employed rigorous filtering mechanisms, including location-based parameters, contextual relevance scoring, and expert validation, resulting in a high-fidelity corpus of interactive art documentation. Each post was annotated with standardized metadata including geographical coordinates, installation specifications, and interaction modalities. The temporal distribution ensures comprehensive coverage of both permanent installations and temporary exhibitions, while spatial sampling across 13 administrative regions maintains representative coverage of urban-rural variations. Figure 1 presents sample posts demonstrating the data quality, and Figure 2 analyzes the overall distribution patterns across interactive art categories.

Sample Weibo posts from different Jiangsu regions, showcasing the diversity of interactive art content and imagery.

Dataset statistics showing post types and content characteristics: (a) distribution of text-only and image-containing posts and (b) post length and images per post across categories.
Figure 1 presents ten representative examples of interactive art installations and their corresponding social media descriptions from different regions across Jiangsu Province, illustrating the diversity and sophistication of digital-physical artistic experiences. The installations demonstrate various interactive modalities: from Xuanwu, Nanjing’s immersive digital landscape that transforms traditional mountain-water paintings into dynamic multiverse experiences, to Binhu, Wuxi’s innovative “Smart Flower Landscape” that employs naked-eye 3D technology for participatory floral displays. Notable technological approaches include Suzhou’s sound-to-visual conversion system that translates drum rhythms into dynamic imagery, and Lianyun, Lianyungang’s ecosystem simulation that enables visitors to create virtual porpoises. The examples also showcase culturally rooted innovations, such as Guangling, Yangzhou’s dialect-based digital flower installation and Jingkou, Zhenjiang’s integration of traditional ink wash techniques with digital river mapping. The installations range from large-scale immersive environments (an interactive Song Dynasty exhibition in Haimen, Nantong) to intimate, AI-driven installations (a gesture-responsive system in Zhonglou, Changzhou). This diverse collection reflects the province’s integration of traditional cultural elements with cutting-edge interactive technologies, demonstrating the evolving landscape of public engagement with digital art.
Figure 2 presents a comprehensive analysis of the interactive art posts dataset. Audience Engagement emerges as the dominant category with 3,952 posts (26%), comprising 2,964 image-containing and 988 text-only posts. Artistic Techniques follows with 3,344 posts (22%), showing a notably high image presence (2,508 posts). Technological Platforms accounts for 3,040 posts (20%), with content length averaging 135 tokens and 1.9 images per post. Thematic Content represents 1,976 posts (13%), displaying the highest average post length of 160 tokens. Exhibition Contexts comprises 1,672 posts (11%), characterized by consistent image inclusion (1.7–2.1 images per post). Artist Profiles shows the smallest share with 1,217 posts (8%), yet demonstrates substantial text content averaging 140 tokens per post. Notably, image-containing posts dominate across all categories, constituting 75% of the total dataset, with post lengths ranging from 100 to 160 tokens.
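As a quick consistency check on the Figure 2 counts, the short Python sketch below recomputes the corpus total, each category's percentage share, and the image-containing ratio for the two categories whose image counts are stated in the text. It reuses only numbers given above; the dictionary and function names are illustrative.

```python
# Per-category post counts as reported for Figure 2.
CATEGORY_COUNTS = {
    "Audience Engagement": 3952,
    "Artistic Techniques": 3344,
    "Technological Platforms": 3040,
    "Thematic Content": 1976,
    "Exhibition Contexts": 1672,
    "Artist Profiles": 1217,
}
# Image-containing counts are stated in the text for these two categories only.
IMAGE_POSTS = {"Audience Engagement": 2964, "Artistic Techniques": 2508}


def category_shares(counts):
    """Return the corpus total and each category's share in whole percent."""
    total = sum(counts.values())
    return total, {name: round(100 * n / total) for name, n in counts.items()}


total, shares = category_shares(CATEGORY_COUNTS)
image_ratios = {name: IMAGE_POSTS[name] / CATEGORY_COUNTS[name] for name in IMAGE_POSTS}

print(total)  # 15201
print(shares)
print(image_ratios)
```

The six counts sum exactly to the 15,201 posts in the corpus, and both categories with reported image counts come out at 75% image-containing, matching the corpus-wide figure.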
This distribution reveals significant patterns in public engagement with interactive art on social media. The predominance of Audience Engagement and Artistic Techniques suggests a strong emphasis on participatory and technical aspects of interactive art, while the lower representation of Artist Profiles indicates less focus on creator-centric discourse. The consistent presence of images across categories, particularly in technique-focused posts, reflects the visual-centric nature of interactive art documentation. The varying post lengths across categories, with longer posts in Thematic Content and shorter ones in Exhibition Contexts, suggest that different aspects of interactive art call for different levels of descriptive depth.
Methods
This investigation employs a three-stage analytical pipeline to decode public engagement with interactive art across Jiangsu Province. The framework synthesizes computer vision for artwork interpretation, natural language processing for sentiment extraction, and spatial statistics for socioeconomic pattern discovery, enabling systematic examination of cultural perception formation mechanisms.
Multi-Modal Analysis Framework
The proposed framework, illustrated in Figure 3, integrates image and text analysis to provide a comprehensive understanding of public perceptions towards interactive art. This approach accommodates posts with both images and text, as well as text-only posts, ensuring a robust analysis of diverse social media content across Jiangsu’s regions.

Framework of the multi-modal interactive art analysis system linking regional development and digital perceptions of interactive art.
The first step involves topic classification, where image captions are generated for visual content of interactive artworks and combined with post text. This multi-modal input is then processed by a Large Language Model (LLM) using specific prompts to determine the post’s topic within the context of interactive art. For sentiment classification, we utilize only the post text as input to the LLM, yielding positive, negative, or neutral sentiments towards the artwork or experience. This text-centric approach for sentiment analysis ensures consistency across all post types and leverages the nuanced language understanding capabilities of LLMs.
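The input routing described above (captions plus post text for topic classification, post text alone for sentiment) can be sketched as follows. This is a minimal illustration assuming each post is held as its text plus one generated caption per image; `Post`, `topic_input`, and `sentiment_input` are hypothetical names, and the field labels are not the prompts actually used in the study.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    """Hypothetical container: post text plus one generated caption per image."""
    text: str
    captions: list[str] = field(default_factory=list)  # empty for text-only posts


def topic_input(post: Post) -> str:
    """Topic classification sees the post text combined with any image captions."""
    parts = [f"Post text: {post.text}"]
    for i, caption in enumerate(post.captions, start=1):
        parts.append(f"Image {i} caption: {caption}")
    return "\n".join(parts)


def sentiment_input(post: Post) -> str:
    """Sentiment classification deliberately uses the post text only."""
    return f"Post text: {post.text}"


post = Post("Loved the drum-to-light wall", ["visitors drumming in front of a projected wall"])
print(topic_input(post))
print(sentiment_input(post))
```

For a text-only post the two inputs coincide, which is what keeps the sentiment step uniform across post types.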
Attention analysis quantifies the frequency of posts for each interactive art topic across regions, visualized through GIS mapping to reveal spatial patterns of engagement with different aspects of interactive art. Similarly, sentiment analysis aggregates emotional valence towards topics by region, providing insights into spatial variations in perceptions of interactive art. These analyses offer a multifaceted view of how different aspects of interactive art are perceived and discussed across Jiangsu’s diverse urban-rural landscape.
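A minimal sketch of the two regional aggregations, assuming each classified post is reduced to a `(region, topic, sentiment)` tuple. Normalizing attention by each region's post volume is an assumption made here so that regions of different sizes are comparable; the text says only "frequency", so raw counts would also be consistent with it. The records below are toy values, not study data.

```python
from collections import Counter, defaultdict


def regional_attention(records):
    """Share of each region's posts falling in each topic (attention)."""
    by_region = defaultdict(Counter)
    for region, topic, _sentiment in records:
        by_region[region][topic] += 1
    result = {}
    for region, counts in by_region.items():
        total = sum(counts.values())
        result[region] = {topic: n / total for topic, n in counts.items()}
    return result


def regional_positive_share(records):
    """Fraction of positive posts for each (region, topic) pair (sentiment)."""
    totals, positives = Counter(), Counter()
    for region, topic, sentiment in records:
        totals[(region, topic)] += 1
        if sentiment == "positive":
            positives[(region, topic)] += 1
    return {key: positives[key] / n for key, n in totals.items()}


# Toy records for illustration only.
records = [
    ("Nanjing", "Technological Platforms", "positive"),
    ("Nanjing", "Technological Platforms", "neutral"),
    ("Nanjing", "Audience Engagement", "positive"),
    ("Yangzhou", "Thematic Content", "positive"),
]
print(regional_attention(records))
print(regional_positive_share(records))
```

Both outputs are keyed by region, so they can be joined directly to administrative boundaries for GIS mapping.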
The final step employs correlation analysis to uncover relationships between socioeconomic factors and both the attention and sentiment towards various interactive art topics. This integrative approach bridges quantitative social media analysis with traditional socioeconomic indicators, offering a novel perspective on the interplay between regional socioeconomic development and digital perceptions of interactive art.
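The text does not state which correlation coefficient is used, so the sketch below assumes Pearson's r computed over per-region values. It builds a pairwise matrix from a dictionary of variables; the variable names and numbers are hypothetical placeholders, not study data.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise Pearson r for a dict of {variable name: per-region values}."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical per-region values for four regions (illustration only).
columns = {
    "avg_income": [1.0, 2.0, 3.0, 4.0],
    "tech_attention": [2.0, 4.0, 6.0, 8.0],
    "community_attention": [8.0, 6.0, 4.0, 2.0],
}
matrix = correlation_matrix(columns)
print(matrix[("avg_income", "tech_attention")])
```

With only 13 prefecture-level regions, each coefficient rests on few observations, so significance tests or rank-based alternatives such as Spearman's rho are worth reporting alongside r.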
LLM-Based Image Captioning for Interactive Art
To extract meaningful information from the visual content of Weibo posts related to interactive art, we employ an advanced image captioning approach leveraging Large Language Models (LLMs). This method allows us to generate descriptive captions that capture the salient features and context of each interactive artwork, providing a textual representation that can be seamlessly integrated with the natural language processing pipeline.
The LLM-based image captioning system begins with feature extraction using a pretrained Convolutional Neural Network (CNN), specifically a ResNet-152 architecture (He et al., 2016). These visual features are then encoded into a format compatible with the input requirements of our LLM, a fine-tuned Qwen-VL (Young et al., 2024). The LLM takes the encoded visual features as input and generates a descriptive caption autoregressively. To make the captions more relevant and specific to interactive art, we fine-tune the LLM on a large corpus of image-caption pairs from interactive art installations and exhibitions. This allows the model to generate captions that not only describe the visual content but are also sensitive to the specific characteristics and contexts of interactive artworks.
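The text does not specify how the ResNet-152 features are "encoded" for the LLM. One common pattern, assumed here and used by prefix-based captioners such as ClipCap, is a learned linear map that turns the pooled image vector (2048-dimensional for ResNet-152) into a few pseudo-token embeddings prepended to the prompt. The sketch below illustrates that general technique with tiny dimensions and random, untrained weights; it is not Qwen-VL's own visual encoder, and all function names are hypothetical.

```python
import random


def make_projector(feat_dim, hidden_size, num_prefix, seed=0):
    """Random weights standing in for a learned linear projector (hypothetical)."""
    rng = random.Random(seed)
    out_dim = hidden_size * num_prefix
    weights = [[rng.gauss(0.0, 0.02) for _ in range(feat_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def project_to_prefix(features, weights, bias, hidden_size):
    """Map one pooled image feature vector to a list of prefix embeddings."""
    flat = [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, bias)]
    return [flat[i:i + hidden_size] for i in range(0, len(flat), hidden_size)]


def build_llm_input(prefix, text_embeddings):
    """Visual prefix first, then the embedded text prompt, as one sequence."""
    return prefix + text_embeddings


# Toy sizes; a real setup would use feat_dim=2048 and the LLM's hidden size.
FEAT_DIM, HIDDEN, NUM_PREFIX = 8, 4, 3
W, b = make_projector(FEAT_DIM, HIDDEN, NUM_PREFIX)
prefix = project_to_prefix([0.1] * FEAT_DIM, W, b, HIDDEN)
sequence = build_llm_input(prefix, [[0.0] * HIDDEN, [0.0] * HIDDEN])
print(len(prefix), len(sequence))
```

During fine-tuning, the projector (and optionally the LLM) would be trained so that the LLM, conditioned on this prefix, predicts the reference caption token by token.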
Figure 4 presents a demonstration of the image captioning results for various interactive art scenes in Jiangsu. The examples showcase the model’s ability to capture diverse elements of interactive artworks, from large-scale installations to participatory experiences and technologically-driven pieces.

Demonstration of image captioning results for interactive art scenes in Jiangsu.
To evaluate the task-specific image captioning system, we compared the base and fine-tuned models on standard captioning metrics, namely BLEU-4 (Papineni et al., 2002), METEOR (Banerjee & Lavie, 2005), CIDEr (Vedantam et al., 2015), and SPICE (Anderson et al., 2016); the results are shown in Figure 5.
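To make the BLEU-4 numbers more concrete, here is a minimal sentence-level implementation following Papineni et al. (2002): clipped n-gram precisions for n = 1 to 4, their geometric mean with uniform weights, and a brevity penalty. It assumes whitespace tokenization and applies no smoothing. Published scores are usually corpus-level, pooling n-gram counts over all captions before combining, so this sketch is illustrative rather than a drop-in replacement for the evaluation toolkit used in the study.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, references):
    """Sentence-level BLEU-4 with uniform weights and no smoothing."""
    log_precisions = []
    for n in range(1, 5):
        cand = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if clipped == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    c = len(candidate)
    # Effective reference length: the one closest to the candidate (shorter wins ties).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / 4)


cand = "the cat sat on the mat".split()
ref = "the cat sat on the mat today".split()
print(bleu4(cand, [ref]))
```

In this example every n-gram matches, so the score is just the brevity penalty exp(1 - 7/6), about 0.846.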

Performance comparison between base and fine-tuned models: (a) evaluation results on standard image captioning metrics and (b) model performance across six dimensions.
Figure 5a shows model performance on established metrics, with CIDEr increasing from 0.998 to 1.025 and BLEU-4 from 0.321 to 0.344. These gains (CIDEr +0.027, BLEU-4 +0.023) indicate improved semantic understanding and structural coherence in image captioning. Figure 5b breaks performance down by category in a radar chart. The base model is biased toward technological platforms (0.85) while underperforming in artistic techniques (0.68) and artist profiles (0.65). Fine-tuning yields substantial improvements in these weaker areas (+0.15 in artistic techniques, +0.19 in artist profiles) while maintaining technological strengths (0.89), demonstrating effective rebalancing of model capabilities.
Interactive Art Posts Analysis with LLMs
To extract meaningful insights from Weibo posts about interactive art, we employed Large Language Models (LLMs) for topic classification and sentiment analysis. This approach leverages LLMs’ advanced natural language understanding capabilities to categorize posts into predefined topics related to interactive art and assess their sentiment. The LLM-based method allows for nuanced interpretation of complex language patterns, idioms, and context-specific meanings, providing an accurate representation of public perceptions expressed in social media posts about interactive art.
Table 1 presents the topic classification scheme used for categorizing interactive art-related posts. This comprehensive framework covers various aspects of interactive art, from artistic techniques and thematic content to audience engagement and critical discourse. Sentiment analysis was performed with the LLM classifying posts into positive, neutral, or negative sentiments towards the interactive artwork or experience. To enhance performance, we implemented a prompt-based method, providing the model with examples of sentiment classification in interactive arts before processing each post. Table 2 presents examples of prompt results for topic and sentiment classification related to interactive art across different regions of Jiangsu Province. To evaluate the performance of our LLM-based approach, we compared it with several baseline methods, including advanced LLMs. Figure 6 presents the accuracy results for topic and sentiment classification.
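The prompt-based method (showing the model labeled sentiment examples before each post) can be sketched as a few-shot prompt builder plus a parser that maps the model's free-text reply back onto the three labels. The instruction wording and demonstration posts below are hypothetical and written in English for readability; the study's actual prompts and examples are not reproduced here, and the call to the LLM itself is omitted.

```python
LABELS = ("positive", "neutral", "negative")

# Hypothetical demonstrations; not the examples used in the study.
EXAMPLES = [
    ("The projection responded to every step we took, totally magical!", "positive"),
    ("The exhibition opens at 10 a.m. on weekends.", "neutral"),
    ("Half the sensors were broken and the queue took two hours.", "negative"),
]


def build_sentiment_prompt(post_text, examples=EXAMPLES):
    """Few-shot prompt: instruction, labeled demonstrations, then the target post."""
    lines = [
        "Classify the sentiment of the post toward the interactive artwork or experience.",
        "Answer with one word: positive, neutral, or negative.",
        "",
    ]
    for text, label in examples:
        lines += [f"Post: {text}", f"Sentiment: {label}", ""]
    lines += [f"Post: {post_text}", "Sentiment:"]
    return "\n".join(lines)


def parse_label(response):
    """Map a free-text reply to exactly one label; return None if absent or ambiguous."""
    words = [w.strip(".,!:;\"'") for w in response.strip().lower().split()]
    hits = [label for label in LABELS if label in words]
    return hits[0] if len(hits) == 1 else None


prompt = build_sentiment_prompt("The ink-wash river changed as we walked along it.")
print(prompt)
print(parse_label("Positive."))
```

Returning `None` for ambiguous replies, rather than guessing, makes it easy to count and re-query malformed outputs.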
Topic Classification on Interactive Art Social Media.
LLM Results for Topic and Sentiment Classification of Interactive Art Posts Across Jiangsu Regions.

Performance comparison of LLM models for interactive art analysis: (a) topic classification performance and (b) sentiment classification performance.
Figure 6 presents the comparative performance of LLMs and baseline methods in classifying interactive art-related social media content. Our analysis reveals that the fine-tuned LLM achieves superior performance across both topic classification (accuracy: 0.823 ± 0.012) and sentiment analysis (F1-score: 0.818 ± 0.015), outperforming conventional models including BERT (+11.2%), RoBERTa (+7.8%), and DeBERTa (+4.5%). These results validate the effectiveness of our domain-specific fine-tuning strategy and prompt engineering framework in capturing the complex semantics of interactive art discourse.
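For reference, the two reported metrics can be computed as below. The text does not say how the sentiment F1 is averaged over the three classes; macro-averaging (an unweighted mean of per-class F1) is assumed here. The labels are toy values.

```python
def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (macro-averaging assumed)."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy gold and predicted labels.
gold = ["positive", "negative", "neutral", "positive"]
pred = ["positive", "negative", "positive", "positive"]
print(accuracy(gold, pred))
print(macro_f1(gold, pred, ["positive", "neutral", "negative"]))
```

Macro-averaging weights the rarer neutral and negative classes as heavily as the dominant positive class, so it is usually lower than accuracy on imbalanced sentiment data.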
Regional Results
The analysis of social media data across Jiangsu’s regions reveals distinct patterns of public perception and engagement with interactive art. This section presents the spatial distribution of attention and sentiment towards various dimensions of interactive art, highlighting the complex interplay between socioeconomic factors and digital representations of artistic experiences.
Social Attention on Interactive Art
The attention analysis across Jiangsu’s regions reveals distinctive patterns in how different aspects of interactive art are perceived and emphasized in social media discourse. These patterns offer insights into the varied characteristics and cultural development trajectories of Jiangsu’s diverse urban-rural landscape.
In Figure 7, the spatial distribution of interactive art attention exhibits pronounced polycentric characteristics across Jiangsu Province, with distinct gradients in technological and artistic dimensions. Notably, core urban districts demonstrate 25% to 35% higher attention intensities in digital interactivity (Mean = 82.3) and emerging technologies (Mean = 78.9) compared to peripheral regions. This pattern aligns with established theories of innovation diffusion in cultural geography (Florida, 2003) yet reveals anomalous clusters of high attention (>75%) in second-tier cities like Wuxi and Changzhou, particularly in collaborative creation and audiovisual elements, suggesting the emergence of specialized cultural innovation nodes outside traditional centers.

Attention analysis of interactive art across various dimensions in Jiangsu regions: (a) artistic techniques, (b) thematic content, (c) audience engagement, (d) technological platforms, (e) exhibition contexts, and (f) artist profiles.
Analysis of thematic content and audience engagement dimensions reveals a compelling inverse relationship between urbanization levels and community-oriented art attention. Rural and emerging urban districts display unexpectedly high attention scores in participatory experiences (Mean = 68.7) and cultural identity themes (Mean = 72.4), contradicting conventional center-periphery models of cultural innovation. This “peripheral advantage” phenomenon is particularly evident in traditional cultural centers like Yangzhou (82.3) and emerging zones like Yancheng (78.5), where deep-rooted cultural capital appears to transcend economic development metrics. The pattern suggests a nuanced interplay between cultural heritage preservation and interactive art engagement that challenges dominant narratives of urban cultural supremacy.
Exhibition contexts and artist profiles demonstrate highly localized attention clusters (Moran’s I = 0.723,
Table 3 reveals distinctive spatial patterns in interactive art engagement across Jiangsu’s regions. Major urban centers demonstrate pronounced attention to technological sophistication, with Nanjing emphasizing digital interactivity and installation technologies, while Suzhou balances cultural identity with physical interactivity. Second-tier cities exhibit hybrid engagement patterns: Changzhou and Wuxi leverage emerging technologies while maintaining strong audience participation. This urban-rural gradient manifests through decreasing technological complexity and increasing community engagement in peripheral regions, exemplified by Suqian’s focus on basic participation levels and digital platforms. Notably, cultural identity themes remain prominent in historically significant cities like Yangzhou, suggesting the persistence of traditional cultural capital despite technological disparities. These patterns reflect broader socio-spatial dynamics of cultural innovation diffusion, where technological sophistication correlates with urban development while community engagement and cultural preservation emerge as dominant themes in less urbanized areas.
Top Three Attention Areas for Interactive Art in Each Jiangsu Region.
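The clustering of exhibition-context and artist-profile attention is summarized above with Moran's I. For readers unfamiliar with the statistic, the sketch below computes global Moran's I, I = (N / W) · Σᵢ Σⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where W is the sum of all weights. The spatial weights used in the study are not stated; a binary contiguity matrix for a hypothetical chain of four regions is assumed, and permutation-based significance testing is omitted.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and spatial weights w[i][j] (with w[i][i] = 0)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    numerator = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    denominator = sum(d * d for d in dev)
    if denominator == 0:
        raise ValueError("Moran's I is undefined when all values are equal")
    return (n / w_sum) * (numerator / denominator)


# Hypothetical contiguity: four regions in a row, each adjacent to its neighbours.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 1, 0, 0], W))  # similar neighbours: positive autocorrelation
print(morans_i([1, 0, 1, 0], W))  # alternating values: negative autocorrelation
```

Values near +1 indicate that similar attention levels cluster among neighbouring regions, values near 0 indicate no spatial pattern, and negative values indicate a checkerboard-like alternation.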
Public Sentiment on Interactive Art
The analysis of public sentiment towards interactive art across Jiangsu’s regions reveals intricate patterns of perception, offering valuable insights into the province’s socio-spatial dynamics of cultural engagement. By leveraging multi-modal social media data, we uncover nuanced variations in residents’ attitudes towards key dimensions of interactive art, reflecting the complex interplay between technological innovation, cultural traditions, and socioeconomic factors.
Figure 8 presents a comprehensive visualization of sentiment analysis across six key interactive art dimensions in Jiangsu’s regions. A striking spatial hierarchy emerges in the sentiment distributions, particularly in the artistic techniques and technological platforms dimensions (Figure 8a and d). Metropolitan cores exhibit markedly higher positive sentiment (75%–90%) toward digital and technological elements, while sentiment intensity diminishes along a clear gradient toward peripheral regions (45%–60%). This pattern reveals an intriguing phenomenon: the technological appreciation of interactive art appears to follow classic distance decay principles, reminiscent of Tobler’s First Law of Geography, but with notable anomalies in second-tier cities like Changzhou and Wuxi, which demonstrate unexpectedly high positive sentiment clusters (65%–80%) that disrupt the continuous spatial decay.

Sentiment analysis across various interactive art dimensions in Jiangsu regions: (a) artistic techniques, (b) thematic content, (c) audience engagement, (d) technological platforms, (e) exhibition contexts, and (f) artist profiles.
The thematic content and audience engagement dimensions (Figure 8b and c) reveal a more complex pattern that challenges conventional center-periphery models. We observe an inverse relationship between urbanization levels and sentiment positivity, with rural and semi-urban regions displaying notably higher positive sentiment (70%–85%) toward participatory and community-oriented aspects of interactive art. This “peripheral advantage” in engagement sentiment manifests most strongly in traditional cultural centers like Yangzhou (82%) and emerging cultural zones like Yancheng (78%), suggesting the presence of deeply embedded cultural capital that transcends economic development metrics. The pattern points to a nuanced interplay between cultural heritage, community cohesion, and artistic reception that warrants reconsideration of standard cultural diffusion models.
Exhibition contexts and artist profiles (Figure 8e and f) demonstrate highly localized sentiment clusters that appear to correlate strongly with institutional presence and cultural infrastructure. Major cultural hubs like Nanjing and Suzhou show distinct positive sentiment peaks (85%–95%) surrounded by sharp gradients, creating “sentiment islands” that suggest the presence of strong institutional effects on public perception. This spatial configuration implies that sentiment toward curatorial and professional aspects of interactive art may be more sensitive to formal cultural infrastructure than previously theorized, highlighting the need for more sophisticated models of cultural sentiment diffusion in rapidly developing regions.
Art Perception Across Social Stratas
The regional socioeconomic indicators examined in this study were systematically compiled from authoritative sources, including the Jiangsu Provincial Bureau of Statistics (2024), provincial bureau reports, and cultural administrative records. These data encompass average income, educational attainment, digital infrastructure penetration, and cultural expenditure metrics across Jiangsu’s diverse regions. The comprehensive dataset provides a robust foundation for examining the intricate relationships between socioeconomic conditions and digital perceptions of interactive art, enabling nuanced analysis of spatial variations in cultural engagement patterns.
To systematically examine these multifaceted relationships between socioeconomic factors and digital perceptions of interactive art, we constructed a comprehensive correlation matrix. Figure 9 visualizes these complex interactions through color gradients and scatter plot overlays. Figure 9 reveals distinctive clustering patterns that challenge conventional assumptions about the relationship between urban development and cultural engagement (Bourdieu, 2018). Particularly noteworthy is the strong positive correlation (

Correlation matrix of art-related social media patterns.
More complex relationships emerge between economic indicators and sentiment patterns. Average income demonstrates remarkably strong correlations with technological platform engagement (
The intricate web of correlations illuminates several theoretically significant patterns in the spatial distribution of interactive art engagement. First, the strong correlation triad between education, artistic techniques, and artist profiles (all
Socioeconomic Determinants of Art Engagement
This section leverages Shapley value analysis to decompose the complex relationships between socioeconomic factors and interactive art engagement patterns. By applying ensemble machine learning models to our multi-modal dataset, we quantify the relative importance of various predictors while accounting for their nonlinear interactions. This approach reveals how different socioeconomic variables contribute to both attention distribution and sentiment formation across Jiangsu’s diverse regions, offering insights into the mechanisms of cultural capital accumulation in rapidly developing urban-rural systems.
Determinants of Public Attention
To systematically evaluate the complex interactions between socioeconomic factors and interactive art engagement, we employed SHAP (SHapley Additive exPlanations) analysis across multiple machine learning models. Our approach integrated Random Forest, XGBoost, LightGBM, and CatBoost algorithms, each trained on the comprehensive dataset of social media interactions and regional socioeconomic indicators. The Shapley values, derived from cooperative game theory, provide a mathematically rigorous framework for attributing the contribution of each predictor variable to model outcomes. This is particularly crucial in our context, where traditional linear correlation analysis may obscure subtle interaction effects between development metrics and cultural engagement patterns.
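To make the game-theoretic attribution concrete, the sketch below computes exact Shapley values for one prediction by enumerating every feature coalition. It is a simplification, not the authors' pipeline: the value of a coalition is assumed to be the model output when features inside it take the observed values and all others take a single baseline, and the three-feature model is hypothetical. For tree ensembles such as those listed above, the shap library's TreeExplainer computes these attributions efficiently instead of enumerating all 2^n coalitions.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for a single prediction by enumerating feature subsets.

    phi_i = sum over S subset of (N minus {i}) of |S|! (n - |S| - 1)! / n! * [v(S + {i}) - v(S)],
    where v(S) evaluates `model` with features in S taken from `x` and all other
    features taken from `baseline` (a single reference point, a simplifying assumption).
    """
    n = len(x)

    def value(subset):
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical attention model with an education x facilities interaction
# (features: education, income, facilities; coefficients are illustrative only).
def toy_model(z):
    edu, income, facilities = z
    return 0.5 * edu + 0.3 * income + 0.4 * edu * facilities


x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(p, 3) for p in phi])  # → [0.7, 0.6, 0.2]
```

Education receives its main effect (0.5) plus half of the interaction term (0.2), and facilities receive the other half, which illustrates how Shapley values share interaction credit that a linear correlation would not isolate. The attributions also sum to the gap between the prediction and the baseline output (1.5).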
In Figure 10, the Shapley analysis reveals striking variations in the predictive power of socioeconomic variables across attention dimensions. Education level emerges as the dominant predictor for artistic techniques (0.82) and technological platforms (0.78), while household income has the strongest influence on exhibition contexts (0.75). Notably, cultural facilities density shows unexpectedly high predictive power for audience engagement (0.68) and thematic content (0.65), challenging conventional assumptions. Digital connectivity exhibits moderate, broadly consistent effects across dimensions (mostly 0.45–0.55), peaking in technological platforms (0.62). Population density shows the most variable predictive power, ranging from 0.35 for artist profiles to 0.72 for exhibition contexts, suggesting complex spatial dynamics in cultural attention patterns.
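The section does not state how region-level Shapley values are condensed into single scores such as 0.82. One common convention is the mean absolute SHAP value per feature, optionally rescaled; a signed mean keeps the direction of influence, which a negative figure such as the −0.45 reported for sentiment would require. The sketch below shows both summaries under that assumption; the SHAP matrix and feature names are hypothetical.

```python
def summarize_shap(phi_by_region):
    """Collapse per-region SHAP vectors (same feature order) into per-feature summaries.

    Returns (mean_abs, mean_signed): mean |phi| measures the magnitude of influence,
    while the signed mean keeps the net direction (positive or negative).
    """
    n_regions = len(phi_by_region)
    n_features = len(phi_by_region[0])
    mean_abs = [sum(abs(row[j]) for row in phi_by_region) / n_regions for j in range(n_features)]
    mean_signed = [sum(row[j] for row in phi_by_region) / n_regions for j in range(n_features)]
    return mean_abs, mean_signed


def rescale_to_max(scores):
    """Divide non-negative scores by the largest one so the top feature scores 1.0."""
    top = max(scores)
    return [s / top for s in scores] if top > 0 else list(scores)


features = ["education", "income", "facilities"]
# Hypothetical SHAP values for three regions (illustrative, NOT study output).
phi_by_region = [
    [0.30, 0.10, -0.05],
    [0.20, -0.10, 0.05],
    [0.40, 0.05, 0.00],
]
mean_abs, mean_signed = summarize_shap(phi_by_region)
for name, m, s in zip(features, rescale_to_max(mean_abs), mean_signed):
    print(f"{name}: relative importance {m:.2f}, net direction {s:+.3f}")
```

The two summaries can diverge: in the toy data, income has a non-trivial mean magnitude but a near-zero net direction because its effects cancel across regions. Stating which summary a figure reports helps readers interpret comparisons across dimensions.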

SHAP-based attention analysis in Jiangsu regions: (a) artistic technique, (b) thematic content, (c) audience engagement, (d) technological platforms, (e) exhibition contexts, and (f) artist profiles.
The radial distribution patterns in Figure 10 illuminate several theoretically significant phenomena in the spatial organization of cultural attention. Most striking is what we term “educational-technological coupling”: a distinctive pattern in which educational attainment and technological infrastructure show synchronized predictive power across multiple attention dimensions. This coupling is particularly pronounced in artistic techniques and technological platforms, suggesting self-reinforcing knowledge-innovation cycles in certain regions. The asymmetric influence of cultural facilities density points to an “institutional resonance effect,” in which physical cultural infrastructure appears to amplify attention patterns beyond its immediate spatial context. Perhaps most surprising are “peripheral resilience zones,” where relatively low socioeconomic indicators are nevertheless associated with robust attention patterns, particularly in thematic content and audience engagement. These findings challenge deterministic models of cultural development and suggest more nuanced, nonlinear relationships between regional development and cultural attention dynamics.
Sentiment Formation Mechanisms
Public sentiment toward interactive art reveals complex spatial patterns that defy conventional socioeconomic gradients. Our analysis uncovers distinct threshold effects where positive sentiment peaks at moderate, rather than maximum, development levels. Remarkably, regions with comparable infrastructure and economic indicators often generate contrasting emotional responses, particularly in technological appreciation and community engagement dimensions. This phenomenon suggests that sentiment formation operates through subtle cultural mechanisms beyond standard development metrics, warranting a closer examination of how institutional presence and local traditions collectively shape artistic reception across Jiangsu’s diverse cultural landscape.
Building upon the attention patterns revealed above, we further examine the sentiment formation mechanisms through Shapley value analysis. Figure 11 maps the differential impact of socioeconomic factors on interactive art sentiment across six dimensions, highlighting several key anomalies.

SHAP-based sentiment analysis in Jiangsu regions: (a) artistic technique, (b) thematic content, (c) audience engagement, (d) technological platforms, (e) exhibition contexts, and (f) artist profiles.
In Figure 11, the sentiment Shapley patterns reveal an intriguing inversion of conventional socioeconomic predictors. While education retains strong predictive power for technological platforms (0.85) and artistic techniques (0.79), its contribution to audience engagement sentiment is unexpectedly negative (−0.45). Cultural facilities density emerges as the dominant predictor for exhibition contexts (0.82) and thematic content (0.77), suggesting that institutional presence may play a more important role in shaping positive sentiment than previously theorized.
The analysis reveals distinct threshold effects in sentiment formation across Jiangsu’s regions. Most notably, the predictive power of household income exhibits a clear plateau effect around moderate development levels, particularly for artistic techniques and thematic content dimensions. This challenges conventional linear models of cultural development (DiMaggio et al., 1983). We observe that positive sentiment towards interactive art often peaks in areas with balanced distributions of educational and cultural infrastructure, rather than in regions with maximum development metrics. These patterns suggest a more complex relationship between institutional frameworks and public sentiment, where moderate levels of multiple factors may optimize cultural reception. The asymmetric distribution of predictive strength across dimensions further indicates that sentiment formation follows distinct pathways from attention patterns, potentially reflecting deeper sociocultural dynamics beyond simple socioeconomic determinism.
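The text reports a plateau in the predictive power of household income but does not describe how the threshold was located. One simple way to probe such an effect, offered here as an assumption rather than the authors' method, is a linear-plateau (segmented) regression y = a + b·min(x, c) with the breakpoint c chosen by grid search. The function names and data below are hypothetical.

```python
def fit_linear(z, y):
    """Ordinary least squares for y = a + b*z; returns (a, b, sse)."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    szz = sum((u - mz) ** 2 for u in z)
    if szz == 0:  # constant predictor: best fit is the mean of y
        return my, 0.0, sum((v - my) ** 2 for v in y)
    b = sum((u - mz) * (v - my) for u, v in zip(z, y)) / szz
    a = my - b * mz
    sse = sum((v - (a + b * u)) ** 2 for u, v in zip(z, y))
    return a, b, sse


def fit_linear_plateau(x, y):
    """Fit y = a + b*min(x, c), grid-searching the breakpoint c over observed x values.

    Returns (c, a, b, sse). Choosing c = max(x) reproduces a plain linear fit,
    so a plateau is only preferred when it lowers the squared error.
    """
    best = None
    for c in sorted(set(x)):
        z = [min(v, c) for v in x]
        a, b, sse = fit_linear(z, y)
        if best is None or sse < best[3]:
            best = (c, a, b, sse)
    return best


# Hypothetical data: a sentiment contribution rises with a development index, then levels off.
development = list(range(1, 11))
contribution = [0.1 * min(v, 6) for v in development]
c, a, b, sse = fit_linear_plateau(development, contribution)
print(f"breakpoint={c}, slope={b:.2f}, sse={sse:.1e}")
```

Because the grid search always finds some breakpoint, a real analysis would also need to show that the plateau model fits meaningfully better than a linear one (for example, with an information criterion or a bootstrap interval for c) before reading the result as a threshold effect.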
Limitations
The dataset’s geographic granularity, while enabling precise regional comparisons, may overlook nuanced intra-city variations in art engagement patterns. Platform-specific biases in user demographics could influence perceived attention distributions across socioeconomic groups. The proposed multi-modal framework prioritizes explicit sentiment expressions, potentially underrepresenting implicit cultural perceptions embedded in artistic discourse. The image captioning model, despite domain adaptation, occasionally simplifies complex interactive elements requiring specialized art knowledge.
Conclusion
This study pioneers a multi-modal spatial analysis framework for decoding public engagement with interactive art through social media, integrating computer vision, natural language processing, and spatial econometrics. Our methodology advances cultural analytics by capturing both explicit expressions and implicit patterns across visual-textual modalities, while establishing rigorous validation protocols for social media-derived cultural indicators. The empirical focus on Jiangsu Province provides a critical testbed for examining cultural dynamics in transitional urban-rural systems.
Three key findings emerge from our spatial-temporal analysis. First, the identification of polycentric cultural innovation clusters challenges traditional core-periphery models, with secondary cities such as Changzhou demonstrating technological sophistication rivaling that of the provincial capital. Second, the inverse relationship we identify between technological adoption and community engagement reveals fundamental tensions in cultural development trajectories: urban cores excel in platform-based interactivity, while rural regions lead in participatory art experiences. Third, our Shapley decomposition reveals an interplay between educational capital and institutional dynamics in cultural formation: educational attainment initially drives technological appreciation in interactive art engagement, but this relationship exhibits distinct threshold effects beyond which institutional factors become increasingly dominant. This finding challenges linear models of cultural capital accumulation and clarifies how educational and institutional resources interact in shaping cultural innovation. These patterns call for reconceptualizing cultural policy through the lens of spatial justice, with differentiated strategies that acknowledge the interplay between regional capabilities, technological infrastructure, and community-based artistic expression while addressing the evolving nature of digital cultural divides.
The study’s platform-specific data scope and static analytical framework present opportunities for expansion through multi-source data integration and longitudinal modeling. Future research should explore dynamic interactions between physical art environments and their digital representations across evolving urban systems.
Footnotes
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Philosophy and Social Science Bidding Project of Wuxi (No. WXSK24-JY-B08) and the Philosophy and Social Science Projects of Jiangsu Province (No. 2024SJYB0657). The authors acknowledge the above financial support.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Disclosure Statement
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
