Abstract
Aesthetic production, that is, the processing of material with a focus on the experiential and formal qualities of resulting objects and the process itself, encompasses basic dimensions of art, creativity, craft, and design. To explore these dimensions, we propose the Rubicon model of action phases as a general framework. Additionally, we introduce Schiller's aesthetics as an interactive account of formal/mental and material/physical aspects of aesthetic production and derive testable hypotheses from it. First, we expect form- and material-related experience to converge over an aesthetic production task; second, we assume that physical and mental actions occur with different prevalence across the action phases. These hypotheses were supported in a quasi-experimental mixed-methods study on a clay-molding task in an educational real-world setting (N = 30). The results suggest that aesthetic production can be understood as a dynamic intertwining of object-related and subject-related experience, action, and embodiment, which supports the transdisciplinary significance of aesthetic production for self-development.
Keywords
Introduction
The processing of material with particular interest in the experiential and formal qualities of both the emerging object and the process itself is a fundamental dimension of human life (McBrearty & Brooks, 2000; Nadal et al., 2009; Seitz, 2019). This process, aesthetic production for short, can be further divided into artistic, creative, craft, and design processes according to partly distinctive and partly overlapping features. While artistic processes involve problem-finding and developing self-expressive solutions with various media or materials in an open-ended way (Botella et al., 2018; Pelowski & Chamberlain, 2023), the notion of creativity does not necessarily include these features but rather gravitates around original and appropriate productions, which also applies to purely cognitive forms of problem-solving, as in scientific creations (Lubart, 2001; Runco & Jaeger, 2012). Craft processes are more strongly related to material affordances and bodily engagement and focus on the sensitive habitualization of skills through repetitive movements in dialogical rhythms between the maker and the material (Ross & Glăveanu, 2023; Sennett, 2008). Although design processes are often considered to be distinct from craft due to their inclusion of theory, management, and strategy (Bean & Rosner, 2012), they also aim at concretely perceptible products, but are additionally specified by aspects such as functionality, usability, and safety (Carbon, 2019).
Aesthetic production encompasses these variations but even goes beyond them by including a basic everyday “dimension of experience, associated as it is with the creation of new forms and the production of objects as finished wholes” (Larrain & Haye, 2019, p. 8) and related implications, for example, for wellbeing, education, and therapy (Wassiliwizky & Menninghaus, 2021). In their plea for empirical aesthetics, Wassiliwizky and Menninghaus especially call for increased basic research on the temporal dynamics of aesthetic activity but limit themselves to the cognitive processing of perceptual stimuli. Curiously, other scholars also mention aesthetic creation, but limit research to the reception and evaluation of sensory information from finished products or natural objects (Brielmann & Pelli, 2018; Chatterjee, 2011; Skov & Nadal, 2020). Of course, perception and judgment play a key role in aesthetics, but this is only one side of the coin, because the presence of finished objects is already presupposed, while their production possibly involves aspects beyond that. Hence, empirical aesthetics should be explicitly extended to aesthetic production to include agency as a key dimension and to meet the desideratum as completely as possible (Cupchik, 1992; Csikszentmihalyi & Schiefele, 1992; Koch, 2017).
In the context of pragmatism (Dewey, 1934), ecological psychology (Gibson, 1979), and enactivism (Varela et al., 1991), the dynamic interrelation of perception and action has been highlighted and further developed to an aesthetics “which emphasize(s) the way cognition is shaped by the embodied agent's interactions with the surrounding social and material world” (Lindblom, 2015, p. 4) and thus aims to overcome the indicated behaviorist and cognitivist limitations (Cochrane, 2008; Newen et al., 2018; Scarinzi, 2015). As an important aspect of enactive aesthetics, the experience of meaning-making is inextricably linked to what agents concretely do in artistic production: “‘Doing’ is inherent in creation […]. In production, doing and meaning coincide” (Tessarolo, 2015, p. 150; see also Koch, 2017; Yaniv, 2014). Against this background, we adopt the broader perspective of aesthetic production, but reserve a more precise clarification of the relationship between physical and mental forms of action beyond coincidence, inseparability (Malafouris, 2019), or “strict identity” (Kirchhoff & Hutto, 2016, p. 304). For despite all interactivity, forms of action should be analytically distinguished, as has been put forward for mental action (Brent & Titus, 2022; Fiebich & Michael, 2015) and, especially, in empirical contexts (e.g., Wagemann, 2022).
A first empirical approach to answering the question of what exactly people do and experience physically and mentally in aesthetic production can be found in phase models of the creative process. In Botella et al.’s (2018) review, no fewer than 12 phases were identified, a number exceeded by the 17 phases extracted from their own interview study. While this reveals fine-grained aspects of the examined visual art processes, the authors concede that it might be difficult to derive a consensual understanding at a more general level. Aiming at an integrated process model of creative design, Howard et al. (2008) suggest analysis, generation, and evaluation as generalized process elements to extend the function, behavior, and structure model of design (Gero, 2004). Considering that aesthetic production involves action to a large extent, a connection can be drawn to Heckhausen and Gollwitzer's (1987) Rubicon model of action phases and its extension to the mindset theory of action phases (MAP; Gollwitzer, 1990/2012), which integrates motivational and volitional perspectives. This approach focuses on the change from a deliberative to an implemental mindset (i.e., the Rubicon to be crossed), distinguishing one predecisional and three postdecisional phases (Figure 1). Since the initial decision to engage in a creative or aesthetic process at all is not explicitly addressed by the creative process models, Howard et al.’s (2008) three-phase model can be provisionally mapped to the postdecisional phases of planning, acting, and evaluating in terms of the Rubicon model.

Figure 1. The Rubicon model of action phases. Adapted from Heckhausen & Gollwitzer (1987).
Having thus outlined a general and research-based account of temporal dynamics, we now need to clarify what is to be investigated as experiential content of aesthetic production. We refer (perhaps surprisingly in the empirical context) to Friedrich Schiller's aesthetic concept, which has been discussed not frequently but continually in philosophy, art education, and art therapy debates to this day (Grossmann, 1968; Siegesmund, 2013). In view of the unclear taxonomy of aesthetic processes and from the perspective of enactive aesthetics, however, this connection lends itself if an integrative and interactive account of formal/mental and material/physical aspects of aesthetic production is sought. On the one hand, Schiller (2003) rejects Kant's (1998) position according to which the aesthetic experience of “beauty” is nothing but subjective taste based on natural mechanisms to be investigated scientifically, a position echoed by proponents of empirical aesthetics; on the other hand, he does not subject “beauty” to the dictates of pure reason. Crucially, he dispenses with object/product-oriented (naturalistic or idealistic) criteria in favor of the performative experience of aesthetic play, where influences from both sides interact (Schiller, 2004). He identifies these influences as the sensuous/material impulse and the form impulse, the former of which exposes the receptive individual to a changeable physical environment, while the latter pursues the motive of lending enduring determination to it. To avoid a one-sidedness of these impulses in the context of education, health, and personal development, they must be cultivated in mutual exchange: “In order not to be merely world, he [the human being] must lend form to his material; in order not to be merely form, he must make actual the potentiality which he bears within himself.
He realizes form when he creates time and opposes constancy with alteration […]; he gives form to matter when he proceeds to annul time, affirms persistence within change” (Schiller, 2004, p. 53). The reciprocal interaction of form and material is driven by what Schiller calls the play impulse which aims “at the extinction of time in time and the reconciliation of becoming with absolute being, of variation with identity” (Schiller, 2004, p. 61) and thus constitutes a basic experiential dimension of aesthetic production (Lichtenstein, 2019; Svanøe, 2019).
To summarize, there are three points to note as to why Schiller's conception is well-suited to fuel an empirical study on aesthetic productivity. First, it allows for a balanced integration of creative intentions (Arango-Muñoz & Bermúdez, 2021; Seeley, 2013) and meaning-making (Lindblom, 2015; Rosenthal, 2004), that is, the form impulse, on the one hand, and material-related aspects such as affordance (Xenakis & Arnellos, 2013), effort (Rankanen et al., 2022), and difficulty (Pénzes et al., 2014), that is, the material impulse, on the other hand. Second, the aesthetically productive individuals themselves are recognized as both embodied and mental agents capable of and responsible for their personal development (Kapitan et al., 2011; Larrain & Haye, 2019; Seel, 2014), which is anticipated in Schiller's notion of the “living shape” (“lebende Gestalt”, Schiller, 2004, p. 62) to be realized in aesthetic play. Third, in view of Schiller's paradoxical formulations of “creating” vs. “annulling” time or “the extinction of time in time” (see above) and in connection with the other two points as well as the MAP, testable hypotheses about the temporal structure of aesthetic production can be derived, as explained below.
If there is an interaction of form-related and material-related aspects of experience in aesthetic production, as proposed by Bar-On (2007) in the context of a clay-molding task, it should lead to measurable effects over the temporal span of a concrete task. More specifically, Schiller's approach suggests that aesthetic production starts from a mindset in which form-related and material-related aspects are involved rather one-sidedly and separately, and that it should move to a more interrelated or integrated mindset in this respect. Therefore, we expect form-related and material-related aspects of experience to occur with decreasing distance over the action phases of planning, acting, and evaluating, although what is meant by “distance” remains to be clarified (Hypothesis 1). Because form-related and material-related aspects of experience are likely to be involved in different, that is, mental and physical forms of action, these should be distinguishable in appropriate data. Moreover, we assume that mental and physical action play different roles across the action phases and predict that mental action is more prevalent in the first and last phase (planning and evaluation) while bodily action takes charge in the second (acting) phase (Hypothesis 2).
To test these hypotheses, we conducted a practice-led, quasi-experimental study in which thirty participants were given the task of molding a hollow sphere from a specified quantity of clay within a limited period of 25 minutes. Methodologically, this obviously presents difficulties, such as obtaining both experientially rich and statistically valid data. Hence, to access participants’ aesthetic actions and experiences as directly and exactly as possible, they documented their processes both with concomitant individual notes and open-ended self-reports immediately after the trial. To meet cognitive science standards equally, the text data were analyzed using a sophisticated mixed-methods procedure involving in-depth qualitative coding, safeguarded by intercoder reliability (ICR) tests, and statistical tests based partly directly on quantitative aspects of the raw data and partly on quantified code frequencies. In the next section, this procedure is explained in more detail before results are summarized and discussed in view of other approaches, application in educational contexts, and prospects of aesthetic production research.
Quasi-Experimental Procedure
As mentioned, our study navigates between seemingly irreconcilable research traditions. On the one hand, there are qualitative studies that focus on the experiential aspects of aesthetic production in a fine-grained manner, but their results can hardly be generalized due to purely bottom-up coding and ethnographic context (e.g., Bar-On, 2007; Blumenfeld-Jones, 2016). For example, Bar-On's (2007) study provides valuable insights that make our hypotheses seem plausible but acknowledges that these findings need to be further explored through quantitative testing. On the other hand, quantitative creativity tests at the person or product level have increased, but they neglect the dynamics and experiential subtleties of the aesthetic process (Glăveanu, 2010; Kupers et al., 2018). Partially overcoming these limitations, effects of clay forming on emotion regulation were found in a mixed-methods study that deployed physiological measures and video-based recall interviews (Rankanen et al., 2022). But again, this did not capture the dynamics of experience and action during the process; and the combination of qualitative (e.g., emotional valence) and quantitative (e.g., arousal) findings led to partially inconclusive results. Therefore, as a first key feature, our work draws on a mixed-methods design which avoids such incommensurability problems (Small, 2011) through a stronger linking of qualitative and quantitative aspects.
Second, in view of the diachronic structure of action phases, a standard panel or time-series design (e.g., Fürst et al., 2012) would not be suitable, because real-life temporal patterns of experience and action cannot be decomposed into individual tasks about the action phases after each of which questions are answered. As highlighted in the context of self-regulation and learning (Schmitz & Wiese, 2006; Zimmerman, 2000), action phases proceed in recursive cycles of motivational and volitional mindsets and thus cannot be fitted into a predefined time frame but must be recorded in their individual progressions. This precludes data collection via questionnaires with closed questions. Such questionnaires would also have to anticipate the entire range of relevant aspects, which would contradict the exploratory character of the study and risk indirectly revealing hypothetical content to participants. Therefore, we opted for a combination of concurrent and retrospective data collection, which allows for a more direct, dynamic, and practice-based access to participants’ first-person perspective, as suggested by Werner's microgenetic method (Siegler & Crowley, 1991; Wagoner, 2009; Werner, 1956) or, more recently, in the context of task-based introspective inquiry (e.g., Wagemann et al., 2022; Wagemann & Walter, 2024).
Taken together, these aspects suggest a quasi-experimental one-group design as an initial, exploratory approach to assessing our research questions. While there is no independent variable randomly assigned to participants and manipulated by researchers, the positions of action phases in the protocol data, once identified, can be used as a person-inherent, quasi-independent variable, in relation to which other experiential aspects are compared as dependent variables. This is justified in that we are not primarily interested in the effectiveness of a particular treatment or condition but rather in the temporal and structural dynamics of the treatment itself. However, the effects resulting from this study may provide insight into the general nature of aesthetic production and thus stimulate further studies with a full experimental design.
Settings, Materials, and Task
The study was conducted in June 2022 at Alanus University (Campus Mannheim) as part of an art minor at the beginning of a clay modeling course. A crafts room with six high tables was used, each allowing two people to work at a time, so that participants could observe each other while performing the task. Verbal communication was explicitly forbidden. On the one hand, this setting was chosen because of its ecological validity (Shiffman et al., 2008); on the other hand, the mutual influence of the participants is limited to external observation, and since the study focuses more on agentive and aesthetic experience than on physical action or social interaction, this seems to be a reasonable compromise. Before the task, participants were asked to answer an introductory questionnaire about clay: (1) What do you know about clay? (2) What experience have you had with clay so far? (3) What do you remember when you think of clay? (4) Add: Clay resembles … and describe why, (5) What do you like about clay? (6) What don't you like about clay? These questions had the function of directing participants’ attention to the specific material, opening up an experiential space for it, and thus balancing the form-oriented task. Subsequently, white clay, smooth rather than coarse and weighed out into 500-g portions, was handed out to the participants, along with the instruction sheet and the informed consent form. They were instructed to form from the clay a sphere as a closed hollow body, to use the complete material, and not to use any tools except their hands and the working table. The maximum time allowed was 25 minutes, during which the participants were to take notes on the following aspects: “Note down mental processes (e.g., thinking, feeling, and willing) and various phenomena, states, and processes in bodily terms. 
Note in bullet points and as precisely as possible which experiences you have in your material-related actions as well as in the mental and bodily realm during the process.” After completing the main part of the task, participants were given twenty minutes to describe in detail their experiences, actions, and bodily aspects, incorporating the notes they had made. They were asked to write in complete sentences and to write as much as they felt necessary to describe their observations as accurately and completely as possible. Some impressions from the trial are shown in Figure 2.

Figure 2. Impressions from the clay sphere trial. June 2022, © Alanus University, Campus Mannheim.
Given that clay work is most often situated in professional, educational, or therapeutic art or craft contexts aiming at specific aesthetic forms, manual skills, or utilization of products, this task is certainly not representative. Rather, it is uncontextualized and presuppositionless, as it does not require detailed instructions or the use of special tools and can be carried out by anyone without any prior knowledge or expertise. It is precisely for this simplicity and reproducibility that we selected the task for this study, but this does not preclude it from being integrated into clay courses, for example as an introduction, as in our case. In this way, the skills acquired with this task can be a first step toward an in-depth engagement with clay sculpture, as it not only challenges problem-solving and material engagement but also promotes introspective abilities in the reflection of the aesthetic process. If this is desirable from a curricular or therapeutic perspective, the sphere task can readily be incorporated into larger projects.
Participants
Thirty undergraduate and graduate students from our Waldorf education programs (22 females, eight males) aged between 21 and 44 (M = 27.0) participated in the experiment, divided into three groups of eight to eleven persons. Although they were not art students in the traditional sense, as part of their teacher training they had to take comprehensive art courses aimed at understanding education as an artistic process and sensitizing them accordingly. The theoretical background of the study was not discussed with them before or during data collection. The sample size can be justified from both a qualitative and quantitative analysis perspective. For qualitative studies, recommendations range between 12 (Guest et al., 2020) and 25–30 (Dworkin, 2012) participants to achieve thematic saturation for bottom-up or data-driven coding. For assessing quantitative aspects of the data, such as word frequencies or other metrical variables derived from quantified qualitative codes, an a priori calculation of the minimum sample size, aimed at two-tailed dependent-samples t-tests with α = .05, a power of 1–β = 0.95, and large effect size d = 0.8, yielded 23 subjects (Faul et al., 2007). For chi-square tests with binarized code frequencies, a power of 0.8, α = .05 and large effect size w = 0.5, a sample size of 32 subjects was calculated, so that we can consider N = 30 as a reasonable compromise for this exploratory study.
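The a priori calculation for the chi-square case can be retraced with a short sketch. It assumes a test with one degree of freedom (the text does not state the degrees of freedom), in which case the power follows from the standard normal distribution alone. The function names are illustrative and are not taken from the power-analysis software cited above.

```python
from math import sqrt
from statistics import NormalDist


def chi2_power_df1(n: int, w: float, alpha: float = 0.05) -> float:
    """Power of a chi-square test with 1 degree of freedom (assumed here).

    With df = 1 the test statistic behaves like (Z + sqrt(lambda))^2, where
    Z is standard normal and lambda = n * w^2 is the noncentrality parameter.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # square root of the chi-square critical value
    shift = sqrt(n * w ** 2)           # square root of the noncentrality parameter
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


def min_sample_size(w: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Smallest n whose power reaches the target, again assuming df = 1."""
    n = 2
    while chi2_power_df1(n, w, alpha) < power:
        n += 1
    return n


# Large effect (w = 0.5), alpha = .05, target power = 0.8, as in the text.
n_required = min_sample_size(w=0.5, alpha=0.05, power=0.80)
print(n_required)
```

Under the df = 1 assumption the sketch arrives at the 32 subjects reported above. The dependent-samples t-test case would additionally require the noncentral t distribution, which the Python standard library does not provide, so it is left out here.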
Data Acquisition and Analysis
Data were collected via open-ended self-reports to be written during the trial (as short notes) or immediately after it (main protocols). This method was chosen because it has some advantages over other quantitative and qualitative methods and has proven useful in our earlier studies on first-person experience and agency (e.g., Wagemann, 2023; Wagemann & Raggatz, 2021). First, as already mentioned above, standard survey instruments such as questionnaires are ruled out since they cannot capture phase transitions, individual trajectories, and other, possibly unexpected aspects of participants’ experience in a fine-grained manner. For example, Schmitz and Wiese's (2006) study of learning acknowledges that measurements can only be taken before or after a training session, as additional measurement time points during the action phase could disrupt the process. Therefore, instead of using open-ended questions only to stimulate subjects’ self-reflexivity or direct their attention to a certain aspect, as in the mentioned studies, we also use them as the main instrument to record the action phases and their temporal dynamics both qualitatively and quantitatively. As actions cannot be reduced to externally measurable behavior but also include forms of agentive experience, self-reports also seem appropriate in that the action phases can be identified via corresponding mindsets (Gollwitzer, 2012), which can be captured in this way.
Second, compared with other qualitative first-person methods, written self-reports are relatively unbiased as they exclude the researchers or additional staff from the recording process in contrast to interview techniques, for example. In this way, experimenter expectation (Salazar, 1990) or social biases (Sparby, 2022) can be prevented, apart from the pragmatic aspects of simultaneous data acquisition for all participants during and immediately after the task and resource-friendly further processing of the data already available in text form. The two-step approach (short notes/main protocols) was chosen to support participants’ memory to obtain reports that were as complete and diachronically adequate as possible.
As a preparatory step of analysis, the transcribed notes were used to check the completeness and order of statements in the main transcripts. While all statements that had been correctly transferred to the main protocols were removed from the notes in order to avoid duplication of data and distortion of the action phase patterns, aspects that appear exclusively in the notes were retained. Self-reports adjusted in this sense ranged from 87 to 541 words (M = 251.7) and were analyzed by a qualitative–quantitative mixed-methods approach. However, in contrast to traditional mixed-methods approaches in which different types of data are collected and analyzed in different (concurrent or parallel) steps before results are integrated (Creswell, 2009), this approach circumvents the associated incommensurability problems (as occurred in Rankanen et al., 2022) by deriving qualitative and quantitative aspects from the same data source, as explained below. Given the quasi-experimental, one-group design of our study, we begin with top-down qualitative coding in terms of the action phases, which decompose the individual protocols into three disjoint partitions and serve as quasi-independent variables for subsequent analyses. These analyses range from “early” to “late” forms of quantification, with the former directly addressing quantifiable aspects of the raw data and the latter building on further qualitative coding at other thematic levels. Here, “early” quantification of the qualitative data includes word frequencies of first-person and third-person pronouns as well as the relative positions of action phases in the protocols, while “late” quantification draws either on the number of data segments per protocol encoded with a certain category or on binarized code frequencies indicating whether a category occurs in a protocol or not.
Analytic levels beyond the diachronic action phases (Level 1) refer to design strategy (Level 2) and to synchronic aspects of the subject–object relation (Level 3), distinguishing whether the two relata are experienced immersively (Level 3A) or appear as detached from each other (Level 3B). ICR tests were conducted for all coding levels relevant for statistical analyses. The coding levels and analytical stages are explained in more detail below.
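The two forms of "late" quantification named above can be illustrated with a minimal sketch. The protocol identifiers, category labels, and data below are hypothetical and serve only to show the difference between segment counts and binarized code frequencies.

```python
from collections import Counter

# Hypothetical coded segments as (protocol_id, category) pairs.
coded_segments = [
    ("P01", "mental_action"), ("P01", "bodily_action"), ("P01", "bodily_action"),
    ("P02", "bodily_action"),
    ("P03", "mental_action"), ("P03", "mental_action"),
]
protocols = ["P01", "P02", "P03"]


def segment_counts(segments):
    """Number of coded segments per (protocol, category) pair."""
    return Counter(segments)


def binarize(segments, protocol_ids, category):
    """1 if the category occurs at least once in a protocol, otherwise 0."""
    present = {pid for pid, cat in segments if cat == category}
    return {pid: int(pid in present) for pid in protocol_ids}


counts = segment_counts(coded_segments)
mental_present = binarize(coded_segments, protocols, "mental_action")
```

Segment counts feed metric comparisons such as t-tests, whereas the binarized values are the kind of input a chi-square test on category occurrence would take.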
Diachronic Action Phases (Level 1)
As indicated in the introduction, Level 1 categories are inspired by Gollwitzer's MAP, whereby they skip the predecisional phase (as this is already covered by participants’ decision to comply with the task) and use only the following three phases (planning, acting, and evaluating), which, in addition, are adapted to the specific task (Table 1). Since MAP is supported by many studies and decades of research, the existence of distinctive action phases and corresponding mindsets can be taken for granted and thus inform our data analysis at a basic level. However, despite the clear definition of the action phases, their theoretically justified sequence with possible recursion (Gollwitzer, 1990; Zimmerman, 2000), and time series analyses based on standardized diary data (Schmitz & Wiese, 2006), there does not seem to be any empirical work on their “in-vivo” dynamics in real-world settings. Therefore, the first step needed was to identify the action phases in the data and examine them for their occurrence patterns. Based on Level 1 categories, 618 data segments (whole or partial sentences) from all protocols could be encoded without overlap, covering 97% of the complete data (based on encoded vs. not encoded characters). The coding was initially done by the first author and then refined together with the second author before conducting an ICR test with another colleague who was not involved in the project. The ICR test included 100 randomly chosen and blinded data segments to be assigned to one of the three action phases based on the code definitions and examples of coded data. This resulted in a Cohen's kappa value of κ1 = 0.70, which already represents substantial agreement based on the conservative approach of Landis and Koch (1977). To further improve coding consistency, a feedback session was held to negotiate and, where possible, clarify the discrepancies in ratings (Campbell et al., 2013; O’Connor & Joffe, 2020).
Of the original 22 discrepancies (Phases 1–2: 3, Phases 2–3: 13, Phases 1–3: 6), all but four were resolved, resulting in κ2 = 0.95 which is almost perfect agreement. All changes due to negotiated agreement were incorporated in the final coding before subjecting it to further analyses.
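For readers unfamiliar with the agreement statistic, the following sketch computes Cohen's kappa for two raters from the standard definition (observed agreement corrected for chance agreement). The ratings are invented for illustration and do not reproduce the study's ICR data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which both raters chose the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from the two raters' marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n ** 2
    if expected == 1:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of ten segments into the three action phases.
rater_a = ["plan", "plan", "act", "act", "act", "eval", "eval", "eval", "act", "plan"]
rater_b = ["plan", "act", "act", "act", "act", "eval", "eval", "act", "act", "plan"]
kappa = cohens_kappa(rater_a, rater_b)
```

In this toy example the raters agree on 8 of 10 segments and the chance agreement is 0.36, so kappa is (0.80 − 0.36) / (1 − 0.36) = 0.6875.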
Table 1. First Coding Level: Categories With Descriptions and Exemplary Excerpts From the Data.
As could be expected from theoretical remarks on recursion (see above), the occurrence of action phases was not limited to their typical position in the action sequence (preparation/anticipation–action/execution–evaluation/reflection) but was distributed across all parts of the protocols with a large variety of patterns (Figure 3). The diachronic patterns of action phases gave reason to analyze the exact positions of the action phases in the protocols and to check whether their mean positions can be statistically distinguished. To this end, each coded segment was assigned a relative position in the protocol (0%–100%) before means for the three action phases were calculated and their differences tested by dependent samples t-tests. Cohen's d was used for effect sizes of t-tests, averaging the two original standard deviations (instead of the SD of the difference values, which is less conservative; Cohen, 1988). Since one protocol did not contain data to be coded with the first phase (preparation, see also Figure 3), it was excluded from this and the subsequent analysis (N = 29).
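The position analysis can be sketched as follows. Two points are assumptions made for illustration: a segment's relative position is taken as the midpoint of its rank among the coded segments (character offsets would serve equally), and "averaging the two original standard deviations" is read as the arithmetic mean of the two SDs. All data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_position_by_phase(phase_sequence):
    """Mean relative position (0-100 %) of each phase within one protocol."""
    n = len(phase_sequence)
    positions = {}
    for i, phase in enumerate(phase_sequence):
        # Assumed convention: midpoint of the segment's rank in the sequence.
        positions.setdefault(phase, []).append(100 * (i + 0.5) / n)
    return {phase: mean(values) for phase, values in positions.items()}


def paired_t(x, y):
    """t statistic of a dependent-samples t-test (df = len(x) - 1)."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def cohens_d_av(x, y):
    """Cohen's d with the mean of the two original SDs as denominator."""
    return (mean(x) - mean(y)) / ((stdev(x) + stdev(y)) / 2)


# Hypothetical protocol with recursion between planning and acting.
example = mean_position_by_phase(["plan", "act", "plan", "act", "eval"])

# Hypothetical per-participant mean positions (in %) for two phases.
planning = [20, 30, 25, 35]
evaluating = [50, 55, 60, 65]
t_value = paired_t(planning, evaluating)
d_value = cohens_d_av(planning, evaluating)
```

Dividing by the averaged original SDs rather than by the SD of the difference scores keeps the effect size independent of the correlation between the paired measures, which is why it tends to give the more conservative value when the measures are positively correlated.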

Figure 3. Level 1: Diachronic patterns of action phases. Normalized portraits of all data sets (N = 30) with 3% uncoded data removed (yellow: anticipation and preparation, green: implementation, blue: evaluation and reflection). With a resolution of 40×40 pixels, the sequential structure of coded segments is normalized in each square and displayed in rows from upper left to lower right.
To further safeguard the structuring of the protocols through action phases, their associated mindsets were examined via an automated linguistic analysis of first-person and third-person pronouns. This idea is based on the difference between the preactional and postactional mindset, which involves a shift from the individual's preparatory actions (e.g., familiarizing with the material), weighing of design strategies, and assessing one's own abilities, on the one hand, to an evaluation of the quality of the outcome, in our case the shaped clay sphere, on the other hand. In other words, it is to be expected that in the preactional phase participants tend to refer to themselves, whereas in the postactional phase they are more likely to refer to the (more or less) completed object. In between, in the actional phase, individuals are completely absorbed in their object-related and goal-directed actions, thus adopting a mindset that lies between the first-person and third-person perspectives of the other two phases. To test this approach, following the Linguistic Inquiry and Word Count method of Pennebaker (Pennebaker et al., 2015; Chung & Pennebaker, 2007), first- and third-person pronouns were counted separately for all data segments coded according to the three action phases, and a ratio scale variable was introduced by dividing the absolute counts of the two pronoun types for each protocol. Differences between the means of this variable for the three action phases were tested by dependent samples t-tests.
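The pronoun variable can be illustrated with a minimal sketch. It is not the LIWC software: the pronoun lists are short, hypothetical English stand-ins, and the direction of the ratio (first-person count divided by third-person count) is an assumption, since the text does not specify it.

```python
import re

# Hypothetical, deliberately short pronoun lists for illustration only.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
THIRD_PERSON = {"it", "its", "itself"}


def pronoun_counts(text):
    """Count first- and third-person pronouns in a text segment."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    first = sum(token in FIRST_PERSON for token in tokens)
    third = sum(token in THIRD_PERSON for token in tokens)
    return first, third


def pronoun_ratio(text):
    """First-person count divided by third-person count (assumed direction).

    Returns None when no third-person pronoun occurs, so that such cases
    can be handled explicitly instead of raising a division error.
    """
    first, third = pronoun_counts(text)
    return first / third if third else None


segment = "I press my thumbs into the clay and it gives way. I feel its weight."
counts = pronoun_counts(segment)
ratio = pronoun_ratio(segment)
```

In the sample segment, three first-person pronouns ("I", "my", "I") and two third-person pronouns ("it", "its") are found, giving a ratio of 1.5.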
Given the theoretical and empirical validity of the diachronic patterns of action phases, their reliable coding in the qualitative data, and their statistical distinction in terms of protocol positions and first/third person pronouns, they are deployed as levels of a quasi-independent variable in relation to which the synchronic aspects described below are examined. The statistical tests that justify this step are reported in the results section. Since we are not concerned with causal relationships, but with the phenomenal structure of aesthetic production, we consider this approach appropriate (Privitera & Ahlgrim-Delzell, 2019).
Design Strategies (Level 2)
An important qualitative aspect of the molding process that varies across the protocols is the design strategy chosen by participants that led to the final objects. From individual references, the design strategy could be unambiguously identified in twenty-six protocols; an explicit coding of these fragmentary passages was dispensed with. Of the four remaining participants, who were subsequently asked which of the identified strategies they had used, three provided useful information. Overall, we were able to distinguish four different design strategies and tested their correlation with demographic variables (age, gender) as well as with code variables quantified from the qualitative analysis. Regarding the latter, for example, a metric variable was derived by dividing the numbers of coded segments for mental action and bodily action (Level 3A, see below) per participant to determine any correlation with the strategy used. However, this yielded no significant results, nor did the demographic variables. A single, though not statistically significant, tendency to distinguish design strategies emerged in terms of the relative proportions of action phases per protocol. Besides the distribution of the strategies across participants and their mereological and formal relationships, this aspect will be explained in the results section.
Synchronic Aspects of the Subject–Object Relation (Level 3)
To cover in detail all aspects of first-person experience reported in the protocols, the third level of analysis deals with synchronic aspects of the subject–object relation as the complement of diachronic phases. Here, categories emerged largely from bottom-up or data-driven coding, which, however, does not rule out encountering some topics already known from our preceding work. One exception to this procedure concerns our research question about the experienced relationship between material and formal aspects during the task, which is why we set these as code categories in advance. Unlike all other codes, individual words were coded here, with adjectives where appropriate, to capture the positions of the coded segments and their distances in the data as accurately as possible, as explained below. In addition, coding reliability was improved here by including lists of words to be encoded regardless of context (form aspects: form/shape (German: Form), small, large, round; material aspects: material, clay, mass, surface, hollow).
A total of 25 codes were defined for Level 3 and grouped into six subcategories, which in turn were organized into two main categories (Figure 4). Regarding the main categories, aspects were divided into immersed (3A) and detached (3B) forms of experience or description of the subject–object relation. Based on the code book (Table 2), 95% of the whole data could be coded among all protocols, that is, 1740 segments, of which 760 were coded for Level 3A (covering 84% of the data) and 980 for Level 3B (covering 22% of the data). Between Levels 3A and 3B, an overlap of 17.6% of codings was identified, while intralevel overlaps of 9.1% (3A) and 6.1% (3B) were found. Coding overlaps refer to data segments which are coded with more than one category or at which multiple categories overlap to a certain extent. They can be explained by thematic relations between the codes or their co-occurrence in segments that cannot be further subdivided into syntactic phrases and by the fact that individual words were coded for material and formal aspects of the object (3B), resulting in a considerable number of segments spread over large portions of the otherwise differently coded data. Ultimately, overlap on this scale does not pose a problem and is unavoidable in a comprehensive qualitative analysis. With respect to quantitative analyses, coding overlap is not a problem if it is explicitly addressed in the ICR tests, which were conducted on three sublevels here. The first ICR test referred to eight codes representing an overview of Level 3A (distal intention, proximal intention, physical action, mental action, agentive experience, sensory experience, form experience, and material experience). As in the above test, 100 segments were included, with the difference that they contained nine double codings, which led to κ1 = 0.73 (substantial agreement). 
For double coding, there were four complete matches, two partial matches, and three disagreements, with the latter and the other disagreements almost completely resolved in the feedback session, yielding κ2 = 0.99 (almost perfect agreement). Further ICR tests were conducted under equal conditions for the differentiated codes of physical and mental action. This yielded κ1 = 0.82 for physical action and κ1 = 0.86 for mental action, both of which already indicate almost perfect agreement and were enhanced in feedback rounds to κ2 = 0.98 for both physical and mental action. Any changes that resulted from the feedback rounds were incorporated into the final coding of the data.
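For readers unfamiliar with the agreement coefficient, the following minimal sketch computes a kappa value for two coders. It assumes Cohen's kappa with exactly one code per segment; the handling of double codings and partial matches described above is not reproduced, and the label sequences are invented for illustration only.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders assigning one label per segment."""
    n = len(coder_a)
    # Observed proportion of segments on which both coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance from each coder's label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical codings of ten segments (P = physical action, M = mental action).
coder_a = list("PPMMPMPMPM")
coder_b = list("PPMPPMPMMM")
print(round(cohens_kappa(coder_a, coder_b), 2))
```

Here the coders agree on eight of ten segments while chance agreement is 0.5, so the coefficient is well below the raw agreement rate.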

Hierarchical category system and coding levels. Numbers indicate how many segments were coded according to a certain category in how many data sets (participants). Percentages indicate how much of the data was covered by the respective category.
Third Coding Level (A).
Note. Categories with subcategories, short descriptions, and exemplary excerpts from the data.
Dependencies of Synchronic Aspects on Action Phases
In view of our first hypothesis, segmental distances between material and formal aspects of experience were analyzed across the action phases. According to the code definitions (Table 2), 465 segments were encoded for material aspects and 440 for form aspects. The ICR test resulted in κ1 = 0.83 (almost perfect agreement), which was optimized in one feedback round, yielding κ2 = 1.0. Based on this, each sentence in the data containing at least one of these codes was assessed in terms of the distance of the coded segment (material/form) to a complementarily coded segment (form/material) by weighting it with one of three values. A maximum distance (dMF = 2) was assigned if a complementarily coded segment could only be found in another (adjacent or more distant) sentence. A medium distance (dMF = 1) was assigned if at least two complementarily coded segments occurred in the same sentence. A minimum distance (dMF = 0) was assigned if at least one double coding (coincidence of material and form) occurred in the sentence, as in the case of terms such as “sphere” or “hollow body” if they are to be understood as objects that are already shaped to some extent. These sentence weights were collected separately for the three action phases and averaged per data set, resulting in three values per participant/protocol. These values represent the action-phase-specific distances of material and form as a metric variable and were further processed with t-tests. In addition to the above-mentioned exclusion of one data set, a further one was excluded here due to the lack of material and formal aspects in the evaluation/reflection phase (N = 28).
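The weighting rule for dMF can be sketched as follows. The data representation (each sentence as a list of code sets, one set per coded segment), the function names, and the phase labels in the example are assumptions made for illustration; the actual analysis was carried out on the coded protocols, not with this code.

```python
from statistics import mean

BOTH = {"material", "form"}

def sentence_distance(segments):
    """Return dMF for one sentence; segments is a list of code sets."""
    if not segments:
        return None  # no material/form code, so the sentence is not weighted
    if any(BOTH <= seg for seg in segments):
        return 0     # double coding: material and form coincide in a segment
    if BOTH <= set().union(*segments):
        return 1     # complementary codes occur in the same sentence
    return 2         # complementary code only found in another sentence

def phase_means(sentences):
    """Average dMF per action phase for one protocol."""
    by_phase = {}
    for phase, segments in sentences:
        d = sentence_distance(segments)
        if d is not None:
            by_phase.setdefault(phase, []).append(d)
    return {phase: mean(values) for phase, values in by_phase.items()}

# Hypothetical protocol with one coded sentence per phase.
protocol = [
    ("preparation", [{"material"}]),
    ("implementation", [{"material"}, {"form"}]),
    ("evaluation", [{"material", "form"}, {"material", "form"}]),
]
print(phase_means(protocol))
```

Each protocol thus contributes one mean per action phase, which is the unit entering the dependent-samples comparisons.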
To further contextualize the resulting differences in material-form distance across action phases, we examined corresponding dependencies of other Level 3 codes in various combinations (Level 3A overview, physical action, mental action). For this purpose, the code frequencies were binarized per data set, thus converted into nominal variables, and examined with chi-square tests or, for absolute frequencies below five, Boschloo's (1970) exact test. In the latter case, the odds ratio (OR) was used as the effect size, corrected according to Haldane (1940) and Anscombe (1956).
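The Haldane-Anscombe correction adds 0.5 to every cell of the 2 × 2 table before the odds ratio is formed, which keeps the estimate finite when a cell is zero or very small. The sketch below assumes the correction is applied to all four cells; the cell counts are inferred from the proportions reported in the results (0.33 and 0.03 of N = 30 for material and form experience in the first phase) and are therefore an assumption, not data taken from the study files.

```python
def corrected_odds_ratio(a, b, c, d, k=0.5):
    """Odds ratio for the table [[a, b], [c, d]] with k added to each cell."""
    return ((a + k) * (d + k)) / ((b + k) * (c + k))

# Inferred counts (assumption): material experience present/absent = 10/20,
# form experience present/absent = 1/29.
odds_ratio = corrected_odds_ratio(10, 20, 1, 29)
print(round(odds_ratio, 1))
```

Without the correction the same table would give (10 × 29) / (20 × 1) = 14.5, so the correction noticeably shrinks the estimate when one cell is small.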
Results
Qualitative Coding and Intercoder Reliability
To begin with the qualitative analysis: The rich and complex information contained in the protocols could be successfully coded in a structured category system using both top-down and bottom-up strategies (Figure 4). To fully encode the data, some categories were considered that are not directly relevant in the context of the research questions and therefore are not pursued further in statistical terms (design strategies, further thoughts, further emotions). As a sensitive link between qualitative and quantitative analysis, a high reliability of the statistically relevant coding levels was demonstrated. For the action phases, the relatively low kappa value of the first ICR test (κ1 = 0.70) suggests some ambiguity in the code definitions despite substantial agreement. This ambiguity could largely be resolved by not rashly identifying the past tense used in many protocols with the reflective/evaluative mindset of the third phase, and by instead considering more closely the broader context of individual segments for coding. Thus, for example, emotions as well as bodily and material experiences were assigned to the second phase if they were described as directly accompanying the action and did not involve strategy- or object-related judgment. These and similar insights prompted us to adjust and refine the code definitions, eventually leading to excellent kappa values as the basis for the statistical analyses.
Discrimination of Action Phases
As a first quantitative result, the relative mean positions of action phases were calculated by averaging the relative positions of all correspondingly coded segments in all protocols (see Diachronic Action Phases (Level 1) section). In dependent-samples t-tests, successive phases differed significantly: preparation (M = 0.30, SD = 0.14) versus implementation (M = 0.58, SD = 0.10), t(28) = 7.96, p < .001, d = 2.07, and implementation versus evaluation (M = 0.68, SD = 0.12), t(28) = 3.84, p < .001, d = 0.90 (Figure 5). Even the consideration of a Bonferroni-adjusted level of α = .025 due to double use of the implementation phase (Bland & Altman, 1995) does not affect this result. Thus, as can be surmised from the coding reliability, the discriminatory power between the first two action phases is about twice as high as between the last two phases, both based on large effect sizes (Cohen, 1988). Interestingly, this tendency could also be shown for 1PP/3PP, that is, the relationship between first- and third-person pronouns, across the action phases. As expected, 1PP/3PP decreased across action phases. Between preparation (M = 0.48, SD = 0.28) and implementation (M = 0.34, SD = 0.17), it decreased significantly, t(28) = 2.63, p = .013, d = 0.56, but only marginally significantly between implementation and evaluation (M = 0.27, SD = 0.18), t(28) = 1.82, p = .079, d = 0.40. Thus, at least in principle, 1PP/3PP can be used as an additional measure to discriminate action phases in verbal self-reports.

Level 1: relative positions of action phases in the protocols. Values averaged across protocols (N = 30), means with confidence intervals, ***p < .001.
Varieties of Design Strategy
As indicated above, four different design strategies can be distinguished in the data. While on the one hand the uneven distribution of the strategies is obvious (Figure 6a: A and B in relation to C and D), on the other hand their mereological and formal relations are compelling (Figure 6b). The first two strategies, mentioned together in 26 protocols, have in common that they form the object from the whole mass, either starting from a solid sphere that is hollowed out or from a curved surface whose opening is successively reduced and closed. The second and third strategies have in common that they start from planar elements, be it a single whole element or two parts that are then put together. The last two strategies, mentioned together in 3 protocols, agree in that they start from parts to create the whole of the sphere; they differ only in the number and geometric definiteness of the parts. Although the absolute frequencies are not sufficient to detect statistical effects, a correlation of the strategies with relative proportions of action phases per protocol can be suggested (Figure 6c). More specifically, Strategies C and D appear to involve a reduced evaluation and reflection phase compared to Strategies A and B, whether in favor of an increased preparation and anticipation phase (Strategy C) or an increased action and implementation phase (Strategy D).

Level 2: design strategies.
Regarding the final products, we did not find any noticeable differences that could have indicated the strategy used in their production. Due to the weighed material, similar hand sizes of the participants and the final smoothing of the surface, all spheres were almost identical in size and appearance, as can be seen (at least partially) in Figure 2.
Synchronic Aspects of Subject–Object Relation Across Action Phases
As a central aspect of this study, the segmental distance of material and formal aspects dMF was introduced by quantifying three different constellations (codes in adjacent sentences, codes in the same sentence, double coding in sentence) into corresponding values (maximum distance dMF = 2, medium distance dMF = 1, minimum distance dMF = 0) and averaging them per protocol and action phase (see Dependencies of Synchronic Aspects on Action Phases section). As a major result, dMF decreases significantly from the first action phase (M = 1.43, SD = 0.58) to the second (M = 1.01, SD = 0.32), t(27) = 3.47, p = .002, d = 0.86, and from the second to the third (M = 0.71, SD = 0.46), t(27) = 2.79, p = .010, d = 0.76 (Figure 7). Again, the consideration of a Bonferroni-adjusted level of α = .025 does not change this result. For this analysis of detached aspects of the subject–object relation, too, there seems to be a lower discrimination between the implementation and evaluation phase than between the first two phases.

Segmental form-material distance across action phases. Values averaged across protocols (N = 30), means with confidence intervals, **p < .01.
With respect to immersed aspects of the subject–object relationship (Level 3A), as measured by binarized code frequencies, three subgroups of categories were explored (Table 3 and Figure 8: overview of immersed aspects, Table 4 and Figure 9: subforms of physical and mental action). As illustrated in the figures and summarized in the tables, significant differences in most categories across action phases were found. Additionally, to further address the hypotheses, phase-specific differences were evaluated for certain pairs of categories. For form experience and material experience, Boschloo’s exact test showed a significant difference in the first phase (form experience: M = 0.03; material experience: M = 0.33), p = .003, OR = 10.1; differences in the other two phases were not significant. For physical and mental action, differences were significant in the first phase (physical action: M = 0.33; mental action: M = 0.60), χ2(1, N = 30) = 4.29, p = .038, w = 0.27, in the second phase (physical action: M = 0.97; mental action: M = 0.33), χ2(1, N = 30) = 26.4, p < .001, w = 0.66, and in the third phase (physical action: M = 0.23; mental action: M = 0.50), χ2(1, N = 30) = 4.59, p = .032, w = 0.28.
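The phase-specific comparison of physical and mental action can be retraced with a short sketch. It rests on several assumptions that the text does not state explicitly: the cell counts are inferred from the reported proportions of N = 30 participants, the two categories are treated as independent groups (60 observations per table), and an uncorrected Pearson chi-square with Cohen's w = sqrt(χ2 / n) is used. The function name and the dictionary of counts are hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, df = 1
    w = math.sqrt(chi2 / n)             # Cohen's w effect size
    return chi2, p, w

N = 30
# Inferred counts of participants with (physical, mental) action per phase.
inferred_counts = {
    "preparation": (10, 18),
    "implementation": (29, 10),
    "evaluation": (7, 15),
}
for phase, (physical, mental) in inferred_counts.items():
    chi2, p, w = chi_square_2x2(physical, N - physical, mental, N - mental)
    print(f"{phase}: chi2 = {chi2:.2f}, p = {p:.3f}, w = {w:.2f}")
```

Under these assumptions the sketch gives χ2 values of about 4.29, 26.4, and 4.59, close to the reported statistics, which suggests (but does not confirm) that the tests were computed in this way.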

Level 3A: synchronic/immersed aspects of the subject–object relation across action phases (overview). Differences of binarized and relative code frequencies computed with chi-square tests, for absolute values below 5 (distal intention, mental action) with exact Boschloo’s test, *p < .05, **p < .01, ***p < .001.

Level 3A (details): bodily and mental action across action phases. Differences of binarized and relative code frequencies computed with exact Boschloo’s test, for absolute values above five (body parts) with chi-square test, *p < .05, **p < .01, ***p < .001.
Discussion
As shown, the data emerging from a simple, standardized clay molding task allowed for both open and differentiated bottom-up categorization of experiential and agential aspects and focused top-down coding of hypothesis-related content. Presumably, without the qualitative, first-person point of departure and the approach of late quantification, this would not have been possible, which in retrospect supports our methodological decision. This is especially true for the detailed interplay of the action phases used as a quasi-independent variable and the other experiential dimensions of aesthetic production that vary across them. To our knowledge, this is the first study to capture and compare action phases qualitatively and quantitatively during a real-world task at both the interpersonal and intrapersonal levels—and the first study which applies action phases to the topic of aesthetic production. Even without being able to fully exploit all aspects arising from the data and analyses here, an initial assessment of the hypotheses can be made before we discuss the results in the context of other approaches.
The first hypothesis, that material-related and form-related aspects of experience converge across the action phases, seems to be strengthened based on high ICR and significant differences of segmental distances of detached expressions in the protocols. At Level 3A of immersed aspects, this is supported by the significant increase of form experience across the phases, as this category focuses on already implemented formal aspects (Table 2 and Figure 8). In contrast, material experience first increases and then decreases, although not significantly, which could be understood as a maximum material involvement in the actional phase that is balanced by form experience in the evaluation phase. While the significant difference between (lower) form experience and (higher) material experience in the first phase again supports the hypothesis, the (nonsignificant) differences in the other two phases cannot be interpreted in this way. This is because, for example, segments double coded with form experience and material experience may have a distance of either dMF = 1 or dMF = 0. Therefore, measuring the distance between form and material aspects at the immersed level is less accurate than at the detached level, justifying the adoption of the latter measurement method.
In terms of Schiller's conception, this can be interpreted as an increasing interaction and mutual influence of the material impulse and the form impulse during the process of aesthetic production. According to our framework of segmental distance, the dynamics of aesthetic play resulting from the form-material interaction can be illustrated by some paradigmatic data examples. Two sentences with dMF = 2 are: “To start the process, I need a high force impact on the clay [material]” and “… how exactly do I design the hollow body [form]?” One sentence with dMF = 1 is: “I took the mass [material] in my hand and shaped it with my hands so that I felt it was round [form].” Finally, one sentence with dMF = 0 is: “At this, an admiration arises that the sphere [material & form] is now hollow [material & form] on the inside and that I can now no longer see this from the outside.” (Note the different coding of “hollow” according to the context.) Moreover, although this interaction dynamic was derived from the whole sample, the diachronic patterns of action phases (Figure 3) suggest that it unfolds in quite individual ways, be it with a varying number of changes or with a more or less balanced distribution of phase types.
Coming to the second hypothesis, physical and mental action could be reliably identified and distinguished in the data, which is also true for their data-driven defined subcodes (Figures 8 and 9). In comparison, it is first noticeable that, while physical action occurs more frequently than mental action in absolute terms (Figure 4), the latter has more differentiated forms (Figure 9). Although, for example, the category “body parts and physical support” could be further split up into gestures performed with specific fingers and movements, this would be limited to this specific task and material. In contrast, the subforms of mental action are independent of these constraints and can in principle be transferred to other forms of aesthetic production. This is a first indication of the different roles of physical and mental action in aesthetic production, which are more closely characterized by their dynamics across the action phases which, as expected, showed some significant changes. While physical action is clearly predominant in the actional phase, mental action exhibits a complementary prevalence, although the latter is substantiated by a lower significance level. And while the lower differentiation of physical action is reflected by a corresponding characteristic at subcode level, a more complex picture emerges for mental action. There are no forms of mental action that are equally pronounced across the three phases, although all forms have a component in the evaluation phase that is relatively stable at a low level. Regarding the other two phases, attention regulation constitutes mental action in the actional phase, and strategic consideration followed by vision formation and decision in process flow constitute the preparation phase. Phase-specific comparisons revealed significant differences for all phases, which, on the one hand, simply reflects the definition of action phases but, on the other hand, supports the more detailed explanation based on subcodes. 
Overall, the second hypothesis can also be considered strengthened.
This suggests that aesthetic production goes beyond sensory perception and reactive behavior by including differentiated forms of intentional (and, as the self-reports demonstrate, potentially conscious) action mediating between formal and material aspects of experience. Thus, this finding draws attention to the agents of aesthetic production, who not only bring external material and certain forms together, but also realize themselves by acting out different dimensions of their creative potentials. In analogy to the object-related connection of form and material, the interplay of mental and physical action can be interpreted as a subject-related expression of dynamic embodiment. This can be seen, on the one hand, from the significant predominance of one form of action in each of the three phases, and, on the other hand, from the fact that both forms of action also occur in the phases that are atypical for them. The convergence of formal and material aspects of experience is thus accompanied or even constituted by an oscillatory change of weaker (mental) and stronger (physical) embodied forms of action. According to the variability of mental action, different degrees of embodiment can even be assigned to its subcodes, in that attention regulation would be maximally embodied, whereas a low or absent component of mental action in the actional phase might indicate minimal embodiment. While, in cognitive science debates, proponents of strong (e.g., Gallagher, 2018) and weak embodiment (Tirado et al., 2018) position themselves against each other, the outlined option of dynamic embodiment has received little attention, except in social cognition, which in this regard shares similarities with aesthetic action (Wagemann et al., 2022).
Against this background, aesthetic production as an intertwining of two orthogonal (object-related and subject-related) forms of interaction can also be related to mimetic learning. In terms of educational anthropology, mimetic processes are characterized as a kind of embodying knowledge which is driven both by “means of copying, ‘wanting to become like’, ‘becoming similar to’” and “expressing, ‘bringing something into being’ or even anticipating something that does not yet exist” (Kraus & Wulf, 2022, p. 9). In view of the clay molding task, the former can be related to the prescribed form (hollow sphere) to be realized in the material, while the latter describes participants’ intention to realize this goal through their own (physical and mental) actions. Beyond all possible forms that are instructive for mimetic or aesthetic action, however, what forms the common benchmark of object- and subject-related interaction is the “feeling of unity” which refers not only to the completed external object but also to “ourselves as aesthetic objects” (Larrain & Haye, 2019, p. 7). This explains why aesthetic production is catalytic for self-development, as Steiner (1965) and Dewey (1934) already put forward, and why it has been deployed in all cultures since time immemorial, especially in educational, therapeutic, and spiritual contexts. While most school curricula today focus on scientific-intellectual content and skills and marginalize art education (Downing, 2004; Eisner, 2002; Sehgal-Cuthbert, 2014), Waldorf education, for example, explicitly takes advantage of artistic and craft activities to strengthen the individual will as a means of self-development and embodiment (Goldshmidt, 2017; Rawson, 2019) and even to foster intellectual capacity through the practice of creative and mobile thinking (Oberski, 2006; Telfer-Radzat & Brouillette, 2021). 
Another example is “Makerspaces” as educational environments in which self-regulated learning is encouraged through mutual transformations between ideas and materials (Hira & Hynes, 2018; Santos et al., 2022); and even in business education, the experience of working with one's own hands is understood as a self-agentive and meaningful way of accessing the world (Crawford, 2010). This cross-curricular dimension of aesthetic production seems to be confirmed by our study and can be explained mainly by the close and dynamic interlocking of physical and mental action accompanying the convergence of formal and material aspects of experience.
From a psychological perspective on aesthetic production, as shown, motivational and volitional aspects of action unfold across diachronic phases which have been limited here to planning, acting, and evaluating. Although the predecisional phase was omitted, which could be criticized due to its importance for thematically open artistic processes, this simplification led to a structural account of the complex cognitive, enactive, and embodying dynamics of aesthetic production. Even without content-related problem-finding and new creation, the criterion of originality (Lubart, 2001) in our setting is partially fulfilled by the individual use of design strategies, manual techniques, and mental actions which, moreover, were not limited to predetermined temporal positions or patterns but independently realized during task performance. This confirms once again that originality does not only refer to a finished, representational artifact but is also a dimension of the aesthetic process experience itself. Furthermore, the structure found can serve as a post hoc justification of the task: although the task did not cover all aspects of art, creativity, craft, and design, it was clearly capable of evoking experiences and actions in the participants that address some of their common aspects. In other words, generalizability across these fields is supported by the dynamic form-material interaction and convergence as well as the complementary role of mental and physical action across the action phases. While the less differentiated physical forms of action were constrained by the clay task, mental action exhibited a greater variety of subforms that are in principle independent of clay work. Therefore, not only context- and material-specific but also general forms of agency in aesthetic production were involved.
To further expand this perspective, future work could implement more open tasks and thus include not only predecisional stages (e.g., problem finding) but, for example, also incubation and ideation as more nuanced preactional stages, which are crucial for creative or artistic performance (Botella et al., 2018; Carson, 1999; Osborn, 1963). If these and other phase models of creativity are aligned to the Rubicon model of action phases, whether in whole or in part, the latter can serve together with our empirical concretization of Schiller's conception as a hypothetical basic structure of aesthetic production. Compared to purely quantitative standard methods, however, this would require a stronger inclusion of the first-person perspective, but without falling back into the opposite one-sidedness of purely qualitative or phenomenological accounts (e.g., Bar-On, 2007; Blumenfeld-Jones, 2016). Rather, we recommend the methodological approach advocated here as a customizable mediation between these two paradigms that can be focused in one direction or the other, depending on the research question. However, we consider it crucial for any mixed-methods approach that it relate as directly as possible to the experience and action of aesthetically productive individuals in order to contribute to a further clarification of the nature of aesthetic production.
Footnotes
Consent for Publication
Participants consented to the publication of their anonymized verbal reports in research contexts.
Data Availability Statement
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical Statement
This study has been approved by the Institutional Board of Alanus University (Campus Mannheim). Participation in this project involved no risks that went beyond the risks of normal life. Participants received information about the study and its purpose and participated voluntarily.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
