Abstract
The assessment of learning in the ethical domain is one of the most complex aspects to attend to in the educational context. In recent years, character education has contributed greatly to different social disciplines, such as education or nursing. However, the development of this approach has run up against several obstacles and limitations, as there is little evidence regarding its long-term effectiveness or its evaluation. This essay aims to identify some of the main difficulties in assessing learning in the ethical domain, as obstacles and possible constraints to Aristotelian-based character education. The methodology is analytical and of a philosophical-educational nature: an argumentative analysis is constructed from a bibliographical review of the contributions of classical and contemporary authors. The results show the existence of four major problems and ten associated subproblems, both of an external nature, linked to the contextual factors of assessment, and of an internal nature, affecting the essence of the process itself, which highlight the difficulty of carrying out assessments of an ethical kind. Far from proposing a pessimistic position, we argue for a realistic vision that allows educators at different educational levels to be aware of the limits and critical points of evaluation in the ethical domain.
Introduction
The growing concern for character education in recent decades has been accompanied by a multidisciplinary analysis that has highlighted different challenges and issues involved in it. 1 Perhaps one of the most important is that of how to assess it, 2 namely, how accurately educators can determine the degree of a learner’s development toward the fundamental learning objective in order to ascertain the progress made, assess the effectiveness of the methods used and discover new areas for improvement. However, although the field of education is used to evaluating, when it deals with character, a set of obstacles arises that calls into question the possibilities of putting it into practice. With a few exceptions and for various reasons,3,4 the desirability of its assessment is not usually discussed. Educators and researchers need to be able to measure if they are to judge and make decisions responsibly. If they were unable to measure or to quantify, they would miss out on a major criterion for understanding reality. Beyond a doubt, measurement helps educators make diagnoses and take the pulse of reality, so the discussion should instead be situated on its possibility, that is, the means and methods available to carry it out with the rigor, validity, and reliability that it demands.
Although not yet very numerous, noteworthy proposals are emerging that address this issue. Among the first initiatives with an all-encompassing character is the well-known manual from positive psychology, proposed by Peterson and Seligman 5 in the Values in Action (VIA) project, which had the dual objective of defining and classifying virtues and character traits, as well as evaluating them using different tools aimed at different age groups. More recently, the Jubilee Centre at the University of Birmingham 6 developed another manual presenting different strategies for character assessment in two categories: Character caught and Character taught, especially focused on UK schools due to its link with the National Curriculum and other official documents. Despite this, the research that is beginning to proliferate on the effectiveness of character education programs tends to include evaluations with a greater or lesser degree of standardization and various methodological approaches – quantitative, qualitative or mixed – which in most cases lack a critical view of the limitations of the method used.
Among the factors behind the trend towards measurement on the ethical level is the emergence of new challenges of a moral kind due largely to the development of technology 7 and its various applications, significant among which is the health field,8,9 as well as a concern for the professionalization and improvement of patient care. 10 Also noteworthy is the unexpected confluence between the concern for scientificity and empirical research on the human reality of psychology and the reflective method of deep inquiry from philosophy, which has brought about a fruitful encounter between psychologists and philosophers. In Snow’s words, ‘The change is this: moral philosophers are now no longer free to ignore empirical science in their work, as had been the norm from time immemorial’, 11 p. 340. This confluence aspires to overcome a relativism that has been assumed by much of psychology and has led to the acceptance of eudaimonic conceptions that allow the shared identification of objective and subjective criteria of flourishing, well-being, and satisfaction. Such is the case of Self-Determination Theory, postulated by Deci and Ryan, in collaboration with Aristotelian-oriented philosophers such as Curren. 12 Along similar lines is a recent project by Wright, Warren and Snow 13 that is also Aristotelian-oriented and based on Whole Trait Theory. Their work directly addresses the challenge of the evaluation of virtue, focusing on its most controversial aspect: its ethical dimension. This entails elements added to the assessment of other virtues or character strengths such as the instrumental ones, 14 or the intellectual ones, which have received greater attention,15,16 reaching prominence in international studies such as the recent proposal for the assessment of creative thinking by the OECD. 17
However, many issues remain unresolved. Moreover, there is neither consensus nor a definitive proposal to meet the demands of ethics education, especially that of Aristotelian inspiration. 18 In the field of healthcare and other areas, 19 this poses significant difficulty for student character formation and the guidance and regulation of practicing professionals. Indeed, alongside the contributions of virtue ethics and its resurgence in the mid-twentieth century20,21 with the work of Anscombe, 22 Carr 23 and Kristjánsson, 2 among others, the Aristotelian conception of virtue has compounded the difficulties of assessment. 4 Rather than detracting from the value of character education, this implies greater responsibility for researchers from different disciplines. Therefore, this article aims to identify the main problems in assessing learning in the ethical domain as obstacles and possible constraints to Aristotelian-based character education, starting with the existing literature and the possible solutions provided. Based on the Aristotelian character education approach, it is necessary to consider the importance of the development of virtues to achieve flourishing and growth in a moral sense. However, this process could be challenging. Some of these difficulties are external and contextual in nature, while others run deeper and affect the very essence of ethical assessment. Addressed below are four of these difficulties and ten associated subproblems, as well as the key arguments of each position and examples for gauging their extent. Far from taking a pessimistic view, the intention is to contribute to a clearer glimpse of a possible horizon in the ethical evaluation of character education, albeit prudently and in full awareness of its limits.
Difficulties in assessing virtue
The obsessive preoccupation with measurement
Nowadays it seems everything needs to be put to evaluation. It is often said that whatever is not evaluated is devalued, perhaps expressing a throwback to the well-known words of the physicist and mathematician William Thomson, Lord Kelvin: 24
In physical science a first essential step in the direction of learning any subject is to find principles of numerical reckoning and practicable methods for measuring some quality connected with it. I often say that when you can measure what you are speaking about and express it in numbers you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind: it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science, whatever the matter may be.
By this logic, anything that has not been measured in numbers must consequently be relegated to the background, among the subjective, the secondary, the questionable, and the variable. It is imperfect, insufficient knowledge, fragile and unsure, its value debatable. In contrast, something easy to measure rises to a higher category: it enters the realm of the objective, the permanent, the indisputable, the sober, and the rational, where everyone agrees and there is no room for doubt. Two plus two is always four, whoever says it, wherever and whenever they say it. It is a universally valid statement, an exact, reliable science, with no surprises, nuances, conditions, or twists. Therefore, to the extent that human beings aspire to regulate their existence and, especially, that which they prize the most, they will be pushing forward to the kind of improvement Kelvin spoke of.
Today there is a primacy of the measurable, the objectifiable and the quantifiable, in what some authors call the age of measurement. 25 Governments carry out measurements at their schools and include them in their legislation with increasing assiduity. Even international organizations such as the OECD measure in order to compare and establish rankings such as the PISA report, guiding the educational policies of states. 26 To them, anything that does not pass conscientiously through this sieve must be rejected as ungraspable, cast aside in the attic of second-hand objects that bear the imprint of someone else’s use. Perhaps due to the powerful postmodern influence of what have been called pedagogies of suspicion, 27 anything requiring human mediation is frowned upon for being indirect, biased or, even worse, manipulated, thus losing its original brilliance and purity, its authenticity and its capacity for truth. Following Kelvin's logic, these phenomena are degraded precisely because they cannot be measured directly and because they require a human interpretation that becomes irremediably suspect of being a malicious, self-interested intervention.
Numbers exude an incomparable aura of neutrality and rationality. They constitute a label with the power to conceal and blindfold, like Hans Gieng’s 1543 statue of justice in the Gerechtigkeitsbrunnen in Bern (Switzerland), considered the paradigm of impartiality, an emotionless void that keeps complementary judgments from diverting the gaze from the main objective. Few have described this situation better than Primo Levi, 28 a prisoner in Auschwitz whose testimony describes how the Nazis used the numbers printed on his clothes as a rationalizing weapon, as a form of nudity and depersonalization, thereby stripping him of his appearance of individuality, and turning him into a mass, one among many in a long line of others. The switch from a proper noun to a number is a change in nomenclature, like that of making a person a subject, and it fulfils various non-neutral functions in very different areas of human life. This is indicative of what can be called the subproblem of the omnipresence of quantification as a suspicion of human intervention.
The concern for measurement also refers to an assessment of the usefulness of things, that is, we measure something to find out how useful it is. What is the capacity of this hard drive? When deciding which computer to buy, a potential customer wonders if the more gigabytes it has, the more useful and valuable it will be. The customer then makes a choice based on that number accordingly. Furthermore, that number lets the customer compare it with other objects or, rather, with its degree of usefulness, in that the object takes second place to its function. However, there are many authors who speak today of a narrow perception of utility, 29 precisely insofar as this concept has been oversimplified by reducing it only to what can be measured and quantified, judging it only by the results or benefits it provides. 30 In other words, utility has become a higher value, giving rise to utilitarianism and its variant instrumentalism, which value things exclusively for what they can provide, for what they produce externally, without considering the intrinsic value they possess in themselves. It is true that external performance is more easily observable than internal performance. However, the fact that there are instrumentalizable objects, which can be treated only as means, does not mean that every object, let alone every person or every human phenomenon, can be assessed only for what it produces externally. In this sense, the mathematician Bertrand Russell 31 also pointed out:
The fact is that moving matter about, while a certain amount of it is necessary to our existence, is emphatically not one of the ends of human life. If it were, we should have to consider every navvy superior to Shakespeare.
And yet, many of the most important things in life resist measurement or utilitarianism; they defy being fully objectified, measured and quantified for various reasons. One reason may lie in the fact that not everything can be objectified or, to put it another way, many human affairs – and especially the ones that are uniquely human – can be seen from different points of view and are therefore incomparable. For them, there is no single valid solution, and some are essentially debatable, such as the fact that there is no single good way of being happy or attaining happiness and fulfilment. Rather, there can be different worthwhile progressions, though without it being possible to affirm the opposite, namely that all forms of life are ethically good.
The second reason is that not everything can be measured or quantified or, at least, not easily, clearly, or accurately. Therefore, something impossible to measure may nevertheless be good; indeed, it may even be the best thing people can have and do. If Kelvin’s premise were taken at face value, an inability to measure some things would necessarily devalue them. Put the other way around, it would mean that the ability to measure something makes it more valuable, which is patently absurd and shows at the same time that the value of something cannot be reduced to the external characteristics of the measuring agent. Its value does not equate to someone else’s ability to measure it. In some cases, it is hidden within itself or inexpressible in numerical terms. Its value may be said to be incalculable. For example, love is not valuable because it can be measured, but precisely because of the opposite. When one loves, one gives oneself to the other without measure. Augustine of Hippo said ‘Love, and do what you will’, which does not mean loving without control or irresponsibly, but that the value of love is so great and encompasses so many dimensions of human life that if one truly loves, if everything is articulated on the axis of love, everything else loses importance: its value is subordinate to that of love, which surpasses everything. Thus, he refers to a difficulty in quantifying, in reducing to quantities or numbers that which has more to do with quality. Although there may be different types of love, no one in their right mind would say to their loved one ‘I love you 75%’ or ‘95%’, precisely because this measurement detracts from the value of such love and undermines it. Moreover, love does not have a usefulness outside of itself; it does not submit to utilitarianism. Bernard of Clairvaux 32 expresses it clearly:
Love is sufficient of itself, it gives pleasure by itself and because of itself. It is its own merit, its own reward. Love looks for no cause outside itself, no effect beyond itself. Its profit lies in its practice. I love because I love, I love that I may love.
This second subproblem may be defined as that of forgetting the intrinsic value of some things. However, this first difficulty for ethical evaluation is of an external nature, i.e., it refers to the context in which the evaluation takes place and not to the action itself. Therefore, though it cannot be said to be completely insurmountable or incapacitating for the evaluator, it is a circumstance to be taken into account because of the risks it entails.
Multidimensionality and breadth of character
A second problem is the complexity of the object of evaluation. Human nature is not a simple or one-dimensional issue, but is composed of different factors that interact dynamically with each other, 19 which some authors have called the integration thesis. 13 From an Aristotelian perspective, it involves cognitive, behavioural and emotional elements, so, to avoid previous errors in the evaluation of moral development, evaluators should assess knowledge, behaviours and affects, although the same instruments cannot be used to do so.
Educators can design a test on theoretical content that demonstrates what their students know about ethics. For example, they can be asked to describe the principles of bioethics, discuss the challenges of human experimentation, or define the idea of an ethically competent nurse. 10 An observation can be made about how students behave, how they solve practical cases 9 and work on teams, 33 how they react to unexpected situations, how they cope with the suffering of others, etc. It is also feasible to have them write an essay about their emotions or affections, desires, admirations, or opinions about potential conflicts between the legal and the ethical; 34 to explain in an individual or group interview why they consider an action fair, which virtues they appreciate most, for which people they feel greater empathy or admiration or with which characters in a story they identify the most. Indeed, without considering these three dimensions, it would be unclear that the matter at hand is ethical character, or that virtue is possessed. 35 Theoretical knowledge of virtue without its corresponding practice proves insufficient and superficial. However, even if virtue is practiced, if there is no conscious and decisive action of the will, an attachment derived from the conscious recognition of its value, the practice is weak and unstable. On the other hand, if a virtue is unknown, even if there is a will to practice it, behaviour is erratic and random, only successful by chance, and the will (volition) progressively weakens from disorientation and frustration. This subproblem could be called the triple dimension of character.
Moreover, different types of virtues may require different forms of assessment. Intellectual, moral, civic or instrumental virtues may require different acquisition times, instruments and assessment agents, and even be assessed in different contexts through different activities. 36 Wright, Warren, and Snow 13 even identify various degrees of exigence for two types of virtues: those of high fidelity and others of low fidelity, in which the threshold of virtue expression by which the person can be considered virtuous varies. For example, critical thinking can be subject to evaluation through an individual activity in the classroom such as reading and commenting on a scientific article or literary text; the ability to cooperate will be difficult to evaluate if it is not in a group setting, with the participation of other students; a moral virtue such as compassion will require facing a situation in which another person needs help, as in a service-learning project carried out off campus. 37 On the other hand, this also becomes problematic considering that, as Bollnow, 38 points out, there is no and can be no closed system of virtue. Rather, they are renewed in each era, taking on new forms as a result of human adaptation to the context, thus giving rise to an ‘unlimited and unsystematizable’ set of virtues. Indeed, there are not many authors who venture to delimit a closed list of virtues, perhaps at the risk of leaving out some essential ones, of being unsatisfactory to most sensibilities, or of falling into Kohlberg’s well-known criticism 39 of character education by proposing a bag of virtues. One of the best known exceptions in this regard is the VIA, 5 which identifies 24 character traits grouped into 6 areas, although it is worth asking why fundamental aspects for character building such as friendship, practical wisdom, generosity, sincerity, fortitude or critical spirit are not included. 
Thus, this can be called a subproblem of the diversity of virtues and their development.
Moreover, to complicate the situation even more, virtues do not always appear so separate and isolated from each other that they can be clearly distinguished. 40 In fact, their development takes place interconnectedly in the person. Some virtues are close to others, feed back on each other in their development, and seem to be necessarily accompanied by others, giving rise to a certain comorbidity. It seems logical to think that a humble person is at the same time grateful and generous, while someone with a good sense of humour is also expected to be creative and witty. At the same time, the opposite effect can also occur: the possession of one virtue can limit the development of another. 41 It is worth thinking of the case of justice, which, while limiting the possibility of being a criminal, can also stifle the development of other virtues such as generosity. People may act virtuously and clearly display a virtue when their main motivation is not that same virtue but another, such as one who acts bravely for the benefit of others, that is, moved by generosity. 4 Thus, virtues cannot be assessed in isolation; rather, the interrelationships between them mean that the character traits being assessed overlap, reinforce or hinder each other. This implies a fundamental methodological problem, in that a basic principle of measurement consists of accurately delimiting the object to be measured; otherwise there is a high risk of attributing causalities to the wrong factors, thereby jeopardising the internal validity of the design. 42 Despite this, a problem can also be found at the opposite extreme: isolated evaluation of the virtues poses a problem for the meta-virtue of practical wisdom, which is responsible for regulating conflicting virtues and gives enough perspective to understand that a lower development of a virtue cannot always be considered negative, as in the non-exceptional case that such a decrease in development is a consequence of the greater growth of another virtue that is contextually more relevant. 43 Thus, this circumstance is called the subproblem of virtues that overlap and interact.
This issue is usually addressed through the proposal of a mixed methodology, with approaches that combine the positivist (empirical-analytical) and phenomenological (naturalistic-interpretative) perspectives. 44 Such methodological triangulations allow contrasting various sources of information with perspectives from different agents, 45 resulting in different types of assessment (self-assessment, heteroassessment, co-assessment, etc.); with the use of individual encounters or group discussions. This problem is apparently less serious than the next two, but its solution is not easy either. It is a problem of availability of temporal and personal resources, because if we had enough time to carry out systematic and periodic evaluations over time, aimed at different dimensions and virtues; if we had the participation of other evaluation agents, as well as standardized instruments for different virtues and accessible to the age of our students, it would be possible to propose an evaluation of character. However, such accessibility rather resembles an ideal and uncommon situation, especially in the educational field, where the abundance of resources is not common, so despite recognizing the possibility of such an assessment, there is no doubt of its realistic difficulty.
The inapprehensible variability and influence of context
The third challenge here involves the inevitable influence of context on character. Although this effect cannot be denied, some authors3,46 are sceptical about the possibility of ethical evaluation, rejecting the very notion of permanent character traits and, therefore, their education, due to what is known as situationism. Their fundamental objection lies in asserting that the link between individuals and their context is so close that the fact that people live in continuously changing spaces means there are no stable, coherent and integrated characters or traits over time, as virtue is usually defined. Rather, character is fragmented, variable and situationally dependent and, consequently, any assessment of virtue will have little value and reliability. Smith, 40 puts it in the following terms: ‘The concentration camp guard is a civilized family man in the evenings but a monster in his work during the day’.
Although this position may have originated in classical ideas such as Rousseau’s myth of the good savage, its advocates draw on more than a few empirical experiments that, in their opinion, show the limited influence that supposed character traits have on human behaviour, in that when people are subject to certain situations, they respond in unexpected ways. One of the most paradigmatic cases usually cited by situationists is known as the Stanford Prison Experiment, carried out in 1971. Its main promoter, Philip Zimbardo, 47 argues that external conditions can turn a person into someone who accepts evil or radically into an aggressor, turning those who resist into exceptional heroes, ‘a special breed’. According to him, this is so because good and evil are not two cut-and-dried categories. Rather, people shift from one to the other depending on the circumstances, regardless of their personality, genetic configuration or family background. Thus, he provides a metaphor that is very illustrative of his approach: the person is more like a vehicle in neutral than one with a gear engaged, i.e., he moves more by the inertia of external forces, like a downhill slope, than by his own internal dispositions of his character, which would explain the disturbing results of his experiment.
Criticism of situationism covers several aspects: 1) its biased interpretation of the research supporting it or its level of scientificity; 11 2) contradictions in demanding an ability to evaluate particular situations and make decisions, which largely resembles a permanent character trait such as practical wisdom; 48 3) many of the experiments used as a basis for support are about exceptional, extreme situations, in which one is subjected to pressure or constraints that rarely appear in everyday life – such as participating in a high-value scientific experiment likely to enable a very relevant scientific breakthrough, finding oneself in a prison as a worker or prisoner, or imagining oneself in a hotel room alone in front of one's youthful love. Moreover, many of the problems and subproblems analysed throughout this article in relation to ethical evaluation are also attributable to the research that supports the situationist thesis, especially the behavioural consideration of evaluation and the neglect of other dispositional dimensions, 49 so that the questioning of the former also serves to question the latter. However, two issues should be considered in ethical evaluation. First, bearing in mind that part of the empirical research carried out points to specific situations in which people seem to act in surprising ways, in clear rupture or discontinuity with their character, it is reasonable to analyse which factors give rise to such situations and what makes people act in one way or another, whether exceptionally well or arguably badly. Second, refuting situationism does not mean denying the influence of context on the character of the person, but rather giving it a non-priority role: considering it, but without granting it all the capacity to decide and determine the human response. This leads to the subproblem of determining the influence of context on human character.
The inaccessibility of the will
The fourth problem refers to the volitional dimension of the person. Assessing what people know is not easy, but there are tools for getting closer to it. Assessing what people know how to do is more difficult, since it requires them to apply skills to a specific situation. In other words, the expression of what people know and what they do has a direct nature. But when it comes to determining what they want, that for which in their heart of hearts they feel admiration or appreciation, the situation is more complex. The difficulty lies in the fact that the will (volition) sits on the plane of the intimate, whose motives are only accessible to the people themselves. Others can only see the individuals’ outward expression, their behaviours and verbalizations, but the reasons that motivate human behaviour have only one owner: the person who performs them, who decides whether to share them or not. Kerlinger and Lee’s 50 definition of latent variables in social research describes this situation, presenting them as formulas created to identify concepts indirectly, because they are inaccessible to researchers and so cannot be measured directly:
We must be cautious, however, when dealing with nonobservables. Scientists, using such terms as “hostility”, “anxiety”, and “learning”, are aware that they are talking about invented constructs. The “reality” of these constructs is inferred from behaviour. If they want to study the effects of different kinds of motivation, they must know that “motivation” is a latent variable, a construct invented to account for presumably “motivated” behaviour. They must know that its “reality” is only postulated. They can only judge that youngsters are motivated or not motivated by observing their behaviours. Still, in order to study motivation, they must measure or manipulate it. But they cannot measure it directly because it is, in short, an “in-the-head” variable, an unobservable entity, a latent variable. The construct was invented for “something” presumed to be inside individuals, “something” prompting them to behave in such-and-such a manner. This means that researchers must always measure presumed indicators of motivation and not motivation itself. They must, in different words, always measure some kind of behaviour, be it marks on paper, spoken words, or meaningful gestures, and then draw inferences about presumed characteristics – or latent variables.
This can be seen from another point of view with an example. If someone walking down the street is asked to participate in a sociological survey and pays attention to the type of questions asked, that person might notice that the first questions are easier to answer and refer to superficial or noncommittal aspects, such as one’s profession, marital status, etc. But the further the respondent progresses through the survey, the more the questions require a more definite pronouncement. The questions may be about politics, religion, economics, as well as the respondent’s own principles and beliefs. These questions require greater exposure and may mean that the respondent’s answer is not completely sincere. Instead, people answer as they think they are expected to answer. This is known as social desirability, 51 that is, people’s tendency to answer what they know will be better valued by the interviewer and avoid unwanted judgments or conflicts with observers.
Ideology, religious beliefs and racism are often sensitive issues to which people do not always respond honestly when asked. They are issues people consider as intimate, ones that define and singularize them, differentiate them from others, separate them from the public being that is mistaken for the mass. They characterize people as specific and identifiable, so they do not broadcast them or make them public without a context or relationship of trust that provides some security. For this reason, researchers often take measures to avoid this type of response, which does not reflect reality but only what respondents think others want to hear. In the classroom, teachers may encounter a similar situation when they propose that their students learn ethical values and, more specifically, when they aspire to evaluate whether their students’ learning has been effective. Evaluation implies knowing something that only the students know, that only they can reveal or, on the contrary, can conceal by revealing what they know their teachers and evaluators expect to hear. For example, just as adult respondents are unlikely to declare themselves racists to a stranger, learners are likely to answer what they know the teachers want to hear, especially when the teacher is the one doing the assessing. MacIntyre 52 finds this the most important problem facing moral educators:
Their central problem with their pupils is how to enable those pupils to pass from pursuing certain particular goals internal to certain types of activity in certain highly specific ways, only or largely because those pupils have recognised that their pursuit of those goals in those ways pleases their teachers, so that they themselves are in turn pleased by giving this kind of pleasure, to pursuing those same goals in those same ways because they have come to appreciate those goals and those particular ways of pursuing them as worthwhile in themselves. That is, their problem is how to enable their pupils to come to value goods just as and insofar as they are goods, and virtues just as and insofar as they are virtues.
In other words, the very attempt at evaluation can not only carry with it the germ of its own invalidation, but also go against the very aims of moral education in Kohlbergian and Aristotelian terms. The situation described by MacIntyre may give rise to a completely contrary and counterintuitive view on virtue with at least three undesirable consequences: 1) a link to Piaget's heteronomous morality or Kohlberg’s 53 premoral stage, by governing one's response by the criterion of authority rather than by that which is considered of higher moral stature; 2) habituation may cause one to interpret as virtue that which satisfies others, whose display pleases them, while vice will be that which causes some kind of discomfort; and 3) reason is placed at the service of appetites and passions, since one thinks and acts accordingly in order to achieve them, giving rise to a reasoning and action that is more Humean than Aristotelian. 52
To avoid these problems, different strategies are often used in evaluation and research, but at the same time, four other subproblems are associated with them:
A) Most of the character assessment tests used today, especially in their ethical dimension, are self-assessment questionnaires. That is, the main source of information comes from the individual's own words. However, even assuming that individuals answer sincerely, this does not mean that their answers correspond to reality, because an individual is not only his or her self-concept. In other words, no matter how much I say that I am a duck, even after deep reflection and with absolute conviction about it, that does not necessarily make me a duck, but expresses only a misperception about myself. 2
Moreover, this type of testing may imply a problem in relation to the criterion for assessing virtue, by considering only subjective elements of well-being and disregarding objective indicators of flourishing. The former tend to be hedonistic life-satisfaction accounts, which are hampered by a failure to differentiate pleasures such as the pleasure experienced by drug use from others of a greater scope. 54
It may also entail acceptance of a life with low aspirations and easy access that makes do with mediocre development of human capabilities, or the voluntary or involuntary identification of mistaken criteria. An example of this is the use of the BBC well-being scale 55 to gauge self-perceived well-being among nursing students in the UK. 56
This instrument, which assesses psychological well-being, physical health and well-being, and relationships, reveals that nursing students perceive high levels of well-being. However, researchers warn that the factors contributing to well-being need to be examined in more detail through qualitative studies, because quantitative assessment alone is not enough to generalize the results. Thus, its acceptance in educational terms is questionable unless objective indicators of flourishing are included to provide greater rigor to assessments of virtue. This might be called the subproblem of errors in self-description.
B) Some of the ways to reduce social desirability in questionnaires are not entirely applicable to the field of education. One such way is anonymity: in most cases the evaluator’s objective is not only (or not primarily) to assess the character development of a group, in which case there would be no need to know the respondent’s name. Instead, education leans toward personalization, that is, toward helping each person in a particular way in their development process. For this, it is necessary to know that person well and to identify their individual, non-transferable progress. In addition, the field of research often uses quantitative surveys to be able to compare subjects, define averages and perform complex statistical analyses. Although this may have some utility in some teaching practices, moral response is generative and requires open-ended questions that give room for moral imagination, reflection and contextualized deliberation. 8,36 Measurement of such aspects, however, is not automatic or direct and rarely if ever yields an exact number on a scale.
This transposition between the educational and research spheres is also found in the problem of suspicion about human intervention. In this sense, Biesta 57 points out that ‘education should be understood as a moral, noncausal practice, which means that professional judgments in education are ultimately value judgments, not simply technical judgments’. Thus, causality in the pedagogical context, as a human and social context, is neither completely predictable nor linear, in the sense that A always causes B. It cannot be governed by a completely linear or behavioural logic. Rather, it is usually circular, where A causes B and B causes A, usually in the presence of C and D, and where empirical evaluations are not exact but rather provide estimated margins of uncertainty. These margins demand the virtue of practical wisdom in the teacher, whose intervention is not only not wrongful, biased or malicious, but inevitable and necessary. Thus, the issue here is the subproblem of the confusion of objectives between academic research and educational evaluation.
C) Another reason why evaluation of moral virtues is so complex is the intimate nature of ethics. Moral virtues could be understood as the will to think, feel, and behave 58 in accordance with what we recognize, as far as possible, as good in our society. Anything normative always involves a personal commitment directly addressed at how individuals live their life meaningfully in practical rather than theoretical terms. Consequently, such aspects are not easy to delimit and agree upon. Moreover, any attempt by an outside agent to uncover and assess that intimacy may be conceived as an unacceptable intrusion, making it not so much a matter of how difficult such an evaluation is, but of how wise it is to try. Carr 59 refers to professional ethics, distinguishing between general principles of professional ethics or deontological standards on one level, and personal values and virtues or aretaic norms of excellence on a second, higher level. The former are necessary in all professions, while the latter are more exigent, as they concern not only professional performance, but also a certain continuity between professional and personal life. According to Carr, nursing and education belong to the group of professions that require the second ethical level, which adds a further difficulty to the task of identifying specific ethical values in a plural and democratic context.
This does not mean that it is not possible. As C. S. Lewis 60 states, at the heart of every human being in different cultures, there is a seed that, when it blooms, allows us to discover the good from among the bad, what makes us human from among what perverts us as persons. In any case, it is still a controversial issue and the origin of a new subproblem derived from it: the fact that many schools attend to and evaluate intellectual or instrumental virtues more frequently than specifically moral ones. This circles back to Kelvin's statement, when he observed that things that are not evaluated tend to deteriorate, to disappear, if only because of the perverse effect of focusing on things that are easier to evaluate. This has consequences for the definition of the very purposes of education, 25 as it mistakes the part for the whole and relegates the educator to the role of technician in a technocratic model that does not deliberate on the possible aims of education, but only on the most effective and efficient means, as the former are already predetermined by the possibilities of evaluation itself. For Biesta, 57 precisely because of the moral dimension of education, this is not only a strictly educational problem, but also a democratic one, and he asserts that neither researchers nor educators should accept closed definitions of problems or educational goals, but rather take them as hypotheses that can change through the process of reflection and the specific needs of the context. This gives rise to a new subproblem that can be called the dependence of the contents to be taught on what can be evaluated, or in other words, the identification of what is teachable with what is strictly evaluable.
D) Implicit in this form of assessment is a narrow view of character education, which focuses more on outcomes than on processes, where what is relevant seems to be what the students end up believing and thinking rather than the process they have gone through to believe and think that way. That is, the focus is put on external behaviours rather than on the reasons behind them, which can bring about two negative effects. On the one hand, it can lead to misunderstanding the significance of actions, leading students to accept as good and desirable, for example, an apparently positive behaviour that is nonetheless performed for perverse and selfish reasons. 52
This conception moves closer to a sort of consequentialist ethics, which is a priori suggestive because it seems to offer something along the lines of an exact principle of quantification of ethics. 61
However, the fundamental problem of this ethics lies in the impossibility of quantifying everything, and in the awkwardness of a criterion that ethically accepts that murdering one person would be somehow better than murdering two, solely and exclusively because one is a lower number. Indeed, suppression of the Aristotelian distinction between poiesis (production with an external result) and praxis (production with an internal result) leads to an essentially instrumentalist way of living. On the other hand, when the role of the learners’ freedom, intelligence and will (i.e., practical wisdom or phronesis) takes a back seat to what is expected to be evaluated, the concepts of teaching and training clash as two opposing and incompatible ideas. In short, this leads away from the ultimate meaning of education, getting dangerously close to indoctrination. 44
Thus, this may be called the subproblem of giving priority to observable results over processes of ethical reflection.
Conclusions
After the problems described above, it may seem that teachers are at a dead end, left with little choice but to give up entirely on assessing ethical learning. The Aristotelian approach ushered in new language and a solid philosophical structure, supported at the same time by a psychology that has taken steps towards interdisciplinary confluence. However, the solidity of Aristotelian character education is matched by its complexity, with highly exigent demands that imply equally exigent problems for evaluation. This article has analysed four of them along with 10 associated subproblems that, until now, had not been systematically compiled in previous research.
The analysis shows that at present the assessment of character, particularly its ethical dimension, is at a lower level of development than its theoretical study and practical application. As such, it is one of the major challenges for researchers and teachers from different levels and disciplines. Consequently, it appears that third-level evaluation is where researchers and educators must focus their attention. Third-level evaluation refers not to the degree of quality but to the degree of specialization required to assess character in its entirety. As character is not directly accessible for comprehensive evaluation, it is necessary to focus on the virtues that compose it, where existing external or internal limitations identify areas difficult to discover, such as the will. Therefore, it is necessary to go a step further and focus attention on specific dimensions of virtue. This is the educator’s task.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
This work was supported by the Universidad Internacional de La Rioja (Research Project (nº 4158719) ‘Dialogue for a democratic education in plural societies: proposals between liberal and character education’ (Art. 83 UCM-UNIR); Research Group ‘El quehacer educativo como acción’) and Universidad Complutense de Madrid (Research Project (nº RTI2018-095740-B-I00) ‘Development of a predictive model for the development of critical thinking in the use of social networks’).
