Abstract
This article historicizes “rigor,” “discipline,” and “systematic” as inventions of a certain rational spirit of the Enlightenment that was radicalized during the 19th century. These terms acquired temporary value in a transition during the 19th century when a culture of research was established within a modern episteme. Beginning in the 20th century, this development was perceived as problematic, triggering criticism from philosophy and the arts, and even within the sciences. “Discipline,” “rigor,” and “systematic” have changed meanings over time, and recent contributions from digital humanities are promising for a renewed critical debate about rigor in research. Both digital humanities and quantitative research deal with big data sets aimed at providing a large-scale analysis. However, unlike most quantitative research, digital humanities explore uncertainties as their main focus. Attention to human-machine collaboration has led to more expansive thinking in scientific research. Digital humanities go further by advancing a metaperspective that deals with the material hermeneutics of data accumulation itself.
This article consists of two main parts: first, we historicize rigor by problematizing the definition of rigor, examining the particular case of validity, and distinguishing uncertainty from risk; second, we elaborate an alternative framework for making judgments about rigor and discipline, namely Digital Humanities/History of Education, which is an example of New Materialist approaches to educational research.
Challenges: historicizing rigor
In this section we take an analytical perspective to examine characteristics of rigor, discipline, and systematicity as those terms are currently being used in educational research. We call this approach “historicizing” because we do not seek essential definitions of the terms (i.e., rigor, discipline, systematicity). Rather, we seek to examine how the terms function in discourse that is specific to time and place. Instead of aiming to establish the most accurate, most practical, or most parsimonious definition of the terms, we aim to articulate the tensions, contradictions, and diversity of frameworks that constitute the meanings of the terms as they are being used in educational research literature. This section consists of three parts: mapping the diversity of meanings of the term rigor, examining the specific case of validity, and drawing a distinction between risk and uncertainty.
Mapping the diverse meanings of “rigor”
Perhaps it should come as no surprise that there are many different definitions and conceptualizations of rigor (and an even greater diversity of meanings of discipline). Every definition of rigor (as a criterion for good quality research) we found was vague, using words that are open to a wide range of interpretation. The term most often used as a synonym for rigor is “trustworthiness”; other descriptors include thorough and systematic. In their encyclopedia entry for Philosophy of Education, Phillips and Siegel noted that in the case of much US-funded research the criteria for rigor were not even established by researchers: “The definition of ‘rigorously scientific’ … was decided by politicians and not by the research community, and it was given in terms of the use of a specific research method” (Phillips and Siegel, 2015). Older definitions of rigor tended to emphasize fidelity to methodological purity (i.e., some version of a “scientific method”). However, for the most part, those overly reductionist conceptualizations of rigor have been rejected (see, e.g., Siegfried, 2015; Snow, 2015).
Smith refers to Johnson (1985) to illustrate how philosophy is locked into a basic paradox as it occurs in many academic disciplines where data are absent: [p]hilosophy’s self-definition relies on a claim to rigor that is subverted by the literariness of its rhetoric of truth, but it is precisely that literariness that turns out to be the very model for philosophical rigor. Philosophy is defined by its refusal to recognise itself as literature; literature is defined as the rhetorical self-transgression of philosophy. (Johnson, 1985, p. 76)
Many of these definitions of rigor seem to understand rigor as the adherence to a systematic method. However, even the values of a rigorous method are not consistent across various literatures. In some cases it is necessary to test hypotheses using the “scientific method,” but in other cases rigorous research does not require hypothesis testing. There are versions of rigor that depart from previous methodologically framed approaches: poetic relevance (Smith, 2008) and relevance to practice as rigor (Gutiérrez and Penuel, 2015; Stronach, 2007).
Smith (2008) makes the case that philosophical rigor has been shaped by poetic sensibilities, in spite of philosophy’s disavowal of non-rational argumentation. Drawing from Derrida’s deconstruction and Wittgenstein’s philosophy of language, Smith (2008) argues that language is irreducibly metaphorical. Therefore, rigor cannot reasonably be treated as purely logical: “sometimes—perhaps often—what is effective is careful, sensitive attention to how language is being used, rather than anything that can be characterised as a system or method.”
The case of validity
Validity is an example of a criterion that has been used to make judgments about rigor in educational research. Validity has served as a cornerstone for quality control and evaluation of research designs and reports. The main purpose of statistical modeling and analysis has been to minimize threats to validity posed by various relationships of research instruments to inferences and claims. In this section, we focus on the history of validity as a specific example that illustrates issues of rigor. Specifically, we call attention to the conflicted relationship between validity and reliability as a case that helps us problematize validity as a criterion for measuring rigor in current educational research (Fendler, 2016).
In the United States, the generally recognized authoritative statement on validity for educational research is the AERA Standards for Educational and Psychological Testing (1985) that specified three categories for validity, namely construct, content, and criterion. The 1985 AERA Standards were based largely on the work of Cronbach (1969, 1971) and Messick (1980), who continued to develop and publish refinements to theories of validity through the 1990s. Although Cronbach and Messick are recognized as giants of validity theory, Moss (1992) provides an analysis that shows how their respective theories of validity do not always agree. Moreover, there have been theories of validity other than those of Cronbach and Messick in the social science research literature (Baker, 2013; Kadir, 2008 esp. chapter 2 on history of validity; Moss, 1992). In general, Messick tended to defend a more objectivist (i.e., philosophical realism) ontology for validity, while Cronbach’s stance tended to be more pragmatic. According to Moss (1992): For Messick, it seems that adverse social consequences are a source of invalidity only if they are a result of construct invalidity, whereas for Cronbach, adverse social consequences, in and of themselves, call the validity of a test use into question. (Moss, 1992: 236)
Moss (1992) shows that the meaning of validity vis-à-vis reliability is affected by differences in researchers’ stances on realism, positivism, and pragmatism. In psychology, the meaning of validity is also affected by researchers’ commitments to idiographic or nomothetic approaches to research (Karson, 2007). Idiographic psychological approaches study what makes people unique and are therefore focused primarily on (internal) validity issues. Nomothetic psychological approaches, in contrast, study what makes people the same; it is in nomothetic approaches to educational research that the issues of the validity–reliability tradeoff are most salient.
In addition to the conceptual variations across schools of thought, the meaning of validity has also been affected by changes over time in the role and importance of statistics in social science research. In general, educational research protocols have moved away from direct observation and replication, and toward the development of models for making inferences: “In the social sciences, statistical tools have changed the nature of research, making inference its major concern and degrading replication, the minimization of measurement error, and other core values to secondary importance” (Gigerenzer and Marewski, 2015: 423). The increasing importance of inference as a tool of analysis for educational research has greatly elevated the attention paid to reliability as a criterion for evaluating research and has, in turn, changed the relationship between validity and reliability in research designs.
In a longer-term historical view, statisticians generally agree on two big shifts in the role of statistics in the history of the social sciences: the “probabilistic revolution” and the “inference revolution” (Baker, 2013; Gigerenzer and Marewski, 2015; Moss, 1992; Siegfried, 2015). The probabilistic revolution (conventionally located in the mid-19th century) refers to a shift away from earlier deterministic and mechanistic assumptions about the world to allow instead for change, chance, and mutation (isomorphic with Darwinian evolutionary theories). This 19th-century episteme rejected religious determinism and instead favored statistical models of probability (such as the bell curve) for interpreting the natural world.
A century after the probabilistic turn there was another shift: inference then became the highest priority for analysis in social-scientific studies. The inference revolution (conventionally located in the mid-20th century) refers to an overall shift in scientific research from idiographic toward nomothetic studies. That is, scientific research had previously been focused on the study of individual organisms (idiographic) as specimens of interest. With the inference revolution, however, more emphasis was placed on drawing generalizations—inferring patterns—from individual studies to make more generalized claims about the world (nomothetic). Inference is the primary consideration in all educational research except for studies that make no attempts to generalize, such as case studies and humanities-oriented studies (i.e., philosophy, history, and arts-based inquiry). After the inference revolution, it became more usual to speak in terms of “sample sizes” and “representative samples” as methodological protocols in science. For the most part, the possibility that we can generalize (i.e., draw inferences) from small sample sizes is now taken for granted as an acceptable scientific research method. As Gigerenzer and Marewski wrote: The qualifier inference indicates that among all scientific tools—such as hypothesis formulation, systematic observation, descriptive statistics, minimizing measurement error, and independent replication—the inference from a sample to population grew to be considered the most crucial part of research. (Gigerenzer and Marewski, 2015: 425)
The inference revolution signaled a new relationship between validity and reliability in the social sciences because inference began to take on greater prominence among all the various analytical tools deployed in educational research to establish credibility. As inference has taken on more acceptability, other scientific methodological values have decreased in importance, including observation, measurement, and replication.
But it is Gigerenzer and Marewski’s next sentence that more dramatically conveys the magnitude of the change ushered in by the inference revolution: [The shift to inference] was a stunning new emphasis, given that in most experiments, psychologists virtually never drew a random sample from a population or defined a population in the first place. (Gigerenzer and Marewski, 2015: 425)
If it is true that prior to the inference revolution “psychologists virtually never drew a random sample from a population or defined a population in the first place,” then previous educational and psychological research would have been concerned not with reliability but with measurement accuracy, internal validity, and replicability as the salient criteria on which to evaluate research designs and reports (i.e., an idiographic approach to research). Later, however, when inference became the most important factor in the analysis of research data, a new dimension of concepts and classifications was developed on which to formulate those inferences. The social-science research literature agrees uniformly that inference-based nomothetic-oriented educational research has become the dominant approach, and so definitions of rigor and discipline tend to align with assumptions of inference-based nomothetic-oriented research approaches.
In sum, prior to the inference revolution, social-scientific research had been concerned primarily with idiographic research criteria of internal validity, observational acuity, measurement accuracy, and replicability. There was little importance attached to extrapolation of findings from one context to another, and so reliability was not a major issue. After the inference revolution, however, the value of direct observation was overshadowed by attempts to use inference to generalize from one context to another on the basis of a relatively small sample size, a movement that underscored the need for standards of reliability (external validity). In this rendition of history, we can see that reliability is a relatively recent addition to the criteria for good research. Reliability became increasingly important in the 20th century in the wake of the inference revolution, and the tradeoff between validity and reliability has grown increasingly dynamic since the mid-20th century.
It is relatively easy to see how validity and reliability function together as a tradeoff in quantitative educational research. However, it is also worth noting that similar and parallel values are built into most qualitative research designs in education. When qualitative research protocols attempt to address credibility and trustworthiness, the aim is quite similar to the efforts to establish validity and reliability in quantitative research. According to the popular textbook on qualitative research by Lincoln and Guba (1985), the four criteria for establishing trustworthiness in qualitative research are:
1. Truth value: How can one establish confidence in the “truth” of the findings of a particular inquiry for the subjects (respondents) with which and the context in which the inquiry was carried out?
2. Applicability: How can one determine the extent to which the findings of a particular inquiry have applicability in other contexts or with other subjects (respondents)?
3. Consistency: How can one determine whether the findings of a particular inquiry would be repeated if the inquiry were replicated with the same (or similar) subjects (respondents) in the same (or similar) context?
4. Neutrality: How can one establish the degree to which the findings of an inquiry are determined by the subjects (respondents) and conditions of the inquiry and not by the biases, motivations, interests, or perspectives of the inquirer? (Lincoln and Guba, 1985: 290)
In this widely cited passage, we can see parallels between the epistemological values in quantitative and qualitative research criteria: “truth value” and neutrality correspond with validity; applicability and consistency correspond with reliability. In this way, the ethical issues in the tradeoff between validity and reliability pertain to both quantitative and qualitative projects in educational research. Even idiographic social-science research may aim for generalization when insights from individual cases are meant to contribute to more general knowledge about people and phenomena related to education.
Distinguishing risk from uncertainty
Statistical reasoning in educational research has played an increasingly important role in judgments about rigor and systematic thinking in recent decades. In many cases, when researchers talk about rigor and systematicity, they are referring to statistical precision, the suitability of statistical modeling, and claims of statistical significance. In this section we call attention to the conceptual difference between risk and uncertainty because rigor, discipline, and systematicity mean one thing in the context of risk and an entirely different thing in the context of uncertainty.
For this section, we rely heavily on the theorizations of Gerd Gigerenzer (Director of the Center for Adaptive Behavior and Cognition, and Director of the Harding Center for Risk Literacy at the Max Planck Institute in Berlin). Gigerenzer (2016) summarizes the distinctions between risk and uncertainty this way:
• Risk: We know all possible alternatives and know the probabilities associated with outcomes. In this case, “fine tuning pays.” Risk pertains to the scenario in a casino.
• Uncertainty: We cannot stipulate all the possible alternatives, and we do not know the probabilities associated with outcomes.
For the most part, educational research has treated the human-educational world as if it were a situation of risk and not a situation of uncertainty. For example, test scores and demographic factors are used to label a child as being “at risk” because statistical analyses posit a correlation between demographic features and school readiness. Thinking in terms of risk is related to thinking in terms of inference; that is, assumptions are made about individual children on the basis of aggregate statistical analysis, rather than on the basis of the specific conditions of that individual child’s actual life.
Educational research has been designed, conducted, and evaluated on the assumption that we know all the possible alternatives and know the probabilities associated with outcomes because of aggregate statistical analysis. However, Gigerenzer (2016) suggests that, in educational settings, “We cannot stipulate all the possible alternatives, and we do not know the probabilities associated with outcomes.” If we accept that education poses a scenario of uncertainty, then statistical rigor is irrelevant. Within the assumption of a risk scenario, on the other hand, the conceptualization of rigor aligns with the criteria for evaluating good statistical reasoning.
From this distinction, we see that almost no aspect of the human world conforms to situations of risk in which we would be able to account for all the variables and foresee all the possible outcomes. Rather, virtually every facet of the social world has the characteristics of uncertainty. Therefore, definitions of rigor that align with statistical reasoning (appropriate for situations of risk) are not appropriate for the human world, which aligns with situations of uncertainty. The examples of validity and the distinction between risk and uncertainty constitute serious challenges to current conceptualizations of rigor, discipline, and systematicity in educational research.
Digital humanities: challenges to conceptualizations of rigor, discipline, and systematic thinking
This section looks at some new digitally inspired paradigms in the humanities while stressing historical changes in the constructions of rigor and discipline. We explore how the digital humanities can be seen as creative endeavors that deal with big data in horizontal, additive, and subversive ways that expand the idea of disciplinary order, rigor, and systematic hierarchies in educational philosophy and history of education.
Sociological and cultural turns in the 20th century challenged previous approaches to objective or “fact-based” history that tended to essentialize chronology, causality, and the archive (Priem and Fendler, 2016). The sociological turn created an atmosphere within the humanities that retained something of von Ranke’s rigorous and extensive search for historical facts and his sympathy for rational argumentation. It created historical facts through statistical reasoning and structural approaches while focusing on an analysis of hierarchies, political and social systems, stratification patterns, and governance structures within societies. In contrast, the cultural turn, with its orientation toward anthropology as one of the leading disciplines, had a strong influence on the humanities. It was mainly the rise of micro-history that fundamentally questioned an established belief in the powerful effects of societal systems, including causal structures and related chronologies. With the more recent rise of the digital humanities there seems to be another shift, accompanied by challenges that often refer to the black box of micro-history and cultural studies.
One example of historiography from the digital humanities is The History Manifesto by Jo Guldi and David Armitage (2014). The book starts by looking back to a paradigm of writing history that was established by the French Annales School (founded in 1929), which also inspired the sociological and cultural turns in historiography. The Annales School put its focus on geographies and ecologies beyond national borders and simultaneously stressed the longue-durée in historiography. The Annales School history of mentalities led to major shifts in the historical disciplines: a shift from the center to the margins, from political to social history, and from the dominance of the center to peripheries and marginal groups. It also led to some surprises, as previously ignored sources—such as tax or demographic records or letters, especially family correspondences, biographical, literary, and visual sources—were elevated in status and were now seen as meaningful historical documents. The Annales School conducted quantitative-serial analyses of these sources. In this way, historiography was capable of breaking down the traditional ordering of events by time period in terms of political history and traditional hierarchies. Guldi and Armitage’s (2014) book shows that the traditional approach to the ordering of events has been replaced, and that the traditional criteria for rigor and systematicity are no longer applicable. In this way epistemological shifts in historiography contribute to reconceptualizations of the notions of rigor, discipline, and systematicity.
Guldi and Armitage (2014) gained much inspiration for their book by looking back and connecting the digital turn to the longue-durée perspective and serial analysis advocated by the Annales School. The authors demonstrated how studies on climate change, governance, and inequality profit from analysis of huge data sets that have been collected internationally (by national governments and bureaucracies, international political organizations, and various non-governmental organizations) and gathered in non-digital and digital ways. In some cases, data have been collected from outside of academia and official archives. According to Guldi and Armitage (2014), these accumulated big data sets offer tremendous opportunities for creative thinking and research within the historical sciences and, in addition, could enhance the humanities’ importance for influence in political decision-making.
In Guldi and Armitage’s view, the investigation of transnational, trans-institutional, and trans-organizational big data from a longue-durée perspective would provide the world with “a big picture” (Guldi and Armitage, 2014: 21). This history from a bird’s-eye perspective draws upon the analysis of masses of accumulated data in dimensions never before imagined. It is a shared vision of the digital humanities that by means of digitization, with critical and systematic application, new digital technologies of data analysis make it possible to offer a large-scale synthesis of the past that would be of public and political relevance in the present. A “roadmap” (Guldi and Armitage, 2014: 21) achieved by digital technologies of data mining and corresponding ways of data visualization would be informed by a globally mapped and reinvented past that relates to the present and, as such, has a role in political decision-making. At the same time, the humanities would be able to (re)gain political power and moral authority, and be able to position themselves at center stage in a globalized world: Longue-durée history allows us to step outside of the confines of national history to ask about the rise of long-term complexes, over many decades, centuries, or even millennia: only by scaling our inquiries over such durations can we explain and understand the genesis of contemporary global discontents. What we think of as “global” is often the sum of local problems perceived as part of a more universal crisis, but the fact of aggregation—the perception that local crises are now so often seen as instances of larger structural problems in political economy or governance, for example—is itself a symptom of the move towards larger spatial scales for understanding contemporary challenges. These challenges need to be considered over longer temporal reaches as well. In this regard, the longue-durée has been an ethical purpose.
It proposes an engaged academia trying to come to terms with the knowledge production that characterises our own moment of crisis, not just within the humanities but across the global system as a whole. (Guldi and Armitage, 2014: 37)
Analyzing big data from a longue-durée perspective implies a different way of reading the historical archive. It means engaging in distant reading. Distant reading is a term used by scholars of digital humanities to describe their work when analyzing big data sets (Moretti, 2013). Contrasted with symptomatic reading, distant reading illustrates a materialist epistemology for research. Symptomatic reading assumes that meaning is veiled or hidden underneath a text; it is an approach that seeks underlying reasons for why a text appears the way it does. According to Best and Marcus, symptomatic reading “encompasses an interpretive method that argues that the most interesting aspect of a text is what it represses” (Best and Marcus, 2009: 3). Symptomatic reading is de rigueur in scientific and humanities research, including reading to detect ideology and other critical analyses. In contrast, distant reading attends to what is visible and traceable in a huge data set; the shift to distant reading is an example of the material turn in social theory. The aim of distant reading is not to detect the hidden meanings of a text, but rather to take note of the manifest features of a text and its physical interrelationships with other texts.
Distant reading asks digital-humanities scholars explicitly not to extract meaning by interpreting individual documents or to engage in a symptomatic analysis of source materials, but rather to stay at the surface of documents and data in order to discover their entanglements and the dynamics of history against the grain of chronological political history. To stay distant from or at the surface of documents is to reject structuralist assumptions about the distinctions between langue and parole, between base and epiphenomena. The rejection of structuralism’s two-tiered reality is a hallmark of New Materialisms, which view practices as material constituents and objects of study. The rejection of structuralism effects changes in the meanings of rigor, discipline, and systematicity.
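To make the contrast concrete, a distant-reading pass can be sketched in a few lines of code. The miniature corpus, the choice of raw word counts as “surface features,” and the cosine measure below are our own illustrative assumptions, not a method taken from Moretti or from Guldi and Armitage; the point is only that documents are described and compared through traceable surface features rather than interpreted individually.

```python
# Toy sketch of "distant reading": no single text is interpreted; each
# document is reduced to surface features (word counts) and documents are
# related to one another only through those traceable, material features.
# Corpus contents and feature choices are illustrative assumptions.
from collections import Counter
import math

corpus = {
    "doc_a": "the school reformed its curriculum and the curriculum changed",
    "doc_b": "the curriculum reform changed the school",
    "doc_c": "letters and tax records became historical sources",
}

def surface_features(text):
    """Count word occurrences: a purely surface-level description of a text."""
    return Counter(text.lower().split())

def cosine(f, g):
    """Similarity between two documents based only on shared surface features."""
    shared = set(f) & set(g)
    dot = sum(f[w] * g[w] for w in shared)
    norm = (math.sqrt(sum(v * v for v in f.values()))
            * math.sqrt(sum(v * v for v in g.values())))
    return dot / norm if norm else 0.0

features = {name: surface_features(text) for name, text in corpus.items()}

# Pairwise similarities expose interrelationships across the whole corpus
# without any symptomatic interpretation of an individual document.
sims = {
    (a, b): round(cosine(features[a], features[b]), 3)
    for a in corpus for b in corpus if a < b
}
for pair, score in sorted(sims.items(), key=lambda kv: -kv[1]):
    print(pair, score)
```

Even in this toy form, the output ranks documents about curriculum reform as closely entangled and the document about letters and tax records as distant, a relational pattern arrived at entirely from the surface of the texts.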
The growing flow of information and data in the digital age seems to call for new tools to interpret and analyze information; this shift is analogous to the invention of the printing press, which made it possible to distribute information in new ways. Indexes, bibliographies, and catalogs offered new ways of structuring and tracing texts and images at their surfaces and as (material) appearances (Guldi and Armitage, 2014). Digital technologies of data analysis offer new devices to make historical sources and data accessible and readable, at both micro- and macro-levels of analysis. Digital technologies (including crowdsourcing), according to Guldi and Armitage, allow the creation of thematically focused “timelines and spatial maps” while comparing different data sets. They allow for identifying “characterizing universal trends,” focusing on “counterfactuals and suppressed voices,” distant reading of “large bodies of text,” and finally for measuring and aggregating “changes over decades and centuries” (Guldi and Armitage, 2014: 92–93). In sum, digital technologies and the synthesizing of big data enable researchers “to question standard narratives of modernity” (Guldi and Armitage, 2014: 94) and to “interweave narratives borrowed from other places” (Guldi and Armitage, 2014: 94–95). Digital history and digital storytelling contribute to a breakthrough in challenging persistent and dominant definitions of rigor and discipline that had previously been based on structuralist and symptomatic readings. Not only Guldi and Armitage but also other protagonists of digital history claim that historians, with their knowledge of source criticism and archiving, are well placed to take the lead and assume a prominent role in big data analysis, partially because historians are trained not to shy away from complexity and incompatibility, and to put various kinds of archival data (including objects and images) on the same plane in a non-hierarchical way.
In this way, authors such as Guldi and Armitage have presented the possibility of a new kind of rigor and systematicity that are made possible by digital humanities.
A prominent example is CLARIAH (Common Lab Research Infrastructure for the Arts and the Humanities: www.clariah.nl/en/about/about-clariah), a Dutch-based digital research infrastructure that connects and offers access to large collections of digital data and enables researchers to use innovative digital tools to explore and process these data. The CLARIAH team is multidisciplinary, consisting of researchers and IT specialists, in order to find best-practice solutions for data curation and for linking data and applications. The project collaborates with all Dutch humanities institutes, with university and national libraries, and with heritage institutions. It addresses various fields in the humanities: literary studies, history, media studies, and linguistics. The types of data offered are texts, images, audiovisual materials (radio, television, and film), and structural data. Unfortunately, access to CLARIAH currently is offered only to Dutch-based institutions due to copyright issues. A smaller, but equally ambitious, multinational project with partners in Switzerland (Ecole Polytechnique Federale Lausanne and University of Zurich) and Luxembourg (University of Luxembourg) is IMPRESSO, funded by the Swiss National Science Foundation (Media Monitoring of the Past: Mining 200 Years of Historical Newspapers: https://impresso-project.ch). The project links newspaper archives and explores new avenues of critical text mining of newspaper corpora by developing new devices for network analysis and data visualization. Similar to CLARIAH, the IMPRESSO project team consists not only of historians but also of designers, archivists, and computational linguists. Both projects have huge potential for educational research, as they offer interlinked digital archives to explore, for example, the mediation of educational themes and the public use of educational media (e.g., educational films and radio productions).
Digital humanities projects usually work in teams that gather a multitude of expertise, and despite the huge optimism of the digital turn, the large scale of research has led to some challenges. Digital humanities projects have to deal with technical obstacles when working with big data of differing natures and origins. They have to take into consideration shortcomings in digital literacy and the lag in computational know-how. Finally, digital humanities scholars may suffer from insufficient experience with, and avoidance of, experimental approaches to digital analysis (Cavalé et al., 2017). In addition, such optimism often not only creates mistrust but also contributes to digital myths.
Inspired by Ed Finn’s (2017) What Algorithms Want: Imagination in the Age of Computing, we would like to discuss how an understanding of scientific rigor and discipline, validity and reliability, risk and uncertainty, and their power and meaning within educational research, might change in the digital age. Several issues are at stake:
Digital humanities, just as much as language (oral and in print) and mathematics (including statistics), operate at a level of abstraction and create tensions between reality and the description of reality (i.e., the crisis of representation).
Digital humanities, like numerical representations of reality, establish “mythical connections between numbers, universal truth, and the fundamental structure of reality” (Finn, 2017: 2). They nurture human fascination with complexity, are often considered to offer new pathways to finding universal truth and knowledge, and, finally, affect the ways humans interpret reality, and how they think and act in the world.
Digital technologies are “cultural machines” (Finn, 2017) that constantly blur boundaries between human cognition, culture, and technology as is the case for traditional technologies and apparatuses (Herman et al., 2017; Ihde, 1975). The digital humanities address and map old problems of rigor and discipline that actually go back to the encyclopedists during the Enlightenment era and the invention of measuring and testing apparatuses within the sciences.
In order to make productive and creative use of digital tools, Finn argues, it is necessary to develop digital literacy and attentiveness to the gaps between digital representation and reality. This implies that researchers look at the translation of ideas and theoretical frameworks into computational systems and critically analyze how the implemented computational mechanisms work. Finn (2017: 47–52) sees “implementation” at the core of the translation of ideas into digital systems, as it points to the discrepancies or frictions that occur when human cognition interacts with technology. To deal with this fundamental gap, research will need to develop digital literacy and specific methodological tools that reflect on human-machine interactions in creative ways and open new avenues to imaginative thinking that reach far beyond human experience. He writes: Computational systems are developing new capacities for imaginative thinking that may be fundamentally alien to human cognition, including the creation of inferences from millions of statistical variables and the manipulations of systems in stochastic, rapidly changing circumstances that are temporally beyond our ability to effectively comprehend. (Finn, 2017: 55)
According to Finn, the collaborative work of humans and machines in the digital age transforms traditional ways of knowledge creation, establishes new forms of authorship, and redefines the criteria for rigor and discipline. A productive and critical transformation of knowledge in the digital age implies several changes in research practices. It means that new research:
minds the gap between ideas and implementation;
addresses contradictions and insecurities;
remaps data according to new micro and macro scales;
documents the messiness and experimental character of research through hyper- or paratextual layers that also allow readers to follow and evaluate our trails as researchers in an ever-moving “ocean” of growing facts and information.
What marks the difference between traditional understandings of rigor and discipline and computational experimentation are several new commitments: to reflect on the collaboration of humans and machines, to maintain a critical focus on translation and implementation, and to explore the imaginative horizon offered by digital tools. These commitments also imply that the materiality of scientific papers as we know it is becoming obsolete (Somers, 2018), as new publication practices are forced to take into account the changing, dynamic, and procedural aspects of knowledge production: Scientific methods evolve now at the speed of software; the skill most in demand among physicists, biologists, chemists, geologists, even anthropologists and research psychologists, is facility with programming languages and “data science” packages. And yet the basic means of communicating scientific results hasn’t changed for 400 years. (Somers, 2018)
Conclusion
The conventional meanings of rigor and discipline do not apply to research in the digital humanities, as those terms imply a systematic and hierarchical production of knowledge. Instead, the digital humanities, with their experimental approach, focus on the operational processes of research while exploring new horizons. Digital humanities involve more than numerical operations; they change our notion of knowledge production in a way that acknowledges the indispensability of machines in knowledge production. Digital humanities can be brought into proximity to uncertainty in research, and they also mark a break with the “inference revolution” because they do not generalize from small data sets (samples) across different contexts; rather, the expanded storage and analytical capacities of digital humanities can extend across constellations and contexts. In educational research we would be able to declare openly that we do not know all “relevant alternatives, consequences and probabilities” (Volz and Gigerenzer, 2012), because we would be working across many diverse data sets. With that diversity, we could work creatively with the contradictions and inconsistencies that have been produced and have accumulated in previous configurations of our field. For example, philosophers of education could work with hypertextual associations and layers, and demonstrate how they think, select, and prioritize; and historians of education could map their sources with digital network analysis to detect their entanglements, interconnectedness, and trajectories across time and space.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
