Abstract
Like all other branches of science, psychology is constantly faced with the task of establishing some level of preliminary consensus over term use, methods, and findings, while at the same time leaving open the door to challenging and revising that consensus when necessary. Suboptimal solutions in this regard are likely to lead to inefficient use of resources, and to impede the cumulative acquisition of knowledge. The current culture and incentive structure in academia does not sufficiently encourage systematic consensus work, and there are no clear guidelines as to how this crucial kind of work should be approached. We present a tentative roadmap intended to facilitate systematic consensus building processes (CBPs). It contains a long and diverse list of issues that researchers undertaking consensus work may want to consider before and during a CBP. In discussing these issues, we point to potential complications and offer recommendations on how to avoid them. This mostly concerns questions of communication, transparency, fairness, integrity and legitimacy. We assume that dealing with these topics explicitly and from the get-go will substantially increase the likelihood that a CBP will be successful.
Introduction
The editor of Personality Science, Jaap Denissen, invited us to provide an outline for how systematic consensus building processes (CBPs) in science may be organized. This invitation is accompanied by the express intention to reserve room in the journal for reporting on such initiatives (Denissen, 2024). We are humbled and grateful for being entrusted with this important task and present our thoughts on the issue in the following.
According to classic (e.g., Fleck, 1935; Kuhn, 1962, 1990; Platt, 1964) and more recent (e.g., Longino, 2020; see Oreskes, 2019, for an overview) epistemological approaches, the cumulative development of knowledge by scientific means is viewed as an inherently social process. In order for this process to be fruitful, it is often necessary that those involved in it form some kind of consensus over (a) important research goals, (b) the terminology they use for various concepts, (c) informative scientific methods and standards (e.g., regarding measurement, or the pre-processing, analysis, and sharing of data), (d) established empirical findings (e.g., knowledge about phenomena, causal relations, effect sizes), (e) plausible theories including their formal specification and derivable predictions, and (f) the degree to which theories are corroborated by empirical findings. Based on the scientific evidence, a consensus may also pertain to (g) practical recommendations (e.g., as to how certain diseases should best be treated).
On the other hand, it is just as crucial that all parts of a consensus continue to be tested as critically as possible and revised if necessary. Thus, the scientific endeavor thrives best when there is a good balance between consolidation and scrutiny (e.g., Popper, 1935; see also Kuhn, 1962, 1990; for a more anarchic approach see Feyerabend, 1993). Obviously, in order to be able to scrutinize or challenge an existing consensus, it is necessary to first know what that consensus is.
Leising et al. (2022a) summarized various methodological arguments from a pro-consensus viewpoint and argued that CBPs needed to be improved and encouraged in personality psychology. The following arguments may be made in favor of systematic consensus building:
First and foremost, systematic consensus building is likely to promote the efficiency of science in several ways. For example, when different researchers agree with each other on a set of shared research goals, their terminology and/or their methods, results from different studies may be more easily compared with one another. Also, datasets may be pooled, which will enable analyses with better statistical power that account for a greater range of potentially relevant factors. Big Team Science will almost always require some consensus building (e.g., Paul et al., 2022). Moreover, when an issue is considered to have been settled, resources may be redirected toward other goals (e.g., the number of papers published on the issue at hand will decrease). All of this is clearly in the interest of those whose investments sustain the science system (e.g., taxpayers).
Second, consensus building is necessary to be able to document progress in science. For example, when authors claim that a certain view they consider to be wrong is held by many in the field, such a claim may not as easily be dismissed (e.g., by reviewers) when a consensus document actually shows this to be the case. As a consequence, a paper showing how this “common misconception” may be corrected will have greater value. More generally speaking, documenting what views are held by whom (and by how many) in a field at different points in time will make it possible to investigate whether, how quickly, and in what direction that field moves.
Third, systematic consensus building enables more trustworthy communication with the public, as researchers may base their claims as to what the current state of knowledge in a field is on evidence (e.g., the results of a poll) instead of on their subjective judgment, which can be biased by their own position. Likewise, educational materials (e.g., textbooks) may gain credibility when the sources that they rely on contain express statements as to how consensual certain views actually are in a field.
Several valid criticisms were raised in open peer commentaries on Leising et al.’s target article (https://doi.org/10.5964/ps.9227). These highlighted potentially negative side effects (Asendorpf & Gebauer, 2022), and the risk that weakly justified consensus may become a problem in and of itself (Hilbig et al., 2022; see also Oreskes, 2019, for many examples where consensus went astray in other sciences). In response, the proposal was revised and further specified (e.g., by accounting for challenges to existing consensus more systematically) (Leising et al., 2022b).
These discussions showed that, while attempts at consensus building are viewed with more skepticism with regard to some domains (e.g., research goals), for other domains they are almost unanimously seen as positive (e.g., terminology). Hence, even for colleagues who are skeptical about the universal promise of consensus building, it may be worthwhile to at least consider the topic. The purpose of this paper is to provide colleagues interested in pursuing consensus building projects with suggestions as to how such a project may be conducted successfully.
Consensus building processes: Examples
In many branches of science, systematic CBPs were successfully conducted in the past. An established framework for consensus building is the so-called Delphi technique, initially developed by Dalkey and Helmer (1963). It typically features multiple rounds. In the initial round, experts complete surveys designed to gather detailed information on the subject matter at hand (e.g., relevant concepts, data, hypotheses). Subsequent rounds involve reviewing and evaluating these items—typically through rating or ranking—to identify preliminary priorities and areas of consensus or disagreement. Panelists may be prompted to explicitly articulate their reasons for dissent (“specify the reasons for remaining outside the consensus,” Pfeiffer, 1968, p. 152) or to revise their judgments. The process continues with additional iterations as necessary, depending on the level of consensus that is aimed for.
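The per-round agreement check at the core of such a procedure can be sketched in a few lines of code. This is a minimal, purely illustrative sketch: the 9-point rating scale, the 75% agreement threshold, and the function name are our own assumptions for the example, not part of the Delphi technique's specification, and real Delphi studies define consensus criteria in many different ways.

```python
# Illustrative sketch of a Delphi-style per-round consensus check.
# Assumed (hypothetical) parameters: a 9-point rating scale, where an
# item reaches consensus when at least 75% of panelists rate it 7-9.

def consensus_reached(ratings, lower=7, upper=9, threshold=0.75):
    """Return True if the share of ratings within [lower, upper] meets the threshold."""
    in_range = sum(lower <= r <= upper for r in ratings)
    return in_range / len(ratings) >= threshold

# Ratings from one (fictitious) round, five panelists, two items:
round_1 = {"item_A": [8, 9, 7, 8, 6], "item_B": [3, 8, 5, 9, 2]}
results = {item: consensus_reached(r) for item, r in round_1.items()}
# item_A: 4/5 = 0.80 in range, so consensus is reached;
# item_B: 2/5 = 0.40, so the item is carried into the next round.
```

Items that fail the check would be fed back to the panel, together with the distribution of ratings and any stated reasons for dissent, for re-rating in the next round.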
Within psychology, examples of CBPs using the Delphi technique include (a) the development of an agreed-upon set of characteristics differentiating clinical from nonclinical levels of fear of cancer recurrence (Mutsaers et al., 2020), (b) the derivation of expert opinions on the choice of a screening tool, antidepressant, and psychological therapy in palliative care (Rayner et al., 2011) and (c) the establishment of shared views on characteristics of psychological constructs such as wisdom (Jeste et al., 2010).
Many other examples of consensus building efforts from the research literature in psychology and other fields showcase the willingness of scientists to engage in this crucial type of work. The extent to which elements of the Delphi technique were used varies between these examples.
In the early phase of the so-called “replication crisis”, an expert group involving researchers with different methodological expertise issued a paper listing benchmark criteria for evaluating the quality of empirical research (Asendorpf et al., 2013a, 2013b). The social psychology division of the German Psychological Society (DGPs) then developed specific methodological recommendations for empirical research in their own field. These recommendations were first generated by a task force of researchers with different perspectives on the topic and then accepted by member voting (93% positive votes). A follow-up poll confirmed the acceptance of these recommendations as methodological standards (Glöckner et al., 2024a).
In working memory research, a consensus process was conducted to establish benchmark findings that any theory in this field should be able to account for (Oberauer et al., 2018). Researchers with different theoretical viewpoints jointly developed a list of particularly important findings and then collected votes from the members of their consensus group and from a larger sample of researchers in the field.
In research on social evaluation and stereotype perception, proponents of five prominent theories joined an adversarial collaboration (Abele et al., 2021). They managed to integrate various aspects of their theories with one another, and to develop a joint framework for future research.
In research on distraction in visual search, a consensus process concerning terminology took place, involving researchers with divergent theoretical perspectives (Liesefeld et al., 2024). The process revealed idiosyncratic uses of various important terms in this research area. By jointly sharpening definitions of the most relevant constructs, these researchers enabled a more efficient communication between proponents of different theoretical positions, as well as shared specifications of measures for some of the most relevant constructs.
In medicine and clinical psychology, CBPs are common practice when it comes to the development of treatment guidelines. In Germany, the scientific medicine portal (https://register.awmf.org) lists all treatment guidelines that were developed by different professional societies and associations with the explicit aim of achieving consensus. These guidelines include endorsement ratings for each recommendation listed (e.g., Schäfer et al., 2022). Members of professional societies have the opportunity to submit dissenting votes that are transparently documented within the published guidelines.
In recent years, researchers working in the field of personality psychology have become increasingly interested in consensus building work, as well. A significant number of papers published in the two journals of the European Association for Personality Psychology (EAPP) report on the outcomes of their efforts. These papers concern a variety of thematic domains including overall conceptual foundations and research agenda, multi-site collection of data according to a shared protocol, overviews of key robust findings, and important open questions (Back et al., 2023; Baumert et al., 2017; Bleidorn et al., 2021; Letzring et al., 2021; Paul et al., 2022).
Lack of consensus in science: Possible reasons
Although many examples of successful consensus building like the ones just reported do exist, this type of work is still relatively rare in many branches of science, including psychology. In fact, the lack of consensual terminology and methodology in psychology has been showcased and bemoaned for decades (e.g., Anvari et al., 2024; Block, 1995; https://eiko-fried.com/measurement-schmeasurement/). Here, we will give a brief overview of possible reasons for such insufficient consensus development.
Problematic aspects of the contemporary research culture may stand in the way of consensus building: For example, incentives for this type of work may simply be too weak. Consensus building projects tend to be effortful and time-consuming. In a publishing culture that puts emphasis on the quantity of an individual’s research output, the expectable payoff of one’s investment in such a project may thus be unattractively low. Also, as long as individual achievement is valued the most, it may hurt one’s career chances to “only” contribute to a CBP as a member of a relatively large group (e.g., hiring committees may interpret a candidate’s involvement with consensus building work as evidence of a lack of creativity). Furthermore, we are not aware of any funding schemes accommodating the specific needs of CBPs.
Whereas these reasons for insufficient consensus development may be seen as relevant for many or even most branches of science, it may be asked whether the field of psychology has some unique impediments in this regard, and what these impediments may be. One such impediment may arise from the sheer complexity of the issues that are being studied in psychology (Sanbonmatsu & Johnston, 2019). Working toward consensus on complex matters may simply be more difficult, effortful, and risky than reaching consensus on more simple matters. Another likely reason why psychologists seem to have a hard time reaching consensus on substantive issues is their lack of consensus regarding more basic issues: concepts, terminology, and methodology. If concepts and terms are used and operationalized in a rather unstandardized fashion, determining the current state of knowledge becomes a lot more difficult. In response to our target paper, Corker (2022) even argued that conducting meta-analyses may be fruitless, for this very reason. There is just no way around it: In order to be able to build a reliable knowledge base in a cumulative fashion, scientists need common standards. With the present paper, we hope to be able to encourage our colleagues in the field to take up this crucial work in their respective area.
Disclaimers
As a next step, we present our tentative roadmap for consensus building processes. In designing it, we drew from various sources: (a) well-documented examples of consensus processes from the literature, including discussions of difficulties that may arise in the course of such processes (Oreskes, 2019; Zachar & Kendler, 2012; Zachar et al., 2016), (b) our own personal experiences with such processes (Boessel-Debbert et al., 2023; Constant et al., 2023; Gaspelin et al., 2023; Letzring et al., 2021; Liesefeld et al., 2024; Pavlov et al., 2021), (c) published advice on managing group processes (e.g., Gattrell et al., 2024; Hsu & Sandford, 2007; Janis, 1982), and (d) common sense.
To avoid misunderstandings, a few additional remarks concerning the goals and limitations of the current project may be in order: First, the recommendations we give for organizing CBPs are, of course, not intended to be binding for anyone. In fact, we expect that only subsets of our recommendations will actually be applicable to most such projects. Thus, we merely offer suggestions. Those involved in consensus work may adopt those that they like and ignore the others. Second, the outcomes of CBPs are not thought to be binding for anyone who has not explicitly declared that they embrace (some of) them. Third, any scientific consensus is always preliminary. Consensus may and will continue to be challenged, and a proper consensus process should in fact anticipate and even encourage this.
The roadmap
Documentation
We recommend using a shared folder for storing relevant materials, and giving every member of the consensus building group (CBG) equal access and editing rights to that folder. To avoid a diffusion of housekeeping responsibility, it may be advisable to determine whose duty it is to keep the content of the folder up-to-date. For fairness, it may also be good to rotate that responsibility among group members from time to time.
The CBG will also have to make a decision as to whether - or at which point - the project folder will become publicly accessible. In the interest of maximum transparency, we recommend using a repository like the Open Science Framework (OSF) for this purpose, where all documents can receive a time-stamp, and where it is possible to keep the content of a folder private until a decision is made to allow for public access.
Protocol
It will be helpful to document the progress of the CBP online. For this, we provide a protocol template in the Appendix to this paper. The template lists a variety of issues that may be relevant to any given CBP. In the following, we briefly discuss each of these issues. The first thing to go in the project folder may be this protocol describing some of the major parameters of the CBP. Generally, we recommend specifying everything that needs to be specified as soon as possible. As the project progresses, the folder may be filled with brief updates on the outcome of steps that have been completed.
Initiation of the process
A CBP may be initiated in one of two ways, broadly speaking. One may be called the “grassroots” or “bottom-up” approach. Here, a CBP would start with the decision of a relatively small number of researchers to jointly engage in consensus building. The initial outcome of such a process would probably be a rather “local” consensus that only involves a few researchers from a few institutions. Still, such an outcome may be viewed as perfectly legitimate, and it may be very helpful to advance cumulative knowledge building (e.g., in the long run). It is also likely that this type of consensus project may be completed rather quickly, because with small numbers of participants the need for coordination is more limited.
The other approach may be called “top-down”. Here, one would aim for broader community involvement and wide-spread acceptance of the emerging consensus. As a consequence, the process may better be organized by some official organization, such as a learned society. This comes with a number of important structural advantages (e.g., listservs of potentially interested participants already in place, democratically legitimized leadership, resources). However, there also is a certain risk that such consensus initiatives may be perceived as politically tainted (e.g., as being rooted in the personal agendas of particular researchers in leadership positions). We recommend considering both approaches to be legitimate in principle. Regardless of which of the two routes is taken, the origin of the “first spark” of a CBP should be documented.
Composition of the initial consensus building group
The (group of) person(s) who initiate(s) the CBP does not have to be identical with the group of persons who later actively engage in the CBP. For example, the process may be initiated by the board of a learned society and then an open call for participation may be issued to all members of that society. Alternatively, a task force could be set up by invitation.
We recommend that the names of the members of the initial CBG as well as the criteria and methods by which they were recruited should be documented. There is nothing wrong with bringing together a group of researchers who just happen to attend the same conferences and/or work on similar topics. It is, however, desirable to openly acknowledge if that was how the composition of the initial CBG came to pass.
Note that putting together a CBG will necessarily involve multiple acts of exclusion, as well, even if those may take place largely outside everybody’s awareness. Those who do not become a member of the CBG will have no or only little influence on the outcome of the process. One innocent reason why that may happen is that sometimes people working on similar issues are just not aware of each other’s work, or even of each other’s existence. Other reasons for limited or non-participation might be conflicting schedules, time zones, lack of funding, or family commitments. Whenever possible, these obstacles should be considered to avoid or at least reduce avoidable exclusion.
Note, however, that means exist to make the boundaries of a CBG somewhat permeable: Later in the process, additional members may apply for membership, or be asked to join (see “joining the group” below). Also, the group may receive counsel from other experts without necessarily making these CBG members (see “consulting external experts” below). Furthermore, when a CBP has produced certain outputs, it is possible (and recommendable) to invite members of the wider community to consider which of these they would endorse, or to provide critical commentaries.
Diversity is likely to be a relevant issue with regard to group composition, especially in regard to the legitimacy and broad acceptability of the outcome of CBPs (Oreskes, 2019). On the one hand, this concerns the theoretical stances and methodological expertise of the CBG’s members. On the other hand, it may concern demographics. For example, in a recent project analyzing the factors enabling and promoting unethical behavior in academia (Boessel-Debbert et al., 2023), the commission was intentionally set up such that individuals from all ranks of the academic hierarchy (from undergraduate student to retired professor) were represented. This is particularly important when the conclusions to be drawn from the CBP may have different real-life consequences for members of different groups.
Not all CBPs have to aim for broad outcome acceptability, however - at least not in their early stages. Sometimes, the sole goal of such a process may be to agree on some shared standards regarding experimental design or data analysis, to facilitate comparisons between future studies conducted by different teams of researchers.
Distribution of power
We recommend making a few choices early on in regard to how the distribution of power in the group shall be handled. This concerns questions like the following: (a) who sets the agenda for the group’s meetings, who schedules and who moderates them? (b) who writes the summaries of meetings?
Assuming that many CBGs probably begin the process as like-minded fellows on rather cordial terms, it may appear unnecessary to even talk about issues like these. Generally speaking, questions of power tend not to be openly dealt with by most people. However, based on our own experiences with this type of work, we recommend going the extra mile and addressing them, to avoid unforeseen complications and frictions in the process later on. The outcome of the respective discussion should be documented in writing, too. Note that assigning certain roles to certain individuals may also help counter the risk of responsibility diffusion. Moreover, it may be advisable to rotate these roles among the members of the CBG from time to time.
A somewhat special role in this regard is that of the “speaker” for the group. In science (as in politics), the word “speaker” is often used synonymously with “head” or “leader”, but here we literally mean the person(s) who communicate(s) on behalf of the group with people outside the group (other researchers, media, scientific societies, academic institutions, funding agencies etc.). Several solutions in this regard are conceivable. Either (a) a person is elected to speak for the group, or (b) all statements on behalf of the group have to be signed off on by all members, or (c) everyone may speak for themselves at any time. Regardless of which route is taken, the decision should be documented in writing.
External moderation
A possible way of mitigating undesirable effects of power dynamics in a CBG is to use external moderation. Here, a third party (i.e., a single person or a group of persons) with no stakes of their own in the CBP oversees the entire process or just particularly sensitive parts of it (e.g., votes). The more contentious the issues at hand are, the more advisable it may be to go down that road. External moderators should not have closer personal or professional relationships with some group members than with other members. Ideally, external moderation would be provided professionally, and be paid for. This cost may be accounted for in advance, as part of a grant proposal.
Initial aim and agenda
The initial aim and agenda for the CBP should be put in writing, as part of the protocol. We suspect that most such aims and agendas will contain a number of explicit questions that the CBG is supposed to answer (e.g., “What shall the term XYZ be used for?”, “How shall data of this type be prepared for statistical analysis?”, “How can we integrate the various existing theories into one common framework?”, “How strong is the evidence in favor of this alleged effect?”). A CBG may also specify some sort of common tool that the process is supposed to yield (e.g., an R-script, or a measure).
An important tradeoff to consider is that between the scope and the depth of the consensus that is aimed for. In the early stages of consensus building, it might be worthwhile to accept a somewhat more superficial result (e.g., in terms of detail and precision), as the experience of jointly reaching a first (sub-)goal may help increase trust and cooperation within the group (Schäfer et al., 2022). This in turn may make it more likely that existing differences between individuals or groups of researchers can be constructively dealt with (cf. Van Assche et al., 2023, for a meta-analysis), even when the issues at hand later become more complex and/or contentious.
Timetable
The protocol may contain a preliminary timetable including milestones along the lines of the roadmap that we present here. As a rule of thumb, we recommend aiming for the completion of a first version of the consensus document within a year. The timetable may also specify responsibilities (i.e., who will be involved in which activities).
Agreeing on means of communication
In our experience, the availability of communication means is not as much of a problem as the sheer multitude of them is. At an early stage in the CBP, the group thus needs to decide which channels will be used for (what type of) communication. For some group members, this may mean they will have to get acquainted with new online tools.
A CBG needs little more than (a) email accounts for every member and a shared email listserv for announcements, (b) a tool for regular online meetings, (c) another tool for collaboratively developing documents, including the consensus document, and (d) a tool for scheduling meetings.
Holding in-person meetings is not really necessary for a CBP to work. In our view, the function of doing this would rather be to foster good, trusting relationships among group members. If in-person meetings are wanted, we suggest having at least one at the beginning of the process (a kick-off meeting) and one at the end of it, when the question of output endorsement is on the table. To reduce travel cost and environmental burden, it may be reasonable to schedule such meetings to coincide with the dates of major conferences that will be attended by many CBG members anyway.
Accessibility needs to be accounted for when in-person meetings are planned. For example, it may be more difficult for persons from certain geographic areas to reach the location of the in-person meeting, and it may be more difficult for researchers from certain demographic groups (e.g., parents of young children) to attend a meeting that involves overnight stays. Again, avoidable exclusions should be avoided, to the extent possible, as they may (be perceived to) harm the legitimacy of the process and its outcome.
Defining inputs and outputs
We will use the abstract label “inputs” to denote the information that is fed into the consensus building process, and the abstract label “outputs” to denote the results of the process (typically a consensus document). A CBP may have several different outputs (e.g., a standard R-script and a paper explaining it).
For example, when the goal of a CBP is to determine a consensual estimate of the approximate size of some effect, then a likely method for producing that output would be meta-analysis, and the inputs would be effect sizes obtained by individual studies. When, however, the goal of a CBP is to streamline the measurement practices within a certain field, then the output would be (a limited number of) more consensual measurement practice(s), and the inputs would be various measurement practices that were previously employed.
Collecting inputs
The protocol may specify who will be involved in collecting the inputs, how much time will be spent on collecting inputs, and how the search will proceed. The use of different methodological approaches for collecting, evaluating, and integrating inputs is recommended (Oreskes, 2019, p. 143). Typical ways of collecting inputs are (a) calls for nominating inputs via the listserv of a learned society, and (b) using a set of specified search strings for a systematic search in literature databases.
Evaluating inputs
The question of which inputs shall be permissible is of crucial importance for any consensus process, because the outcome can only be as strong as the input from which it was derived. Therefore, the criteria by which inputs are evaluated as permissible need to be explicated - ideally in advance. With pre-registered meta-analyses, this is already standard. Various tools and checklists are available for assessing the quality of individual studies (for an overview, see Luchini et al., 2020) and for examining and dealing with publication bias (e.g., Marks-Anglin & Chen, 2020; McShane et al., 2016).
Given that the primary subject of psychology is human experience and behavior, which tend to be heavily influenced by people’s cultural backgrounds, CBGs should pay attention to likely biases in their collection of inputs. For example, the vast majority of published studies in psychology relies on samples of undergraduate students from Western countries, which may impose significant constraints on generalizability (Henrich et al., 2010; Klimstra & McLean, 2024). If possible, more diverse inputs should thus be included. Constraints on generalizability should also be made explicit when the consensus is published (see below).
Attention also needs to be paid to possible self-serving selection of inputs in the service of the CBG members’ political or professional interests (e.g., Blashfield & Reynolds, 2012). It should thus be explained how the procedure will help avoid an over-representation of the group members’ own work among the inputs. For example, members of the CBG may agree to suggest no more than three self-citations each. Another solution would be to aim for a certain (low) percentage of self-citations from group members overall.
Determining when enough input has been gathered
Both the amount and the breadth of inputs that need to be considered in the course of a CBP will always be limited. It is thus recommendable to specify the criteria by which it will be decided whether “enough” input has been collected. The answer depends strongly on the overall goal of the CBP. For example, if the goal is to derive a consensus that should be acceptable to many people working in a field, then the different views that are held by people in the respective field should be represented in the sample of inputs. If, however, a small group of experts has the goal of developing a shared R-script to be used for a certain type of data analysis, then only a small number of (e.g., 10) research papers exploring varieties of that analysis procedure would be needed as input.
Joining the group
There should be a formal procedure in place for joining the CBG. This is to avoid a situation where individual members just “bring their friends along” and then the group grows over time with no formal demarcation of who is entitled to vote and who is not. A simple version of such a formal procedure would be to require written applications by candidates and then having a vote on these applications (requiring either a unanimous or a majority decision).
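The two decision rules just mentioned can be made concrete in a few lines. The sketch below is purely illustrative; the function name and the encoding of votes as booleans are our own assumptions, and a real CBG would of course also have to decide how to handle abstentions, quorums, and tie-breaking.

```python
# Illustrative sketch of the two admission rules mentioned above
# (hypothetical helper; votes are encoded as True = in favor).

def application_accepted(votes, rule="majority"):
    """Decide on a membership application under a 'majority' or 'unanimous' rule."""
    if rule == "unanimous":
        return all(votes)
    return sum(votes) > len(votes) / 2  # simple majority of votes cast

# Example: 4 of 6 members in favor passes a simple majority vote,
# but the same tally fails under a unanimity requirement.
```

Whichever rule is chosen, documenting it in the protocol before the first application arrives avoids the appearance of rules being made up ad hoc to admit or exclude particular candidates.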
Consulting external experts
If it is deemed necessary, a CBG may always call on experts from outside the group to provide additional input (e.g., Boessel-Debbert et al., 2023). This may be planned from the outset, or be added to the protocol later on. We recommend acknowledging the involvement of external experts, including the extent of their involvement, in the final consensus document.
Updating the agenda
In our view (and experience), it is highly unlikely that a CBP will proceed perfectly as planned, without any additions or changes to the agenda being made along the way. Often, the need to address additional issues will only become obvious after some of the group’s work has already been accomplished. For example, it may become apparent in the process that a consensual definition of concept X will first be needed, or that empirical evidence regarding issue Y first needs to be collected, before the group’s work on the original agenda may continue. We recommend explicating the process by which the CBG’s agenda may be revised, should the need for doing so arise.
Avoiding groupthink
Groupthink means that the members of a group prioritize harmony and conformity in the group over critical thinking and independent judgment in the decision-making process. Diverging viewpoints may thus not be raised to begin with, or be dismissed as coming from less competent or moral individuals (Janis, 1982). To counteract this problem, a sufficiently (self-)critical discussion culture needs to be established and upheld within the CBG. The expression of dissent should be explicitly and repeatedly encouraged. Before stopping the process of collecting inputs, it should be double-checked whether there has been enough room for diverging views to be voiced, heard, considered, and weighed (see Oreskes, 2019). Also, one member of the group may take on the role of a devil’s advocate.
Resolving disagreements
Even if the specification in the CBP protocol may help anticipate and possibly avoid many sources of unnecessary friction in the process, disagreements may still arise unexpectedly. These may concern (a) the parameters of the process (e.g., whether more expert testimony is needed, or who would qualify as an expert), or (b) issues pertaining to the actual subject matter (e.g., whether the evidence in favor of position X is strong enough). We recommend specifying a procedure for dealing with the first type of disagreement (e.g., by simple majority vote). For the second type of disagreement, a systematic analysis of who disagrees with whom over what and why may be in order. The Visual Argument Structure Tool (VAST; Leising et al., 2023) has been designed for that express purpose.
Figure 1 displays a very simple example of how the tool may be used for outlining areas of agreement and disagreement between researchers. Clarity over the type and form of disagreements is a necessary prerequisite for being able to (possibly) resolve them. Such graphical depictions become more useful the more complex the subject matter is.
Figure 1. VAST display (Leising et al., 2023) expressing a consensual assessment of the current state of a scientific discourse between two persons. Daniel and Susanne (analysts) agree (a) that they both think the results of Study 1 are reason (r) to believe that Y is the case, and (b) that Susanne, but not Daniel, thinks that the results of Study 2 are reason (r) to believe X is the main reason for Y. The coefficients to the right of the “IS” symbols reflect the strength of the displayed beliefs. The coefficients to the left of the “IS” symbols reflect the certainty with which each person holds those beliefs. “FIMM” stands for “finger-is-moon-mode”, in which the distinction between concepts and their verbal labels is abandoned.
Leaving the group
It should be possible for any member to opt out of the process at any time, without having to give reasons for that decision. Any member leaving should, however, have the right to express their reasons for doing so. The CBP protocol should specify the format in which this would take place: Departures and reasons for departures could either be mentioned in the final consensus document, or they may be documented in the CBP’s online folder.
Excluding someone from the group
Excluding someone from the group during an ongoing CBP is a difficult step that will hopefully rarely be necessary and that always requires good reasons. However, if the goal of the CBP would be significantly jeopardized if a certain exclusion did not take place, and if softer conflict resolution attempts have not been fruitful, few other options may remain.
In such a scenario, it is critical to ensure that the process is transparent, fair, and respectful to all involved. We suggest beginning by clearly documenting the reasons for potential exclusion, linking them directly to the impacts on the consensus project’s goals. Next, we suggest contacting all members of the group, preferably in a meeting dedicated to this issue, to discuss the situation openly and give the individual concerned a chance to respond. If a decision to exclude is reached, it should be done with a significant majority and followed by a formal communication to the individual, explaining the decision respectfully and outlining any possible next steps or appeals they may pursue.
Deriving the output
The means by which the outputs (one or more consensus products) shall be derived should be explicated in advance, too. Whereas elements of the aforementioned Delphi technique are commonly used in this context, this is of course not mandatory and other approaches are possible and legitimate. Large-language models and machine learning may have a part to play in this regard (see below).
An important distinction can be made between the means that are used in the process, as opposed to the means that are used at the end of the process, to legitimize the final product (i.e., the consensus output). As for the process, the group members’ viewpoints and arguments should be exchanged and resolved in open discussions. A culture of mutual trust, respect for different opinions, and modesty in the CBG is crucial for this to work out. Attempting to establish an atmosphere in which it is assumed “that I can be wrong, that you can be right and that perhaps together we will get to the truth” is likely to foster such processes (Popper, 2003, p. 182, translated). However, certain issues may be too complex and/or contentious to be properly dealt with in discussions. These might require the exchange of (possibly: long and detailed) written statements.
As for the final outcome, an approach that lies close at hand and has been used successfully many times for this purpose is to hold a vote (Gattrell et al., 2024; Zachar & Kendler, 2012). In such a case, it needs to be made clear whether a unanimous vote on the consensus as a whole is aimed for, or whether the goal is simply to assess the number of CBG members who endorse the individual elements of the consensus.
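The element-wise option can be illustrated with a minimal tally script. Everything in this sketch — the member names, the element numbers, and the function itself — is our own hypothetical illustration, not part of any established CBP procedure:

```python
def endorsement_summary(votes: dict[str, set[int]]) -> dict[int, float]:
    """Compute the share of CBG members endorsing each consensus element.

    `votes` maps each member's name to the set of element numbers they
    endorse; the result maps each element to its endorsement rate."""
    elements = sorted(set().union(*votes.values()))
    n_members = len(votes)
    return {e: sum(e in endorsed for endorsed in votes.values()) / n_members
            for e in elements}

# Hypothetical example: three members voting on three numbered elements.
votes = {"Member A": {1, 2, 3}, "Member B": {1, 3}, "Member C": {1}}
rates = endorsement_summary(votes)
# Element 1 is endorsed unanimously; elements 2 and 3 are not.
```

A table of such rates, published alongside the consensus document, would make the distinction between unanimous and partial endorsement immediately visible (see “authorship versus endorsement” below).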
Documenting the outcome of the CBP
We assume that the typical outcome of a successful CBP will be a single manuscript documenting the consensus that was achieved, and the process by which it was achieved. Our use of the term “successful” does not necessarily imply that a unanimous consensus was found, but merely that some consensus was found by some people (i.e., there is more explicit agreement than before). In fact, a CBP may even be viewed as successful if members of the CBG only achieved consensus on the fact that they do not agree with one another on a set of specified issues. This still constitutes a form of scientific progress if the same persons began the CBP not knowing the extent of (dis-)agreement among them. We recommend that a version of the consensus document should be archived in the CBP folder. It may also be published in a scientific journal, preferably with open access.
Mark it as “consensus”
A small number of research projects have attempted to establish inter-rater reliability for assessments of whether an article formulates or refers to some kind of systematic consensus (e.g., Etzel et al., 2024; Leising et al., 2022b). These studies revealed that, at present, it seems to be exceedingly difficult for raters to judge this. A likely reason for this is that no widely accepted reporting standards exist yet for this type of work. Therefore, we recommend marking the outcome of a CBP as what it is, by using the term “consensus” or “consensus document” in the title of the paper and/or among the keywords.
Authorship versus endorsement
As a default, all members of the CBG should be named as authors of the consensus document, at least as long as there have been no defections. This does not mean that each group member necessarily endorses all of the outputs that were derived. If there remains non-trivial disagreement, who endorses which output should be made explicit in the document (e.g., in the form of a table). Before publishing the final version of a consensus document, it would also be possible to make the proposed consensus available to other members of the community, and to ask them whether they would like to be listed as endorsers, even though they were not involved in generating it.
Regarding authorship, several different approaches may be taken. The most traditional approach would be to name those persons who took most of the responsibility for the overall project (e.g., as process managers) as first, second, or last authors. However, this practice has been criticized for being ambiguous in terms of interpretation (e.g., what is the reason for person X occupying this position in the order of authors in this case?) and thus amenable to power abuse. We therefore recommend considering the following alternatives: (a) authoring the consensus document as a consortium (either under a consortium name, or in alphabetical or random order), (b) attributing CRediT roles to each author, to make the type of their contribution clear, (c) adding a “process history” section to the paper that briefly describes contributions of particular importance (e.g., initiation and steering of the process) in narrative form.
Acknowledging contributions by external experts
If the CBP involved the gathering of input from experts outside the CBG, this should be acknowledged, too. The respective statements should contain the times when the experts were consulted, and the subject matter of those consultations.
Format in which the consensus is documented
Depending on the domain, a consensus may be documented in a variety of forms: For example, a terminology consensus may consist of a table that gives definitions for each term. A consensus over research goals may be entirely narrative in form. A meta-analytic consensus over the approximate size of some effect may consist only of a single number in a given metric. A theoretical or modeling consensus (e.g., a shared formalization of a narrative theory) may consist of a single formula or set of propositions, accompanied by some explanations. We recommend naming the relevant format as part of the agenda at the beginning of the CBP (see “defining inputs and outputs” above).
For process outcomes that have several parts, we also recommend numbering these parts (e.g., Letzring et al., 2021). This will help make later communication about specific parts of the consensus more efficient (e.g., challenges to an individual proposition). Furthermore, interdependencies among the parts of a consensus should be explicated to the extent that this is possible. For example, when a consensus document contains definitions of concepts as well as statements about relationships among these concepts, then it is basically inevitable that the endorsement of the latter depends on the endorsement of the former.
Documenting dissent
CBPs may be successful even if a consensus is achieved only by some, but not all, members of the group. We recommend making such outcomes visible, as well. This may be achieved by explicating who in the CBG endorses which parts of the consensus (see “authorship vs. endorsement” above). In addition, minority opinions on any point of dissent may be added. This option should generally be available to all CBG members, and regarding any part of the consensus that they do not agree with.
Documenting failure
Failure is always a possible outcome of any CBP - and should be documented (e.g., in the project folder). This may include the suspected reasons for the failure, in order to foster learning for future CBPs. Failure could, for example, mean that the CBG did not make any noticeable progress and no relevant output (i.e., not even a consensus concerning disagreements) was ever generated.
Updating
As we stated above, consensus in science is always preliminary. To account for this fact, we suggest that consensus documents should contain a statement as to when further developments in the respective area will be systematically reviewed again (e.g., after five years), and - ideally - who will be responsible for initiating that review. Depending on the outcome of such a review, the number of endorsers of individual consensus elements may increase, decrease, or stay the same. Alternatively, it may be decided that the consensus needs to be (partly) revised and that a new consensus document should be written and published. To allow for easy comparisons of the development of a scientific consensus over time, we recommend versioning consensus documents (beginning with version 1.0).
Further considerations
Consensus documents as references
We propose that consensus documents may function as replacements for (parts of) what were traditionally the Theory and Methods sections of research papers. Instead of having to describe these things again and again with every newly written manuscript (but in different words, for copyright reasons), authors could simply refer to a consensus document in a single sentence (e.g., “we used a consensual standard for assessing variable X, as described in [citation]”). Not only would this approach spare the authors some time and energy, it would also save readers time and energy they would otherwise have to invest in finding out (e.g.) how the measurement methodology of this study compares with that used in other studies. In this way, consensus work may actually help improve the efficiency of scientific publishing. Of course, any deviations from the consensus that one refers to would have to be made explicit.
Reviewing consensus documents
The fact that consensus processes may involve most of the experts knowledgeable about a certain subject may lead to a complication: When submitting consensus work to a journal for review, it is possible that there are hardly any potential reviewers left who were not themselves part of the CBG. Fortunately, this does not have to be a major problem. Editors may in good conscience assume that the content of a consensus document already reflects the outcome of intense reviewing activity, discussions, and revisions. Therefore, it is appropriate to focus the review for the journal on the more technical aspects, such as the transparency of the process. It should be possible for reviewers to assess this, even if they do not have specific expertise with the research questions that the consensus concerns.
Effects of consensus building processes on future peer-reviews
Another potential problem that may arise from involving most of the experts working on a topic in the same CBG is that, traditionally, they would no longer qualify as potential reviewers for each other’s papers and project grants, because they recently co-authored a paper. This might cause problems when editors or grant agencies are no longer able to find suitable reviewers for papers and proposals and thus have to either desk-reject them, or resort to less qualified reviewers. The latter might underestimate the relevance of the research at hand and be unable to provide feedback of the same quality that an expert could provide. We therefore recommend against using co-authorship on a consensus paper as an exclusion criterion for reviewers if the consensus project involved a large share of the relevant research community. How large this community is and what constitutes a “large” share of it may, of course, be difficult to assess for editors and grant agencies. We therefore advise CBGs to explicitly discuss this point and prepare a written statement that their members can share with decision makers in case they receive a request for review. A variant of this statement may be incorporated in the consensus document.
Funding for consensus building
To our knowledge, funding agencies do not yet offer any funding schemes specifically for consensus work. In our view, this constitutes a systematic disadvantage for this crucial type of work, in terms of incentives and feasibility. It might be possible, however, to amend existing funding schemes in a way that does help promote CBPs. For example, the Scientific Networks framework of the German Research Foundation (DFG) already incorporates some basic elements that are needed in most CBPs (e.g., funding for meetings and organizational tasks). However, the current version of the framework focuses on Germany, on younger researchers, and on interdisciplinarity - characteristics that might reduce its utility for some consensus projects. A moderate revision of the framework may thus be in order. Explicitly declaring that consensus building is among the framework’s purposes might increase the chances that applicants will actually consider it for their respective projects.
The role of large-language models and machine learning
Large-language models (LLMs) and general machine learning methodology have the potential to support CBPs in various ways. Experiences with using such methodology in research are still rather limited, however, and their potentials and pitfalls are not fully understood yet (see Glöckner, 2024, for a discussion). Arguably, LLMs will not make the work of consensus groups obsolete. CBGs may use the output of LLMs as starting points for their own analyses and discussions. Still, the members of the group would have to delineate the specific consensus problem, and critically evaluate the validity of any of the LLM’s responses and potential conclusions.
At least three promising applications of LLMs and machine learning in CBPs can be identified: First, LLMs can help reduce the ubiquitous jingle-jangle fallacies (Block, 1995) by identifying conceptual overlap between constructs and their operationalizations in verbal statements (e.g., items in a personality questionnaire) (Hussain et al., 2024; Wulff & Mata, 2023, see also Abdurahman et al., 2024).
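As an illustration of the underlying idea, conceptual overlap between questionnaire items can be quantified as the similarity of their text representations. The sketch below uses simple bag-of-words vectors as a crude stand-in for the dense embeddings an LLM would provide; the items and the threshold of .5 are hypothetical choices for illustration only:

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of bag-of-words vectors (a stand-in for the
    dense embeddings a language model would provide)."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def flag_overlap(items_x, items_y, threshold=0.5):
    """Flag item pairs from two questionnaires whose similarity exceeds
    a (hypothetical) threshold, as candidate jingle-jangle cases."""
    return [(i, j, round(cosine_similarity(i, j), 2))
            for i in items_x for j in items_y
            if cosine_similarity(i, j) >= threshold]

# Hypothetical items from two different "constructs":
items_x = ["I enjoy meeting new people"]
items_y = ["I like to meet new people", "I keep my room tidy"]
candidates = flag_overlap(items_x, items_y)
```

Flagged pairs would then be reviewed by the CBG, not accepted automatically; the point of such tooling is to narrow down the search space, not to decide.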
Second, LLMs and machine learning can be used to account for a larger number of research publications than is typically possible in consensus processes (e.g., Youyou et al., 2023). Models may be trained on smaller sub-samples of papers for which results concerning the variable of interest are available. The resulting models may then be applied to essentially all papers in a field (e.g., by analyzing abstracts or full-texts), thus generating a suggestion for the most likely predictor for the variable of interest (Youyou et al., 2023).
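The train-on-a-sub-sample-then-apply-everywhere logic can be sketched with a minimal text classifier. The naive-Bayes approach, the labels, and the toy abstracts below are our own illustrative assumptions, not the method used by Youyou et al. (2023):

```python
import math
from collections import Counter, defaultdict

def train(labeled_abstracts):
    """Fit a minimal multinomial naive Bayes model on a labeled sub-sample
    of abstracts; returns word counts, label counts, and the vocabulary."""
    word_counts, label_counts = defaultdict(Counter), Counter()
    for text, label in labeled_abstracts:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def classify(text, model):
    """Assign the label with the highest Laplace-smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_lp = None, -math.inf
    for label in label_counts:
        lp = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best_label, best_lp = label, lp
    return best_label

# Hypothetical labeled sub-sample; in practice, labels would come from the
# papers for which results on the variable of interest are available.
model = train([
    ("conscientiousness predicts job performance", "reports_predictor"),
    ("conscientiousness strongly predicts work outcomes", "reports_predictor"),
    ("a historical overview of trait theory", "other"),
    ("review of the history of personality psychology", "other"),
])
label = classify("conscientiousness predicts outcomes", model)
```

Once fitted on the labeled sub-sample, the same `classify` function could be run over the abstracts of all remaining papers in a field, yielding the kind of field-wide coverage that a purely manual CBP could not achieve.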
Third, LLMs and machine learning can be used to construct consensus concerning theory and the current state of the empirical evidence base. Machine learning models, for example, have been applied in the domain of risky choices (Peterson et al., 2021) and probabilistic inferences (Glöckner et al., 2024b) to identify (a) how well human behavior can generally be predicted, (b) which class of theories comes closest to this benchmark, and (c) which aspects of those theories seem most relevant for increasing predictive accuracy.
Supplemental Material
Supplemental Material for A tentative roadmap for consensus building processes by Daniel Leising, Heinrich Liesefeld, Susanne Buecker, Andreas Glöckner, and Stefanie Lortsch in Personality Science
Footnotes
Author note
This paper was invited by the editor (Jaap Denissen). Jaap Denissen was the handling editor. Jaap Denissen and an anonymous reviewer reviewed this article.
Acknowledgements
Not applicable.
Author contributions
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online. Depending on the article type, these usually include a Transparency Checklist, a Transparent Peer Review File, and optional materials from the authors.
Notes
Not applicable.
Key questions for consensus building processes
Note. CBP, Consensus Building Process; CBG, Consensus Building Group.
1. Who initiated the CBP?
2. Who are the members of the initial CBG?
3. How were the members of the initial CBG recruited? What were the criteria for being considered and included? How many people were approached, and how many of those said yes?
4. Is it possible/desired for other persons to join the CBG later? Is there a standard procedure for this (e.g., how would potential members apply, who would make the decisions, and how would those decisions be made)?
5. What is the broad content domain/topic that the CBP addresses?
6. What are the key concepts/terms that will be used in this CBP?
7. What are the initial targets of the CBP? Targets usually belong to one or more of the following categories: (a) research goals, (b) terminology/definitions, (c) measurement practices, (d) data pre-processing and analysis, (e) state of knowledge/findings, (f) theory, and (g) the relation between findings and theory. (Recommendation: frame every target as concisely as possible in the form of an open-ended question.)
8. What would be the procedure for changing/expanding/reducing/specifying targets of the CBP at a later stage of the CBP?
9. How will the process be documented? (Recommendation: use an OSF project with a pre-registration and then continuously document the steps that were accomplished, including difficulties. The answers to the questions contained in this list may be filed in the same OSF project.)
10. Who is in charge of the documentation? How does a person acquire this responsibility? Is there a procedure/plan for when/how this responsibility switches from one person to another? How are decisions made about when and what to add to the documentation?
11. What is the CBG members’ view on the following questions: (a) How much consensus is there currently regarding the issues at hand? (b) How many and which different positions are being held concerning these issues in the field? (c) Which of these are the most/least contentious, and why? (d) Are these different positions sufficiently represented in the CBG? (e) Has there been a tradition of conflict between advocates of these different positions, and may this make it difficult or even impossible to find common ground? (f) (How) could the chances of reaching some consensus within this CBG be increased?
12. What is the tentative timetable for the CBP? (Recommendation: aim for completion of the process within one year.)
13. What means of communication shall be used among members of the CBG?
14. Who speaks for the group?
15. Who coordinates the schedules and agendas for meetings?
16. Will external (independent) moderation play a role? What role?
17. What are the relevant inputs (e.g., individual effect sizes) and outputs (e.g., average effect sizes) of the process?
18. What are the criteria for including/excluding inputs?
19. What is the criterion for deciding that enough input has been gathered? How will it be ensured that sufficiently diverse viewpoints are considered?
20. How will interpersonal conflict be dealt with?
21. What is the procedure for members leaving the group? Will departures be documented (possibly even including reasons)? (Recommendation: yes.)
22. What is the procedure for excluding someone from the group, possibly against their own will?
23. What measures will be taken to avoid groupthink (i.e., suppression of valid but diverging views in the service of group cohesion)?
24. What are the criteria by which the CBP will be declared to have failed? (Note: this is not the same as not reaching perfect or even relative consensus. A CBP may be successful even by only documenting the level of disagreement that exists, according to the consensual view of (most) members of the CBG.)
25. How will the outputs be derived and legitimized (e.g., by voting)? What format will they be in (e.g., a correlation coefficient, a shared narrative term definition, an R script)?
26. Who will be the authors of the consensus document? (Recommendation: all members of the CBG who agree to be authors, independent of which particular outputs they endorse.)
27. (How) will other members of the relevant community be given an opportunity to critically discuss and/or endorse certain outputs? (Recommendation: when the consensus document stands, issue an open call for endorsement and/or open peer commentaries.)
28. How will output endorsement be documented? (Recommendation: in the form of a table and/or publication of commentaries.)
References