Abstract
This paper reports our research concerning dialogue strategies suitable for adoption by a human–computer debating system. We propose a set of strategic heuristics for a computer to adopt to enable it to function as a dialogue participant. In particular, we consider means of assessing the proposed strategy. A system involving two agents in dialogue with each other and a human–agent debate system are constructed and subsequently used to facilitate the evaluations. The evaluations suggest that the proposed strategy can enable the computer to act as an effective dialogue participant. It is anticipated that this work will contribute towards the development of computerised dialogue systems and help to illuminate research issues concerning strategies in dialectical systems.
Introduction
Computer-based learning, aka
We argue (e.g. Moore 2000; Naim, Moore, Yuan, Dixon and Grierson 2009) that there are two possible approaches to addressing this problem of untoward didacticism. One is to allow multiple participants in the learning interactions, so that learners are able to use the environment to communicate with each other and their tutors. A different approach is to have the computer itself be a participant in the learning interaction. If the computer is to engage in dialogue with students, it needs a model of dialogue, and such a model is potentially provided by computational dialectics. Computational dialectics is a maturing strand of research that is focused on computational utilisation of the dialogue games developed in the area of Informal Logic, which is an area of philosophy rich in models of communication and discourse, with a heavy focus on argument and “dialogue games”. If the dialectical model (e.g. dialogue game) is, as purported, a model of fair and reasonable dialogue, and if both the computer and students follow the dialogue, then fair and reasonable dialogue will ensue (Moore 1993). Our interest is to use a dialogue game as a vehicle for an educational human–computer dialogue system.
There are many types of dialogue interaction in which people reason together, such as debate, persuasion, inquiry and information-seeking (Walton and Krabbe 1995). Maudet and Moore (2001) argue that the debating style of dialogue interaction is important for critical thinking and for developing debating and reasoning skills, and Pilkington and Mallen's (1996) educational discourse analysis also suggests that it is effective and has rich educational benefit. A particular concern of our research therefore is to investigate the issues surrounding a computer-based system for educational debate.
Earlier, we developed an amended dialogue model, DE (Yuan 2004), based on DC (Mackenzie 1979), as the underlying model for our debating system. The motivation behind this development is that the underlying dialogue model of the debating system is required to have the ability to pick out fallacious arguments and common errors when they occur during the course of a debate. DE appears advantageous over DC in preventing the fallacy of question begging, inappropriate challenges and the straw man fallacy, and in handling the issue of repetition appropriately (Yuan, Moore and Grierson 2003).
A particular concern with DE, however, especially from a computational perspective, is that it leaves much to the discretion of the user of the model. For example, after a challenge (why P?), various options are open: one can respond with a “no commitment” to P or a resolution demand (in some circumstances) or a support for P. Further, there is no guidance within the rules as to the content of the support. Similarly, after a withdrawal or a statement, there are no restrictions on the move types or move contents. All DE does is to legitimise a set of move types given the prevailing circumstances, and occasionally give some indication of the semantic possibilities. In a human–computer debate setting, it is therefore crucial that the computer is given some means of selecting between available possibilities, e.g. to maintain focus after a statement or a withdrawal, so that the produced moves are appropriate at the pragmatic level. This choice must be based on some suitable strategy.
Appropriate strategic knowledge is, then, essential if the computer is to produce high-quality dialogue contributions. The importance of strategies in dialectical systems has also been stressed elsewhere (e.g. Bench-Capon 1998; Walton 1998; Amgoud and Parsons 2001; Maudet and Moore 2001; Amgoud and Maudet 2002; Rahwan, McBurney and Sonenberg 2003). A set of strategies enabling a computer to act as a debate participant was therefore proposed, based on an experimental study of the DC game with human participants (Moore 1993), and subsequently further developed in Yuan (2004). However, the issue of whether the proposed strategy can in practice provide adequate services for a computer acting as a dialogue participant to produce good dialogue contributions cannot be settled on an a priori basis; it requires empirical evaluation.
To assess the appropriateness of a proposed strategy, Maudet and Moore (2001) suggest that the strategic heuristics need to be tested and that a convenient way to do this is via generation of dialogue by the computer itself. There are two possible ways to approach this: (1) allow two computational agents to engage in dialogue with each other and then study the results and (2) enable a human user to debate with a computerised debating system. Both approaches are seen as important to evaluate the proposed strategy from different perspectives. The former approach focuses on assessing whether there are unexpected new situations, requiring new heuristics, which have been missed in the current proposal and assessing whether the computationally generated dialogues are reasonably sound from a dialectical point of view. The latter approach, however, focuses on assessing the usability of the proposed strategy from the users’ point of view. An agent–agent assessment is necessary prior to user-based assessment to avoid issues such as missing heuristics and apparent flaws appearing in and interfering with more expensive user studies. Both approaches are therefore used in this study. A prerequisite for the study is the construction of suitable computational agents.
The remainder of the paper is organised as follows. First, we introduce the game DE and our current set of strategic heuristics. Secondly, we discuss the construction of a set of computational agents that can engage in debate with each other, generating dialogue transcripts for subsequent analysis. Thirdly, we discuss our human–computer debating system with which the user-based evaluations were carried out. We then discuss related work in this area and the significance of this work. We finally draw the conclusion and discuss our intended future work.
The dialogue model DE
The DE system is set up with two participants in dialogue with each other. Participants’ moves are regulated by a set of rules, which prohibits illegal events. The set of rules is outlined as follows (cf. Yuan et al. 2003).
Available move types
The DE model makes the following move types available to both participants in the dialogue.
Commitment rules
Each participant in a dialogue using the DE model owns a commitment store. Each commitment store contains two lists of statements: the
Dialogue rules
Participants in a dialogue using the DE model are required to adopt the following rules.
Debating strategic heuristics
One of the primary motivations behind the development of our debating system, as argued in Section 1, is the expectation that it can be used to educational advantage – to develop students' debating and reasoning skills and domain knowledge. In the context of an educational human–computer debate, the computer is ultimately intended to be not only a debate competitor but also an intelligent tutor. From an educational point of view, while intuitively one may wish the system to "speak the truth", it could be argued that some sort of deception is inherent in the definition of dialectical argumentation (Grasso, Cawsey and Jones 2000) and in the playing of devil's advocate, yet both of these may be educationally valuable (Retalis, Pain and Haggith 1996). A balance between trust and deception might therefore be required. It can be argued that the computer should be honest with respect to the publicly inspectable commitment stores, since the system should be seen to be trustworthy. How, though, should the computer treat its knowledge base? The computer is required to have the ability to argue either as a proponent or as an opponent of the topic under discussion, and this implies that the computer's knowledge base can support both the proponent view and the opponent view (see Appendix 1 for an example of the system knowledge base in the domain of capital punishment). As a result, the computer may constantly face inconsistent knowledge while making decisions (for example, it can find both support for and objection to the notion that capital punishment acts as a deterrent). In this situation, it is suggested that the computer be allowed to insist on its own view for the sake of argument even though it may have more reasons in favour of the user's view. Given the above discussion, the system is currently configured as what can be described as a partially honest agent.
In the DE model, there are five dialogue situations that the computer might face, defined by the previous move type made by the user: a challenge, a question, a resolution demand, a statement or a withdrawal. Each therefore needs to be considered in relation to the strategic decisions the computer might need to make. It has been argued (e.g. Moore 1993) that these decisions are best captured at three levels.
(1) Retain or change the current focus.
(2) Build own view or demolish the user's view.
(3) Select a method to fulfil the objective set at levels (1) and (2).
Levels (1) and (2) refer to strategies which apply only when the computer is facing a statement or withdrawal, since in all other cases the computer must respond to the incoming move. Level (3) refers to tactics used to reach the aims fixed at levels (1) and (2) and applies in every game situation. These levels of decisions are discussed in turn below.
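As an illustrative sketch only (the class and method names here are invented, not those of the actual system), the scope of the three decision levels can be expressed in Java:

```java
// Hypothetical sketch: which strategic levels apply, given the user's last move.
// Levels (1) and (2) apply only after a statement or a withdrawal; in every
// other situation the computer must respond to the incoming move, and only
// the level (3) tactical choice remains.
public class StrategyLevels {
    public enum MoveType { STATEMENT, WITHDRAWAL, CHALLENGE, QUESTION, RESOLUTION_DEMAND }

    public static boolean focusAndBuildDemolishApply(MoveType lastUserMove) {
        return lastUserMove == MoveType.STATEMENT
            || lastUserMove == MoveType.WITHDRAWAL;
    }

    public static boolean tacticalLevelApplies(MoveType lastUserMove) {
        return true; // level (3) applies in every game situation
    }
}
```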
Level (1) decision concerns whether to retain the current focus or to change it. The decision, that is, involves whether to continue the attempt to substantiate or undermine a particular proposition. Moore (1993) argues that continuing to execute a plan of questions or addressing the previous move will guarantee that the current focus is retained but that it is possible not to directly address the user's latest utterance yet still retain focus. Moore further suggests that there is a presumption in favour of addressing the previous move, but that this presumption may be broken when the line of questioning is deemed a blind alley, or if a successful removal of the user's support has been made, or if, on regaining the initiative after a period without it, a resolution demand can legally be made.
The decision at level (2) considers whether to adopt a build or a demolish strategy. A build strategy involves seeking acceptance of propositions that support the computer's own thesis, while a demolish strategy seeks to remove the user's support for his thesis. The decision is needed only at the beginning of games and when the level (1) decision involves a shift in focus. A demolish strategy could possibly be part of a broader build strategy, e.g. a goal-directed plan of questions building the computer's own view might involve removing some unwanted responses from the user. A building attempt might also be part of a broader demolish strategy, e.g. the computer is using a line of questions to build the case for P in order to attack the user's view ¬P.
The third level of decisions applies to each of the dialogue situations. Level (3) heuristics for each dialogue situation are given in turn below.
A question raised by the user
Questions asked may involve questioning an individual statement, e.g. "Is it the case that P?", or a conditional, e.g. "Is it the case that Q implies P?". In these situations, the computer is allowed by the DE rules to answer "yes", "no" or "no commitment". The following heuristics are proposed.

If neither P nor ¬P is found in the computer's knowledge base, then utter "no commitment".
If only one of them (P or ¬P) is found:
  If the computer has previously uttered "no commitment" to the found statement, then it utters "no commitment" to remain consistent.
  Else the computer utters the found statement.
If both (P, ¬P) are found:
  If the computer has an acceptable support for ¬P, then utter ¬P.
  Else if the computer has no acceptable support for ¬P but has an acceptable support for P, then utter P.
  Else, if the computer has no acceptable support for either statement, utter "no commitment".
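A minimal sketch of this heuristic in Java follows (the names are hypothetical, and the checks for acceptable support are abstracted into a flag; this is an illustration, not the system's implementation):

```java
import java.util.Set;

// Hypothetical sketch of the response-to-question heuristic for
// "Is it the case that P?".
public class QuestionHeuristic {
    public static String answer(String p, String notP, Set<String> kb,
                                Set<String> previouslyWithdrawn,
                                boolean acceptableSupportForNotP) {
        boolean hasP = kb.contains(p);
        boolean hasNotP = kb.contains(notP);
        if (!hasP && !hasNotP) {
            return "no commitment";            // neither statement is known
        }
        if (hasP != hasNotP) {                 // exactly one of them is known
            String found = hasP ? p : notP;
            // remain consistent with an earlier withdrawal of the found statement
            return previouslyWithdrawn.contains(found) ? "no commitment" : found;
        }
        // both are known: prefer the side that can still be acceptably supported
        return acceptableSupportForNotP ? notP : p;
    }
}
```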
A challenge made by the user
There are three DE legal options available in response to a challenge: a resolution demand, a support or a withdrawal. The first option concerns an inconsistency when the user is challenging a modus ponens consequence of his/her own commitments. From an educational point of view, it can be argued that the computer should point out this inconsistency and make the user aware of this kind of inconsistency in a debate. For the latter two options, Moore's (1993) experimental analysis suggests that one would normally reply with a carefully chosen support if available. In DE, there is no guidance within the rules as to the content of the support. The selection between alternative supports may be influenced by the profile of the agent. Given the definition of the profile of a partially honest agent, the computer should give a support according to its knowledge structure rather than invent one which may not be a suitable support. In addition, it can be suggested that a support which can be further supported is preferred over one which cannot be further supported, since a further challenge might be expected from the user. Given this, the heuristics after a challenge of P can be proposed as follows.
If P is a modus ponens consequence of the user's commitments, then pose a resolution demand.
Else if there is only one acceptable support available in the knowledge base, then state it.
Else if there is more than one acceptable support available, then state the one that can be further supported.
Else if all the available acceptable supports are equally supported, then randomly choose one of the supports.
Else if no acceptable support is available, then withdraw P.
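The ordering above can be sketched as follows (hypothetical names; the supports are assumed to be pre-sorted into those that can themselves be supported and those that cannot, and the paper's random choice among equally supported grounds is replaced by taking the first, for determinism):

```java
import java.util.List;

// Hypothetical sketch of the response-to-challenge heuristic for "why P?".
public class ChallengeHeuristic {
    public static String respond(String p, boolean mpConsequenceOfUser,
                                 List<String> furtherSupportable,
                                 List<String> otherSupports) {
        if (mpConsequenceOfUser) {
            return "resolve " + p;              // point out the user's inconsistency
        }
        if (!furtherSupportable.isEmpty()) {
            // prefer a support that can survive a further challenge
            return "statement " + furtherSupportable.get(0);
        }
        if (!otherSupports.isEmpty()) {
            return "statement " + otherSupports.get(0);
        }
        return "no commitment " + p;            // no acceptable support: withdraw P
    }
}
```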
A resolution demand made by the user
A resolution demand made by the user concerns an allegation that the computer has committed to an inconsistency in its commitment store. In the most likely event, the computer would face a resolution demand of the type "resolve {P, ¬P}", in which case it is required by the game DE to withdraw one of the two inconsistent statements.
The user might invoke another type of resolution demand (i.e. resolve (Q, Q⊃P, why P) or resolve (Q, Q⊃P, “no commitment” P)) in the event of the computer's challenging or withdrawing a modus ponens consequence of its commitments. In this situation, the computer is required, by the game DE, to withdraw either Q or Q⊃P or affirm P. Moore (1993) argues that the use of such a resolution demand would suggest that, in the user's view at least, the computer has challenged or withdrawn a proposition to which it ought to be committed given the remainder of its commitment store. In such a case, given the partially honest agent profile argued for earlier, the computer takes the option of affirming the disputed consequent P.
A “no commitment” made by the user
After a “no commitment”, DE places no restrictions on either move type or contents. The computer's decisions are therefore more open. Following Moore (1993), the heuristics after a “no commitment” are proposed as follows.
If the computer is facing a "no commitment" to a statement supporting the user's thesis:
  If the withdrawn statement is a unique support of the user's asserted proposition Q, and Q is not the user's thesis, then challenge Q.
  Else check whether the user retains adherence to the thesis.
If the computer is facing a "no commitment" to a statement supporting the computer's thesis:
  If the non-committal statement is a modus ponens consequence of the user's commitments, then pose a resolution demand.
  Else switch the focus.
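The two branches of this heuristic can be sketched as follows (an illustration with invented names; the situation checks are abstracted into boolean flags):

```java
// Hypothetical sketch of the response-to-"no commitment" heuristic.
public class WithdrawalHeuristic {
    public static String respond(boolean supportsUserThesis,
                                 boolean uniqueSupportOfQ, String q,
                                 boolean qIsUserThesis,
                                 boolean mpConsequenceOfUser) {
        if (supportsUserThesis) {
            // the user has withdrawn support for his/her own side
            if (uniqueSupportOfQ && !qIsUserThesis) {
                return "challenge " + q;   // Q has lost its only support
            }
            return "check thesis adherence";
        }
        // the withdrawn statement supported the computer's thesis
        if (mpConsequenceOfUser) {
            return "resolution demand";    // the user is being inconsistent
        }
        return "switch focus";
    }
}
```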
A statement made by the user
After a statement, there is no restriction on either move types or move contents in DE. When the computer is facing a statement (say P) which stands on its own side, the heuristics are as follows.
If P is a support of the computer's thesis, then use P as the starting point to build a case for the computer's thesis.
Else check whether the user still adheres to his/her thesis.
When the computer is facing a statement (say P) which supports the user's view or militates against the computer's view, a set of heuristics is proposed as follows, in line with Moore (1993).
(a) If there is an inconsistency (e.g. (P, ¬P)) in the user's commitment store, then pose a resolution demand.
(b) Else if there is a piece of hard evidence in support of ¬P, then state it.
(c) Else if there is any support of ¬P from which a plan of questions can be generated, then execute that plan.
(d) Else if there is any support of ¬P available in the knowledge base, then state it.
(e) Else if P is challengeable, then challenge it.
(f) Else switch the focus.
To decide whether a statement is challengeable, the computer needs to consider the nature of that statement (e.g. whether it is a piece of hard evidence) and the relevant DE dialogue rules. If the computer arrives at option e and the statement in question is not challengeable, the computer reverts to level (1) of the strategic decision-making process.
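The ordered heuristics for a statement P standing on the user's side can be sketched as below (hypothetical names; each check from the list is abstracted into a flag, so this shows the ordering rather than the checks themselves):

```java
// Hypothetical sketch of the ordered response to a user statement P that
// stands against the computer's view.
public class StatementHeuristic {
    public static String respond(boolean inconsistencyInUserCs,   // (a)
                                 boolean hardEvidenceForNotP,     // (b)
                                 boolean planTowardsNotP,         // (c)
                                 boolean supportForNotP,          // (d)
                                 boolean pChallengeable) {        // (e)
        if (inconsistencyInUserCs) return "resolution demand";
        if (hardEvidenceForNotP)   return "state hard evidence";
        if (planTowardsNotP)       return "execute plan of questions";
        if (supportForNotP)        return "state support";
        if (pChallengeable)        return "challenge P";
        return "switch focus";                                    // (f)
    }
}
```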
A further concern is how the plans of questions in heuristics (c) and (d) are organised. Following the Walton, Reed and Macagno (2008) treatment of argumentation schemes, a plan is organised as a chain of supporting propositions drawn from the knowledge base, which is put to the user as a sequence of questions leading towards the target proposition.
During a plan execution, the user might give unwanted answers (i.e. answers not favourable to the computer's plan). The approach taken here is that the computer tries to remove the obstacles (unwanted answers) and put the plan back on track while the initiative is still held. The plan execution process is as follows.
If a wanted answer is given, then carry on executing the plan.
If a non-committal answer is given:
  If there is an expressed inconsistency in the user's CS, then pose the appropriate resolution demand.
    If the user affirms the disputed consequence, then continue the plan.
    Else abandon this line of questions.
  Else abandon this line of questions.
If an unwanted answer (e.g. ¬P) is given:
  If there is an expressed inconsistency in the user's commitment store involving the unwanted answer ¬P, then pose the appropriate resolution demand.
    If the unwanted answer is withdrawn, then continue the plan and re-pose the question.
    Else abandon this line of questions.
  Else if the unwanted statement is challengeable, then challenge the unwanted statement.
    If the unwanted answer is withdrawn, then continue the plan and re-pose the question of P.
    Else abandon this line of questions.
  Else abandon this line of questions.
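The policy for one step of plan execution can be sketched as follows (an illustration with invented names; the follow-up after a resolution demand or challenge is left to the next step, so only the immediate decision is modelled):

```java
// Hypothetical sketch of the plan-execution policy for a single user answer.
public class PlanStep {
    public enum Answer { WANTED, NO_COMMITMENT, UNWANTED }

    public static String next(Answer answer, boolean inconsistencyInUserCs,
                              boolean unwantedChallengeable) {
        if (answer == Answer.WANTED) {
            return "continue plan";
        }
        if (answer == Answer.NO_COMMITMENT) {
            // a resolution demand may put the plan back on track; otherwise give up
            return inconsistencyInUserCs ? "resolution demand" : "abandon plan";
        }
        // unwanted answer: try to remove the obstacle while the initiative is held
        if (inconsistencyInUserCs)  return "resolution demand";
        if (unwantedChallengeable)  return "challenge unwanted answer";
        return "abandon plan";
    }
}
```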
This, then, is the set of strategic heuristics currently adopted by our human–computer debating system. The following sections consider the evaluation of this strategy.
Agent-based evaluation
This section discusses an agent-based evaluation of the strategy proposed in the previous section. We first outline the construction of the two computational agents that were used for generating dialogue transcripts. We then propose criteria against which the agent-generated dialogue transcripts are analysed. We end this section with the analysis of results.
Computational agents
Two computational agents (referred to henceforth as Simon and Chris) who conduct debate with each other, operationalising the dialogue model DE and the proposed strategy, have been built by the authors using the Java programming language and deployed on the Internet (http://staff.unak.is/not/yuan/games/simulation/dialogueSimulationSystem.htm). The game starts with one agent (say Chris) asking the other agent (say Simon) his opinion on the controversial issue of capital punishment and adopting the opposite position to engage Simon in debate on the issue. Chris can adopt either a proponent or an opponent role. That is, if Simon chooses to support the view of “capital punishment is acceptable”, Chris will adopt the opposite view of “capital punishment is not acceptable”, and vice versa. Both agents then engage in debate on the topic of capital punishment, given these initial positions on the issue.
The agent system contains five main units: the interface unit, the dialogue unit, the planning unit, the commitment unit and the knowledge base unit. The

Computational agents user interface.
The
The
In addition, there are five components (focus shift manager, build manager, demolish manager, plan generator and plan executor) that are designed to provide special services to the assertion and the withdrawal strategists. The focus shift manager is called by the assertion or withdrawal strategist to decide whether to change the current focus. The build and demolish managers are called by the focus shift manager to check whether there are methods available either to build the agent's own positions or to attack the partner's positions. The plan generator is responsible for generating a set of propositions and forming a line of questions when required by the assertion or withdrawal strategist, the build manager or the demolish manager. The plan executor is responsible for executing a plan. The assertion and withdrawal strategists constantly check whether there is a plan under execution; if there is, they call the plan executor to carry on its execution.
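The control flow among these components can be sketched as below (a simplified illustration with invented names; the real components exchange richer data than the booleans used here):

```java
// Hypothetical sketch of how a strategist consults the planning components.
public class StrategistSketch {
    public static String assertionStrategist(boolean planUnderExecution,
                                             boolean focusShiftAdvised,
                                             boolean buildMethodAvailable) {
        if (planUnderExecution) {
            return "plan executor";          // carry on an existing plan first
        }
        if (!focusShiftAdvised) {
            return "address previous move";  // retain the current focus
        }
        // on a focus shift, the build and demolish managers are consulted
        return buildMethodAvailable ? "build manager" : "demolish manager";
    }
}
```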
The
The
This, then, is our agent-based system. In this section, we discuss an experiment we conducted to investigate the system and its strategies. There are two component variables in the experiment: the strategy component and the KB component. The strategy component of an agent can adopt the proposed strategy or a random strategy. The KB components for both agents can be the same set or a different set. In order to evaluate the strategies discussed earlier, the agent system was run under three conditions.
(1) One of the agents adopts the proposed strategy and the other uses random argument; both have the same KB.
(2) Both agents adopt the proposed strategy and share the same KB.
(3) Both agents adopt the proposed strategy, and one of the agents' KB is a subset of the other's.
It is anticipated that condition (1), using random argument, might reveal certain failures of the heuristics (e.g. unexpected new situations) that might be overlooked by manual use of them. Conditions (2) and (3) may reveal whether a high-quality dialogue is generated when both agents adopt the proposed strategy. Condition (3) may also be used to see whether an agent with a smaller KB might turn out to be the loser of the debate, given that both agents share the same dialogue strategy.
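The three conditions can be encoded compactly as data (an illustrative encoding only; the field names are invented):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical encoding of the three experimental conditions described above.
public class ExperimentConditions {
    public static class Condition {
        public final int id;
        public final String agentA;      // strategy of the first agent
        public final String agentB;      // strategy of the second agent
        public final String kbRelation;  // relation between the two knowledge bases
        Condition(int id, String agentA, String agentB, String kbRelation) {
            this.id = id; this.agentA = agentA; this.agentB = agentB;
            this.kbRelation = kbRelation;
        }
    }

    public static List<Condition> all() {
        return Arrays.asList(
            new Condition(1, "proposed", "random",   "same KB"),
            new Condition(2, "proposed", "proposed", "same KB"),
            new Condition(3, "proposed", "proposed", "one KB a subset of the other"));
    }
}
```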
Three dialogue examples (DE4, DE5 and DE6; full transcripts can be found in Appendix 2) were generated, one under each of the three conditions. (DE1, DE2 and DE3 were generated for a separate study of the dialogue model DE in Yuan 2004.)
The agent-generated dialogue transcripts have been analysed against the proposed criteria. Results are given next.
Analysis of the results
This section contains the results of an analysis of the agent-generated dialogue examples DE4, DE5 and DE6. During the analysis, each utterance of the dialogues is considered, in turn, via the addition of appropriate annotations in square brackets. The heuristic being invoked is indicated at the beginning of the annotations. For example:
S> Why is it the case that CP is not acceptable? [S2e – S challenges for further reasons]
C> Because innocent people may get killed. [C3 – C gives reasons supporting its thesis]
This approach to the analysis makes it possible to examine the data under the evaluative criteria discussed in the previous section, and thus to assess whether the proposed strategy can provide adequate services to enable the computer to act as a dialogue participant and produce good dialogue contributions.
Evaluation criterion 1: robustness
The interest here is whether all dialogue situations generated by the agents are successfully dealt with by the proposed strategy. In total, the agent runs have generated 133 dialogue situations for the strategic agents to deal with. Among them, 74 are assertions, 11 are withdrawals, 11 are challenges and 37 are questions; each category is discussed in turn below.
Assertion strategist.
Seventy-four assertions were generated and are summarised in Table 1. Their move contents are classified into six categories: move maker's thesis, statements supporting the move maker's thesis, opponent's thesis, statements supporting the opponent's thesis, statements handing over the turn and unrecognised statements. The rightmost column of the table contains row numbers to facilitate the discussion. The second rightmost column includes a summary of each strategic response to these dialogue situations.
Dialogue situation 1 (response to statement): summary.
The 18 instances of the move maker's thesis and 25 instances of statements supporting the move maker's thesis are considered as statements standing on the side of the move maker, and it might be expected that the strategic agent should attack them. It can be seen from Table 1 that the assertion strategist does provide various means of either attacking the opponent's view or building its own view, with seven exceptions of giving up this opportunity (four occasions summarised at row 5 and three occasions at row 12 in Table 1). In these circumstances, the assertion strategist runs out of methods and therefore hands over its turn to the opponent. This might be seen as reasonable, since the opponent may have something more to say, but on the other hand, more sophisticated means might be needed if the strategic agent constantly runs out of methods and therefore overuses the handover strategy.
For statements standing on the side of the opponent, the assertion strategist is expected to use them rather than to attack them (cf. Walton 1998). It is shown at rows 13–16 of Table 1 that the assertion strategy does provide some means of handling this, e.g. using the strategy to build its thesis or continuing its plan execution or checking the opponent's thesis adherence. On the occasion of DE6-053, the game ends since the move maker has committed to the opponent's thesis. It is interesting to see that on five occasions, the assertion strategist has to decide what to do when its dialogue partner hands over its turn. This is not specified in the heuristics. The current implementation provides some mechanisms for this. On one of the occasions (at row 18 of Table 1), the referee calls off the game since both parties have run out of methods. Concerning the remaining four instances (summarised at row 17 of Table 1), the assertion strategist checks its opponent's thesis adherence.
There are four unrecognised statements generated in the transcripts; the current strategy provides no specific means of handling these.
Generally speaking, the assertion strategist successfully handles most dialogue situations except unrecognised statements and the situation of running out of moves.
Eleven withdrawals (or “no commitment”) are present in the transcripts. They are categorised as follows: withdrawal of move maker's thesis, withdrawal of a statement supporting the move maker's view and withdrawal of a statement supporting the opponent's view. These are summarised in Table 2 and discussed in turn below.
Dialogue situation 2 (response to withdrawal): summary.
On one occasion (at row 1 of Table 2), the move maker is withdrawing its thesis. The game is therefore ended, since the move maker has given up its view. On two occasions (at rows 2 and 3 of Table 2), the move maker is withdrawing statements supporting its thesis. The response of the strategic agent is to challenge the statement supported by the withdrawn statement, or to assess whether the dialogue partner still adheres to its thesis.
There are eight instances in which an agent replies “no commitment” to statements that support its opponent's thesis. On five occasions (at row 4 of Table 2), the withdrawal strategist deals with this by starting another line of argument. However, for the remaining three instances (at row 5 of Table 2), the withdrawal strategist fails to do so. The explanation here is that the withdrawal strategist has run out of methods, and therefore hands over its turn. This needs further consideration if the strategic agent constantly faces this kind of situation.
Given the above analysis, the withdrawal strategist seems to be working satisfactorily with the exception of needing more sophisticated strategies when running out of moves.
There are 11 challenges generated. It is shown in Table 3 that on nine occasions (at row 1 of Table 3), the challenge strategist provides a suitable ground following its knowledge structure. There are two occasions (DE6-048 and DE6-050 at row 2 of Table 3), on which the challenge strategist gives a non-committal answer. Concerning the first of these, the strategic agent cannot find a support for the statement in its KB and therefore responds with a non-committal answer. Regarding the latter, the strategic agent does have a support in its KB for the statement being challenged; however, the support is not an acceptable ground since the partner of the strategic agent has challenged the support and the strategic agent had withdrawn this support from its commitment store during the earlier stage of dialogue. The strategic agent, then, would beg the question were it to answer the challenge with this unacceptable support (cf. Yuan et al. 2003). It is therefore reasonable for the challenge strategist to give a non-committal answer rather than to commit the fallacy of question begging.
Dialogue situation 3 (response to challenge): summary.
In sum, then, the challenge strategist seems to successfully deal with all dialogue situations in this category.
In total, 37 questions are generated and are summarised in Table 4. They fall into four categories according to the nature of their move contents: game start, move content supporting the move maker's thesis, the opponent's thesis and move content supporting the opponent's thesis. These are discussed in turn below.
Dialogue situation 4 (response to question): summary.
Only the strategic agent in DE5 and DE6 needs to respond to the game-starting question (at row 1 of Table 4). In DE4, it is the random agent responding, and this is therefore not in need of analysis. Heuristics for responding to the starting question are not specified in the current strategy. The current implementation uses a random approach, and an agent choosing either view would be considered reasonable.
The situation of a proposition being questioned that supports the questioner's view (24 instances at rows 2–4 of Table 4) might be interpreted as the questioner asking the opponent to commit to that proposition, according to Walton and Krabbe (1995). The opponent might be expected not to commit to it if it has an alternative, because the opponent knows it is a proposition not on its side. It is shown in Table 4 that the strategic agents respond 18 times with a "no commitment".
There are nine instances (at rows 5 and 6 of Table 4) that can be seen as one agent merely checking whether the opponent still adheres to its thesis. On eight occasions, the opponent insists on its view since it still has acceptable grounds for its view. On one occasion (DE6-052), the opponent accepts the opposite view. The dialogue fragment for this instance is reproduced below.
S> Is it the case that nobody is willing to die?
C> Yes, I think nobody is willing to die.
S> Is it the case that "nobody is willing to die" is a prima facie reason for "CP makes people less likely to commit serious crimes"?
C> Yes, I think "nobody is willing to die" is a prima facie reason for "CP makes people less likely to commit serious crimes".
S> Is it the case that "CP makes people less likely to commit serious crimes" is a prima facie reason for "CP is a good deterrent"?
C> Yes, I think "CP makes people less likely to commit serious crimes" is a prima facie reason for "CP is a good deterrent".
S> Is it the case that "CP is a good deterrent" is a prima facie reason for "CP is acceptable"?
C> Yes, I think "CP is a good deterrent" is a prima facie reason for "CP is acceptable".
S> I think CP is acceptable.
C> I don't know why innocent people may get killed.
S> Is it the case that CP is not acceptable?
C> No, I think CP is acceptable.
In the above dialogue fragment, the opponent (agent Chris) has no acceptable ground for its thesis, since its support was withdrawn in turn 051. Further, agent C has explicitly committed, at turn 32–29, to the set of propositions and conditionals that together imply its dialogue partner's thesis. Agent C therefore makes a concession and accepts the opposite view in turn 053, thus losing the debate. Given this analysis, the nine instances of questions involving thesis-adherence checking can be seen as being reasonably answered.
It is interesting that two questions were generated about statements supporting the opponent's thesis (at row 7 of Table 4). As expected, the strategic agent takes advantage of this and gives positive responses.
Overall, most dialogue situations that can occur in DE are arguably successfully handled, and the strategy can largely be regarded as satisfying the
Of concern here is the issue of initiative. Initiative is relevant because if one dialogue participant is constantly starved of the initiative, he/she cannot fully or freely advocate his/her point of view (cf. Walton 1989; Moore 1993, p. 229).
In DE4, the strategic agent hands over its initiative nine times to the random agent during the 54-turn dialogue. There are seven instances of initiative shift during the 52-turn dialogue in DE5, and four instances during the 54-turn dialogue in DE6. The longest duration of one agent retaining the initiative is from turn DE5-022 to turn DE5-034, during which the agent made two challenges and four questions before making one statement to hand over the initiative.
On the whole, both agents have had opportunities to express their point of view, and the strategy therefore appears to satisfy the
Evaluation criterion 3: coverage of issues
Of interest here is whether points implemented in the KB are revealed and discussed. The KB used by the agents discussed in this paper (partly shown in Appendix 1) can be seen as containing three subtopics (deterrent issue, popularity issue and moral issue) which support the view of “CP is acceptable” and two subtopics (moral issue and consequence issue) which support the contrary view.
It can be seen that all the issues in the KB are raised during the evolving dialogue. Further, these issues are discussed to the maximum depth possible given the KB in the DE5 and DE6 dialogues (DE4 is not considered here since one of the agents uses random argument).
In sum, the dialogue generated by the strategy is acceptable in respect of the
Evaluation criterion 4: argument flow
The analysis here considers whether the arguments generated by the agents as they follow the strategy flow well. If participants’ contributions are clearly related to each other or appear logical, then the flow will be deemed acceptable (cf. Moore 1993).
In order to inspect the protocols for disjointedness, the analysis includes manually inserting the missing premises of incomplete arguments into the machine-generated transcripts; this approach is in line with that of Walton and Reed (2002). An example is the following extract from DE5 (in which presumed missing premises are indicated in italics).
C>I think CP is not acceptable. S>I think CP is a good deterrent,
By inserting the missing premises, it is easy to see the argument flow in this example – agent S is addressing agent C's utterance.
In DE4, DE5 and DE6, the strategic agents made 133 dialogue contributions. Ninety-three of them involve directly addressing the previous utterance. There are 11 instances of the strategic agent's contributions which involve handing over its turn and asking the dialogue partner to continue its line of argument. There are seven instances of checking whether the dialogue partner still adheres to its thesis given its previous statement. All these can be seen as engendering reasonable argument flow. However, there are 22 apparent instances of disjointedness, 10 of which involve building a line of questions toward the negation of the dialogue partner's previous statement and 12 of which involve a shift of the current focus. The apparent examples of disjointedness might in fact be reasonable in terms of the underlying strategy of the move maker. Consider, for example, the following extract from DE5.
S>I think it is not always wrong to take a human life.
C>Is it the case that human lives are scarce?
S>Yes, I think human lives are scarce.
C>Is it the case that “human lives are scarce” is a prima facie reason for “it is wrong to take a human life”?
S>Yes, I think “human lives are scarce” is a prima facie reason for “it is wrong to take a human life”.
C>I think it is wrong to take a human life.
S>I think murderers should receive capital punishment.
In the above dialogue fragment, turn 015 might appear at first sight to be disjointed from the previous utterance. However, it can be seen as agent C starting its distance plan to build the case for “it is wrong to take a human life” (culminating in turn 019) in order to rebut agent S's view that “it is not always wrong to take a human life”. Turn 020 might be seen as an abrupt shift, but is perhaps justified as opening a new focus (punishment). Part of the reason for the feeling of abruptness is the lack in DE (indeed in all dialectical models) of linguistic means of introducing transitions between foci.
Generally speaking, though, the arguments can be seen as flowing well – the total of 22 instances of disjointedness represents a relatively small proportion of the 133 strategic agents’ dialogue contributions. The apparent examples of disjointedness are arguably reasonable in terms of the underlying strategy of the move maker with the exception of the absence of explicit linking for a focus shift.
Evaluation criterion 5: defeatability
The interest here is whether the agent adopting the strategy is defeatable. In DE4, the strategic agent defeats the random agent. In DE5, the two agents with the same strategy and the same KBs end in stalemate. In DE6, the strategic agent whose KB is a subset of its opponent's loses the game. The proposed strategy seems more intelligent than a random strategy, given its win in DE4. On the other hand, the strategy is defeatable, as shown in DE6, where the strategic agent Chris does lose the game, and does so in a manner which might be considered reasonable, as opposed to a mere
The evidence therefore suggests that an agent adopting the strategy can both win and lose a game in an artificial setting. However, the fact that the agent can be defeated in the special case where its KB is a subset of the other participant's KB is not sufficient to establish the defeatability criterion in the more general case of human–agent argumentation. The defeatability criterion will be further discussed in the user-based evaluation section below.
To summarise the agent-based evaluation, the qualitative assessment suggests that, generally speaking, the proposed strategy can provide good service, enabling the computer to act as a dialogue participant. The assessment also reveals several weaknesses in respect to the
User-based evaluation
A prerequisite for a user-based evaluation is to construct a human–computer debating system. A fully functional human–computer debate prototype, operationalising the dialogue model DE and the proposed strategy (currently able to debate the issue of capital punishment) has been built and deployed on the Internet (http://staff.unak.is/not/yuan/games/debate/system.htm). An example user system interface can be seen in Figure 2. The user system interface provides a debate history to record the debate for subsequent analysis, commitment stores for both the user and the computer and input facilities for the user to make a move. Each commitment store is designed to have two lists of statements, those that have been explicitly stated by the owner of the store and those that have been merely implicitly accepted. In the current system, a statement that is only implicitly accepted is marked with an asterisk, as shown in Figure 2. The commitment stores are updated during the debate according to the DE commitment rules.
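The commitment-store behaviour described above can be sketched in code. The following is a minimal illustrative model, not the authors' implementation: the class name and methods are assumptions, but the distinction between explicitly stated and merely implicitly accepted statements, and the asterisk marking of the latter, follow the description in the text.

```python
# Illustrative sketch of a DE-style commitment store: statements the owner has
# explicitly stated are kept apart from those merely implicitly accepted, and
# implicit commitments are displayed with a trailing asterisk (as in Figure 2).

class CommitmentStore:
    def __init__(self):
        self.explicit = set()   # statements explicitly stated by the owner
        self.implicit = set()   # statements merely implicitly accepted

    def assert_statement(self, p):
        # An explicit assertion supersedes any implicit acceptance of p.
        self.implicit.discard(p)
        self.explicit.add(p)

    def accept_implicitly(self, p):
        if p not in self.explicit:
            self.implicit.add(p)

    def withdraw(self, p):
        # Withdrawal removes p from the store however it entered.
        self.explicit.discard(p)
        self.implicit.discard(p)

    def display(self):
        # Implicit commitments are marked with an asterisk for the user.
        return sorted([s for s in self.explicit] +
                      [s + " *" for s in self.implicit])

store = CommitmentStore()
store.accept_implicitly("CP is a good deterrent")
store.assert_statement("CP is acceptable")
print(store.display())
# → ['CP is a good deterrent *', 'CP is acceptable']
```

The two-list design mirrors the interface: the user can see at a glance which of the computer's commitments were stated outright and which it has only tacitly conceded.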

Figure 2. Human–computer debating system user interface.
Turning to the user input facilities, a menu-based approach is adopted. Under this approach, the user needs to make a double selection, choosing from the available move types and then from the list of prescribed propositions from the domain under discussion. To prevent the user from constantly breaking the rules and to increase the learnability of the system, only the legally available move types are provided by the system before the student makes a move. Once the user has selected a move type, they need to select some propositional content. The system provides a number of means for doing this, depending on the nature of the move type. The details are as follows: (1) the move contents for resolution demand and challenge move types can be selected from the computer's commitment store; (2) the move contents for a withdrawal can be selected from the user's commitment store; (3) the move contents for assertion and question move types can be selected from the list of propositions (with the aid of the
The system enables the user (S) to conduct a debate with it on the controversial issue of capital punishment. The computer (C) can adopt either a proponent or an opponent role. That is, if the user chooses to support the view of “capital punishment is acceptable”, the computer will adopt the opposite view of “capital punishment is not acceptable”, and vice versa. The system then engages the user in debate on the topic of capital punishment, given these initial positions on the issue. Further details of the system can be found from (Yuan, Moore and Grierson 2007a).
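The double-selection input mechanism can be sketched as follows. This is an illustrative simplification under stated assumptions: the move-type names and the legality table are stand-ins for the actual DE rules, but the routing of move contents to their sources follows points (1)–(3) above.

```python
# Sketch of the menu-based double selection: the system first offers only the
# move types legal at this point, then populates the second menu from the
# source appropriate to the chosen move type. The legality table below is a
# simplified stand-in for the DE rules, not the actual rule set.

def legal_move_types(last_move):
    table = {
        "question": ["assertion", "withdrawal"],
        "assertion": ["assertion", "question", "challenge",
                      "withdrawal", "resolution_demand"],
        "challenge": ["assertion", "withdrawal"],
    }
    return table.get(last_move, ["assertion", "question"])

def content_source(move_type):
    # Routing per the text: resolution demands and challenges draw on the
    # computer's commitments, withdrawals on the user's own commitments,
    # assertions and questions on the domain proposition list.
    if move_type in ("resolution_demand", "challenge"):
        return "computer_commitment_store"
    if move_type == "withdrawal":
        return "user_commitment_store"
    return "domain_proposition_list"

print(legal_move_types("assertion"))
print(content_source("challenge"))   # → computer_commitment_store
```

Offering only legal move types up front, rather than rejecting illegal moves after the fact, is what prevents the user from constantly breaking the rules.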
Two types of evaluation have been carried out: expert evaluation and user-based evaluation. The aim was to assess how acceptable, usable and potentially valuable this innovation was, prior to greater exploitation and subsequent further evaluations of its educational value. Three HCI experts were invited to evaluate the human–computer debating system. One expert preferred to evaluate the system cooperatively with the system author, whereby the system author noted down the pertinent issues while the evaluator operated the system (in effect, adopting a cooperative evaluation approach; Dix, Finlay, Abowd and Beale 2004). In addition, the expert agreed to take part in a short interview after the cooperative evaluation session. Afterwards, the notes of this evaluation were formalised by the system author and emailed to the evaluator to check their accuracy. The two other HCI experts preferred to evaluate the system at their own convenience. The debating system was emailed to these experts, and formal feedback was emailed back to the system author after their evaluations.
Essentially, the expert evaluations give positive evidence concerning the usability of the system in general, and of the DE dialogue model and the proposed strategy in particular. This is supported by the evaluators’ views on their experience of using the system, such as “definitely easy for students who are familiar with computers”, “very straightforward to use it”, “no procedures annoyed me while operating on the system” and “the system's overall performance is acceptable”.
Two weaknesses concerning the proposed strategy were, however, revealed. One participant reported that she found it rather uncomfortable when the computer constantly hands over its turn after a period of debate. She further suggested that “this is fine, to make me to explore more argument. I would say it depends on personality of the debate participants”. The second weakness is that the system fails to make a concession at the right time. The evaluator wrote: “after two long debates with the computer, it seemed to let me win. Though it is not clear why at that point it changed its mind. During these debates I thought I had the computer agree to a series of propositions that would lead it to change its initial position but it seemed to hold these incompatible ideas, without difficulty. When it did concede, it was a surprise to me”. This reflects the fact that there is no heuristic available for the computer to voluntarily concede a debate except when the user checks its thesis adherence. At some point the computer should concede the debate voluntarily, i.e. when the supports for its thesis have all been removed from its commitment store and the supports for the user's thesis have been added to it; currently, lacking a heuristic for voluntary concession, the computer simply hands over its turn to the user. Current work involves amending the system to cater for this concern.
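The missing voluntary-concession heuristic just described admits a simple formulation. The following sketch uses hypothetical function and argument names; the condition itself (own thesis supports gone from the commitment store, user's thesis supports present in it) is the one stated in the text.

```python
# Sketch of a voluntary-concession heuristic: the computer should concede when
# every support for its own thesis has left its commitment store and the
# supports for the user's thesis have all entered it. Names are illustrative.

def should_concede(own_thesis_supports, user_thesis_supports, commitments):
    own_supports_gone = not any(s in commitments for s in own_thesis_supports)
    user_supports_held = all(s in commitments for s in user_thesis_supports)
    return own_supports_gone and user_supports_held

# The computer's deterrence support has been withdrawn, and it has conceded
# the user's consequence argument: time to concede voluntarily.
commitments = {"innocent people may get killed"}
print(should_concede({"CP is a good deterrent"},
                     {"innocent people may get killed"},
                     commitments))
# → True
```

Checked after each commitment-store update, such a test would let the computer concede at the natural point rather than surprising the user by conceding only when its thesis adherence is checked.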
A user-based evaluation of the debating system has also been carried out and documented in Ævarsson (2006). Ten university students from the University of Akureyri, Iceland, with an interest in debate participated in the study. Three were from the computer science department, two from the education faculty and five from the business faculty. No participant took part in more than one study. There were two pilot studies, in order to determine the set-up of the experiment, followed by eight further studies. Each study was conducted in three stages: introduction, debate and interview. Prior to each study, the debating system was set up on the screen by the researcher. An English–Icelandic online dictionary was provided, since the participants were native Icelandic speakers and English was their second language. The introduction session involved the researcher briefing the participants about the purpose and procedures of the study. In the debate session, the participant was asked to debate with the computer for 15 min in the pilot studies; this was extended to 20 min for the subsequent studies. After the debate session, a semi-structured interview was carried out concerning the user's experience of using the system. The dialogue transcripts were saved and the interviews were taped for subsequent analysis. The results are summarised in Table 5. The 10 participants successfully conducted debate with the system without difficulty. In total, 462 turns were generated; the longest debate took 79 turns and the shortest 29 turns. Incidents of users breaking the DE rules were very rare, and six participants did not break the rules at all. Two participants seemed to have changed their original view on capital punishment, eight of the debates ended in stalemate and none of the participants made the computer concede.
Table 5. Summary of dialogue transcripts.
The interview transcripts were analysed under four headings: system intelligence, user enjoyment, value of the system and user interface issues. All of the participants agreed that the system is intelligent and a worthy debate opponent, though some would have liked the system to be more aggressive and more attacking. Eight participants said they enjoyed the debates with the system, and they particularly liked the non-deterministic nature of the system's dialogue contributions. Two participants said they felt a little frustrated with the input facilities, though they managed to debate with the system. All participants claimed they would like to debate with the system again were it available on the Internet. The value of the system was affirmed by the participants: all agreed that the system can be used to help them practise argumentation. One participant recommended that the system could be used as an aid to the
The user-based evaluation revealed several concerns with the user interface. First, participants initially had to spend some time figuring out where the debate took place: in the student's commitment store, the computer's commitment store or the debate history window. Once they had worked this out, most participants stayed focused on the debate history rather than the commitment stores. Participants said they would prefer a one-window arrangement, like the MSN communication programme, to the current three-window set-up. Second, participants sometimes found it confusing to be guided to select move contents from the commitment stores at the top of the screen while the major input facilities were located at the bottom. Third, participants were not happy with constantly clicking the
To address these user interface issues, the user interface of the system is being redesigned and tested further. The first three weaknesses are being addressed together by moving the commitment stores to the
We have outlined our research aimed at the development and evaluation of suitable strategies for an interactive system offering dialogue involving competitive debate. Before discussing the significance of the evaluations reported above, there are three possible difficulties with the methodology that ought to be discussed. The first concerns the general approach of this study: dynamic testing has been used (as opposed to a static analytical approach), and dynamic testing, by its very nature, cannot be complete (Dijkstra 1972). This is reflected in the present study; for example, no
Second, it might be argued that only a small number of dialogue transcripts (three in total, from three pairs of agents) have been generated for analysis. However, this study is intended not as a statistical enquiry, but rather as an investigation into the detail of the argument generated by the strategy. Further, 165 utterances were generated (DE4: 59; DE5: 52; DE6: 54). Each utterance needs to be considered in depth, and as a result this study does, it is held, provide sufficient data for the purpose of this assessment of debating strategies.
The third difficulty may be that there is a heavy reliance on judgements of quality by the author of the heuristics and the systems and that the criteria of quality are themselves intuitively formulated. The judgement issue may be endemic to the field, and similar criticisms could perhaps be levelled against much of the dialectics literature (Moore 1993). Further, computationally generating dialogues from dialectical theories may represent a step forward, and making the various criteria clear and explicit may well localise the issues to relatively narrow concerns at any one time, and this may detract from the judgemental element. In addition, these criteria have enabled us to provide a thorough analysis of the data collected, and to leave the results, and the data itself, available for independent inspection.
We argue, then, that the methodology adopted is sound. We believe that the work reported makes a valuable contribution to the fields of human–computer dialogue in general, and of strategies in dialectical systems in particular. Concerning the latter, we have proposed a set of strategies to be utilised with the dialectical system DE and provided a means of assessing the appropriateness of the strategy. Further, since the agent systems and the human–computer system we have built can potentially be adapted to function with a different set of strategies, they potentially provide people working in the field of dialectical strategies with a test bed within which to experiment with new strategies they develop (Maudet and Moore 2001; Amgoud and Maudet 2002).
Considerable research effort has been devoted to the formulation of strategies for dialectical systems. Grasso et al. (2000), for example, adopt, for their nutritional advice-giving system, schemas derived from Perelman and Olbrechts-Tyteca's (1969)
Our work adds to this body of research and is unique in that it concerns specifically strategies for computer-based debate. More generally, our work contributes to the field of human–computer dialogue. We have proposed a set of strategies for an educational human–computer debating system and our evaluations of the strategy in use reveal that the strategy is able to provide a good service for a computer to act as a dialogue participant. Our debate system is a unique system and therefore makes a contribution to the broadening of the human–computer interaction
Conclusion and further work
We have proposed a set of strategic heuristics for a human computer debating system regulated by the dialogue game DE. An agent-based dialogue system and a human–computer debating system have been constructed to facilitate the evaluation of the proposed strategy. Both agent-based and user-based evaluations of the strategy in use have been carried out. The evaluations provide evidence that the proposed strategy can provide good service for a computer to act as a dialogue participant. The results are essentially favourable in demonstrating the innovation's acceptability and usability. The evaluations also provide evidence for the educational and entertainment value of the system. Several weaknesses of the dialogue strategy and user interface are revealed and discussed, and our immediate further work is to address these concerns.
There is also a variety of additional interesting ways in which the research can be carried forward. A limitation of the current study is that it is restricted to the consideration of a single strategy. The comparison is with moves generated in a random manner. This was necessitated by the absence of alternative strategies. When there is a refined version of the strategy available in the future, further comparative experimental studies of the strategy can be conducted, e.g. by comparing the current set and the refined set. A similar weakness is that our evaluative criteria need to be judgementally applied. Further work will involve refinement of the criteria, in line perhaps with Amgoud and Dupin de Saint-Cyr (2008).
Turning to our basic human–computer debating system, it can be enhanced to allow the user to question or challenge a conjunction of statements (e.g. P ∧ Q) or a conditional. Currently, the DE system is perhaps over-zealous at the challenge move, in that saying “why P?” removes commitment to P, whereas a dialogue participant may wish to remain committed to P but hear his interlocutor's reasons for P. This relaxed rule could be implemented and tested. Further features, such as the system explaining its tactics and reasons for choosing one move over another to the user, along the lines of Pilkington and Grierson (1996), can also be implemented. The system could then be evaluated with a number of different domains of debate, e.g. abortion, politics, terrorism, to test the extent to which the design and knowledge representation are generic. This evaluation might be extended to encompass the use of the system to investigate pedagogic issues, such as the educational value of one-to-one debate, and how learners make inferences about the knowledge domain (Moore 1993). The evaluation could also be used to chart, through and across dialogues, how the way in which students engage in dialogue evolves. Ultimately, the system can perhaps be enhanced to keep track of student learning as such.
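The proposed relaxation of the challenge rule can be sketched as a small change to the commitment update. The function and flag names below are assumptions for illustration; the behaviour contrasts the current DE rule with the relaxed one described above.

```python
# Sketch of the relaxed challenge rule: under the current DE rule, uttering
# "why P?" removes the challenger's commitment to P; under the relaxed rule,
# the challenger may ask for reasons while remaining committed to P.
# The flag name retain_commitment is an illustrative assumption.

def apply_challenge(commitments, p, retain_commitment=False):
    """Return the challenger's commitment set after uttering 'why p?'."""
    if retain_commitment:
        return set(commitments)       # relaxed rule: commitment to p kept
    return set(commitments) - {p}     # current DE rule: commitment to p lost

c = {"CP is acceptable", "CP is a good deterrent"}
print(apply_challenge(c, "CP is a good deterrent"))
print(apply_challenge(c, "CP is a good deterrent", retain_commitment=True))
```

Implementing the relaxed rule as an optional mode would allow the two variants to be compared empirically in further user studies.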
We are also planning to permit free user input for the debating system, initially via an option to enter fresh propositional content in addition to selecting from those made available by the system. This will enable us to build up the system knowledge base by adding new claims to it and, more importantly, to experiment with the extent to which the strategic heuristics can cope with such new input.
A further problem with our basic debating system, and indeed the underlying DE model, is that the move set, and in particular the question move, is restricted to bi-polar questions. Specifically, the absence of a move which enables debate participants to seek explanation of the substantive points being made assumes that each participant fully understands all the points. In an educational context in particular, this is likely to be an undesirable restriction. Given this, our current research is seeking to extend the dialogue beyond debate per se, and Naim et al. (2009) have addressed this.
The dialogue model and the strategy are currently represented informally using structured English and are hard-wired into the program logic by the developer of the system. Ideally, the dialogue model, strategy or even the knowledge base of the system could be written and modified by the users of the system and then directly translated to program code. A formal representation is required for this; this is a general software engineering issue in the field of computational dialectics. A plausible investigation might be along the lines of Bench-Capon, Geldard and Leng (2000), Reed (2006) and Yuan, Moore, Reed, Ravenscroft and Maudet (2011), where each move act is specified using a pre- and post-condition pair.
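The pre- and post-condition style of move specification can be illustrated as follows. This is a minimal sketch under stated assumptions: the move names, state representation and conditions are illustrative, not taken from the cited formalisms; the point is that each move act becomes data (a condition pair over the dialogue state) rather than hard-wired program logic.

```python
# Sketch of declarative move specification: each move act is a pre-condition
# (when the move is legal) and a post-condition (its effect on the dialogue
# state, here a commitment set per participant). All names are illustrative.

MOVE_SPECS = {
    "withdrawal": {
        # pre: the mover may only withdraw something it is committed to
        "pre":  lambda state, mover, p: p in state[mover],
        # post: the statement leaves the mover's commitment store
        "post": lambda state, mover, p: state[mover].discard(p),
    },
    "assertion": {
        # pre (simplified): the mover is not already committed to p
        "pre":  lambda state, mover, p: p not in state[mover],
        # post: the statement enters the mover's commitment store
        "post": lambda state, mover, p: state[mover].add(p),
    },
}

def make_move(state, mover, move, p):
    spec = MOVE_SPECS[move]
    if not spec["pre"](state, mover, p):
        raise ValueError(f"{move} of {p!r} is illegal here")
    spec["post"](state, mover, p)

state = {"user": set(), "computer": set()}
make_move(state, "user", "assertion", "CP is not acceptable")
make_move(state, "user", "withdrawal", "CP is not acceptable")
print(state["user"])   # → set()
```

Because the specification is a table rather than code paths, a new move type, or an amended rule such as the relaxed challenge discussed earlier, could in principle be added by editing the table, which is the kind of user-modifiable representation the text envisages.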
