Abstract
This paper presents a novel argumentation framework to support Issue-Based Information System (IBIS)-style debates on design alternatives by providing an automatic quantitative evaluation of the positions put forward. It also identifies several formal properties of the proposed quantitative argumentation framework and compares it with existing non-numerical abstract argumentation formalisms. Finally, the paper describes the integration of the proposed approach within the design Visual Understanding Environment (designVUE) software tool, along with three case studies in engineering design. The case studies show the potential for a competitive advantage of the proposed approach with respect to state-of-the-art engineering design methods.
Engineering design is often described as an information-processing activity based on problem-solving within the constraints of bounded rationality (Simon, 1996; Simon & Newell, 1971). It consists of decomposing an initial problem into a range of sub-problems, proposing and assessing partial solutions, and integrating them so as to satisfy the overall problem. This process is collaborative and often involves communication between non-co-located engineers. The development and communication of design solutions require engineers to form and share design rationale, that is, the argumentation in favour of or against proposed designs.
These aspects of the engineering design process have led to the development (Kunz & Rittel, 1970) and subsequent investigation (Buckingham Shum & Hammond, 1994; Fischer, Lemke, McCall, & Morch, 1991) of the issue-based information system (IBIS) method, a graph-based formalisation of the decisions made during a design process along with the reasons why they were made. The IBIS method envisions a decision-making process where problems (or issues) are given solutions (or answers) after a thorough debate involving technical, economic, life, environmental and safety considerations. It also provides means to actively develop, communicate and record the reasons (or arguments) in favour of or against the options explored during the design process. Initially, IBIS was conceived purely as a conceptual information system, and its first implementations were paper-based and entirely operated by hand. Over time, however, several software tools supporting the editing and visualisation of IBIS graphs have been developed, for example, Compendium, DRed and the design Visual Understanding Environment (designVUE) (e.g. see Aurisicchio & Bracewell, 2013; Buckingham Shum et al., 2006). These IBIS-based tools, including designVUE, which was selected as the starting point for this research, still leave users with the burden of actually deriving any conclusion from the argumentative process and, ultimately, making a decision. Depending on the structure of the graph, this task may not be trivial.
This paper describes the outcome of collaborative research, involving experts in engineering design and argumentation theory, undertaken to overcome the limitations of standard design tools in general, and designVUE in particular. The ultimate goal of this research is to support engineers by providing them with a visual tool to automatically evaluate alternative design solutions and suggest the most promising answers to a design issue, given the underlying graph structure developed during the design process.
Since one of the main features of argumentation theory is evaluating arguments’ acceptability (e.g. as in Cayrol & Lagasquie-Schiex, 2005a; Dung, 1995) or strength (e.g. as in Cayrol & Lagasquie-Schiex, 2005b; Evripidou & Toni, 2012; Leite & Martins, 2011; Matt & Toni, 2008) within debates and dialogues, we have singled it out as a promising companion to engineering design to achieve our research goal. For this application area, conventional notions of ‘binary’ acceptability (e.g. the notions in Dung, 1995), sanctioning arguments as acceptable or not, are better replaced with notions of numerical strength, as the latter are more fine-grained and allow one to distinguish different degrees of acceptability.
This paper presents both theoretical and practical results. On the theoretical side, we propose a formal method to assign a numerical score to the nodes of an IBIS graph, starting from a base score provided by users. On the practical side, we describe the implementation of this method within designVUE and its preliminary evaluation in the context of three case studies.
The paper is organised as follows. Section 1 gives the basic notions concerning IBIS and the necessary background on argumentation theory. Section 2 introduces a form of argumentation frameworks abstracting away (a restricted form of) IBIS graphs, and Section 3 defines our approach for the quantitative evaluation of arguments in these frameworks. Section 4 studies some formal properties of our approach, and Section 5 gives formal comparisons with two traditional non-numerical argumentation frameworks, namely abstract (Dung, 1995) and bipolar (Cayrol & Lagasquie-Schiex, 2005a) argumentation frameworks. Section 6 describes an implementation of our approach as an extension of designVUE, and Section 7 illustrates its application in three engineering case studies. Section 8 discusses related work, and Section 9 concludes.
The paper expands the work in Baroni, Romano, Toni, Aurisicchio, and Bertanza (2013) in several ways, notably by studying the properties of our proposed method for the quantitative evaluation of debates (Section 4), by considering the formal relationship with two traditional non-numerical argumentation frameworks (Section 5) and by developing one additional case study (Section 7.3). The latter revisits a well-known design problem in the engineering design literature (Ulrich & Eppinger, 2004) and compares our approach with the standard decision technique used for this problem, namely decision matrices (Pugh, 1991).
Background
Issue-Based Information System
IBIS (Kunz & Rittel, 1970) is a method to propose answers to issues and assess them through arguments. At the simplest level, the IBIS method consists of a structure that can be represented as a directed acyclic graph with four types of node: an issue node represents a problem being discussed, namely a question in need of an answer; an answer node represents a candidate solution to an issue; a pro-argument node represents approval of a given answer or of another argument; a con-argument node represents an objection to a given answer or to another argument. An answer node is always linked to an issue node, whereas pro-argument and con-argument nodes are normally linked to an answer node or to another argument. Each link is directed, pointing towards the dependent node.
Figure 1 shows an example of an IBIS graph, as implemented in designVUE, with a concrete illustration of the content of the nodes (labelled A1, A2, P1, C1 and C2, for convenience of reference) in the design domain of internal combustion engines (ICE). All the IBIS graphs presented hereafter are screenshots from the designVUE tool. This example graph has three layers: the first layer consists of an issue node, the second layer of two alternative answers and the third layer of arguments.
A simple IBIS graph.
An IBIS graph is typically constructed according to the following rules: (1) an issue is captured; (2) answers are laid out and linked to the issue; (3) arguments are laid out and linked to either the answers or other arguments; and (4) further issues may emerge during the process and be linked to either the answers or the arguments.
Conceptually, the addition of an answer or an argument corresponds to a move in the exploration of the design space.
In the class of design problems we considered for our application, IBIS graphs have specific features. First, each graph concerns a single issue (although this may involve addressing several sub-issues in turn). Second, answers correspond to alternative solutions, which may or may not satisfy the issue they address. Each answer is meant to represent a full solution to the issue; hence, answers are mutually incompatible. Typically, multiple satisfactory solutions are possible and could be accepted, and argumentation is used to screen them and select just one solution to be put forward. This differs from applications in other domains, for example, diagnosis, where a combination of different answers may provide the cause of a fault.
In the designVUE implementation of the IBIS method (Aurisicchio & Bracewell, 2013), each of the four node types can have alternative statuses to help users visualise aspects of the decision-making process (Figure 2).
Possible statuses of IBIS nodes.
The precise meaning of these statuses depends on the node type and is manually assigned by the users. For example, a designer may change the status of an answer from ‘open’ to ‘accepted’. In this paper, we define a method for automatic, rather than manual, evaluation of nodes in (restricted kinds of) IBIS graphs, based on argumentation theory, reviewed next.
In this work, we will make use of abstract argumentation (Dung, 1995) and some extensions thereof. We review these briefly here.
A (finite) abstract argumentation framework (
An
Given an
In the case of finite frameworks (as in the present paper), the grounded extension corresponds to the result of the iterative application of the characteristic function starting from the empty set until a fixed point is reached:
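As an illustration of this fixed-point construction, the following Python sketch computes the grounded extension of a finite framework by iterating the characteristic function from the empty set until nothing changes. The function name and the representation of attacks as a set of (attacker, attacked) pairs are our own assumptions, not part of any tool discussed in this paper.

```python
def grounded_extension(args, attacks):
    """Grounded extension of a finite abstract argumentation framework.

    args: a set of argument names; attacks: a set of (attacker, attacked) pairs.
    Iterates the characteristic function from the empty set to its least fixed point.
    """
    attackers = {a: {b for (b, c) in attacks if c == a} for a in args}

    def characteristic(s):
        # Arguments all of whose attackers are attacked by some member of s
        return {
            a for a in args
            if all(any((d, b) in attacks for d in s) for b in attackers[a])
        }

    current = set()
    while True:
        following = characteristic(current)
        if following == current:
            return current
        current = following


# Chain a -> b -> c: a is unattacked and defends c against b.
print(sorted(grounded_extension({"a", "b", "c"}, {("a", "b"), ("b", "c")})))  # ['a', 'c']
```

For a mutual attack between a and b, the iteration stops immediately at the empty set, which is the grounded extension in that case.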
While
A (finite) bipolar
The discrete argument evaluation for
Given a
Another direction of enhancement of
Let L be a completely ordered set, L* be the set of all the finite sequences of elements of L (including the empty sequence ()), and $H_{def}$ and $H_{sup}$ be two ordered sets. Let
Note that the local gradual valuation (
The second proposal we consider is the extended social abstract argumentation approach of Evripidou & Toni (2012), which takes into account positive and/or negative votes on arguments in addition to attackers and supporters. In a nutshell, the idea is that in a social context (like an Internet-based social network or debate) opinions (arguments) are evaluated by a community of users through a voting process.
An Extended Social Abstract Argumentation Framework (
Given an (acyclic)
In Section 1.1, we have seen that the design scenarios we consider require IBIS graphs with specific features, and in particular with a single specific (design) issue and answers (linking to that issue) corresponding to different alternative solutions. Whereas IBIS graphs (in general and in design contexts) allow new issues to be brought up during the argumentation, as sub-issues of the main issue that are being debated, in this paper for simplicity we will disallow this possibility and focus on design debates that can be represented by IBIS graphs where arguments can only be pointed to by other arguments. Moreover, we focus on graphs in the restricted form of trees, with issues as roots.
We will define, in Section 3, a method for evaluating arguments and answers in IBIS graphs of the restricted kind we consider, aimed at accompanying or replacing the manual evaluation available in some IBIS implementations (Section 1.1). Examining some design scenarios with the relevant experts (see also Section 7) revealed that, in their evaluations, the experts typically ascribe different importance to arguments, so a base score is required as a starting point for the evaluation. To fulfil this requirement, we propose the following formal framework:
A QuAD (quantitative argumentation debate) framework is a 5-tuple the sets
The framework is referred to as ‘quantitative’ due to the presence of the base score. Ignoring this score, QuAD frameworks are clearly abstractions of (restricted forms of) IBIS graphs, with the issue node omitted since QuAD frameworks focus on the evaluation of answer nodes for a specific (implicit) issue. For example, the QuAD representation of the IBIS graph in Figure 1 has
Tree representation of an example QuAD framework.
It is easy to see that a QuAD framework can also be interpreted as a
Let
Note that an
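A QuAD framework maps naturally onto a small data structure. The sketch below is a hypothetical Python container (class and field names are ours) that, assuming the usual constraints of a QuAD framework (pairwise disjoint A, C and P; R ⊆ (C ∪ P) × (A ∪ C ∪ P); R acyclic; base scores in [0, 1]), validates its input and returns direct attackers and supporters as sets. The example links are invented for illustration and are not read off Figure 1.

```python
from dataclasses import dataclass


@dataclass
class QuADFramework:
    """Hypothetical container for a QuAD framework <A, C, P, R, BS>."""

    answers: set      # A: answer arguments
    cons: set         # C: con-arguments
    pros: set         # P: pro-arguments
    relation: set     # R: pairs (b, a) meaning "b points to a"
    base_score: dict  # BS: argument -> base score in [0, 1]

    def __post_init__(self):
        if self.answers & self.cons or self.answers & self.pros or self.cons & self.pros:
            raise ValueError("A, C and P must be pairwise disjoint")
        nodes = self.answers | self.cons | self.pros
        for b, a in self.relation:
            if b not in self.cons | self.pros or a not in nodes:
                raise ValueError(f"({b}, {a}) is not in (C u P) x (A u C u P)")
        for arg in nodes:
            if not 0 <= self.base_score.get(arg, -1) <= 1:
                raise ValueError(f"missing or out-of-range base score for {arg}")
        self._check_acyclic(nodes)

    def _check_acyclic(self, nodes):
        # Depth-first search: reaching a node still on the current path means a cycle.
        on_path, done = set(), set()

        def visit(n):
            on_path.add(n)
            for b, a in self.relation:
                if b == n:
                    if a in on_path:
                        raise ValueError("R must be acyclic")
                    if a not in done:
                        visit(a)
            on_path.discard(n)
            done.add(n)

        for n in nodes:
            if n not in done:
                visit(n)

    def attackers(self, a):
        """Direct attackers: con-arguments b with (b, a) in R."""
        return {b for b, t in self.relation if t == a and b in self.cons}

    def supporters(self, a):
        """Direct supporters: pro-arguments b with (b, a) in R."""
        return {b for b, t in self.relation if t == a and b in self.pros}


qf = QuADFramework(
    answers={"ans1", "ans2"}, cons={"con1", "con2"}, pros={"pro1"},
    relation={("con1", "ans1"), ("pro1", "ans1"), ("con2", "ans2")},
    base_score={n: 0.5 for n in ["ans1", "ans2", "con1", "con2", "pro1"]},
)
print(qf.attackers("ans1"), qf.supporters("ans1"))  # {'con1'} {'pro1'}
```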
As we will see in Section 7, the choice of base scores for arguments is important for a correct evaluation outcome and far from simple, since it has to take into account some case-specific factors; the definition of a methodology for assessing these scores based on application features is an important direction for future work.
Given a QuAD framework, in order to support the decision-making process by design engineers, we need a method to assign a quantitative evaluation, called final score, to answer nodes. To this end, we investigate the definition of a score function
We have defined direct attackers and supporters as sets (see Definition 2.2), taken from a (static) QuAD framework. However, in a dynamic design context these may actually be given in sequence. We will thus define the final score of an argument in terms of sequences of direct attackers and supporters. In this paper, we assume that these sequences are arbitrary permutations of the attackers and supporters (however, in a dynamic setting they may actually be given from the outset). For a generic argument a, let
Using the hypothesis (implicitly adopted in Cayrol & Lagasquie-Schiex, 2005b; Evripidou & Toni, 2012) of separability of the evaluations of attackers and supporters,1
Here, separability amounts to absence of interaction between attackers and supporters.
Further, we need to deal appropriately with ineffective sequences, where a sequence Z is ineffective if it is empty or consists only of zeros. Formally, the set of ineffective sequences is defined as
In our running example, since C1, C2 and P1 have no attackers or supporters, we get
To properly deal with ineffective sequences, we use a special value
For non-ineffective sequences, we define
The expression of $f_{supp}$ corresponds to the t-conorm known in the literature as the probabilistic sum (Klement, Mesiar, & Pap, 2000).
The definitions of
Note that this definition directly entails that, for non-ineffective sequences S,
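The aggregation of attackers and supporters can be sketched as left folds over the sequence. The Python below assumes the single-step functions f_att(v0, v) = v0 · (1 − v) and f_supp(v0, v) = v0 + (1 − v0) · v (the latter being the probabilistic sum mentioned above) and uses None as a stand-in for the special value nil; these encodings are illustrative assumptions, not the designVUE code.

```python
from functools import reduce

NIL = None  # stand-in for the special value nil


def is_ineffective(seq):
    """A sequence is ineffective if it is empty or contains only zeros."""
    return all(v == 0 for v in seq)


def f_att(v0, v):
    # One attacker with score v scales the current value by (1 - v).
    return v0 * (1 - v)


def f_supp(v0, v):
    # Probabilistic sum of the current value and one supporter's score v.
    return v0 + (1 - v0) * v


def F_att(v0, seq):
    # Left fold: F_att(v0, (v1..vn)) = f_att(F_att(v0, (v1..v(n-1))), vn)
    return NIL if is_ineffective(seq) else reduce(f_att, seq, v0)


def F_supp(v0, seq):
    # Same fold with f_supp in place of f_att
    return NIL if is_ineffective(seq) else reduce(f_supp, seq, v0)


print(F_att(0.5, [0.2, 0.5]))   # 0.5 * 0.8 * 0.5 = 0.2
print(F_supp(0.5, [0.2, 0.5]))  # 1 - 0.5 * 0.8 * 0.5 = 0.8
print(F_att(0.5, [0, 0]))       # None: ineffective sequence
```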
We now establish some basic properties of
For any
By induction on k. For the base case, trivially the statement holds for k=1 given the definitions of $f_{att}$ and $f_{supp}$. Assume that the statement holds for a generic sequence of length k−1, that is,
Then, it is of course required that
For any
Obvious for ineffective sequences. Otherwise, as to
Another desirable property of
For any
As to
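A numerical spot check can make these properties tangible. Assuming f_att(v0, v) = v0 · (1 − v) and f_supp(v0, v) = v0 + (1 − v0) · v, the Python sketch below compares the recursive definitions with the closed product forms v0 · ∏(1 − vi) and 1 − (1 − v0) · ∏(1 − vi), and tests two properties one expects from such aggregations: insensitivity to the order of the sequence, and F_att(v0, S) ≤ v0 ≤ F_supp(v0, S). It checks sample inputs only, is not a proof, and does not necessarily reproduce the exact statements of the propositions above.

```python
import itertools
import math
from functools import reduce


def F_att(v0, seq):
    # Recursive definition written as a left fold (seq assumed not ineffective)
    return reduce(lambda acc, v: acc * (1 - v), seq, v0)


def F_supp(v0, seq):
    return reduce(lambda acc, v: acc + (1 - acc) * v, seq, v0)


def check(v0, seq):
    """Return (closed_form_ok, order_independent, bounded) for one input."""
    rest = math.prod(1 - v for v in seq)
    closed_att, closed_supp = v0 * rest, 1 - (1 - v0) * rest

    def close(x, y):
        return math.isclose(x, y, abs_tol=1e-12)

    closed_ok = close(F_att(v0, seq), closed_att) and close(F_supp(v0, seq), closed_supp)
    order_ok = all(
        close(F_att(v0, p), closed_att) and close(F_supp(v0, p), closed_supp)
        for p in itertools.permutations(seq)
    )
    bounded = F_att(v0, seq) <= v0 <= F_supp(v0, seq)
    return closed_ok, order_ok, bounded


print(check(0.6, [0.1, 0.7, 0.35]))  # (True, True, True)
```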
In order to finalise the definition of the score function, we need to define g. For this, we adopt the idea that when the effect of attackers is null (i.e. the value returned by
The operator
Then, the following result directly ensues from Propositions 3.1–3.3:
The score function
For our running example, we get
On the computational side, given that in a QuAD framework the relation
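Putting the pieces together, the score function can be sketched as a plain recursion over the acyclic relation. The Python below assumes that g keeps the base score when neither aggregate is effective, returns the single effective aggregate when only one is, and averages the two otherwise (consistent with the bounds discussed in Section 4); it also assumes the product-based attack aggregation and the probabilistic-sum support aggregation, with None standing for nil. The debate and its base scores are hypothetical.

```python
from functools import reduce


def agg_att(v0, seq):
    # nil (None) for ineffective sequences, otherwise v0 * prod(1 - v)
    return None if all(v == 0 for v in seq) else reduce(lambda acc, v: acc * (1 - v), seq, v0)


def agg_supp(v0, seq):
    # nil (None) for ineffective sequences, otherwise the iterated probabilistic sum
    return None if all(v == 0 for v in seq) else reduce(lambda acc, v: acc + (1 - acc) * v, seq, v0)


def g(v0, va, vs):
    if va is None and vs is None:
        return v0          # no effective attackers or supporters: keep the base score
    if va is None:
        return vs          # only supporters have an effect
    if vs is None:
        return va          # only attackers have an effect
    return (va + vs) / 2   # both have an effect: average the two aggregates


def score(arg, base_score, attackers, supporters):
    """SF(arg); terminates because R is acyclic. attackers/supporters map arg -> list."""
    att = [score(b, base_score, attackers, supporters) for b in attackers.get(arg, [])]
    sup = [score(b, base_score, attackers, supporters) for b in supporters.get(arg, [])]
    v0 = base_score[arg]
    return g(v0, agg_att(v0, att), agg_supp(v0, sup))


# Hypothetical debate: ans1 has one con and one pro, ans2 has one con; all base scores 0.5.
bs = {n: 0.5 for n in ["ans1", "ans2", "con1", "con2", "pro1"]}
att = {"ans1": ["con1"], "ans2": ["con2"]}
sup = {"ans1": ["pro1"]}
print(score("ans1", bs, att, sup), score("ans2", bs, att, sup))  # 0.5 0.25
```

This plain recursion recomputes arguments that affect several nodes; caching scores that have already been computed avoids that.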
In this section, we analyse some properties of the score function introduced in Section 3, namely the meaning of the extreme values 0 and 1 and the relevant behaviour of
Behaviour with extreme values
The extreme values 0 and 1 carry a specific meaning and should be used accordingly. Given an argument a,
It can also be observed that an attacker with final score 1 has a saturating role as far as attackers against some argument a are concerned, since
It is also easy to see that either an attacker or a supporter with final score 0 has no effect on the final score and could be ignored. Indeed,
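The saturating role of score 1 and the neutrality of score 0 can be observed numerically. This sketch assumes the product-based attack aggregation and the probabilistic-sum support aggregation, applied to effective sequences.

```python
from functools import reduce


def F_att(v0, seq):
    return reduce(lambda acc, v: acc * (1 - v), seq, v0)


def F_supp(v0, seq):
    return reduce(lambda acc, v: acc + (1 - acc) * v, seq, v0)


# An attacker with final score 1 drives the attack aggregate to 0, whatever the others are.
print(F_att(0.8, [0.3, 1.0, 0.6]))  # 0.0
# A supporter with final score 1 drives the support aggregate to 1 (up to rounding).
print(F_supp(0.3, [0.2, 1.0]))
# Zero-scored attackers and supporters leave the aggregates unchanged.
print(F_att(0.6, [0.5, 0.0]) == F_att(0.6, [0.5]))    # True
print(F_supp(0.6, [0.5, 0.0]) == F_supp(0.6, [0.5]))  # True
```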
Further, extreme values cannot be attained in final scores unless some extreme values are present in the input values to
Given the acyclic structure of a QuAD-framework, it follows directly from the definition of
The last property ensures in particular that extreme values (consistently with the special meaning they carry) can come into play only through a deliberate choice of the expert providing the base scores.
In order to characterise the range of possible values of the final score, first note that the following inequalities hold, for any argument a assuming
Applying Equation (13) from Definition 3.4 when
The lower bound corresponds to an argument with very strong attackers and weak supporters: its base score is halved. The upper bound corresponds to an argument with weak attackers and very strong supporters, for which the distance of the final score from 1 is half that of the base score. This corresponds to the idea that differences in the base score assessments can be reversed only up to a certain extent by the effect of attackers and supporters. Note also that in a contradictory situation (both very strong attackers and very strong supporters), the final score is 0.5 independently of the base score.
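Assuming, as the product forms of the aggregations give, that 0 ≤ F_att(v0, S) ≤ v0 ≤ F_supp(v0, S') ≤ 1, and that Equation (13) averages the two aggregates, the bounds follow in one step:

$$
\frac{v_0}{2} = \frac{0 + v_0}{2} \;\le\; \frac{F_{att}(v_0,S) + F_{supp}(v_0,S')}{2} \;\le\; \frac{v_0 + 1}{2},
\qquad
1 - \frac{v_0 + 1}{2} = \frac{1 - v_0}{2}.
$$

For instance, with $v_0 = 0.6$ the final score lies between 0.3 and 0.8, and with $F_{att} = 0$, $F_{supp} = 1$ it is exactly $1/2$. In the ‘regular’ case the two bounds are approached rather than attained: effective supporters push $F_{supp}$ strictly above $v_0$ when $v_0 < 1$, and effective attackers push $F_{att}$ strictly below $v_0$ when $v_0 > 0$.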
Equation (13) applies in the ‘regular’ case where both the sets of attackers and supporters are non-empty and have some effect. The absence of any (effective) attacker or of any (effective) supporter is treated as a special case in Definition 3.4 and this induces a discontinuity in the behaviour of the operator g (and hence of
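The discontinuity can be made concrete with a small numerical sketch. It assumes the combination rule of Definition 3.4 as we read it (base score, the single effective aggregate, or the average of the two), together with the product-based and probabilistic-sum aggregations; the numbers are illustrative only.

```python
from functools import reduce


def final_score(v0, att, sup):
    """g(v0, F_att, F_supp) for one argument, given its attackers' and supporters' final scores."""
    def effective(seq):
        return any(v != 0 for v in seq)

    va = reduce(lambda acc, v: acc * (1 - v), att, v0) if effective(att) else None
    vs = reduce(lambda acc, v: acc + (1 - acc) * v, sup, v0) if effective(sup) else None
    if va is None and vs is None:
        return v0
    if va is None:
        return vs
    if vs is None:
        return va
    return (va + vs) / 2


print(final_score(0.5, [], [0.9]))       # about 0.95: supporters only
print(final_score(0.5, [0.001], [0.9]))  # about 0.725: a negligible attacker switches to averaging
```

Adding an attacker with final score 0.001 lowers the final score from 0.95 to about 0.725, whereas an attacker with final score exactly 0 is ignored.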
One may wonder whether a symmetric configuration of attackers and supporters, that is, when the number of attackers and supporters is the same and they have pairwise equal strength, gives rise to a symmetric effect in the evaluation results. We show that a symmetry holds in
Let a be an argument with
The proposition follows directly from
Let a be an argument with
The proof is by induction on the number k of supporters. For k=1, observe that
As a by-product of the proofs of the previous propositions, we observe that assuming
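Under the product-based aggregations, a configuration with pairwise equal attacker and supporter scores v1, …, vk (not all zero) gives F_att = v0 · Π and F_supp = 1 − (1 − v0) · Π with Π = ∏(1 − vi), so their average is (1 + (2v0 − 1) · Π)/2; in particular, the final score is exactly 0.5 when v0 = 0.5. The Python sketch below checks this closed form numerically. It is our own derivation from the assumed definitions, not necessarily the exact statement of the propositions above.

```python
import math
from functools import reduce


def final_score_symmetric(v0, values):
    """Final score when attackers and supporters have pairwise equal scores `values`."""
    if all(v == 0 for v in values):   # both sequences ineffective: base score kept
        return v0
    va = reduce(lambda acc, v: acc * (1 - v), values, v0)
    vs = reduce(lambda acc, v: acc + (1 - acc) * v, values, v0)
    return (va + vs) / 2


def closed_form(v0, values):
    # (1 + (2 * v0 - 1) * prod(1 - v)) / 2
    return (1 + (2 * v0 - 1) * math.prod(1 - v for v in values)) / 2


for v0 in (0.5, 0.8, 0.2):
    vals = [0.3, 0.7]
    # Final score and closed form coincide (up to rounding); v0 = 0.5 gives 0.5.
    print(v0, final_score_symmetric(v0, vals), closed_form(v0, vals))
```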
In this section, we discuss the relationships of our approach with abstract argumentation frameworks (
As to Dung's
Given an
As to the evaluation of arguments, in the case of an acyclic
Given an
First note that, since there are no supporters, for every argument a Equation (10) of Definition 3.4 applies, hence
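For acyclic frameworks this suggests a simple check. With every base score set to 1 and no supporters, the score of an argument with effective attackers is the product of (1 − v) over their scores, and it stays 1 when all attackers score 0; scores therefore remain in {0, 1} and, by induction along the acyclic attack relation, an argument scores 1 exactly when all its attackers score 0, i.e. exactly when it belongs to the grounded extension. The Python sketch below assumes this hypothetical encoding (base score 1, attackers modelled as con-arguments); it illustrates the correspondence and is not necessarily the exact mapping used in the proposition above.

```python
def quad_scores_for_acyclic_af(args, attacks):
    """Scores of an acyclic AF read as a QuAD framework without pro-arguments.

    Hypothetical encoding: every argument has base score 1 and its attackers
    are its con-arguments.
    """
    attackers = {a: [b for (b, c) in attacks if c == a] for a in args}
    memo = {}

    def sf(a):
        if a not in memo:
            vals = [sf(b) for b in attackers[a]]
            if all(v == 0 for v in vals):
                memo[a] = 1.0          # ineffective attackers: base score 1 is kept
            else:
                result = 1.0
                for v in vals:
                    result *= 1 - v    # F_att(1, S) = prod(1 - v)
                memo[a] = result
        return memo[a]

    return {a: sf(a) for a in args}


# Chain a -> b -> c -> d: the grounded extension is {a, c}.
scores = quad_scores_for_acyclic_af({"a", "b", "c", "d"}, {("a", "b"), ("b", "c"), ("c", "d")})
print(sorted(a for a, v in scores.items() if v == 1.0))  # ['a', 'c']
```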
Turning to
For instance, considering the QuAD framework
The proposed approach has been implemented in designVUE, a pre-existing IBIS application.3
designVUE has been chosen as the platform for implementing the proposed approach for various reasons: it is open source; it has been developed by the Design Engineering Group at Imperial College London; and it is receiving increasing interest from academia and industry and, as a result, has a growing user community. In the following paragraphs, we describe designVUE and its extension with the QuAD framework in more detail.
designVUE is an application developed in Java to attain cross-platform portability. Its GUI consists primarily of a main window, which contains the menu bar, the toolbar and the graph canvas.
The main purpose of designVUE is to draw graphs (also referred to as diagrams or maps) mostly consisting of nodes (depicted as boxes) and links (depicted as arrows) among them. The program does not impose any restriction on the way a graph can be drawn; it is up to the user to give a graph any meaning. Among the large variety of graphs that can be drawn, designVUE supports IBIS graphs. These receive no special treatment in designVUE and, in particular, there is no support for the evaluation of the argumentative process. In addition to the main window, there are floating windows that can be opened from the Windows menu. One of these, called Info Window, presents information about the currently selected node.
The QuAD framework has been implemented in Java and integrated into a customised version of designVUE, forking its existing codebase.4
The code is available from the designVUE web site.
a new pane called BaseScore Pane has been added to the Info Window: it displays the base score of the currently selected IBIS node and allows the user to edit it (base scores are created with a default value of 0.5);
a new pane called Score Pane has been added to the Info Window: it displays the final score of the currently selected IBIS node;
a new menu item labelled Compute Argumentation on IBIS node has been added to the Content menu: it can be invoked only after selecting an IBIS answer node and triggers the score computation for the selected node (and for all the nodes on which it depends).
Figure 4 shows a screenshot of the enhanced version of designVUE, highlighting the features listed above.
A screenshot of the enhanced version of designVUE.
The algorithm to compute the final scores has been implemented in a Java class that carries out a depth-first post-order traversal acting directly on the IBIS nodes displayed in the canvas. To improve performance in complex graphs where some pro- and/or con-arguments affect many other arguments (e.g. as in the example represented as trees in Figure 3), the algorithm maintains a so-called closed list in order to reuse the scores already computed in earlier phases of the graph traversal.
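The designVUE extension is written in Java; the Python sketch below only illustrates the idea of a depth-first post-order traversal with a closed list on a hypothetical adjacency representation (a dict from each node to its incoming (argument, kind) links). The aggregation and combination rules are assumed to be the product-based attack aggregation, the probabilistic-sum support aggregation and the averaging rule; this is not the designVUE code.

```python
from functools import reduce


def combine(v0, att, sup):
    """Final score of one node from its base score and its arguments' final scores."""
    # any(...) is False for empty or all-zero lists, i.e. ineffective sequences
    va = reduce(lambda acc, v: acc * (1 - v), att, v0) if any(att) else None
    vs = reduce(lambda acc, v: acc + (1 - acc) * v, sup, v0) if any(sup) else None
    if va is None and vs is None:
        return v0
    if va is None:
        return vs
    if vs is None:
        return va
    return (va + vs) / 2


def compute_scores(answer, incoming, base_score):
    """Final scores for `answer` and every node it depends on.

    incoming: dict node -> list of (argument, kind), kind being "con" or "pro".
    """
    closed = {}                  # closed list: node -> final score already computed
    stack = [(answer, False)]
    while stack:
        node, children_done = stack.pop()
        if node in closed:
            continue             # shared argument already evaluated: reuse its score
        links = incoming.get(node, [])
        if not children_done:
            stack.append((node, True))   # revisit after its arguments (post-order)
            stack.extend((arg, False) for arg, _ in links if arg not in closed)
        else:
            att = [closed[arg] for arg, kind in links if kind == "con"]
            sup = [closed[arg] for arg, kind in links if kind == "pro"]
            closed[node] = combine(base_score[node], att, sup)
    return closed


# Hypothetical graph in which con1 attacks both ans1 and pro1.
incoming = {"ans1": [("con1", "con"), ("pro1", "pro")], "pro1": [("con1", "con")]}
bs = {"ans1": 0.5, "pro1": 0.5, "con1": 0.5}
print(compute_scores("ans1", incoming, bs))  # {'con1': 0.5, 'pro1': 0.25, 'ans1': 0.4375}
```

Here con1 is evaluated once and its score is reused for both pro1 and ans1.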
A preliminary evaluation of the enhanced version of designVUE was carried out through three case studies. The first, in the domain of civil engineering, concerns the choice of a foundation for a multistorey building to be developed on a brownfield site. This case study was developed in collaboration with a civil engineer with more than 10 years of experience in the industry, who was already familiar with the IBIS concept, having used it through the Compendium software (Buckingham Shum et al., 2006). The second, in the domain of water engineering, focuses on the choice of a reuse technology for sludge produced by wastewater treatment plants. The third, in the domain of medical engineering, focuses on the design of an improved, reusable syringe with precise dosage control for outpatient use. This case study is a reformulation of a well-known design problem by Ulrich & Eppinger (2004).
The three cases are meant to explore the application of the proposed approach to decision problems with different structures: the foundations case features a canonical IBIS structure with rich rationale and a free-flowing debate; the sludge reuse case has a more rigid structure split into two tiers where, similarly to the decision matrix approach, a fixed set of arguments is considered for each alternative; and the outpatient syringe case is based on a decision matrix example taken directly from the literature. Further, unlike the others, the sludge reuse case concerns a decision process where both technical and non-technical considerations, raised by different classes of actors, have to be taken into account.
The case studies also illustrate the use of the QuAD formalism in different application areas, where different conceptualisations and different nuances in the use of the base scores are adopted. They represent a preliminary investigation of the possible uses of the formalism, aimed at collecting initial feedback from domain experts and at pointing out any major difficulties and drawbacks (none actually arose). While these cases were built incrementally through direct interaction between the developers and the domain experts, the development of a proper score elicitation and acquisition methodology is under way and is a necessary prerequisite for an extensive validation.
Foundations
This case study is based on a design task that was selected to satisfy the following criteria: the design problem had to be well known to the industry, and the problem-solving process had to rely on the application of known and established solution principles. On this basis, the task presented in this case study can be considered to be at the boundary between adaptive and variant design (Pahl & Beitz, 1984). The reason for choosing this type of design task is to adopt a ‘walk before you run’ approach to evaluation.
The case is based on real project experience of the collaborating engineer. However, it was not developed during the actual design process but was reconstructed retrospectively. Prior to the development of the case, the engineer was introduced to the enhanced version of designVUE and instructed in its use, including entering values for the base scores.
As mentioned earlier, the design problem focuses on the selection of the most appropriate type of foundation for a multistorey building in a brownfield area. Brownfield redevelopment is the part of urban planning concerned with the reuse of abandoned or underused industrial and commercial facilities. When choosing building foundations on brownfield sites, multiple alternatives are common and multiple considerations have to be made, starting from the different kinds of ground and their load-bearing capabilities, which usually differ from those of greenfield sites.
The starting point of the IBIS graph developed by the engineer is the issue to choose a suitable foundation given the requirements discussed earlier. Three types of foundation solutions are considered, namely Pad, Raft and Piles, and these are subsequently evaluated using several pro- and con-arguments (Figure 5).
designVUE graph of the foundation project debate. Note that in designVUE answer nodes may have multiple (manually set) statuses (as in the original IBIS). In agreement with the automatic evaluation, the status for the Pad and Raft foundation answers has been manually changed to ‘rejected’ (red crossed out light bulb icon), while that for the Piles foundation answer to ‘accepted’ (green light bulb icon).
After the development of the IBIS graph, the engineer executed the score computation on the three solutions in two situations: (1) using default values for the base scores and (2) using modified values for the base scores. The modified values for the base scores emerged through a three-step process involving extraction of the criteria behind each argument (see the text in brackets at the bottom of each argument in Figure 5), analysis of the relative importance of the criteria in the context of the selected design task and assignment of a numerical value between 0 and 1 to each criterion matching its relative importance. As a result of this work, the following base scores were assigned to the 10 criteria: performance and functional fulfilment (0.8); flexibility (0.4); additional structure (0.4); material use (0.3); buildability (0.2); cost (0.2); management complexity (0.1); execution complexity (0.1); unforeseen (0.1) and construction time (0.1).
The results for the situation with unchanged values indicate that Pad (0.51) is the preferred solution over Raft (0.49) and Piles (0.44). By contrast, the results for the situation in which the values were changed suggest that Piles (0.56) is slightly preferable to Raft (0.55) and considerably preferable to Pad (0.41). As can be seen, the three alternatives are ranked in exactly the reverse order. Only the results based on the modified values for the base scores were judged by the expert to be consistent with his conclusions.
On the one hand, this confirms the importance of weighting pro- and con-arguments with expert-provided base scores in order to get meaningful results. On the other hand, it shows that a purely graphical representation of the pros and cons is typically insufficient to give an account of the reasons underlying the final choice by the experts. In this sense, representing and managing explicitly quantitative valuations enhances transparency and accountability of the decision process.
This case concerns the selection of a technology for the reuse of sewage sludge produced by the treatment of wastewater. Similarly to the case study considered in Section 7.1, this problem is well known and the relevant solution principles are well established. Moreover, it is a real application example from previous experience of the collaborating expert. Two differences can be pointed out with respect to the case study in Section 7.1. First, the expert involved in the Sludge Reuse case had no previous knowledge of either the IBIS concept or any tool implementing it. Second, the solution assessment is a two-step process, with different actors involved in each step, as described below.
Land application (A.1) has been the traditional sludge reuse option, owing to the sludge's content of organic carbon and nutrients. Given that reuse in agriculture is subject to restrictions (since the sludge also contains pollutants), other disposal routes are considered viable alternatives, such as reuse in the cement industry (A.2), energy recovery by combustion (A.3) or wet oxidation (A.4). The choice of the best alternative depends on technical (feasibility, applicability, reliability, etc.), economic, environmental and social factors (as pointed out by Achillas, Moussiopoulos, Karagiannidis, Banias, & Perkoulidis, 2013). In our case, nine factors were considered, five corresponding to pro-arguments (e.g. reliability) and four corresponding to con-arguments (e.g. vulnerability). While technical considerations, developed by experts, have been used to assign a score to each factor for each alternative, the importance of each factor cannot be established unambiguously, as it varies from site to site on the basis of other kinds of considerations. For instance, the acceptability of a technology as perceived by the neighbouring population is a very important factor in an urban context (see the NIMBY (‘not in my back yard’) syndrome), while it is almost negligible for isolated locations. Hence, the final decision pertains to public officers or committees who, taking into account context-specific aspects (e.g. social issues), may ascribe different importance to the various factors. To represent this two-phase decision process within designVUE, the expert suggested the use of a graph with a characteristic two-tier structure (Figure 6), where:
designVUE graph for sludge reuse technology selection. Note that the four answers are in the ‘open’ status (indicated as in IBIS by a blue light bulb icon) as the decision varies according to site-specific criteria.
the first tier takes into account the technical strengths and weaknesses of every single alternative. These are the pro- and con-arguments directly linked with the answers, whose base scores have been provided by the domain expert;
each pro- or con-argument in the first tier has been assigned a weight ranging from 0 (for a factor which is irrelevant to a given alternative) to 0.1 (for a factor which is fully relevant to an alternative). For instance, the base score of the con-argument ‘Vulnerability’ linked to A.3 is 0 since combustion is deemed not to be vulnerable at all and is 0.1 for the corresponding con-argument linked to A.1 since land application is extremely vulnerable (e.g. to norm changes). Reuse in the cement industry A.2 has an intermediate degree of vulnerability (base score 0.05), while wet oxidation A.4 is not vulnerable at all (base score 0). The restriction of the range of the base scores corresponds to the choice, valid in this specific context, that each factor may individually affect the final score only to a limited extent: the higher the base score, the more a single factor can possibly play a saturating role. This individual saturating behaviour was deemed not appropriate in this case;
the second tier pertains to the final decision-makers and consists of con-arguments against the pro- and con-arguments in the first tier. By assigning the base scores to the arguments of the second tier, the final decision-makers may modulate the actual influence of first-tier arguments according to context-specific considerations. The default base score of the second-tier arguments is 0, which corresponds to leaving the base scores assessed by experts unaffected and ascribing the same importance to all factors. The importance of each factor can be reduced by raising the base score of the corresponding con-argument in the second tier. The graph structure ensures that the same factor gets the same weight in the assessment of all alternatives.
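The effect of a second-tier con-argument can be worked out by hand. Assuming, as the structure described above suggests, that a first-tier argument with base score b has no supporters and a single attacker, namely its second-tier con-argument with base score s (which has no attackers or supporters itself and thus keeps s as its final score), and assuming the product-based attack aggregation, the first-tier final score is b · (1 − s). The hypothetical helper below spells this out.

```python
def first_tier_final_score(b, s):
    """Final score of a first-tier argument with base score b whose only attacker is a
    second-tier con-argument with base score s (its final score is also s)."""
    if s == 0:
        return b          # ineffective attack: the expert's base score is kept
    return b * (1 - s)    # one attacker: F_att(b, (s)) = b * (1 - s)


for s in (0.0, 0.5, 0.9):
    print(s, first_tier_final_score(0.1, s))
```

With b = 0.1, raising s to 0.5 halves the factor's weight to 0.05, and raising it to 0.9 reduces it to about 0.01. Both branches agree with b · (1 − s); the explicit test for s = 0 mirrors the special treatment of ineffective sequences.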
Following this line, designVUE can be used to support a multistep methodology that explicitly takes into account different classes of stakeholders. While the study of this methodology is left to future work, we carried out some preliminary experiments comparing the results obtained by varying the base scores of the second-tier arguments to represent different attitudes towards the factors represented by the first-tier arguments. For instance, as shown in Figure 6, if all factors are deemed to have the same importance (i.e. the base score of all second-tier arguments is 0), then reuse in the cement industry is the preferred solution (with a final score of 0.675), followed by wet oxidation (0.671), land application (0.544) and combustion (0.506). Considering instead a scenario with a strong preference for factors related to resource recovery, where the base score of the second-tier con-arguments attacking first-tier arguments not related to reuse is raised to 0.9, a different ranking is obtained: land application (final score 0.527) is the preferred solution, followed by reuse in the cement industry (0.525), wet oxidation (0.512) and combustion (0.485). This ranking is in accordance with the expert's expectation for this scenario: agricultural application is in effect the solution that allows complete material recovery (which ranks high in the waste management hierarchy, in accordance with EU policies); on the contrary, combustion leads only to energy recovery, which is considered less valuable from an environmental perspective; reuse in the cement industry and wet oxidation can be considered intermediate solutions, where energy recovery predominates over material recovery.
As evidenced above, the development of this case study required several domain-specific modelling choices and, in fact, revealed several open issues, first of all the need for methodological guidelines on the use of the formalism and the assessment of base scores. Nevertheless, the expert expressed a positive judgement about the results of the preliminary experiments carried out with the tool and particular appreciation for the intuitive visual representation and the traceability of the reasons underlying the final decisions. He also remarked that in the environmental field a combination of qualitative and quantitative assessments is often used in decision processes, and suggested that providing a formal counterpart to these hybrid evaluations is an important direction for future extension.
This case study concerns the development of a syringe and is based on a design task reported in the literature (Ulrich & Eppinger, 2004) to illustrate concept selection by means of a well-known design method, the decision matrix (Pugh, 1991). In particular, it compares our enhanced version of designVUE with an application of decision matrices for concept screening. Concept screening consists of making a first cut of the concepts proposed to solve a problem, with a view to identifying those to be refined and scored. The data used to populate this case were extracted from the decision matrix and other design information available in Ulrich & Eppinger (2004). The design problem entails choosing the best concept for an improved reusable syringe with precise dosage control for outpatient use (Ulrich & Eppinger, 2004). The problem is described in the matrix in Figure 7, where it can be seen that seven concepts (labelled A–G) were proposed, namely master cylinder, rubber brake, ratchet, plunge stop, swash ring, lever set and dial screw. Seven selection criteria, listed in the first column of the matrix, were identified to guide the decision. The upper part of the matrix was then filled in by carrying out a qualitative comparison of each concept against a reference solution (REF) for each criterion. The outcome of each comparison is +, − or 0, meaning, respectively, that the concept is superior, inferior or equivalent to the reference as far as the criterion is concerned. These detailed evaluations are then summarised in the self-explanatory lower part of the matrix.
Figure 7. Matrix representing the outpatient syringe decision problem, from Ulrich & Eppinger (2004).
The ranking in the penultimate row of the matrix in Figure 7 suggests, in particular, that the master cylinder concept (A) is preferable to all the others. Note, however, that the relative ranking of the rubber brake (B), plunge stop (D) and dial screw (G) has no explicit justification, as they all have the same net score.
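The summary rows of a screening matrix, and the tie problem just noted, can be illustrated with a small Python sketch. It assumes, as in the description above, that a concept's net score is the number of + ratings minus the number of − ratings. The matrix below is hypothetical (not the data of Figure 7), and ties are handled by giving equal net scores the same rank (standard competition ranking), which is one possible convention rather than the procedure used by Ulrich & Eppinger (2004).

```python
from typing import Dict, List

# Hypothetical screening matrix (NOT the data of Figure 7): each concept is rated
# against the reference for each criterion as "+", "0" or "-".
MATRIX: Dict[str, List[str]] = {
    "A": ["0", "+", "+", "0"],
    "B": ["+", "-", "0", "0"],
    "C": ["0", "0", "0", "0"],
    "D": ["-", "-", "+", "0"],
}


def net_score(row: List[str]) -> int:
    """Net score as in the lower part of a screening matrix: #'+' minus #'-'."""
    return row.count("+") - row.count("-")


def screening_ranks(matrix: Dict[str, List[str]]) -> Dict[str, int]:
    """Rank concepts by net score; concepts with equal net score share a rank."""
    nets = {concept: net_score(row) for concept, row in matrix.items()}
    # Competition ranking: 1 + number of concepts with a strictly higher net score.
    return {c: 1 + sum(other > n for other in nets.values()) for c, n in nets.items()}


for concept, row in MATRIX.items():
    # Sum of +'s, sum of 0's, sum of -'s and net score for each concept.
    print(concept, row.count("+"), row.count("0"), row.count("-"), net_score(row))
print(screening_ranks(MATRIX))  # {'A': 1, 'B': 2, 'C': 2, 'D': 4}
```

Here B and C tie with a net score of 0 and therefore share rank 2, while D drops to rank 4; ordering tied concepts would require some additional, explicitly stated criterion.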
Figure 8 provides the representation of this problem in designVUE, with the nodes labelled with the strength S as computed by our enhanced version of designVUE, as well as, for convenience, the ranking given in Figure 7. In the absence of any other indication, we have used the default base score of 0.5 for all the arguments. The results indicate that the master cylinder/A (with strength 0.93) is the preferred solution, followed by the swash ring (0.87), the rubber brake, plunge stop and dial screw (0.5), the lever set (0.46) and the ratchet (0.45). The ranking largely agrees with the matrix, except for the rubber brake, plunge stop and dial screw. Indeed, more consistently with the available information, these concepts get the same rank in the IBIS map. Of course, if the different ranking in the matrix is induced by some a priori preference or by a different weighting of the pros and cons, this can be captured in our approach using different base scores. It is noteworthy that in this case the results reflect the idea of counting pros and cons. More precisely, it can be seen that for any argument a such that both attackers and supporters are present (namely
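The remark on counting pros and cons can be checked against a closed-form Python sketch. It assumes that each concept is an answer whose pro- and con-arguments are unattacked leaves, all with the default base score of 0.5, and the product-based, QuAD-style aggregation recalled in the docstring; the helper `answer_score` is hypothetical, and the actual structure of Figure 8 may differ. Under these assumptions, equal numbers of pros and cons give exactly 0.5, more pros than cons give a score above 0.5 and more cons than pros a score below it.

```python
from typing import Optional


def answer_score(n_pro: int, n_con: int, base: float = 0.5, leaf: float = 0.5) -> float:
    """Score of an answer with n_pro unattacked pro- and n_con unattacked con-arguments.

    Assumes QuAD-style aggregation: attackers of strength v multiply the base score
    by (1 - v); supporters map it to 1 - (1 - base) * prod(1 - v); the two aggregates
    are averaged when both kinds of argument are present. Every leaf keeps score `leaf`.
    """
    va: Optional[float] = base * (1.0 - leaf) ** n_con if n_con else None
    vs: Optional[float] = 1.0 - (1.0 - base) * (1.0 - leaf) ** n_pro if n_pro else None
    if va is None and vs is None:
        return base
    if va is None:
        return vs
    if vs is None:
        return va
    return (va + vs) / 2.0


for pros, cons in [(3, 0), (2, 0), (1, 1), (2, 2), (1, 2), (0, 0)]:
    print(pros, cons, answer_score(pros, cons))
# scores: 0.9375, 0.875, 0.5, 0.5, 0.4375, 0.5
```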
In engineering design, various methods are used to support the evaluation of design alternatives, for example, the decision matrix (Pugh, 1991) and the analytic hierarchy process (Saaty, 1980). Among these, the decision matrix, also known as the Pugh method, is the simplest and most commonly adopted. It consists of ranking alternatives by identifying a set of evaluation criteria, weighting their importance, scoring the alternatives against each criterion, multiplying the scores by the weights and computing the total score for each alternative (see our third case study in Section 7.3). Our work differs from the Pugh method in that it aims to extract a quantitative evaluation of alternatives from rich and explicitly captured argumentation rather than from systematically assigned and justified scores. Hence, it seems to have the potential to lead to more logically reasoned decisions, as discussed in Section 7.3.
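The weighted-sum step of the decision matrix described above can be sketched in a few lines of Python. The criteria, weights and 1 to 5 scores are hypothetical, and the weights are assumed to be normalised to sum to 1; actual applications of the method may use other rating scales or normalisations.

```python
from typing import Dict, List, Tuple

# Hypothetical criteria weights (summing to 1) and 1-5 scores; illustration only.
WEIGHTS: Dict[str, float] = {"cost": 0.5, "safety": 0.3, "portability": 0.2}
SCORES: Dict[str, Dict[str, int]] = {
    "concept X": {"cost": 3, "safety": 4, "portability": 2},
    "concept Y": {"cost": 4, "safety": 3, "portability": 3},
}


def weighted_total(scores: Dict[str, int], weights: Dict[str, float]) -> float:
    """Total score of one alternative: sum over criteria of weight * score."""
    return sum(weights[c] * scores[c] for c in weights)


def rank_alternatives(table: Dict[str, Dict[str, int]],
                      weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Alternatives sorted by decreasing weighted total."""
    totals = {alt: weighted_total(s, weights) for alt, s in table.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for alt, total in rank_alternatives(SCORES, WEIGHTS):
    print(f"{alt}: {total:.2f}")
# concept Y: 3.50
# concept X: 3.10
```

Every number here is assigned directly by the evaluator; in the approach proposed in this paper the scores are instead derived from the structure of the captured argumentation.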
The use of argumentation-based techniques has been advocated in several works in the engineering design literature.
The ABEN framework (Jin & Geslin, 2009) provides a detailed argumentation-based model of dialogues for a form of collaborative engineering design called co-construction. It encompasses protocols, strategies and tactics, but does not include any argument evaluation mechanism.
The DEEPFLOW project (Browne et al., 2011) aims to extract formal arguments from natural-language design documents. This approach lies at a different modelling level from ours, as it uses a logic-based argument representation rather than abstract argumentation frameworks. The paper exemplifies the use of probabilistic argumentation in this context without analysing the underlying mechanism in detail.
The approach of Liu, Raorane, Zheng, & Leu (2006) is more similar to ours. It models engineering design debates through dialog graphs featuring an IBIS-like structure with attack and support relations and argument weights in the [−1, 1] interval. The dialog graph has a tree structure, which is reduced to a one-layer tree (basically, each argument is attached directly to the relevant answer) with weights modified using heuristic fuzzy rules and a fuzzy set representation of the five qualitative interactions considered (strong/medium attack, indifference and strong/medium support). Thus, unlike in our approach, a final score is produced only for answers, not for pro- and con-arguments. Formal properties of the proposed evaluation mechanism are not analysed by Liu, Raorane, Zheng, & Leu (2006); however, the behaviour of their approach heavily relies on the (somewhat arbitrary) choice of the qualitative interactions and of the membership functions of the fuzzy sets representing them. In particular, as evidenced by one of the examples presented by Liu, Raorane, Zheng, & Leu (2006), some arguments with non-zero weight may turn out to have no impact on the final result, since the inference mechanism produces the same result as if they were not present.
The HERMES system (Karacapilidis & Papadias, 2001), as well as its predecessor ZENO (Brewka & Gordon, 1994; Gordon & Karacapilidis, 1997), adds numerical weights and constraints representing preferences to the basic elements of the IBIS model, giving rise to a hybrid quantitative/qualitative evaluation system. While considering hybrid evaluations is an interesting direction for future work, we note that the use of numerical weights in HERMES is quite different from ours. Initial weights are first used to determine the so-called activation of arguments, and only when the proof standard called scintilla of evidence is adopted: an argument is active simply when the sum of the weights of its supporters is greater than the sum of the weights of its attackers. The subsequent phases of the argumentation process do not use weights, which come back into play only in the final stage, where, for each alternative, a minimum and a maximum weight compatible with the constraints are computed and the final weight of the alternative is their average.
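To make the contrast with gradual scores concrete, the two places where HERMES uses weights, as summarised above, can be sketched in Python. This is an illustrative reading of that summary, not code from HERMES; the function names are hypothetical, and the computation of the constraint-compatible minimum and maximum weights is deliberately left out.

```python
from typing import Sequence


def scintilla_active(supporter_weights: Sequence[float],
                     attacker_weights: Sequence[float]) -> bool:
    """Activation rule as summarised in the text: an argument is active iff the
    total weight of its supporters strictly exceeds that of its attackers."""
    return sum(supporter_weights) > sum(attacker_weights)


def final_weight(min_weight: float, max_weight: float) -> float:
    """Final weight of an alternative: average of the minimum and maximum weights
    compatible with the constraints (computing those bounds is not shown)."""
    return (min_weight + max_weight) / 2.0


print(scintilla_active([0.4, 0.3], [0.5]))  # True: 0.7 > 0.5
print(scintilla_active([0.2], [0.2]))       # False: a tie does not activate
print(final_weight(0.2, 0.6))               # midpoint of the two bounds
```

As sketched, activation is a yes/no threshold: a change in weights either flips an argument's status or has no effect at all, whereas a gradual score responds to every change.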
The CoPe_it! system (Karacapilidis et al., 2009) uses an argument evaluation method similar to that of HERMES within an enriched, web-based environment for the visualisation of debates, providing users with means to organise and structure data as well as to import legacy resources.
The EDEN system (Marashi & Davis, 2006) also provides the visualisation and automated evaluation of engineering design debates using an IBIS-like model. Its numerical evaluation mechanism is based on a variation of the Dempster–Shafer theory of evidence and produces a pair of values, called belief and plausibility, for each argument. While a detailed analysis of this approach is outside the scope of this paper, we note the basic difference that evidence theory deals with the quantification of uncertainty, while our approach concerns a notion of gradual acceptability, which is, conceptually, an orthogonal dimension with respect to uncertainty.
None of the above-mentioned papers includes a detailed analysis, of the kind provided in Sections 4 and 5, of the basic properties of the proposed numerical formalism or of its relationships with non-numerical formalisms.
Our system extends an existing IBIS-based tool, designVUE, which is already used in the engineering domain and, in particular, is familiar to some of the experts responsible for our case studies. Other IBIS-based systems exist in the literature. For example, Cohere and Compendium (Buckingham Shum, 2008; Buckingham Shum et al., 2006) adopt an IBIS methodology to support design rationale in collaborative settings. However, these systems do not incorporate means to automatically evaluate debates. Other examples are the Carneades (Gordon & Walton, 2006) and PARMENIDES (Atkinson, Bench-Capon, & McBurney, 2006) systems. These adopt a richer model of debate, as they use argument schemes and critical questions as basic building blocks of the argumentation process. However, they do not incorporate a numerical evaluation of positions in debates. Extending these other systems to take advantage of our scoring methodology is a possible direction for future work.
Turning to argumentation literature, the idea of providing a quantitative evaluation of a given position on the basis of arguments in favour and against has been considered in several works.
In Besnard & Hunter (2001), in the context of a logic-based approach to argumentation, an argument structure for a logical formula α is (omitting some details) a collection of reasons supporting
The gradual valuation of
The
The approach of Gabbay (2012) also features significant similarities with our proposal. In fact, the notion of real equational network introduced in Gabbay (2012) uses an evaluation function f(a) from the set of arguments to [0, 1], which is defined recursively, for an argument a, as
Other approaches to quantitative valuation have been proposed in the context of Dung's abstract argumentation, where only the attack relation is considered. For example, Matt & Toni (2008) propose a game-theoretic approach to evaluating argument strength in abstract argumentation frameworks. In a nutshell, the strength of an argument x is the value of a game of argumentation strategy played by the proponent of x. The approach does not encompass support relations or base scores: extending this game-theoretic perspective with these notions appears to be a significant direction for future investigation. In weighted argumentation frameworks (Dunne, Hunter, McBurney, Parsons, & Wooldridge, 2011), on the other hand, real-valued weights are assigned to attacks (rather than to arguments). These weights are not meant to be a basis for scoring arguments; rather, they represent the ‘amount of inconsistency’ carried by an attack. This use of weights is clearly different from ours and, in a sense, complementary. Investigating a combination of these two kinds of valuation (possibly also considering weights for support links) is a further interesting direction for future work.
Conclusions
We presented a novel argumentation-based formal framework for the quantitative assessment of design alternatives, its implementation in the designVUE software tool and its preliminary experimentation in three case studies. In addition to those mentioned in Section 8, several directions of future work can be considered. On the theoretical side, a more extensive analysis of the properties of the proposed score function is under way, along with the study of alternative score functions exhibiting a different behaviour (e.g. concerning the effect of attackers and supporters and their balance) while satisfying the same basic requirements. On the implementation side, we plan to integrate the QuAD framework into a web-based debate system similar to www.quaestio-it.com so as to gain experience of its acceptability to users in other domains. On the experimentation side, the development of further engineering design case studies (more complex and/or in other domains) is under way, and we intend to continue the field comparison with more traditional approaches to the evaluation of design alternatives that we began with the third case study in this paper.
Acknowledgements
The authors are grateful to the anonymous reviewers for their helpful comments. The authors thank V. Evripidou and E. Marfisi for their support and cooperation. Aurisicchio and Toni acknowledge the support of a Faculty of Engineering EPSRC Internal Project on ‘Engineering design knowledge capture and feedback’.
Conflict of interest disclosure statement
No potential conflict of interest was reported by the authors.
