Abstract
There has been increasing use of argument-based approaches in the development of safety-critical systems. Within this approach, a safety case plays a key role in the system development life cycle. The key components of a safety case are safety arguments, which are designed to demonstrate that the system is acceptably safe. Inappropriate reasoning in safety arguments could undermine a system's safety claims, which in turn could contribute to safety-related failures of the system. The review of safety arguments is therefore a crucial step in the development of safety-critical systems. Reviews are conducted through dialogues in which elements of the argument and their relations are proposed and scrutinised. This paper investigates an approach to conducting argument review using dialectical models. After five established dialectical models with varying strengths and drawbacks were studied, a new dialectical model specially designed to support persuasion and information-seeking dialogues was proposed to suit the requirements of argument review. An argument review prototype system, which adopts the model and aims to conduct argument review dialogues in a structured manner, was then iteratively developed. User-based evaluations of the system suggest that the dialectical approach to safety argument review is useful. The evaluation also sheds light on the future development of such an application.
Introduction
Human society has become increasingly dependent on computer-based systems. As technology advances, microprocessors and the software that runs on them have found their way into the heart of products that many of us routinely use in our daily lives. The presence of microprocessor-based electronic control units in devices to which people entrust their lives, such as the braking systems of cars and radiation therapy machines in hospitals, makes safety the first and foremost requirement in the engineering of these crucial systems. These systems are called safety-critical systems.
These systems have high
There are at least three different approaches to constructing such a justification. The first is the product-based testing approach. The assessment of whether a system satisfies its dependability requirements, however, cannot be conclusively based on the testing of the final product, since product-based testing alone is not exhaustive and only proves the existence of faults and flaws, not the absence of them, as noted by Dijkstra (1972). The second is the process-based approach where safety justification tends to be
However, this process-based approach is problematic because it is not possible to demonstrate a direct causal relationship between the use of prescribed processes and high levels of safety. The evidence produced by the developers does not necessarily give a quantitative demonstration that the desired SIL or DAL has been achieved, and factors such as the difficulty of detecting software failure in accidents and the commercial sensitivity of failure data make it difficult to determine the software's operational level of safety (Weaver, Fenn, & Kelly, 2003). So even if software is produced following a SIL or DAL process, it would be difficult to assess whether the required level of safety has been attained until it goes into operation. A further difficulty with the process-based approach is that, although it may work well in a stable environment where best practice is supported by extensive experience, it cannot accommodate change or alternative strategies for achieving the same goal.
Over the past two decades, there has been a trend towards an explicit
The rest of the paper is organised as follows. First, we briefly review recent developments in safety arguments in Section 2. We then argue for dialectics in safety argument review and propose requirements for a suitable safety argument review model (SARM) in Section 3. A safety argument review model that meets these requirements is proposed in Section 4. A system that implements the review model is described and discussed in Section 5. Section 6 contains the user- and expert-based evaluations of the system and the argument review model. Finally, we conclude the paper and discuss our planned future work.
Safety arguments
A safety argument is the part of a safety case that connects a body of evidence to the claim that the system is acceptably safe, showing that the evidence is sufficient to demonstrate that claim. To effectively assure the safety claims of a system, designers have to be able to communicate their arguments, along with the supporting evidence, to reviewing peers and approving authorities in a clear, concise, and efficient manner. In the simplest case, arguments can be written in free text; however, this approach has major drawbacks, such as the loose and varying structure of natural language and the difficulty of maintaining the quality of writing as the number of contributors increases. Not only do these factors make the review process very inefficient but, more importantly, they make it difficult to ensure that all stakeholders understand an argument in the same way (Kelly & Weaver, 2004).
Graphical notations are often deployed to represent arguments in a more structured and transparent manner (e.g. Buckingham Shum, 2008; Gordon & Walton, 2006; Reed & Rowe, 2004). In the safety-critical domain, there are two established, commonly used notations: the Goal Structuring Notation (GSN), proposed by the University of York (Kelly, 1999; Kelly & Weaver, 2004), and the Claims-Arguments-Evidence (CAE) notation, proposed by Adelard LLP (Emmet & Cleland, 2002). The GSN has been adopted by an increasing number of companies in safety-critical industries and by government agencies, such as London Underground and the UK MoD, as a standard presentation scheme for arguments within safety cases (cf. Object Management Group, 2010). For example, 75% of UK military aircraft have a safety case with safety arguments expressed in GSN. This paper uses GSN to represent arguments graphically unless stated otherwise.
The GSN uses standardised symbols to represent both the individual constituent elements of an argument (claims, evidence, and context) and the relationships between elements (e.g. how claims are supported by evidence).
In GSN, claims in an argument are shown as

Example use of goal structuring notation.
Arguments are by their nature subjective, and their robustness is not self-evident; they are vulnerable, for example, to confirmation bias (cf. Leveson, 2011). To increase the soundness of arguments, review is a necessary element in the assurance of safety cases. Documenting the arguments makes the case much more transparent and easier to review (Hawkins, Habli, Kelly, & McDermid, 2013). A review normally involves two parties: the proposing party, typically the system engineer, who asserts and defends the safety case; and the assessing party, e.g. an independent safety assessor who is or represents the certification authority, who scrutinises and attacks the arguments to uncover any vulnerabilities. The objective of a review is for the two parties to reach mutual acceptance of their subjective positions (Kelly, 2007).
Safety argument development and review is not a post-development activity; rather, it should occur throughout the different stages of the system development life cycle. Typically, at the end of the design stage, the designer must convince the certification authority that the presented safety case is sound before the authority approves the start of implementation. Another review by external, possibly regulatory, authorities may also take place before the system is formally put into service. Internal reviews, performed by the proposer's peers or immediate superiors, usually happen earlier and more frequently, both to improve efficiency and to lower the cost of correcting design defects by uncovering them earlier in the life cycle.
A successful safety argument review requires both someone to develop and defend the safety arguments and someone to challenge and critique the assumptions made (Kelly, 2008). The need for dialogue interactions has also been reinforced by the recent issue of Defence Standard 00-56 (UK MoD, 2004), as quoted below:
9.5.6 Throughout the life of the system, the evidence and arguments in the Safety Case should be challenged in an attempt to refute them. Evidence that is discovered with the potential to undermine a previously accepted argument is referred to as counter-evidence. The process of searching for potential counter-evidence as well as the processes of recording, analysing and acting upon counter-evidence are an important part of a robust Safety Management System and should be documented in the Safety Case.
The importance of dialogue interactions in safety argument development seems clear. To understand how dialogues for multiple purposes can be conducted in a systematic, structured, and efficient manner, it is necessary to look at the
There are several advantages to adopting dialectical models in argument review. For example, a dialectical model can help to identify and remove fallacious arguments and common errors (Walton, 1998), and thus provide a platform for the argument proposer and reviewer to reason in a fair and reasonable manner. The set of move types provided by the dialectical model (e.g. question, challenge, and counter-argument) can help the argument reviewer to anticipate the types of criticism that are likely to be made against an argument. A dialectical model can also compensate to some extent for the formlessness of free review text by regulating the moves permitted in a dialogue, so that the orderliness of the dialogue depends less on the participants’ language capabilities. It also improves accountability by recording the dialogue history and commitments for assurance purposes. Finally, the rules set by a dialectical system can be enforced automatically by computer, so a computerised dialogue system could conduct organised dialogues with improved efficiency (Yuan, Moore, & Grierson, 2007).
A variety of dialectical models has been developed to suit dialogues serving different purposes under different circumstances. To determine which dialectical model is suitable for argument review, it is necessary first to understand which types of dialogue are present in argument reviews, and then to consider a selection of models designed to support these dialogues. Walton and Krabbe (1995) classify dialogues between humans into six primary types on the basis of factors including the overall purpose of the dialogue, the individual participants' objectives, and the information each participant holds initially. In practice, however, dialogues may contain attributes of several primary types. Generally speaking, the six primary types of dialogue are persuasion, negotiation, inquiry, deliberation, information-seeking, and eristic dialogues.
According to Krabbe (2000), persuasion, negotiation, and eristic dialogues are considered argumentative, whereas deliberation, inquiry, and information-seeking dialogues are considered non-argumentative, despite containing reasoning elements.
Once the general properties of the different dialogue types, such as their objectives, are known, the kinds of dialogue present in argument reviews can be identified by matching the characteristics of argument review to the qualities of each category of dialogue. As noted earlier, the goal of argument review is for the proposer and the reviewer to reach a shared position on the argument's strength: the proposer tries to convince the reviewer that the presented argument is sound, while the reviewer may try to convince the proposer to accept a counter-argument or the existence of certain vulnerabilities and limitations. Since both sides start with conflicting opinions on the presented argument and actively try to persuade each other, persuasion dialogues can be seen as instrumental in argument reviews, especially at the argument criticism stage. Other types of dialogue, such as information-seeking and inquiry dialogues, may also take place, since the reviewer may need to seek additional information about the argument from the proposer and to verify whether the argument wholly conveys what the proposer is obliged to make known.
To effectively assist the conduct of dialogues between participants of argument review, a dialectical model therefore has to satisfy the requirements (cf. Yuan, Moore, & Grierson, 2003) listed below.
A model for safety argument review
Dialogue model assessment summary.
Building a new model by combining useful elements of existing models has the advantage of exploiting the strengths of several established models in supporting the types of dialogue they were designed for, while simultaneously making up for the shortcomings each of them has as an argument review framework. A safety argument review model, SARM, is therefore proposed. The overall review process of SARM is represented in Figure 2. The process encompasses three distinct phases: initiation, review, and revision. It starts with the proposal of safety arguments, followed by reviews conducted by independent reviewers. The proposer then responds to the criticisms made by the reviewers and revises the initial proposal in light of them. The revised safety arguments are reviewed again, and the cycle continues until the reviewers accept or reject the arguments or the proposer withdraws. The number of iterations
Safety argument review process.
The proposer maintains a commitment store as the safety case model. The safety case model is initially empty and only the proposer is allowed to add or remove elements from the model. SARM also maintains a review model which contains all the interaction sequences made by the reviewer and proposer alternately. Only the reviewer is allowed to add a new instance of interaction sequence to the model while both the reviewer and the proposer are allowed to view and expand an existing interaction sequence by appending fresh moves.
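The ownership rule for the safety case model can be sketched in code. The following Python sketch is hypothetical: the class name, the role strings, and the layout of elements and links are illustrative assumptions, not the data model of the system described later. It covers only the safety case model, which starts empty and which only the proposer may change.

```python
from dataclasses import dataclass, field


@dataclass
class SafetyCaseModel:
    """The proposer's commitment store: GSN elements plus links between them.

    Sketch only: the role strings and the (kind, text) element layout are
    assumptions for illustration.
    """
    elements: dict = field(default_factory=dict)  # element id -> (kind, text)
    links: set = field(default_factory=set)       # (source id, target id, link type)

    @staticmethod
    def _require_proposer(role: str) -> None:
        # SARM: only the proposer may add elements to, or remove them from, the model.
        if role != "proposer":
            raise PermissionError(f"{role} may view but not modify the safety case model")

    def add_element(self, role: str, elem_id: str, kind: str, text: str) -> None:
        self._require_proposer(role)
        self.elements[elem_id] = (kind, text)

    def remove_element(self, role: str, elem_id: str) -> None:
        self._require_proposer(role)
        self.elements.pop(elem_id, None)
        # Links that mention the removed element would dangle, so drop them too.
        self.links = {link for link in self.links if elem_id not in link[:2]}


model = SafetyCaseModel()              # initially empty, as in SARM
model.add_element("proposer", "G1", "goal", "System X is acceptably safe")
print(sorted(model.elements))          # ['G1']
```

Raising an exception for a reviewer edit is one design choice; a user interface might instead simply hide the editing tools from the reviewer.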
A participant is allowed to make one or more moves in every turn, until he or she voluntarily ends his or her turn and passes on the initiative. For the initiation stage, the proposer uses different forms of the
Claim (P)
Description: A GSN goal that presents the overall goal of the argument as well as the sub-goals that support the top goal. P is an atomic proposition.
Supported form:
GSN symbol: 
Precondition: Proposer has control of the dialogue.
Post-condition: Proposer is committed to P. If P is made in response to a review, the corresponding link
Permitted GSN connections: context, assumption, justification, strategy, claim, evidence.
Permitted review options: counter-argument, challenge, question, query-ref, comment.
Strategy (S)
Description: A GSN strategy that presents a brief description of the argument approach, e.g. argument from hazard avoidance (Yuan & Kelly, 2011).
Supported form:
GSN symbol: 
Precondition: Proposer has control of the dialogue.
Post-condition: Proposer is committed to S.
Permitted GSN connections: claim, context, assumption, justification.
Permitted review options: challenge, question, query-ref, comment.
Evidence (E)
Description: A GSN solution that presents a reference (stated as a noun phrase) to an evidence item or items that provide support for a particular claim.
Supported form:
GSN symbol: 
Precondition: Proposer has control of the dialogue.
Post-condition: Proposer is committed to E.
Permitted GSN connections: claim
Permitted review options: challenge, question, query-ref, comment.
Context (C)
Description: A GSN context that presents a contextual artefact. This can be a noun-phrase style of reference to contextual information, or a statement.
Supported form:
GSN symbol: 
Precondition: Proposer has control of the dialogue.
Post-condition: Proposer is committed to C. If C is made in response to a review, the corresponding link
Permitted GSN connections: claim and strategy.
Permitted review options: question, query-ref, and comment.
Assumption (A)
Description: A GSN assumption that presents an intentionally unsubstantiated statement.
Supported form:
GSN symbol: 
Precondition: Proposer has control of the dialogue.
Post-condition: Proposer is committed to A. If A is made in response to a review, the corresponding link
Permitted GSN connections: claim and strategy.
Permitted review options: question, query-ref, comment.
Justification (J)
Description: A GSN justification that presents a statement of rationale for inclusion.
Supported form:
GSN symbol: 
Precondition: Proposer has control of the dialogue.
Post-condition: Proposer is committed to J. If J is made in response to a review, the corresponding link
Permitted GSN connections: claim, strategy.
Permitted review options: question, query-ref, comment.
SupportedBy (R)
Description: A GSN link that presents an inferential relationship between goals in an argument or evidential relationships between a goal and the evidence used to substantiate it.
Supported form:
GSN Symbol: 
Precondition: Proposer has control of the dialogue.
Post-condition: Proposer is committed to R.
Permitted GSN connections: goal-to-goal, goal-to-strategy, goal-to-solution, strategy-to-goal.
Permitted review options: challenge, question, query-ref, comment.
InContextOf (R)
Description: A GSN link that declares a contextual relationship.
Supported form:
GSN Symbol: 
Precondition: Proposer has control of the dialogue.
Post-condition: Proposer is committed to R.
Permitted GSN connections: goal-to-context, goal-to-assumption, goal-to-justification, strategy-to-context, strategy-to-assumption, and strategy-to-justification.
Permitted review options: challenge, question, query-ref, comment.
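The permitted GSN connections of the two link moves amount to a small lookup table that a tool could consult before accepting a link. The sketch below is an illustration under stated assumptions: the kind names "goal" (claim) and "solution" (evidence) and the `link_allowed` helper are hypothetical, and it checks only link endpoints, not the other preconditions.

```python
# Permitted GSN connections for the two SARM link moves, transcribed from the
# SupportedBy and InContextOf definitions above. Kind names are assumptions:
# "goal" stands for a claim and "solution" for an evidence item.
SUPPORTED_BY = {
    ("goal", "goal"),
    ("goal", "strategy"),
    ("goal", "solution"),
    ("strategy", "goal"),
}
IN_CONTEXT_OF = {
    (source, target)
    for source in ("goal", "strategy")
    for target in ("context", "assumption", "justification")
}
PERMITTED_LINKS = {"SupportedBy": SUPPORTED_BY, "InContextOf": IN_CONTEXT_OF}


def link_allowed(link_type: str, source_kind: str, target_kind: str) -> bool:
    """Return True if SARM permits a link of this type between the two kinds."""
    return (source_kind, target_kind) in PERMITTED_LINKS.get(link_type, set())


print(link_allowed("SupportedBy", "goal", "solution"))   # True
print(link_allowed("SupportedBy", "solution", "goal"))   # False
```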
The following set of move types is designed for the review stage: counter-argument, resolution demand, challenge, question, query-ref, and comment. All moves at this stage are added to the review model. If a move is a reply to an existing interaction sequence, it is appended to the tail of that sequence; otherwise, a new interaction sequence (containing the current move and the argument element it replies to) is created and added to the review model.
Counter-argument (Q)
Description: An argument whose conclusion is opposite to that of the argument being proposed, say P.
Supported form: I argue that Q
Visual representation: the symbol (
) attached to the argument being countered.
Precondition: Reviewer has control of the dialogue. P is part of the safety case model.
Post-condition: Move added to the review model.
Permitted proposer responses: withdrawal (P), or a claim or justification that defends P.
Resolution demand (P and Q)
Description: Demands that the opponent resolve two or more of his or her commitments that the speaker perceives to be in conflict.
Supported form: Please resolve (P and Q).
Visual representation: the symbol
attached to each of the conflicting argument elements.
Precondition: Reviewer has control of the dialogue. P and Q are part of the safety case model.
Post-condition: Move added to the review model.
Permitted proposer responses: withdrawal (P), withdrawal (Q), or a claim that justifies the apparent inconsistency.
Challenge (P)
Description: The speaker explicitly casts strong doubt on P.
Supported Form:
Visual representation: the symbol (!) attached to the argument element being challenged.
Precondition: Reviewer has control of the dialogue. P is part of the safety case model.
Post-condition: Move added to the review model.
Permitted proposer responses: justification (J), claim (R), withdrawal (P).
Question (P)
Description: A bipolar (yes–no) question that casts doubt on a particular point, or encourages acceptance of it.
Supported Form: Is it the case that P?, or Isn't it the case that P?
Visual representation: the symbol (?) attached to an argument element being questioned.
Precondition: P is part of the safety case model. Reviewer has control of the dialogue.
Post-condition: Move added to the review model.
Permitted proposer responses: claim (P), claim (not P), withdrawal (P).
Query-ref
Description: A non-binary question that seeks information, e.g. what? how?
Supported form:
Visual representation: the symbol (?) attached to an argument element being queried.
Precondition: Reviewer has control of the dialogue.
Post-condition: Move added to the review model.
Permitted proposer responses: claim, context, assumption.
Comment
Description: A remark expressing an opinion or reaction.
Supported form:
Visual representation: the symbol (*) attached to the element being commented on.
Precondition: Reviewer has control of the dialogue.
Post-condition: Move added to the review model.
Permitted proposer responses: claim, strategy, justification, context, assumption, solution, comment.
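The review-stage rules above can be combined into a minimal sketch of the review model, assuming interaction sequences are stored as lists of (speaker, move type, content) entries. `ReviewModel`, `add_review_move`, and `respond` are hypothetical names. The response table transcribes the "Permitted proposer responses" lines of the move definitions; claim (P) and claim (not P) are both recorded simply as "claim", so the sketch does not track polarity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Entry = Tuple[str, str, str]  # (speaker, move type, content)

# Permitted proposer responses per review move, transcribed from the move
# definitions above; claim (P) and claim (not P) both appear as "claim".
PERMITTED_RESPONSES = {
    "counter-argument": {"withdrawal", "claim", "justification"},
    "resolution demand": {"withdrawal", "claim"},
    "challenge": {"justification", "claim", "withdrawal"},
    "question": {"claim", "withdrawal"},
    "query-ref": {"claim", "context", "assumption"},
    "comment": {"claim", "strategy", "justification", "context",
                "assumption", "solution", "comment"},
}


@dataclass
class ReviewModel:
    """SARM review model as a list of interaction sequences (layout assumed)."""
    sequences: List[List[Entry]] = field(default_factory=list)

    def add_review_move(self, move_type: str, target: str, content: str,
                        reply_to: Optional[int] = None) -> int:
        """Record a reviewer move; return the index of its interaction sequence."""
        if move_type not in PERMITTED_RESPONSES:
            raise ValueError(f"not a review-stage move: {move_type}")
        move = ("reviewer", move_type, content)
        if reply_to is not None:
            self.sequences[reply_to].append(move)  # reply: append to the tail
            return reply_to
        # New sequence: the argument element replied to, then the move itself.
        self.sequences.append([("proposer", "element", target), move])
        return len(self.sequences) - 1

    def respond(self, thread: int, response: str, content: str) -> None:
        """Append a proposer reply after checking it against the latest review move."""
        seq = self.sequences[thread]
        latest = next(move for speaker, move, _ in reversed(seq) if speaker == "reviewer")
        if response not in PERMITTED_RESPONSES[latest]:
            raise ValueError(f"{response} is not a permitted reply to {latest}")
        seq.append(("proposer", response, content))


review = ReviewModel()
t = review.add_review_move("question", "Sn1", "Is the analyst experienced?")
review.respond(t, "claim", "The analyst is experienced.")
print(len(review.sequences[t]))   # 3
```

Checking replies at the point of entry mirrors the idea that the software, rather than the user, enforces the dialogue rules.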
At the end of each of the proposer's turns, every element in the review model must have been responded to, and the syntax of the safety case model (a connected digraph in which every path ends in at least one item of evidence) must be maintained. In addition to the move types of the initiation stage, a further move type, withdrawal, is available for the revision stage. If a move at this stage responds to an existing interaction sequence in the review model, it is appended to the tail of that sequence; otherwise, the move does not affect the review model.
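The two end-of-turn conditions can be checked mechanically. In this hypothetical sketch, connectivity is tested by a depth-first search that ignores link direction, and the evidence condition by a fixpoint: evidence is grounded, and a goal or strategy becomes grounded once all of its SupportedBy children are. Reading "every path ends in evidence" as ruling out support cycles is an assumption, as is the (speaker, move type, content) entry layout used for interaction sequences.

```python
from collections import defaultdict


def is_well_formed(kinds: dict, supported_by: set, in_context_of: set) -> bool:
    """Check the SARM syntax rule: a connected digraph whose every
    SupportedBy path ends in at least one evidence ("solution") element.

    kinds maps element id -> GSN kind; links are sets of (source, target) ids.
    """
    if not kinds:
        return False
    # 1. Connectivity, ignoring link direction (depth-first search).
    neighbours = defaultdict(set)
    for a, b in supported_by | in_context_of:
        neighbours[a].add(b)
        neighbours[b].add(a)
    start = next(iter(kinds))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    if seen != set(kinds):
        return False
    # 2. Grounding: evidence is grounded; a goal or strategy is grounded once
    #    all of its SupportedBy children are. Iterating to a fixpoint leaves
    #    unsupported elements and support cycles ungrounded.
    children = defaultdict(set)
    for a, b in supported_by:
        children[a].add(b)
    grounded = {e for e, k in kinds.items() if k == "solution"}
    changed = True
    while changed:
        changed = False
        for e, k in kinds.items():
            if (k in ("goal", "strategy") and e not in grounded
                    and children[e] and children[e] <= grounded):
                grounded.add(e)
                changed = True
    return all(e in grounded for e, k in kinds.items() if k in ("goal", "strategy"))


def all_reviews_answered(sequences: list) -> bool:
    """Every interaction sequence must end with a proposer entry."""
    return all(seq[-1][0] == "proposer" for seq in sequences)
```

A proposer's turn could then be closed only when both functions return True.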
Withdrawal (P)
Description: Abandon the speaker's commitment to P or express a lack of knowledge about a subject P, of the speaker's own accord or as an answer to a Question, Challenge, or Resolution Demand.
Supported Form:
Precondition: Proposer has control of the dialogue.
Post-condition: P and its associated links are removed from the proposer's store if they are there. Move added to the review model if it is responding to an existing interaction sequence in the review model.
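The Withdrawal post-condition can be sketched as an in-place update. The function name and data layout below are assumptions for illustration; withdrawing an element that is not in the store is deliberately a no-op, following "if they are there".

```python
def withdraw(elements: dict, links: set, sequences: list, p: str,
             reply_to=None) -> None:
    """Apply Withdrawal(P) in place (a sketch; the data layout is assumed).

    elements: id -> kind; links: (source, target, link type) tuples;
    sequences: review-model interaction sequences of (speaker, move, content).
    """
    # P and its associated links are removed only "if they are there",
    # so withdrawing something absent is not an error.
    elements.pop(p, None)
    links.difference_update({link for link in links if p in link[:2]})
    # The move enters the review model only when it answers an existing thread.
    if reply_to is not None:
        sequences[reply_to].append(("proposer", "withdrawal", p))


elements = {"G1": "goal", "Sn1": "solution"}
links = {("G1", "Sn1", "SupportedBy")}
withdraw(elements, links, [], "Sn1")
print(elements, links)   # {'G1': 'goal'} set()
```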
Most move types (i.e. claim, challenge, resolution demand, question, withdrawal) are drawn from Mackenzie's (1979) DC and from the DE of Yuan et al. (2007, 2008). These two models share a relatively long development history, and the DE model of Yuan et al. represents the latest evolution of the DC model. Using an actively developed model has the advantage of its increasing familiarity in the computing industry (Maudet & Moore, 2001; Yuan, Moore, Reed, Ravenscroft, & Maudet, 2011), which reduces training time for system safety engineers, who are the potential users. Its disadvantages, namely a lack of information-seeking moves and complex rules, can be overcome by introducing moves from other models and by having the software enforce the rules automatically, which removes the need for users to learn the rules beforehand. To meet the expressive adequacy criteria for a suitable safety argument review model, four new move types are introduced: various forms of ‘Argument Element’, non-binary ‘Question’, ‘Counterargument’, and ‘Comment’.
‘Argument Element’, which replaces the original DC and DE ‘Statement’ move, is used to represent the different elements of a GSN argument, i.e. goal (claim), strategy, evidence, context, assumption, and justification. The ‘Question’ move type of the original DC and DE models was one of the largest obstacles to information-seeking, as only yes–no questions are allowed. In the proposed model, it has been expanded to include open-ended query-ref questions as well as binary questions. These information-seeking moves have been adopted from Ravenscroft and Pilkington's (2000) model. The ‘counter-argument’ move, though simply an argument, is specially designed for the reviewer to make an argument with an opposite conclusion. A similar move is defined by Prakken (2000). The ‘comment’ move can be used by the reviewer to make a remark that is not associated with any of the existing move types. The ‘comment’ move was strongly suggested by the participants of the user evaluations (see Section 6).
The asymmetric nature of the game, in which participants use different sets of moves, and the commitment arrangement, in which a single commitment store is used for the safety case model, are similar to those of Ravenscroft and Pilkington's (2000) model. A review model is introduced to manage all instances of interaction sequences. Alongside the GSN symbols, a new set of review notations has been designed to represent reviews visually.
SARM has a number of desirable properties for safety argument review. First, the model incorporates an influential safety argument notation, GSN, which makes it appealing to end users. Second, the visual representation of review move types makes reviews easy to manage, as the moves made by the reviewer are clearly displayed on the safety case model and disappear once they have been responded to. Third, the review model, which contains all the interaction sequences, makes the review process traceable: participants can trace back each thread of discussion. Traceable interactions are important for assurance purposes. Fourth, the model contains a wide range of distinct move types that enable participants to express their views adequately. Finally, the simple rules impose a light cognitive load on end users, especially in a computational environment (see Section 6).
This, then, is the dialectical model we propose for safety argument review. Next, the appropriateness of the proposed dialogue model needs to be established. To enable human participants (e.g. safety engineers and assessors) to operationalise such a dialogue model, computer support is required, e.g. to record the interaction history and commitments properly as assurance evidence. The experimental work proposed for this purpose aims to build a computational realisation of the model iteratively and to establish whether the model can readily be used to provide good service to the safety argument review process. A fully functional system operationalising the proposed model, Dialogue-based System Argument Review (DiaSAR), has been iteratively built at the University of York (Djaelangkara, 2012; Mazurek, Gerasimou, Madan, & Setivarahalli, 2011; Wan, 2010).
The current version of the system has a graphical user interface that supports multiple user access and a back-end database storing user profiles and review sessions. There are three types of user: system administrator, argument proposer, and reviewer. The system administrator manages all the users of the system and the review sessions. Argument proposers propose and subsequently defend their arguments. Reviewers criticise the arguments made by the proposer. An example of the proposer's interface can be seen in Figure 3. A proposer can create a new review session, specifying a session name, a proposer, and a reviewer. The interface displays the current status of the session, e.g. the current player, his or her role, and the current step of the review. A proposer can use the provided tools (i.e. claim, strategy, evidence, context, assumption, and links) to construct safety arguments following GSN syntax as described in Section 2. There are two views of the arguments: the diagram view shown in Figure 3, and the dialogue view shown in Figure 4, which records the dialogue history and commitment stores and provides text-based input. The two views are synchronised using the model-view-controller architecture, and users can decide which view to interact with. A session can be saved and loaded for further editing. Once the arguments have been constructed, the turn can be passed to the reviewer, and a notification email is sent to the reviewer. The session then transitions from the proposing state to the reviewing state.
An example user-interface for the proposer.
An example dialogue view of the system interface.

A reviewer can log on to the system and load the sessions under review. Right-clicking on any element of the safety argument opens a submenu with items for the reviewer to accept, challenge, or question that argument element. A reviewer can also propose a counter-argument. Graphical notations have been developed for the set of review tools alongside the GSN symbols; some of them can be seen in Figure 5. For example, a rectangle with a dashed border represents a counter-argument, and a dashed line with an open arrow represents an ‘attacked by’ relation (e.g. the argument ‘Extensive testing shows no occurrence of H2’ is attacked by the argument ‘Accident database shows the occurrence of H2’). A question mark represents a question (e.g. Is the analyst experienced?) and an exclamation mark represents a challenge (e.g. Why is it the case that hazard 3 is properly addressed?) made by the reviewer. Accepted elements are coloured green and withdrawn elements red. A short vertical line with an open arrow at each end marks a resolution demand made about a conflict among two or more elements.
An example user-interface for the reviewer.
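As a rough text-mode analogue of these notations, the sketch below renders an element with the symbols of its unanswered review moves and its status colour. The symbol and colour mappings come from the description above; the "[move]" fallback for moves whose symbol is graphical (e.g. counter-argument), the "default" colour, and the `annotate` helper are all hypothetical.

```python
# Review symbols taken from the text; moves whose symbol is graphical rather
# than textual (e.g. counter-argument) fall back to a hypothetical "[move]" tag.
SYMBOLS = {"question": "?", "query-ref": "?", "challenge": "!", "comment": "*"}
COLOURS = {"accepted": "green", "withdrawn": "red"}


def annotate(label: str, status: str, pending_moves: list) -> str:
    """Render one element as text: its label, the symbols of unanswered
    review moves, and its colour. Answered moves are simply not passed in,
    mirroring how their symbols disappear from the diagram."""
    marks = "".join(SYMBOLS.get(move, f"[{move}]") for move in pending_moves)
    colour = COLOURS.get(status, "default")
    return f"{label} {marks}".rstrip() + f" ({colour})"


print(annotate("Sn1", "open", ["question", "challenge"]))  # Sn1 ?! (default)
print(annotate("G1", "accepted", []))                       # G1 (green)
```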
Criticisms made of an argument element are recorded by the system, as shown in Figure 4. When the review is complete, the reviewer can pass the turn back to the proposer. The session then transitions from the reviewing state to the revision state, and a notification email is sent to the proposer. Upon receiving the notification, the proposer can respond to the criticisms and revise the arguments following SARM rules, e.g. responding to a challenge with a withdrawal, claim, evidence, or strategy. These rules are implemented as submenus attached to a review element, from which the proposer can select a suitable response. The responses, e.g. newly made claims, strategies, evidence, context, or assumptions, are automatically displayed as part of the safety argument, and the symbol of the review being responded to is removed from the user interface. Once all the reviews have been responded to and any other necessary revisions are done, the session can be passed back to the reviewer for a second review. The process continues until the safety argument has been fully accepted.
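The turn-passing cycle described in this section behaves like a small state machine. The sketch below is hypothetical: `ReviewSession`, the state names, the participant names, and the `notify` callback (standing in for the notification email) are assumptions, and it models only acceptance as an end state, not rejection or the proposer withdrawing.

```python
from enum import Enum


class State(Enum):
    PROPOSING = "proposing"
    REVIEWING = "reviewing"
    REVISION = "revision"
    ACCEPTED = "accepted"


# (current state, role passing the turn) -> (next state, role to notify)
TRANSITIONS = {
    (State.PROPOSING, "proposer"): (State.REVIEWING, "reviewer"),
    (State.REVIEWING, "reviewer"): (State.REVISION, "proposer"),
    (State.REVISION, "proposer"): (State.REVIEWING, "reviewer"),
}


class ReviewSession:
    """Hypothetical sketch of the turn-passing cycle; `notify` stands in for
    the notification email and is not the system's real interface."""

    def __init__(self, name, proposer, reviewer, notify):
        self.name = name
        self.people = {"proposer": proposer, "reviewer": reviewer}
        self.notify = notify
        self.state = State.PROPOSING

    def pass_turn(self, role):
        key = (self.state, role)
        if key not in TRANSITIONS:
            raise ValueError(f"{role} cannot pass the turn in state {self.state.value}")
        self.state, recipient = TRANSITIONS[key]
        self.notify(self.people[recipient], f"{self.name}: now {self.state.value}")

    def accept(self, role):
        # Only the reviewer, during a review, can accept the argument as a whole.
        if (self.state, role) != (State.REVIEWING, "reviewer"):
            raise ValueError("only the reviewer can accept, during a review")
        self.state = State.ACCEPTED


outbox = []
session = ReviewSession("Brake ECU", "alice", "bob", lambda to, msg: outbox.append((to, msg)))
session.pass_turn("proposer")   # proposing -> reviewing, bob notified
session.pass_turn("reviewer")   # reviewing -> revision, alice notified
session.pass_turn("proposer")   # revision -> reviewing (second review)
session.accept("reviewer")
print(session.state, len(outbox))   # State.ACCEPTED 3
```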
The system also provides features to improve its usability. For example, the argument pane is resizable and scrollable according to the user's preference, the minus sign on an argument element can be used to collapse all the elements beneath it, and the plus sign can be used to expand them again.
The system has been evaluated at different stages of its development. An initial design was drawn up, based on literature concerning safety argument review and (computational) dialectics, and interviews with two safety engineers at the University of York. This design was implemented as the first software prototype of the safety argument review software (version 1.0). This prototype then underwent a usability evaluation, with four safety engineers and two human–computer interaction (HCI) experts (Wan, 2010), and the outcome from this informed the development of the second prototype (version 2.0) by a group of MSc software engineering students (Mazurek et al., 2011). The second prototype was evaluated by using both user- and expert-based evaluation techniques and the outcome from this informed amendments to the next prototype (version 3.0, the current version of the software as illustrated in Section 5). This was then taken to a set of users and HCI experts for further evaluation, to assess the system's usability and acceptability with the target audience (Djaelangkara, 2012).
Two safety engineers were each invited to a separate cooperative evaluation in a controlled environment. Each participant came to a room where the software was ready for use on a prepared machine. The participants were briefed about the purpose and procedure of the evaluation and the current version of the software. Each participant was given a set of tasks to perform as both a proposer and a reviewer. The tasks started with the creation of a safety case and continued until the review of the safety case was complete. Participants were allowed to ask questions and seek clarification. The experimenter observed the users' reactions to the system. Interviews were then conducted about their experience of using the system and improvements that could be made to its functionality. Each evaluation took about one hour. Both users completed the predesigned tasks without difficulty, and the comments from their interviews were very positive. Participant 1 thought that the purpose of the system was now clear, in that the reviewing process can be carried out individually by the proposer and the reviewer. He particularly liked the clarity of the dialogue status and the design feature that every action performed produces feedback, which should reduce the number of mistaken clicks. Participant 2 liked the idea of maintaining the dialogue history between the proposer and reviewer, and the fact that the role of the database back-end was now obvious, making it clear how the system would be used in practice, which the participant had found unclear before.
Two HCI experts were invited to evaluate the learnability of the system, that is, how easily the application can be learned and used by novice users. Each evaluation was conducted in three stages: walkthrough, usability questionnaire, and interview. At the walkthrough stage, participants were given five small tasks to perform on the application. Each task consisted of several steps. For every step executed, they were asked to answer the following three questions:
Do you know what to do?
Do you see how to do it?
Do you understand from the feedback whether the action was correct or not?
The participants were not allowed to ask any questions, and the screen actions were recorded. At the usability questionnaire stage, each participant was given 10 minutes to explore the system, either on their own or, if they preferred, by following a set of tasks. They were then asked to complete a usability questionnaire based on Nielsen's heuristics (Dix, Finlay, Abowd, & Beale, 2004). This part of the evaluation was designed to identify any usability issues in the software. During the interview stage, participants were asked to comment on how the application could be made easier to use. The evaluation results from the HCI experts were also very positive. Scores on the usability questionnaire ranged from 4 to 5 for every usability criterion (on a scale where 0 denotes very poor design and 5 perfect design), with an overall average of 4.3.
The participants in both the user and expert evaluations also made a number of suggestions that can be used to improve the functionality and usability of the system and the model. With regard to the model, users would like future versions to support multiple proposers and reviewers. They also suggested that the system should indicate which other elements are affected when an argument element is withdrawn. In the light of this feedback, the current arrangement appears to be insufficient, especially when the withdrawn element is the sole support of another element (say Q) or when an argument element (say R) supports only the withdrawn element. The affected elements should be highlighted so that users can decide how to treat them. The user may, however, remain committed to Q if further support for Q is provided, and to R if R is used to support other elements.
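The highlighting rule suggested above can be expressed as a simple query over the support relations of an argument. The sketch below is illustrative only and is not the tool's implementation: it assumes, hypothetically, that an argument is stored as a list of (supporter, supported) pairs, and the function name affected_by_withdrawal and the element labels are invented for the example. It returns the Q-type elements (whose sole support is the withdrawn element) and the R-type elements (which support only the withdrawn element).

```python
from collections import defaultdict


def affected_by_withdrawal(supports, withdrawn):
    """Hypothetical helper: find elements affected when `withdrawn` is withdrawn.

    supports: iterable of (supporter, supported) pairs; ("Sn1", "G2") means
    element Sn1 supports element G2.
    Returns (orphaned, dangling):
      orphaned -- Q-type elements whose only support is `withdrawn`
      dangling -- R-type elements that support nothing except `withdrawn`
    """
    supporters_of = defaultdict(set)  # element -> elements supporting it
    supported_by = defaultdict(set)   # element -> elements it supports
    for src, dst in supports:
        supporters_of[dst].add(src)
        supported_by[src].add(dst)

    # Q-type: every element supported by `withdrawn` that has no other support.
    orphaned = {q for q in supported_by[withdrawn]
                if supporters_of[q] == {withdrawn}}
    # R-type: every supporter of `withdrawn` that supports nothing else.
    dangling = {r for r in supporters_of[withdrawn]
                if supported_by[r] == {withdrawn}}
    return orphaned, dangling


if __name__ == "__main__":
    argument = [("S1", "G1"), ("G2", "S1"), ("Sn1", "G2"),
                ("Sn2", "G2"), ("Sn2", "G3")]
    print(affected_by_withdrawal(argument, "G2"))  # ({'S1'}, {'Sn1'})
```

In this example, Sn2 is not flagged because it still supports G3, which mirrors the point that a user may remain committed to R when R supports other elements.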
Concerning the implementation, the evaluation revealed a number of functional and usability issues with the system. Users would like to have a delete function alongside withdrawal. The feedback made it clear that the two play different roles: ‘withdrawal’ is part of the model and is used to retract earlier commitments, whereas ‘delete’ is part of the implementation and is used to manipulate argument elements within a turn. Users also stressed the need to present the final safety argument diagram without the unnecessary elements that result from the review process, and to ensure that completed sessions remain editable. Further details can be found in Djaelangkara (2012).
Conclusions and further work
The argument-based approach to safety case development has been widely adopted in Europe, and increasingly worldwide (e.g., Australia and Japan), in a wide variety of domains (including defence, automotive, medical, and rail) (Haddon-Cave, 2009; Yuan & Kelly, 2012). Given the subjective nature of arguments, reviews are always necessary to independently scrutinise and challenge them. Although graphical notations (e.g. GSN) have been developed for representing safety arguments, tools for reviewing safety arguments are still lacking. Dialectical models appear to be a natural fit for this purpose. A new dialectical model specially designed for safety argument review has been proposed, and a system operationalising the review model has been iteratively constructed and evaluated. The evaluators favoured the dialectical approach to safety argument review, particularly in a computational environment. The evaluations provide positive evidence for the usability of the system in general and of the review model in particular.
We believe that the work reported in this paper makes a valuable contribution to the fields of safety arguments and dialectics. Concerning the former, we have proposed a new approach to safety argument review and developed a unique system that is ready to be used to help safety engineers and assessors review safety arguments. Users' experience of the system has been largely favourable, and the potential pay-off for computer system safety engineering could be substantial. Concerning the latter, we have proposed a new model for safety argument review, which contributes directly to the field of dialectics. The model has a number of desirable properties that are essential for safety argument review, for example, conformance to GSN, visualised interaction, traceable reviews, expressive adequacy, and user-friendliness. In particular, the model moves beyond a purely graphical approach to a dialectical one in order to facilitate argument evaluation. To the best of our knowledge, this is the first model that deals with safety argument review in particular and argument review in general. Furthermore, there is great scope for an interesting and fruitful interplay between research within dialectics per se and research on its utilisation in computer system safety engineering. It is hoped that this paper will move this interplay forward.
There are several ways to carry this work forward. Our immediate work is to address the issues revealed by the evaluation and to conduct a large-scale usability evaluation. Even with a suitable review model, the quality of review arguments is not guaranteed, as it largely relies on the participants' strategic wisdom. We are planning to provide users with a software agent that can detect common argument fallacies (e.g. conflicting and circular arguments) in system safety arguments, in line with the identification by Greenwell, Holloway, and Knight (2005). We are also planning to investigate the computational use of safety argument schemes (e.g. Yuan & Kelly, 2011) to support safety argument review, as each scheme provides a set of critical questions that can be used to evaluate arguments. Furthermore, SARM can be formally specified in line with Amgoud, Maudet, and Parsons (2000), Amgoud, Parsons, and Wooldridge (2003), Johnson, McBurney, and Parsons (2003), Kontarinis, Bonzon, Maudet, and Moraitis (2012), and Yuan and Wells (2013), thus enabling automated analysis of its properties.
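As a rough indication of what one check in such an agent might involve, the sketch below detects purely structural circularity, i.e. an element that ends up (indirectly) supporting itself, using a depth-first search over the support relation. This is an assumption-laden illustration rather than the planned agent: it reuses the hypothetical (supporter, supported) pair representation, the function name find_circular_support is invented, and it does not address conflicting arguments or circularity that only shows up in the wording of claims.

```python
from collections import defaultdict


def find_circular_support(supports):
    """Hypothetical check: return one cycle of support as a list, or None.

    supports: iterable of (supporter, supported) pairs. The returned list
    reads 'claim is supported by ... is supported by claim', starting and
    ending with the same element.
    """
    graph = defaultdict(list)  # claim -> elements offered in its support
    nodes = set()
    for src, dst in supports:
        graph[dst].append(src)
        nodes.update((src, dst))

    state = {}  # "active" while on the current DFS path, "done" once explored
    path = []

    def dfs(node):
        state[node] = "active"
        path.append(node)
        for nxt in graph[node]:
            if state.get(nxt) == "active":  # back edge closes a cycle
                return path[path.index(nxt):] + [nxt]
            if nxt not in state:
                found = dfs(nxt)
                if found:
                    return found
        path.pop()
        state[node] = "done"
        return None

    for node in sorted(nodes):  # sorted for deterministic results
        if node not in state:
            cycle = dfs(node)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    circular = [("G2", "G1"), ("G3", "G2"), ("G1", "G3")]
    print(find_circular_support(circular))  # ['G1', 'G2', 'G3', 'G1']
```

The recursive search is adequate for argument diagrams of modest size; a very large safety case would call for an iterative version to avoid Python's recursion limit.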
Acknowledgements
The authors thank all the participants who took part in the evaluation of the Argument Review Tool as well as research and teaching staff of the Department of Computer Science at the University of York, UK, who have generously donated their time, effort, and knowledge towards the progress of the project.
