Abstract
The literature that is relevant to evaluation of treatment effectiveness is large, scattered and difficult to assemble for appraisal. This scoping review first develops a conceptual framework to help organize the field and, second, uses the framework to appraise early psychosis intervention (EPI) studies. Literature searches were used to identify representative study designs, which were then sorted according to evaluation approach. The groupings provided a conceptual framework upon which a map of the field could be drawn. Key words were cross-checked against definitions in dictionaries of scientific terms and the National Library of Medicine Medical Subject Headings (MeSH) browser. Using the final list of key words as search terms, the EPI evaluation literature was appraised. Experimental studies could be grouped into two classes: efficacy and effectiveness randomized controlled trials. Non-experimental studies could be subgrouped into at least four overlapping categories: clinical epidemiological studies; health service evaluations; quality assurance studies; and quasi-experimental assessments of treatment effects. Applying this framework to appraise EPI studies indicated promising evidence for the effectiveness of EPI irrespective of study design type, and a clearer picture of where future evaluation efforts should be focused. Reliance on clinical trials alone will restrict the type of information that can inform clinical practice. There is convergent evidence for the benefits of specialized EPI service functions across a range of study designs. Greater investment should be made in health services research and quality assurance approaches to evaluating EPI effectiveness, which will involve scaling up of study sizes and development of an EPI programme fidelity rating template. The degree of complexity of the evaluation field suggests that greater focus on research methodology in the training of Australasian psychiatrists is urgently needed.
Naturalistic study evidence supports the importance of intervention in the early stages of psychotic disorders. Early disease course is the strongest predictor of long-term outcome for psychotic disorders [1]. Remission of the first psychotic episode and avoidance of relapse during the first 2 years of treatment may reduce long-term disability associated with schizophrenia by up to 30% [2]. Conversely, the chances of full recovery decline with each relapse [3], while the likelihood of relapse increases over time [4]. Antipsychotic drugs reduce the risk of relapse in first-episode schizophrenia [5–7] and treatment discontinuation results in high relapse rates [4,8]. More patients respond well to medication in the first episode (approx. 80%) [9–13] compared to subsequent episodes (approx. 50%) [14]; initial response occurs at lower doses of medication [15,16]; and symptoms appear to improve to a greater extent in first-episode patients (e.g. >60%) [17], compared with multi-episode patients (e.g. <16%) [18]. First-episode patients also respond to psychosocial treatments in lower ‘doses’, as illustrated by motivational interventions for comorbid substance use in early psychosis, in which a 3 h intervention achieved substantial effects in most patients [19] but protracted intensive programmes in chronic schizophrenia have only a modest effect, and are applicable to only a minority of patients [20–22]. Findings that shorter duration of (initially) untreated psychosis (DUP) predicts better treatment outcomes [23,24] are also consistent with the potential effectiveness of early detection (ED) and intervention strategies.
The strength of this circumstantial evidence (i.e. not from formal evaluation studies) in support of early psychosis intervention (EPI) [25] might suggest that it would be easy to demonstrate the efficacy of EPI in clinical trials (CTs), but this has not been the case. Indeed, the Cochrane Review concluded that there was insufficient or no evidence from randomized controlled trials (RCTs) to support the benefits of ED or specialist EPI teams for first-episode psychosis [25]. That Cochrane Review is a striking illustration of how few high-quality RCTs of complex mental health interventions are carried out in general. It also raises the question of whether randomized trials are appropriate in the context of EPI, because individuals from the same population cannot be randomly exposed to an ED strategy, nor is it ethically appropriate to allocate half of a large group of patients with first-episode psychosis to substantially delayed treatment. If research evidence is to keep pace with increasing demands for better patient outcomes, observational evaluations of intervention effectiveness may have to substitute for, rather than complement, efficacy data from CTs.
There are numerous non-experimental approaches to evaluating treatment effectiveness, but the literature is scattered across methodological domains and is difficult to assemble and appraise. Subtle differences in study designs, even within the CT literature, create difficulty in evaluating the evidence, especially for complex interventions. Also, the sheer volume of observational studies being published, often with varying terminology, leads to problems in simply sorting articles by design or theoretical orientation in the absence of an evaluation framework. Existing evaluation frameworks or evidence hierarchies rank the level (based on study design) and quality (methods used to minimize bias) of evidence [26]. Their focus is on experimental methods as the gold standard [27], with little regard for how the evaluation field is theoretically organized or how observational studies are subclassified. In this review, the aim was to develop a conceptual framework, reflecting how the published literature is organized, that facilitates searching, organizing, and appraising intervention effectiveness studies across the entire spectrum of experimental and observational methods, and to apply this framework to an appraisal of the EPI literature.
Methods
Exploratory database searching (Medline Ovid; PsycINFO; Web of Science) of the early psychosis literature (using search terms: early psychosis OR early schizo∗ OR first-episode psychosis OR first-episode schizo∗) was carried out initially using general key words (evaluat∗ OR effective∗). These searches resulted in large numbers of publications reporting a mixture of study designs. Reference lists in these articles showed that the evaluation of EPI drew on a broader literature in mental health. When this broader mental health literature was searched (using key words: psychiat∗ OR psychol∗ OR mental), references listed in those articles identified additional relevant evaluation subspecialties in the general medical and programme evaluation literature. That is, exploratory searching resulted in several thousand reports of evaluations of intervention effectiveness that arose from fields with different theoretical orientations and used study designs that did not lend themselves readily to classification. Hence we decided to adopt the following scoping approach [28] to our synthesis of the literature.
Publications were sorted into groups according to study design or evaluation approach. After culling low-interest articles, two senior authors independently grouped publications according to methodology. The common design characteristic of studies in each group was consensually agreed upon and cross-checked for consistency with the US National Library of Medicine Medical Subject Headings (MeSH) browser (www.nlm.nih.gov/mesh) and against definitions in authoritative dictionaries and encyclopaedias [29–31]. The final list of checked terms describing evaluation approaches was then conceptually sorted into a logical hierarchy in order to determine the most parsimonious conceptual map of the intervention evaluation field, starting with the two broadest study design classes, experimental (CTs) and observational (non-randomized).
Two sorting principles emerged as the most parsimonious to chart the field of evaluation [32] and organize the conceptual map. First, the framework was ordered from left to right along an efficacy versus effectiveness evaluation spectrum. Efficacy refers to whether a treatment can work under ideal conditions. Effectiveness refers to whether a treatment works in routine clinical settings, and which service model delivers an efficacious treatment most effectively. That is, effectiveness encompasses service provider competence, patient adherence and disease coverage, in addition to efficacy of an intervention. Second, observational evaluation subfields were ordered left to right according to the primary focus or unit of analysis: population level, service level, process level (healthcare quality and clinical practice effectiveness), and patient level (which type of patient responds to what treatment programme). Neither sorting principle could be strictly applied because study design types did not always precisely fit either dimension. For instance, health services research and economic evaluation can be based on efficacy or effectiveness designs. Also, in ordering designs according to unit of analysis there was a degree of arbitrariness because all designs ultimately use patient-level data, and it is only the perspective of the researcher that determines whether the primary focus is at the level of the population, service, practice, or patient. When it was difficult to decide where a study design should be located in the evaluation field map, the reliability of the design was used as a secondary ordering principle. The greater the chance of confounding and bias, and the lesser the opportunity to assess their likely effect on results, the lower the design was ranked for reliability. More reliable designs were placed to the left, nearer the efficacy end of the map, and listed higher in the column hierarchies, creating a top-left (efficacy RCT) to bottom-right (case studies) gradient of robustness of study design.
The conceptual framework generated by this method is shown in Figure 1 and each component further detailed in Table 1. Comprehensive coverage of the evaluation literature drew heavily on manual searches of reference lists, and relied on referencing a large number of books and technical reports.

Figure 1. Treatment and service evaluation. CPG, clinical practice guidelines; RCT, randomized controlled trial; TQM, total quality management.
Table 1. General descriptions of the evaluation approaches used to assess intervention effectiveness, with notes of relevance to the mental health field. AR, action research; CPG, clinical practice guideline; DRG, diagnosis-related group; EPI, early psychosis intervention; HoNOS, Health of the Nation Outcome Scales; QA, quality assurance; QI, quality improvement; RCA, root cause analysis; RCT, randomized controlled trial; TAU, treatment as usual; TQM, total quality management; WHO, World Health Organization.
We became aware of some fields of literature (e.g. evidence mapping) [32,176,177] only by internet searches (e.g. Centre for Reviews and Dissemination, www.york.ac.uk/inst/crd; Global Evidence Mapping Initiative, www.evidencemap.org). The framework shows that treatment and service evaluation draws upon diverse theoretical traditions. CTs (randomized) are primarily rooted in the evidence-based medicine literature [134,178–180], while observational (non-randomized) designs appear to encompass: (i) population-based case-control and cohort studies based in clinical epidemiology and public health fields [181,182]; (ii) service model evaluation from health services research [183,184], programme evaluation [94,95,185], and economic evaluation [99,100,103]; (iii) quality assurance (QA) studies with their origins in the management [126,127] and audit literature [110]; and (iv) quasi-experimental assessments of treatment effects from the social sciences [64]. Each of the main evaluation fields contains a number of distinct subfields (Figure 1).
Using terms listed in the conceptual map one at a time, the EPI literature was searched again (limitations: English language; years, 1970–2008). The three databases (Web of Science, Medline and PsycINFO) produced different but overlapping listings. After grouping publications according to our conceptual framework, leading examples of evaluation methodologies (studies highly cited or using a design of high relevance) were selected for inclusion in the scoping review that appears in the following section. In accordance with scoping principles [186], breadth and comprehensiveness were considered more important than depth and study quality.
Results
Overview of evaluation approaches and an appraisal of the evidence for the effectiveness of EPI
This section summarizes the different intervention evaluation designs and illustrates their limitations and advantages by appraising representative studies from the EPI literature. Studies will be reviewed under the two broadest subheadings: experimental (randomized CTs) and observational (non-randomized designs).
Experimental (randomized controlled, or clinical, trials) study designs
Two types of CTs can be distinguished: efficacy and effectiveness RCTs (Table 1; Figure 1). In relation to EPI, there are a number of published efficacy RCTs of antipsychotic medication in early psychosis patients [187–191], and prodromal patients [192,193]. Taken together, these studies indicate that early psychosis patients (i) require lower doses of antipsychotic drugs and are more sensitive to both extrapyramidal [194,195] and metabolic [196] side-effects than multi-episode patients; and (ii) show, less consistently, a modest advantage in favour of the use of second-generation drugs in acute [188] and maintenance treatment [190,197]. There is little evidence of advantage in using clozapine as a first-line treatment in these patients [191], unless treatment resistance is evident [198].
Although the evaluation of a single treatment element, such as a new medication, lends itself well to efficacy RCT designs, the evaluation of multi-component complex interventions, such as EPI, is problematic, especially because clinicians and researchers may not have fully defined and developed the intervention [51]. The evaluation of complex interventions lends itself well to cluster RCT designs (Table 1) [55,56], in which randomization occurs at the level of the service and not the individual patient. There are two published EPI cluster RCTs. One randomized general practitioner (GP) practices (thereby clustering patients by practice) to GP education and access to an early assessment team or treatment as usual (TAU), and showed that ED procedures resulted in higher patient referral rates but did not reduce DUP overall [199]. The other cluster RCT also involves evaluating education of GPs in detection of first-episode psychosis [200], with final results yet to be reported. While economic evaluation and evaluating the delivery and organization of health services are most reliably carried out using RCT designs, the bulk of the literature on service systems and economic evaluation in mental health is non-experimental, and hence this subfield will be mainly covered under observational study designs.
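To make concrete why cluster randomization demands larger samples, the following minimal sketch applies the standard design-effect formula, 1 + (m − 1) × ICC, where m is the mean cluster size and ICC the intracluster correlation; all figures are illustrative and not drawn from the trials cited above.

```python
# Design effect for a cluster RCT; hypothetical figures only.
def design_effect(mean_cluster_size: float, icc: float) -> float:
    return 1 + (mean_cluster_size - 1) * icc

n_individual = 200      # patients needed under individual randomization
m, icc = 20, 0.05       # assumed GP-practice cluster size and ICC
deff = design_effect(m, icc)
print(f"Design effect = {deff:.2f}; "
      f"cluster trial needs ~{round(n_individual * deff)} patients")
# -> Design effect = 1.95; cluster trial needs ~390 patients
```

Even a modest intracluster correlation nearly doubles the required sample in this example, one reason cluster RCTs of complex interventions are costly and rare.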
In the first quantitative review of efficacy RCTs for EPI as a complex intervention, Marshall and Rathbone identified 65 candidate studies, 58 of which were excluded for methodological reasons [25]. Meta-analysis (Table 1) of the seven eligible studies was not possible because of insufficient comparability across studies of the types of EPI or control services being trialled. (A meta-analysis of EPI evaluation studies has more recently been published, but that review combined RCTs with non-randomized studies and hence its results are difficult to interpret [201].) An important point made by Marshall and Rathbone was that the complexity of EPI makes it likely that no two specialized teams will be identical, drawing attention to the urgent need for the development of a validated EPI programme fidelity rating template that can reliably measure the extent to which individual components of EPI are represented in programmes under evaluation. Although approaches to the implementation of EPI have been described [202], most evaluations of EPI have not applied the degree of programme development and description recommended for complex interventions [52]. The importance of precise programme specification is illustrated by comparing the findings of a Norwegian RCT of EPI [203] with those of the Danish intensive EPI programme (OPUS) study [204,205]. Although both studies named their EPI service model ‘integrated treatment’, very different outcomes were reported: one study found a large effect in favour of EPI at 2 year follow up [203], while the other found an inconclusive effect in favour of EPI at 2 year follow up using un-blinded assessment [204], and no apparent efficacy advantage for EPI at 5 year follow up using blinded assessments [205]. When programme descriptions for integrated treatment are compared across the two studies, differences are apparent, including programme duration (18 months in the Danish study and 2 years in the Norwegian study).
In contrast to efficacy RCTs, an effectiveness RCT aims to provide results that are more readily applicable to real-world treatment settings [206,207]. Their characteristics are listed in Table 1. Two types of effectiveness RCTs can be distinguished (Table 1): practical CTs and pragmatic CTs [208]. The Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) study is illustrative of the strengths and limitations of the practical CT [18]. CATIE's strengths were: recruitment of a very large sample (n=1493); its duration (18 months); blind assessment; and the use of a simple primary efficacy measure, all-cause treatment discontinuation. Its limitations included potential bias [209], failing to genuinely reflect real-world practice [210], a high dropout rate (the overall discontinuation rate was 74% within 18 months) and a high cost (in the order of AU$50m). Notably, CATIE found no effectiveness advantage for second-generation antipsychotics (SGAs) compared to first-generation antipsychotics (apart from clozapine), in contrast to meta-analysed efficacy RCT data from short-term studies showing a modest efficacy advantage for at least some SGAs [211,212]. Loss of efficacy in effectiveness trials is mainly accounted for by reduced patient adherence or clinician competence. In CATIE, patient adherence was not monitored stringently, and if the alarming rates of untreated physical comorbidity are any indication [213], there are questions to be raised about clinician competence as well. Thus, a single practical CT on the scale of CATIE does not provide unequivocal answers even about one element of treatment, such as antipsychotic drug treatment.
So far, the only practical CT involving early psychosis patients is the Comparison of Atypicals in First-Episode Psychosis (CAFÉ) study [214]. This double-blind multi-site RCT randomized 400 early psychosis patients to daily divided doses of olanzapine, quetiapine or risperidone and followed them for 12 months. Like CATIE, all-cause treatment discontinuation was the primary outcome measure. By 12 months, treatment discontinuation rates were approximately 70% and no drug differences were identified. That study highlights the higher rates of dissatisfaction with first-line antipsychotic drugs in early psychosis patients compared to CATIE, in which similar discontinuation rates took 18 months to occur. CAFÉ also confirmed appropriate maintenance dosing levels for early psychosis patients, for whom modal doses were 10 mg olanzapine, 500 mg quetiapine, and 2 mg risperidone (with only 11% of patients on risperidone reaching the maximum allowable dose of 4 mg). Again, extremely high discontinuation rates raise the question of whether CAFÉ truly mirrored routine practice. Given the burdensome secondary effectiveness assessments (e.g. Positive and Negative Syndrome Scale (PANSS) and Quality of Life ratings) and twice-daily dosing, these procedures may have contributed to the high discontinuation rates and blunted the ability of the primary effectiveness measure to distinguish different treatments.
The alternative type of effectiveness RCT is called the pragmatic CT, designed to be extremely simple and flexible (e.g. use un-blinded assessment), so that a very large number of patients can be recruited quickly [215,216]. Their characteristics are described in Table 1. The first application of the pragmatic CT design to EPI, the European First-episode Schizophrenia Trial (EUFEST) [17], became highly controversial [217]. Fifty centres located in 13 European countries and Israel participated in EUFEST, which recruited 498 first-episode schizophrenia spectrum patients out of 1047 patients assessed for eligibility. Patients were randomly allocated to haloperidol (1–4 mg daily), amisulpride (200–800 mg daily), olanzapine (5–20 mg daily) or quetiapine (200–750 mg daily). The primary effectiveness measure was all-cause discontinuation within the first 12 months of treatment. Because EUFEST was a pragmatic CT aiming to reflect routine clinical practice, treating clinicians were not blinded to treatment, and after training and calibration they carried out the effectiveness and safety assessments un-blinded, which included deciding whether and when the patient had discontinued treatment as well as rating the PANSS. These design features of the EUFEST pragmatic CT seemed unremarkable until the results of the study were known. At 12 months follow up EUFEST found that the percentage of patients who discontinued treatment for any cause was 40% for amisulpride, 33% for olanzapine, 53% for quetiapine, whereas for haloperidol it was a significantly higher 72% [17]. At 12 months follow up, the key secondary effectiveness measure (PANSS ratings) did not distinguish the different drug treatments.
Why did the EUFEST study demonstrate discontinuation rates for haloperidol comparable to CATIE yet find such low rates of discontinuation with SGAs? Some have attributed that study outcome to bias related to un-blinded assessments by clinicians who may have favoured use of SGAs in early psychosis patients [217], a criticism that the EUFEST group has rebutted (figure 4) [17]. Although this controversy illustrates an important limitation of the pragmatic CT design, perhaps the most remarkable finding from the EUFEST study of relevance to appraising the effectiveness of EPI was the extremely favourable symptom response to all first-line drug treatments: more than 60% reduction in total PANSS ratings at 12 months follow up.
Observational (non-randomized) study designs
The argument for shifting away from sole reliance on RCT designs in assessing the strength of evidence for intervention effectiveness assumes that non-experimental studies can achieve equivalent scientific rigour. It is therefore essential to appreciate the limitations of the different types of observational designs, the sources of bias and confounding, and the methods to minimize their effects. We now describe these issues in some detail for the non-specialist.
When experimental studies are not ethical or not feasible, the effects of treatments are examined in observational studies in which the researcher does not control treatment conditions and simply observes (measures) outcomes and their associations with treatment exposure. Without randomization to ensure that comparable groups are contrasted under competing treatments, observational study designs that directly assess treatment effects are prone to selection bias (pre-intervention patient differences across groups that affect outcome). This bias results from doctor- or patient-determined treatment group allocation. Although attempts can be made to control for known (observed and measured) confounding variables that cause overt bias (e.g. by pairwise matching or subgroup stratification of comparison groups, or by statistical covariance adjustments of differences in baseline patient characteristics), these procedures do not control for unknown (not observed and not measured) baseline confounders that could result in hidden bias.
Among the many types of bias and confounding that can affect observational treatment evaluation studies is the important concept of ‘confounding by indication’: bias associated with treatment allocation according to whether treatment is indicated. For example, the treatment of patients with chronic treatment resistance at baseline may include (have an indication for) intensive family intervention, while patients with a history of brief acute illness at baseline may not require intensive family intervention (because it is not indicated). If the two patient groups are compared, a spurious association between intensive family intervention and poorer patient outcome may be found at follow up. Subgroups of patients may be matched on a single obvious baseline confounder (called a covariate), such as pre-treatment disease chronicity. This approach becomes impractical when there are many covariates, as is the case with psychotic patient samples, and multivariate approaches to matching are then required. An example of multivariate matching uses ‘propensity scores’. The propensity (to be selected into one treatment group or the other) score [218,219] is estimated using logistic regression to combine into a single score all measured covariates that predict the binary category, treatment versus control group; the score is then used to statistically match treatment groups (after its effectiveness in matching groups is checked). Propensity scores do not control for unknown confounders and some argue that they add little to simpler multivariate approaches to matching [220,221], although they may be useful for stratifying a sample to explore dose-response relationships [222] or interactions between confounding covariates [221].
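As an illustration of the propensity-score approach, the following minimal sketch (synthetic data; hypothetical covariate names) estimates propensity scores by logistic regression and stratifies the sample into quintiles so that covariate balance can be checked before outcomes are compared.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
# Synthetic patient-level data; covariate names are hypothetical.
df = pd.DataFrame({
    "age": rng.normal(25, 5, n),
    "chronicity": rng.normal(0, 1, n),
    "baseline_symptoms": rng.normal(50, 10, n),
})
# Allocation depends on chronicity: confounding by indication.
df["treated"] = rng.binomial(1, 1 / (1 + np.exp(-df["chronicity"])))

covariates = ["age", "chronicity", "baseline_symptoms"]
# 1. Estimate each patient's propensity to receive treatment.
model = LogisticRegression(max_iter=1000).fit(df[covariates], df["treated"])
df["propensity"] = model.predict_proba(df[covariates])[:, 1]
# 2. Stratify the sample into propensity-score quintiles.
df["stratum"] = pd.qcut(df["propensity"], q=5, labels=False)
# 3. Check balance: within each stratum, covariate means should be
#    similar across treatment groups before outcomes are compared.
print(df.groupby(["stratum", "treated"])[covariates].mean())
```

As the text notes, this balances groups only on measured covariates; unknown confounders remain uncontrolled.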
A more robust approach to addressing confounding by indication uses an ‘instrumental variable’, defined as a factor that strongly affects the likelihood of being exposed to treatment but does not affect the outcome of that treatment [223,224]. Randomization itself functions as a binary ‘instrumental variable’, determining who is and is not treated without influencing treatment outcome. For example, in an observational evaluation of specialist cardiology treatment, Stukel et al. created an instrumental variable by dividing the total patient sample equally into those living a long distance away from a specialist hospital, versus those living close to a specialist hospital [225]. The distance between a patient's residential address and a specialist hospital was shown to strongly affect the likelihood of receiving specialist treatment without directly affecting treatment outcome, thereby satisfying the definition for an instrumental variable [226]. This instrumental variable may also be applicable to mental health evaluation [227].
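A minimal sketch of a two-stage least-squares instrumental variable analysis follows, using synthetic data in the spirit of the distance example; the instrument affects treatment uptake but not outcome, so the second-stage slope recovers the treatment effect despite a hidden confounder.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
hidden = rng.normal(size=n)        # unmeasured confounder
near = rng.binomial(1, 0.5, n)     # instrument: lives near specialist centre
# Treatment uptake depends on the instrument and the hidden confounder...
treated = rng.binomial(1, 1 / (1 + np.exp(-(1.5 * near - hidden))))
# ...but outcome depends only on treatment and the hidden confounder.
outcome = 2.0 * treated + hidden + rng.normal(size=n)

# Stage 1: predict treatment from the instrument alone.
stage1 = sm.OLS(treated, sm.add_constant(near)).fit()
# Stage 2: regress outcome on predicted treatment; the slope estimates
# the treatment effect purged of the hidden confounding.
stage2 = sm.OLS(outcome, sm.add_constant(stage1.fittedvalues)).fit()
print(stage2.params)   # slope close to the true effect of 2.0
# The naive comparison is biased because 'hidden' lowers treatment
# uptake but raises the outcome:
print(outcome[treated == 1].mean() - outcome[treated == 0].mean())
```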
Because most observational studies of treatment measure outcomes repeatedly over time in response to treatment exposure, three methodological problems are introduced. First, longitudinal interventional data are correlated (clustered) observations repeated through time on the same patient or in the same setting. Second, longitudinal data are often highly unbalanced, in the sense that an equal number of measurements tends not to be available for all subjects and/or measurements are not taken at fixed time points; such data cannot be analysed using simple multivariate regression techniques [35]. Third, longitudinal data are typically very prone to incompleteness due to subject dropout, missing data, or changes in assessment procedures over time. All three challenges to interpretability can be addressed using mixed-effects [34,35,63] or multilevel [228,229] approaches to analysis. These analytic approaches include maximum likelihood estimation in the presence of missing data, which is considered optimal [230], and flexible procedures for analysing variance-covariance structures in longitudinal data [35,63,175].
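For illustration, the sketch below fits a random-intercept mixed-effects model to synthetic, deliberately unbalanced longitudinal data (unequal visit counts and irregular assessment times per patient), the situation described above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
rows = []
for patient in range(100):
    level = rng.normal(0, 2)                  # patient-specific intercept
    n_visits = rng.integers(2, 8)             # unequal numbers of visits
    for month in rng.choice(np.arange(1, 25), size=n_visits, replace=False):
        rows.append({
            "patient": patient,
            "month": month,                   # irregular assessment times
            "symptoms": 50 + level - 0.8 * month + rng.normal(0, 3),
        })
df = pd.DataFrame(rows)

# A random intercept per patient handles the correlation of repeated
# measurements; unbalanced data pose no problem for ML estimation.
fit = smf.mixedlm("symptoms ~ month", df, groups=df["patient"]).fit()
print(fit.params["month"])   # estimated monthly symptom change (~ -0.8)
```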
When concern remains that treatment and control groups in an observational study are not comparable on unmeasured covariates prior to treatment, sensitivity analysis can be used to estimate the magnitude of hidden bias that would need to be present to explain the observed treatment effect. In sensitivity analysis, the deviation of the odds of receiving treatment from chance is calculated for subjects matched on a confounding covariate. A study is considered highly insensitive to bias if only a very large deviation from chance allocation to treatment group could explain the observed association between treatment and outcome. As well as determining the likelihood that treatment effects could be accounted for by confounding by unmeasured covariates [63,231], ideally sensitivity analysis should be used to examine the robustness of the assumptions underlying the modelling approach to managing missing data [35,232,233].
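One widely used form of sensitivity analysis for matched pairs is Rosenbaum's bounds on the sign test; the sketch below (hypothetical data) computes the upper bound on the p-value as the assumed hidden bias Gamma (the factor by which unmeasured confounding could distort the within-pair odds of treatment) increases.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(3)
diffs = rng.normal(0.5, 1.0, 80)    # treated-minus-control differences
d = diffs[diffs != 0]
S = len(d)                          # informative (non-zero) pairs
T = int((d > 0).sum())              # pairs favouring treatment

for gamma in (1.0, 1.5, 2.0, 3.0):
    p_hi = gamma / (1 + gamma)           # worst-case within-pair odds
    upper_p = binom.sf(T - 1, S, p_hi)   # bound on one-sided sign-test p
    print(f"Gamma = {gamma}: p-value upper bound = {upper_p:.4f}")
# A result is insensitive to hidden bias if the bound stays small
# even for large Gamma.
```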
Notwithstanding the merits of the aforementioned approaches to confounding in observational studies, it must be acknowledged that even the most carefully designed studies will have weaknesses, and conclusions should not be drawn from a single study. Replication is necessary. Despite these caveats, the results from observational studies have been largely consistent with the results of randomized trials [234,235]. We now review the four subfields of observational studies identified.
Clinical epidemiological designs
Clinical epidemiology applies the principles and methods of epidemiology to evaluate disease detection technology and clinical treatment in patients (Table 1). Non-interventional epidemiological studies of psychotic disorders are relevant to evaluating EPI. The World Health Organization Determinants of Outcome of Severe Mental Disorders (WHO DoSMeD) or Ten-Country Study was the first population-based study to provide reliable estimates for the 15–54 year age range of the incidence of narrowly defined schizophrenia (7–14 cases per 100 000 per annum) and broadly defined non-affective psychotic disorder (16–42 cases per 100 000 per annum) that could be used as benchmarks for evaluating case detection rates of EPI programmes [236]. Less reliable estimates of the incidence of affective psychosis (7.7–10.6 per 100 000 per annum [237]) and bipolar I disorder including ‘non-psychotic’ cases (10.8–20.8 per 100 000 per annum [238]) provide benchmarks for EPI programmes that recruit patients with affective psychosis. Recent prevalence studies highlight the large number of patients with diagnosable psychotic disorder who may never have received treatment. For example, when comprehensive case-finding procedures are used the reported prevalence rates for schizophrenia (10 per 1000) and non-affective psychosis (22.9 per 1000) [239] are more than double the rates generally reported in studies relying on case finding via health services contact only [240]. Taken together these epidemiological studies indicate that simply screening first service-presentations for non-affective and affective psychosis, without any additional case-finding procedures, should identify more than 30 incident cases per 100 000 population per annum [241]; comprehensive case-finding procedures could potentially double that figure.
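As a worked example of using these figures as benchmarks, the sketch below converts the cited incidence rates into the number of new cases an EPI programme serving a hypothetical catchment should expect to detect each year.

```python
catchment = 250_000   # hypothetical service catchment population

benchmarks_per_100k = {   # annual incidence ranges cited above
    "schizophrenia (narrow)": (7, 14),
    "non-affective psychosis (broad)": (16, 42),
    "affective psychosis": (7.7, 10.6),
}
for disorder, (low, high) in benchmarks_per_100k.items():
    lo = catchment / 100_000 * low
    hi = catchment / 100_000 * high
    print(f"{disorder}: {lo:.0f}-{hi:.0f} expected new cases per year")
# A programme detecting far fewer cases than the benchmark range is
# likely missing incident cases in its catchment.
```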
Cohort studies of first-presentation psychosis have also provided designers of EPI programmes with valuable information, including (i) >75% of early psychosis patients will present in the 15–30 year age range [236,237]; (ii) diagnosis is unreliable at first presentation [73] and may only be accurate with longitudinal assessment and review of all available sources of data [239]; and (iii) acute-onset transient psychosis (DSM schizophreniform disorder or brief psychosis) is particularly diagnostically unstable, with male subjects, especially those with premorbid dysfunction, tending to be re-diagnosed with schizophrenia [242] and female subjects re-diagnosed with bipolar disorder [243,244]. The British AESOP (Aetiology and Ethnicity of Schizophrenia and Other Psychoses) first-onset psychosis study provided information about the sort of diagnostic mix of patients that a well-established EPI programme is likely to recruit [237] and how to design ED strategies for ethnic minorities [245,246]. Other cohort studies such as the German ABC (Age, Beginning, Course) Schizophrenia Study demonstrated in schizophrenia that there is on average a 4 year history of functional and symptomatic decline prior to onset of first psychotic symptoms, while psychotic symptoms are apparent only for the 12 months prior to hospital admission and diagnosis [247,248].
Clinical epidemiological principles are especially relevant to ED strategies. ED of psychotic disorder, especially prior to onset of currently diagnosable illness, should be informed by the general medical literature concerning possibilities of spurious associations between earlier diagnosis and better outcomes [62,249], and the poor feasibility of accurate screening tests for low-prevalence disorders [250,251]. Even if improvements in ED strategies do not achieve reductions in DUP, there is evidence that they will lead to a greater proportion of patients being treated (disease coverage) because these strategies identify more untreated prevalent cases [252], an improvement in effectiveness in its own right. Nonetheless, the results of cohort studies describing the early course of schizophrenia [247,248] are consistent with evidence that much of the trajectory of morbidity is established prior to the onset of psychotic symptoms [253–256], and that to prevent this morbidity we must develop improved clinical services and assessment technology [257,258].
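The difficulty of screening for low-prevalence disorders can be made concrete with a positive predictive value calculation; the test characteristics in this sketch are hypothetical.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability that a positive screen is a true case."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# With annual psychosis incidence around 30 per 100,000, even a
# hypothetical screen with 90% sensitivity and 95% specificity yields:
print(f"PPV = {ppv(0.90, 0.95, 30 / 100_000):.3%}")   # ~0.5%
# That is, roughly 99.5% of positive screens would be false positives.
```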
Service evaluation studies
Health services research can be defined in a number of ways [259,260], but narrowly it is ‘the field of enquiry that examines the impact of the organization, financing, and management of health care services on the delivery, quality, cost, access to, and outcomes of health services’ (cited in [261]). In mental health, it had its roots in concerns about the failure to develop effective community-based systems of care [183,184]. The scope of service evaluation is broad, spanning the environment, structure and performance of health-care organizations through to the effectiveness of programmatic interventions [262]. Although some view all types of intervention evaluation as a form of service research, we have restricted this area to a subfield in which the evaluation focus is primarily on systems of health care, with health services or component programme models as the smallest unit of analysis. Further details about the three study types (health services research, programme evaluation, and economic evaluation) are contained in Table 1.
A major difficulty with service evaluation is defining the elements of the therapeutic programme and maintaining its fidelity. In mental health service evaluation, programme fidelity first emerged as a major issue in relation to assessing the effectiveness of assertive community treatment (ACT) [263–266]. Problems with programme fidelity were suggested when advantage for ACT over standard case management was not consistently found across studies with adequate power. Whether the form of case management evaluated by the UK 700 Group was genuinely a form of ACT was questioned [267]; the findings of the PRiSM (Psychiatric Research in Service Measurement) Psychosis Study were criticized because no attempt was made to ensure fidelity of the service model [268]. This literature highlighted the need for programme fidelity rating scales [48,117,120,269,270] that (i) assess the extent to which the ‘active ingredients’ of a service model or intervention are implemented; (ii) differentiate interventions with flawed logic and processes from evaluations with defective methodology [49]; and (iii) comprehensively capture programme components at service, practice, and patient levels [119,124]. This research also encouraged the development of service model fidelity scales based on programme template tools for evaluating programme content [271] and for assessing treatment strength [272,273], as well as structured approaches to assessing treatment content [84]. To date, fidelity scales have been developed for measuring service model adherence to ACT [274–276]; psychiatric rehabilitation [277]; integrated services for mental health and substance use comorbidity [274,278–280]; family psychoeducation [281]; and motivational interviewing [282]. Other examples of service model fidelity scales rate child and family programmes for adherence to specific treatment principles, either by staff interviews [283], by programme activity observation [284], or by assessment of observed clinical practice [285].
By contrast, it is notable that programme evaluation approaches have been little used in the EPI field. This is particularly the case in relation to the development of service model fidelity scales. We could not find a single published service model evaluation scale in the EPI literature, although the essential elements of EPI have been agreed to by consensus [286]. In their Cochrane Review cited previously, Marshall and Rathbone drew attention to the urgent need for identifying the treatment components that distinguish EPI from standard service delivery, and developing methodology to measure the implementation of these programme components in the future evaluation of EPI programmes irrespective of study design [25].
Interest in economic evaluation of EPI [287] was stimulated by evidence that patients receiving EPI required fewer inpatient days compared to TAU [204,205,288,289]. The first cost-effectiveness study, conducted at the Early Psychosis Prevention and Intervention Centre (EPPIC) in Melbourne, reported that the average cost per unit of symptomatic improvement for patients receiving EPI (EPPIC patients) was AU$16 964, while for patients receiving TAU (before EPPIC) it was AU$24 074 [290]. The Spanish PSICOST (a mental health service research group) study found that direct health-care costs for first-episode schizophrenia during the first 3 years of treatment were significantly reduced for patients receiving intensive community care versus those who did not, despite there being no differences in health outcomes [291]. Other cost-effectiveness studies in England and Sweden also found that total costs for patients receiving EPI were lower compared to alternative service models, mainly due to lower inpatient costs [292,293]. Medium-term cost savings were associated with EPI at 3 year follow up in a Canadian service [294]. A recent long-term follow up (approx. 8 years after first treatment) of patients who received EPI at EPPIC for the first 2 years of their treatment found that the EPI patients were symptomatically better at follow up compared to historic control patients who did not receive EPI, and that their treatment costs per year were lower (AU$3445 vs AU$9503) [295]. Although all of these economic evaluations of EPI were based on quasi-experimental designs, the large size of the cost differences, irrespective of the type of comparison group used or study country of origin, strongly suggests that EPI may cost less than TAU [296].
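Cost-effectiveness comparisons of this kind reduce to an incremental cost-effectiveness ratio (ICER); the sketch below uses hypothetical figures, not those of the cited studies.

```python
def icer(cost_new: float, cost_old: float,
         effect_new: float, effect_old: float) -> float:
    """Extra cost per extra unit of health outcome gained."""
    return (cost_new - cost_old) / (effect_new - effect_old)

# Hypothetical annual per-patient costs (AU$) and symptomatic
# improvement (arbitrary units) for EPI versus treatment as usual.
print(f"ICER = AU${icer(9000, 12000, 1.4, 1.0):,.0f} per unit gained")
# A negative ICER with a larger effect means the new service 'dominates':
# it is both cheaper and more effective, the pattern reported for EPI.
```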
Quality assurance studies
Health-care quality, ‘the degree to which health services for individuals and populations increase the likelihood of desired health outcomes and are consistent with current professional knowledge’, frequently falls short of evidence-based standards [297] despite the existence of effective ways to improve it [298,299]. These concerns particularly apply to mental health care [104,300,301]. QA is the ‘system of procedures, checks, audits, and corrective actions to ensure that all research, testing, monitoring, sampling, analysis, and other technical and reporting activities are of the highest achievable quality’ [29]. Quality improvement (QI) involves continual performance measurement, incrementally improving service delivery, and revising (upwards) standards in the pursuit of best practice [302]. Details concerning QA/QI, including its two main methodological forms, quality audit and practice audit, are outlined in Table 1.
Despite the formidable practical and methodological challenges associated with QA studies, they uniquely offer the opportunity to evaluate health care as it is routinely practised on real-world patient populations without the constraints imposed by individual patient and/or staff informed consent [303,304]. Multi-centre QA studies also permit examination of the influence of service context on treatment effectiveness, of particular relevance to EPI, which relies on the optimal combination of service function and clinical practice. Multilevel modelling approaches (mixed models, multilevel or growth curve modelling) can analyse data collected on ‘units’ hierarchically nested within other units: mental health services grouped (nested or clustered) within jurisdictions; clinicians grouped within services (or programmes); patients grouped under individual clinicians; and repeated observations measured through time clustered for each patient. Multilevel modelling efficiently corrects standard errors (underestimated by ordinary linear regression) related to data correlations due to clustering, and partitions variance independently at each level of measurement [31,228,229,305,306].
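To illustrate the variance partitioning described above, the sketch below fits a two-level random-intercept model to synthetic data (patients nested within services) and computes the intraclass correlation, the share of outcome variance attributable to the service level.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
rows = []
for service in range(30):
    service_effect = rng.normal(0, 1.0)       # between-service variation
    for _ in range(40):                       # patients within the service
        rows.append({"service": service,
                     "outcome": 10 + service_effect + rng.normal(0, 2.0)})
df = pd.DataFrame(rows)

# Null (intercept-only) model with a random intercept per service.
fit = smf.mixedlm("outcome ~ 1", df, groups=df["service"]).fit()
var_between = fit.cov_re.iloc[0, 0]           # service-level variance
var_within = fit.scale                        # patient-level variance
print(f"ICC = {var_between / (var_between + var_within):.2f}")  # ~0.20
# Ignoring this clustering would understate standard errors for any
# service-level comparison.
```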
Quality audit tends to use indicators broadly applicable across health-care fields (e.g. medication error rates). Quality (performance) indicators are ‘operationally-defined indirect measures of selected aspects of a system which gives some indication of how far it conforms to its intended purpose’ [307]. Practice audit tends to use indicators of evidence-based practice in relation to a specified health-care professional group for a defined patient population eligible to receive that practice (e.g. dose of antipsychotic medication in schizophrenia). Whether evaluation of these clinical processes is from the perspective of service performance and manager concerns (top-down) or from the perspective of evidence-based practice and practitioner concerns (bottom-up), the evaluation focus is on what happens to individual patients, the microsystem of clinician-patient interactions. Although patient-level data may be used and the individual patient is the unit of analysis, the primary focus of QA studies is on health-care or treatment processes, and such studies are mostly published in the quality rather than the clinical literature.
There are few examples of the application of quality audit-based designs in the evaluation of mental health-care effectiveness [308,309–312]. Benchmarking is an emerging mental health service activity in Australasia [125]. The balanced scorecard has been applied to generic services [313], and was used to show that introduction of a specialist EPI service increased the rate of treated psychosis [314,315] and reduced treatment default [315]. Consensus performance indicators for EPI programmes have been published [316]. Performance measurement has added to the evidence that EPI programmes increase service recruitment of first-episode psychosis [309,317,318], improve pathways to care [319,320], increase retention [318], and are associated with better health outcomes [321]. This literature indicates that simply instituting a service-wide flagging system for early psychosis patients increases recognition and intervention effectiveness [322]. Despite the potential for quality audit to evaluate programmes, however, its application to EPI has to date been limited.
Practice audit studies determine the conformity of clinical practice with clinical practice guidelines (CPG) (Table 1). CPG are defined as ‘systematically developed statements to assist practitioner and patient decisions about appropriate health care for specific circumstances’ [323]. They represent statements of best practice based on systematic review of published research studies on the outcomes of treatment and other health-care procedures. The first evaluation of the feasibility of measuring CPG adherence in psychotic disorders was the US Schizophrenia Patient Outcomes Research Team (PORT) studies, which found that 29% of patients received CPG-adherent doses of antipsychotic, 22% of eligible patients received vocational rehabilitation, and only 10% of available family members received psychoeducation [324]. A recent review of CPG adherence in mental health found that (i) a mere 1% of the total published literature on CPG pertained to mental health; (ii) in non-experimental studies the overall rate of adequate CPG adherence was low (27%); and (iii) in the rare study that actually examined whether the level of adherence predicted patient outcome, it did [325].
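A practice audit ultimately reports proportions of this kind; the sketch below computes an adherence rate with a 95% Wilson confidence interval from hypothetical counts of the same order as the PORT findings.

```python
from statsmodels.stats.proportion import proportion_confint

eligible = 120    # audited patients eligible for the process indicator
adherent = 34     # records meeting the CPG-adherent criterion
rate = adherent / eligible
low, high = proportion_confint(adherent, eligible, alpha=0.05,
                               method="wilson")
print(f"CPG adherence = {rate:.0%} (95% CI {low:.0%} to {high:.0%})")
# Tracked over repeated audit cycles, movement in this interval is the
# basic signal used for quality improvement.
```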
In 1998 the National Early Psychosis Project produced the Australian Clinical Practice Guidelines for Early Psychosis (EPPIC State-wide Services, Melbourne). This created an opportunity to evaluate CPG for early psychosis by operationalizing the Guidelines in terms of process indicators of CPG-adherence or clinical pathways. To date only one study in Australia using CPG-adherence indicators has been published [326]. There are two pilot studies on clinical pathways for early psychosis [327,328]. These reports show that designing CPG-adherence indicators for file audit is feasible, while successful application of clinical pathways for early psychosis is highly dependent upon implementation strategy and local service factors. A number of EPI clinical practice improvement networks have been established to monitor CPG-adherence and patient outcomes in England (National EDEN Project, a component of the NIHR Mental Health Research Network) [329]; in London (London Early Intervention Research Network: LEIRN) [330]; and in Australia (NHMRC Clinical Practice Network for Early Psychosis: CPIN-EP) [331]; but evaluation findings are yet to be published. Taken together, the literature indicates very limited application of QA study designs to the evaluation of the effectiveness of EPI and suggests considerable opportunity for further development in this field.
Quasi-experimental comparisons of treatment groups
Quasi-experimental design (Table 1) is one in which the researcher lacks control over the allocation or timing of intervention but nonetheless conducts the study as if it were an experiment, allocating subjects to groups [29]. As with CTs, the primary focus of quasi-experimental studies is whether patients have changed as a result of receiving treatment. That is, the individual patient is the unit of analysis. In this paper the quasi-experimental design is considered as a separate evaluative subfield because numerically it formed the bulk of the published literature in the mental health field and did not fit well into other sub-fields reviewed in the previous sections.
The strengths and weaknesses of the quasi-experimental design are well illustrated by the Schizophrenia Outpatient Health Outcomes (SOHO) study [332]. That 3 year prospective study of antipsychotic medication included >10 000 outpatients recruited in 10 European countries. SOHO provided useful post-marketing surveillance safety data in a large representative sample of patients with schizophrenia: it highlighted the high prevalence of depressive symptoms (approx. 80%) [332], amenorrhoea (approx. 30% of female subjects), and impotence and loss of libido (40–50% of male subjects) [333]. The SOHO study, in which clinicians could flexibly determine drug type and dose, found surprisingly high rates of treatment continuation (88.5% after 6 months), most patients staying on the same medication they started at baseline [332]. A total of 70% of first-episode patients with schizophrenia in SOHO responded very well to medication by 6 months and, notably, all patients with substance abuse at baseline ceased abuse after 6 months of drug treatment without specific psychosocial intervention [334], a replicated result [194]. SOHO, however, failed to achieve its primary aim of assessing the relative effectiveness of different antipsychotic drugs because of its non-randomized design, despite extensive use of sensitivity analysis [332].
There are hundreds of examples of the use of a non-equivalent comparison group in evaluation studies of EPI (including all of the economic evaluations and QA studies reviewed here), so only a few illustrative studies will be considered. In the first, McGorry et al. reported more rapid reduction of negative symptoms in patients attending the EPPIC programme, compared with a carefully matched group of patients who attended the same service prior to commencement of EPPIC [335]. The apparent advantage for EPI, however, could not be related to earlier intervention because DUP was similar in both the pre-EPPIC and EPPIC patient groups [335]. There are many subsequent evaluations using historical comparison groups in the EPI literature [336–338]. There are also numerous studies using parallel contemporaneous programmes in non-equivalent group comparisons [318,339,340]. In addition, general population [341] or representative age-matched clinical samples [318] have been used as non-equivalent reference groups. Bias and confounding cannot be excluded from these designs, as is the case with the multitude of simple pretest-post-test evaluations of EPI [194]. It is rarely feasible to apply the more sophisticated cross-over and time series study designs to evaluating EPI. Non-experimental case studies and qualitative research are also sparingly published in the EPI field, despite their potential for methodological rigour [342] and for identifying treatment processes [343].
The Early Treatment and Identification of Psychosis Study (TIPS) is the most outstanding example of quasi-experimental evaluation of EPI [337,338,344–348]. TIPS set out to determine whether ED can shorten DUP and, in turn, improve patient outcomes. The ED strategy had two major components: (i) an intensive public information and awareness campaign targeting communities, schools, families, and GPs; and (ii) a network of detection teams with low threshold for referral and easy access [344,346]. The ED strategy was introduced into Rogaland County, Norway (population 370 000). Detection rates, DUP, and health outcomes in that sector were compared to those in two ‘parallel in time’ health sectors without ED strategies, one in Ulleval, Norway (population 190 000) and another in Roskilde County, Denmark (population, 100 000). The populations in the three sectors had remarkably similar demographics and public health-care systems typical of Scandinavia generally. In addition to this parallel non-equivalent group comparison, a historical control group design was used in a subsector of Rogaland County in which a cohort of patients presenting in 1993–1994 (before ED) was compared with a cohort presenting in 1997–1998 when ED strategies were being used.
Compared to the pre-ED cohort, the ED cohort (i) was larger in number; (ii) had shorter median DUP (4.5 weeks vs 20 weeks); (iii) was younger; (iv) had lower baseline symptom ratings; and (v) less often required inpatient admission [338,344]. Also, the cohort with ED and standard treatment had fewer negative symptoms and better peer networks at 12 month follow up, even within the schizophrenia group [348]. Patient characteristics reverted towards those of the pre-ED cohort after ED strategies in Rogaland were withdrawn [337], providing additional evidence for a specific relationship between ED strategies and shorter DUP, and more favourable baseline patient characteristics [337]. The methodologically more sound TIPS parallel group comparison produced comparable results to the historical group comparison. Compared to the non-ED sectors (Ulleval and Roskilde), the early psychosis cohort in the ED sector (Rogaland) had significantly shorter median DUP (5 weeks vs 16 weeks), and better functioning and lower symptom levels at baseline [345]. At 12 month and 2 year follow up, patients with ED had better outcomes with significantly reduced levels of negative symptoms, despite treatment after detection being essentially the same (i.e. specialist EPI was not offered in any of the four health sectors) [340,347].
In summary, the TIPS study is exemplary in the EPI field of combining an epidemiological approach and quasi-experimental design, and using multiple control groups and appropriate statistical analysis to examine whether confounding could have accounted for the ED treatment effects. TIPS also described the feasibility and costs of ED, and the clinical assessment load generated by it [344,346]. Significantly, TIPS showed that a critical component of EPI, ED, can be very effective.
Discussion
Using a scoping strategy to map the evaluation field, we were able to describe a conceptual framework for intervention effectiveness research (Figure 1). Without this framework, it may not have been possible to sort and organize the hundreds of papers, books, and reports identified by database and Internet searches that were scattered across the management, social sciences, and clinical literature. Some caution is warranted regarding conceptual frameworks of the kind we have proposed herein because they may be prone to distortion by the theoretical perspective of their designers or may overlook certain subfields [30]. To minimize these concerns we used a saturation searching strategy (searching until no new papers were found) and sorted until all representative papers could be grouped into at least one subfield. We found that ordering evaluation on an efficacy-effectiveness spectrum was useful because it tended inter alia to rank study designs according to their susceptibility to bias [349]. Typical of global evidence mapping, this review generated a voluminous bibliography, which we have included in full for use as an educational tool. Our global mapping exercise identified the broad range of evaluation designs beyond efficacy RCTs, and the map made gaps in the evidence more readily apparent when appraising the EPI literature.
Scoping the effectiveness evaluation literature on EPI indicated a striking paucity of RCT data. Although the first Cochrane Review found CT evidence for EPI to be inconclusive [25], we detected promising trends. First, there were several well-constructed RCTs of specialist EPI showing short-term advantage [203,204,350], even when ED strategies were not used. It was also clear that without continuity of optimal care, the effects of short-term specialist EPI fade and disappear [205]. CTs of antipsychotic drugs consistently showed that positive symptoms were highly responsive in first-episode patients, although negative symptoms and cognitive deficits seemed little improved. Moreover, SGAs do not seem to prevent transition to psychosis in prodromal patients, suggesting that alternative neuroprotective treatments, such as omega 3 fatty acids, N-acetyl cysteine, and metabotropic glutamate receptor agonists, should be more intensively studied in prodromal patients [351–353].
Our appraisal found that the bulk of the EPI literature supporting EPI effectiveness consisted of observational studies. Although there are many sound non-interventional epidemiological studies to guide implementation of EPI programmes, most interventional studies of EPI rely on poorly controlled quasi-experimental designs. We conclude that the conduct of high-quality health services research and QA studies has been grossly under-resourced in spite of the critical importance of mental health systems research in improving service delivery [93]. This conclusion implies that large-scale multi-service studies are required, necessitating national-level investment in mental health services research capacity in Australia. We consider that the first urgent task in strengthening these fields of EPI evaluation would be the design of psychometrically sound programme fidelity rating scales.
Nevertheless, there are some important findings in the observational literature that are consistently reported irrespective of study design. First, specialist EPI service functions increase the recognition rates of first-presentation psychotic disorders [318], although they may not shorten DUP. It is likely that EPI functions draw into treatment more of the currently untreated cases of chronic psychosis in the community, estimated to be in excess of 50% of all cases [354,355]. Because disease coverage is an aspect of effectiveness, these data represent support for the effectiveness of specialist EPI. Second, irrespective of comparison group or country of origin, cost analysis studies of specialist EPI consistently showed that EPI costs less than TAU, mainly because of reduced hospital readmission rates. Consistent with this, a recently published meta-analysis demonstrated that specialist EPI programmes prevent relapse more effectively than TAU [356].
Although most quasi-experimental evaluations of EPI tend to use poorly controlled designs, our evidence-mapping procedures found examples of design features and statistical procedures that can be applied to this type of observational study to minimize the likelihood of biased results. These approaches were incorporated into the TIPS project, which produced compelling evidence that a combination of public awareness campaigns and provision of ED teams effectively engages psychotic patients at an earlier stage of their disease. In combination with state-of-the-art programme development and evaluation methodologies for implementing youth mental health campaigns [357], this area of evaluation could rapidly provide policy makers with the evidence base for funding preventative strategies [257,258,358]. Also, increased emphasis on audit and feedback in the evaluation of CPG implementation and EPI programme fidelity was identified in our review as an effective strategy for improving health-care quality and guiding service reform to support better clinical practice [109,301].
In this review and our proposed evaluation framework, we have assumed that existing evidence hierarchies [143,144] place undue weight on efficacy RCT designs relative to the diverse range of non-experimental methods. As others have proposed [359], we are concerned that existing evidence hierarchies may slow advances in fields such as mental health and public health nutrition, in which RCTs are uncommon due to feasibility or ethical constraints. We suggest that the mental health field not limit itself to an exclusive focus on RCT data, which tend to be ‘inconclusive and will augment the grey zones of practice’ [360], when there are reliable observational data supporting intervention effectiveness. Lowering the threshold for level of evidence needed before implementation of a service innovation, however, demands investment in routine quality evaluation after implementation [361], carried out by clinicians who are well-versed in research methodology [362].
In summary, our field mapping exercise was able to logically categorize the full range of evaluation approaches, and specify relevant methodological strengths and weaknesses. Using this conceptual framework we were able to comprehensively appraise the EPI effectiveness evaluation literature, collate findings that are consistent across study design types, and identify gaps in the literature that need urgent additional investment. We also concluded that the degree of complexity of the intervention evaluation literature suggests that greater focus on research methodology in the training of Australasian psychiatrists is urgently needed.