Abstract
Aim:
The aim of this study was to develop a suspicion index that aids diagnosis of secondary schizophrenia spectrum disorders in regular clinical practice.
Method:
We used the Delphi method to rate and refine questionnaire items in consecutive rounds. Differences in mean expert responses between schizophrenia spectrum disorders and secondary schizophrenia spectrum disorders populations allowed us to define items of low, middle and high predictive value, which received different weights. Algorithm performance was tested in 198 disease profiles by means of sensitivity and specificity.
Results:
Twelve experts completed the Delphi process, and consensus was reached in 19/24 (79.2%) items for schizophrenia spectrum disorders and 17/24 (70.8%) for secondary schizophrenia spectrum disorders. We assigned rounded values to each item category according to their predictive potential. A differential distribution of scores was observed between schizophrenia spectrum disorders and secondary schizophrenia spectrum disorders when applying the suspicion index for validation to 198 disease profiles. Sensitivity and specificity analyses allowed us to set risk prediction scores of >8, >10 and >16 as thresholds for medium, high and very high suspicion of secondary schizophrenia spectrum disorders, respectively.
Conclusion:
Our final outcome was the Secondary Schizophrenia Suspicion Index, the first paper-based and reliable algorithm to discriminate secondary schizophrenia spectrum disorders from schizophrenia spectrum disorders with the potential to help improve the detection of secondary schizophrenia spectrum disorder cases in clinical practice.
Introduction
Schizophrenia is a chronic psychiatric disorder with its typical onset during adolescence and early adulthood. This disorder commonly has a severe impact on social, academic, vocational and relational functioning (Owen et al., 2016). Worldwide, the consensus prevalence rate for schizophrenia is below 1% of the general population, but reaches 2–3% in more recent studies, including all types of psychotic illnesses (van Os et al., 2009).
Schizophrenia spectrum disorders (SSD) (Laurens et al., 2015; Spitzer et al., 1992) are often associated with numerous organic conditions of different aetiology including neurometabolic (Bonnot et al., 2014), endocrine (Kwon et al., 2009), immunologic (Leboyer et al., 2016), (para)neoplastic or epileptiform disorders (Meyer, 2009), or infections. The term ‘secondary’ refers to those psychiatric symptoms/disorders that can be explained by non-psychiatric disorders (Spitzer et al., 1992). Due to the number and heterogeneity of these associated diseases, the true prevalence of the so-called ‘secondary SSD’ is unknown but is probably underestimated, as observed in well-studied and narrow populations such as patients with early-onset schizophrenia (Giannitelli et al., 2018; Rapoport et al., 2012). Early diagnosis of secondary SSD conditions is crucial to prevent progressive neurodegeneration, particularly considering that several of these underlying disorders have a disease-specific therapy – particularly those of neurometabolic and endocrine origin (Kwon et al., 2009).
Patients with schizophrenia and other SSDs, as with many other psychiatric illnesses, neither seek nor obtain a level of health care and medical workup comparable to that of the general population (De Hert et al., 2007). Moreover, the mortality rate in people with schizophrenia is approximately threefold higher than that of the general population, which can reduce life expectancy by 10–20 years (Gatov et al., 2017). Given their heterogeneous clinical presentation, the detection of primary disorders responsible for secondary schizophrenia often requires thorough and costly medical assessments, and a relative lack of medical review may result in a substantial diagnostic delay (Josephs et al., 2003).
Diagnostic algorithms can be useful tools to identify primary disorders, thereby promoting early referral for specialised care. Unfortunately, previous attempts to generate such an algorithm for secondary SSD failed to provide conclusive results (Barak et al., 2002; Cutting, 1987; Horiguchi et al., 2009; Johnstone et al., 1988). Common limitations of those studies were the small and heterogeneous samples and the difficulty in ascribing causality. Therefore, most of the clinical features that are presumably associated with secondary SSD are largely empirical, such as concomitant cognitive impairment, visual hallucinations, catatonia or treatment resistance (Desai and Grossberg, 2010). We have previously defined a list of atypical psychiatric features that may be suggestive of secondary SSD (Bonnot et al., 2014, 2015), although it was limited by the lack of predictive validity to diagnose a primary pathological process.
To cover this unmet need, we aimed to generate a paper-based suspicion index (SI) with the potential to discriminate secondary SSD from SSD. Our goal was to develop a simple tool that can be efficiently and readily implemented in clinical practice. For this purpose, we integrated the Delphi method to obtain a reliable source of experts’ judgements (Hall et al., 2018), the rational development of the SI by exploiting Delphi results, and the validation and refinement of the SI through disease profiles. The Delphi method is particularly suitable when there is no large population data available, and when expert knowledge and experience is a crucial contribution (Hall et al., 2018).
Materials and methods
Questionnaire development and panel constitution
The two main methodological issues of the Delphi process are the design of a clinically relevant questionnaire and the selection of the panel of experts. The source material in this study was a 22-item questionnaire developed by our research group based on specific clinical features of secondary SSD (Bonnot et al., 2014, 2015).
The panel group comprised 12 experts selected by their demonstrated expertise in the field of secondary SSD, with extensive clinical practice experience and participation in scientific publications in this narrow field. To prevent bias, the expert panel did not include any participant directly involved in the development of the questionnaire. Experts were invited by e-mail through a cover letter explaining the purpose of the study and practical instructions and procedures of the Delphi process. A helpdesk line was available for consultation. Questionnaire responses were kept blind from each other to ensure anonymity and avoid response bias.
Delphi process
The Delphi process consists of iterative questionnaire rounds where the experts can rate and refine each questionnaire item. The panel of experts was requested to allocate, for each questionnaire item, the frequency of four categories (absent, slightly present, evident or prominent) based on 10 representative disease profiles with SSD and 10 with secondary SSD. Mean scores were calculated for each item by considering all expert responses. Then, they were circulated anonymously between the experts in consecutive Delphi rounds to allow for re-defining scores or including amendments.
To assess the questionnaire items that reached consensus, we transformed the four levels of response for each item obtained from 10 representative disease profiles into a weighted average score from 1 to 4 (1 = absent, 2 = slightly present, 3 = evident, 4 = prominent), by applying the following equation:
$$\text{Weighted average score} = \frac{1\,n_A + 2\,n_S + 3\,n_E + 4\,n_P}{n_A + n_S + n_E + n_P}$$
where $n$ is the number of representative profiles distributed in the A = absent, S = slightly present, E = evident and P = prominent categories. The weighted average score for each expert is shown in Figure 1(A) and (B).
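As an illustration of this transformation, the following minimal sketch computes the weighted average score for one expert and one item. It assumes the score is the category value (1 to 4) weighted by the number of profiles allocated to each category; the function name and the example counts are hypothetical and not taken from the study data.

```python
def weighted_average_score(n_absent, n_slight, n_evident, n_prominent):
    """Collapse category counts for one item into a 1-4 weighted average score.

    Assumption: each category contributes its value (1 = absent,
    2 = slightly present, 3 = evident, 4 = prominent) weighted by the
    number of representative profiles allocated to it.
    """
    counts = (n_absent, n_slight, n_evident, n_prominent)
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one profile is required")
    return sum(value * n for value, n in zip((1, 2, 3, 4), counts)) / total


# Hypothetical response: 10 profiles split 6 / 2 / 1 / 1 across the categories.
score = weighted_average_score(6, 2, 1, 1)
print(round(score, 2))  # (6*1 + 2*2 + 1*3 + 1*4) / 10 = 1.7
```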

Figure 1. Expert consensus for each item of the questionnaire after three rounds of Delphi for SSD and secondary SSD profiles. Dots represent the expert weighted average score after three rounds of Delphi. Category frequencies for each item were transformed into a 1–4 score (1 = absent, 2 = slightly present, 3 = evident, 4 = prominent) for (A) SSD and (B) secondary SSD.
Following suggestions in the literature (Bandelow and Meier, 2003; Hasson et al., 2000; Milholland et al., 2010), we established consensus when at least 75% (9/12) of expert weighted scores clustered within ±0.5 points of the global weighted average score (scale 1–4), a window that represents 33% of the scale.
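The consensus rule can be sketched as follows. This is only an illustration under stated assumptions: the global weighted average is taken as the plain mean of the expert scores, the ±0.5 window is inclusive, and the 75% cut-off is treated as inclusive so that 9 of 12 experts qualifies. The function name and the example scores are hypothetical.

```python
def reaches_consensus(expert_scores, tolerance=0.5, min_fraction=0.75):
    """Return True if enough expert scores cluster around the global mean.

    Assumptions: the global weighted average is the mean of the expert
    weighted scores, and both the window and the cut-off are inclusive.
    """
    if not expert_scores:
        raise ValueError("no expert scores provided")
    global_mean = sum(expert_scores) / len(expert_scores)
    within = sum(1 for s in expert_scores if abs(s - global_mean) <= tolerance)
    return within / len(expert_scores) >= min_fraction


# Hypothetical item: 10 of 12 experts fall within the window.
clustered = [1.5, 1.6, 1.7, 1.4, 1.5, 1.6, 1.8, 1.5, 1.6, 3.0, 3.2, 1.7]
# Hypothetical item: experts split between the two ends of the scale.
polarised = [1, 1, 1, 1, 4, 4, 4, 4, 2, 3, 2, 3]

print(reaches_consensus(clustered))  # True
print(reaches_consensus(polarised))  # False
```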
SI development
The objective of this step was to obtain a simplified scoring system by assigning specific weights to each item according to its discriminatory potential. We explored four algorithms to exploit differences between mean SSD and secondary SSD scores: (1) absolute differences, (2) linearised differences, (3) ratios and (4) integer weights. In the absolute differences algorithm, differences between SSD and secondary SSD mean scores determined the weight of each item category. The linearised differences algorithm assigned weights based on the linear regression of absolute differences between SSD and secondary SSD mean scores to smooth the pattern. In the ratio algorithm, weights were obtained by dividing secondary SSD mean scores by SSD mean scores. In the integer weights algorithm, we assigned rounded scores to each category by considering score intervals obtained from the ratio algorithm.
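To make the difference-based and ratio-based weighting concrete, the sketch below derives per-category weights for a single item from mean SSD and secondary SSD scores. It is a simplified reading of the description above, not the study's implementation: the sign convention (secondary SSD minus SSD), the handling of a zero denominator, the function names and all numbers are assumptions for illustration.

```python
CATEGORIES = ("absent", "slightly present", "evident", "prominent")


def difference_weights(ssd_means, secondary_means):
    """Weight per category as the difference between group means.

    Assumption: computed as secondary SSD minus SSD, so positive values
    mark categories that are more frequent in secondary SSD.
    """
    return [sec - ssd for ssd, sec in zip(ssd_means, secondary_means)]


def ratio_weights(ssd_means, secondary_means):
    """Weight per category as secondary SSD mean divided by SSD mean.

    Assumption: a zero SSD mean yields infinity rather than an error.
    """
    return [sec / ssd if ssd > 0 else float("inf")
            for ssd, sec in zip(ssd_means, secondary_means)]


# Hypothetical mean number of profiles (out of 10) per category for one item.
ssd_means = [8.0, 1.0, 0.5, 0.5]
secondary_means = [2.0, 2.0, 3.0, 3.0]

for name, d, r in zip(CATEGORIES,
                      difference_weights(ssd_means, secondary_means),
                      ratio_weights(ssd_means, secondary_means)):
    print(f"{name}: difference={d}, ratio={r}")
```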
SI validation with disease profiles
We applied each diagnostic algorithm to 198 disease profiles with SSD (n = 103) and secondary SSD (n = 95) for validation. Experts identified two populations from their patient lists: (1) SSD profiles, patients with no known associated disorder or no evidence of one, and (2) secondary SSD profiles, patients with known, documented associated disorders. Algorithm performance was evaluated via sensitivity and specificity analysis across different thresholds.
Statistical analyses
Descriptive statistics were used to monitor the initial average and variations over consecutive Delphi rounds. Algorithm performance comparisons were carried out via sensitivity and specificity analysis, which were plotted vs risk prediction scores (RPS) to obtain the threshold that defines the best trade-off between sensitivity and specificity. If no response was obtained, scores from the second round were used.
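The threshold analysis can be illustrated with a short sketch. It assumes that a profile is flagged as suspected secondary SSD when its RPS is strictly greater than the threshold, and that the "best trade-off" is the threshold where the sensitivity and specificity curves are closest; both are assumptions for illustration, and the scores below are toy values, not study data.

```python
def sensitivity_specificity(secondary_scores, ssd_scores, threshold):
    """Sensitivity and specificity when flagging scores above the threshold."""
    true_pos = sum(1 for s in secondary_scores if s > threshold)
    true_neg = sum(1 for s in ssd_scores if s <= threshold)
    return true_pos / len(secondary_scores), true_neg / len(ssd_scores)


def best_threshold(secondary_scores, ssd_scores, thresholds):
    """Pick the threshold where sensitivity and specificity are closest.

    Assumption: the curve intersection is used as the trade-off criterion;
    ties are resolved in favour of the first threshold examined.
    """
    def gap(t):
        sens, spec = sensitivity_specificity(secondary_scores, ssd_scores, t)
        return abs(sens - spec)
    return min(thresholds, key=gap)


# Toy risk prediction scores (hypothetical).
secondary = [4, 9, 12, 15, 20, 25]
ssd = [0, 2, 3, 5, 8, 11]

t = best_threshold(secondary, ssd, range(0, 26))
print(t, sensitivity_specificity(secondary, ssd, t))
```

With real data, the same loop over thresholds gives the two curves that are plotted against RPS.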
Results
Delphi process
A panel of 15 international experts was initially proposed and 12 were finally recruited (11 psychiatrists and 1 psychologist), representing 6 different countries and 3 continents (Europe, Australia and South America). Between September 2017 and June 2018, the panel of experts completed at least two Delphi rounds of questionnaire revision (12 experts in the first round, 11 experts in the second round and 7 experts in the third round).
In the second and third rounds, experts were able to revise their scores in light of the global average of responses (Table 1 and Supplemental Figure S1). After the first Delphi round, two experts suggested refinement of some questionnaire items, including rephrasing questions 4 and 7 and adding two new questions (items 23 and 24). The new items were ‘Autonomic dysfunction, especially for auto-antibody encephalitis’ (item 23) and ‘EEG measures, especially slow waves theta-delta temporal or frontal before the beginning of antipsychotic treatment’ (item 24) (Clancy et al., 2014; Kayser and Dalmau, 2016; Leboyer et al., 2016).
Table 1. Mean expert scores in the consecutive Delphi rounds.
SSD: schizophrenia spectrum disorders; MRI: magnetic resonance imaging; EEG: electroencephalogram.
The left panel shows the final questionnaire survey used in the Delphi rounds comprising 24 items (adapted from Bonnot et al., 2014, 2015). The right panel shows mean expert scores for each Delphi round in the following order: third, second and first round. For each item, experts distributed 10 representative disease profiles for SSD and 10 for secondary SSD according to 4 categories (absent, slightly present, evident and prominent). Absent indicates that the symptom is not clinically present in the patient symptomatology. Slightly present denotes a fluctuation of symptoms and/or an unclear or subtle presence, with other symptoms being more prominent in the patient. Evident is indicative of a symptom whose presence is clearly established (concomitantly with other symptoms) although it can fluctuate without completely disappearing, but it does not dominate the clinical picture. Prominent characterises a symptom with a stable presence, dominating the clinical picture, and severely interfering in daily life functions.
Items 23 and 24 were added during the Delphi process following expert suggestions. Results from round 1 are in bold.
Figure 1 shows the weighted average scores for each expert response in the third Delphi round, and boxes delimit the responses that reached consensus. According to the criteria described in Methods, we reached overall consensus (SSD and secondary SSD) in 15 out of 24 items (62.5%) (Figure 1(A) and (B)). For SSD, the experts reached consensus on 19 out of 24 (79.2%) questions (Figure 1(A)). For secondary SSD, individual weighted responses clustered around the global weighted average for 17 out of 24 (70.8%) questions (Figure 1(B)).
SI development
We observed a differential distribution of item categories between the responses for SSD and secondary SSD profiles, which defined the discriminatory potential of each question. Most of the questions (21/24) were categorised as absent/slightly present in SSD and as evident/prominent in secondary SSD; the exceptions were questions 9, 19 and 20, which showed an inverted pattern (Supplemental Figure S1).
In the integer weights algorithm (based on intervals from the ratio algorithm), we assigned rounded values to each item category to obtain an algorithm that allows paper-based calculation with no need for computation. To simplify the calculation, three levels of weights were established for questions with low, middle and high discriminatory potential. We assigned 0-0-1-2 scores in the integer weights algorithm when the average ratio of the prominent and evident categories ranged from 1.5 to 4.5 points, 0-0-2-4 when it ranged from 4.5 to 9 points, and 0-0-4-8 when it was ⩾9 points (see Supplemental Table S1). For those questions with an inverted pattern (questions 9, 19 and 20), the same weights were assigned but in inverted order (e.g. 2-1-0-0, 4-1-0-0).
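The interval-to-weight mapping can be sketched as a small lookup. Several details are assumptions made only for illustration: the boundaries at 4.5 and 9 are assigned to the higher band, ratios below 1.5 receive no points, and inverted items simply reverse the weight order. The weights actually used are those in Supplemental Table S1; the function name is hypothetical.

```python
def integer_weights(mean_ratio, inverted=False):
    """Map an item's average evident/prominent ratio to integer weights.

    Weights are ordered (absent, slightly present, evident, prominent).
    Assumptions: band boundaries belong to the higher band, ratios below
    1.5 score zero, and inverted items use the reversed weight order.
    """
    if mean_ratio >= 9:
        weights = (0, 0, 4, 8)   # high discriminatory potential
    elif mean_ratio >= 4.5:
        weights = (0, 0, 2, 4)   # middle discriminatory potential
    elif mean_ratio >= 1.5:
        weights = (0, 0, 1, 2)   # low discriminatory potential
    else:
        weights = (0, 0, 0, 0)   # assumption: no discriminatory value
    return tuple(reversed(weights)) if inverted else weights


print(integer_weights(2.0))                 # (0, 0, 1, 2)
print(integer_weights(6.0))                 # (0, 0, 2, 4)
print(integer_weights(12.0))                # (0, 0, 4, 8)
print(integer_weights(2.0, inverted=True))  # (2, 1, 0, 0)
```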
Additional modifications were made: slightly present was always given 0 points, as it did not show remarkable differences between groups (Supplemental Figure S1). Intellectual disability (item 6) was discarded. Both items related to progressive cognitive decline (items 4 and 5) were merged into one and, similarly, the items on psychiatric disease in first- and second-degree relatives (items 19 and 20) were combined, as each one alone did not show substantial differences between SSD and secondary SSD. The resulting matrix of weights is shown in Supplemental Table S1.
SI validation with disease profiles
We assessed the algorithm performance in discriminating SSD and secondary SSD via sensitivity and specificity analyses. For this purpose, the algorithms were applied to 198 disease profiles (103 with SSD and 95 secondary SSD) provided by the panel of experts as part of this study, and total scores were calculated (Figures 2 and 3, Supplemental Table S2).

Figure 2. Algorithm performance (sensitivity and specificity) for secondary SSD vs SSD profiles. (a) In the ratio algorithm, weights were obtained by dividing secondary SSD mean scores by SSD mean scores. (b) In the integer weights algorithm, we assigned rounded scores to each category by considering score intervals obtained from the ratio algorithm.

Figure 3. Distribution of scores in the Secondary Schizophrenia Suspicion Index (SS-SI) algorithm across SSD and secondary SSD profiles. (A, B) Distribution and frequency distribution of SS-SI risk prediction scores across SSD and secondary SSD profiles. (B) Normal distribution fitting curve obtained from the histogram showing the overlapping areas between both SSD and secondary SSD profiles. (C) Level of suspicion for each cut-off score in the SS-SI.
Similar sensitivity and specificity curves were observed for the different algorithms, with an intersection at around 0.85. Compared with the integer weights algorithm, we observed a slightly higher sensitivity and specificity with the ratio algorithm (0.863 and 0.893, respectively) (Figure 2). However, since we aimed to develop a paper-based SI and, given the similar algorithm performance observed, we decided to balance calculation simplicity with enough discrimination power and selected the integer weights algorithm for further analyses (henceforth termed Secondary Schizophrenia Suspicion Index [SS-SI]).
When applied to disease profiles, the scores obtained with the SS-SI algorithm showed a clear differential distribution between SSD and secondary SSD. SSD disease profiles (n = 103) yielded a mean RPS of 4.5 ± 3.67 (95% confidence interval [CI]: [3.78, 5.21]) points, whereas the average score in secondary SSD profiles was remarkably higher (17.0 ± 8.46; 95% CI: [15.29, 18.73]) (p-value < 0.0001; two-tailed t-test) (Figure 3(A)). Similarly, most of the SSD profiles scored within 0–8 points, whereas secondary SSD mainly scored ⩾8 points (Figure 3(B)). Considering these results, scores greater than 5, 8, 10 and 16 were considered to indicate a low, medium, high and very high level of suspicion of secondary SSD, respectively (Figure 3(C)). The proposed SS-SI is presented in Figure 4, together with the associated integer scores for each item category.
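As a hedged illustration of how these cut-offs translate a total score into a suspicion level, the sketch below applies the thresholds quoted above with strict "greater than" comparisons. The function name and the label for scores of 5 or below ("below threshold") are assumptions and do not come from the SS-SI itself.

```python
def suspicion_level(rps):
    """Translate a risk prediction score (RPS) into a suspicion level.

    Cut-offs follow the text (>5, >8, >10, >16). The label for scores
    of 5 or below is an assumption made for this sketch.
    """
    if rps > 16:
        return "very high"
    if rps > 10:
        return "high"
    if rps > 8:
        return "medium"
    if rps > 5:
        return "low"
    return "below threshold"


# Hypothetical total scores obtained by summing the integer item weights.
for score in (4, 6, 9, 12, 17):
    print(score, suspicion_level(score))
```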

Figure 4. Secondary Schizophrenia Suspicion Index (SS-SI) tool.
Discussion
The main outcome of this study was the development of an SI, comprising 21 questions, which can discriminate secondary from ‘primary’ SSD with considerable sensitivity and specificity. The Secondary Schizophrenia SI (SS-SI) was the result of a stepwise approach involving expert judgement through the Delphi method, methodological development of the SI and validation through disease profiles. Although different SIs are available for other neurological disorders (Wijburg et al., 2012), this is, to date, the first specific SI for SSD.
The exclusion of a ‘primary’ organic illness in patients with SSD is a common dilemma in psychiatric practice. Mental disorders have been traditionally subdivided into ‘organic’ and ‘functional’ (Hyde and Ron, 2011). However, the suitability of the term ‘organic’ has generated significant controversy in the field. Spitzer et al. (1992) proposed the term ‘secondary disorders’ to avoid the connotative problems associated with ‘organic mental disorders’, since they considered that psychiatric disorders were, like non-psychiatric disorders, organic. The term ‘secondary schizophrenia(s)’ has been used extensively since then. One of the required criteria to define schizophrenia in the Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM-5) is the absence of ‘physiological effects of a substance [...] or another medical condition’ that could explain the disorder (American Psychiatric Association, 2013). To identify schizophrenia symptoms that can be explained by another medical condition, the DSM uses the term ‘Psychotic Disorder Due to Another Medical Condition’ (code 293.XX). Despite the current controversy, we used the term ‘secondary SSD’ in this study, as no term has full consensus among experts.
Clinical research in the field of secondary SSD presents boundaries that, although shifting, limit the availability of evidence-based recommendations. Thus, recommendations based on expert judgement are promising alternatives with growing interest. In the Delphi process, different methodological issues are critical to prevent bias such as decentralisation (through anonymous votes and controlled feedback) and aggregation (with the statistical collation and circulation of results to redefine previous statements) (Bandelow and Meier, 2003; Hasson et al., 2000; Milholland et al., 2010). The size of the expert panel is also an important factor and, although there is no categorical definition of the appropriate number of participants (Akins et al., 2005; Sinead et al., 2006), in most studies between 15 and 20 panellists are involved (Boulkedid et al., 2011; Linstone and Turoff, 1975). Our panel size may thus be a limitation that likely reflects the limited number of experts in this area (Bandelow and Meier, 2003).
Another methodological issue of the Delphi process is how to define consensus. In general, it is suggested to carry out repeated Delphi rounds while balancing the gain in consensus against the risk of losing experts’ interest and/or obtaining no relevant changes between iterations (Hall et al., 2018; Jorm, 2015). However, there is no unequivocal definition of consensus (McMillan et al., 2016; Sinead et al., 2006). Previous studies rated the responses using a Likert-type scale and defined consensus when most of the experts (>70%) scored within around 30% of the scale length (Bandelow and Meier, 2003; Hall et al., 2018; Hasson et al., 2000; Milholland et al., 2010). In our study, experts distributed 10 representative disease profiles with SSD and 10 with secondary SSD across 4 categories of response, thus capturing clinical reality in finer detail than is possible with binary responses. We decided to ask the experts about representative disease profiles rather than actual patients to prevent the inherent bias due to sample size and obtain a more representative sample. Despite this, we reached consensus in only 15 out of 24 questions after three Delphi rounds, suggesting that even in this narrow clinical field there remains considerable variability among experts in the conceptual understanding of secondary SSD.
We developed the SI by exploiting the differential pattern of response observed between SSD and secondary SSD. Most of the questions were rated as absent/slightly present in the SSD and as evident/prominent in secondary SSD, reinforcing the suitability of the items included in the questionnaire. Given the similar diagnostic performance obtained with the different system of weights explored in this study and, considering that a simple tool is much more likely to be used by clinicians for the sake of practicality in the daily routine, the final SS-SI was based on the integer weights algorithm.
Performance better than 86% was not achievable with the final SS-SI algorithm, probably because some disease profiles showed inverted patterns, which prevented the algorithm from discriminating based only on the 21 items. Contrary to our expectation, the presence of psychiatric symptoms in medical history and psychiatric diseases in relatives (first and second degree) might be less frequent in patients with secondary SSD than in those with SSD. Whereas these are common findings in primary SSD (Gejman et al., 2010; Owen et al., 2016; van Os et al., 2009), little evidence exists of positive family history in secondary SSD cases (Cutting, 1987; Horiguchi et al., 2009; Johnstone et al., 1988; Meyer, 2009). These results could be explained by considering that most secondary SSD patients have an autosomal recessive background (compound heterozygous patients), and carriers often do not have an increased risk of developing psychiatric symptoms because a single mutation does not induce any effect. In contrast, as schizophrenia is a polygenic disease, it is more likely that relatives share some relevant mutations and develop some symptoms as well.
This SI is the first in its class in the field of secondary SSD and may help detect primary organic disorders in universal or selected psychosis populations and increase knowledge in this field (and thus the capacity to detect such disorders). Suspicion generated by the SI can guide the clinician towards further investigative processes or referral pathways to confirm a secondary SSD diagnosis, ensuring appropriate care and rational use of serological, neuroimaging or other assessments. This matters because schizophrenia is very common relative to the associated organic diseases. The validity of this SI could be further extended by examining its performance in selected or indicated populations, such as first-episode psychosis groups, or in suspected secondary SSDs, which may be variably enriched with organic disease, alongside metabolic, immunologic (in particular auto-immune encephalitis; Ellul et al., 2020), genetic (DNA chips) and endocrine parameters. The recently highlighted ‘autoimmune psychosis’ clinical group, for example, shares a number of the ‘red flags’ that are highlighted in our SI (Pollak et al., 2020). Additionally, medico-economic studies associated with the detection of secondary SSDs, and of how definitive treatment affects patient outcome, are necessary.
The main limitations of this study are the reduced panel size and expert dropouts across the Delphi rounds, the lack of consensus for a considerable number of questions, the limited number of disease profiles assessed and the fact that they were provided by the panel of experts involved in the Delphi process. All these limitations further reflect the heterogeneity and lack of established diagnostic criteria for secondary SSD and reinforce the importance of our study. Although future prospective studies are required to validate and refine this tool, this first-in-class SI covers an unmet need and could have promising applications in the diagnosis of secondary SSD.
Conclusion
We developed the first SI for secondary SSD (SS-SI), a simple tool based on empirically derived items and refined by clinician input. The applicability of this diagnostic algorithm should be validated by involving a broader expert group and much larger patient populations and in future prospective studies.
Supplemental Material
Supplemental material, sj-docx-1-anp-10.1177_00048674211025715 for Development of a suspicion index for secondary schizophrenia using the Delphi method by Olivier Bonnot, Jose Luis Insua, Mark Walterfang, Juan Vincente Torres and Stefan Armin Kolb in Australian & New Zealand Journal of Psychiatry
Footnotes
Acknowledgements
The following investigators are members of the secondary SSD group and participated in the Delphi process: David Cohen (Institut des Systèmes Intelligents, Université Pierre et Marie CURIE); Marie Raffin (CHU Pitié-Salpêtrière, Paris); Vladimir Ferrafiat (CHU Charles Nicolle, ROUEN, France); Caroline Demily (Association Francophone of Remediation Cognitive, The Hospital Vinatier); Mirkovic Bojan (Psychiatre-Pédopsychiatre Attaché à la Pitié-Salpêtrière, Paris); Mario Speranza (Université de Versailles Saint-Quentin); Alexandre Ferreira Bello (Inborn errors of metabolism service [PEIMP] – psychiatry unit of São José dos Pinhais, Brasil); Paulo Andre Pera Grabowski (DPMASP [early dementia, mania and psychotic syndromes] + PEIMP [programme for Inborn errors of Metabolism in adult Psychiatry, Brasil]); Alexandre Paim Diaz (Universidade do Sul de Santa Catarina, Brasil); Mark Walterfang (Royal Melbourne Hospital; Australia Psychiatry, neuropsychiatry, schizophrenia, neurodegenerative disorders, white matter disease [NP-C disease]); Alkomiet Hasan (Facharzt für Psychiatrie und Psychotherapie, München, Germany); Maria Fernanda Verdaguer (Hospital Britanico de Buenos Aires, Argentina); Francesca Nardecchia (Sapienza University of Rome–Child and Adolescent Neuropsychiatry); Arianna Terrinoni (Sapienza University of Rome – Child and Adolescent Neuropsychiatry). Carla Granados of Trialance SCCL provided medical writing assistance.
Author Contributions
O.B., S.A.K., J.L.I., J.V.T. and M.W. contributed to the design of the study. J.L.I. and J.V.T. were involved in statistics. O.B. and S.A.K. worked on the first draft of the manuscript, which was extended and edited by M.W., O.B. and S.A.K.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: M.W. and O.B. have served on advisory boards and received research funding from Actelion Pharmaceuticals and Vtesse Pharmaceuticals with no conflict of interest in the present work. Other authors have no disclosure other than their own affiliation (Industry, Actelion Pharma, for Dr Stefan Kolb).
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Syntax for Science SL received financial support from Actelion to perform statistical analyses. Medical writing services were funded by Syntax for Science SL.
References
