Abstract
In psychiatric epidemiology, mania is commonly assessed via personal interviews with structured diagnostic instruments conducted by ‘lay’ personnel (i.e. individuals with some relevant training but who are not experienced clinicians). The validity of this approach has rarely been studied and some data suggest that it may be poor. In a small clinical reappraisal study, Kessler and coworkers [1] found that 70.8% of community respondents classified as bipolar type-I in the National Comorbidity Survey (NCS) [2] were not confirmed when reinterviews were carried out by trained clinicians using the Structured Clinical Interview for DSM-III-R (SCID) [3].
In the present report, we investigate the accuracy of the diagnosis of a lifetime manic episode from the telephone screening of a national survey of Swedish twins, the Screening Across the Lifespan Twin (SALT) study. In addition to evaluating the validity of lifetime prevalence estimates of mania in this sample of the general population, we sought to evaluate the performance of different diagnostic algorithms.
The Swedish Twin Registry (STR) is a longitudinal database as well as the largest twin registry in the world. It has been the source of a large number of epidemiological, genetic epidemiological, and molecular genetic studies. Its structure has been described in detail elsewhere: the registry is composed of three cohorts comprising all twins born in 1886–1925 (‘Old cohort’), 1926–1958 (‘Middle cohort’) and 1959–1990 [4], [5].
This report has four goals: (i) to assess participation bias, with relation to hospitalization for a psychiatric condition that included a manic episode, by comparing participants and non-participants in the SALT survey; (ii) to measure the concordance between lifetime mania, as assessed by telephone interview, with hospitalization data (i.e. postdictive criterion validity analysis) [6]; (iii) to understand which individual characteristics (e.g. age and gender) predict agreement; and (iv) to estimate prevalence and incidence of hospitalization history for a psychiatric condition consistent with the presence of a manic episode, and the lifetime prevalence of mania at the interview in this nationally comprehensive twin sample.
Method
Sample, databases, and criterion assessment
All twins born in Sweden in 1958 or earlier (the ‘Old and Middle cohorts’ of the STR) who were still alive at the beginning of 1998 were invited to participate in a computer-assisted telephone interview. Psychiatric conditions screened for include lifetime manic and depressive episodes. The criterion to which we compared diagnoses derived from the telephone interview was inpatient hospitalization for a psychiatric condition that included a manic episode. We linked, by means of personal identification numbers, all twins in the SALT target sample (regardless of participation in SALT) to hospitalization records derived either from the Swedish National Psychiatric Registry (years 1969–1986) or the National Inpatient Registry (years 1983–present): the 4-year overlap is due to a transition period in which the hospitalization data changed registry of destination following a regional pattern. These two registries are essentially a complete listing of all inpatient admissions in Sweden and have been the source of numerous publications on the epidemiology of psychiatric disorders [7], [8]. The criterion was considered present if the hospital discharge diagnoses included a standardized code consistent with the presence of a manic episode. During the years included in this study, for bipolar disorder (BD), these codes included ICD-8 (296.1, 296.2, and 296.3), ICD-9 (296.0, 296.1, 296.4–296.7), and ICD-10 (F30.0, F30.2, F30.9 and F31.0–F31.9). As schizoaffective disorder (SAD) can also include a manic episode, we included the relevant ICD-8/ICD-9 (295.7) and ICD-10 (F25.0) codes [9], [10]. The data collection procedures were reviewed and approved by the Swedish Data Inspection Board and the Regional Ethics Committee of the Karolinska Institutet. All subjects provided verbal informed consent during the telephone interview, which was later confirmed by postcard.
Diagnostic assessment
The SALT interview was constructed in a branching format that allowed skipping follow-up questions whenever a person answered negatively to key introductory items. Lay interviewers, appropriately trained and possessing adequate medical background, assessed the presence or absence of manic symptoms. The eight questions used to establish a lifetime diagnosis of mania are based on the ICD-10 and the DSM-IV criteria for a manic episode [10], [11] and were adapted from the Composite International Diagnostic Interview (CIDI) [4], [12]. The SALT interview, akin to the CIDI, has a stem question based on the DSM-IV ‘A’ criterion for mania: ‘Has there ever been a period of 1 week or more when you were so happy or excited or high that other people did not think you were your usual self?’ Respondents who answered positively were asked seven additional questions and those who answered negatively skipped to the next section. The other seven diagnostic items cover the DSM-IV ‘B’ criterion for a manic episode (excessive self-confidence, reduced need for sleep, increased talkativeness, flight of ideas, distractibility, augmented activity, and involvement in pleasurable/risky behaviours). The presence of mixed (‘C’ criterion) or psychotic features, impairment or hospitalization as consequences of the episode (‘D’), and the presence of concomitant medical conditions or treatments (‘E’) were not assessed.
A diagnostic algorithm with eight cut-points for the definition of mania was tested with the number of items ranging from one (criterion ‘A’ stem question only positive) to eight (criterion ‘A’ plus all seven items of the ‘B’ criterion positive). The algorithm requiring at least three positive ‘B’ items had a DSM-IV-like structure.
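The cut-point logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `mania_positive`, the constant `DSM_IV_LIKE_CUT_POINT`, and the reading that cut-point k means "stem positive plus at least k − 1 positive ‘B’ items" are ours, not taken from the SALT protocol.

```python
from typing import Sequence

# Assumption: cut-point 4 = stem question plus at least three 'B' items,
# i.e. the DSM-IV-like structure described in the text.
DSM_IV_LIKE_CUT_POINT = 4


def mania_positive(stem: bool, b_items: Sequence[bool], cut_point: int) -> bool:
    """Return True if a respondent meets the given cut-point (1..8).

    cut_point = 1 requires only a positive stem ('A') question;
    cut_point = k requires the stem plus at least k - 1 positive 'B' items.
    (Hypothetical helper; names and interface are illustrative.)
    """
    if not 1 <= cut_point <= 8:
        raise ValueError("cut_point must be between 1 and 8")
    if len(b_items) != 7:
        raise ValueError("expected exactly seven 'B' items")
    if not stem:
        # Branching format: a negative stem skips the 'B' items entirely.
        return False
    return sum(bool(item) for item in b_items) >= cut_point - 1


# Example respondent: positive stem and two positive 'B' items.
respondent_b = [True, True, False, False, False, False, False]
meets_stem_only = mania_positive(True, respondent_b, 1)
meets_dsm_like = mania_positive(True, respondent_b, DSM_IV_LIKE_CUT_POINT)
```

In this example the respondent is positive at the least restrictive cut-point but not at the DSM-IV-like one, which requires a third ‘B’ item.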
Analytic procedures
Characteristics of participants and non-participants were compared using simple statistics and logistic regression [13].
We conducted two separate receiver operating characteristic (ROC) analyses, one for the primary criterion ‘≥ 1 hospitalization for BD or SAD’ and another for the secondary criterion ‘≥ 2 hospitalizations’, to assess specificity and sensitivity at each of the eight cut-points [14]. Positive and negative predictive values [15] and κ statistics [16] were calculated.
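For readers less familiar with these measures, the following Python sketch shows how sensitivity, specificity, predictive values and Cohen's κ follow from a single 2×2 table of interview result against hospitalization criterion. The counts are invented for illustration and are not SALT data; the class and function names are hypothetical, and the sketch does not reproduce the Stata procedures actually used.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts of interview result (test) against the criterion."""
    tp: int  # interview positive, criterion present
    fp: int  # interview positive, criterion absent
    fn: int  # interview negative, criterion present
    tn: int  # interview negative, criterion absent


def validity_metrics(t: TwoByTwo) -> dict:
    """Compute standard validity measures; assumes no empty margin."""
    n = t.tp + t.fp + t.fn + t.tn
    observed = (t.tp + t.tn) / n
    # Agreement expected by chance, from the row and column margins.
    expected = (
        (t.tp + t.fp) * (t.tp + t.fn) + (t.fn + t.tn) * (t.fp + t.tn)
    ) / n**2
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = validity_metrics(TwoByTwo(tp=40, fp=60, fn=60, tn=840))
```

Repeating this calculation at each of the eight cut-points gives the points of the ROC curve (sensitivity against 1 − specificity).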
Characteristics of true and false positives were then compared fitting a logistic regression model to the data [13], [17]. Models took into account the twin-structure of data and correlation of measures among family members. Discrimination of the final logistic models was tested through C-statistics, and calibration of the models through Hosmer-Lemeshow χ2 goodness-of-fit statistics [13]. Lifetime prevalences were computed for individuals hospitalized with a discharge diagnosis consistent with the presence of a manic episode, as well as for those diagnosed with a lifetime manic episode through the diagnostic algorithm. Using first admission dates for BD or SAD, we calculated incidence rates (IR) of first hospitalization and smoothed hazard estimate curves. All analyses were performed using the Intercooled Stata 8.2 software package [18].
Results
A total of 60 236 twins were eligible for participation and 41 838 individuals (69.46%) completed at least part of the SALT interview. As shown in Table 1, participants in the study were more likely than non-participants to be female, to be a member of a monozygotic twin pair, to be like-sexed when dizygotic twins, and not to have had a history of hospitalization with a discharge diagnosis consistent with the presence of a manic episode. These results did not change considerably when each association was adjusted in a multivariate fashion for the other measured covariates.
Characteristics of 60 236 twins who did and did not participate in the SALT study
†Expressed as difference of mean values (95% CI)
Characteristics of twins negative and positive for a lifetime manic episode (DSM-IV-like algorithm) are reported in Table 2. Twins positive for mania tended to be male, younger, unmarried and with lower self-reported health, and were about six times more likely to be positive for a depressive lifetime episode at the interview.
Characteristics of 41 838 participating twins positive or negative for lifetime mania according to the SALT telephone interview (DSM-IV-like algorithm)
†Expressed as difference of mean values (95% CI); ‡assessed at the SALT interview with a DSM-IV-like algorithm
Of twins who endorsed both the mania and the depression questions, about 43% of those with a diagnosis of manic episode (1.10% of respondents) were negative for lifetime depression: history of hospitalization among them was only 3.20%, against 10.00% for those positive to both mania and depression at the interview. Among 63 individuals who reported being on medications for mania, only 24 (38.10%) had a history of hospitalization for BD or SAD.
To test the validity of lifetime prevalence estimates of mania via telephone interview we used two criteria. The first required ≥1 hospitalization with a discharge diagnosis consistent with the presence of a manic episode and the second required ≥2 hospitalizations. We then conducted ROC analysis using the criterion of ‘at least one hospitalization’ for eight cut-points, from the least (criterion ‘A’ only) to the most restrictive (criterion ‘A’ plus all seven positive ‘B’ items). The most accurate overall cut-point (i.e. closest to the upper-left corner in the graph of the ROC curve) was the least restrictive (criterion ‘A’ only; sensitivity 39.00% and specificity 96.61%), while the DSM-IV-like cut-point showed a lower level of total accuracy (sensitivity 36.50% and specificity 97.55%) but was more specific. The area under the curve (AUC) was 0.68.
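The ‘closest to the upper-left corner’ rule can be made concrete with a short Python sketch. It assumes that the two percentage pairs reported above are sensitivity and specificity, in that order, and that closeness is measured as Euclidean distance in ROC space; both are our assumptions, and the function name is hypothetical.

```python
import math


def distance_to_corner(sensitivity: float, specificity: float) -> float:
    """Euclidean distance from the ROC point (1 - specificity, sensitivity)
    to the ideal upper-left corner (0, 1). Smaller is better."""
    return math.hypot(1.0 - specificity, 1.0 - sensitivity)


# (sensitivity, specificity) for the two cut-points reported in the text.
cut_points = {
    "stem only": (0.3900, 0.9661),
    "DSM-IV-like": (0.3650, 0.9755),
}

distances = {
    name: distance_to_corner(sens, spec)
    for name, (sens, spec) in cut_points.items()
}
best = min(distances, key=distances.get)  # selects "stem only"
```

With these inputs the stem-only cut-point lies closer to the corner (about 0.61 against about 0.64), because its gain in sensitivity outweighs its small loss in specificity.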
The prevalence of twins having at least one lifetime hospitalization for BD or SAD is 0.54% among those who took part in the SALT study and 0.70% overall, while 3.57% is the prevalence of those who responded positively to the stem question: this cut-point showed a negative predictive power (NPP) of 99.68% and a positive predictive power (PPP) of 5.49%, indicating that only 5–6 subjects out of 100 who are diagnosed using the stem question alone were ever hospitalized. The κ statistic was 0.09. The DSM-IV-like cut-point had a marginally better PPP (6.99%) and an equivalent very high NPP (99.67%), while κ was 0.11.
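The dependence of predictive power on the base-rate can be illustrated with Bayes' rule. The Python sketch below takes the stem-question sensitivity and specificity and the 0.54% criterion prevalence among participants as rounded inputs; the function name is hypothetical. Because the inputs are rounded, the result approximates but does not exactly reproduce the reported PPP and NPP.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPP, NPP) from sensitivity, specificity and prevalence,
    all expressed as proportions between 0 and 1."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppp = true_pos / (true_pos + false_pos)
    npp = true_neg / (true_neg + false_neg)
    return ppp, npp


# Rounded values from the text: stem question only, primary criterion.
ppp, npp = predictive_values(sensitivity=0.3900, specificity=0.9661,
                             prevalence=0.0054)
```

The PPP comes out at roughly 6% and the NPP above 99%: with so low a base-rate, even a specificity near 97% leaves false positives far outnumbering true positives.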
Another ROC curve was computed to test the criterion of ‘≥ 2 hospitalizations.’ As expected, the sensitivity of the algorithm increased and its better performance is mirrored in an AUC of 0.76. The most accurate cut-point was again the first, requiring just the stem question.
Adopting a more selective criterion, the prevalence of twins with a history of hospitalization decreased to 0.27% among SALT participants, and to 0.39% for all subjects in the STR. The first cut-point had a NPP of 99.88% and a PPP of 3.94%: only 4 out of 100 who were considered positive by virtue of the answers to the telephone interview had been hospitalized at least twice for a condition consistent with the presence of a manic episode. κ for this comparison was 0.07.
Predictors of agreement between mania at the interview (DSM-IV-like algorithm) and history of hospitalization (at least one) were identified by comparing true versus false positives in a logistic regression model: lifetime depression at the interview (adjusted odds ratio [OR] = 3.23), absence of a spouse or partner (1.92), low self-reported health (1.91), female sex (1.41) and older age at interview (1.30, for each 10-year increase in age) predicted convergent diagnoses. Variables tested but not included in the final model comprise zygosity, housing, and tobacco, coffee and alcohol consumption. Model discrimination between true and false positives was acceptable (C-statistic = 0.73), and the null hypothesis of the Hosmer-Lemeshow χ2 goodness-of-fit test, that there is no difference between the observed and the predicted values, was not rejected (H-L χ2 = 5.83, p = 0.67) [13].
History of hospitalization for BD or SAD was present for 0.70% of all living twins, while lifetime prevalence of mania at the interview (DSM-IV-like algorithm) was 2.63%.
The incidence rate (IR) for first hospitalization for BD or SAD was 2.08/10 000 year⁻¹ (95% CI = 1.88–2.29) over a 32-year period (1969–2000). The IR for males was 1.39/10 000 year⁻¹ (95% CI = 1.17–1.66), and almost twice as high for females: 2.69/10 000 year⁻¹ (95% CI = 2.39–3.03).
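As an illustration of how an incidence rate and its confidence interval are obtained, the Python sketch below divides first admissions by person-years at risk and uses a log-normal approximation for the Poisson count. The event count and person-years are hypothetical (the underlying figures are not reported here), and the approximation is our choice; it is not necessarily the interval method used by the Stata routines.

```python
import math


def incidence_rate(events: int, person_years: float,
                   per: float = 10_000, z: float = 1.96):
    """Return (rate, lower, upper) per `per` person-years.

    The CI uses the log-normal approximation for a Poisson count:
    exp(ln(rate) +/- z / sqrt(events)). Requires events > 0.
    """
    rate = events / person_years * per
    factor = math.exp(z / math.sqrt(events))
    return rate, rate / factor, rate * factor


# Hypothetical inputs: 400 first admissions over 2 000 000 person-years.
rate, lower, upper = incidence_rate(events=400, person_years=2_000_000)
```

With these invented inputs the rate is 2.0 per 10 000 per year with an interval of roughly 1.8 to 2.2; the width of the interval depends only on the number of events, not on the person-time.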
Discussion
Validity and predictors of agreement
We compared two possible definitions of the gold standard or criterion against eight possible definitions of mania based on different cut-offs at the interview, for a total of 16 possible combinations. As shown in Figure 1 and Table 3, the validation analysis encompassed sensitivity and specificity, negative and positive predictive values and κ statistics, allowing an appreciation of the dynamics of the assessment of mania in a community sample.
Prevalence of positive items for mania. Sensitivity, specificity, negative and positive predictive power at each cut-point for two different definitions of the criterion. A vertical line is drawn at the DSM-IV-like cut-point.
Prevalence, ROC analysis, NPP, PPP and κ statistics comparing the eight cut-points and the two criteria of ‘at least one’ (primary) and ‘at least two’ (secondary) lifetime hospitalizations for mania†
†Values expressed as percentage (95% CI); κ statistics (SE); ROC, receiver operating characteristic; NPP, negative predictive power; PPP, positive predictive power
With the ROC analysis we have looked at all the possible combinations between the two most plausible inpatient criteria and the eight cut-points corresponding to the number of positive items. In both ROC curves, using ‘≥ 1’ or ‘≥ 2’ hospitalizations, the most accurate overall cut-point was just with a positive A-criterion. Because of low PPP and high false positive rate, the corresponding interview-based algorithm would not be optimal for future studies of mania that may require more specific diagnoses. In such a case a diagnostic procedure with positive ‘B’ items is warranted, even if it implies reducing the number of eligible subjects for the study.
Using the second criterion (‘≥ 2 hospitalizations’) reduces criterion prevalence, and the test appears to have better sensitivity [19]: hospitalization is likely a specific but not highly sensitive measure, probably fairly sensitive for BD type-I (i.e. mania) but less so for type-II (i.e. hypomania). The second criterion, being more restrictive, is more specific but less sensitive than the first criterion for ‘true’ (i.e. clinically diagnosable) mania. Indeed, if we consider the triple truth-criterion-test relation, there is a trade-off between a more sensitive truth-criterion relation and a more sensitive criterion-test relation.
Reasons for poor agreement between the survey-based diagnosis of mania and the criterion can be identified in the low prevalence or base-rate of the illness [15] and of the corresponding hospitalization criterion, further decreased in typical survey samples by non-response bias, since subjects with a history of BD or SAD are less likely to participate in a survey. Additional explanations for the difficulty of diagnosing mania accurately in the community are the employment of lay interviewers, inadequate patient recollection, illness denial and poor insight: the structured nature of interviews performed by non-clinicians does not allow further expert exploration of patient answers.
In our study the PPP for DSM-IV mania diagnosis (7.0%) was much lower than in the NCS study (29.2%) [1]: reasons can be identified in criterion choice, interview modality and sample size. On the one hand, in the small NCS clinical reappraisal study the criterion used was the gold standard of a clinically diagnosed condition; also, a more sophisticated approach to question formulation and interpretation at the interview was used [20], while we employed a basic set of CIDI questions [4]. In addition, the study by Kessler et al., like previous surveys of mania [21], adopted face-to-face interviews [2], while our data are based on telephone interviews, allowing cost-efficient screening of a very large number of individuals. It has been shown that agreement between telephone and face-to-face interviews for the assessment of major mood disorders is good, and minor disagreements are counterbalanced by economic and logistic advantages [22]. On the other hand, the small sample size of the NCS clinical reappraisal yields imprecise estimates because of sampling variability, while our STR-based estimates are calculated on a very large study population. Cross-national differences in the prevalence of lifetime mania between the Swedish and the U.S. populations could in part account for the different results.
Predictors of false positive status for lifetime mania include characteristics that can be seen as related to wellbeing or enhanced psychomotor activity, such as no lifetime depression at the interview, presence of a spouse or partner, high self-reported health, male gender, and younger age. Subjects who are positive for lifetime mania but not for lifetime depression at the interview could include a few cases with a history of unipolar mania [23].
Prevalence and incidence
Lifetime prevalence of DSM-IV mania diagnosed via telephone interview was 2.63%. Because of non-response, this estimate is calculated on the population of participants, which has 0.54% prevalence of hospitalization as shown in 1], compared to the previously reported 1.6% [2]. The earlier epidemiologic catchment area (ECA) study estimated a 0.8% lifetime prevalence of manic episodes [21]. High false positive BD rates in community surveys based on the CIDI questionnaire have also been reported in a Dutch study where only 11 of 49 subjects were reconfirmed as having BD type-I when reinterviewed by trained clinicians with the SCID [24].
A recent study that employed the Mood Disorder Questionnaire (MDQ) estimated the lifetime prevalence of BD to be as high as 3.7% in US adults [25]. The limitations of the MDQ study have been debated [26]. The authors considered the 3.7% estimate to be conservative, acknowledging that limitations of the study were likely to underestimate the prevalence of BD [25]. In the MDQ community-based validation study, Hirschfeld and collaborators [27] reported the sensitivity and specificity of the screening instrument to be 28.1% and 97.2%, respectively. Even a specificity that is very close to 100% can yield high false positive rates with low base-rates, as is the case for the lifetime prevalence of manic episodes in the community.
As Baldessarini and colleagues pointed out, ‘along with sensitivity and specificity, the prevalence rate of the target diagnosis in the test population is of critical importance in estimating a test's predictive power or utility’ [15]. Disregarding base-rates in evaluating the results of diagnostic tests seems common [28]. Focusing solely on sensitivity and specificity of a test can induce the reader to mistake inflated prevalence estimates for accurate estimates [29].
Finally, the IR of first hospitalization for BD or SAD was 1.39/10 000 year⁻¹ (95% CI = 1.17–1.66) for males, and almost twice as high for females: 2.69/10 000 year⁻¹ (95% CI = 2.39–3.03). These results are consistent with the existing literature on incidence rates of BD [30].
Criterion and participation bias
To our knowledge, this is the first study comparing the performance of a telephone screening interview using hospitalization history as a criterion. The choice of the hospitalization criterion has strengths and weaknesses. Among the strengths there is the unique opportunity offered by the Swedish network of registries and by the universal health coverage in the country, which makes hospital discharge records useful. Indeed, as far as access to care is maximized, hospitalization is a reasonably accurate measure of the lifetime prevalence of manic episodes. The hospitalization criterion allows estimating participation bias, an occurrence that is usually difficult to assess in psychiatric survey studies.
One weakness is the necessity to assume as a criterion an entity that is less prevalent than the illness itself. The low base-rate of mania is a major source of inaccuracy [15]. Subjects who were still alive in 1998 and who had a history of hospitalization limited to the period preceding 1969 are negative to our criterion for mania. Other phenomena that could account for an underestimation of lifetime manic episodes through the hospitalization criterion are the closing of some psychiatric hospitals in Sweden, and that in the scarcely populated northern areas of the country there are often large distances from the closest medical facility. Furthermore, only part of those individuals who reported being on medications for mania had a history of hospitalization for a psychiatric condition consistent with the presence of a manic episode: this may suggest that the hospitalization criterion has limited accuracy. Sources of measurement inaccuracy in terms of either a manic episode classified otherwise or a non-manic patient (e.g. schizophrenia or agitated depression) classified as bipolar or schizoaffective at hospital discharge may encompass limited reliability of ICD diagnostic categorizations as appraised by hospital psychiatrists, or an error in transcribing the corresponding codes in the registry system.
Hospitalization for a psychiatric condition that included a manic episode was twofold higher in non-responders, as those affected by a current or a past psychiatric condition are less predisposed to participate in surveys. This finding is consistent with previous studies [31].
Regarding generalizability, the Swedish twin population is not distinguishable from singletons in terms of incidence of treated psychotic and affective illness [32]. Our conclusions thus appear to be fairly applicable to all national residents.
Conclusions
The present analyses of the STR data of mania suggest caution in interpreting results of surveys investigating lifetime prevalence of mania or BD and emphasize limitations of current methods used for population screening. In reading and understanding results from such studies special attention should be given to the possibly high rate of false positives for a lifetime manic episode, which may overestimate the true prevalence of this condition.
Our ascertainment of non-response among those with and without a history of hospitalization for a psychiatric condition that included a manic episode provides an OR estimate of 0.5 that could be used as a correction factor in analysing or interpreting surveys of lifetime mania or BD where no valid direct estimate of this important source of bias is available.
Footnotes
Acknowledgements
Presented in part at the 156th Annual Meeting of the American Psychiatric Association, May 17–22, 2003, San Francisco, CA, and at the Fifth International Conference on Bipolar Disorder, 12–14 June, 2003, Pittsburgh, PA, US.
This report is based on data collected in the Screening Across the Lifespan Twin (SALT) study. PFS & NLP were supported by NIH grants NS-031483 and CA085739. NLP and SALT data collection also received support from NIH grant AG-08724 and a grant from the Swedish Scientific Council. FS was supported by a grant from the Swedish Foundation for International Cooperation in Research and Higher Education (STINT Institutional Grant Program, exchange Harvard Biostatistics-Stockholm Biostatistics Research Group, Karolinska Institutet; IG2001-038) and by the PhD program of the University of Pisa (Dept. of Psychiatry, School of Medicine). The authors have no conflicts of interest in relation to material presented in the manuscript.
The authors are especially grateful to Drs Deborah Blacker and Stefano Pini for their insightful suggestions and to Dr Ronald Kessler for his critique of the manuscript.
