Abstract
The convenience and privacy of the Internet make it an attractive channel for providing and accessing health information [1]. As of March 2009, there were 251.3 million Internet users in North America, a penetration of 74.4% of the total population. Data for Oceania/Australia for the same period showed 20.8 million Internet users, a penetration of 60.5% of the total population [2]. US studies have shown that 80% of Internet users search the Internet each year for health-related information, the equivalent of 95 million adults [3], with 70% reporting that this information influences their decision about treatment [4]. Although there are no published data on how frequently those with bipolar disorder access the Internet, it is likely to be a common tool for gathering information on this condition. Clinically, the provision of psychoeducation on bipolar disorder is now recognized as an important evidence-based component of management [5, 6]. The significant role of psychoeducation is also reflected in its prominence in several clinical practice guidelines [7, 8].
A major concern in the management of bipolar disorder is the delay in obtaining treatment after the onset of mood disturbance [9]. Such a delay in treatment may reduce the likelihood of response. Because the Internet is now such an accessible means of providing health material, such as information on the clinical characteristics and treatment of bipolar disorder, it has the potential to empower individuals to seek earlier medical review of diagnosis and treatment.
Despite its enormous potential, the quality of general health information on the Internet has been of concern for some time. A systematic review by Eysenbach et al. evaluated 79 studies that assessed the quality of health information on the web [10]. Seventy per cent of these studies judged the quality of information to be poor, 22% judged it neutral, and only 9% deemed it to be positive. Although the need to evaluate the quality of health information on the Internet is not contested, there is uncertainty as to how evaluation should be carried out. In 1997 Silberg et al. proposed that a set of transparency criteria developed for the print media could be useful indicators of the quality of web-based health information [11]. These were: authorship, attribution, disclosure and currency. In the review by Eysenbach et al. the 79 studies had evaluated 5941 different health websites, using 86 different quality criteria [10]. Eysenbach et al. distilled these 86 into five main quality measures: (i) accuracy, defined as the ‘degree of concordance of the information provided with best evidence or with generally accepted medical practice’; (ii) completeness, defined as ‘the scope or coverage of information contained in the website’; (iii) readability (this was not defined as such by Eysenbach et al. but this criterion can be assessed by using formulae such as the Flesch–Kincaid grade level index to give an overall reading level for content; such formulae do not consider important subjective elements such as the use of jargon or tone/mood and writing style that impact on the understandability of material); (iv) design (aesthetics; although criticized by the authors as being too subjective a measure to be useful, design criteria focus on the visual aspect or layout of the site); and (v) technical criteria, defined by Eysenbach et al. as ‘general, domain-independent criteria, that is, how information is presented’. Eysenbach et al. proposed that technical criteria were variations on the Silberg et al. transparency criteria. Examples include disclosure of authorship, date of website creation or page update, references provided, author's credentials, email address and details of sponsorship.
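As an illustration of the readability formulae mentioned above, the following is a minimal sketch of the Flesch–Kincaid grade level calculation. The grade level formula itself is standard; the syllable counter is a rough vowel-group heuristic and the function names are assumptions made for this example, not part of any instrument used in this survey.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


# Hypothetical sample text, used only to exercise the function.
sample = "Bipolar disorder is a mood disorder. Treatment can help."
print(round(flesch_kincaid_grade(sample), 1))
```

Because the score depends only on sentence length and syllable counts, it cannot capture jargon, tone or writing style, which is the limitation noted above.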
In 2000 Griffiths and Christensen published one of the few studies that have specifically examined the quality of mental health information on the Internet [12]. Their cross-sectional survey of websites that provide information on the treatment of depression found the quality of such information to be poor. They questioned the usefulness of the Silberg et al. accountability criteria, because they found other website characteristics, such as ownership by an organization and the existence of an editorial board, to be better indicators of content quality. To date only one brief report has been published that examines the quality of websites on bipolar disorder [13].
Aims
A survey of websites on bipolar affective disorder was undertaken with the aim of providing guidance on the quality of sites for both patients and mental health clinicians. The specific aims of the present study were to (i) identify major bipolar disorder websites using common search engines; (ii) review the quality of these sites using defined quality criteria; and (iii) develop a new website quality assessment methodology for bipolar disorder sites assessing features not included in current more generic scales.
Methods
Measures of website quality
A literature review was undertaken to identify instruments that would be suitable for evaluating health information websites such as those focused on bipolar disorder. The databases searched included Medline (1966 +), Medline in-process and non-indexed citations, PsycINFO, CINAHL, EMBASE, PubMed, Science Citation Index, and PsycARTICLES, as well as the Internet search engine Google Scholar. The following key words were used: ‘quality’ OR ‘reliability’ OR ‘accuracy’ OR ‘readability’ OR ‘evaluation’ OR ‘assessment’ AND ‘information’ AND ‘internet’ OR ‘web’ OR ‘www’.
Despite the large body of conceptual work that exists in the area of website evaluation, no validated instrument was found that met all the requirements of this review. The most relevant was the DISCERN instrument, a validated tool developed by an expert panel in conjunction with patients (funded by the British Library) to evaluate online information about medical treatment [14]. It was designed to be a generic tool to assist in the evaluation of all health websites, and was not developed to evaluate the quality of specific disorder or treatment content. DISCERN consists of 16 items, the last of which provides an overall global quality score (which was not used in the present review). Items 1–15 are summed to give the total DISCERN score. Each item consists of a 5-point Likert scale with defined anchor points. Given the lack of other validated instruments this tool was used to assist in evaluation of the selected websites. This instrument is now available online (http://www.discern.org.uk/). At the time of the present review this online version was not available and so the original paper-based version was used.
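The DISCERN total used in this review can be sketched as follows. This is a minimal illustration only: the function name is hypothetical, and the 1–5 coding of each item is assumed from the 5-point scale described above.

```python
def discern_total(item_ratings: list[int]) -> int:
    """Sum DISCERN items 1-15; item 16 (the global quality rating) is excluded."""
    if len(item_ratings) != 16:
        raise ValueError("expected ratings for all 16 DISCERN items")
    if any(r < 1 or r > 5 for r in item_ratings):
        raise ValueError("each item rating must lie between 1 and 5")
    return sum(item_ratings[:15])


# Hypothetical ratings for one website by one rater.
example = [4, 3, 5, 2, 3, 4, 4, 3, 2, 3, 4, 3, 3, 4, 3, 3]
print(discern_total(example))
```

Under this coding the total can range from 15 (all items rated 1) to 75 (all items rated 5).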
In addition, the authors developed a new Bipolar Website Quality Checklist (BWQC) to specifically evaluate the quality of bipolar disorder websites, using the criteria deemed to be the most critical in the bipolar disorder literature. The psychometric properties are described in the present article. All identified websites were evaluated using both the DISCERN and BWQC instruments.
The BWQC (Appendix 1) consists of six subscales, with a total of 56 items: Credibility (seven items); Currency (two items); Objectivity (six items); Availability and Usability (four items); Design and Aesthetics (two items); and Breadth and Accuracy (35 items). These subscales consist mainly of the quality criteria identified by Silberg et al. [11] and Eysenbach et al. [10], with the exception of the excessively complex Readability criterion. This was excluded because it required the use of specific tools that were unfamiliar to the raters and required specific training to ensure validity. The last subscale (Breadth and Accuracy) consists of 35 items, each of which relates to a specific diagnostic or treatment recommendation in the clinical practice guidelines for the treatment of bipolar disorder published by the Royal Australian and New Zealand College of Psychiatrists [7]. The majority of these recommendations are consistent with those of the clinical practice guidelines of the American Psychiatric Association [8]. This approach was based on that used by Griffiths and Christensen, who used the guidelines for the treatment of depression written by the US Agency for Health Care Policy and Research in development of their guideline score [15, 16]. Their guideline score was determined by the number of times a guideline was accurately reflected by the content of the website. The individual items of the BWQC are detailed in Appendix 1. Each item of the BWQC was developed as a 5-point Likert scale with the following anchors: 1 = no, 3 = partially and 5 = yes. To enhance interrater reliability, the main author (CB) developed a BWQC User Guide that provided detailed descriptors for all items.
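The subscale structure described above can be checked with a short sketch. The item counts are taken from the text; the per-item scoring of 1 to 5 is an assumption, chosen because it is what the possible ranges quoted in the Results (for example 56–280 for the total) imply.

```python
# Item counts per BWQC subscale, as listed in the text.
BWQC_SUBSCALES = {
    "Credibility": 7,
    "Currency": 2,
    "Objectivity": 6,
    "Availability and Usability": 4,
    "Design and Aesthetics": 2,
    "Breadth and Accuracy": 35,
}

# Assumption: every item is scored from 1 to 5.
ITEM_MIN, ITEM_MAX = 1, 5


def possible_range(n_items: int) -> tuple[int, int]:
    """Lowest and highest attainable score for a scale with n_items items."""
    return n_items * ITEM_MIN, n_items * ITEM_MAX


total_items = sum(BWQC_SUBSCALES.values())
print(total_items, possible_range(total_items))
for name, n_items in BWQC_SUBSCALES.items():
    print(name, possible_range(n_items))
```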
Selection of websites
Commonly found websites on bipolar disorder as identified during August 2007
†Rank based on the frequency the website appeared across the seven search engines as well as the total number of times the website was listed using search keywords.
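One way such a rank could be computed is sketched below. The data structure, the example site names and the order in which the two criteria are combined (number of search engines first, total listings second) are all assumptions for illustration; the exact rule used in this survey is not reproduced here.

```python
from collections import Counter


def rank_sites(listings: dict[str, list[str]]) -> list[str]:
    """Rank sites by how many engines list them, then by total number of listings.

    `listings` maps a search engine name to every site it returned across all
    keyword searches (a site may appear more than once per engine).
    """
    engines_per_site = Counter()
    total_listings = Counter()
    for sites in listings.values():
        total_listings.update(sites)         # every appearance counts
        engines_per_site.update(set(sites))  # at most one count per engine
    return sorted(
        total_listings,
        key=lambda site: (-engines_per_site[site], -total_listings[site], site),
    )


# Hypothetical results from three engines.
hypothetical = {
    "engine_a": ["site-x.example", "site-y.example", "site-x.example"],
    "engine_b": ["site-y.example", "site-z.example"],
    "engine_c": ["site-y.example", "site-x.example"],
}
print(rank_sites(hypothetical))
```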
Website characteristics
One of the aims of this survey was for the results to be comparable with other studies evaluating mental health websites. Some of the website characteristics used by Griffiths and Christensen [12] in their study of depression websites were therefore adopted: (i) presence of an editorial board; (ii) ownership type (professional, patient or commercial); and (iii) the scope of information. We further subcategorized the latter as ‘bipolar disorder information only’ and ‘bipolar plus’, that is, when the website also contained other non-bipolar disorder health information.
Raters
Three raters were used to evaluate the websites. Two (CB and AW) were doctoral students in the School of Psychiatry at the University of New South Wales, which is affiliated with the Black Dog Institute in Sydney; one was a psychiatrist and the other a medical journalist with a science degree. The third rater was a specialist child and adolescent developmental clinical psychologist in Perth who had previously worked with a commercial health service provider with specialisation in online disease management systems (Sentiens). All raters evaluated these sites independently of each other.
Statistical analyses
The statistical software SPSS version 15 (SPSS, Chicago, IL, USA) was used for the analyses. The mean scores of the three raters for the total DISCERN score and the total and subscale scores of the BWQC were compared against the three chosen website characteristics of Griffiths and Christensen (presence of an editorial board, ownership type, and scope of information). Data were assessed for normality using the Kolmogorov–Smirnov statistic. Those subscales with a normal distribution were compared against the site characteristics using independent t-tests and ANOVA. For subscales with a non-normal distribution, non-parametric statistical tests were used (Mann–Whitney and Kruskal–Wallis). To correct for Type I errors due to multiple testing, the Bonferroni correction method was used. This involved calculating a new alpha by dividing 0.05 by the number of dependent variables being evaluated against the site characteristics (0.05/8, giving a significance threshold of p = 0.006). Scheffé post-hoc tests were used with the ANOVAs. Associations were examined using the Pearson product-moment correlation coefficient. Interrater reliability between the three raters with the BWQC was examined using intra-class correlation coefficients, calculated using the two-way mixed model with absolute agreement type. The authors of the DISCERN instrument used weighted κ as a measure of agreement between each pair of raters. It was felt by those authors to be the ‘appropriate measure for the analysis of data in ordered categories, such as the 5-point Likert scale used to rate each item on the DISCERN as it does not treat all disagreements equally’ [14].
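The Bonferroni adjustment described above amounts to a single division. The sketch below uses hypothetical function names; the inputs (alpha of 0.05, eight dependent variables) are those stated in the text, and the p value of 0.04 is the one reported for the Credibility subscale in the Results.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def is_significant(p_value: float, alpha: float, n_tests: int) -> bool:
    """True when p_value falls below the Bonferroni-adjusted threshold."""
    return p_value < bonferroni_alpha(alpha, n_tests)


adjusted = bonferroni_alpha(0.05, 8)
print(adjusted)                       # 0.00625, reported as 0.006 in the text
print(is_significant(0.04, 0.05, 8))  # False: significant at 0.05 but not after correction
```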
Results
Website selection
Total number of hits per search engine for keywords ‘bipolar disorder + mania + manic depression + hypomania’
Website characteristics
The majority of the websites (73%) originated from North America. Sixty per cent of the websites were commercial, with professional and patient websites making up the remaining 40%. The majority (80%) of the websites were owned by an identifiable organization, with one directly owned by a pharmaceutical company. Just under half of the websites (47%) indicated that they had an editorial board. The majority of the websites (67%) could be defined as being bipolar plus, that is, they contained information on health issues other than bipolar disorder.
Interrater reliability
The interrater reliability between the three raters for the total DISCERN score was r = 0.61 (p = 0.009). The intra-class correlation coefficient for the mean total BWQC score was higher at r = 0.89 (p < 0.00005). For the BWQC subscales, there was a high interrater reliability for all scales, with the highest (r = 0.96; p < 0.0005) being for Mean Currency, and the lowest (r = 0.75; p = 0.001) being for Mean Availability and Usability and Mean Design and Aesthetics. The higher interrater reliability for the BWQC compared to the DISCERN may have related to the intensity of training in the two scales. Both the BWQC and the DISCERN instruments had user guidelines, but rater training was more intense with the new scale. The interrater reliability of the DISCERN was comparable to those scores reported in its initial development. Charnock et al. reported that the chance corrected agreement (weighted κ) for the overall quality rating was κ = 0.53 (95% confidence interval (CI) = 0.48–0.59) among the expert panel, while for self-help group members κ = 0.23 (95%CI = 0.19–0.27) [14]. The weighted κ was calculated by generating a κ score for each possible pair of raters for each item being rated. An overall κ score was then generated by calculating the average of individual κ with an appropriate overall standard error. The present interrater reliability findings for the DISCERN are also consistent with those reported for other studies [17].
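The pairwise weighted κ procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the DISCERN authors' implementation: linear disagreement weights are assumed (the weighting scheme is not specified here), the function names are hypothetical, and the overall standard error is not computed.

```python
from itertools import combinations


def weighted_kappa(a: list[int], b: list[int], categories=(1, 2, 3, 4, 5)) -> float:
    """Weighted kappa for two raters using linear disagreement weights (an assumption)."""
    if len(a) != len(b) or not a:
        raise ValueError("both raters must rate the same non-empty set of targets")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(a)
    # Observed joint proportions of (rater a category, rater b category).
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[index[x]][index[y]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 for extreme disagreement
            obs_disagreement += weight * observed[i][j]
            exp_disagreement += weight * row[i] * col[j]
    if exp_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use a single identical category")
    return 1 - obs_disagreement / exp_disagreement


def mean_pairwise_kappa(ratings_by_rater: list[list[int]]) -> float:
    """Average the weighted kappa over every possible pair of raters."""
    pairs = list(combinations(ratings_by_rater, 2))
    return sum(weighted_kappa(a, b) for a, b in pairs) / len(pairs)
```

Because near misses (for example 4 vs 5) are penalized less than distant ones (1 vs 5), the statistic suits ordered categories such as Likert ratings.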
Overview of DISCERN and BWQC ratings
Overall, the quality of websites on bipolar disorder was disappointing. The mean total score on the DISCERN was 50.2 (SD = 7.6; possible range = 15–75), which lies 58.7% of the way from the minimum to the maximum possible score. The mean total score for the BWQC was 180.3 (SD = 34.6; possible range = 56–280), or 55.5% of the possible range. The highest scoring subscales were Currency (mean = 8.7, SD = 2.4, possible range = 2–10; 83.8% of the possible range) and Availability and Usability (mean = 17.1, SD = 2.1, possible range = 4–20; 81.9% of the possible range). The lowest scoring subscales were Breadth and Accuracy (mean = 105.3, SD = 30.3, possible range = 35–175; 50.2% of the possible range) and Credibility (mean = 21.5, SD = 6.9, possible range = 7–35; 51.8% of the possible range).
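The percentages quoted above appear to correspond to each mean's position within its possible range, that is (mean − minimum)/(maximum − minimum), rather than to the mean divided by the maximum. The sketch below, with a hypothetical function name, reproduces them from the reported means and ranges to within rounding.

```python
def percent_of_range(mean: float, low: float, high: float) -> float:
    """Position of `mean` between `low` and `high`, expressed as a percentage."""
    return 100 * (mean - low) / (high - low)


# (mean, minimum, maximum, percentage reported in the text)
reported = {
    "DISCERN total": (50.2, 15, 75, 58.7),
    "BWQC total": (180.3, 56, 280, 55.5),
    "Currency": (8.7, 2, 10, 83.8),
    "Availability and Usability": (17.1, 4, 20, 81.9),
    "Breadth and Accuracy": (105.3, 35, 175, 50.2),
    "Credibility": (21.5, 7, 35, 51.8),
}

for name, (mean, low, high, stated) in reported.items():
    computed = percent_of_range(mean, low, high)
    print(f"{name}: computed {computed:.2f}%, reported {stated}%")
```

Small differences in the last decimal place are expected because the published means are themselves rounded.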
Comparison of DISCERN and BWQC against the site characteristics of Griffiths and Christensen
Website characteristics and scores for BWQC and DISCERN (mean±SD)
BWQC, Bipolar Website Quality Checklist. †Significant results (p < 0.006) after Bonferroni correction; ‡significant results at p < 0.05; §significant after Scheffé post hoc.
A series of one-way between-groups analyses of variance (ANOVA) was conducted to explore the impact of ownership type (professional, commercial or patient) on the mean scores for subscales of the BWQC, the total BWQC score and the total DISCERN score. There was initially a statistically significant difference between groups at the p < 0.05 level for the subscale Mean Credibility (F = 4.1, df = 2, p = 0.04), but this did not remain significant after applying Bonferroni correction. Scheffé post-hoc tests were nevertheless performed and showed that this initial difference was due to the professional websites’ Mean Credibility scores (mean = 29.8, SD = 2.8) being greater than those of the commercial websites (mean = 18.8, SD = 6.9; p = 0.044). The patient websites’ Mean Credibility scores (mean = 21.6, SD = 1.2) did not differ significantly from those of either the professional or commercial websites.
Factorial structure of the BWQC
To explore the structure of the BWQC, a principal components analysis (PCA) was performed. To ensure that the data set was suitable for conducting such an analysis, the sample size and strength of the relationship among the variables were considered. The Kaiser–Meyer–Olkin value was 0.68, exceeding the recommended value of 0.6 and thereby supporting the factorability of the correlation matrix. The PCA showed two components with eigenvalues exceeding 1, explaining 45.6% and 20.7% of the variance, respectively. Inspection of the scree plot supported retaining the two-component model for further investigation. Component 1 (consisting of four subscales with high loadings: Credibility, Objectivity, Currency and Breadth and Accuracy, i.e. ‘Substance and Detail’) contributed 46.6% of the variance, and component 2 (consisting of two subscales: Mean Availability and Usability and Mean Design and Aesthetics, i.e. ‘Usability and Accessibility’) contributed 20.7% based on the loading before rotation.
Principal component analysis with varimax rotation
BWQC, Bipolar Website Quality Checklist.
Website characteristics scores on the two components of the BWQC
BWQC, Bipolar Website Quality Checklist. ∗p < 0.05.
Correlations between the DISCERN and BWQC instruments
There was a strong positive correlation between the mean total DISCERN score (items 1–15 summed, then averaged across the three raters) and the mean total BWQC score (r = 0.78, p = 0.001). There was also a strong positive correlation between the mean BWQC component 1 (Substance and Detail) score and the mean total DISCERN score (r = 0.78, p = 0.001). There was no significant correlation between DISCERN and BWQC component 2 (Usability and Accessibility).
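For reference, the Pearson product-moment coefficient used for these comparisons can be sketched as below. The function name and the example scores are hypothetical and are not the data from this survey.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical mean total scores for five websites (illustrative values only).
discern_scores = [42.0, 47.5, 50.0, 55.5, 61.0]
bwqc_scores = [150.0, 170.0, 165.0, 200.0, 230.0]
print(round(pearson_r(discern_scores, bwqc_scores), 2))
```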
Ranking of websites
Website ranking vs search engine, mean total DISCERN and BWQC
BWQC, Bipolar Website Quality Checklist. Ownership type: 1 = professional, 2 = commercial, 3 = patient.
To examine further the discrepancies in the rankings of the NIMH and Black Dog Institute websites, the mean scores for the BWQC were examined at an individual item level. Both websites had very similar scores on the BWQC subscales Currency, Availability/Usability and Design and Aesthetics. On the Credibility subscale, the Black Dog Institute website scored more highly on the item ‘code of conduct’ (5 vs 2.3) and for ‘presence of a quality marker’ on the home page (5 vs 1). The NIMH website scored more highly on the items ‘credentials shown for authors’ (5 vs 3) and ‘references shown for content’ (5 vs 4.3). The greatest difference between these two websites, however, was within the subscale Breadth/Accuracy. Although both websites scored highly on items focusing on signs and symptoms and aetiology of bipolar disorder as well as the item on the risk and management of suicide, the Black Dog Institute website overall scored more highly on the majority of the other items because it provided more detailed information on a variety of treatments, both biological and psychological, and covered in more detail issues around differential diagnosis, comorbid conditions and hospitalization. For example, it had higher mean scores for the items ‘difference in treatment for bipolar I vs bipolar II disorder’ (5 vs 2.3), ‘combination of mood stabilizers’ (4 vs 2.7), ‘role of atypical anti-psychotics’ (4 vs 2.3), and ‘role of emerging treatments e.g. Omega 3’ (5 vs 3). The Black Dog Institute website also scored more highly on items on psychoeducation (5 vs 3.7) and role of lifestyle in keeping well (4 vs 2.3). The BWQC was able to evaluate the quality of the content of the website, which the DISCERN, as a generic quality rating instrument, is not able to do.
Discussion
This paper has detailed the development of the first website quality evaluation instrument specifically designed to assess bipolar disorder websites. This instrument, the BWQC, consists of 56 items and six subscales (Credibility; Currency; Objectivity; Availability and Usability; Design and Aesthetics; and Breadth and Accuracy), which were derived from previously developed concepts on the appropriate means for measuring website and print media quality [10–12]. PCA demonstrated a structure of two main components (Substance and Detail, and Usability and Accessibility), which together accounted for 67% of the variance. The BWQC demonstrated high interrater reliability (r = 0.89). The BWQC correlated strongly (r = 0.78) with the more generic DISCERN instrument [14].
Although the top-ranked website in terms of hits with common search engines was the US patient site www.helpguide.org, the leading website in terms of quality (as rated by both the BWQC and the DISCERN) was that of the Black Dog Institute in Sydney, Australia (www.blackdoginstitute.org.au).
Overall, there were only minor variations in the ranking of the websites by the two instruments. The exceptions were the discrepant rankings of the NIMH website (www.nimh.nih.gov) and the aforementioned patient site www.helpguide.org. The NIMH site was ranked second of 15 on the DISCERN, but seventh on the BWQC. The probable reason for this low ranking was that this website scored poorly on the subscale Breadth and Accuracy, reflecting its limited amount of detailed information. This particular subscale was based on the guideline score of Griffiths and Christensen [12], with the detail being derived from the recommendations of the Royal Australian and New Zealand College of Psychiatrists’ clinical guidelines for bipolar disorder [7]. The specific details of these guidelines were very similar to those of the American Psychiatric Association [8].
The present search strategy found few websites linked to government mental health service providers or educational institutions. The majority of websites were affiliated to private organizations or commercial entities. Although major search engines do not make the details of how they rank websites publicly available, it is acknowledged that webmasters and web developers are able to manipulate how they present their websites to a search engine in order to improve their numerical ranking. Google's popularity as a search engine has in part been due to its development of the PageRank algorithm, which has made its ranking system more resistant to such manipulation. As well as examining keywords within the text, it also assesses the strength and quality of a page's inbound links. It may be that other high-quality bipolar disorder websites developed by government agencies and tertiary educational facilities were not included in this survey due to low ranking on popular search engines such as Yahoo and Google.
It should be noted, however, that recent work by Griffiths and Christensen reported that evidence-based quality scores correlated significantly with Google page rank [18]. Those authors suggest that this may indicate that Google page rank had ‘promise as an automatic indicator of quality’ and was moving towards being an acceptable marker of quality. Although this finding may reflect slowness on the part of professional institutions in adopting the Internet as a medium of disseminating information, it may also reflect that commercial websites have been developed with a greater emphasis on gaining a high website ranking on these search engines. Organizations and institutions producing information on bipolar disorder for the Internet should therefore be aware of how search engines identify and rank sites.
Limitations
A major limitation of this survey was the lack of patient involvement in the ranking process. The importance of this has been raised by Eysenbach et al. [10]. However, recent work by Griffiths and Christensen, in a cross-sectional survey of the quality of depression websites that used both health professional and patient raters, found a significant positive correlation between patient and health professional ratings on the DISCERN [18].
Two out of the three raters had affiliations with the organization that created the top-ranked website www.blackdoginstitute.org.au. It is possible that, despite the objective nature of the two instruments and the intent to remain impartial, there was a halo effect. It should be noted that the third rater was independent and had no affiliation with the Black Dog Institute. All reviewers were Australian, which, it could be argued, may also have had an impact on this result. It should be noted, however, that six out of the seven search engines were international, making it unlikely that the initial selection of websites for review based on the ranking by these search engines was biased towards an Australian website.
Because a major subscale within the BWQC (Breadth and Accuracy) was developed from clinical treatment guidelines, it is possible that it had an inherent bias towards rating diagnostic and clinical information more highly than other types of information. This may make it more difficult for patient websites to obtain high scores on this particular subscale.
Finally, although over 140 websites were initially identified using the search strategy described in this paper, only 15 were finally reviewed in detail. This relatively small sample size may limit generalization of some of the findings, especially when distinguishing between professional, commercial and patient websites. Clearly, further research by other groups will be necessary to confirm or refute these findings. The authors would recommend that any future reviews evaluate a larger sample of websites and include a cross-section of potential end-users, such as patients and carers, as raters.
Conclusions
Overall, the quality of information on bipolar disorder on the Internet appears to be variable, which is a somewhat different conclusion from an earlier review that reported the quality to be good [19]. Those authors found a negative relationship between readability and interactivity of a website and its content quality, and commented that the DISCERN score could not predict the quality of content of a website. This finding has been noted by the developers of the DISCERN [14] and is further supported by the present findings. It appears that there are some excellent websites that are easily accessible that not only provide high-quality detailed information on bipolar disorder and its treatment, but furthermore are using the Internet's unique interactive elements to provide many other useful features. The low ranking (14th), however, by the Internet search engines of www.blackdoginstitute.org.au (the highest quality ranking website, according to our evaluation), is of concern because this makes it less likely that this website would be found by a casual browser. When developing websites, search engine optimization (e.g. through meta tags and key word attribution or descriptors) will be crucial if websites are to be easily accessed.
Because this review was undertaken in August 2007, a repeat of the website search strategy today would be likely to identify new websites, and to find some of those identified in this paper to have been markedly modified. This fluidity is one of the Internet's most striking features. In reading this review clinicians cannot solely rely on its 2007 findings, but are encouraged to reflect on these when seeking out and recommending contemporary websites to patients. The authors also believe that it is important that clinicians attempt to match websites with individual needs, because patient websites often have many useful interactive features such as chat rooms and discussion forums that provide support and promote self-help and self-management. These features could not be rated with the BWQC or DISCERN instruments, and are often missing from commercial websites.
The Internet is increasingly becoming the portal of choice for patients and carers in accessing information on health issues. Clinicians therefore need to be able to assist patients to avoid the ‘bad and the ugly’ [19] and should be able to recommend quality websites as part of their routine provision of information on bipolar disorder. By doing so, clinicians are meeting current treatment guidelines that highlight the importance of psychoeducation as a crucial component of the management of bipolar disorder.
Bipolar Quality Website Checklist
URL: ______________________________________ Date: ___________ Reviewer: ___________
