Abstract
There is general agreement on the characteristics of a good website [1–4]. These include quality of content, design and aesthetics of the site, currency of information, authority of source, ease of use and accessibility, and disclosure of authors, sponsors or developers. The Health on the Net Foundation has issued a code of conduct for medical and health sites covering much the same area [5]. However, this is only an advisory body and cannot enforce minimum standards. In a systematic review of 79 studies assessing the quality of consumer health information on the Internet, 70% of papers concluded that quality was a problem [6]. In the area of mental health, a study of 21 popular sites on depression found that the quality of site content was poor, lacked balance and did not provide accurate information on treatment [7]. Similar findings were reported in a further study that was restricted to Australian websites on depression [8]. In addition, the content of many websites requires a high reading ability [6, 9].
Although much has been published on the quality of information on the web, there have been relatively few systematic studies, and these have used different methods that make comparison difficult [6, 10–13]. Authors have used textbook summaries, author opinion or meta-analyses of available opinion as criteria against which to evaluate content. Where meta-analyses have been used, the number of reviewed sites can be low [7, 8].
We aimed to systematically assess the quality, accountability and readability of Internet information on the treatment of schizophrenia and Attention Deficit Hyperactivity Disorder (ADHD), using a standardized pro forma. We chose these two conditions because, after depression, they are among the commonest DSM-IV diagnoses featured on websites [14]. Although websites often cover areas such as aetiology and prognosis, we assessed content on treatment only, because clinical guidelines exist against which we could compare sites.
We based the pro forma on previously published work assessing the quality of web-based information [7, 15], rather than on questionnaires such as DISCERN and the IQ Tool [16, 17]. These questionnaires are targeted at patients and their families, rather than health professionals [16, 17]. They are generic instruments that have not been specifically designed to assess the scientific quality or accuracy of the evidence presented for specific conditions, although a recent study has reported an association between clinical practice guidelines and DISCERN scores [8]. We thought it important to include aesthetics, accountability and readability in the pro forma, although some of these measures have been criticised as indicators of site quality [7]. There is little point in recommending a site if patients cannot understand the content, or find the layout unappealing.
Method
Site assessment
The standardized pro forma contained four sections. The first section assessed accountability using Silberg et al.'s scale as adapted by Griffiths and Christensen [3, 7]. This 9-point scale assessed authorship (whether authors, their affiliations and credentials were clearly identified), attribution (whether sources and references were mentioned), disclosure (whether ownership of the site and sponsorship were disclosed), and currency (whether the site had been modified in the previous month and whether the date it was created or modified was specified) [3, 7].
The second section looked at aesthetic issues using Abbott's criteria [15]. These covered the presence of headings or subheadings, diagrams and hyperlinks to external sites, as well as the absence of advertising, with a potential maximum score of four.
Section three graded the information according to evidence-based practice using the methodology of Griffiths and Christensen [7]. A core guideline score for each diagnosis was derived from guidelines for treating schizophrenia [18–24] and ADHD [25–27]. These were found through searches of MEDLINE, the Cochrane Database of Systematic Reviews, the Database of Abstracts of Reviews of Effectiveness and the National Guideline Clearinghouse. All the guidelines were based on a systematic review of the literature, with meta-analyses where applicable, and included a grading of the evidence. Some guidelines were restricted to particular treatments [18–22, 24, 27], while others covered a wide range of interventions [23, 25, 26]. Tables 1 and 2 show the recommendations for which there was most consensus. The only disagreement between guidelines was the use of family interventions in schizophrenia, for which one Cochrane Review found only equivocal evidence [22]. A point was given for the presence of each core guideline (Tables 1,2).
Table 1: Accountability, presentation, content and readability of sites on schizophrenia
Table 2: Accountability, presentation, content and readability of sites on Attention Deficit Hyperactivity Disorder
Readability was assessed using the Flesch-Kincaid Grade Level score. This rates text on US school years or grade levels [28]. A score of 8.0 means that an eighth grader would be able to understand the document easily and is the recommended level for standard documents [28]. The Flesch-Kincaid Readability Index is included in the Microsoft Word spellchecker. A point was awarded for a score of eight or less. A further point was awarded for the recommendation that a qualified professional be consulted for advice.
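The grade-level statistic described above can be reproduced from word, sentence and syllable counts. The sketch below uses the standard Flesch-Kincaid Grade Level formula; the syllable counter is a crude vowel-group heuristic of our own (Microsoft Word's internal counter differs, so scores will not match Word exactly):

```python
import re

def count_syllables(word: str) -> int:
    # Heuristic only: count vowel groups, discounting a silent final 'e'.
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Flesch-Kincaid Grade Level:
    # 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words)) - 15.59)
```

Very simple text can yield a grade below zero; a score of 8.0 or less corresponds to the recommended eighth-grade level.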
Definitions were operationalized to increase the reliability of the pro forma [29]. As in other work, the overall score was the sum of the individual items from each of the sections, every item being weighted equally [7].
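The equal-weighting scheme amounts to a simple sum of item scores across the four sections. A minimal illustration with entirely invented item values for one hypothetical site:

```python
# Hypothetical binary item scores for a single site (values invented):
site = {
    "accountability": [1, 0, 1, 0, 1, 0, 0, 1, 0],  # 9 Silberg items
    "presentation":   [1, 1, 0, 1],                 # 4 Abbott items
    "content":        [1, 0, 1, 0, 0, 1],           # 1 point per core guideline met
    "readability":    [0, 1],                       # grade <= 8; advises seeing a professional
}

# Overall score: every item weighted equally, summed across sections.
overall = sum(sum(items) for items in site.values())
```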
Site selection
We entered the standard search phrases ‘Attention Deficit Hyperactivity Disorder’ and ‘schizophrenia’ into the five most popular World Wide Web search engines as sampled by MediaMetrix in May 2001 [30]. These were, in descending rank: Microsoft Network (MSN), America Online (AOL), Yahoo, Lycos and Excite [30]. The 20 highest ranked websites from each search engine were then reviewed according to set criteria. Sites were reviewed between June and September 2001.
As in other studies, a simple search method was chosen, and performed without further refinement of initial search results, to produce a list of websites similar to one generated by a person with limited medical, Internet or computer knowledge [7, 8]. In particular, we did not restrict the search term to the treatment of either condition. Sites were excluded from the review only for the following reasons: the site was inaccessible; had already been reviewed in the current study; contained no information on management; required an access fee; or was a discussion group, message board or other open forum.
We assessed the interrater reliability of pro forma scores on a random sample of 20 sites for each diagnosis. In each case, agreement was assessed between the rater for that diagnosis and the project leader (SK). We also compared the proportion of sites in the top 10% of scores that were owned by an organization, or that had an editorial board, with the corresponding proportion in the bottom 10%. These variables have been suggested as indicators of website quality as they show a statistically significant association with quality of content [7].
Results
We reviewed 20 websites for each search engine covering the management of schizophrenia and ADHD during the study period. Addresses of the websites we reviewed are available at http://www.geocities.com/skisely/websites.html [31]. There was little overlap in the sites identified by different search engines. In the case of schizophrenia, one site was identified three times and eight were identified twice, while for ADHD four sites were identified twice [31]. In the analysis of the subsample, there was high correlation in overall scores for schizophrenia (r = 0.95, p < 0.001) and ADHD (r = 0.96, p < 0.001). Agreement on accountability, presentation, content and readability scores ranged from 0.68 to 0.95 for schizophrenia and from 0.86 to 0.96 for ADHD (all p < 0.001). There was no overlap in the sites used to assess interrater reliability.
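The interrater correlations above can be computed as Pearson coefficients over paired overall scores. A self-contained sketch, with entirely hypothetical ratings for eight sites (the study's actual data are not reproduced here):

```python
def pearson_r(xs: list[float], ys: list[float]) -> float:
    # Pearson product-moment correlation between two equal-length samples.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical overall pro forma scores for the same 8 sites from two raters:
rater_scores = [7, 3, 10, 5, 8, 2, 6, 9]
leader_scores = [6, 4, 10, 5, 9, 2, 5, 8]
r = pearson_r(rater_scores, leader_scores)
```

High agreement between raters yields r close to 1; the study reported r = 0.95–0.96 for overall scores.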
Accountability
The mean Silberg score was 3.2 out of 9. Most of the sites clearly specified the authors of the web content (55%) and indicated when the site had been created or modified (73%). However, only a minority gave author credentials and affiliations, or mentioned sources and references (Tables 1,2). Only 10% had been modified in the previous month. There were no statistically significant differences in accountability between sites covering schizophrenia and ADHD, either for individual items (χ2 = 6.6, df = 8, p = 0.59) or overall mean score (t-test = 1.97, df = 198, p = 0.06).
Presentation
The mean score for presentation was 1.93 out of 4, and there was no significant difference in mean score between sites on schizophrenia and ADHD (t-test = 1.97, df = 198, p = 0.06) (Tables 1,2). Around 90% of sites had headings, but fewer than 2% had diagrams (Tables 1,2). Advertisements were absent from three-quarters of sites on schizophrenia, but from fewer than half of those on ADHD. By contrast, links to external web pages were present in fewer than a quarter of sites on schizophrenia but in over half of ADHD sites (χ2 = 20.1, df = 3, p = 0.0001).
Consistency of content with evidence-based medicine guidelines
The agreement between websites and the systematic reviews was poor for both diagnoses (Tables 1,2). Depending on the recommendation, agreement with evidence-based practice for schizophrenia ranged from only 2 to 55% (Table 1), with the highest agreement being for the use of antipsychotic medication, followed by mention of atypical antipsychotic medication. As regards psychological interventions, about half of sites mentioned individual education and support, or family interventions. Only a small minority mentioned other interventions and, depending on the particular intervention, agreement was as low as 2% (Table 1).
Agreement with systematic reviews of ADHD ranged from 12 to 54% (Table 2), with the highest agreement being for the use of behavioural therapy. Although psychostimulants were often mentioned, only 14% of sites indicated that evidence from randomized controlled trials exists only for therapy of up to 2 years' duration (Table 2) [25–27].
Readability
The information on the vast majority of websites required a high reading level to understand. The average Flesch-Kincaid reading grade was 11.5, and there was no statistically significant difference between sites covering schizophrenia and ADHD (t-test = 1.97, df = 198, p = 0.06) (Tables 1,2). Only four sites had a score of less than 8.0, the recommended level for most standard documents [28].
Recommendation to consult professionals
Only half of the sites recommended that a qualified professional be consulted. When present, this information was often in small print at the foot of the page or on a separate disclaimer page, making it less likely that the average Internet user would find these recommendations without specifically looking for them.
Association between accountability, presentation and content scores
Accountability and presentation scores were significantly correlated (r = 0.18, p = 0.01), as were those for presentation and content (r = 0.20, p = 0.04). There was no association between content and accountability scores (r = 0.10, p = 0.13). Readability scores did not show a significant correlation with any of the other domains.
Overall scores and characteristics of web pages in the top decile
Total mean scores for schizophrenia and ADHD were 7.9 (range 2–17) out of 27, and 8.7 (range 2–15) out of 21, respectively. Table 3 shows the 18 websites that were in the top 10% of scores for each condition. Organizations, rather than individuals, owned all but one, and 12 out of 18 had an editorial board. Of the 18 websites that were in the bottom 10% of scores, individuals owned six out of 18, and none had an editorial board. These differences were statistically significant (χ2 = 4.4, df = 1, p = 0.04 and 15.8, df = 1, p = 0.0001, respectively).
Table 3: Web pages in the top decile of scores
Discussion
This study indicates that the quality of information on the web is generally poor, a finding that is common to other reports [6–13]. Sites were evaluated against explicit criteria for accountability, presentation, content and readability, and checks were made of interrater reliability for both diagnoses. Although the allocation and weighting of scores follows previous work [7], it is possible that some items were more strongly associated with overall quality. This is why the results for each item were also presented separately in Tables 1 and 2. The finding that sites within the top 10% of scores were significantly more likely to be owned by an organization or have an editorial board than those in the bottom 10% of scores suggests we differentiated between sites on their quality. These variables have previously been shown to be associated with quality of website content [7].
The search strategy used in this paper was only one of a number of possible strategies, including the use of ‘metasearch’ search engines. We chose to use the five most popular World Wide Web search engines as sampled by MediaMetrix in May 2001 and, given the number of sites reviewed in this paper, it is unlikely that the findings would have been very different had a different strategy been used. We also acknowledge that the Internet evolves rapidly and that the results are only a snapshot of content at the time we collected the data.
We only assessed site content on treatment as opposed to other areas such as aetiology and prognosis. This is because there are clinical guidelines against which we could compare sites. Equally, we did not distinguish between sites with comprehensive coverage of treatment of schizophrenia or ADHD and more general sites with limited consideration of the area. This is because there is no evidence that patients would restrict their search to the former.
The pro forma was designed to identify sites on the basis of accountability, evidence-based content, presentation and readability. As in previous work, content did not show a significant association with some of these other variables [7]. Although their value has been questioned, we felt that their inclusion was important because they reflect moves in peer-reviewed journals for greater accountability. These include the need to state credentials and affiliation, and a statement of potential conflicts of interest. They are of particular importance on the Internet, where users may purchase as well as read material from websites. The findings from this study do suggest that accountability is not necessarily associated with quality of content, and that reliance on this measure alone could be misleading. Similarly, the correlation between presentation and quality of content only just reached statistical significance, suggesting that this may not always be a guide to which websites to trust.
Although there have been several initiatives for ensuring the quality of health information on the Internet (upstream filtering) [5, 32–36], these have relied on voluntary codes of conduct [5, 32], are in development [33–35] or charge for accreditation [36]. Accountability and content are well covered, but less attention is paid to readability aside from a general comment that language complexity should be appropriate to the target audience [32]. Evidence-based information that is difficult to read can also be potentially misleading.
An alternative is the use of ‘web gateways’, where users are directed to sites that have been evaluated by authoritative panels. These include Organising Medical Networked Information (OMNI) and Healthfinder [37, 38]. However, the proliferation of websites makes it difficult for gateways to keep up-to-date [12].
While the effects of these initiatives are awaited, the dissemination of criteria to help Internet users select information on their own (manual downstream filtering) may remain the most reliable method [39]. Questionnaires such as DISCERN and the IQ Tool have been developed to help patients and their families assess the reliability of information about treatment choices [16, 17]. Both help users to systematically evaluate websites through a scoring system. However, they are generic instruments and have not been specifically designed to assess the scientific quality or accuracy of the evidence presented for specific conditions. A recent study has reported an association between clinical practice guidelines and DISCERN scores, and it is possible that this instrument may also have a role in measuring the quality of content [8].
The pro forma presented in this work could be adapted for use by health care professionals for any condition for which there has been an evidence-based review. The results could then be discussed with patients to help them determine which websites to trust, complementing instruments such as DISCERN. Areas for future research would include the involvement of patients and carers in the evaluation of websites using instruments such as DISCERN [8].
