Abstract
According to the developers of the 36-Item Short Form Health Survey questionnaire, a global measure of health-related quality of life such as the "SF-36 Total/Global/Overall Score" cannot be generated from the questionnaire. However, studies continue to report such a measure. This study aimed to evaluate the frequency and to describe some characteristics of articles reporting the SF-36 Total/Global/Overall Score in the scientific literature. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses method was adapted to a scoping review. We searched the PubMed, Web of Science, SCOPUS, BVS, and Cochrane Library databases for articles using such scores. We found 172 articles published between 1997 and 2015; 110 (64.0%) of them were published from 2010 onwards, and 30.0% appeared in journals with an Impact Factor of 3.00 or greater. Overall, 129 (75.0%) of the 172 studies did not specify the method for calculating the "SF-36 Total Score"; 13 studies did not specify their methods but referred to studies by the SF-36 developers or by others; and 30 articles used different strategies for calculating such a score, the most frequent being the arithmetic mean of the eight SF-36 domain scores. We conclude that the "SF-36 Total/Global/Overall Score" has been increasingly reported in the scientific literature. Researchers should be aware of this procedure and of its possible impacts upon human health.
Introduction
In the era of globalization, researchers play an important role in the "industrialization" of the academy and the "collegialization" of research. Academic organizations work with enterprises and industries to increase the commercialization of scientific research and methods. Access to knowledge no longer belongs to the public sphere but to the private one. 1 Research tools that have been developed, validated, and patented by enterprises or by the academy must be used according to the specifications of their developers. The Medical Outcomes Trust, Health Assessment Lab, QualityMetric Incorporated, and Optum Incorporated, the organizations that hold all SF-36 copyrights and trademarks, have developed common policies for granting permission to use the SF-36 form. These organizations offer licensing programs for both scholarly research and commercial applications; these programs evaluate data completeness, response consistency, and internal consistency, and also ensure accurate data scoring and proper interpretation. 2
The 36-Item Short Form Health Survey questionnaire (SF-36) 3 is a widely used instrument for evaluating health-related quality of life. A PubMed search using the term "SF-36 health survey" found 9722 items. 4
The SF-36 measures eight scales: physical functioning (PF), role physical (RP), bodily pain (BP), general health (GH), vitality (VT), social functioning (SF), role emotional (RE), and mental health (MH). Component analyses showed that the SF-36 measures two distinct concepts: a physical dimension, represented by the Physical Component Summary (PCS), and a mental dimension, represented by the Mental Component Summary (MCS). All eight scales contribute, in different proportions, to the scoring of both the PCS and the MCS. 3 The correct calculation of the PCS and MCS summary measures requires special algorithms, which are strictly controlled by a private company. 5
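To make the two-step logic concrete, the sketch below shows a generic norm-based scoring scheme of the kind used for SF-36 summary measures: each raw scale score is rescaled to 0-100, each scale is standardized against reference norms, and the standardized scales are combined with one weight per scale into a T-score (mean 50, SD 10). This is a minimal illustration under stated assumptions, not the licensed algorithm: the function names, the reference means and SDs, and the weights are all hypothetical placeholders, not values from the SF-36 scoring manual.

```python
# Generic sketch of norm-based SF-36-style scoring.
# ASSUMPTION: NORM_MEAN, NORM_SD, PHYSICAL_W and MENTAL_W are HYPOTHETICAL
# placeholders, not the licensed norms or coefficients of the SF-36 manual.

SCALES = ("PF", "RP", "BP", "GH", "VT", "SF", "RE", "MH")


def transform_scale(raw: float, lowest: float, highest: float) -> float:
    """Linearly rescale a raw scale score onto the 0-100 range."""
    if highest <= lowest:
        raise ValueError("highest must exceed lowest")
    return (raw - lowest) / (highest - lowest) * 100.0


def summary_score(scores: dict, norm_mean: dict, norm_sd: dict, weights: dict) -> float:
    """Standardize each scale against reference norms, combine the z-scores
    with one weight per scale, and express the result as a T-score."""
    aggregate = sum(
        weights[s] * (scores[s] - norm_mean[s]) / norm_sd[s] for s in SCALES
    )
    return 50.0 + 10.0 * aggregate


# Hypothetical reference norms and weights, for illustration only.
NORM_MEAN = {s: 70.0 for s in SCALES}
NORM_SD = {s: 20.0 for s in SCALES}
PHYSICAL_W = {"PF": 0.4, "RP": 0.3, "BP": 0.3, "GH": 0.2,
              "VT": 0.05, "SF": 0.0, "RE": -0.2, "MH": -0.2}
MENTAL_W = {"PF": -0.2, "RP": -0.1, "BP": -0.1, "GH": 0.0,
            "VT": 0.2, "SF": 0.3, "RE": 0.4, "MH": 0.5}

# A respondent exactly at the reference norms, then one SD higher on MH only.
at_norm = dict(NORM_MEAN)
higher_mh = dict(NORM_MEAN, MH=90.0)
print(summary_score(at_norm, NORM_MEAN, NORM_SD, PHYSICAL_W))  # 50.0
print(summary_score(higher_mh, NORM_MEAN, NORM_SD, MENTAL_W))  # 55.0
```

With these made-up weights, a change in a single scale (here MH) moves both summaries, which mirrors the point that every scale contributes to both the PCS and the MCS. For example, `transform_scale(20, 10, 30)` maps the midpoint of a scale whose raw score can range from 10 to 30 to 50.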
The SF-36 Scoring Manual 3 does not support the calculation of a single measure of health-related quality of life, such as an "SF-36 Total/Global/Overall Score." According to its developers, it is not appropriate to combine the two SF-36 summary measures into an overall score of health-related quality of life. 6 Despite this, some researchers continue to calculate such measures and to draw erroneous conclusions from them.
This study evaluates the frequency and some characteristics of articles reporting single scores of health-related quality of life (the SF-36 Total/Global/Overall Score) in the scientific literature.
Methods
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) method 7 was adapted to this scoping review. Whereas a systematic review usually includes a limited range of study designs, a scoping review allows the inclusion of many different designs. 8 We included studies that mentioned the use of an SF-36 Total/Global/Overall Score.
Search strategy
We searched for articles and reviews in the PubMed, Web of Science, SCOPUS, BVS, and Cochrane Library electronic databases, covering the period from 1990 to 2015, without language or other restrictions. The following queries were used for SF-36 Total Score: "sf-36 total score" OR "sf36 total score" OR "sf 36 total score"; and for SF-36 Global Score: "sf-36 global score" OR "sf36 global score" OR "sf 36 global score." The PubMed search using the query "sf-36 overall score" OR "sf36 overall score" OR "sf 36 overall score" did not perform well, probably because of the term "overall." We therefore searched for SF-36 Overall Score using the following syntax: sf-36[All Fields] AND ("overall score"[All Fields] OR "overall scores"[All Fields] OR "overall scoring"[All Fields] OR "overall scoring system"[All Fields]).
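For reproducibility, the query strings reported above can be generated programmatically rather than typed by hand. The sketch below is a minimal illustration with hypothetical helper names; it only rebuilds the strings and does not submit them to any database (PubMed, for instance, can be queried through the NCBI E-utilities, which is not shown here).

```python
# Rebuild the search strings described in the text.
# ASSUMPTION: the helper names are hypothetical; nothing is sent to a database.

SPELLINGS = ("sf-36", "sf36", "sf 36")
OVERALL_PHRASES = ("overall score", "overall scores", "overall scoring",
                   "overall scoring system")


def phrase_query(suffix: str) -> str:
    """OR together the three spellings of 'SF-36' followed by `suffix`,
    each written as a quoted phrase."""
    return " OR ".join(f'"{s} {suffix}"' for s in SPELLINGS)


def pubmed_overall_query() -> str:
    """Rebuild the field-tagged PubMed query used for the overall score."""
    phrases = " OR ".join(f'"{p}"[All Fields]' for p in OVERALL_PHRASES)
    return f"sf-36[All Fields] AND ({phrases})"


for label in ("total score", "global score"):
    print(phrase_query(label))
print(pubmed_overall_query())
```

Generating the variants from one list of spellings keeps the Total and Global queries symmetrical and makes it easy to add further spellings if needed.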
Two researchers read the full text of all the studies retrieved from the databases and selected them for deeper analysis. The selected studies were then screened against the inclusion and exclusion criteria (Figure 1).

Figure 1. Flow chart: selection of papers in the scoping review.
Eligibility criteria
Only articles written in English were eligible, provided that the terms SF-36 total score, SF-36 global score, or SF-36 overall score appeared in the title, the abstract, or the full text on screening.
Data collection and classification, and study quality
Two independent researchers analyzed the relevant studies, classified their study designs, and identified the procedures used for calculating the SF-36 Total Score. We did not use any scoring method to assess the methodological quality of the papers because our objective was only to document the misuse of a measure. Each study was classified according to the most recent Journal Citation Reports (JCR) Impact Factor of the journal in which it was published and stratified as Impact Factor < 3.00 or Impact Factor ⩾ 3.00.
Results
The PubMed search retrieved 131 studies (26 SF-36 Total Score, 41 SF-36 Global Score, and 64 SF-36 Overall Score); the Web of Science search retrieved 27 studies (23 SF-36 Total Score, 2 SF-36 Global Score, and 2 SF-36 Overall Score); the SCOPUS search retrieved 49 studies (42 SF-36 Total Score, 3 SF-36 Global Score, and 4 SF-36 Overall Score); the BVS search retrieved 29 studies (25 SF-36 Total Score, 2 SF-36 Global Score, and 2 SF-36 Overall Score); and the Cochrane Library search retrieved 11 studies (9 SF-36 Total Score, 1 SF-36 Global Score, and 1 SF-36 Overall Score).
Across the five database searches, 247 potentially relevant articles were screened against the inclusion and exclusion criteria. In total, 197 articles were excluded because 97 were duplicates, 69 did not mention an SF-36 Total/Global/Overall Score, 21 were not written in English (9 Chinese, 5 French, 2 Spanish, 1 German, 1 Portuguese, 1 Russian, 1 Japanese, and 1 Italian), 3 were not full-text articles but abstracts presented at scientific events, and 10 articles could not be accessed: one because the journal had ceased publication, and the other nine because access to the journal required payment. By reviewing the reference lists of the remaining 50 full-text articles, we obtained another 122 relevant citations. The final sample consisted of 172 articles, published between 1997 and 2015.
The study designs were cross-sectional (41), randomized clinical trial (35), non-randomized clinical trial (22), non-controlled clinical trial (9), diagnostic accuracy (25), cohort (19), case-control (4), systematic review (5), case series (11), and case report (1).
The studies came from 36 countries: the United States (25), Iran (15), Israel (13), the United Kingdom (12), Italy (11), Turkey (9), Canada (8), the Netherlands (7), Korea (6), China (6), Australia (5), Spain (5), France (5), Greece (5), Serbia (5), Sweden (4), Taiwan (3), Brazil (3), India (3), Switzerland (2), United Arab Emirates (2), Norway (2), Germany (2), Malaysia (2), Russia (1), Belgium (1), Mexico (1), Tunisia (1), Poland (1), Slovak Republic (1), Austria (1), Portugal (1), Jamaica (1), Saudi Arabia (1), Thailand (1), and Japan (1).
In total, 110 (64.0%) out of the 172 articles were published from 2010 onwards (Table 1).
Table 1. Number of articles reporting SF-36 Total Score according to year of publication.
Fifty-one (30.0%) out of the 172 studies were published in journals with Impact Factor 3.00 or greater.
In total, 129 (75.0%) of the 172 studies did not specify the method used for calculating the "SF-36 Total/Global/Overall Score." Another 13 studies indicated their methods only by citing articles published by the SF-36 developers in the early 1990s or by other authors. The remaining 30 articles were grouped into five different strategies for calculating the SF-36 Total Score, the most common being the arithmetic mean of the eight SF-36 domain scores (Table 2).
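The most common strategy, an unweighted arithmetic mean of the eight domain scores, can be sketched as follows, next to the plain sum that some studies used instead. The domain abbreviations follow the text; the function names and the two respondent profiles are hypothetical. The example illustrates one consequence of equal weighting: two mirror-image profiles collapse to the same "total" of 50.

```python
# Two ad hoc "total score" calculations applied to eight 0-100 domain scores.
# ASSUMPTION: function names and respondent profiles are hypothetical.
from statistics import mean

DOMAINS = ("PF", "RP", "BP", "GH", "VT", "SF", "RE", "MH")


def mean_of_domains(scores: dict) -> float:
    """Unweighted arithmetic mean of the eight domain scores (range 0-100)."""
    return float(mean(scores[d] for d in DOMAINS))


def sum_of_domains(scores: dict) -> float:
    """Plain sum of the eight domain scores (range 0-800)."""
    return float(sum(scores[d] for d in DOMAINS))


# Profile A: low on PF, RP, BP, GH and high on VT, SF, RE, MH; B is the mirror image.
profile_a = {"PF": 20, "RP": 20, "BP": 20, "GH": 20,
             "VT": 80, "SF": 80, "RE": 80, "MH": 80}
profile_b = {"PF": 80, "RP": 80, "BP": 80, "GH": 80,
             "VT": 20, "SF": 20, "RE": 20, "MH": 20}

print(mean_of_domains(profile_a), mean_of_domains(profile_b))  # 50.0 50.0
print(sum_of_domains(profile_a))  # 400.0
```

The sum ranges from 0 to 800, which is why summed "total scores" can exceed 100, while the mean stays on the 0-100 scale; neither removes the information loss shown by the two profiles.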
Table 2. Methods for calculating SF-36 Total/Global/Overall Score in 172 articles.
Cited Kosinski et al. 181
Cited Ware and Sherbourne. 182
Cited Ware. 184
Cited Ware and Gandek. 185
Cited Ware et al. 186
Cited Kalantar-Zadeh et al. 167
Cited Hays and Sherbourne. 187
Cited Alonso et al. 188
Cited Ware et al. 189
Cited Brazier et al. 190
Cited Brazier et al. 191
Cited Zabel et al. 192
Cited Brazier et al. 193
Cited Bronfort and Bouter. 159
Cited Ware et al. 194
Some studies summed the eight SF-36 domain scores, yielding Total Scores above 100, such as 524.4 151, 482 152, and 110.4. 153 Other studies even reported that the SF-36 Total Score "was found to have a high internal consistency," with Cronbach's α values of 0.91, 20 0.94, 74 and 0.93. 89
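The internal-consistency figures quoted above are Cronbach's α values. A minimal sketch of the standard formula, treating each domain as an "item", is shown below; the function name and the toy data are illustrative. Because α depends on the number of items and on their average intercorrelation, a high value does not by itself show that the items measure a single construct, so the reported α values do not by themselves justify a single total score.

```python
# Cronbach's alpha from item-level scores.
# ASSUMPTION: the function name and the toy data are illustrative only.
from statistics import variance


def cronbach_alpha(items: list) -> float:
    """items[j][i] is respondent i's score on item j.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))        # identical items
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))  # items that only partly agree
```

With two identical items α equals 1; with two items that only partly agree (the second toy example) it drops to 0.75. Sample variances are used for both the items and the total, so the ratio is consistent.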
Discussion
The SF-36 developers state categorically:
The components analyses showed that there are two distinct concepts measured by the SF-36®—a physical dimension and a mental dimension. Therefore, it is not appropriate to try and come up with one overall score; thus instead the two summary scores are used. 6
Similarly, the WHOQOL-BREF questionnaire does not recommend calculating a single index of quality of life. The instrument has a four-domain structure: Physical health, Psychological, Social relationships, and Environment. Each domain is scored and interpreted separately, and the calculation of a single/total/global/overall score of quality of life is not recommended (www.who.int/mental_health/media/en/76.pdf).
In the early 1990s, researchers questioned whether SF-36 scores could be used to generate a valid single index of health-related quality of life. 193 The SF-6D, derived from seven of the eight SF-36 health domains, provides a single index for use in economic evaluation or for determining quality-adjusted life years. The GH domain is excluded, and RP and RE are combined into a single domain.190,191 The SF-6D is a single utility measure, validated in many studies, 195 but, strictly speaking, it cannot be considered a single index of health-related quality of life.
Questionnaires addressing health-related quality of life can measure one or several constructs and are accordingly classified as unidimensional or multidimensional. This classification should rest on empirical demonstration using adequate statistical techniques, such as confirmatory factor analysis or Rasch analysis. Once unidimensionality has been demonstrated, the items that compose the questionnaire can be added to yield a single/total score. Indexes derived from multidimensional questionnaires remain controversial. Some researchers argue that if a questionnaire is multidimensional, the items grouped into different scales must be scored and interpreted separately; this interpretation respects the questionnaire's theoretical structure. Other researchers, however, feel free to create a total/global/overall score from a multidimensional questionnaire using factor analysis or preference-based methods. Because this procedure does not respect the different nature of the component dimensions, it can be criticized as "adding apples and oranges."
The exact balance between the physical and the mental components, and their respective contributions to health-related quality of life, will probably remain unknown. Some studies 177-179 we reviewed in this article calculated a single index by averaging the physical and mental components. By doing so, they implicitly assumed that the best (100%) measure of health-related quality of life would result from a "perfect equilibrium" between the physical (50%) and the mental (50%) components.
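The equal-weighting assumption can be made explicit with a small sketch. The weight `w_physical` and the two patients' PCS and MCS values are hypothetical; the point is only that the ranking of two patients can reverse when the weight moves away from 0.5, so any single index built this way inherits an arbitrary choice of weights.

```python
# A hypothetical single index built from the two SF-36 summaries.
# ASSUMPTION: the weight and the patients' PCS/MCS values are invented.


def weighted_index(pcs: float, mcs: float, w_physical: float = 0.5) -> float:
    """Return w * PCS + (1 - w) * MCS; w_physical = 0.5 is the simple average."""
    if not 0.0 <= w_physical <= 1.0:
        raise ValueError("w_physical must lie between 0 and 1")
    return w_physical * pcs + (1.0 - w_physical) * mcs


patient_a = (30.0, 60.0)  # low physical summary, high mental summary
patient_b = (50.0, 45.0)  # moderate on both

for w in (0.5, 0.3):
    a = weighted_index(*patient_a, w_physical=w)
    b = weighted_index(*patient_b, w_physical=w)
    print(f"w_physical={w}: A={a:.1f}, B={b:.1f}")
# w_physical=0.5: A=45.0, B=47.5
# w_physical=0.3: A=51.0, B=46.5
```

Under the simple average, patient B appears "healthier"; giving the mental summary more weight reverses the order, although neither patient's underlying summaries have changed.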
Statistical strategies must be used critically and parsimoniously in the evaluation of health measures. Analyses of SF-36 dimensionality show cross-loadings among the eight domains (https://campaign.optum.com/content/dam/optum/resources/Manual%20Excerpts/SF-36v2-Health-Survey-Measurement-Model.pdf) and also between the Physical and the Mental Component Summaries.185,197,198 These findings could be taken to suggest an underlying unidimensionality that would support the calculation of a single index of health-related quality of life. A review of the studies addressing SF-36 unidimensionality would be welcome.
One study 74 calculated an SF-36 Global Score even after a factor analysis had extracted two main dimensions (physical and mental).
In 2005, a study 180 used structural equation modeling to investigate the SF-36 structure in a population-based sample of adults older than 18 years living in Athens, Greece. The full model included the eight first-order factors (PF, RP, BP, GH, VT, SF, RE, and MH), three second-order factors (PCS, MCS, and "well-being," based on the GH and VT domain scores), and a third-order factor. This third-order factor would correspond to an SF-36 Overall Score, "indicating that all SF-36 responses address a single underlying phenomenon: health." Nevertheless, the study confirmed the multidimensional structure of the SF-36 and did not recommend the use of an SF-36 Overall Score. Instead, the authors recommended using the eight SF-36 domains together with the second-order factors (PCS, MCS, and "well-being") to measure health-related quality of life.
Some instruments do recommend the calculation of a single index of quality of life, such as the PedsQL (Pediatric Quality of Life Inventory (http://www.pedsql.org/score.html)), the EuroQol-5D (http://www.euroqol.org/), and the 15-D (http://www.15d-instrument.net/15d). Commenting on the adequacy of these instruments' validation and on the interpretation of their total indexes is beyond the scope of this article.
The SF-36 questionnaire was designed to measure two different constructs of health-related quality of life: the Physical Component and the Mental Component. By the end of the 1990s, after an exhaustive and sophisticated validation process, the SF-36 developers concluded that their questionnaire was adequate for measuring these two constructs. 185 However, they never proposed, and in fact disapproved of, using the SF-36 to build a single index of health-related quality of life. Subsequent analyses of SF-36 dimensionality in general populations confirmed the extraction of these two main factors (Physical and Mental).74,196,199,200
Fairly high fees must be paid for each administered SF-36 questionnaire 5 and for the use of the scoring software.2,3 Researchers from low-income countries may have limited means to pay these fees. However, the great majority (72.4%) of the studies we identified came from high-income countries, classified according to the World Bank criteria: Gross National Income per capita of US$12,736 or more (http://data.worldbank.org/about/country-and-lending-groups#High_income).
This review identified a large number of studies that calculated the "SF-36 Total/Global/Overall Score." These studies came from countries all over the world; 30.0% of them were published in journals with an Impact Factor of 3.00 or greater, and 64.0% were published in the last 6 years.
Whether or not to calculate an SF-36 Total Score could be regarded as a matter of differing standpoints. However, we can argue against this position. Only a tiny proportion of published papers use the SF-36 Total Score. Applying our data to the 9722 papers found in a PubMed search using the query "SF36 health survey," we found that only 1.8% (172/9722) calculated an SF-36 Total Score. Even allowing that some of these 9722 papers had no interest in calculating a global measure of health-related quality of life, it is reasonable to assume that this standpoint is outside mainstream scientific practice. Moreover, papers reporting the SF-36 Total Score did not provide scientific arguments to support this measure.
This review identified at least nine different ways of calculating the SF-36 Total Score (Table 2). It is difficult to conceive that all nine would provide the same (and valid) measure. If this alternative standpoint were accepted, another question would arise: what would be the correct way of calculating the SF-36 Total Score?
In our opinion, calculating an SF-36 Total/Global/Overall Score introduces a measurement bias (a systematic error) that can lead to a measure with poor validity, validity being defined as "the degree to which a health related-patient reported outcomes (HR-PRO) instrument measures the construct(s) it purports to measure." 201
If calculating an SF-36 Total/Global/Overall Score is an error, it may contribute to building a body of knowledge that lacks the necessary scientific basis. Identifying the real implications of such an error for clinical decision making, guideline development, and patients' lives lies beyond the scope of this study. However, researchers should be aware of this fact and of its possible impacts upon human health.
Study results based on a measure of questionable validity (the SF-36 Total Score) may harm individual and community health and waste public and private resources. It is difficult to evaluate the magnitude of the impact of the methodological errors in the studies we identified. In our scoping review, we identified five systematic reviews that aimed to consolidate knowledge about the quality of life of patients with acute coronary syndrome, 33 pain after spine surgery, 164 iron-deficiency anemia under treatment, 43 osteoarthritis, 100 and movement disorders treated with deep brain stimulation. 63 However, all five considered an SF-36 Total/Global Score, a measure that needs a more consistent scientific basis. Researchers, editors, and reviewers of scientific journals should take responsibility for the trustworthiness of their research and for the preservation of research integrity.
Strengths and limitations
One limitation of this review is that the terms "SF-36 total score," "SF-36 global score," and "SF-36 overall score" are not indexed in all the databases we used. This may partly explain why fewer articles were retrieved from these databases than were obtained from article reference lists or through Internet searches. Another limitation is that our study was restricted to five databases. Despite these limitations, our review identified a substantial number of studies dealing with the SF-36 Total/Global/Overall Score.
Conclusion
The SF-36 Total/Global/Overall Score, a global measure of health-related quality of life, has been increasingly reported in the scientific literature. Many studies using this measure were published in highly prestigious journals. However, its validity as a measure of overall health-related quality of life can be questioned. Such a total measure may contribute to building a biased body of knowledge.
Footnotes
Acknowledgements
We would like to thank Flávia Catarino Conceição Ferreira, librarian from the Reference Sector, Unified Health Library, Federal University of Bahia, for technical help in planning the scientific literature searches. We also thank Prof. Kionna Oliveira Bernardes Santos, from the Federal University of Bahia, for useful criticism of the final version of the manuscript.
Declaration of Conflicting Interests
The authors declare no financial or non-financial competing interests with organizations that hold copyrights and trademarks of questionnaires, particularly the SF-36. The findings of this study were not discussed with the SF-36 developers or with the SF-36 organization.
Ethical approval
Ethical approval was not sought for this study because it did not involve human participants directly, only previously published data.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: F.M.C. received a research fellowship from the Brazilian National Research Council, CNPq (304563/2014-5).
Informed consent
Informed consent was not sought for this study because it did not deal with human beings directly.
