Abstract
Objectives:
To use the history of the Karnofsky Performance Scale as a case study illustrating the emergence of interest in the measurement and standardisation of quality of life; to understand the origins of current-day practices.
Methods:
Articles referring to the Karnofsky scale and quality of life measurements published from the 1940s to the 1990s were identified by searching databases and screening journals, and analysed using close-reading techniques. Secondary literature was consulted to understand the context in which articles were written.
Results:
The Karnofsky scale was devised for a purpose other than measuring quality of life: as a standardisation device that helped quantify effects of chemotherapeutic agents less easily measurable than survival time. Interest in measuring quality of life only emerged around 1970.
Discussion:
When quality of life measurements were increasingly widely discussed in the medical press from the late 1970s onwards, a consensus emerged that the Karnofsky scale was not a very good tool. More sophisticated approaches were developed, but Karnofsky continued to be used. I argue that the scale provided a quick, simple and approximate assessment, made by physicians, of the ‘soft’ effects of treatment, overlapping with but not identical to quality of life.
Introduction
Monitoring and assessing the quality of life of patients being treated over long periods of time in ways that are practical, reproducible and meaningful has been a focus of much interest and debate in the management of chronic conditions over the past few decades. It is perhaps surprising that the earliest devices used to quantify and standardise quality of life were developed for trials of experimental chemotherapy involving patients suffering from terminal cancer. In this article, I will trace the history of the numerical scale devised in the late 1940s to record what its inventors, David Karnofsky and Joseph Burchenal, termed the ‘performance status’ of cancer patients. 1 The Karnofsky scale covered 11 stages from 100 (normal health) to 0 (death). An even simpler, five-grade performance scale was developed by Zubrod et al. 2 for the Eastern Cooperative Oncology Group (ECOG) in the 1950s and endorsed by the World Health Organization (WHO) in a 1979 handbook. 3 Interestingly, their inventors did not conceive of these scales as devices to record quality of life. The scales were used only sporadically in the 1950s and 1960s, and more or less exclusively in the context of experimental cancer chemotherapy. Only in the 1970s were they reinterpreted, retrospectively, as simple quality of life indices. I suggest that historical case studies such as the one I present here can help us understand the origins of current-day clinical practices.
In this article, I will discuss two stages in the history of the Karnofsky performance scale. In the first part, I will discuss the scale in the context in which it was developed, the early days of experimental cancer chemotherapy, as a standardisation device that helped quantify the effects of cytotoxic chemicals on patients. In the second part, I will address how and why, in the 1970s and 1980s, the scale turned into a (somewhat crude) marker of quality of life. As I will point out, the frequency of publications in medical journals on the quality of life of cancer patients rose markedly at a time of increased interest in measuring quality of life more broadly. This went along with much more frequent references to performance scales, as well as calls for other ways of assessing quality of life that were deemed more adequate. 4 –7 I will address changes in interactions between cancer professionals and patients in the 1970s, which accompanied and informed the new interest in measuring the quality of life of patients undergoing highly interventionist and often debilitating courses of therapy, at a time when for some cancers remission appeared to turn into cure. 8
Methods
Standard historical methods were applied: articles and books referring to performance scales and quality of life measurements published from the 1940s to the 1990s were identified by searching databases and screening journal literature. PubMed (keyword) and ScienceDirect (full text) searches for both ‘quality of life’ and ‘performance status’ were performed in August 2011. Texts were analysed using close-reading techniques. Suitable secondary literature was identified on broader developments in medicine and wider society during the time span covered, and consulted in order to understand the context in which performance scales were developed and concerns with quality of life emerged.
Results
1940–1970: standardisation of marginal effects
The Karnofsky scale had its origins in experiments with nitrogen mustard (a close relative of the mustard gases used in chemical warfare). The man who gave his name to the scale, David Karnofsky, received his medical degree from Stanford University in 1940. As a resident at the Collis P. Huntington Memorial Laboratory at Harvard University, he became interested in clinical cancer research. During the Second World War he joined renal physiologist Homer Smith’s team at New York University for secret research on the biological effects of mustard gases, funded by the US Office of Scientific Research and Development. As a first lieutenant with the Army Chemical Warfare Service under Cornelius Packard Rhoads, Karnofsky spent months studying what happened to goats exposed to mustard gas. Rhoads was an influential figure in the history of cancer chemotherapy. 9 Having begun his professional career in haematology at the Rockefeller Institute in the 1920s, Rhoads had been appointed director of the Memorial Hospital for Cancer and Allied Diseases in New York in 1940, just before he became Chief of the Medical Division of the Chemical Warfare Service. 10 The budget of the Chemical Warfare Service, one million dollars per annum, was roughly the same as the combined budget for cancer research in the US at the time. 11 While still in the army, Karnofsky became interested in exploring the potential of nitrogen mustard as an anticancer drug.
Karnofsky was assigned to temporary duty at Memorial Hospital to initiate a clinical study using nitrogen mustard. Following his discharge from the Army, he returned to Memorial, like many others who had served under Rhoads in the Chemical Warfare Service. He joined Rhoads as one of the core staff members at the Division of Chemotherapy Research at the new Sloan-Kettering Institute, an institution whose defining feature was, according to Karnofsky’s colleague Joseph Burchenal, that ‘biochemical and animal studies could be immediately translated to practical application in the patient’. 12 Karnofsky spent his whole career at Memorial Sloan-Kettering, culminating in a position as chief of the chemotherapy division of the institute and the Medical Oncology Service at Memorial Hospital. Karnofsky – a life-long non-smoker – died in 1969 from lung cancer, a disease that had been the focus of much of his work in the 1940s.
The performance status scale that came to carry Karnofsky’s name was developed in the late 1940s for a trial of nitrogen mustard in the treatment of lung cancer patients. The historian Robert Bud points to another factor that made the Sloan-Kettering Institute different from earlier institutions: the industrial work models applied to cancer research. 11 The institute was named after the President of the General Motors Corporation, Alfred P. Sloan, a trustee of the hospital who donated four million dollars, and his Director of Research, Charles F. Kettering. By suggesting that the institute should carry Kettering’s name besides his own, Sloan signalled that he saw industrial R&D as a model for medical science. A press release on the occasion of the opening of the institute in 1945 announced that it was to ‘concentrate on the organization of industrial techniques for cancer research’. 11 It is fitting, then, that Sloan-Kettering researchers engaged in attempts to standardise aspects of clinical medicine that previously had not been subject to much standardisation.
The use of nitrogen mustard for cancer chemotherapy is often viewed as the result of a chance finding following the bombing in 1943 of an American ship in the Italian port of Bari, a ship that carried some of the poisonous chemical in its hull. Army doctors, the story goes, observed that soldiers and sailors pulled out of the waters at Bari after the bombing showed alarmingly low white blood cell counts, and these doctors then made the connection with the chemicals in the water. 13 In fact, as Ilana Löwy 14 points out, the capacity of certain chemicals to kill white blood cells selectively had been observed much earlier. However, this had not been the object of systematic investigation until 1941, when a team at Yale University led by the pharmacologists Alfred Gilman and Louis Goodman obtained funding for such a study from the Office of Scientific Research and Development. They found that injections of nitrogen mustard induced remissions in laboratory mice with lymphoma, and in 1942 they also managed to induce remission in a human lymphoma patient. 15 These results were not published until 1946, but they were discussed at medical meetings and confirmed by other groups. Nitrogen mustard was a highly toxic substance, and not very specific, but the limited success with this poison – especially in the treatment of lymphoma patients – boosted the search for better chemotherapeutic agents. It also encouraged clinical cancer researchers to try nitrogen mustard as a treatment of last resort for other malignancies, including lung cancer.
Performance status as defined by Karnofsky, Burchenal and their colleagues in 1948
Reproduced with kind permission from Wiley. 18
Use of performance status along with criteria for objective and subjective improvement by Karnofsky and colleagues in their trial of nitrogen mustard in patients with carcinoma of the lung
Reproduced with kind permission from Wiley. 18
A similar but even simpler scale was devised by Zubrod et al. 2 for the Eastern Cooperative Cancer Chemotherapy Group (later Eastern Cooperative Oncology Group, ECOG) about a decade later. The Group was formed in 1955 under the sponsorship of the US National Cancer Institute (NCI) and its Clinical Director at the time, Zubrod. A member of the same generation as Karnofsky and Burchenal, Zubrod, too, was socialised into clinical research during the war. Working under the future director of the US National Institutes of Health (NIH), Dr James Shannon, Zubrod was involved in trials of anti-malarial therapies. 9 In 1946 he joined the staff of Johns Hopkins University School of Medicine, working in the departments of Pharmacology and Medicine. During a period when the first antibiotic treatments were developed, he was part of a team researching bacterial pneumonia. After a brief stint at St Louis University, he accepted the position at the NCI, where he was expected to develop the programme in cancer chemotherapy. He was involved in developing some of the basic methodologies used in modern cancer chemotherapy trials. Performance status was one of several parameters entered under ‘Patient Reaction’ into progress report forms. On a scale from 0 (normal activity) to 4 (unable to get out of bed), it was meant to record the level of performance of which a patient was capable.
Why did these pioneers of cancer chemotherapy and cancer clinical trials feel the need for another means of measuring the effect of treatment, in addition to subjective and objective improvements? While Zubrod does not address this question, Karnofsky and Burchenal are explicit about their reasons:
The fact that subjective and objective evidence of improvement can occur in a patient, while the patient remains bedridden, has suggested to us the need for another criterion of effect. This has been called the performance status, or PS. It is a numerical figure, in terms of percentage, describing the patient's ability to carry on his normal activity and work, or his need for a certain amount of custodial care, or his dependence on constant medical care in order to continue alive. These simple criteria serve a useful purpose, in our experience, in that they measure the usefulness of the patient or the burden that he represents to his family or society. 1
This passage suggests that performance status – a means of measuring the levels of ability and disability experienced by patients after treatment rather than merely the effects of drugs on tumours – is linked to the concerns with military and industrial efficiency which Robert Bud has found to characterise modern cancer research in the immediate postwar era. Performance status, however, was not recorded extensively in the 1950s and 1960s, not even by organisers of cancer clinical trials. Neither was there much explicit concern over the ‘quality of life’ of patients: the term simply does not appear, either in medicine or elsewhere.
1970–1990: the discovery of quality of life
‘Quality of life’ was a new concept in the 1970s, much discussed both in the public arena and by scholars. Interestingly, nobody seemed to know where the term had originated, not even the participants in what was probably the first major international meeting devoted to comparative studies of life quality, held during the Ninth World Congress of Sociology in Uppsala, Sweden, in August 1978. 20 Some of the older participants faintly remembered having first encountered the expression in the late 1950s or early 1960s, not in scholarly contexts but in pamphlets or newspapers, mostly in association with issues of environmental pollution or the perceived deterioration of living conditions. 21
While its origins may be unclear, the term was clearly associated with some of the major social shifts of the 1960s, namely the rise of environmental concerns and the question of whether there were limits to progress. But how did concerns over quality of life find their way into cancer medicine, a domain dominated by an ideology of progress and intervention at all costs? ‘Assessment of quality of life and, more generally, cost-benefit evaluations in the health area have become a subject of increasing interest in recent years’, wrote the epidemiologists Bardelli and Saracci 4 in an article for the journal of the International Union against Cancer, the UICC Technical Reports, published in 1978, the year of the Uppsala symposium. Comparing data from trials published in six cancer journals with wide international circulation in 1965 and 1966, and in 1975 and 1976, Bardelli and Saracci found, not surprisingly, that randomised clinical trials had become more common in the 1970s, both in relation to non-randomised trials and as a proportion of all papers published. More surprising to them was that the proportion of trials whose organisers chose ‘hard’ parameters such as survival time as endpoints had increased from 33% to 78%. They had expected a shift towards ‘soft’ parameters, in line with the increased interest in quality of life.
The use of what Bardelli and Saracci took to be indicators of quality of life, direct or indirect, had in fact increased as well, but far less than that of ‘hard’ parameters. Bardelli and Saracci interpreted records of side effects or of disease-free intervals as indirect indicators of quality of life. Direct indicators, to them, were the performance status scales devised by Karnofsky and Burchenal or by ECOG, or some closely derived variations. These were used in 6% of the trials examined for 1965–1966 and in 10% for the years 1975–1976. While this may seem low, the article by Bardelli and Saracci is in itself evidence of the new interest in assessing the quality of life of patients. Indeed, in a 1983 review article in Statistics in Medicine, Fayers and Jones found more than 200 papers measuring ‘quality of life’ in cancer clinical trials between 1978 and 1980; 6 the author of a Lancet editorial in 1991 counted 207 papers in Index Medicus with the keyword ‘quality of life’ for 1980 alone, and 846 for 1990. 7
Bardelli and Saracci offer no reflections on the accuracy and appropriateness of performance scale type measurements for representing quality of life – they appear to be glad to find any parameters at all in 1960s trials that they could interpret as quality of life measurements. This situation changed in the 1980s. According to the authors of a paper published in the British Journal of Cancer in 1988, the need to measure quality of life during clinical trials of cancer therapy was now ‘widely recognised’, as treatment was often toxic and frequently given with palliative rather than curative intentions. 22 Hang on: had not Karnofsky and Burchenal in the late 1940s also argued that their experiments with nitrogen mustard – a very toxic chemical – had had predominantly palliative effects? What had changed?
There had been major developments both in cancer chemotherapy and in palliative care. Medical oncology, the medical specialism organised around chemotherapy and clinical trials, somewhat marginal in the 1960s, was now increasingly firmly established. 23 With radical courses of chemotherapy, often involving combinations of drugs, medical oncologists had managed to reduce recurrence and increase survival rates for leukaemia, lymphomas and some solid tumours (predominantly in children), enough to support claims that these diseases had now become curable. 8 Having established chemotherapy firmly as a treatment modality, medical oncologists in the 1980s and 1990s increasingly turned to solid cancers in adults, with some success, but also much disappointment. Lung cancers, for example, continued to defy them. Nevertheless, courses of chemotherapy became part of many lung cancer patient pathways – usually towards the end. 19 While benefits measurable in terms of hard parameters such as survival time were modest, medical oncologists argued that survival was not the only goal, that patients wanted something to be done, and that, indeed, for some patients performance status improved. There were, however, as Nick Thatcher lamented in 1995, few formal studies focusing on quality of life. To Thatcher, as to others before him, performance status was a valid measure of quality of life. 24
Lung cancer also illustrates some changes in the meanings associated with palliation. While chemotherapies – like radiotherapy – rarely succeeded in making lung cancer patients live significantly longer, they alleviated some of the symptoms caused by the cancer; they were of palliative value. Palliation, however, acquired a new set of meanings in the 1970s, in association with the hospice movement. 25 Palliative care was now about more than just relieving symptoms. It was about enabling patients to die dignified deaths, with spiritual concerns moving centre stage. This is not something that Karnofsky and Burchenal would have recognised as part of their remit. To them the ultimate goal was survival; only if they could not record any significant differences in survival did they resort to ‘soft’ parameters. ‘The achievement and triumphs that may occur in the fight against cancer will come from doctors that do too much’, Karnofsky argued in 1960, emphasising that intervention in the interest of prolonging life should always have priority. 26 The new forms of palliative care, with their spiritual take on what ‘quality of life’ was and their emphasis on relieving pain and other symptoms while letting go of life, provided different answers to patients’ wishes that something be done, and they required different approaches to assessment. Quality of life now also included quality of death. However, even in this new context, the Karnofsky scale continued to be used. 27
Concerns over the appropriateness of different approaches to measuring that elusive thing, quality of life, moved centre stage in publications in the 1980s, and despite much criticism, the Karnofsky scale continued to deliver what many trial organisers required: a rough-and-ready quantitative assessment, made by physicians, of the ‘soft’ effects of treatment. But it offered a rather poor approximation of how patients actually felt. 22 Most other assessment instruments took the form of questionnaires, sometimes developed with input from psychologists and completed mostly by doctors, nurses or social workers in conversation with trial subjects. 6,28,29
Much of the criticism, in fact, was related to questions of agency: should quality of life be recorded by doctors, or better by nurses, or, indeed, did the patients themselves have to be enrolled? 22 Such concerns resonated with changes in the status of nurses as, increasingly, experimental treatments became part of routine pathways. They were also related to the changing status of patients themselves, in light of patients’ rights campaigns and the new view of patients as consumers. 30,31 An example of an alternative that incorporated the patient’s perspective was a diary card that had to be filled in by the subjects of clinical trials themselves and that by the early 1980s was used in several cancer treatment studies organised by the Medical Research Council (MRC). 6 It was devised by the MRC Tuberculosis and Chest Diseases Unit, which since the 1950s had played a pioneering role in the development of techniques and technologies for clinical trials. 32
Like the Karnofsky index, this was a simple instrument: a pre-folded, compact card asking a few simple questions and using 5-point scales to record the answers. It was born out of necessity – most trial subjects were now treated as outpatients and went home after each course of chemotherapy – as much as out of the recognition that patients may be able to report on their experiences more adequately than their physicians. The word ‘cancer’ did not appear on the cards because, even in the 1980s, some patients were unaware of what they were suffering from – or preferred not to think about it. 33 The card addressed ‘obvious and relatively objective side effects’, such as nausea and vomiting, as well as less objective questions regarding the subjects’ mood and general condition. Its designers recognised the ambiguity and context-dependency of categories such as ‘average mood’, suggesting that ‘perhaps “normal for me before my present illness” would be preferable’. However, they found, ‘such distinctions [were] difficult to convey succinctly and clearly on a simple card or form’. This statement points to a dilemma central to all such forms of assessment: if they were to be useful, they had to compromise between efficiency and adequacy. A long and very detailed form would reflect the feelings and perceptions of trial subjects more adequately, but would be impractical. Performance indices, in fact, remained among the most practical forms of assessment: physicians did not even have to talk to patients to score them. To be sure, most agreed that performance scales were poor, unreliable instruments for assessing quality of life. 34 However, being quick, easy and intuitive, they continued to be used where trial subjects were treated as inpatients and remained under observation by clinicians.
Discussion
The changes in the tools and techniques viewed as legitimate means for assessing quality of life reflect broader changes in medicine. In the rather patriarchal 1940s, if one wanted to know how a patient felt about the effects of an experimental treatment, who did one ask? The expert, of course: the physician administering that treatment. Whom would one turn to following the challenges to established authorities that marked the late 1960s and 1970s? That was less clear. Doctors? Nurses? Or patients themselves, some of them increasingly vocal and well informed? In any case, the goal was to find an instrument that translated feelings, perceptions, hope, and in some cases very significant implications for a person’s identity into numerical values on a scale, fit to be used in statistical calculations. There was no really good quality of life questionnaire; they all just produced scores that were more or less fit for the purpose. The Karnofsky scale did the job it was designed for in the 1940s, but it was no longer entirely appropriate by the 1970s, after the end of what historians have described as the ‘Golden Age’ of modern medicine, in a medical world that had become much more pluralistic, with doctors no longer unchallenged with regard to treatment decisions. 35
The proliferation of chemotherapeutic regimes and their inclusion in more and more patient pathways may make it necessary to think about quality of life and performance issues in new ways. Children treated for leukaemia in the 1970s and 1980s, when the disease was declared curable, have grown into adults and found that they needed to develop means to manage the late effects of their treatment. 8 Patients diagnosed with solid cancers that only a few years ago used to lead to fairly rapid deaths now quite frequently survive for 5 years or more. 36 A recent, high-profile example is the American feminist writer and academic Susan Gubar, who was diagnosed with ovarian cancer in 2008 and has published a memoir on her experiences. 37 While Gubar and others in similar situations know that their cancers are ultimately not curable, these cancers become manageable in ways that make them comparable to conditions which were transformed from acute to chronic through the introduction of new treatment regimes years or even decades ago, such as diabetes, 38,39 kidney failure, 40 or more recently HIV/AIDS. 41 Rather than labelling these patients as ‘survivors’, as in much of the campaign literature produced by cancer charities, 31 we may want to conceive of them as chronic illness patients. 42
I have argued that the Karnofsky scale was originally conceived as an instrument to measure performance rather than quality of life, or in the words of Karnofsky and Burchenal: ‘the patient's ability to carry on his normal activity and work, or his need for a certain amount of custodial care, or his dependence on constant medical care in order to continue alive’. 1 Questions of performance, and thus of the need for paid or unpaid care, moved somewhat to the background in the quest for reliable quality of life questionnaires for the purposes of clinical trials. However, with more patients living with terminal cancers for increasingly long periods of time, in that grey zone between the management of a chronic condition and end of life care, these issues become increasingly pressing. One example is the current debate over changes to the welfare system in England and Wales, which may make it impossible for cancer patients undergoing some forms of treatment to claim benefits. 43 The transformation of some cancers into chronic conditions is a fairly recent development, which does not appear to have received much attention in discussions of care and welfare provision. We may be able, however, to draw on the histories of ‘older’ chronic conditions such as diabetes or kidney failure as models. 38 –40
Ultimately, what constitutes quality of life will be different from person to person, and people may change their minds at different stages of their lives. Authors have commented that ‘[w]hat constitutes quality of life is a personal and individual question which lends itself to a philosophical rather than a scientific approach’. 22 There may be times and situations, for example, when survival is everything. A young cancer patient, possibly with small children, may prioritise survival over aspects of personal wellbeing. An elderly, frail patient may set different priorities. Depending on the ways in which treatment and care are funded, financial concerns may also form part of the equation. The author of an insightful Lancet editorial on the issue in 1991 quotes Aristotle, illustrating that this is by no means a new observation:
When it comes to saying in what happiness consists, opinions differ … and often the same person actually changes his opinion. When he falls ill he says it is his health and when he is hard up he says that it is money. 7
Whether we call it happiness, quality of life or even performance status, this issue, like so much else in modern medicine, is inextricably linked to cost-benefit calculations. Organisers of clinical trials and health care analysts may continue to debate the best ways of measuring and standardising this elusive entity, but which costs are considered acceptable and which benefits achievable and desirable is, and has long been, dependent on biographical and historical contexts. Nevertheless, the transformation of some cancers into chronic conditions needs to be addressed by policy makers, and more or less adequate attempts to measure performance and quality of life will continue to play an important role.
Footnotes
Acknowledgments
I would like to thank my colleagues Michael Worboys, Robert Kirk, Stephanie Snow, David Thompson, Elizabeth Toon and Duncan Wilson, as well as two anonymous referees, for useful comments on earlier versions of this paper. The work was first presented to the ESF-funded Drug Standards, Standard Drugs workshop, ‘The view from below: On standards in clinical practice and clinical research,’ at the Charité Institute for the History of Medicine, Berlin, in September 2012.
Funding
This work was supported by the Wellcome Trust (Grant Numbers 068397 and 092782).
