Abstract
This article offers a new reading of the historical origins, significance, and legacy of the US–UK Diagnostic Project (1965–75), a series of influential collaborative studies that tested the ability of psychiatrists on either side of the Atlantic to diagnose schizophrenia. Using archival materials to trace the contest between two of the key players in the Diagnostic Project, Aubrey Lewis and Morton Kramer, it explores how the methodological alliance between biostatistics and clinical psychiatry was forged in a decade in which psychiatry was undergoing a public crisis. The article unpacks how the place of diagnosis in psychiatry was fundamentally transformed through this important but overlooked collaboration, in which computational methods were used strategically to steer post-war psychiatry from crisis to consensus.
In 1970 two groups of psychiatrists, one made up of clinicians from the UK, the other from New York, were shown videotapes of psychiatric patients and asked to make a diagnosis (Kendell et al., 1971: 123). While they seemed to reach similar conclusions on some patients, on others there were serious disagreements. In one particular case the disagreement was extreme. Below is the description of ‘Patient F’ provided in the published study report:
This 30-year-old bachelor from Brooklyn had been hospitalized briefly several times, had no close friends, and had rarely held a steady job. He described having once had a hysterical paralysis of his arm and gave a vivid account of the fluctuations in his mood and morale and of his willingness to abuse alcohol or drugs whenever the opportunity arose. (ibid.: 125)
David Healy: How much of an impact did the US/UK study have?
Robert Spitzer: Oh it had an important impact in that it showed that differences in national trends were due to different conceptions of what the diagnosis meant. And that inevitably led to justifying diagnostic criteria. So yes that was very important. But also we were treated with total contempt by the British psychiatrists.
David Healy: You were the guys who couldn’t make a diagnosis.
Robert Spitzer: We didn’t know how to make a diagnosis. John Wing and Cooper and those guys. So I think when DSM came along they were still dismissive. And of course they had total control of the ICD [International Classification of Diseases]. The ICD always looked towards the Maudsley trained guys. (Healy, 2000: 422)
This latter aspect of the US–UK Diagnostic Project (DP) is less well understood by historians of psychiatry. The DP typically features as a milestone in the post-war shift in American psychiatry from a psychosocial paradigm to a more empirical and biological one. In Shorter's influential A History of Psychiatry, the DP is briefly mentioned as having ‘made it clear that the two countries were badly out of sync’ and as providing ballast for the diagnostic reformers (Shorter, 1997: 143). In Hannah Decker's history of DSM-III and Allan Horwitz's more recent history of the American diagnostic manual, the project receives only brief mention (Decker, 2013: 163; Horwitz, 2021: 56). This is surprising given that one of the key incentives for producing DSM-III was to make it compatible with the ICD, and that the revision processes of both classifications were dominated by figures who had participated in the DP. By and large, the US-centric historiography has failed to dwell on the importance of this collaboration.
Using archival materials that shed some light on the origins of the DP, I want to look more closely at this scientific collaboration in order to bring out the competing aims of the two sides. The DP emerged to answer a very practical question: could mental hospital statistics be trusted? But it also came to answer a more epistemic question: who could really diagnose schizophrenia? Rather than providing final answers to these questions, I will argue, the DP exemplified a new way of testing the trustworthiness of mental hospital statistics through carefully designed statistical comparisons. This new way of testing how psychiatrists diagnosed fundamentally shaped the process of diagnostic reform throughout the 1970s.
My discussion of the DP will focus on two of its key players: Aubrey Lewis and Morton Kramer. As others have argued, it was Lewis who was responsible for introducing a descriptive approach to the ICD reforms and who played a key role in the early history of post-war diagnostic reform (Fulford and Sartorius, 2009). The Chair of Psychiatry at the Institute of Psychiatry from its establishment in 1946 until his retirement in 1966, Lewis was the central figure in shaping and promoting the post-war style of Maudsley psychiatry. The other figure is the biostatistician Morton Kramer, who, along with Lewis, was a key mover in the early period of diagnostic reform. Kramer became director of the Biometrics Branch of the National Institute of Mental Health (NIMH) in 1949, having been trained in statistics and epidemiology. According to Decker, Kramer was at first reluctant to take the job, protesting to the chief of NIMH that he knew nothing about mental health (Decker, 2013: 124). His more clinically engaged colleagues perceived him as a serious epidemiologist focussed strictly on population data, perhaps lacking concern for the individual patient (ibid.). Using archival material and a close reading of the relevant literature, I will explore how Lewis and Kramer pursued competing interests in the DP, and how each, in his own way, succeeded.
The struggle between Lewis and Kramer can be understood in one sense as a contest over the meaning of psychiatric diagnosis, the former understanding it as an act of expert judgement, the latter as a technology for surveying populations. It is well established by historians of science and medicine that in the post-war era individual clinical expertise became subordinated to statistically tested guidelines. As Theodore Porter has provocatively put it, ‘Epistemology, I would suggest, is becoming a subfield of administration, and this is nowhere more significant, or more consequential, than in medicine’ (Porter, 2005: 400). In the DP we see how statistical evaluative categories were used to shift the place of diagnosis in psychiatry and subordinate it to the administrative functions of psychiatry. The significance of the statistical evaluative categories of reliability and validity lay in how they confined evaluation to a specific purpose that had already been professionally defined. In this framework there was no point discussing the reliability of diagnosis per se, or the validity of schizophrenia per se; instead, discussion was forced to focus on the statistical differences between attempts to carry out specific tasks. In this specific sense, psychiatric diagnosis was yoked into a subfield of public health administration. This movement of methods and evaluative categories into psychiatry was cemented by external pressures on the profession. It was in the context of the public crisis of psychiatry from 1965 to 1975 that these evaluative categories, commonplace within psychometrics and statistics from the early 20th century, became epistemological weapons for the defence of psychiatric diagnosis. The lure of computational methods drove much of the excitement around developing simulations of psychiatric diagnosis, but by 1975 it was already clear that these methods were more or less a dead end.
What the attempt to simulate diagnosis ultimately achieved was the transformation of psychiatric diagnosis from a clinical skill into an administrative procedure. This was not because psychiatrists necessarily adopted new standardized diagnostic methods, but because the terms in which diagnosis was evaluated became biostatistical. Put differently, by developing standardized interviews and diagnostic algorithms, these studies laid the foundations for the alienation of diagnostic intelligence and, unintentionally, the slow erosion of diagnostics as a form of psychiatric expertise.
Many commentators overemphasize the impact of these diagnostic classifications or see these new methods as totally transforming the diagnostic encounter. Rather, as I argue here, the DP showed the importance of training and regular communication for generating reliable diagnostic data and, as a corollary, demonstrated the impossibility of scaling up such an approach to routine data collection in mental hospitals. Actually monitoring and surveying how psychiatrists diagnosed in practice would require small groups of disciplined diagnosticians using standardized tools. This system of surveillance was complicated and costly, and it demanded a serious level of commitment from the psychiatrists involved.
Crisis, consensus, and computers
In the mid 1960s two researchers funded by the Medical Research Council (MRC) sent a questionnaire concerning the diagnosis of schizophrenia to around 350 psychiatrists in England (Willis and Bannister, 1965). These psychiatrists worked in mental hospitals, teaching hospitals, and psychiatric units in general hospitals. They were given a list of 31 potential schizophrenic ‘manifestations’ or symptoms, including features such as hallucinations, thought withdrawal, and thought disorder, and asked to assess their contribution to a diagnosis of schizophrenia. The questionnaire allowed the psychiatrist to place each symptom in one of five categories: ‘necessary’, whereby presence of the symptom was a constant feature of diagnosis; ‘sufficient’, whereby its presence was decisive in arriving at a diagnosis; both; ‘contributory’; or of ‘no significance’. The results showed that the psychiatrists ‘had not felt able to make the detailed logical discriminations requested’ (ibid.: 1167). Many either rejected the use of such logical categories outright or used them idiosyncratically (ibid.: 1170). The authors concluded that ‘this suggests that this degree of logical differentiation is not acceptable in psychiatric practice’, even though commonly used terms such as pathognomonic implied a logical value for a symptom characteristic of a diagnosis (ibid.).
To their surprise, given the received opinion that ‘psychiatrists’ concepts of schizophrenia and treatment show continual variation’ and the ‘often voiced assertions that the diagnosis of schizophrenia is largely intuitive and idiosyncratic’, they found ‘considerable consensual agreement’ on the ‘major symptoms’ taken into account when diagnosing schizophrenia, and that the ‘symptom hierarchy which appears is in substantial agreement with standard authorities on the subject’ (Willis and Bannister, 1965: 1165, 1170). The symptoms ranked most important for diagnosis were ‘thought disorder’, followed by ‘incongruity of affect’, ‘neologisms’, ‘passivity feelings’, ‘paranoid delusions’, and ‘hallucinations’ (ibid.: 1165). The authors had enlisted the computer facilities at Elliot Medical Automation Ltd. to conduct a cluster analysis of the ranked symptoms in order to detect any hidden groupings, but none were revealed. Cluster analysis measures the similarity between elements on the basis of a set of ratings of those elements; in this instance, it used the psychiatrists’ ratings of symptoms to assess how similarly those symptoms had been rated. The authors concluded that while there was agreement on the most important symptoms for diagnosis, there was no pathognomonic symptom that guaranteed a diagnosis of schizophrenia, nor was there an agreed core grouping of schizophrenic symptoms. This piece of diagnostic reconnaissance suggested that psychiatrists were not logical machines and that these respondents were largely dismissive of demands to diagnose with logical precision. British psychiatrists diagnosed on the basis of symptom hierarchies learnt from textbooks and clinical instruction.
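The kind of cluster analysis described above can be sketched in a few lines. This is a minimal illustration, not a reconstruction of the analysis run on the Elliot Medical Automation computer, whose exact method is not described here. It assumes average-linkage agglomerative clustering, a mean-absolute-difference distance between rating profiles, and a merge threshold, all chosen for illustration; the ratings, the 0 to 3 coding, and the filler symptom ‘depressed mood’ are invented, and the function names are hypothetical.

```python
from itertools import combinations

# Invented ratings for illustration only (not Willis and Bannister's data):
# each symptom is scored by the same five hypothetical respondents,
# coded 0 = 'no significance' up to 3 = 'sufficient'.
RATINGS = {
    "thought disorder":      [3, 3, 2, 3, 3],
    "incongruity of affect": [3, 2, 2, 3, 2],
    "neologisms":            [2, 3, 2, 2, 3],
    "hallucinations":        [1, 2, 1, 2, 1],
    "paranoid delusions":    [2, 1, 1, 2, 2],
    "depressed mood":        [0, 0, 1, 0, 0],
}


def distance(a, b):
    """Mean absolute difference between two symptoms' rating profiles."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def average_linkage(c1, c2, ratings):
    """Average distance over all cross-cluster pairs of symptoms."""
    pairs = [(p, q) for p in c1 for q in c2]
    return sum(distance(ratings[p], ratings[q]) for p, q in pairs) / len(pairs)


def cluster(ratings, threshold):
    """Agglomerative clustering: start with one cluster per symptom and keep
    merging the closest pair until no pair is within `threshold`."""
    clusters = [[name] for name in ratings]
    while len(clusters) > 1:
        candidates = (
            (average_linkage(clusters[a], clusters[b], ratings), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        d, i, j = min(candidates)  # ties resolved by lowest indices
        if d > threshold:
            break  # no sufficiently similar pair is left
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


print(cluster(RATINGS, threshold=0.5))
```

With these invented numbers and a threshold of 0.5, only ‘thought disorder’ and ‘incongruity of affect’ merge, while a threshold of 3.0 collapses every symptom into a single cluster. In a sketch like this, whether a grouping ‘appears’ depends partly on where the cut-off is set.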
As the leading British textbook of the time, Clinical Psychiatry, explained, tabulated or algorithmic approaches to diagnosis had been tried in the past and were largely unsatisfactory (Mayer-Gross, Slater, and Roth, 1954: 275). Most psychiatrists, regardless of their school of psychopathology, could in most cases agree on a diagnosis of schizophrenia, which was often made at the first encounter with a patient. Critical to the diagnosis was ‘the total inability of many schizophrenics to have insight into either themselves or their environment’ (ibid.: 270; emphasis in original). The typical British psychiatrist believed that schizophrenia was a progressive and chronic illness; patients without characteristic symptoms might therefore be early cases, and a family history of psychosis made such cases even more suggestive of the diagnosis. Resistance to using a checklist of mental symptoms stemmed from the understanding that only the rarer symptoms were truly pathognomonic; since these were absent most of the time, diagnosis could not rely upon them (ibid.: 275).
Statistical and psychometric approaches to psychopathology had been developed since the 1930s, and with some sophistication by the 1950s, using techniques such as factor analysis. However, these remained professionally separate from the domain of clinical psychiatrists. In the USA they were largely ignored, while in the UK, for instance through Hans Eysenck at the Maudsley, they had more influence (Eysenck, 1955). Nonetheless, the various new classificatory systems proposed did not reflect the training and expertise of practising psychiatrists. A typical response by psychiatrists to such work was to dismiss these classifications as statistical phantoms lacking clinical reality. As one reviewer of the American psychologists Lorr, Klett and McNair's Syndromes of Psychosis summarized in 1964:
It is not an uncommon feature of contemporary psychiatric research to pay great attention to methods that are usually borrowed from other disciplines such as sociology, epidemiology, etc., but to ignore all expertise in psychiatry itself – psychiatric research without psychiatry so to speak. This applies to a certain extent to this work. The sophistication in statistics is matched by an almost complete disregard of clinical psychiatry itself. (Hoenig, 1964: 606)
It is perhaps not surprising then that Donald Bannister, the psychologist who helped design the survey of English psychiatrists, developed the results into an argument that if schizophrenia was logically incoherent, it could not be the object of scientific research: consensus was not enough for good science (Bannister, 1968). Whether it was referred to as a ‘syndrome’ or ‘the schizophrenias’, used as an adjective (e.g. schizoid or schizophreniform), or distributed across other features (e.g. a schizophrenic reaction) or regarded as a ‘field of study’, the result simply obscured the underlying logical incoherence of the concept (ibid.: 182). Bannister's diagnosis was stark: ‘Research into schizophrenia, as such, should not be undertaken’ (ibid.; emphasis in original). As he explained, ‘However viable it is for the clinician as a means of roughly encoding his observations, it falls well short of the rigorous definitional requirements which must be met by concepts used in a scientific context’ (ibid.: 183).
As part of the necessary theoretical shifts prescribed by Bannister, greater attention needed to be given to how specific studies related to the general clinical entity. He attacked existing research for its failure to link ‘conceptual and operational definitions’, whereby features observed in patients, say ‘distractibility’, were linked to experimental operations, such as a formalized test for levels of distraction, but without any explanation of how one got from the general notion of ‘distractibility’ found in patients to the results generated by the test. The question was which formalized tests best captured the patient's clinical picture and how one might decide between them. For instance, research linking biochemical agents to schizophrenia lacked intervening arguments for how one got from the specific agent to the clinical feature. Psychiatric research needed to use operational definitions, and this meant dropping schizophrenia. In concluding his critique, he warned of the dangers of an emerging research machine that produced data but no science (Bannister, 1968: 187).
Bannister argued that diagnosis in psychiatry did not follow demands of logical coherence, but rather obeyed the exigencies of clinical utility. For Bannister, the solution was to drop the concept altogether from research. This conclusion was eagerly referenced by Ronnie Laing and Aaron Esterson in their introduction to Sanity, Madness and the Family as confirming the scientific emptiness of the schizophrenia concept, a position cultivated by anti-psychiatrists such as Laing, David Cooper, and Thomas Szasz (Laing and Esterson, 1971). Bannister's study and his subsequent deployment of its results in an attack on the scientific value of the concept of schizophrenia reflect the climate of the late 1960s, in which psychiatric wisdom and its institutions were subjected to extreme and widespread critiques. But what the survey of English psychiatrists had revealed was that, contrary to frequent complaints in the psychiatric literature and in public that the concept was used arbitrarily, there was in fact a considerable degree of consensus among psychiatrists concerning the symptoms that made up the clinical picture of nuclear schizophrenia. It was this consensus that the DP would mobilize in order to produce a yardstick for testing how psychiatrists diagnosed schizophrenia.
Essential to turning this consensus into a quantifiable value were biostatistical experts who provided the methods to design tests that could evaluate the diagnostic procedure. These figures were experts in the statistical methods and clearly defined criteria that characterized clinical research during and after the Second World War. The use of diagnostic criteria in the influential group of psychiatrists at Washington University in St. Louis most likely came from the psychiatrist Mandel Cohen and the cardiologist Paul Dudley White, who were sponsored by the National Research Council in the 1940s to investigate soldiers with anxiety-related heart problems (Healy, 2002). Given the various names for this problem in use at the time (neurocirculatory asthenia, Da Costa syndrome, effort syndrome, anxiety neurosis, neurasthenia), they started using diagnostic criteria to make explicit which patients they were talking about. The new language of operational definitions, field studies, and strategic projects stemmed from the military context in which medical researchers into psychological disorders worked during the Second World War. One of the first uses of the phrase operational concept in relation to schizophrenia was by the Austrian-British psychiatrist Erwin Stengel, who had been commissioned by the World Health Organization to assess the state of classification in mental illness (Stengel, 1959: 601). Norman Sartorius, a key player in the WHO's work in psychiatric epidemiology and a young recruit to the DP, reflected in an interview that the language and hierarchy of the early WHO were similar to those of the military, with a director general, specific fields, strategies of action, medical debriefs, and campaigns against disease (Sartorius, 2021).
It was the migration of these quantified methods from the social sciences into psychiatry, aided by newly available computational power, wartime organizational structures, and the post-war expansion of clinical research, that shaped the context for the emergence of the DP. As I will argue at the end of this article, in response to the public crisis of psychiatry the researchers who designed the DP in one sense embraced the most radical idea of the era, that mental disorders might not exist, that they were unproven theoretical constructs, but used this scepticism and the public crisis of trust in psychiatry to legitimize a conservative, biomedical, and distinctively Maudsley approach to classification.
The DP explicitly set out to develop a yardstick for schizophrenia diagnosis, in other words, to turn informal consensus into a specific standard. It featured one of the earliest uses of a computer algorithm to develop a perfectly reliable diagnosis against which psychiatrists could be statistically tested. It strategically used the new evaluative standards of biometrics to highlight diagnostic discrepancies, while positioning its authors as experts capable of resolving the issue. The psychiatric school of thought that had developed at the New York State Psychiatric Institute, which cultivated a psychoanalytic approach, was used as a foil to legitimize the crisis of schizophrenia diagnosis while simultaneously mobilizing the consensus on nuclear schizophrenia.
Some origins of the US–UK Diagnostic Project
As Joseph Zubin observed on his tour of the leading centres for psychiatric research in Europe in the late 1950s, interest in field studies of mental disorders and in developing psychiatric epidemiology was widespread (Zubin, 1961: v). As a direct result of this tour, Zubin, along with Kramer and others, organized a conference in February 1959 on field studies of mental disorders. The conference, which took place under the auspices of the American Psychiatric Association and was funded by NIMH, brought together leading experts in public health, biostatistics, epidemiology, and clinical psychiatry, as well as a philosopher and a sociologist. The hefty conference proceedings are a highly valuable source for the intellectual origins of the DP, and I will discuss only certain key moments. On day one there was a lengthy debate over the difficulties of implementing the ICD in the UK, the only country where it had been made the official classification system. Aubrey Lewis was drawn to argue against those of his colleagues who favoured a simpler classification system, directing attention instead to psychiatrists themselves and their responsibility to diagnose carefully:
I think it's more than instructing them. It's a matter of wrestling with their diagnostic souls. These people are not diagnosing for the purpose of providing international data. They are clinical psychiatrists from all over the country who have to do their work with the customary functions of a psychiatrist before their minds. For the ordinary clinical practitioner, diagnosis is a means of accumulating a portmanteau of information by a short cut.… I think the general run of psychiatrists, who make the diagnoses that Dr. Kramer tabulates and analyzes, are not very interested in the job. They don’t put their heart into making the correct diagnosis because they don’t really feel – to the extent that the general physician does – that it matters.… We cannot hope to improve the material that Dr. Kramer analyzes, unless we convince psychiatrists generally that it's worth their while to spend more time than hitherto on making and recording careful diagnoses.… They have to recognize that they are doing it for a purpose that goes beyond their ordinary clinical duties, it is not just a futile labeling activity to satisfy some administrators in the register office. (ibid.: 103–4)
Not long after the conference Kramer made moves to get such a field trial up and running. By the early 1960s biometric statistics on mental illness were helping to drive a boom in American psychiatric research. The director of NIMH, Robert Felix, had successfully used newly gathered statistics to lobby Congress for greater funding for NIMH's extramural research programme. Funds for extramural research, which had totalled $9 million in 1949, reached $50 million by 1959 and soared to $189 million by 1964 (Grob, 1991: 68). There was no ‘road map’ for this funding; rather, NIMH's policy was to support the best research in any and all fields relating to mental illness (Scull, 2011: 271). In this context Kramer was free to propose funding epidemiological research into mental illness in the UK, no doubt in part to make use of, and therefore retain, the large federal grants the institute received (Decker, 2013: 107, 173).
In June 1961 Kramer presented a paper at the Third World Congress of Psychiatry in Montreal in which, on the basis of mental hospital admission statistics, he argued that there was a discrepancy between how American and British psychiatrists diagnosed schizophrenia (Kramer, 1961). Earlier that year he had made a trip to the UK to try to capitalize on this initial finding. Kramer checked into the Tavistock Hotel, Tavistock Square, on 10 April 1961. 2 In May he met with Himsworth, then secretary of the MRC, as well as other key figures in the MRC and at the Institute of Psychiatry. The most prominent representative of a nascent psychiatric epidemiology in Britain was Michael Shepherd, professor at the London School of Hygiene and Tropical Medicine, who Kramer had hoped would help establish a centre for psychiatric epidemiology in London. Kramer had already met with Lewis and Shepherd in 1950–2 and 1956–7 to discuss the mental hospital admissions data (Kramer, 1969: 9–10). In addition to a new institute, Kramer proposed a project matching British mental health records with census returns in order to gain information on ‘size of family, social class etc.’, arguing that US funds could make up for the shortages of ‘money, staff and machines’ in the General Register Office.
The fast-paced arrival and impressive scope of Kramer's proposals were given weight by NIMH's financial commitment. However, it was unclear on the British side, and in Kramer's proposals, what exactly the purpose of this study would be. 3 Lewis was resistant to having an entirely American-funded institute (with at least 50% American students) for Psychiatric Epidemiology in London. In a letter to Sir Harold Himsworth in June 1961, Lewis noted that he had just received a bulky research proposal from Kramer and added the droll comment: ‘He has evidently moved at great speed, and expects similar hustle from us.’ 4 He went on to play down the novelty of Kramer's proposal for a new course on epidemiology at the London School of Hygiene and cast serious doubt on whether the school would accept Kramer's stipulation that 50% of students be American nationals. Overall the grand scale of the proposals unsettled Lewis: ‘I would feel more comfortable about it if Kramer had not mapped out such a grandiose programme.’ 5 While the archives do not allow a fuller reconstruction of Kramer's intentions, he was well known for complaining about the lack of morbidity statistics for mental disorders in the USA. In their letters to Lewis, Felix and Kramer praise the UK's Mental Health Act of 1959 for its focus on voluntary entry to hospital and its emphasis on community care. The morbidity data collected from the National Health Service by the General Register Office offered a vast reservoir of data for studying the effects of this shift and the sociocultural correlates of mental disorder. But the exact details of the plan escape us. What is clear is that the attempt to establish an American outpost for psychiatric epidemiology in London, and for American epidemiologists to gain access to the statistical data produced by the NHS, failed.
Kramer did, however, succeed in obtaining NIMH funding for the DP, planning for which began in 1963. The involvement of the famous New York psychiatrist Paul Hoch likely helped win Lewis's collaboration (Zubin, 1969).
The idea of comparing mental hospitals in New York and London made sense given the location of the two groups, but it is hard to believe that the particular comparison was not knowingly designed to draw attention to the irregular practices of psychiatrists in the state of New York. Kramer, as head of biometrics at NIMH, was aware that New York was the only state in the USA where state hospitals did not use the DSM (Zubin, 1961: 108). This is not to deny that American psychiatrists in general had a more psychosocial concept of schizophrenia than their British counterparts, but it is clear that the DP was designed to capture this difference.
After initial meetings between the two sides from 1963 to 1965, the DP was structured into two phases. The first phase would study the relationship between mental hospital statistics and diagnosis. A team of six psychiatrists drawn from both sides would use a standardized interviewing method to diagnose patients from a mental hospital in Brooklyn and one in London and compare their results with the respective mental hospital statistics (Zubin, 1969: 19). Alongside this study, videotapes of patient interviews would be made and shown to audiences of British and American psychiatrists to better sample how these two groups diagnosed schizophrenia and manic depressive psychosis. The aim of these two studies was to establish whether the increased incidence of schizophrenia in the US observed by Kramer represented a real difference in patients or a difference in local diagnostic practices. A second phase of the DP, envisaged as following on from these studies, would investigate the ‘sociocultural correlates of psychiatric disorder’, but this was never pursued (Cooper et al., 1972: 14). Further comparisons of diagnostic patterns for elderly patients were made in the 1970s, but the DP never led to an actual epidemiological study of mental illness (Copeland et al., 1975). After the boom in research funding in the 1960s, the more hostile environment of the Nixon administration and changing personnel on the DP led to this proposed second phase being dropped (Copeland, 2008: 34).
The methodological imperialism of the Maudsley
The task of developing a codified procedure for measuring consensus fell to the Maudsley psychiatrists, who were clear that the yardstick for comparing diagnoses internationally would be based on a Western European tradition of clinical psychiatry – in other words, on their own diagnostic style. The challenge of standardizing psychiatric diagnosis was a question of making the procedure reproducible without reducing the quality of the information gathered. The British and American groups had independently developed different approaches to this challenge. The former's Present State Examination (PSE) allowed much more independence in how the interviewer conducted the conversation with the patient and how they recorded mental symptoms. As a result, it required extensive training to be used correctly. The latter's Mental State Schedule (MSS), in contrast, required that the interviewer read out tightly scripted questions and could rate only whether the patient's response was true or false. While the DP employed a mixture of the two in generating a standard diagnosis, the resulting interview reflected the British approach and required weeks of training to administer in a reproducible way.
What was common to both interviews, however, was that they were developed alongside computer programs that would take the recorded symptoms and use an algorithm to turn them into a diagnosis. There had been several attempts by psychologists to create diagnostic algorithms using purely statistical models, but they had been largely unsuccessful. In contrast to these, the two groups developed computer programs that explicitly simulated how the psychiatrist diagnosed. The American group, primarily Robert Spitzer and Jean Endicott, produced DIAGNO, which used a decision tree to come to a diagnosis, while the British group's CATEGO used a series of stages at which symptoms were clustered into groupings that would indicate a diagnostic class. Although the creators were keen to stress that the programs did not produce diagnoses but provisional diagnostic classes, which the psychiatrist must evaluate, they nonetheless materialized a split in the cognitive labour of psychiatric diagnosis. This split created two apparently separate procedures: the recording of symptoms, and the creation of a diagnosis from the symptoms. In practice, however, the structure of the interview and the diagnostic algorithms reflected a clinical tradition in which these two procedures were bundled together. Once the two were separated, psychiatrists could be statistically compared and tested on how reproducibly they recorded symptoms and made diagnoses. The computer program would not replace the psychiatrist. But it enabled a codified version of a diagnostic style that could be used to evaluate the diagnostic process itself.
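The split described above, between recording symptoms and deriving a diagnostic class from them, can be illustrated with a toy decision tree in the spirit of DIAGNO. This is a hypothetical sketch: the tree, its symptom labels and class names, and the functions `classify` and `locate_disagreement` are invented for illustration and do not reproduce DIAGNO's actual logic. Because the classification step is a fixed program, two interviewers who record identical symptoms must receive the same class, so any disagreement in output can be traced back to what was recorded.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Node:
    """A yes/no question about whether one symptom was recorded."""
    symptom: str
    if_present: Union["Node", str]
    if_absent: Union["Node", str]


# Hypothetical tree invented for illustration; it is NOT DIAGNO's logic.
TREE = Node(
    "delusions",
    if_present=Node("elation", if_present="mania", if_absent="schizophrenia"),
    if_absent=Node("depressed mood", if_present="depression",
                   if_absent="no psychosis"),
)


def classify(symptoms: set[str], node: Union[Node, str] = TREE) -> str:
    """Walk the tree using only the recorded symptoms; leaves are classes."""
    while isinstance(node, Node):
        node = node.if_present if node.symptom in symptoms else node.if_absent
    return node


def locate_disagreement(record_a: set[str], record_b: set[str]) -> str:
    """Compare two interviewers' records of the same patient stage by stage."""
    if record_a == record_b:
        # Identical input to a fixed program guarantees identical output.
        return "full agreement"
    if classify(record_a) == classify(record_b):
        return "symptoms differ, class agrees"
    return "symptoms differ, class differs"


print(locate_disagreement({"delusions", "elation"}, {"delusions"}))
```

The design choice is the point of the illustration: a program of this kind does not decide which interviewer is right, but it isolates the stage at which two diagnosticians part company.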
The PSE was developed by the MRC Social Psychiatry Unit. Lewis had directed the unit until 1965, when he stepped back and was replaced by John Wing. The PSE was designed to formalize and specify the common technique of cross-examination employed in medical interviews: the clinician has a particular symptom in mind and pursues questions to establish whether it is present or absent, with each question leading on from the previous one. The term ‘present state’ referred to the fact that the instrument was restricted to recording mental symptoms from the past four weeks. The PSE excluded what were considered organic symptoms, such as dementia, and conditions diagnosed predominantly on the basis of patient history, such as personality disorders. It was thus limited to ‘non-organic symptoms’ that had occurred during the month prior to interview.
The structure of the questions in the PSE followed what its authors considered ‘the practice of the European school of psychiatry, with its long tradition of clinical observation and emphasis on the importance of listening to the patient's description of unusual experiences’ (Wing, Cooper, and Sartorius, 1974: vii). This was in essence a clinical system formalized in Heidelberg in the 1930s, in which psychiatric diagnosis followed a three-part hierarchy. German émigrés had brought it to the Maudsley Hospital (Hayward, 2010). In this tradition, the British clinical picture of schizophrenia blended concepts developed by Emil Kraepelin, Eugen Bleuler, and Kurt Schneider. In this procedure, the psychiatrist first examined the patient's memory and movements in search of organic psychoses, which were diagnosed on the evidence of organicity (e.g. severe cognitive impairment). This finding overrode all other considerations. No other symptoms, whether psychotic or neurotic, changed a diagnosis of organic psychosis. Only once organic cerebral disease had been ruled out did schizophrenia come into consideration (e.g. on evidence of thought disorder or blunting of affect). In practice, Schneider's first-rank symptoms were typically taken as pathognomonic. Lastly came manic-depressive illness, which was diagnosed only if no symptoms belonging to the preceding diagnoses were present.
The PSE separated out a list of symptoms that were deemed specific enough to be rated clearly and reliably as present or absent at interview. This list evolved over the instrument's development, which outlived the DP. By its eighth edition it included around 400 symptoms, ranging from observations of ‘worrying’ and ‘guilt’ to judgements of ‘bizarre appearance’ and ‘misleading answers’ (Wing, Cooper, and Sartorius, 1974: Appendix 3.2, 36–9). This long list was sorted into 38 syndromes and finally into 11 provisional diagnostic classes. Schneider's first-rank symptoms were given preference in arriving at a diagnosis of schizophrenia. Borderline cases were placed in two other classes, ‘Class 0+ (Other Psychoses)’ and ‘Class 0?’. These contained, respectively, patients who exhibited limited psychotic phenomena (e.g. talking to oneself) and patients with doubtful symptoms (e.g. no psychotic symptoms but affective flattening) (ibid.: 254). In essence, CATEGO restricted the fuzzy clinical picture of nuclear schizophrenia to a clearly defined syndrome.
American reviewers picked up on the PSE's positioning within the European style, as distinct from the American style of Spitzer and Endicott's MSS, which lacked such an interest in the details of psychopathology. One reviewer, for example, noted the PSE's differentiation of hallucinations into numerous subcategories. A review of the PSE by the British psychiatrist David Goldberg celebrated that ‘an international organisation has at long last given formal recognition to something British’ (Goldberg, 1975: 159). Goldberg observed that the rules used in the PSE for grouping symptoms into syndromes entailed an arbitrariness characteristic of ‘its pragmatic British origin’. He also doubted whether all these so-called syndromes were comparable. Some resembled classic clinical pictures drawn from traditional psychopathology, while others were ‘a rag bag’, designed to group symptoms that did not fit neatly elsewhere. In summary, he was convinced the PSE would become a global standard for comparing symptom profiles, but he hoped that the CATEGO program would not close the door to further validation attempts:
The existence of the Catego programme will mean that a Maudsley-type diagnostic assessment can be made on any patient from Penge to Terra del Fuego: but one hopes its existence will not discourage psychiatrists from pondering the significance of symptom patterns in different settings, and one also hopes that the authors themselves will be prepared to persevere with multivariate analyses of raw PSE data. (ibid.: 159)
Videotapes and Venn diagrams
In the transatlantic Anglo-American network of the DP, the videotape apparatus played a central role. The tube-based video screen for television was invented in the 1930s, but video (as opposed to film) was not used in behavioural research until the videotape recorder was developed in the mid-1950s (Ginsburg, Anderson, and Dolby, 1957). By the 1960s the video recorder was being used in various fields of psychiatric research and practice (Berger, 1970). It is not just the new mobility of the video medium that is of historical significance, but also the different epistemic role that filmed patients now played. The camera and the moving image have a long history within psychiatry. Emil Kraepelin, for instance, used the film camera as a tool for investigating psychopathological phenomena. In 1904 he equipped his psychiatric clinic in Munich with a studio for filming patients. Kraepelin wrote that the moving image was an ideal technique for documenting certain types of phenomena that were hard to recreate in live demonstration, such as hysterical fits. He made several cinematic studies through the early 1920s. For Kraepelin, film functioned primarily as a psychological method for the empirical investigation of mental states (Killen, 2017: 32). In studies such as the DP, however, the videotape played a different role: it was not the patient's mental state that was being measured, but the reproducibility of the diagnostic process.
The medium of video was a remarkable analogue of the epistemic ideals of the psychiatrists trying to construct reliable diagnostic classifications. In practice, video enabled the separation of personal interview and diagnosis, which was the epistemic goal of standardization. The same single patient interview could be shown to many different psychiatrists, multiple times and in different places. Clinical communication in the patient interview evidently could not be replicated in the way that the filmed interview fixed the patient for the many watching psychiatrists. The patient might respond differently to different physicians, might give different responses in a second interview, and might provoke different responses from different physicians. The aim of these videotape studies was to artificially replicate the psychiatric encounter as closely as possible. As the first international comparison study, two years before the DP, reported, ‘The situation filmed resembled, as closely as possible, the psychiatrist's initial contact with a patient – the time during which he ascertains the mental state and takes the detailed history’ (Sandifer et al., 1968: 2).
In total, eight patient interviews were videotaped for the DP. The first five were assessed and rated by a smaller group of psychiatrists at the Maudsley in London. The remaining three were screened to different groups during several all-day events across the United Kingdom. In America the tapes were shown to psychiatrists in the state of New York. Most British participants came from university departments and psychiatric units within general hospitals, and many were from the Maudsley itself. A large proportion of American participants worked in state hospitals. More of the Americans, however, had training in psychotherapy and worked in private office practice.
The participating psychiatrists reached their diagnoses with confidence (the study included a rating scale for confidence). In clinical practice, each such diagnosis would have counted as the recognition of a real illness in a real patient. As explained at the beginning of this article, there were some significant disagreements. Clearly the two groups of psychiatrists were often talking about different things when they talked about schizophrenia. Within the framework of the study, however, they could be compared as two different samples from the same population. The Venn diagram played a crucial role in post-war clinical science by simplifying the cognitive shift between statistically observed frequencies and the theoretical entities essential to thinking scientifically about populations. The space of the Venn diagram implied a statistical view from nowhere in which theoretical entities could be represented empirically. In this diagram, the size and placement of the circles were informed by the relative statistical frequencies of diagnosis observed in the US–UK study. Yet once the circles were placed together in the same space, the fundamental theoretical differences between the concepts were removed. The concepts were represented instead as comparable ways of counting the same population. This comparative view depended on the selection of specific patient interviews, which the videotape apparatus turned into fixed experimental conditions. The researchers acknowledged elsewhere that these experimental conditions did not reflect wider diagnostic practice. It was much harder for psychiatrists to make reliable diagnoses for some of the more actively psychotic schizophrenic patients. Moreover, the reliability of two separate interviews with one patient was always lower than that of two ratings of a single videotaped interview (Cooper et al., 1972: 45). Videotapes and Venn diagrams were two crucial technologies for designing statistical tests of psychiatric diagnosis and representing their results. They also enabled a new method for evaluating psychiatric diagnosis at the level of statistical comparisons, turning professional consensus into a measurable statistical entity.
It is worth pausing here to reflect on how historians have discussed the diagnostic studies and the rise of what Allan Horwitz calls ‘diagnostic psychiatry’ (Horwitz, 2002). In his excellent Mad by the Millions, Harry Wu suggests that the videotape studies in psychiatric diagnostic reform played an important role in making mental symptoms measurable: ‘To make symptoms of mental illness measurable, first they had to be visualized’ (Wu, 2021: 206). The phrase suggests that measurement involves seeing the symptoms of a mental illness, but this introduces two confusions. First, there is nothing to see with mental symptoms, which are identified exclusively through talking. Second, there is nothing to measure, unless we mean counting, since these are qualitative features identified in the content and form of the patient's speech. The earliest experimental approaches to diagnosis tested how psychiatrists diagnosed patients on the basis of written case histories, and psychiatrists were in fact much more reliable with written case histories. This is because differential diagnosis is ultimately a form of verbal-textual analysis.
In a comparable vein, Horwitz writes in his DSM that ‘visible symptoms played a sharply divergent role in the DSM-III diagnoses compared to those of its predecessors’ (Horwitz, 2021: 64). There are, of course, no visible mental symptoms. He elsewhere writes of ‘overt symptoms’, so he evidently means symptoms that were more observable or more empirical, but they were not. Rather, these symptoms were more specific in a semantic sense: we are dealing here with words, not visible things. What these historiographical confusions demonstrate, however, is the more fundamental success of these diagnostic studies in dividing the process of psychiatric diagnosis into two parts and in making this division appear self-evident. The first part is purely observational, in which symptoms are carefully measured. The second is purely algorithmic, in which these data are assembled into a diagnosis. But this division is not self-evident and was contested at the time of the DP. In 1969, at a large conference in Aberdeen organized by the British side of the DP, the senior Swedish psychiatrist and former eugenicist Erik Essen-Möller argued:
I think it is useful to remember in this connection that, in the ordinary clinical work with patients, recording of information does not necessarily come first and then diagnosis afterwards.… The process involves continuous attention to the informant's responses in the widest sense, to contents as well as to emotional reactions and subtle behaviour. What is seen and inferred, of course, depends on training and skill. Both sets of observations, conceptual and emotional, may carry diagnostic value. However, the observations also automatically influence the direction of the inquiry, which is thus subject to constant re-focusing, according to the nature of the responses. One might say that the diagnostic process, in its very formation, itself decides what will be probed for and looked for. This is what is meant by saying that recording and diagnosis are essentially one and the same and simultaneous. (Hare and Wing, 1970: 24)
The idea of fully operationalizing and automating psychiatric diagnosis remained science fiction. Nevertheless, the decisive results of the DP, which confirmed that British psychiatrists more or less agreed on how to diagnose nuclear schizophrenia, were taken as evidence that psychiatric diagnosis had passed the test. I will finish by charting how, in the wake of these results, the place of diagnosis in psychiatry was redescribed in biostatistical terms.
Redescribing the place of diagnosis in psychiatry
In 1969 Michael Shepherd, now the first ever Professor of Epidemiological Psychiatry at the Maudsley, gave a lecture to colleagues in Switzerland. In it he explained that the old European view of clinical psychiatry was being replaced by a new statistical and computational clinical science. Shepherd drew a contrast between two positions. The first was the view, put forward by the Dutch psychiatrist Henricus C. Rümke at the First International Congress of Psychiatry in 1950, that clinical psychiatry was a blend of physical, philosophical, and ethical considerations forming the heart of scientific psychiatry. The second was the recent work of Alvin Feinstein, the North American clinical scientist, who argued that the clinician's ‘intellectual technology’ should more closely resemble statistical and computational analysis (Shepherd, 1969: 161).
As Feinstein put it in his influential work Clinical Judgment, the clinician must realize that ‘they think in mathematical sets’ and that diagnosis was really an exercise in clustering clinical observations within complex Venn diagrams of illness (Feinstein, 1967: 156; emphasis in original). On this view, diagnosis could no longer be a feat of clinical expertise in which a clinical picture was discerned from close observation of a patient. By deploying standardized classifications, quantifying their observations, and using computational analysis, the clinician could now understand their patient's features as a cluster within a vast spectrum of possibilities. Feinstein argued that the idea that diagnosis identified a specific pathological lesion was an outdated technology for capturing the complexity of human illness: ‘Attempting to cope with the mosaic spectrums and moving pictures of human ailments, contemporary clinicians are severely restricted by the views of “disease” provided in the still photographs of the camera of pathology’ (ibid.: 105). Shepherd was himself the lead investigator in a precursor study to the DP and a pioneer in translating Feinstein's ideas to psychiatry (Shepherd, Brooke, and Cooper, 1968).
In the early 1970s the researchers involved in the DP defended the legitimacy of psychiatric diagnosis by restricting its evaluation to statistical terms, primarily reliability and validity. Robert Spitzer, who was on the US steering committee for the DP, responded to the infamous Rosenhan study in exactly these terms. The Rosenhan study, published to great fanfare in 1973, reported how a group of healthy volunteers allegedly faked schizophrenia and fooled several mental hospitals into admitting them. Recent research, however, shows that this study probably never actually took place (Scull, 2023). Spitzer later considered his response the best paper he ever wrote. In it, he outlined that psychiatric diagnosis had a set of specific uses, none of which included identifying ‘pseudopatients’ deliberately trying to fool psychiatrists. Only in relation to the uses of diagnosis within the psychiatric profession could one evaluate its reliability and validity. Diagnosis was, Spitzer explained, a classification procedure that could be evaluated only in reference to its stated utility and a specific population (Spitzer, 1975: 448–9).
More broadly, the existence of new statistical approaches to diagnosis was used to distance psychiatry from its critics. In his inaugural speech to the newly chartered Royal College of Psychiatrists, President Martin Roth described the medical model as agnostic about aetiology, discarding what he called the ‘nineteenth-century model’, which presupposed a physical lesion in all disease. Medicine began instead with ‘a description of a cluster of features which are considered to vary together’ and which nowadays could be validated by ‘more prompt, rigorous and decisive procedures, aided by multivariate statistics’. In all cases the cause and character of the symptoms ‘whether psychological or physical should be open to investigation without prejudice’ (Roth, 1972: 362).
The so-called anti-psychiatrists and academic critics of psychiatric diagnosis reasoned that mental illnesses without organic lesions were not diseases in the same sense as physical illnesses. Robert Kendell responded by emphasizing that these arguments were ‘all based, wittingly or unwittingly, on a concept of disease which has been abandoned not just by psychiatry but by medicine as a whole’ (Kendell, 1975a: 312–13). He admitted there was no single concept of disease to replace the lesion concept. Even so, a new concept would clearly have to be statistical and based on the effects of an observed phenomenon, not on its aetiology or pathology. In any case, the point was that psychiatric ignorance was in keeping with medical ignorance and did not in any way diminish psychiatry's scientific or medical status. As John Wing concluded at the end of the 1970s, by following the standard diagnostic criteria, the psychiatrist made themselves more accountable and less open to ‘social, and even political, pressures and ideologies’ when diagnosing schizophrenia (Wing, 1979: 319).
Conclusion
As I have argued, the new methods developed in studies like the DP did not actually make everyday psychiatric diagnosis more reliable or valid. At an epistemic level, the difficulty of interpreting the results of complex statistical analyses meant these studies played no direct role in formulating new classifications. At a practical level, it was impossible to know whether psychiatrists using official classifications took sufficient care in making their diagnoses. Additionally, standardization did not remove the uncertainty work in everyday clinical management of patients (Henckes and Rzesnitzek, 2018; Pickersgill, 2011). Rather, in a decade of public crisis, these computer-assisted methods provided a way of measuring and legitimizing professional consensus. The importance of the new computational methods thus lay not in validating new classifications, which they failed to do, but in the codification of consensus.
In his introduction to the publication of the DP's results, Lewis reiterated that psychiatric diagnosis remained ultimately dependent upon the refined clinical judgement of the diagnostician. He noted that computers ‘will prove valuable’ in this task, but only ‘when the essential diagnostic features have been unequivocally established’ (Cooper et al., 1972: 4). This was evidence of Lewis's unwavering Kraepelinian view of psychiatric diagnosis. In keeping with a long tradition, this view hoped that thorough clinical observations would eventually establish links between specific mental symptoms and natural disease entities (Hoff, 1992). This way of talking, however, was alien to Kramer, who was a pragmatic biostatistician. For him, psychiatric diagnosis was a matter of classification, and classifications did not reflect essential realities but were more or less valid in relation to their stated purpose. Lewis died in 1975, the same year that more results from the DP were being published and the reforms to ICD-8 were underway. Kramer would go on to be the biostatistician on the task force behind DSM-III and, along with Robert Spitzer, help bring the American classification in line with the British ICD.
The huge commercial success of DSM-III came as a surprise to those involved with the reforms. The massive impact it had on American psychiatric practice, however, was due not so much to the new diagnostic concepts, but to the nature of the US health care system, in which diagnostic coding was essential for insurance payments (Horwitz, 2021: 83–4). As Healy and Horwitz have argued, the real changes in diagnostic practices were driven by health insurance and pharmaceutical companies, not the committees in charge of reformulating the national diagnostic classifications (Healy, 2002; Horwitz, 2021). In the UK, in contrast, despite the ICD-8 being vigorously supported by the General Register Office and glossaries being circulated to consultant psychiatrists, its impact on diagnostic practices was minimal, with official subclassifications being mostly ignored, although after ICD-9 more psychiatrists in England started using ICD diagnostic codes (Kendell, 1981). With the passing of the Mental Health Act in 1959 the professional power of the psychiatrist in the UK was strengthened, as the law left diagnosis in the hands of doctors, a situation that continues into the present (Ikkos and Bouras, 2021). The ICD became regulatory in function in the UK only in the 1990s. In one sense, then, Lewis’s emphasis on descriptive psychopathology and diagnostic expertise shaped the ICD. The PSE, which was developed under John Wing's supervision, continues, in modified form, to be used today for research. But the system of biometric evaluation that the DP set in motion has ultimately dislodged diagnosis in psychiatry from the esteemed role it was given by Lewis.
In 1975 Robert Kendell, a key figure in the DP, published The Role of Diagnosis in Psychiatry, a concise and elegant textbook that went on to influence a generation of psychiatrists and remains highly influential today (Huda, 2019). Its outlook was deeply nominalist and dismissive of debates over the reality of schizophrenia: ‘To our generation it is self-evident that diseases, tuberculosis as well as schizophrenia, are nothing but man-made abstractions, inventions justified only by their convenience and liable at any time to be adjusted or discarded’ (Kendell, 1975b: 21). Kendell nevertheless made clear that, seen from the clinic, the attempt to statistically validate psychiatric diagnostic classifications using new multivariate analytic techniques had clearly failed. In 1970 Kendell and Gourlay published two important studies reporting that they had failed, using discriminant function analysis, to validate two diagnostic distinctions. The first was between schizophrenia and affective psychosis, and the second between psychotic and neurotic depression. The results of statistical validation methods were either inconclusive or went against clinical expertise in a manner that psychiatrists were unwilling to accept. As Kendell later noted at the end of the 1980s, the dominant classifications of the ICD and DSM were still essentially Kraepelinian, with the only new nosological distinctions ‘owing to [Karl] Leonhard's imaginative insight, not to our new technologies’ (Kendell, 1989: 47). But in the crucible of this critical decade, the instrumentalism of the biostatistical approach furnished psychiatrists with a new epistemological repertoire for legitimizing psychiatric diagnosis.
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
