Abstract

Barth and colleagues have recently reviewed the diagnosis of polycystic ovary syndrome (PCOS). 1 In their insightful examination, they conclude that the current diagnostic criteria are insufficiently robust for clinical research. In fact, I would argue that the currently available diagnostic criteria are insufficiently robust even for clinical care. This argument has significant economic and health-care consequences. 2 PCOS is one of the most common endocrinopathies of women, affecting between 7% and 10% of all reproductive-aged women, and is associated with significant morbidity. However, as with many other syndromes, the disorder can be diagnosed only by a collection of signs and features; no single test is available for its diagnosis.
There are currently three principal definitions of PCOS in widespread use. The core group of patients diagnosable as having the disorder are those demonstrating: (1) clinical or biochemical evidence of hyperandrogenism; and (2) ovulatory dysfunction; after (3) exclusion of other disorders with a similar presentation (e.g. non-classic adrenal hyperplasia, Cushing's syndrome, androgen-secreting tumours). This definition was first described in the proceedings of a 1990 National Institutes of Health (NIH)-sponsored expert conference on the subject. 3 A second expert conference, convened by the European Society of Human Reproduction and Embryology (ESHRE) and the American Society for Reproductive Medicine (ASRM) in Rotterdam in May 2003, 4 defined the disorder as present in any individual who had at least two of the following three features: (1) clinical and/or biochemical hyperandrogenism; (2) ovulatory dysfunction; or (3) polycystic ovarian morphology. As with the NIH 1990 criteria, PCOS remained a diagnosis of exclusion that required ruling out similarly presenting disorders. This definition essentially expanded the original NIH 1990 criteria by adding two phenotypes of the disorder: women with hyperandrogenism and polycystic ovarian morphology but normal ovulation, and women with ovulatory dysfunction and polycystic ovarian morphology but no clinical or biochemical evidence of hyperandrogenism. Finally, the Androgen Excess Society convened an expert taskforce to review the published literature, which arrived at a somewhat more evidence-based contemporary definition of the disorder. 5 The recommendations of this taskforce represented a compromise between the NIH 1990 and the Rotterdam 2003 criteria, in that they suggested that PCOS be defined by the presence of: (1) clinical and/or biochemical hyperandrogenism; (2) ovarian dysfunction, including ovulatory dysfunction and/or polycystic ovarian morphology; and (3) the exclusion of related or similarly appearing disorders.
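Stripped of their clinical detail, the three definitions can be read as Boolean rules over three features. The Python sketch below is only an illustration under simplifying assumptions: each feature is treated as simply present or absent, the exclusion of similarly presenting disorders is assumed to have been carried out beforehand, and the function names are hypothetical. It enumerates all eight combinations of features and lists those that each rule accepts.

```python
from itertools import product

# Each phenotype is a triple of booleans:
# (hyperandrogenism, ovulatory dysfunction, polycystic ovarian morphology).
# All three definitions also require exclusion of similarly presenting
# disorders; that step is assumed to be done already and is not modelled here.

def nih_1990(ha: bool, od: bool, pcom: bool) -> bool:
    # Hyperandrogenism AND ovulatory dysfunction; ovarian morphology is not used.
    return ha and od

def rotterdam_2003(ha: bool, od: bool, pcom: bool) -> bool:
    # At least two of the three features (booleans sum as 0/1).
    return ha + od + pcom >= 2

def aes(ha: bool, od: bool, pcom: bool) -> bool:
    # Hyperandrogenism AND ovarian dysfunction (ovulatory dysfunction and/or PCOM).
    return ha and (od or pcom)

CRITERIA = {"NIH 1990": nih_1990, "Rotterdam 2003": rotterdam_2003, "AES": aes}

def qualifying_phenotypes(rule):
    """Return every feature combination that the given rule labels as PCOS."""
    return [p for p in product([True, False], repeat=3) if rule(*p)]

if __name__ == "__main__":
    for name, rule in CRITERIA.items():
        print(name, qualifying_phenotypes(rule))
```

Running it shows that the NIH 1990 rule accepts two combinations, the AES rule three and the Rotterdam 2003 rule four. The two combinations accepted by Rotterdam 2003 but not by NIH 1990 are hyperandrogenism with polycystic ovarian morphology and normal ovulation, and ovulatory dysfunction with polycystic ovarian morphology but no hyperandrogenism; only the first of these is also accepted by the AES rule.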
The choice of criteria for defining PCOS has significant implications for patient care, clinical research and public health. For example, the prevalence of PCOS as defined by the NIH 1990 criteria appears to be approximately 6.5% among reproductive-aged women, whereas the prevalence of the disorder as defined by the Rotterdam 2003 criteria could be almost twice as high. 5 A broader definition could make it more difficult to conduct meaningful research on PCOS. Most affected would be large population-based genetic studies, whose ability to detect the role of candidate genes depends on the inclusion of large numbers of relatively homogeneous patients. Comparability between studies is also undermined by the great variability in the phenotypes included in different studies. Finally, clinical practice, diagnosis and counselling will be affected by the criteria selected, as associated morbidities such as insulin resistance and an increased risk of diabetes are more prevalent in certain phenotypes of PCOS. 5,6
Notwithstanding the ongoing debate regarding which specific criteria should be used for diagnosing PCOS, a more fundamental problem, one that has been largely ignored, has now been brought to our attention by Dr Barth and his colleagues. They eloquently note that if the features themselves are not well defined, meaningful use of the proposed diagnostic criteria is severely hampered.
For example, biochemical evidence of hyperandrogenism is a criterion in all three currently used definitions of PCOS. However, what exactly does this mean? We need to define clearly which androgens are being measured. Should we include total testosterone (T)? Free T? DHEAS? Androstenedione? Other, lesser-known androgen metabolites? Should biochemical hyperandrogenism be defined by a combination of several of these androgens? The choice matters: among 316 consecutive untreated PCOS patients diagnosed by the NIH 1990 criteria, 38.6% had elevated levels of total T and 68.3% elevated levels of free T. 6 More critical still may be how these androgens are measured. A recent position statement from the Endocrine Society indicated that the majority of currently available techniques for measuring total or free T are fraught with inaccuracy and error. 7
It is also important to define how the ‘reference range’ is calculated and the characteristics of the population used to define it. Are all the women used to establish this range selected at random from a large population? Or are we attempting to define the reference range using only women who have been carefully selected to have no evidence of clinical hyperandrogenism, normal ovulatory function and no family history of endocrine abnormalities? Clearly, the ‘reference range’ obtained in a general unselected population will be broader, and more likely to include individuals with subclinical or even clinical endocrine disorders, than one obtained from a carefully selected population. Thus, a stricter upper reference limit should be chosen if the reference range is defined from the scatter of values found in the general population (e.g. the 90th percentile) than if it is established using a well-characterized ‘normal’ population (e.g. the 95th or 99th percentile).
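Where the upper reference limit is taken as a percentile of the values measured in a reference sample, the calculation itself is simple; the contentious parts are which sample and which percentile. The Python sketch below uses the nearest-rank percentile, which is our choice for illustration (interpolating estimators and parametric methods are also used), and the function name is hypothetical.

```python
import math

def upper_reference_limit(values, percentile):
    """Nearest-rank percentile of a reference sample (hypothetical helper).

    values: androgen measurements from whichever reference population was chosen
    percentile: e.g. 90 for an unselected sample, 95 or 99 for a screened one
    """
    if not values:
        raise ValueError("reference sample is empty")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(values)
    # 1-based rank of the smallest value with at least `percentile`% of the
    # sample at or below it; multiplying before dividing keeps integer cases exact.
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[rank - 1]
```

Applied to the same reference sample, a 90th-percentile limit can never be higher than a 95th- or 99th-percentile limit, which is why the lower percentile is the stricter choice when the sample is unselected.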
The upper reference limit used should also take into consideration the population of subjects being assessed for the disorder. For example, when evaluating a ‘high-risk’ population, such as patients seeing a physician for evaluation of possible PCOS, a lower cut-off value may be used, reducing false-negatives while maintaining a relatively high positive predictive value (PPV). Conversely, if an investigator is interested in determining the prevalence of PCOS in a broad unselected population, where the prevalence of the disorder is likely to be much lower and the proportion of false-positives higher, the cut-off chosen for the androgen being assessed should be higher in order to maintain a relatively high PPV. We noted this recently when assessing the PPV of total T for PCOS, measured by radioimmunoassay vs. mass spectrometry, in two populations: one a general unselected population and the other drawn from a clinic dedicated to diagnosing patients with androgen excess. 8 Finally, androgen levels vary by age, race, body mass and other features, which should be considered when establishing a normative range. 9,10
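The dependence of PPV on the population can be made explicit with Bayes' theorem: PPV = (sensitivity × prevalence) / [sensitivity × prevalence + (1 − specificity) × (1 − prevalence)]. The Python sketch below evaluates this for two hypothetical operating points of a single androgen assay: a lower cut-off (more sensitive, less specific) and a higher cut-off (less sensitive, more specific). The sensitivities, specificities and the 50% prevalence assumed for a referral clinic are placeholders chosen for illustration, not measured values; the 7% prevalence for an unselected population is the lower end of the range quoted at the start of this commentary.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV from Bayes' theorem: true positives divided by all positive results."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("the test never returns a positive result")
    return true_pos / (true_pos + false_pos)

if __name__ == "__main__":
    # Hypothetical (sensitivity, specificity) pairs, NOT properties of any real assay.
    low_cutoff = (0.70, 0.90)
    high_cutoff = (0.50, 0.98)
    for setting, prevalence in (("referral clinic", 0.50), ("unselected population", 0.07)):
        for label, (sens, spec) in (("low cut-off", low_cutoff), ("high cut-off", high_cutoff)):
            ppv = positive_predictive_value(sens, spec, prevalence)
            print(f"{setting:22} {label:13} PPV = {ppv:.2f}")
```

With these placeholder numbers, the lower cut-off gives a PPV of about 0.88 in the referral setting but only about 0.35 in the unselected population; the higher cut-off raises the latter to about 0.65, at the cost of missing more true cases.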
Inexactitude also exists when attempting to assess the presence of clinically evident hyperandrogenism. Should we assess only hirsutism, or should we also include acne and diffuse female alopecia? How accurately does each of these signs predict an underlying androgen excess disorder? For example, up to 80% of patients with ‘hirsutism’ may have an androgen excess disorder, 11,12 compared with only 20–40% of patients with acne alone (with wide variation between studies) 13–15 and 5–15% of patients with diffuse female alopecia (again varying widely between studies). 13,16,17 And is a combination of features better than any single feature alone?
Although defining ‘ovulatory dysfunction’ may seem straightforward, again this is not always the case. In many studies, investigators use only evidence of ‘menstrual’ dysfunction to define ovulatory dysfunction, but how infrequent must menstrual cycles be to indicate ovulatory dysfunction? For example, in the studies of Treloar and colleagues, the upper reference limit of the menstrual interval appears to be 34 days. 18 Thus, women with menstrual intervals longer than 34 days (a threshold corresponding to slightly more than 10 menstrual cycles per year) would be considered ‘oligomenorrhoeic’. However, many studies use a stricter definition of oligomenorrhoea, eight or fewer menstrual cycles per year, 19 which translates into mean menstrual intervals of roughly 45 days or more. Moreover, using menstrual regularity as a marker of ovulatory function ignores more subtle or ‘subclinical’ forms of ovulatory dysfunction: up to 20% of oligo-ovulatory patients with PCOS have a pattern of vaginal bleeding that mimics a normal menstrual cycle. 6 And what about ovulatory dysfunction that is reflected primarily by inadequate follicular development and/or an inadequate luteal phase? Should these be considered subtle forms of ovulatory dysfunction?
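The conversion between a menstrual-interval limit and a cycles-per-year limit is simple division. The Python sketch below assumes a 365-day year and a stable mean interval (real cycles vary), and its function names are hypothetical; it shows that the two common definitions of oligomenorrhoea can classify the same woman differently.

```python
DAYS_PER_YEAR = 365  # assumption: 365-day year; using 365.25 changes little

def cycles_per_year(mean_interval_days: float) -> float:
    """Approximate number of menstrual cycles per year for a given mean interval."""
    return DAYS_PER_YEAR / mean_interval_days

def mean_interval_days(cycles: float) -> float:
    """Approximate mean interval (days) implied by a given number of cycles per year."""
    return DAYS_PER_YEAR / cycles

def oligomenorrhoeic_by_interval(mean_interval: float, upper_limit_days: float = 34) -> bool:
    # Interval-based rule: mean interval longer than the upper reference limit.
    return mean_interval > upper_limit_days

def oligomenorrhoeic_by_count(cycles: float, max_cycles: float = 8) -> bool:
    # Count-based rule: eight or fewer cycles per year.
    return cycles <= max_cycles

if __name__ == "__main__":
    print(f"34-day limit  -> {cycles_per_year(34):.1f} cycles/year")
    print(f"8 cycles/year -> {mean_interval_days(8):.1f}-day mean interval")
    interval = 40  # hypothetical woman with 40-day cycles
    print(oligomenorrhoeic_by_interval(interval),
          oligomenorrhoeic_by_count(cycles_per_year(interval)))
```

A hypothetical woman with 40-day cycles (about 9 cycles per year) is oligomenorrhoeic under the 34-day interval rule but not under the eight-cycles-per-year rule.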
The use of polycystic ovarian morphology as part of the definition of PCOS suffers no less from these inexactitudes. For example, Adams and colleagues defined polycystic ovaries on transabdominal ultrasonography as the presence, in one plane of at least one ovary, of at least 10 cystic areas measuring 2–8 mm in diameter, arranged peripherally around a dense core of ovarian stroma or scattered throughout an increased amount of stroma. 20 More recently, polycystic ovaries have been defined by the presence of at least 12 such areas measuring 2–9 mm in diameter. 21 Likewise, the cut-off used for total ovarian volume has varied from 7 to 11 cm³. 22 Thus, even the hallowed criterion of ‘polycystic ovaries’ is still in flux.
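To see how much the thresholds alone matter, the follicle-count and volume criteria can be written as parameterized checks. The Python sketch below is illustrative only: it applies the two follicle-count thresholds quoted above to a list of follicle diameters from a single ovary and plane, ignores the stromal and arrangement features of the Adams definition, and assumes that an ‘enlarged’ volume means strictly above the chosen cut-off. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FollicleCountRule:
    """Hypothetical container for a follicle-count threshold (one ovary, one plane)."""
    name: str
    min_count: int
    min_diameter_mm: float
    max_diameter_mm: float

# Thresholds as described in the text; the names are illustrative.
ADAMS_RULE = FollicleCountRule("Adams et al.", 10, 2.0, 8.0)
LATER_RULE = FollicleCountRule("later definition", 12, 2.0, 9.0)

def meets_follicle_rule(diameters_mm: list[float], rule: FollicleCountRule) -> bool:
    """Count follicles whose diameter lies in the rule's size window (inclusive)."""
    count = sum(rule.min_diameter_mm <= d <= rule.max_diameter_mm for d in diameters_mm)
    return count >= rule.min_count

def meets_volume_rule(volume_cm3: float, cutoff_cm3: float) -> bool:
    """Assumption: 'enlarged' means strictly above the chosen cut-off."""
    return volume_cm3 > cutoff_cm3

if __name__ == "__main__":
    # Hypothetical ovary: nine 5 mm follicles and three 8.5 mm follicles.
    ovary = [5.0] * 9 + [8.5] * 3
    for rule in (ADAMS_RULE, LATER_RULE):
        print(rule.name, meets_follicle_rule(ovary, rule))
    # Hypothetical 9 cm3 ovary judged against both ends of the 7 to 11 cm3 range.
    for cutoff in (7.0, 11.0):
        print(cutoff, meets_volume_rule(9.0, cutoff))
```

For this hypothetical ovary, the Adams threshold is not met (only nine follicles fall within 2–8 mm) while the later threshold is (twelve fall within 2–9 mm), and a 9 cm³ ovary is ‘enlarged’ at a 7 cm³ cut-off but not at 11 cm³.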
Do variations in these definitions actually make any difference? In fact, they make all the difference. A change in any one of these definitions can virtually double the prevalence of PCOS in a population, or double the number of patients who meet a given set of criteria, even when the criteria themselves are unchanged. For example, if evidence of hyperandrogenemia is required to diagnose PCOS (e.g. in Asian women, who do not readily demonstrate dermatological signs of hyperandrogenism), measuring total T alone would identify about 35% of affected individuals, while adding free T would identify almost 70%. 6 As clearly argued by Barth and colleagues, we should now focus on defining the limits of the actual features included in the various criteria for PCOS. If this is not done first, the ongoing debate over which of the currently available criteria for PCOS has the most merit is fundamentally flawed. More careful research into the normal variability of these features in unselected populations, and into the diagnostic limits of these instruments, should be the first priority of all interested investigators and societies. We should be thankful to Dr Barth and his colleagues for making such an eloquent argument for this effort.
Acknowledgement
This work was supported by National Institutes of Health grant R01-HD29364.
