Abstract
Background
In Down's syndrome screening, the change in analyte concentrations in maternal serum with advancing gestational age is compensated for by converting concentrations to multiples of the median (MoM), using a mathematical equation that describes the expected relationship. However, owing to assay drift and shift, the equation may be incorrect, leading to deviation of the observed MoM distribution from the ideal MoM distribution. The NHS Fetal Anomaly Screening Programme has produced standards limiting acceptable deviation, and has provided the Down's Syndrome Screening Quality Assurance Support Service (DQASS) to monitor it. DQASS recommends monitoring by cumulative sum plot.
Methods
Down's screening data for 61,368 consecutive samples (12 October 2004 to 31 December 2007) were evaluated using different median assignment techniques.
Results
A change in the paradigm for median equation derivation is described, which significantly improves the probability that medians will be correct at any point in time.
Conclusion
Software developers need to change the way medians are derived in their programs.
Introduction
The NHS Fetal Anomaly Screening Programme (FASP) has recently published a standard for quality assurance of multiples of the median (MoM) in Down's syndrome screening. 1 This specifies that ‘The median of the MoM values used in risk calculation, for any subpopulation defined by time-period, gestational age, maternal weight, smoking status and ethnicity should lie within 5% of the target value of 1, i.e. the median adjusted MoM value should lie between 0.95 and 1.05’. To achieve this standard, it is recommended that laboratories monitor a whole range of different parameters to ensure that when median recalculation is necessary this will be carried out with a minimum of delay, thereby ensuring compliance with the standard.
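The standard quoted above can be expressed as a simple check. The following is a minimal sketch only; the function name and the example values are hypothetical and are not taken from FASP or DQASS.

```python
from statistics import median

def median_mom_within_tolerance(moms, tolerance=0.05):
    """Return True if the median MoM lies within 1 +/- tolerance.

    tolerance=0.05 corresponds to the 0.95 to 1.05 band quoted in the standard.
    """
    m = median(moms)
    return (1 - tolerance) <= m <= (1 + tolerance)

# Made-up MoM values for one subpopulation (illustration only).
example_moms = [0.82, 0.97, 1.01, 1.10, 1.25]
within_standard = median_mom_within_tolerance(example_moms)
```

In practice the same check would be applied separately to each subpopulation named in the standard (time-period, gestational age, maternal weight, smoking status and ethnicity).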
To monitor the achievement of standards, the Down's Syndrome Screening Quality Assurance Support Service (DQASS) has been created by FASP. It is compulsory for all laboratories carrying out Down's screening in the UK to submit data for assessment every six months. DQASS sets targets for median MoM performance (Table 1). It can be seen that these performance targets are very tight, with only seven green flags (8.75%) and 33 red flags (41.25%) being awarded from a total of 80 possible flags. The laboratory in the Northern General Hospital, Sheffield was in position 15 of 31 for alphafetoprotein (AFP), seven of 13 for free β-human chorionic gonadotrophin (hCG) and 12 of 18 for unconjugated oestriol (uE3) in the cycle one report, where position one was the best.
Down's Syndrome Screening Quality Assurance Support Service (DQASS) targets for median multiples of the median (MoM) performance
AFP, alphafetoprotein; t-hCG, total human chorionic gonadotrophin; uE3, unconjugated oestriol
The current paradigm for median calculation is that median calculation parameters are determined and stored and the same parameters are used until it is decided that they are no longer satisfactory and a new set of parameters is required. This process was defined when Down's syndrome screening began, during the late 1980s/early 1990s. In 20 years much has changed. Most analyses are now carried out using random access analysers and workload has quadrupled with no significant increase in staffing.
All laboratories carrying out Down's screening monitor their medians extremely carefully. At the Sheffield laboratory, the median MoM is monitored daily, and every four weeks a definitive decision-making meeting decides whether to retain the current median parameters or change them. In addition, when kit lot numbers change, medians are monitored especially closely and urgent changes are made if they appear necessary. Despite this, the median performance can only achieve a yellow flag in DQASS performance standards.
Because there is much work involved in the recalculation of medians, the decision to recalculate them is not taken lightly. In general, laboratory staff need to be convinced that any observed change in median is significant and long term in its effect, and that any corrective change will not introduce error. Therefore, it is necessary to consider automating median recalculation decisions, but the question remains of what the appropriate threshold for recalculation should be.
On the basis that it is impossible to decide upon a rational threshold, a complete paradigm shift was considered. Down's screening was initially described using a case-control set based on just 385 controls, 2 and this was sufficient to prove that screening was effective. It is now considered good practice to use larger data-sets to define medians, but this is a later addition to the protocol based on no specific evidence other than the knowledge that larger data-sets allow more precise estimates of population parameters. However, perhaps this is gilding the lily, and smaller data-sets may be entirely adequate for controlling what is actually a dynamic system: for example, one would not drive up a motorway with one's eyes closed, only checking which lane one is in every five minutes. Therefore, instead of calculating median parameters every time a human operator determines that the data are no longer acceptable (the ‘old/routine’ method), we recalculated MoMs based on median parameters calculated from the preceding samples on a rolling time-window.
Methods
Down's screening data for 61,368 consecutive samples (from 12 October 2004 to 31 December 2007) were identified from the database of the Sheffield Down's syndrome screening service. Medians were calculated using the following methods:
Medians calculated routinely in the screening programme.
Medians calculated using the entire data-set to create a new set of median parameters.
Medians calculated using the moving time-window method, using window sizes of 125, 250 and 500 samples.
The moving time-window method works thus:
Samples 1 to n are used to determine the relationship (slope and intercept) between gestation (X) and log10(analyte value) (Y) by least squares regression. The slope and intercept are used to calculate the median for sample n + 1 (using the relevant gestation). The MoM for sample n + 1 is calculated using this median.
Samples 2 to n + 1 are used to determine the median slope and intercept by least squares regression of gestation (X) and log10(analyte value) (Y). The MoM for sample n + 2 is calculated as above.
Samples 3 to n + 2 are used to determine the median slope and intercept by least squares regression of gestation (X) and log10(analyte value) (Y). The MoM for sample n + 3 is calculated as above.
Etc.
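The steps above can be sketched in code as follows. This is an illustrative sketch under stated assumptions, not the software used in the study: the function names are hypothetical, an ordinary least squares fit is written out by hand, and the MoM is taken as the observed concentration divided by the regressed median.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    Assumes the x values are not all identical (otherwise sxx is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def rolling_window_moms(gestations, concentrations, window):
    """MoM for every sample that has `window` preceding samples.

    For sample i, the median equation is fitted to samples i - window .. i - 1
    (gestation against log10 concentration) and evaluated at sample i's gestation.
    """
    log_conc = [math.log10(c) for c in concentrations]
    moms = []
    for i in range(window, len(concentrations)):
        slope, intercept = fit_line(gestations[i - window:i], log_conc[i - window:i])
        expected_median = 10 ** (intercept + slope * gestations[i])
        moms.append(concentrations[i] / expected_median)
    return moms
```

Refitting from scratch for every sample keeps the sketch close to the description; a production implementation would more likely update the regression sums incrementally as the window advances.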
To assess the effectiveness of the median calculation method in maintaining the median of the MoM distribution within the 5% and 10% DQASS acceptability criteria, another rolling time-window was used to assess the median MoM. Two time-windows were assessed: 1000 and 2000 cases.
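One way to carry out this assessment is sketched below. It assumes that the ‘proportion of time’ is the fraction of rolling assessment windows whose median MoM falls outside 1 ± tolerance; the source does not give the exact definition, and the function name is hypothetical.

```python
from statistics import median

def proportion_outside(moms, window, tolerance):
    """Fraction of rolling windows whose median MoM lies outside 1 +/- tolerance.

    `window` is the assessment window (for example 1000 or 2000 cases) and
    `tolerance` is 0.05 or 0.10 for the 5% and 10% criteria.
    """
    n_windows = len(moms) - window + 1
    if n_windows <= 0:
        raise ValueError("need at least `window` MoM values")
    outside = 0
    for start in range(n_windows):
        m = median(moms[start:start + window])
        if not (1 - tolerance <= m <= 1 + tolerance):
            outside += 1
    return outside / n_windows
```

Recomputing the median for every window is simple but slow for tens of thousands of samples; it is adequate for illustration.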
The purpose of deriving medians is to standardize the assay measurements for gestational age. By reductio ad absurdum, a very short time-window (one sample) would either reduce the MoM to 1.0 in every case, if samples were standardized against themselves, or produce a random number, if it were based on the previous sample. To assess the impact of sample window length on MoM derivation, the mean (±SD [standard deviation]) was calculated for log10MoM for each analyte globally and for specific gestation ages.
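A summary of this kind can be sketched as follows. The function name is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, as the source does not state which form was used.

```python
import math
from collections import defaultdict
from statistics import mean, stdev

def log_mom_summary(gestation_days, moms):
    """Mean and sample SD of log10(MoM), overall and per gestational day.

    Returns a dict keyed by "all" and by each gestation day that has
    at least two samples (stdev needs two or more values).
    """
    logs = [math.log10(m) for m in moms]
    by_day = defaultdict(list)
    for day, value in zip(gestation_days, logs):
        by_day[day].append(value)
    summary = {"all": (mean(logs), stdev(logs))}
    for day, values in sorted(by_day.items()):
        if len(values) >= 2:
            summary[day] = (mean(values), stdev(values))
    return summary
```

A mean log10MoM of zero corresponds to a MoM of 1.0, so the closer the mean is to zero, the better centred the medians are.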
Results
Figure 1 shows cumulative sum plots for AFP, free β-hCG and uE3 MoMs calculated prospectively from 23,500 consecutive Down's syndrome screening tests carried out by the Sheffield sub-regional screening programme. It is clear that the ‘old/routine’ MoM calculation methods result in significant deviation of the cusum away from the ideal target (0) with time, but that the new MoM calculation method (calculated using the 500 case time-window) maintains the target MoM very close to ideal.

Cusum plot showing effects of old and new mechanisms for calculation of medians. The three lines for the new median calculation method overlap each other on the Y = 0 axis. The Y-axis scale shows the cumulative sum of log10 multiples of the median, the X-axis is the count of sample number
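The quantity plotted on the Y-axis can be computed with a running sum, as in the sketch below. The function name is hypothetical, and the target is assumed to be log10MoM = 0 for every sample.

```python
import math
from itertools import accumulate

def cusum_log_mom(moms):
    """Cumulative sum of log10(MoM), one value per sample in order."""
    return list(accumulate(math.log10(m) for m in moms))

# A constant 5% positive bias (made-up data) produces a steadily rising line,
# whereas unbiased MoMs keep the cusum close to zero.
biased = cusum_log_mom([1.05] * 100)
```

With correctly set medians the log10MoM values scatter around zero and largely cancel, which is why a persistent slope in the cusum indicates a median that needs attention.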
Table 2 shows the proportion of time that the median MoM spends outside the DQASS performance window. It can be seen that using a shorter time-window to calculate the medians results in a greater proportion of the median MoMs being within the acceptable DQASS band. It is clear that using the rolling median method almost entirely eliminates periods when the median MoM is outside the 10% limit, and significantly reduces the proportion of time when the median MoM is outside the 5% limit. Inaccuracy in the setting of the median has a large impact on the accuracy of Down's risk estimates.
The proportion of time (as %) that the median MoM is outside the acceptable limits, using the different median calculation strategies and the two median time-windows
AFP, alphafetoprotein; hCG, human chorionic gonadotrophin; MoM, multiples of the median; uE3, unconjugated oestriol
Table 3 shows the mean and SD of log10 transformed MoMs, for the entire data-set and for subsets of the data at specific gestation days (105 days = 15 weeks to 126 days = 18 weeks). It is evident that the mean log10MoM is close to zero in all cases, and that when using the rolling median method, the mean log10MoM, either for the whole data-set or for specific days, is generally closer to zero than that achieved routinely. Equally, comparison of the SDs shows great similarity: for example, for AFP, the overall SD for the whole set (0.1591) is similar to the SD for all three window variants (0.1591, 0.1587, 0.1584). These are slightly wider than for the ‘Routine’ set (0.1475), but the mean for the routine set was non-zero, indicating that the medians were not truly correct. Looking across a variety of gestations, there is also similarity across the data-set: for example, the overall set SDs for AFP are 0.1526, 0.1507, 0.1437, 0.1478, 0.1478, 0.1413 and 0.1421 for the gestation examples cited compared with 0.1649, 0.1601, 0.1561, 0.1594, 0.1557, 0.1517 and 0.1494 for the window 500 data-set – essentially, a similar pattern for both modes of calculation. This shows that even with the traditional approach for median recalculation there is a gradient of SD values across the gestation range, and that the rolling median method does not cause wild variations across the data range.
Mean and standard deviation log10 MoM for three analytes at different gestation (Gest) ages and for each method of calculation
AFP, alphafetoprotein; hCG, human chorionic gonadotrophin; uE3, unconjugated oestriol
Finally, the effect on screen-positive rate was evaluated for the moving time-window method using the 500 case window in the data-set of 23,500 cases shown in Figure 1. The prospective method gave a screen-positive rate of 7.49% (95% confidence interval [CI]: 7.16–7.83%) and the time-window method gave a screen-positive rate of 6.48% (95% CI: 6.17–6.80%).
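The source does not state how the confidence intervals were calculated. As an illustration only, a Wilson score interval with n = 23,500 (an assumption, since the number of cases contributing to each rate is not given explicitly) agrees with the reported intervals to two decimal places.

```python
import math

def wilson_interval(p_hat, n, z=1.959964):
    """Wilson score interval for a proportion p_hat observed in n cases.

    z = 1.959964 gives an approximately 95% interval.
    """
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Screen-positive rates from the text; n is assumed to be 23,500 for both.
prospective_ci = wilson_interval(0.0749, 23500)
rolling_ci = wilson_interval(0.0648, 23500)
```

The two intervals do not overlap, which is consistent with the difference in screen-positive rate being larger than sampling variation alone would explain, although a formal comparison would need to account for the same women contributing to both rates.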
Discussion and conclusion
When Down's screening was developed, the entire data-set was based on 385 control samples and 77 Down's cases. Since then, it has become the received wisdom that larger and larger data-sets are necessary to ensure the validity of median calculation parameters. This has never been tested scientifically.
DQASS has demonstrated that a 10% error in any one MoM in the ‘worst’ direction is equivalent to a 2% increase in screen-positive rate. 3 Deviation in MoMs occurs because medians are incorrectly defined. The current paradigm for recalculation of median parameters is that medians are changed when a human operator is convinced that the deviation from perfection is unacceptable. DQASS has defined acceptable limits for performance and the majority of laboratories fail to achieve the standard.
Human psychology means that, in a process where human intervention is needed to change data parameters, the operator must be convinced that a change is necessary before it is made. Humans take time to see and accept that a pattern of change is genuine and will not correct itself. Automated recalculation of medians at a set time interval defined in risk calculation software would remove inertia from the decision pathway and tighten control of data interpretation. This raises the question: what is the appropriate time interval between automatic recalculations? Clearly, if a change occurs at set intervals, there is a possibility that something could happen immediately after new medians are set. One reason why median changes are currently intermittent is the operator time needed to calculate and apply new parameters. If the process is automated, there is no reason why a rolling system should not be applied. Here, we demonstrate that automatic recalculation of medians using a rolling time-window method significantly reduces the variability of screening performance and reduces screen-positive rate, without causing changes to the population parameters needed to calculate risks.
The reduction in screen-positive rate by 1% from 7.49% to 6.48%, resulting from changing from a periodic to a rolling time-window method for median recalculation, represents a 13.5% reduction in the total number of women who would be offered amniocentesis. This is a significant reduction in the number of amniocenteses that would need to be performed and the concomitant risk of fetal death.
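The relative reduction follows directly from the two screen-positive rates reported in the Results (7.49% and 6.48%), as this short worked calculation shows. The variable names are illustrative.

```python
prospective_rate = 7.49  # screen-positive rate (%), prospective method
rolling_rate = 6.48      # screen-positive rate (%), 500 case rolling window

absolute_reduction = prospective_rate - rolling_rate          # percentage points
relative_reduction = absolute_reduction / prospective_rate * 100  # % of original rate

summary = f"{absolute_reduction:.2f} points, {relative_reduction:.1f}% relative"
# summary == "1.01 points, 13.5% relative"
```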
We therefore suggest that the requirement for large data-sets for derivation of medians is a fallacy that increases the likelihood of error rather than decreases it. We further recommend that all Down's syndrome screening software manufacturers should consider the redesign of their MoM calculation modules to include the possibility of rolling median calculation.
