Abstract
The time period over which relevant symptom shifts unfold is not uniform across individuals. This article proposes an adaptation of the Reliable Change Index (RCI) to detect symptom changes of varying durations in individual patients’ time series: the Duration-Adjusted RCI (DARCI). The DARCI proportionally raises the RCI cut-off to account for its extension over additional time increments, resulting in different DARCI thresholds for different change durations. The method is illustrated with a simulation study of depressive symptom time series with varying degrees of discontinuity and overall mean change, and four empirical case examples from two clinical samples. The results suggest that the DARCI may be particularly useful for identifying symptom shifts that appear relatively abrupt, which can help indicate when a patient is showing significant improvement or deterioration. Its ease of use makes it suitable for application in clinical contexts and a promising method for exploring transitions in psychiatric populations.
Background
Identifying whether and when a patient’s psychological symptoms have changed in a clinically relevant way is an integral part of many treatment settings and of studies of therapeutic interventions. Typically, methods to determine clinical change identify when a patient reaches a score over or under a certain threshold (e.g., within the range of a nonclinical population norm score; Jacobson et al., 1999), shows a change in scores that meets a cut-off (e.g., 50% reduction; Ilardi & Craighead, 1994), or meets a combination of criteria (e.g., minimal score reduction and statistical significance; Jabrayilov et al., 2016). The Reliable Change Index (RCI) is a widespread method that determines whether the variability in a person’s measurements on a symptom questionnaire is more likely to be due to the instrument’s precision (measurement error) or due to an actual clinical change (Jacobson et al., 1999; Jacobson & Truax, 1991; Maassen, 2000; Maassen et al., 2009). Determining whether a symptom change indicated by a questionnaire is reliable helps to establish that a drop or increase in scores is not merely due to chance, and can aid decisions on whether to start, continue, end, or alter the intensity of treatment (Bauer et al., 2004).
As routinely measuring (former) patients’ psychological complaints and collecting time series of repeated assessments within individuals have become common practice in the context of therapy and relapse prevention (Boswell et al., 2015; Fortney et al., 2017; Lambert et al., 2018; Lewis et al., 2019; Schiepek et al., 2016), studies mapping repeated symptom assessments have shown that psychopathological change is often characterized by nonlinearity and abrupt changes (Gelo & Salvatore, 2016; A. M. Hayes et al., 2007; Helmich, Wichers, et al., 2020; Schiepek, 2009). Clinically, this is relevant because particularly sudden shifts may indicate that the patient has experienced a transition to a better or worsened state, which could be predictive of their treatment outcomes (Aderka et al., 2012; Aderka & Shalom, 2021; Helmich, Wichers, et al., 2020; Shalom & Aderka, 2020; Tang & DeRubeis, 1999; Tang et al., 2002). Sudden gains (symptom improvements), for instance, may occur when a therapy session or intervention has been especially effective (Abel et al., 2016; Lutz et al., 2009; Shalom & Aderka, 2020; Tadić et al., 2010), while sudden losses (deteriorations) indicate when a patient is less likely to benefit from treatment, and should be identified as soon as possible to prevent the treatment from failing (Lutz et al., 2013; Thompson et al., 1995). A pattern of steady early improvement over the first few treatment sessions has also been linked to better treatment outcomes (Finch et al., 2001; Haas et al., 2002; Lambert, 2005; Lutz et al., 2009; Tadić et al., 2010), and conversely, early changes that did not conform to the expected response patterns have been linked to poorer outcomes (Lambert et al., 2002). Thus, identifying discontinuous symptom changes as they unfold is relevant given their association with treatment outcome.
Within-person change patterns of varying duration and magnitude have been described in the psychotherapy context (Rubel et al., 2015; Schiepek et al., 2017; Vittengl et al., 2016), but exploring a range of relevant time periods with existing methods is challenging. Most reliable change methods do not incorporate the time frame over which a change occurred, nor are they optimized for application to repeated symptom assessments. Many approaches are primarily used to test pre- to post-treatment change, which can take weeks or months, whereas others focus specifically on symptom shifts as they occur between or even within therapy sessions (Keller, 2003; Lutz et al., 2013; Tang & DeRubeis, 1999). For instance, the aforementioned sudden gains and sudden losses in symptoms are identified as changes between therapy sessions that combine a predefined minimum magnitude of change with the requirement that it takes place within a short period of time, most often a week (Aderka et al., 2012; Shalom & Aderka, 2020; Tang & DeRubeis, 1999; Tang et al., 2002). Yet, symptom reductions of 50% or more over 3 to 4 weeks of treatment have also been described as “rapid” in studies of early response and are certainly considered clinically relevant (Haas et al., 2002; Ilardi & Craighead, 1994). Symptom shifts may thus take different amounts of time and still be considered relatively abrupt, and this variability should be taken into account when studying periods of notable improvement, as the time it takes to change can be clinically meaningful in itself (Paul et al., 2019; Stulz et al., 2007).
Examining changes with a variable time frame and a standard single cut-off is not straightforward. Imagine two individuals who show the same reliable reduction in scores, for example, −15 points. For Person A, the cut-off is met relatively suddenly, from 1 week to the next, while Person B shows slower, gradual improvement and meets the same reduction only after 3 weeks. These individuals both show a reliable improvement according to this criterion, but due to the different timings, the process appears as a sudden change for one (Person A) and a much slower gradual improvement for the other (Person B). Here, using a single criterion is effective at detecting a minimally relevant reduction for both people, but it disregards the qualitative difference in “velocity” of the changes. Alternatively, one could set the threshold to apply to a fixed time frame such as 1 week and examine change over longer periods by requiring that the cut-off is met again for each additional increment; in this case, change over 2 weeks would require a 30-point decrease. However, this approach lacks sensitivity to the fact that consecutive observations are being considered. To illustrate this point further, imagine the stepwise changes shown by a third person, Person C, who shows a pronounced decline in scores over multiple weeks (e.g., steps of −12, −9, −11 points). The changes between two adjacent points (over 1 week) never meet the minimal change criterion of −15 points, yet the overall change shown by Person C is substantial: −32 over the 3 weeks, more than double the 15-point threshold. Intuitively, one could also consider this pattern a clinically relevant and rapid improvement (Haas et al., 2002; Ilardi & Craighead, 1994) and one worth detecting.
However, as illustrated, using a single cut-off that is unadjusted for the duration of a symptom change allows one to determine reliable change over a set interval (Person A) but will miss smaller within-week changes that culminate in a relevant change over a longer period (Person C). Ideally, a change criterion would be able to account for clinically meaningful changes of different durations by requiring that changes over longer periods must also be larger as a whole, although not necessarily so large as a basic multiplication of the cut-off. Then, Person C’s consistent improvement could be identified as a reliable change that is comparable in its clinical relevance to the rapid 1-week shift shown by Person A. In short, standard available single cut-offs cannot optimally identify clinically relevant changes that occur over consecutive time points.
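The Person C example can be worked through in a few lines. The sketch below is only an illustration of the argument above: it uses the 15-point cut-off and the weekly steps of −12, −9, and −11 points from the example, and the variable names are hypothetical.

```python
# Illustrative sketch of the Person C example: weekly step changes of
# -12, -9, and -11 points, tested against a fixed 15-point cut-off.
CUTOFF = 15  # minimal reliable reduction used in the example

steps = [-12, -9, -11]  # Person C's week-to-week changes

# Strategy 1: test each adjacent pair of observations against the cut-off.
adjacent_hits = [abs(s) >= CUTOFF for s in steps]

# Strategy 2: require the cut-off again for every added week (15, 30, 45).
cumulative = [sum(steps[:k]) for k in range(1, len(steps) + 1)]
multiplied_hits = [
    abs(c) >= CUTOFF * k for k, c in enumerate(cumulative, start=1)
]

# Strategy 3: ignore duration and test only the total change.
total_change = cumulative[-1]
total_hit = abs(total_change) >= CUTOFF

print(adjacent_hits)
print(multiplied_hits)
print(total_change, total_hit)
```

Neither the adjacent-pair test nor the multiplied cut-off flags Person C, whereas the duration-blind test does, which is the gap a duration-adjusted criterion is meant to fill.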
To summarize, various methods exist to determine whether a relevant change in symptoms has occurred at the within-person level, but these typically make no particular assumption about the time it took for the symptoms to change. Furthermore, even those that examine sudden gains and losses in repeated assessment data do not provide solutions to identify abrupt changes that extend over multiple time points or therapy sessions. More data-intensive methods may focus on testing the significance of an overall symptom change with a regression model (Ferrer & Pardo, 2014; Maassen et al., 2009; Maric et al., 2015; Slofstra et al., 2018) or try to identify abrupt shifts with a change-point model (Albers & Bringmann, 2020; Cabrieto et al., 2017), but these methods may be difficult to implement in real time in the course of clinical practice, as they require more data than may be available in early stages of treatment and a fair level of statistical knowledge to carry out (de Vries & Morey, 2013). Thus, adjusting a simple method like the RCI may help to identify both reliable symptom changes that are very abrupt and symptom transitions that accumulate into considerable change over a slightly longer time frame.
In this article, I propose a method, based on the well-established RCI (Jacobson & Truax, 1991), which allows researchers and clinicians to explore the presence of symptom changes of varying durations in individual patients’ time series: the Duration-Adjusted Reliable Change Index (DARCI). A simulation study is conducted to test the DARCI’s ability to pick up periods of relevant change, and discontinuous change in particular, in the context of a larger overall symptom time series. Empirical case examples from two clinical data sets, with visualizations of the DARCI’s detection of transitions at different confidence levels (CLs), are also presented.
Method
Materials
This study uses the Symptom Checklist–90 (SCL-90) depression subscale and Dutch norm scores (Arrindell & Ettema, 2003; Derogatis, 1977) as a basis for illustration of the DARCI in the simulation study and in the first set of empirical case examples. This questionnaire consists of 16 items that ask to what extent one was bothered in the past week by particular depressive symptoms (e.g., “feeling blue”) on a 5-point scale ranging from
Reliable Change Index
The RCI was developed as a method to ensure that any identified pre- to posttreatment change was a reliable change that could be distinguished from measurement error (Jacobson et al., 1999; Jacobson & Truax, 1991). The RCI can be used to calculate a threshold at which the difference between a pre- and postmeasurement for one person is, with a 95% two-tailed CL, “unlikely to occur without actual change” (Jacobson & Truax, 1991, p. 14). It uses the standard error of measurement (
For this study,
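As a minimal sketch of the standard two-point RCI threshold in the Jacobson and Truax (1991) formulation, the function below derives the standard error of measurement from a norm-group standard deviation and a reliability coefficient, converts it to the standard error of the difference, and scales it by the two-tailed z-value. The function name and the input values (SD = 10, reliability = .90) are hypothetical and are not the norm values used in this study.

```python
from math import sqrt
from statistics import NormalDist


def rci_threshold(sd, reliability, confidence=0.95):
    """Smallest difference between two scores counted as reliable change."""
    se = sd * sqrt(1.0 - reliability)   # standard error of measurement
    s_diff = sqrt(2.0 * se ** 2)        # standard error of the difference
    # Two-tailed critical z-value for the chosen confidence level.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return z * s_diff


# Hypothetical inputs, chosen only to show the calculation.
example = rci_threshold(sd=10.0, reliability=0.90)
print(round(example, 2))
```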
Duration-Adjusted RCI (DARCI)
The DARCI is proposed as an adaptation of the RCI, with the aim to capture symptom changes of varying durations, particularly changes that appear as sudden or large in overall scope. The DARCI requires setting a fixed time period between two points (e.g., 1 week) as a basis for extension when more points are added (i.e., when testing change over longer durations) and allows one to calculate thresholds for each additional increment of time (e.g., each added week). Like the regular RCI, the DARCI tests the difference score between two points, but it accounts for instances where the two compared points are farther apart by proportionally increasing the change threshold.
To detect reliable change from a given starting point (
Essentially, the RCI threshold is reduced to a range of uncertainty around a single point, and then proportionally extended for the number of observations (i.e., period of time) at hand while maintaining the chosen CL (e.g., 95%). In doing so, the DARCI provides a way to detect symptom changes over various durations with the same degree of reliability, even if some shifts take longer. Applied to our illustrative sample and instrument, with the
Alternatively, we can calculate the
The DARCI thresholds for different increments are presented in Table 1. Using this method, higher and lower CLs may also be calculated and explored, and all DARCI critical threshold scores are rounded up to maintain the cut-off ≥
Table 1. The DARCI Over Different Durations and Confidence Levels for the SCL-90 Depression Subscale.
The indicated time periods are illustrative, to show that the threshold is extended proportionally over time increments of equal size. This could also be (a number of) days or another set number of weeks.
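One way to read the proportional extension is that half of the two-point RCI serves as the margin for a single observation, and that this margin is multiplied by the number of observations spanned and then rounded up. The sketch below assumes exactly that linear form, which is an interpretation of the description rather than a confirmed formula. The input of 12.4 is a hypothetical RCI value, chosen because its rounded-up results agree with the 13-point (Tn2) and 25-point (Tn4) thresholds mentioned elsewhere in this article.

```python
from math import ceil


def darci_thresholds(rci, max_points=4):
    """Rounded-up thresholds for change over 2..max_points observations.

    Assumption: the threshold grows linearly with the number of
    observations, in steps of half the two-point RCI.
    """
    per_point = rci / 2.0  # margin attributed to a single observation
    return {n: ceil(per_point * n) for n in range(2, max_points + 1)}


# Hypothetical two-point RCI of 12.4 points at the 95% confidence level.
thresholds = darci_thresholds(12.4)
print(thresholds)
```

Under this assumption the threshold over four observations is double, not four times, the two-point threshold, so longer changes must be larger overall without requiring the full cut-off at every step.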
The DARCI does not prescribe over which time period the change must take place. Instead, the time interval between two observations serves as a basis for the other increments over which it is extended. Researchers or clinicians must choose the duration of time over which detecting a change would be of interest (e.g., based on the clinical literature, pilot studies, or other conceptual grounding). For instance, if the change between
Analysis
Simulation Study
To test the accuracy of the DARCI thresholds for change over two (Tn2), three (Tn3), and four (Tn4) time points, the frequency at which modeled shifts were correctly identified in simulated repeated symptom assessments was examined. A set of 10,000 time series with a length of 15 points was simulated, to reflect a typical duration of psychological treatment (Gloaguen et al., 1998; Hansen et al., 2002). Each time series was drawn from a randomized normal distribution with a mean of 0 and variance concurrent with the SCL-90 depression subscale
These change patterns were chosen to explore the interplay between the strength of the overall slope and different degrees of discontinuity. For instance, at an overall reduction of 10 points and a linear function, any identified change would be due to a random fluctuation, as the (DA)RCI thresholds start at 13 points for
In short, DARCI thresholds at 95% CL were calculated for change over two (Tn2), three (Tn3) and four observations (Tn4) and applied to simulated symptom time series to identify periods over which the criteria for a reliable change were met. Particularly, we examined the sensitivity of the DARCI thresholds for finding the modeled discontinuities in the middle of the data set, and the extent to which they were specific to indicating the intended shift, rather than random fluctuations.
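Applying the thresholds to a time series amounts to comparing each observation with the observation one, two, or three steps later. The following sketch shows that scan on a small hand-made series; it is not the simulation code of this study. The function name, the example scores, and the threshold values (13, 19, and 25 points for Tn2, Tn3, and Tn4) are illustrative assumptions.

```python
def find_reliable_changes(scores, thresholds):
    """Return (start_index, n_points, change) for each reliable change.

    thresholds maps the number of observations spanned (2 = adjacent
    points) to the minimum absolute change required over that span.
    """
    hits = []
    for n_points in sorted(thresholds):
        span = n_points - 1  # distance between first and last observation
        for start in range(len(scores) - span):
            change = scores[start + span] - scores[start]
            if abs(change) >= thresholds[n_points]:
                hits.append((start, n_points, change))
    return hits


# Hand-made series with a 20-point drop spread over three observations.
scores = [40, 41, 39, 40, 38, 28, 18, 17, 19, 18]
assumed_thresholds = {2: 13, 3: 19, 4: 25}  # illustrative values

hits = find_reliable_changes(scores, assumed_thresholds)
print(hits)
```

In this example neither 10-point weekly step meets the two-point threshold, but the combined drop meets the three-point threshold, so the only detected change starts at index 4 and spans three observations.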
All analyses were conducted in R (version 4.3.0), and code for the simulations and plotting of the symptom data (Figures 1–3) is available online at https://osf.io/24cfa/.

Figure 1. Case Demonstration of a Simulated Time Series, With Increasing Levels of Overall Decline, and Increasingly Abrupt Shifts Around the Middle of the Time Series.

Figure 2. Heatmap of DARCI-Detected Starting Points (tstart) of Reliable Changes in the Simulated Time Series.

Figure 3. Application of the DARCI Across Confidence Levels (90%, 95%, 99%) in Four Empirical Case Examples.
Empirical Case Examples
The application of the DARCI in clinical data is illustrated by visualizing the identified reliable
Two cases were drawn from the TRANS-ID Recovery data set (Helmich, Snippe, et al., 2020), an intensive longitudinal study of individuals receiving psychological treatment for depression, who completed an average of 23 weekly SCL-90 depression subscale (Arrindell & Ettema, 1981; Derogatis, 1977) measurements during the 6-month assessment period. The DARCI has previously been applied to these data in a study that focused on detecting changing dynamics in ecological momentary assessments prior to the identified symptom transitions (Helmich et al., 2023). Note that in the weekly measurements of this data set, two items on suicidal ideation were omitted from the 16-item scale, which means that the calculated DARCI thresholds are likely to yield more conservative results in this sample.
The second set of cases was drawn from anonymized patient data from the Modum Bad psychiatric hospital anxiety department (see also Johnson et al., 2017). Weekly Beck Anxiety Inventory (BAI) assessments are collected as a standard part of care during psychological (inpatient) treatment, yielding about 10 assessments per person during therapy. The example cases were taken from a random subset of 50 cases that belong to a larger pool of data collected between 2016 and 2023. The RCI calculation for the BAI is based on the original research by Beck et al. (1988), as there are currently no representative psychiatric norm scores available for the Norwegian population (Lisøy & Martinsen, 2023) and the same RCI has been used in previous studies on these data (Johnson et al., 2017). Based on a reliability (Cronbach’s α) of .92, and a consequent
Results
Simulation Case Illustration
A visual illustration of the ability of the DARCI to identify the intended periods of relevant change is provided in Figure 1. The indicated changes for the simulated time series in Figure 1 demonstrate clearly how the DARCI was able to pick up on the periods of increased discontinuity, and thus, relevant change—especially when overall change was at least 30 points. At a lower overall reduction, of 20 points, only the more discontinuous changes were detected in this case example.
Another noteworthy point is that thresholds for reliable changes of different duration may be met simultaneously. This can be seen, for instance, in the bottom row of plots, Step (
Results of the Simulation
In Figure 2, the results of the 10,000 simulations are visually represented as a heatmap, with higher values (darker purple) representing a higher frequency of DARCI-indicated change start points for different durations (Tn2, Tn3, and Tn4).
Linear
Looking at the first row of subplots, for Linear change, we see a low rate of false positives across the different thresholds. The rate is slightly higher for the DARCI threshold for changes between two points (Tn2), indicating that the criterion picked up random fluctuations rather than true changes at a rate of approximately 5% to 7%; this also applied to the DARCI for change over two points in the other overall change models. The DARCI criteria for changes over three or four points showed negligible rates of false positives across the overall change levels: 0.5% to 2% at Tn3 and 0% to 0.6% at Tn4.
Sigmoid
Looking at the Sigmoid change curves, where the exact starting point of change is less apparent, the DARCI method started to noticeably pick up changes from an overall change of −30: about 21% for all three different duration thresholds. When the overall amount of change was −40, the DARCI thresholds identified the correct period of increased discontinuity in about 35% to 58% of the simulations.
Step (n = 4)
For the third row of subplots in Figure 2, changes started to be noticeably detected in the −20 overall change model. Given that the change was modeled to take place over four time points, the Tn4 threshold yields the highest specificity: 24% of changes were determined to start at Time Point 6, as modeled. At higher levels of overall change, this increased to 82% and 100%. In the −20 overall change model, changes over two (Tn2) and three points (Tn3) were detected with similar accuracy: about 19% and 21% of simulations identified a change in the correct period, respectively. For Tn2, this rose to 37% at −30 and 58% at −40 overall change, whereas Tn3 accurately captured 62% and 91% at those levels of overall change.
Step (n = 3)
When change was modeled as a step-function over three time points, the DARCI over three change points picked this up accurately at 62% at −20 points overall change, and 97% and 100% at larger changes (−30, −40). To compare, the DARCI at Tn2 showed about 37% accuracy at identifying the location of the mean shift at −20, but the identified starting points were placed at Time Points 7 and 8, as the 1-week duration could not capture the entire change period that was modeled. Conversely, for the longer duration of Tn4, the changes were harder to identify because often the overall required change (Tn4 = 25) to meet that threshold was not met (only in about 24% of cases at −20 decline). The detection rates for these durations improved when the overall change increased to −30: with about 68% of simulations indicating changes over two time points, and 82% finding shifts over four points as well. At −40, the Tn2 threshold picked up the modeled shift period 90% of the time, and 99% with the Tn4 criterion. Worth noting is that the transition thresholds of Tn3 and Tn4 are occasionally also met for x-values that lie before or after the actual modeled shift. For instance, Tn3 is met 61% of the time at both x = 6, and x = 8 when the overall change is −40, because part of the modeled shift was already sufficiently large to meet the threshold. This “blurring” effect is visible in all the modeled shifts that occur over more than two time points (from Sigmoid to Step
Step (n = 2)
In the bottom row of subplots, where a mean shift was modeled as occurring between two time points, the DARCI was still able to pick up the correct location of the shift for about 36% of cases at Tn2 and for about 9% at Tn3 at the lowest level of overall score reduction (−10). For the −20 change model, the DARCI over two points picked up the change point accurately in 90% of simulations (compared with 62% at Tn3, and 24% at Tn4). At −30 overall change, only the Tn4 model did not always pick up the modeled shift (the other thresholds picked up ∼100%), yet in 82% of simulations it indicated that a shift started somewhere in the range of Time Points 5 to 7. Finally, at the highest level of overall change (−40), the change was correctly identified in 100% of cases for all DARCI thresholds.
Empirical Case Examples
Figure 3 shows the results of the four empirical case examples from the depression (A and B; TRANS-ID Recovery) and anxiety samples (C and D; Modum Bad). Per case, three subplots are presented that demonstrate the DARCI-identified
Case A
In the first empirical case example, we see a depression symptom time series with an overall improvement (from moderately high to mild symptoms) and relatively many fluctuations from week to week. We can see that the 90% threshold picks up seven shifts in total, where smaller fluctuations also meet the cut-off. Note that some of the weekly fluctuations toward higher symptom levels likely also meet these criteria. The smaller changes are no longer picked up in the second subplot, as we see the first and last 1-week (purple) transitions disappear once the confidence level is raised to 95%, and the 2-week (green) and 4-week (red) period, which both include a very minor score change from Week 14 to 15, no longer meet their respective DARCI thresholds. The cascade of two large decreases from Week 11 through 14 is captured well at 95% CL, with two 1-week and a 3-week (orange) shift. However, at the strictest CL of 99%, only the largest 1-week shift remains reliable.
Case B
The depression time series of the second case example generally shows smaller incremental steps from each week to the next and no persistent change from the moderately severe symptom levels, although it is notably marked by two phases of discontinuity in the trajectory. There is an interesting period of continued gradual improvement from Week 7 through 11, which is preceded by a deterioration (indicated by the asterisk) that is reliable at the 99% CL. The improvement is marked by the DARCI thresholds at 90% CL: in its entirety by a 4-week transition, as well as two 3-week shifts and one 2-week shift. This period is no longer identified as reliable at higher CLs, and the fact that it is preceded by such a strong deterioration may also alter the clinical evaluation of the improvement. Later in this time series, we see a clear demonstration of overlapping transitions being separable by
Case C
Moving to the anxiety symptom time series collected during treatment, Case C represents a patient who improves over the course of 11 weeks from severe to mild symptom levels, with the most marked improvement occurring in the first 4 to 5 weeks. The change across those first weeks, as maintained at 99% CL for 1-, 2-, and 3-week durations, appears to be driven by the strong symptom drop from Week 1 to 2, particularly considering that the overall change remains reliable despite the slight increase in scores from point 3 to 4. Examining the
Case D
This final empirical case shows an overall trajectory from severe to more moderate symptom levels during treatment. Again, we see that the most profound change is a 1-week shift, here located between Weeks 6 and 7. The time series shows various periods of improvement identified by the DARCI thresholds at 90%, but these also illustrate the importance of exploring the appropriate level of sensitivity, as most of these changes are not maintained at higher CLs, and yet many other patients never meet any of the thresholds even at the 90% CL.
Discussion
This article provided a first illustration of a newly proposed method to identify reliable changes in symptoms over multiple increments of time. Based on the simulation study and empirical case examples, it appeared that the DARCI was well suited to picking up relevant (discontinuous) changes of varying length in the overall course of the symptom time series. Where transitions overlapped, the period that showed relatively (to the time it took) the largest change could be identified by looking at which of the overlapping transitions had the highest
The simulations showed that the DARCI thresholds generally were accurate and sensitive to changes, especially when the overall change was large (≥30 points, or about half the total scale). The DARCI thresholds over more than two observations (Tn3 and Tn4) were able to pick up the modeled points and periods of discontinuity with high accuracy, and the longer durations also showed fewer false-positive values than the DARCI for change over two points (Tn2, which is equivalent to the standard RCI, but set to a chosen time interval). The DARCI over three time points (Tn3) seemed to be particularly well-suited to identifying both the relatively abrupt changes modeled with the step-function over two and three observations, and the more gradual changes in the sigmoid function and the slowest step-function over four points. The DARCI over four observations (Tn4) also performed well, even though it showed slightly more dispersion of the identified start points. This “blurring” effect was not necessarily a sign of poor performance, as the modeled transitions did meet the Tn4 criteria, but the starting point of quicker changes was less precisely estimated due to the larger range of the DARCI at that duration. Similarly, the threshold for change over two points sometimes picked up reliable changes within the context of a larger overall shift, thus identifying a partial transition. Taken together, the simulations showed that the DARCI thresholds could accurately pick up known shifts and detect instances when a shift was less well-defined. Extrapolating these results to an applied context, we may be confident that the DARCI picks up larger changes very well, and that smaller changes could also be explored by lowering the CL.
Apart from the accuracy of the DARCI thresholds, the simulations also revealed that investigating reliable changes over different time periods with a purposely adapted index has the potential to uncover clinically relevant symptom changes that may be modest step by step but large overall. Employing only the standard RCI cut-off and testing difference scores without regard for time (any change that meets the criterion counts), or with a mere repetition of the same criterion for each increment (test the differences for point 1–2, point 2–3, etc.; change is found only when those adjacent points meet the cut-off), would overlook these kinds of continued changes. This is further supported by the empirical case examples, which showed that the detected periods of reliable change often spanned multiple weeks, even if the largest change did tend to be a shift over a single week. A more extensive exploration of the prevalence of the various change durations is needed to understand the interrelation of these faster and slower transition processes more fully. Clinically, it is interesting to learn more about when and how multiple change thresholds are met by one individual, and broadening that, to explore which kinds of changes tend to occur within a given study population (e.g., response patterns in depression, Korf, 2014; Rubel et al., 2015; Vittengl et al., 2013, or mood shifts in bipolar disorder, Bos et al., 2022; Kramlinger & Post, 1996). Moreover, the DARCI method may provide a novel way to describe and explore within-person change patterns, which have been shown to be of importance for the outcome of treatment (e.g., A. M. Hayes et al., 2007; Lutz et al., 2013; Schiepek et al., 2017; Stulz et al., 2007; Thompson et al., 1995; Vittengl et al., 2016).
The ability of the DARCI method to detect periods of relevant change relies on the chosen time period between observations, and setting an appropriate duration from which to extend the DARCI thresholds is quite the conceptual challenge. The time scale of clinical change processes is an ongoing and highly important topic of study in and of itself (S. C. Hayes et al., 2019; Kazdin, 2001; Mahoney, 2004; Strunk & Lichtwarck-Aschoff, 2019), so there is no readily available gold standard. Researchers must decide for their study what the relevant periods of change may be, which may require pilot studies and in-depth clinical knowledge of the population and change process under study to come to an educated best guess. In a clinical context, priority may be given to what is available, for example, session-to-session assessments. Yet, this uncertainty around the most appropriate time scale for the DARCI thresholds also offers grounds for exploratory studies. While more frequent measurements would likely reveal more nonlinearity over time (Helmich, Wichers, et al., 2020; Schiepek, 2009), there may nonetheless be practical limitations (e.g., patient burden) or theoretical considerations (e.g., conceptualization of symptoms; Bringmann et al., 2022) that lead one to prefer weekly intervals or simply observations between therapy sessions. The application of the DARCI to time periods ranging from days to months could be investigated to learn more about the optimal time scale to describe the process of symptom change that would be captured with this method.
Commonly, when more data are available, the additional power allows smaller changes to be detected (e.g., in a linear regression), so the fact that the change thresholds increase for longer periods may be unexpected from a purely statistical standpoint. While this is a fair observation, the aim is not to determine a statistically significant change (Bauer et al., 2004; Jacobson et al., 1999; Jacobson & Truax, 1991). Instead, the DARCI may provide a simple method to add to a clinician’s toolbox, which allows periods of reliable change to be uncovered within the course of a longer time series, even when little data are available. The DARCI uses between-persons information (the
This method is not without some limitations. First, the proposed method for extending the RCI is rather simplistic, which may mean that the thresholds for longer changes are not optimally tuned (or are too strongly penalized) compared with change over two points. This might explain the predominance of highly reliable 1-week changes in the empirical case examples. However, given the alternative of using a single cut-off without any concession to time, this method still offers an improvement and a first step toward the in-depth exploration of within-person symptom changes. Second, although the DARCI is very flexible and can be adapted to any chosen symptom measurement instrument, the RCI relies on clinical norm scores for optimal performance (Jacobson & Truax, 1991). In our simulations and first set of empirical case examples, we used the SCL-90 norms as a reference as they are based on a large sample (
A strength of this method is that it has the potential for easy application in the clinical context. Once the thresholds have been calculated, clinicians can check whether incoming new symptom assessments (e.g., as gathered with routine outcome monitoring) meet the shorter or longer duration thresholds of the DARCI. This has applications during treatment, to monitor the psychotherapy process, as well as afterward, in relapse prevention monitoring. Moreover, the nature of transitions in mental health is a topic that warrants further study, and using this method to explore the presence of sudden-gain-type shifts as well as slower accumulated reliable change could yield novel insights into within-person psychological change processes.
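In a monitoring setting, only the newest assessment needs to be compared with the preceding ones. The sketch below illustrates such a check under stated assumptions: higher scores are taken to mean more severe symptoms, the threshold values (13, 19, and 25 points for change over two, three, and four observations) are illustrative, and the function and variable names are hypothetical.

```python
def check_latest(history, new_score, thresholds):
    """Report which duration thresholds the newest score completes.

    Returns {n_points: (direction, change)} for every span ending at
    new_score whose absolute change meets the matching threshold.
    """
    flagged = {}
    for n_points, threshold in sorted(thresholds.items()):
        span = n_points - 1
        if span > len(history):
            continue  # not enough earlier assessments for this duration
        change = new_score - history[-span]
        if abs(change) >= threshold:
            # Assumption: a higher score means more severe symptoms.
            direction = "deterioration" if change > 0 else "improvement"
            flagged[n_points] = (direction, change)
    return flagged


# Hypothetical relapse-monitoring example with a gradual worsening.
history = [20, 22, 30, 39]
assumed_thresholds = {2: 13, 3: 19, 4: 25}  # illustrative values
flags = check_latest(history, 49, assumed_thresholds)
print(flags)
```

Here the latest 10-point weekly increase stays below the two-point threshold, but the accumulated increase meets both the three-point and four-point thresholds, so the deterioration is flagged over the longer durations only.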
Future research should further validate this method with clinical data from various patient populations and instruments, to provide insight into the kind of symptom changes that can be detected and expected in real-world symptom assessments. In addition, comparing the DARCI with existing models of change within the course of treatment, such as sudden gains (Tang & DeRubeis, 1999), would be worthwhile, although the two methods may have slightly different objectives (i.e., identifying changes of various durations vs. change between therapy sessions).
To conclude, the DARCI provides a simple adaptation of a well-established method for identifying reliable change (Bauer et al., 2004; Ferrer & Pardo, 2014; Jabrayilov et al., 2016) and may encourage researchers to consider exploring (discontinuous) symptom shifts of varying durations in the context of psychological treatment.
Footnotes
Acknowledgements
The author would like to thank Arnout Smit, Merijn Mestdagh, and Francis Tuerlinckx for their critical input on the design of the simulation study in this manuscript, Leonardo Rydin Gorjão for his help with improving the visualization code, and Evelien Snippe, Laura Bringmann, and Tineke Oldehinkel for their encouragement and insightful text revisions.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (ERC-CoG-2015; No 681466 to M. Wichers).
